ICSI431/ICSI531 Data Mining

Instructor Feng Chen
Office LI-96J
Number (518) 442-4270
Email fchen5@albany.edu
Office Hour
Monday: 4:15PM to 5:15PM
Wednesday: 11:00AM to 12:00PM

TA
Yizhen Chen
Office CS lounge
Number  
Email jonkiky@gmail.com
Office Hour
Tuesday 10:00AM to 12:00PM
Thursday 10:00AM to 12:00PM

TA
Rui Wang
Office CS lounge
Number  
Email rwang3@albany.edu
Office Hour
Thursday 4:45PM to 6:45PM
Saturday 12:00PM to 2:00PM

TA
Chunpai Wang
Office CS 095L
Number  
Email cwang25@albany.edu
Office Hour
Tuesday 11:45AM to 1:45PM
Friday 2:00PM to 4:00PM

TA
Yuhan Zhang
Office CS lounge
Number  
Email yzhang38@albany.edu
Office Hour
Monday 12:40PM to 2:40PM
Wednesday 12:40PM to 2:40PM

Class Time and Location MoWe 2:45PM - 4:05PM, LC 19
Class Website http://www.cs.albany.edu/~fchen/course/2016-ICSI-431-531/

Course Postings

Course Description:

A course on data mining (finding patterns in data) algorithms and their application to interesting data types and situations. We cover algorithms that address the five core data mining tasks: prediction, classification, regression, clustering, and associations. Course projects will involve advanced topics such as developing algorithms for large data sets and for sequential, spatial, and streaming data. Prerequisite(s): A Csi 310.

TextBook

Data Mining: The Textbook
Charu C. Aggarwal
Springer, 2015
ISBN: 978-3-319-14141-1

Introduction to Data Mining
Pang-Ning Tan, Michael Steinbach, Vipin Kumar
Addison-Wesley, 2005
ISBN-10: 0321321367
ISBN-13: 978-0321321367

Course Schedule:

The schedule indicates the concepts and material to be covered each week under the column labeled "Lecture Topics". Each topic marked with "*" will be presented by a six-member team.

Week Date Lecture Topics Presentation Read Due (To be announced)
1 1/20 Introduction   Ch1  
2 1/25 Introduction   Ch1  
1/27 Data Collection (Twitter, Craigslist, and Foursquare)   Ch2,Ch3  
3 2/1 Data Collection (Twitter, Craigslist, and Foursquare)   Ch2,Ch3
2/3 Data Collection (Twitter, Craigslist, and Foursquare)   Ch2,Ch3  
4 2/8 Data Collection (Twitter, Craigslist, and Foursquare)   Ch2,Ch3
2/10 Exploring Data Ch2  
5 2/15 Exploring Data Ch3
2/17 Exploring Data Ch3  
6 2/22 Classification - Introduction - Decision Tree   Ch4  
2/24 Classification - Support Vector Machines Ch4
7 2/29 Classification - Support Vector Machines - Continued Ch4, Ch5
3/2 Classification - Support Vector Machines - Continued   Ch4, Ch5
8 3/7 Classification - Support Vector Machines - Continued Ch4, Ch5  
3/9 Classification - Support Vector Machines - Continued   Ch4, Ch5
9 3/14 Spring Break      
3/16 Spring Break      
10 3/21 Clustering - K-means Ch8, Ch9
3/23 Clustering - Cluster Analysis Ch8, Ch9
11 3/28 Midterm Exam (Concepts, Closed Book)      
3/30 Clustering - Cluster Analysis - Continued; Course Project Discussion   Ch8, Ch9
4/4 Clustering - Cluster Analysis - Continued; Course Project Discussion   Ch8, Ch9
12 4/6 Association Rule Mining: Support, Confidence Ch4, Ch5
4/11 Association Rule Mining - Continued   Ch4, Ch5
13 4/13 Recommendation System   Reference materials on blackboard
4/18 Recommendation System - Continued   Reference materials on blackboard
14 4/20 Sequential Data: Markov Model Reference materials on blackboard
4/25 Sequential Data: Markov Model Reference materials on blackboard
15 4/27 Graph Data: Probabilistic Soft Logic   Reference materials on blackboard  
5/2 Graph Data: Probabilistic Soft Logic   Reference materials on blackboard
16 5/4 Graph Data: Probabilistic Soft Logic   Reference materials on blackboard  
17 5/13 Project Presentation    

Course Project Requirement

Course Project teams:

to be announced

References for Lecture Topics:

1. Decision Tree

[1] Decision Tree Lecture Slides: http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap4_basic_classification.ppt (http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap4_basic_classification.pdf)

[2] Decision Tree 7-minute tutorial video: https://www.youtube.com/watch?v=a5yWr1hr6QY

 

2. Logistic Regression

[1] Machine Learning with Python - Logistic Regression: http://aimotion.blogspot.com/2011/11/machine-learning-with-python-logistic.html

[2] A Tutorial in Logistic Regression: http://www.statpt.com/logistic/demaris_1995.pdf

 

Examinations and Assignments:

There are around 12 homework assignments. Homework assignments are due at the start of class. If you have an excused absence from a class, turn in the homework assignment before that class session. All assignments must include your name, student ID, and course name/number.

Late Submission Policy: 

Assignments must be submitted before class on the specified due date (Monday of the designated week). An assignment submitted less than 24 hours late loses 30% of its score. An assignment submitted 24 hours or more late loses 70% of its score. Assignments submitted more than 3 days late will not be assessed and will receive a zero (0). Weekend days count toward lateness. You are encouraged to type your answers.
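The penalty tiers above can be sketched as a small function. This is only an illustration under one reading of the policy: the cutoffs (under 24 hours, 24 to 72 hours, over 72 hours) and the name `late_score` are assumptions, not an official grading tool.

```python
def late_score(raw_score: float, hours_late: float) -> float:
    """Score after the late penalty, under an assumed reading of the policy.

    hours_late is the time elapsed past the deadline, weekends included.
    """
    if hours_late <= 0:
        penalty_percent = 0      # submitted on time
    elif hours_late < 24:
        penalty_percent = 30     # first 24-hour period
    elif hours_late <= 72:
        penalty_percent = 70     # 24 hours or more, up to 3 days (assumed cutoff)
    else:
        penalty_percent = 100    # more than 3 days late: scored as zero
    return raw_score * (100 - penalty_percent) / 100


if __name__ == "__main__":
    for hours in (0, 5, 30, 80):
        print(hours, late_score(100, hours))
```

For a homework worth 100 points, this reading gives 70 points when 5 hours late and 30 points when 30 hours late.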

Policy on Cheating: 

Cheating in an exam will result in an E grade for the course. Further, the students involved will be referred to the Dean's office for disciplinary action.

Homework problems are meant to be individual exercises; you must do these by yourself. Any of the following actions will be considered as cheating.

  1. A solution which is identical to or nearly identical to the solution submitted by another student in the class
  2. A solution which is identical to or nearly identical to the solution provided by the instructor in a previous offering of CSI 431/531
  3. A solution which is identical to or nearly identical to a solution available on the Internet.

Cheating in a homework exercise will result in the following penalty for all the students involved.

  1. The homework in which cheating occurred will be assigned a grade of ZERO.

Students who cheat in two or more homework assignments will receive an E grade for the course. The names of such students will also be forwarded to the Dean's office for disciplinary action.

Attendance:

Class attendance is required and checked. Each unexcused absence will result in a 20% deduction from your final numerical grade. If you miss a class, it is your responsibility to find out what material was covered. There will be absolutely no makeup classes. Absences are excused only in specific, unavoidable situations: 1) personal emergencies, including, but not limited to, illness of the student or of a dependent of the student, or death in the family [requires a doctor's note]; 2) religious observances that prevent the student from attending class; 3) participation in University-sponsored activities, approved by the appropriate University authority, such as intercollegiate athletic competitions, activities approved by academic units, including artistic performances, academic field trips, and special events connected with coursework; 4) government-required activities, such as military assignments, jury duty, or court appearances; and 5) any other absence that the professor approves.

Grading:

Homework Assignments : 35% | Exam: 30% | Presentation: 5% | Final Project (3-member team): 25% | Class Discussion and Participation: 5%
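
As a rough illustration of how these weights combine, the sketch below computes a weighted total from component scores on a 0 to 100 scale. The function and key names are hypothetical, and the sketch ignores attendance deductions and any curve the instructor may apply.

```python
# Weights in percent, taken from the grading breakdown above.
WEIGHTS = {
    "homework": 35,
    "exam": 30,
    "presentation": 5,
    "final_project": 25,
    "participation": 5,
}


def weighted_total(scores: dict) -> float:
    """Combine component scores (each 0-100) into a 0-100 weighted total."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


if __name__ == "__main__":
    example = {
        "homework": 90,
        "exam": 80,
        "presentation": 100,
        "final_project": 85,
        "participation": 100,
    }
    print(weighted_total(example))
```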


Course Project Groups:

Group 1-5 TA: Yizhen Chen
Group 6-10 TA: Rui Wang
Group 11-15 TA: Yuhan Zhang
Group 16-22 TA: Chunpai Wang
Group Name Members Project Title Presentation Date
1 Erkang Xie (UG),
Ziyun Zeng (UG)
2 Mounika Ryakala (Grad),
Yutthana Srisakunkhunakorn (Grad),
Manish Chandra (Grad)
3 Navita Jain (Grad)
4 Amit Pal Singh (Grad),
Navodit Ranjan (Grad),
Maxx Sawyer (UG)
5 Komal Narwekar (Grad),
Shilpa Ramesh (Grad)
6 Pooja Patel (Grad),
Dhaval Lad (Grad),
Andrew Desbiens (UG)
7 Arun Sharma (Grad),
Vimalkumar Chellam (Grad)
8 Kevin Smith (Grad),
James Martine (UG),
Zane Coonrad (UG)
9 Ashish Yeshwant Jadhav (Grad),
Ravikiran Pathade (Grad),
Marcus Seixas (UG)
10 Abhishek Gupta (Grad),
Manas Gaur (Grad),
Rafael Leitao Oliveira (UG)
11 Shanmugar Rathinasamy Mariappan (Grad),
Varun Chandrasekar (Grad),
Michael Seredensky (UG)
12 Siqi Wang (UG),
Tianqi Zhao (UG),
Xiaojun Feng (UG)
13 Sayali Thorat (Grad),
Steven Cifareli (UG),
Brian Ethier (UG)
14 Ashish Agarwala (Grad),
Meley Kifleyesus (Grad),
Zachary Carciu (UG)
15 Neel Patel (Grad),
Nisarg Shah (Grad),
Julia Turner (UG)
16 Abhiram Mocharla (Grad),
Shivam Awasthi (Grad),
Priyanka Dagar (UG)
17 Aatman Togadia (Grad),
Smit Shilu (Grad),
Akanksha Atrey (UG)
18 Estesham Ahmed Quadri Syed (Grad),
Lokesh Rishi (Grad),
James Pica (UG)
19 Chen Zhao (Grad),
Andrew Janucik (Grad),
Anthony Cochetti (UG)
20 Jamie Lee (Grad),
Randolf DeSouto (UG),
William Thomas (UG)
21 Nathaniel Gottschalt (UG),
Reena Sharma (Grad),
Garikapati Geethika (Grad)
22 Ananya Subburathinam (Grad),
Namrata Galatage (Grad),
Chandana Ravella (Grad)