ICSI431/ICSI531 Data Mining

Instructor Feng Chen
Office UAB-426
Number 518-442-2602
Email fchen5@albany.edu
Office Hours
Monday: 4:15PM to 5:15PM
Wednesday: 11:00AM to 12:00PM

TA
Sayali Thorat
Office UAB433
Email sthorat@albany.edu
Office Hours
Mon. & Wed. 10:30am - 12:30pm

TA
Yun Wang
Office UAB419
Email ywang23@albany.edu
Office Hours
Mon. & Wed. 6:00pm - 8:00pm

TA
Siqian Zhao
Office UAB433
Email szhao2@albany.edu
Office Hours
Tue. & Thur. 9:00am - 11:00am

TA
Chunpai Wang
Office UAB433
Email cwang25@albany.edu
Office Hours
Tue. & Thur. 12:00pm - 2:00pm

TA
Mansi N Patel
Office UAB433
Email mnpatel@albany.edu
Office Hours
Tue. & Thur. 3:15pm - 5:15pm

Class Time and Location MoWe 2:45PM - 4:05PM, ES 241
Class Website http://www.cs.albany.edu/~fchen/course/2017-ICSI-431-531/

Course Postings

Course Description:

A course on data mining (finding patterns in data) algorithms and their application to interesting data types and situations. We cover algorithms that address the five core data mining tasks: prediction, classification, regression, clustering, and association. Course projects will involve advanced topics such as developing algorithms for large data sets and for sequential, spatial, and streaming data. Prerequisite(s): A Csi 310.

Textbooks

Data Mining: The Textbook
Charu C. Aggarwal
Springer, 2015
ISBN: 978-3-319-14141-1
Introduction to Data Mining
Pang-Ning Tan, Michael Steinbach, Vipin Kumar
Addison-Wesley, 2005
ISBN-10: 0321321367
ISBN-13: 978-0321321367

Course Schedule:

The schedule lists the concepts and material to be covered each week in the "Lecture Topics" column. Each topic marked with "*" will be presented by a six-member team.

Week | Date | Lecture Topics | Reading
1 | 1/20 | Introduction | Ch1
2 | 1/25 | Introduction | Ch1
2 | 1/27 | Data Collection (Twitter, Craigslist, and Foursquare) | Ch2, Ch3
3 | 2/1 | Data Collection (Twitter, Craigslist, and Foursquare) | Ch2, Ch3
3 | 2/3 | Data Collection (Twitter, Craigslist, and Foursquare) | Ch2, Ch3
4 | 2/8 | Data Collection (Twitter, Craigslist, and Foursquare) | Ch2, Ch3
4 | 2/10 | Exploring Data | Ch2
5 | 2/15 | Exploring Data | Ch3
5 | 2/17 | Exploring Data | Ch3
6 | 2/22 | Classification - Introduction - Decision Tree | Ch4
6 | 2/24 | Classification - Support Vector Machines | Ch4
7 | 2/29 | Classification - Support Vector Machines - Continued | Ch4, Ch5
7 | 3/2 | Classification - Support Vector Machines - Continued | Ch4, Ch5
8 | 3/7 | Classification - Support Vector Machines - Continued | Ch4, Ch5
8 | 3/9 | Classification - Support Vector Machines - Continued | Ch4, Ch5
9 | 3/14 | Spring Break | -
9 | 3/16 | Spring Break | -
10 | 3/21 | Clustering - K-means | Ch8, Ch9
10 | 3/23 | Clustering - Cluster Analysis | Ch8, Ch9
11 | 3/28 | Midterm Exam (Concepts, Closed Book) | -
11 | 3/30 | Clustering - Cluster Analysis - Continued; Course Project Discussion | Ch8, Ch9
12 | 4/4 | Clustering - Cluster Analysis - Continued; Course Project Discussion | Ch8, Ch9
12 | 4/6 | Association Rule Mining: Support, Confidence | Ch4, Ch5
13 | 4/11 | Association Rule Mining - Continued | Ch4, Ch5
13 | 4/13 | Recommendation System | Reference materials on Blackboard
14 | 4/18 | Recommendation System - Continued | Reference materials on Blackboard
14 | 4/20 | Sequential Data: Markov Model | Reference materials on Blackboard
15 | 4/25 | Sequential Data: Markov Model | Reference materials on Blackboard
15 | 4/27 | Graph Data: Probabilistic Soft Logic | Reference materials on Blackboard
16 | 5/2 | Graph Data: Probabilistic Soft Logic | Reference materials on Blackboard
16 | 5/4 | Graph Data: Probabilistic Soft Logic | Reference materials on Blackboard
17 | 5/13 | Project Presentation | -

Presentation slots and due dates: to be announced.

Course Project Requirement

Course Project teams:

to be announced

References for Lecture Topics:

1. Decision Tree

[1] Decision Tree Lecture Slides: http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap4_basic_classification.ppt (PDF version: http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap4_basic_classification.pdf)

[2] Decision Tree 7-minute tutorial video: https://www.youtube.com/watch?v=a5yWr1hr6QY

 

2. Logistic Regression

[1] Machine Learning with Python - Logistic Regression: http://aimotion.blogspot.com/2011/11/machine-learning-with-python-logistic.html

[2] A Tutorial in Logistic Regression: http://www.statpt.com/logistic/demaris_1995.pdf

 

Examinations and Assignments:

There are about 12 homework assignments. Homework assignments are due at the start of class. If you have an excused absence from a class, turn in the homework assignment before that class session. Every assignment must include your name, student ID, and course name/number.

Late Submission Policy: 

Assignments must be submitted before class on the specified due date (Monday of the designated week). An assignment submitted within the first 24 hours after the deadline receives a 30% penalty; an assignment submitted 24 or more hours late receives a 70% penalty. Assignments submitted more than 3 days late will not be graded and will receive a score of zero (0). Weekend days count toward the lateness period. You are encouraged to type your answers.
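The late-submission rules above can be read as a simple function of how many hours late an assignment is. The Python sketch below is one possible reading, not an official grading tool; it assumes that each penalty is a fraction of the earned score, that "the first 24-hour period" means strictly less than 24 hours late, that "more than 3 days late" means more than 72 calendar hours (weekends included), and that the function name `late_adjusted_score` is hypothetical.

```python
def late_adjusted_score(raw_score: float, hours_late: float) -> float:
    """Return the score after the late penalty, under an assumed reading of the policy.

    hours_late is measured in calendar hours after the deadline, so weekend
    days count; a value <= 0 means the assignment was on time.
    The penalty is assumed to be a fraction of the earned (raw) score.
    """
    if raw_score < 0:
        raise ValueError("raw_score must be non-negative")
    if hours_late <= 0:
        return raw_score          # on time: no penalty
    if hours_late < 24:
        return raw_score * 0.70   # first 24-hour period: 30% deducted
    if hours_late <= 72:
        return raw_score * 0.30   # 24 hours or more (up to 3 days): 70% deducted
    return 0.0                    # more than 3 days late: not assessed, scores zero


# Example: a 90-point assignment submitted 30 hours late.
print(late_adjusted_score(90, 30))
```

Under these assumptions, the 90-point assignment submitted 30 hours late keeps 30% of its score, or 27 points. The boundary choices (exactly 24 and exactly 72 hours) are the least certain part of this reading.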

Policy on Cheating: 

Cheating on an exam will result in an E grade for the course. In addition, the students involved will be referred to the Dean's office for disciplinary action.

Homework problems are meant to be individual exercises; you must complete them on your own. Any of the following actions will be considered cheating:

  1. A solution which is identical to or nearly identical to the solution submitted by another student in the class
  2. A solution which is identical to or nearly identical to the solution provided by the instructor in a previous offering of CSI 431/531
  3. A solution which is identical to or nearly identical to a solution available on the Internet.

Cheating in a homework exercise will result in the following penalty for all the students involved.

  1. The homework in which cheating occurred will be assigned a grade of ZERO.

Students who cheat on two or more homework assignments will receive an E grade for the course. Their names will also be forwarded to the Dean's office for disciplinary action.

Attendance:

Class attendance is required and will be checked. Each absence without a proper explanation will reduce your final numerical grade by 20%. If you miss a class, it is your responsibility to find out what material was covered. There will be absolutely no makeup classes. Absences are excused only in the following specific, unavoidable situations:

  1. personal emergencies, including, but not limited to, illness of the student or of a dependent of the student, or death in the family (a doctor's note is required);
  2. religious observances that prevent the student from attending class;
  3. participation in University-sponsored activities approved by the appropriate University authority, such as intercollegiate athletic competitions, activities approved by academic units (including artistic performances and academic field trips), and special events connected with coursework;
  4. government-required activities, such as military assignments, jury duty, or court appearances;
  5. any other absence that the professor approves.

Grading:

Homework Assignments: 35% | Exam: 30% | Presentation: 5% | Final Project (3-member team): 25% | Class Discussion and Participation: 5%
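The grading weights combine into a single weighted total. A minimal Python sketch follows. It assumes each component is scored on a 0-100 scale, uses hypothetical component names, and leaves out the attendance deduction, because the policy does not pin down exactly how the 20% reduction is applied.

```python
from __future__ import annotations

# Weights taken from the Grading line of this syllabus (they sum to 100%).
# The dictionary keys are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "homework": 0.35,
    "exam": 0.30,
    "presentation": 0.05,
    "final_project": 0.25,
    "participation": 0.05,
}


def weighted_final_grade(scores: dict[str, float]) -> float:
    """Combine per-component scores (each assumed to be 0-100) into a weighted total.

    The attendance deduction is intentionally not modeled here.
    """
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise KeyError(f"missing component scores: {sorted(missing)}")
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


example = {
    "homework": 90,
    "exam": 80,
    "presentation": 100,
    "final_project": 85,
    "participation": 100,
}
print(round(weighted_final_grade(example), 2))
```

For the example scores, the total works out to 0.35×90 + 0.30×80 + 0.05×100 + 0.25×85 + 0.05×100 = 31.5 + 24 + 5 + 21.25 + 5 = 86.75. Rounding is applied only for display, since floating-point products such as 0.35×90 may not be exact.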