ICSI431/ICSI531 Data Mining

Instructor Feng Chen
Office LI-96J
Phone (518) 442-4270
Email fchen5@albany.edu
Office Hour
Monday: 4:15PM to 5:15PM
Wednesday: 11:00AM to 12:00PM

Lin Zhang
Office CS lounge
Email lzhang22@albany.edu
Office Hour
Monday: 5:45PM to 7:45PM
Wednesday: 11:40AM to 2:40PM

Jonathan Song
Office CS lounge
Email njsong@albany.edu
Office Hour
Monday: 12:30PM to 2:30PM
Thursday: 1:00PM to 3:00PM

Huang, Trey
Office CS lounge
Email thuang2@albany.edu
Office Hour
Friday: 4PM to 6PM
Saturday: 4AM to 6PM

Makkar, Nippun
Office CS lounge
Email nmakkar@albany.edu
Office Hour
Monday: 5:45PM to 7:45PM
Wednesday: 1:00PM to 3:00PM

Nian, Xiaohu
Office CS lounge
Email xnian@albany.edu
Office Hour
Monday: 12:30PM to 2:30PM
Wednesday: 12:30PM to 2:30PM

Class Time and Location MoWe 2:45PM - 4:05PM, LC 05
Class Website http://www.cs.albany.edu/~fchen/course/2015-ICSI-431-531/

Course Postings

Course Description:

A course on data mining (finding patterns in data) algorithms and their application to interesting data types and situations. We cover algorithms that address the five core data mining tasks: prediction, classification, regression, clustering, and associations. Course projects will involve advanced topics such as algorithm development for handling large data sets and sequential, spatial, and streaming data. Prerequisite(s): A Csi 310.


Textbooks:

Introduction to Data Mining
Pang-Ning Tan, Michael Steinbach, Vipin Kumar
Addison-Wesley, 2005
ISBN-10: 0321321367
ISBN-13: 978-0321321367
Data Mining: Concepts and Techniques (3rd Edition)
Jiawei Han, Micheline Kamber, Jian Pei
Morgan Kaufmann, 2011
ISBN-10: 0123814790
ISBN-13: 978-0123814791

Course Schedule:

The schedule indicates the concepts and material to be covered each week under the column labeled "Topics". Each topic marked with "*" will be presented by a six-member team.

Week | Date | Lecture Topics | Presentation | Read | Due (to be announced)
1 | 1/21 | Introduction | | Ch1 |
2 | 1/26 | Introduction - Continued | | Ch2, Ch3 |
2 | 1/28 | Data Collection (Twitter, Craigslist, and Foursquare) | | Ch2, Ch3 |
3 | 2/2 | Class Canceled | | Ch2 |
3 | 2/4 | Class Canceled | | Ch2 |
4 | 2/9 | Data Collection (Twitter, Craigslist, and Foursquare) | | Ch2 |
4 | 2/11 | Data Collection (Twitter, Craigslist, and Foursquare) | | Ch2 |
5 | 2/16 | Exploring Data | | Ch3 |
5 | 2/18 | Exploring Data | | Ch3 |
6 | 2/23 | Classification - Introduction - Decision Tree | | Ch4 |
6 | 2/25 | Classification - Support Vector Machines | | Ch4 | HW4 1st submission; HW3 2nd submission
7 | 3/2 | Classification - Support Vector Machines - Continued | | Ch4, Ch5 |
7 | 3/4 | Classification - Support Vector Machines - Continued | | Ch4, Ch5 | HW5 1st submission
8 | 3/9 | Classification - Support Vector Machines - Continued | | Ch4, Ch5 |
8 | 3/11 | Classification - Support Vector Machines - Continued | | Ch4, Ch5 | HW3 3rd submission; HW4 2nd submission; HW6 1st submission
9 | 3/16 | Spring Break | | |
9 | 3/18 | Spring Break | | |
10 | 3/23 | Clustering - K-means | | Ch8, Ch9 |
10 | 3/25 | Clustering - Cluster Analysis | | Ch8, Ch9 | HW4 3rd submission; HW5 and HW6 2nd submission
11 | 3/30 | Midterm Exam (Concepts, Closed Book) | | |
11 | 4/1 | Clustering - Cluster Analysis - Continued; Course Project Discussion | | Ch8, Ch9 |
11 | 4/3 | No Class | | | HW7 1st submission
12 | 4/6 | Association Rule Mining: Support, Confidence | | Ch6, Ch7 |
12 | 4/8 | Association Rule Mining - Continued | | Ch6, Ch7 | HW5 and HW6 3rd submission
13 | 4/13 | Recommendation System | | Reference materials on Blackboard |
13 | 4/15 | Recommendation System - Continued | | Reference materials on Blackboard | Posting of take-home quiz problem; HW7 2nd submission
14 | 4/20 | Sequential Data: Markov Model | | Reference materials on Blackboard | Submission of take-home quiz
14 | 4/22 | Sequential Data: Dynamic Time Warping | | Reference materials on Blackboard |
15 | 4/27 | Graph Data: Probabilistic Soft Logic | | Reference materials on Blackboard |
15 | 4/29 | Graph Data: Probabilistic Soft Logic | | Reference materials on Blackboard | HW7 3rd submission
16 | 5/4 | Graph Data: Probabilistic Soft Logic | | Reference materials on Blackboard |
16 | 5/6 | Graph Data: Probabilistic Soft Logic | | Reference materials on Blackboard |
17 | 5/11 | Project Presentation | | |
17 | 5/13 | Project Presentation | | |
17 | 5/15 | No Class | | | Submission of course project report

Course Project Requirement

Course Project teams:

Team | Members | Project Title | Presentation Schedule
1 | Nicholas Brown, Darshana Rane, Vinny Cerchia | | 5/11, order 1 (8 minutes)
2 | Yizhen Chen, Wentao Liu, Hanyu Xue | | 5/6, order 3 (8 minutes)
3 | Phuc Bui, Hang Lin, Xuanyi Lin | The prediction of transportation | 5/11, order 13 (8 minutes)
4 | Samarth Shah, Gaurav Ghosh, Summit Hotwan | SomePlaceElse | 5/11, order 1 (8 minutes)
5 | Anthony Paradiso, Aashish Chaudhary, Eric Zeissler | | 5/11, order 4 (8 minutes)
6 | Sam Pellino, Priya Balachandran, Saurabh Saxena | | 5/11, order 2 (8 minutes)
7 | Vaibhav Kapse, Subhash Chandra Kilari, Rahul Srivastava | Fraud Detection | 5/11, order 14 (8 minutes)
8 | Paul Tomch, David Vadney, Daniel Hono | | 5/11, order 4 (8 minutes)
9 | Lili Guo, Rui Wang, Yang Vincent | | 5/11, order 16 (8 minutes)
10 | Ryan Dubowsky, Margaret Dubowsky | | 5/11, order 3 (8 minutes)
11 | Apurva Kulkarni, Prafull Soni, Akhil Chaturvedi | | 5/11, order 2 (8 minutes)
12 | Aaron Champagne, Oguz Aranay, Rafael Veras, Jonathan Shepard | | 5/11, order 7 (8 minutes)
13 | Akash Shashikant Gawade, Shivam Agrawal, Harshad Bhanushali | Potential Car Buyers | 5/11, order 12 (8 minutes)
14 | Baojian Zhou, Zeyang Wu, Russell Sean | | 5/11, order 11 (8 minutes)
15 | Congzhou Wang, Steven Heiple, Sushant Obeja | 'College Culture': A Twitter-based System for Recommending Colleges to High School Students | 5/11, order 10 (8 minutes)
16 | Justine Buddie, Mike Scalera, Greg R Scalera | | 5/11, order 15 (8 minutes)
17 | Dhruv Patel, Lars Hansen, Yuhan Zhang | Partify | 5/11, order 5 (8 minutes)
18 | Kushagra Sharma, Dhiraj Tanwar, Bilal Khan | | 5/11, order 5 (8 minutes)
19 | Josh Gibbons, David Noftsier, Lin Yun | | 5/11, order 9 (8 minutes)
20 | Kanakamedala Rajesh, Chenna Rohith Raj | Estimation of Crime within a city based on previous Crime rate | 5/11, order 6 (8 minutes)
21 | Botla Sai Prasanna Kumar, Bangaru Bhavana, Mangu Vamsee Jagannath | Recommendation systems for movies | 5/11, order 8 (8 minutes)

References for Lecture Topics:

1. Decision Tree

[1] Decision Tree Lecture Slides: http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap4_basic_classification.ppt (http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap4_basic_classification.pdf)

[2] Decision Tree 7-minute tutorial video: https://www.youtube.com/watch?v=a5yWr1hr6QY


2. Logistic Regression

[1] Machine Learning with Python - Logistic Regression: http://aimotion.blogspot.com/2011/11/machine-learning-with-python-logistic.html

[2] A Tutorial in Logistic Regression: http://www.statpt.com/logistic/demaris_1995.pdf


Examinations and Assignments:

There are around 12 homework assignments. Homework assignments are due at the start of class. If you have an excused absence from a class, turn in the homework assignment before that class session. All assignments must include your name, student ID, and the course name/number.

Late Submission Policy: 

Assignments must be submitted before class on the specified due date (Monday of the designated week). If your assignment is less than 24 hours late, a penalty of 30% will be deducted from your score. If it is 24 hours or more late, a penalty of 70% will be deducted. Assignments submitted more than 3 days late will not be assessed and will receive a score of zero (0). Weekend days are counted. You are encouraged to type your answers.
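
As an illustration only, one possible reading of the late policy above can be sketched in Python. The function name `late_adjusted_score` is hypothetical, and two points are assumptions rather than stated policy: the penalty is treated as a fraction of the score you earned, and "more than 3 days" is treated as more than 72 hours. Ask the instructor if your case falls near a boundary.

```python
def late_adjusted_score(raw_score: float, hours_late: float) -> float:
    """Apply the late penalty to a homework score (illustrative sketch).

    Assumptions (not stated explicitly in the policy):
      - the penalty is a fraction of the earned score;
      - "more than 3 days" means more than 72 hours.
    """
    if hours_late <= 0:
        return raw_score              # on time: no penalty
    if hours_late > 72:
        return 0.0                    # more than 3 days late: not assessed
    # Less than 24 hours late: 30% penalty; 24 hours or more: 70% penalty.
    penalty = 0.30 if hours_late < 24 else 0.70
    return raw_score * (1.0 - penalty)
```

Under these assumptions, a homework that earned 90 points and arrives 5 hours late would be recorded as roughly 63 points, and the same homework arriving 30 hours late would be recorded as roughly 27 points.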

Policy on Cheating: 

Cheating in an exam will result in an E grade for the course. Further, the students involved will be referred to the Dean's office for disciplinary action.

Homework problems are meant to be individual exercises; you must do these by yourself. Any of the following actions will be considered as cheating.

  1. A solution which is identical to or nearly identical to the solution submitted by another student in the class
  2. A solution which is identical to or nearly identical to the solution provided by the instructor in a previous offering of CSI 431/531
  3. A solution which is identical to or nearly identical to a solution available on the Internet.

Cheating in a homework exercise will result in the following penalty for all the students involved: the homework in which cheating occurred will be assigned a grade of ZERO.

Students who cheat in two or more homework assignments will receive an E grade for the course. The names of such students will also be forwarded to the Dean's office for disciplinary action.


Attendance Policy:

Class attendance is required and checked. Each absence without a proper explanation will reduce your final numerical grade by 20%. If you miss a class, it is your responsibility to find out what material was covered. There will be absolutely no makeup classes. Absences are excused only in the following specific, unavoidable situations:

  1. Personal emergencies, including, but not limited to, illness of the student or of a dependent of the student, or death in the family (a doctor's note is required).
  2. Religious observances that prevent the student from attending class.
  3. Participation in University-sponsored activities, approved by the appropriate University authority, such as intercollegiate athletic competitions, activities approved by academic units, including artistic performances, academic field trips, and special events connected with coursework.
  4. Government-required activities, such as military assignments, jury duty, or court appearances.
  5. Any other absence that the professor approves.


Homework Assignments : 35% | Exam: 30% | Presentation: 5% | Final Project (3-member team): 25% | Class Discussion and Participation: 5%
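
To see how the weights above combine, here is a minimal Python sketch. It assumes each component is scored on a 0 to 100 scale and that the final numerical grade is the weighted sum; the names `WEIGHTS` and `final_grade` are hypothetical, and the sketch ignores any attendance or late penalties.

```python
from typing import Mapping

# Component weights taken from the grading breakdown; they sum to 100%.
WEIGHTS = {
    "homework": 0.35,
    "exam": 0.30,
    "presentation": 0.05,
    "final_project": 0.25,
    "participation": 0.05,
}

def final_grade(scores: Mapping[str, float]) -> float:
    """Weighted sum of component scores, each assumed to be on a 0-100 scale."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
```

For example, scores of 90 (homework), 80 (exam), 100 (presentation), 85 (final project), and 100 (participation) would combine to about 86.75 under these assumptions.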