ICSI431/ICSI531 Data Mining

Instructor Feng Chen
Office LI-96J
Number (518) 442-4270
Email fchen5@albany.edu
Office Hour
Monday: 4:15PM to 5:15PM
Wednesday: 11:00AM to 12:00PM

TA
Lin Zhang
Office CS lounge
Number  
Email lzhang22@albany.edu
Office Hour
Monday: 5:45PM to 7:45PM
Wednesday: 11:40AM to 2:40PM

TA
Jonathan Song
Office CS lounge
Number  
Email njsong@albany.edu
Office Hour
Monday: 12:30PM to 2:30PM
Thursday: 1:00PM to 3:00PM

TA
Huang, Trey
Office CS lounge
Number  
Email thuang2@albany.edu
Office Hour
Friday: 4PM to 6PM
Saturday: 4PM to 6PM

TA
Makkar, Nippun
Office CS lounge
Number  
Email nmakkar@albany.edu
Office Hour
Monday: 5:45PM to 7:45PM
Wednesday: 1:00PM to 3:00PM

TA
Nian, Xiaohu
Office CS lounge
Number  
Email xnian@albany.edu
Office Hour
Monday: 12:30PM to 2:30PM
Wednesday: 12:30PM to 2:30PM

Class Time and Location MoWe 2:45PM - 4:05PM, LC 05
Class Website http://www.cs.albany.edu/~fchen/course/2015-ICSI-431-531/

Course Postings

Course Description:

A course on data mining (finding patterns in data) algorithms and their application to interesting data types and situations. We cover algorithms that address the five core data mining tasks: prediction, classification, regression, clustering, and associations. Course projects will involve advanced topics such as algorithm development for handling large data sets and sequential, spatial, and streaming data. Prerequisite(s): A Csi 310.

Textbooks

Introduction to Data Mining
Pang-Ning Tan, Michael Steinbach, Vipin Kumar
Addison-Wesley, 2005
ISBN-10: 0321321367
ISBN-13: 978-0321321367

Data Mining: Concepts and Techniques (3rd Edition)
Jiawei Han, Micheline Kamber, Jian Pei
Morgan Kaufmann, 2011
ISBN-10: 0123814790
ISBN-13: 978-0123814791

Course Schedule:

The schedule indicates the concepts and material to be covered in each week under the column labeled "Topics". Each topic marked with "*" will be presented by a six-member team.

Week | Date | Lecture Topics | Presentation | Read | Due (To be announced)
1 | 1/21 | Introduction | | Ch1 |
2 | 1/26 | Introduction - Continued | | Ch2, Ch3 |
 | 1/28 | Data Collection (Twitter, Craigslist, and Foursquare) | | Ch2, Ch3 |
3 | 2/2 | Class Canceled | | Ch2 |
 | 2/4 | Class Canceled | | Ch2 |
4 | 2/9 | Data Collection (Twitter, Craigslist, and Foursquare) | | Ch2 |
 | 2/11 | Data Collection (Twitter, Craigslist, and Foursquare) | | Ch2 |
5 | 2/16 | Exploring Data | | Ch3 |
 | 2/18 | Exploring Data | | Ch3 |
6 | 2/23 | Classification - Introduction - Decision Tree | | Ch4 |
 | 2/25 | Classification - Support Vector Machines | | Ch4 | HW4 1st submission; HW3 2nd submission
7 | 3/2 | Classification - Support Vector Machines - Continued | | Ch4, Ch5 |
 | 3/4 | Classification - Support Vector Machines - Continued | | Ch4, Ch5 | HW5 1st submission
8 | 3/9 | Classification - Support Vector Machines - Continued | | Ch4, Ch5 |
 | 3/11 | Classification - Support Vector Machines - Continued | | Ch4, Ch5 | HW3 3rd submission; HW4 2nd submission; HW6 1st submission
9 | 3/16 | Spring Break | | |
 | 3/18 | Spring Break | | |
10 | 3/23 | Clustering - K-means | | Ch8, Ch9 |
 | 3/25 | Clustering - Cluster Analysis | | Ch8, Ch9 | HW4 3rd submission; HW5 and HW6 2nd submission
11 | 3/30 | Midterm Exam (Concepts, Closed Book) | | |
 | 4/1 | Clustering - Cluster Analysis - Continued; Course Project Discussion | | Ch8, Ch9 |
 | 4/3 | No Class | | | HW7 1st submission
12 | 4/6 | Association Rule Mining: Support, Confidence | | Ch6, Ch7 |
 | 4/8 | Association Rule Mining - Continued | | Ch6, Ch7 | HW5 and HW6 3rd submission
13 | 4/13 | Recommendation System | | Reference materials on Blackboard |
 | 4/15 | Recommendation System - Continued | | Reference materials on Blackboard | Posting of take-home quiz problem; HW7 2nd submission
14 | 4/20 | Sequential Data: Markov Model | | Reference materials on Blackboard | Submission of take-home quiz
 | 4/22 | Sequential Data: Dynamic Time Warping | | Reference materials on Blackboard |
15 | 4/27 | Graph Data: Probabilistic Soft Logic | | Reference materials on Blackboard |
 | 4/29 | Graph Data: Probabilistic Soft Logic | | Reference materials on Blackboard | HW7 3rd submission
16 | 5/4 | Graph Data: Probabilistic Soft Logic | | Reference materials on Blackboard |
 | 5/6 | Graph Data: Probabilistic Soft Logic | | Reference materials on Blackboard |
17 | 5/11 | Project Presentation | | |
 | 5/13 | Project Presentation | | |
 | 5/15 | No Class | | | Submission of course project report

Course Project Requirement

Course Project teams:

Team | Team Member | Team Project Title | Presentation Schedule
1 | Nicholas Brown, Darshana Rane, Vinny Cerchia | | 5/11, order 1 (8 minutes)
2 | Yizhen Chen, Wentao Liu, Hanyu Xue | | 5/6, order 3 (8 minutes)
3 | Phuc Bui, Hang Lin, Xuanyi Lin | The prediction of transportation | 5/11, order 13 (8 minutes)
4 | Samarth Shah, Gaurav Ghosh, Summit Hotwan | SomePlaceElse | 5/11, order 1 (8 minutes)
5 | Anthony Paradiso, Aashish Chaudhary, Eric Zeissler | | 5/11, order 4 (8 minutes)
6 | Sam Pellino, Priya Balachandran, Saurabh Saxena | | 5/11, order 2 (8 minutes)
7 | Vaibhav Kapse, Subhash Chandra Kilari, Rahul Srivastava | Fraud Detection | 5/11, order 14 (8 minutes)
8 | Paul Tomch, David Vadney, Daniel Hono | | 5/11, order 4 (8 minutes)
9 | Lili Guo, Rui Wang, Yang Vincent | | 5/11, order 16 (8 minutes)
10 | Ryan Dubowsky, Margaret Dubowsky | | 5/11, order 3 (8 minutes)
11 | Apurva Kulkarni, Prafull Soni, Akhil Chaturvedi | | 5/11, order 2 (8 minutes)
12 | Aaron Champagne, Oguz Aranay, Rafael Veras, Jonathan Shepard | | 5/11, order 7 (8 minutes)
13 | Akash Shashikant Gawade, Shivam Agrawal, Harshad Bhanushali | Potential Car Buyers | 5/11, order 12 (8 minutes)
14 | Baojian Zhou, Zeyang Wu, Russell Sean | | 5/11, order 11 (8 minutes)
15 | Congzhou Wang, Steven Heiple, Sushant Obeja | 'College Culture': A Twitter-based System for Recommending Colleges to High School Students | 5/11, order 10 (8 minutes)
16 | Justine Buddie, Mike Scalera, Greg R Scalera | | 5/11, order 15 (8 minutes)
17 | Dhruv Patel, Lars Hansen, Yuhan Zhang | Partify | 5/11, order 5 (8 minutes)
18 | Kushagra Sharma, Dhiraj Tanwar, Bilal Khan | | 5/11, order 5 (8 minutes)
19 | Josh Gibbons, David Noftsier, Lin Yun | | 5/11, order 9 (8 minutes)
20 | Kanakamedala Rajesh, Chenna Rohith Raj | Estimation of Crime within a city based on previous Crime rate | 5/11, order 6 (8 minutes)
21 | Botla Sai Prasanna Kumar, Bangaru Bhavana, Mangu Vamsee Jagannath | Recommendation systems for movies | 5/11, order 8 (8 minutes)

References for Lecture Topics:

1. Decision Tree

[1] Decision Tree Lecture Slides: http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap4_basic_classification.ppt (http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap4_basic_classification.pdf)

[2] Decision Tree 7-minute tutorial video: https://www.youtube.com/watch?v=a5yWr1hr6QY

2. Logistic Regression

[1] Machine Learning with Python - Logistic Regression: http://aimotion.blogspot.com/2011/11/machine-learning-with-python-logistic.html

[2] A Tutorial in Logistic Regression: http://www.statpt.com/logistic/demaris_1995.pdf

Examinations and Assignments:

There are around 12 homework assignments. Homework assignments are due at the start of class. If you have an excused absence from a class, turn in the homework assignment before that class session. All assignments must include your name, student ID, and course name/number.

Late Submission Policy: 

Assignments must be submitted before class on the specified due date (Monday of the designated week). An assignment submitted within the first 24 hours after the deadline loses 30% of its score. An assignment submitted 24 hours or more after the deadline loses 70% of its score. Assignments submitted more than 3 days late will not be assessed and will receive a score of zero (0). Weekend days count toward lateness. You are encouraged to type your answers.
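As an illustration only, the late policy above can be sketched as a small function. The name `late_score` and its parameters are hypothetical, and the handling of the exact 24-hour and 72-hour boundaries is an assumption read from the wording above, not an official rule.

```python
def late_score(raw_score: float, hours_late: float) -> float:
    """Apply the late penalty described above to a raw homework score.

    Assumed reading of the policy (boundaries are an interpretation):
      - on time (hours_late <= 0): no penalty
      - less than 24 hours late: 30% deducted
      - 24 hours up to 3 days (72 hours) late: 70% deducted
      - more than 3 days late: score of zero
    Weekend hours count like any other hours.
    """
    if hours_late <= 0:
        penalty_percent = 0
    elif hours_late < 24:
        penalty_percent = 30
    elif hours_late <= 72:
        penalty_percent = 70
    else:
        penalty_percent = 100
    return raw_score * (100 - penalty_percent) / 100


if __name__ == "__main__":
    for hours in (0, 5, 30, 80):
        print(hours, "hours late ->", late_score(100, hours))
```

Under this reading, a homework worth 100 points that is turned in 5 hours late keeps 70 points, and one turned in 30 hours late keeps 30 points.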

Policy on Cheating: 

Cheating in an exam will result in an E grade for the course. Further, the students involved will be referred to the Dean's office for disciplinary action.

Homework problems are meant to be individual exercises; you must do these by yourself. Any of the following will be considered cheating:

  1. A solution which is identical to or nearly identical to the solution submitted by another student in the class
  2. A solution which is identical to or nearly identical to the solution provided by the instructor in a previous offering of CSI 431/531
  3. A solution which is identical to or nearly identical to a solution available on the Internet.

Cheating in a homework exercise will result in the following penalty for all the students involved.

  1. The homework in which cheating occurred will be assigned a grade of ZERO.

Students who cheat in two or more homework assignments will receive an E grade for the course. The names of such students will also be forwarded to the Dean's office for disciplinary action.

Attendance:

Class attendance is required and checked. Each absence without a proper explanation will reduce your final numerical grade by 20%. If you miss a class, it is your responsibility to find out what material was covered. There will be absolutely no makeup classes. Absences are excused only in the following specific, unavoidable situations:

  1. Personal emergencies, including, but not limited to, illness of the student or of a dependent of the student, or death in the family (doctor's note required).
  2. Religious observances that prevent the student from attending class.
  3. Participation in University-sponsored activities, approved by the appropriate University authority, such as intercollegiate athletic competitions, activities approved by academic units, including artistic performances, academic field trips, and special events connected with coursework.
  4. Government-required activities, such as military assignments, jury duty, or court appearances.
  5. Any other absence that the professor approves.

Grading:

Homework Assignments : 35% | Exam: 30% | Presentation: 5% | Final Project (3-member team): 25% | Class Discussion and Participation: 5%
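
As a rough sketch of how these weights combine, the snippet below computes a weighted final score from per-component scores on a 0 to 100 scale. The function name `final_grade` and the dictionary keys are hypothetical, and the sketch does not model the attendance deduction or any letter-grade cutoffs, which are not specified here.

```python
# Component weights in percent, taken from the grading breakdown above.
WEIGHTS = {
    "homework": 35,
    "exam": 30,
    "presentation": 5,
    "project": 25,
    "participation": 5,
}


def final_grade(scores: dict) -> float:
    """Return the weighted final score (0 to 100).

    `scores` maps each component name in WEIGHTS to a score out of 100.
    This is an illustrative calculation only, not the official formula.
    """
    if sum(WEIGHTS.values()) != 100:
        raise ValueError("weights must sum to 100")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


if __name__ == "__main__":
    example = {
        "homework": 90,
        "exam": 80,
        "presentation": 100,
        "project": 85,
        "participation": 100,
    }
    print(final_grade(example))
```

With the example scores shown, the components contribute 31.5, 24, 5, 21.25, and 5 points respectively.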