Class Information

Instructor: Petko Bogdanov

Time and Location: Tuesdays and Thursdays 2:45PM-4:05PM, PC 263
Office Hours: Monday 1pm-2pm, Wednesday 10am-11am or by appointment, Computer Science 095I
Prerequisites: Java programming, basic probability, proofs and algorithmic analysis, and linear algebra.



Course topics

MapReduce and Hadoop
Frequent itemsets and Association rules
Near Neighbor Search in High Dimensions
Locality Sensitive Hashing (LSH)
Dimensionality reduction: SVD and CUR
Recommender systems
Clustering
Analysis of massive graphs
Analysis: PageRank, HITS
Web spam and TrustRank
Proximity search on graphs
Large-scale supervised machine learning
Mining data streams
Web advertising
Optimizing submodular functions
Anomaly Detection
Distributed graph processing
More advanced topics

Syllabus for the class

Text

BOOK: Mining Massive Datasets by J. Leskovec, A. Rajaraman, J. Ullman. Available here

Grading and Evaluation

50% - Final Project
30% - Homework
10% - In-class Quizzes
10% - Critical Paper Reviews
Extra Credit: up to 5% for in-class participation


Project

Teams: 1 or 2 members (No Exceptions)
Milestones: Unless otherwise stated, all milestones are due at midnight of the designated date.
- (Sep 4) Project groups + web site. Create a public website for the project that lists the team members.
- (Sep 25) Project proposal (post on the project web site). This includes the problem formulation, related work, and how the project differs from the related work.
- (Sep 24) Flash presentation (due in class). A 2-minute in-class presentation/advertisement of each proposed project (you can use a slide if you email it to the instructor the previous day).
- (Sep 28 - Oct 2) Come to office hours as a team to discuss your project.
- (Oct 9) Evaluation plan (update project website): A planned outline of the datasets you will use and what you are going to measure in order to evaluate the project.
- (Oct 31) Mid-project report (update project website): A draft of your final write-up for the project, even though some experiments or parts of the implementation might be missing. Include a special section on key risks/unknowns.
- (Dec 8) Final project presentation (due in class). Conference-style presentation with Q&A.
- (Dec 11) Project paper due in Blackboard. You are expected to use the ACM format to write your project reports (8 pages maximum, 4 pages minimum, including references; this page limit is strict).