Class Information

Instructor: Petko Bogdanov

Time and Location: Tuesdays and Thursdays 2:45PM-4:05PM, PC 263
Office Hours: Monday 1pm-2pm, Wednesday 10am-11am or by appointment, Computer Science 095I
Prerequisites: Java programming, basic probability, proofs, algorithmic analysis, and linear algebra.



Course topics

MapReduce and Hadoop
Frequent itemsets and Association rules
Near Neighbor Search in High Dimensions
Locality Sensitive Hashing (LSH)
Dimensionality reduction: SVD and CUR
Recommender systems
Clustering
Analysis of massive graphs
Analysis: PageRank, HITS
Web spam and TrustRank
Proximity search on graphs
Large-scale supervised machine learning
Mining data streams
Web advertising
Optimizing submodular functions
Anomaly Detection
Distributed graph processing
More advanced topics

Syllabus for the class

Text

BOOK: Mining of Massive Datasets by J. Leskovec, A. Rajaraman, J. Ullman. Available here

Class notes to supplement slides from MMDS@Stanford

Grading and Evaluation

40% - Final Project
40% - Homework
10% - In-class Quizzes
10% - Critical Paper Reviews, extra assignments
Extra Credit: up to 5% for in-class participation


Project

Teams: 1 or 2 members (No Exceptions)
Milestones: Unless otherwise stated, all milestones are due at midnight on the designated date.
- (Jan 28) Project groups + web site. Create a public website for the project containing the team members.
- (Feb 15) Project proposal (post on the project web site). This includes the problem formulation, related work, and how the project differs from the related work.
- (Feb 16) Flash presentation (due in class). A 2-minute in-class presentation/advertisement of each proposed project (you can use a slide if you email it to the instructor the previous day).
- (Feb 15 - Feb 24) Come to office hours as a team to discuss your project.
- (Feb 29) Evaluation plan (update project website): A planned outline of the datasets and of what you are going to measure in order to evaluate the project.
- (Mar 20) Mid-project report (update project website): A draft of your final write-up for the project, even though some experiments/implementation might be missing. Include a special section on key risks/unknowns.
- (Apr 24) First experimental figure due
- (May 3) Final project presentation (due in class). Conference-style presentation with Q&A.
- (May ) Project paper due in Blackboard. You are expected to use the ACM format to write your project reports (8 pages maximum, 4 pages minimum, including references; this page limit is strict).

Project teams and web pages


Yutthana and Sayali: Project home

Zeyang: Project home

Chunpai: Project home

Ravi: Project home

Gregory: Project home

Yun Lin: Project home