Text: Collection of papers available from the library or online. No required text.
I will cover algorithms that address the five core data mining tasks: prediction, classification, estimation, clustering and associations. Course projects will involve researching, implementing and testing algorithm developments for interesting situations (e.g., rare events, large data sets) and data types (e.g., sequential, spatial and streaming).
For the majority of the course I will introduce the core data mining tasks. Most of this material will be drawn from the book by Dunham, Data Mining: Introductory and Advanced Topics. During this period you will choose several research areas of interest, then research and implement variations of the basic algorithms for interesting situations (see below). For each interesting situation there will be a set of papers to introduce you to the material. For the remainder of the course you will present your results.
Data Mining Tasks
Classification, estimation, clustering, associations, prediction and outlier detection
Pages 1 – 10 Dunham
Chapter 4 Dunham 5 Lectures
Chapter 5 Dunham 4 Lectures
Chapter 6 Dunham 4 Lectures
TBA
Provost and Fawcett
http://citeseer.nj.nec.com/cache/papers/cs/18783/http:zSzzSzwww.hpl.hp.comzSzpersonalzSzTom_FawcettzSzpaperszSzKDD99.pdf/fawcett99activity.pdf
The topic list is not yet complete and will include:
Scaling Algorithms to Large Amounts of Data
Streaming Data
Spatial Data
Sequential Data
Cost Sensitive Predictions
Predicting Rare Events
Using Collections of Models to Predict More Accurately
Class Participation: 10%
Assignment #1: 30%
Assignment #2: 30%
Assignment #3: 30%