Reading List for CSI 661 Data Mining

 

Text: A collection of papers available from the library or online. No required text.

 

I will cover algorithms that address the five core data mining tasks: prediction, classification, estimation, clustering and associations. Course projects will involve researching, implementing and testing algorithm developments for interesting situations (e.g., rare events, large data sets) and data types (e.g., sequential, spatial and streaming).

 

For the majority of the course I will introduce the core data mining tasks. Most of this material will be drawn from the book by Dunham, Data Mining: Introductory and Advanced Topics. During this period you will choose several research areas of interest, then research and implement variations of the basic algorithms for interesting situations (see below). For each interesting situation there will be a cache of papers to introduce you to the material. For the remainder of the course you will present your results.

 

Introductory Material

 

Data Mining Tasks

Classification, estimation, clustering, associations, prediction and outlier detection

Pages 1 – 10 Dunham

 

Classification and Estimation

Chapter 4, Dunham (5 lectures)

 

Clustering

Chapter 5, Dunham (4 lectures)

 

Associations

Chapter 6, Dunham (4 lectures)

 

Predictions

TBA

 

Outlier Detection

Provost and Fawcett

http://citeseer.nj.nec.com/cache/papers/cs/18783/http:zSzzSzwww.hpl.hp.comzSzpersonalzSzTom_FawcettzSzpaperszSzKDD99.pdf/fawcett99activity.pdf

 

 

Topic Areas

 

The topic list is not yet complete, but it will include:

Scaling Algorithms to Large Amounts of Data

Streaming Data

Spatial Data

Sequential Data

Cost Sensitive Predictions

Predicting Rare Events

Using Collections of Models to Predict More Accurately

 

Class Participation: 10%

Assignment #1: 30%

Assignment #2: 30%

Assignment #3: 30%