Math3346 -- Data Mining

Maths3346 -- Data Mining, 2nd Semester, 2008

As in 2006 and 2007, John Maindonald and Graham Williams will share responsibility for this course.

Lectures and laboratories in 2008

Lectures will be held on Thursdays 14.00 - 16.00h, in Engineering (ENGN) T, and Fridays 9.00-10.00h, all in G35.

The weekly laboratory is held on Thursdays, from 9.00 to 11.00am. From Thursday September 4, the laboratory will be held in the RSPAS Joint Schools Training Lab, in the RSSS wing of the Coombs Building. The RSSS wing is the hexagon between the main entrance and the Chancelry, on the ground floor.

To get to the laboratory, go to the breezeway between the Tea Room quadrangle and East Road. The breezeway is marked X on the map, and the tea room balcony is marked Y; if you are outside the University, you will need to log in via a proxy to get to the map. If you stand with your back to the Chancelry building, there will be doors on your left and right. Go through the door on your left, which is next to a white door that contains a fire hose. Continue along that corridor, which ends at the entrance to the training lab.

Course materials in 2008 -- John Maindonald
Course materials in 2008 -- Graham Williams

A Perspective on Data Mining

Data mining gives a new twist on data deployment and analysis methodologies that have been developed over the past century or more. Recent developments have included:

Classification is a major pre-occupation of data mining, with a more limited focus on regression with a continuous outcome variable. The aim is typically prediction rather than the more challenging task of interpretation of model parameters.

Students intending to take this course will find it useful to come with some initial familiarity with the open-source R system for scientific and statistical computing and for graphics. The R system is available without charge for downloading from the internet. It is a marvelous example of what can be achieved when highly skilled specialists co-operate internationally, using the internet for communication and co-ordination.

Background Reading

Ian Ayres 2007, Super Crunchers. Why Thinking-By-Numbers is the New Way to be Smart. Bantam. [This places data mining in a wider context of data-based decision-making in business, government and consumer affairs. While popular in style and short on analysis detail, it offers a useful overview of ways in which applications of data mining and related analytical techniques are developing and changing, in part because of the new opportunities and challenges of the internet.]

Thomas H. Davenport and Jeanne G. Harris 2007, Competing on Analytics: The New Science of Winning. Harvard Business School Press. [Analytics is a buzzword for the application of data mining type approaches in commerce. Davenport and Harris give a useful overview of issues for the deployment of analytical techniques within organizations - benefits and traps, choice of amenable tasks, the role of management, skill base issues, etc.]

John Maindonald and John Braun 2007, Data Analysis and Graphics Using R - An Example-Based Approach, 2nd edn. Cambridge University Press. [Of greatest relevance to the course are Chapter 2 on Styles of Data Analysis, Chapters 5 & 6 (through to 6.3) on Linear Models, Chapter 8 (through to 8.3) on Logistic Regression, Chapter 11 on Tree-based Methods, and Chapter 12 (through to 12.2) on Multivariate Data Exploration & Discrimination.]

Links

MATH3346: Detailed syllabus and course description
Details as Graduate course (MATH6210)
Course materials in 2007
Suggestions for getting started on R
Graham Williams' data mining web page (NB in particular rattle, a GUI for a data mining toolkit)
Felix Andrews' web page for playwith, an R package for interaction with graphs. (Felix was a Math3346 student in 2004!)
John Maindonald's data mining talks and papers