The Australian National University
Centre for Science and Engineering of Materials

SATOMGI


Module 1
Data Mining and Matching

Lecturer: Dr. Peter Christen, Department of Computer Science, ANU
Email: Peter.Christen@anu.edu.au

Module Description: Data mining is data analysis performed on very large databases with an emphasis on identifying and extracting novel, potentially useful, and understandable patterns and associations. Data mining is a multi-disciplinary field which uses a combination of machine learning, statistical analysis, modelling techniques, visualisation and database technology.

In many cases information from several data sources needs to be matched, linked and aggregated in order to allow more detailed data analysis or mining. Similarly, detecting and removing duplicate records that relate to the same entity within one data set is of importance, as data quality affects any subsequent analysis or mining. The aim of such linkages is to match and aggregate all records relating to the same entity.
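
As a rough illustration of duplicate detection within one data set, the sketch below compares every pair of records and flags pairs whose names are nearly identical and whose suburbs agree. The sample records, the field names, the `find_duplicates` helper and the 0.85 threshold are all assumptions made for this example and are not taken from the module material; practical matching systems typically add data cleaning and avoid comparing every pair.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical sample records; names and fields are illustrative only.
records = [
    {"id": 1, "name": "Peter Miller", "suburb": "Canberra"},
    {"id": 2, "name": "Pete Miller", "suburb": "Canberra"},
    {"id": 3, "name": "Mary Jones", "suburb": "Sydney"},
]

def similarity(a, b):
    """Return a similarity score in [0, 1] for two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def find_duplicates(records, threshold=0.85):
    """Return id pairs whose suburbs agree exactly and whose names are similar.

    The threshold is an assumed value chosen for this example.
    """
    pairs = []
    for r1, r2 in combinations(records, 2):
        same_suburb = r1["suburb"] == r2["suburb"]
        if same_suburb and similarity(r1["name"], r2["name"]) >= threshold:
            pairs.append((r1["id"], r2["id"]))
    return pairs

print(find_duplicates(records))
```

Here records 1 and 2 are reported as a likely duplicate pair, while record 3 is left alone because neither its name nor its suburb is close to the others.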

Learning outcomes: On completion of this module, participants should have gained an understanding of the basic concepts and techniques used in data mining and data matching, including:
-the data mining process, how data mining is defined, application areas, disciplines involved, and the major challenges in data mining;
-data issues relevant to data mining (size, complexity, types and formats), data warehousing, data cleaning and pre-processing;
-unsupervised learning techniques like cluster analysis and association rules mining (including the basic methods used);
-supervised learning techniques (classification and prediction), including the basics of decision tree induction and how to measure classifier accuracy;
-schema integration, data matching (deterministic and probabilistic linkage), the importance of data cleaning, deduplication and geocoding.
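
To give a flavour of how classifier accuracy can be measured, the following minimal sketch compares a classifier's predicted labels against the known labels of a small test set. The labels, the `accuracy` and `confusion_counts` helpers, and the choice of "yes" as the positive class are assumptions made for this example, not part of the module material.

```python
# Hypothetical test set: known labels and the labels a classifier predicted.
actual    = ["yes", "yes", "no", "no", "yes", "no", "no", "no"]
predicted = ["yes", "no",  "no", "no", "yes", "yes", "no", "no"]

def accuracy(actual, predicted):
    """Fraction of records whose predicted label equals the known label."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

def confusion_counts(actual, predicted, positive="yes"):
    """Return (true positives, false positives, true negatives, false negatives)."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

tp, fp, tn, fn = confusion_counts(actual, predicted)
print("accuracy:", accuracy(actual, predicted))
print("precision:", tp / (tp + fp))  # share of predicted positives that are correct
print("recall:", tp / (tp + fn))     # share of actual positives that were found
```

Accuracy alone can be misleading when one class is much rarer than the other, which is why the sketch also reports precision and recall.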

Assumed knowledge:
-Basic understanding of spreadsheets and databases (tables, attributes, records).
-Experience working with Windows-based computer systems.




General Enquiries to:
Email: Zbigniew.Stachurski@anu.edu.au
Phone: +61 2 6125 5681
Fax: +61 2 6125 0506