EECE 890G: Data Mining and Warehousing
Fall 1999: T Th 11:00 a.m.–12:15 p.m.
Professor: Dr. Michael N. Huhns, Room 3A41, 777-5921, (786-2686 home), huhns@sc.edu
This course will present the current state of research on strategies for enterprise access to data for decision support and knowledge discovery. It will cover both research and industrial practice in business intelligence, spanning three processes:
- Data Warehousing – extracting, cleaning, and organizing data from transactional databases
- Data Mining – taking warehouse data and extracting patterns and relationships
- Decision Support – using the patterns extracted from the data to make management decisions
In addition to lectures by the professor, the course will include student presentations of assigned topics. Students are required to use a PC to build and work with databases and to analyze data.
Texts:
OLAP Solutions: Building Multidimensional Information Systems, Erik Thomsen, John Wiley & Sons, Inc., 1997. (ISBN 0-471-14931-4)
Microsoft Data Warehousing, Robert S. Craig, Joseph A. Vivona, and David Bercovitch, John Wiley & Sons, Inc., 1999. (ISBN 0-471-32761-1)
Introduction to Data Mining, Hand, Mannila, and Smyth, MIT Press, Cambridge, MA, 2000. (We will have a preprint version of this.)
The texts will be supplemented with assigned research papers.
Course Outline
- (One week) Introduction: trends in information systems
  - Definition of and motivations for data warehouses
- (Two weeks) Principles of Data Warehousing
  - Types of data
  - Conceptual architectures
  - Design techniques and logical architectures
  - Dimensional modeling
  - Star and snowflake schemas
  - Temporal and spatial dimensions of data
- (Three weeks) Creating a Data Warehouse
  - Data capture
  - Replication
  - Data cleaning
- (One week) Metadata, Ontologies, and Registries
- (One week) Introduction to Tasks of Data Mining
- (Two weeks) Statistical Evaluation of Data
  - Parametric and nonparametric models
  - Estimation
  - Variance and bias
- (One week) Data Preparation: segmentation, outliers, and training sets
- (One week) Data Reduction
- (Two weeks) Data Modeling and Prediction
  - Classification: regression, similarity, Bayesian, decision trees, and neural nets
  - Clustering: hierarchical and nonhierarchical
- (Throughout the course) Case Studies
Grading
30% Approximately 10 problem sets and programming assignments
10% Analysis and presentation of a research paper or an experimental project
30% Midterm exam
30% Final exam