Data Mining Techniques: Theory and Practice

In this course, you will learn a data mining methodology that is a superset of the SAS SEMMA methodology around which SAS Enterprise Miner is organized. You will also learn about a wide range of data mining algorithms, gaining both theoretical knowledge and practical skills. You will work through all the steps of a data mining project, beginning with problem definition and data selection, and continuing through data exploration, data transformation, sampling, partitioning, modeling, and assessment.

    No classes are currently scheduled for this course.

    Call (919) 283-1653 to get a class scheduled online or in your area!

1. Introduction to Data Mining

  • What is data mining?
  • Directed and undirected data mining
  • Models
  • Profiling and prediction

2. Data Mining Methodology

  • Why have a methodology?
  • How data miners can inadvertently learn things that are not true
  • Translating business problems into data mining problems
  • The importance of model stability
  • Finding the right input variables
  • Sampling to create balanced model sets
  • Partitioning to create training, validation, and test sets
  • Data preparation
  • Model assessment

3. Data Exploration

  • Developing intuition about data
  • Data structure
  • Data types
  • Data values
  • Exploring distributions
  • Summary statistics
  • Histograms
  • Using SAS Enterprise Miner for data exploration

4. Regression Models

  • The null hypothesis
  • Statistical significance
  • Confidence bounds
  • Variance and standard deviation
  • Standardized values
  • Correlation
  • Linear regression
  • Logistic regression
  • Using SAS Enterprise Miner to build regression models

5. Decision Trees

  • Decision trees as data exploration and classification tools
  • Decision trees for modeling and scoring
  • Decision trees for variable selection
  • Alternate representations of decision trees
  • Algorithms used to build decision trees
  • Splitting criteria
  • Recognizing instability and overfitting in decision tree models
  • Capturing interactions between variables
  • Using SAS Enterprise Miner to build decision trees

6. Neural Networks

  • Origins of neural networks
  • Neural networks compared with regression
  • Algorithms used to train neural networks
  • Data preparation requirements for neural networks
  • Picking appropriate inputs for neural networks
  • Creating neural network models using SAS Enterprise Miner

7. Memory-Based Reasoning

  • Similarity and distance
  • Distance metrics appropriate for different kinds of data
  • The role of the training set in memory-based reasoning (MBR)
  • Combining the votes of several neighbors
  • Other k-nearest neighbor techniques
  • Collaborative filtering
  • Using the SAS Enterprise Miner MBR node

8. Clustering

  • More on similarity and distance
  • The k-means algorithm
  • Divisive clustering
  • Agglomerative clustering
  • Data preparation for clustering
  • Interpreting clusters
  • Finding clusters with SAS Enterprise Miner

9. Survival Analysis

  • Origins of survival analysis
  • How business data is different from clinical data
  • Hazards and hazard charts
  • Retention curves and survival curves
  • Calculating survival from retention
  • Calculating hazards empirically
  • Parametric hazard models
  • Censoring
  • Competing risks
  • Survival-based forecasting
  • Using SAS code in SAS Enterprise Miner to create survival curves

10. Association Rules

  • Market basket analysis
  • Association rules
  • Sequential pattern analysis
  • Using SAS Enterprise Miner to discover associations in retail data

11. Link Analysis

  • Background on graph theory
  • Sphere of influence
  • Using link analysis to generate derived variables
  • Graph-coloring algorithm
  • Kleinberg's algorithm

12. Genetic Algorithms

  • Optimization techniques and problems (SAS/OR software)
  • Other algorithms
  • Linear programming problems
  • Genetic algorithms

*Please note: The course outline is subject to change without notice. The exact course outline will be provided at the time of registration.

Exercises or hands-on workshops are included with most SAS courses.

Who should attend:

  • Business analysts and their managers
  • Statisticians
