Introduction to SAS and Hadoop

Introduction to SAS and Hadoop Course Details:

In this course, you will learn how to use SAS programming methods to read, write, and manipulate Hadoop data. You will learn about Base SAS methods, including reading and writing raw data with the DATA step, as well as managing the Hadoop file system and executing MapReduce and Pig code from SAS via the HADOOP procedure. The course also covers the SAS/ACCESS Interface to Hadoop methods that allow LIBNAME access and SQL pass-through techniques to read and write Hadoop Hive or Cloudera Impala table structures. You will receive a brief overview of additional SAS and Hadoop technologies, including DS2, high-performance analytics, SAS LASR Server, and In-Memory Statistics, as well as the computing infrastructure and data access methods that support them.

No classes are currently scheduled for this course.

Call (919) 283-1674 to get a class scheduled online or in your area!

1. Introduction

What is Hadoop?
How SAS interfaces with Hadoop

2. Accessing HDFS and Invoking Hadoop Applications from SAS

Overview of methods available in Base SAS for interacting with Hadoop
Reading and writing Hadoop files using Base SAS methods
Executing MapReduce code
Executing Pig code using PROC HADOOP

3. Using the SQL Pass-Through Facility

Understanding the SQL procedure pass-through facility
Connecting to a Hadoop Hive database
Learning methods to query Hive tables
Investigating Hadoop Hive metadata
Creating SQL procedure pass-through queries
Creating and loading Hive tables with SQL pass-through EXECUTE statements
Handling Hive STRING data types

4. Using the SAS/ACCESS LIBNAME Engine

Using the LIBNAME statement for Hadoop
Using data set options
Creating views
Combining tables
Benefits of the LIBNAME method
Using PROC HDMD to access delimited data, XML data, and other non-Hive formats
Performance considerations for the SAS/ACCESS LIBNAME statement
Copying data from a SAS library to a Hive library

5. Partitioning and Clustering Hive Tables

Identifying partitioning, clustering, and indexing methods in Hive
How partitioning and clustering can increase query performance
Creating and loading partitioned and clustered Hive tables

6. Overview of SAS In-Memory Analytics and the Code Accelerator for Hadoop

Using high-performance procedures and the SASHDAT library engine
Creating a LASR Analytic Server session
Using the SASIOLA engine
Executing DS2 threads in the Hadoop cluster to summarize data
Using PROC HDMD to access HDFS files

*Please note: The course outline is subject to change without notice. The exact course outline will be provided at the time of registration.

Exercises or hands-on workshops are included with most SAS courses

SAS programmers who need to access data in Hadoop from within SAS
