Apache Spark Primer

Apache Spark Primer Course Details:

Apache Spark is a cluster computing engine for Big Data and an important component of the Hadoop ecosystem. Building on Hadoop YARN and HDFS, Spark processes data in memory, which makes many computing tasks faster than with Map/Reduce. It can be programmed in Java, Scala, Python, and R, and it also offers SQL-based front ends.

This course introduces Scala, Python, or R developers to the world of Spark programming. It begins with an overview of the ecosystem and hands-on experience with the platform, including working with the Spark Shell, RDDs, and DataFrames. You’ll then get a broader introduction to NoSQL, Spark Streaming, Spark SQL, and Spark MLlib, and see how the pieces fit together in a larger application.

No classes are currently scheduled for this course.

Call (919) 283-1674 to get a class scheduled online or in your area!

Overview of Spark

Hadoop Ecosystem
Hadoop YARN vs. Mesos
Spark vs. Map/Reduce
Spark: Lambda Architecture
Spark in the Enterprise Data Science Architecture

Spark Component Overview

Spark Shell
RDDs: Resilient Distributed Datasets
DataFrames
Spark 2 Unified DataFrames
Spark Sessions
Functional Programming
Spark SQL
MLlib
Structured Streaming
SparkR
Spark and Python

RDDs: Resilient Distributed Datasets

Coding with RDDs
Transformations
Actions
Lazy Evaluation and Optimization
RDDs in Map/Reduce
Exercise: Working with RDDs

DataFrames

RDDs vs. DataFrames
Unified DataFrames (UDF) in Spark 2.x
Partitioning
Exercise: Working with Unified DataFrames

Advanced Spark Overview

NoSQL
Spark SQL
Spark Streaming
Spark MLlib

*Please Note: The course outline is subject to change without notice. The exact course outline will be provided at the time of registration.

Join an engaging hands-on learning environment, where you’ll learn:

The essentials of Spark architecture and applications
How to execute Spark Programs
How to create and manipulate both RDDs (Resilient Distributed Datasets) and UDFs (Unified DataFrames)
How Spark core components come together for complete applications

This course is 50% hands-on labs and 50% lecture, with engaging instruction, demos, group discussions, and project work.

Before attending this course, you should have:

Experience programming in Java, Python, R, or Scala (only one language is needed)
Basic understanding of SQL

This course is intended for Data Scientists, Data Engineers, Software Engineers, Architects, and Developers.
