Working with Apache Hive Course Details:

Hive is the de facto standard for data warehousing on Hadoop. This course starts with Hive setup and operations, continues into advanced Hive uses, covers performance and execution engines, and ends with a practical workshop.

    No classes are currently scheduled for this course.

    Call (919) 283-1653 to get a class scheduled online or in your area!

Hive Basics

  • Defining Hive Tables
  • SQL Queries over Structured Data
  • Filtering / Search
  • Aggregations / Ordering
  • Partitions
  • Joins
  • Text Analytics (Semi-Structured Data)

Hive Advanced

  • Transformation, Aggregation
  • Working with Dates, Timestamps, and Arrays
  • Converting Strings to Date, Time, and Numbers
  • Creating New Attributes, Mathematical Calculations, Windowing Functions
  • Using Character and String Functions
  • Binning and Smoothing
  • Processing JSON Data
  • Execution Engines (Tez, MR, and Spark)

Impala (for Cloudera track)

  • Architecture
  • Impala joins and other SQL specifics

Bonus Project

  • Students will work in teams to do this end-to-end workshop
  • Set up a data warehouse with Hive
  • Query and analyze data with Hive and Spark

*Please Note: Course Outline is subject to change without notice. Exact course outline will be provided at time of registration.

Join an engaging hands-on learning environment, where you’ll learn:

  • Hive basics and features
  • How to process, transform, and manage data
  • Processing and performance management
  • How to set up a data warehouse with Hive
  • Data query and analysis

This course is 50% hands-on labs and 50% lecture, with engaging instruction, demos, group discussions, labs, and project work.

 

Before attending this course, you should:

  • Be familiar with SQL
  • Be able to navigate the Linux command line
  • Have basic knowledge of command-line Linux editors (vi/nano)

 

 

Data Scientists, Software Engineers, Developers, and Administrators
