When you see our name (hadoopinrealworld.com), you probably think we only cover Hadoop and that our content could be outdated. Our courses cover more than just Hadoop. For example, our Hadoop Developer In Real World course covers Kafka and file formats, among other things. We also have a dedicated course on Spark.

Funny story: when we set out to register our website, we wanted to register it as bigdatainrealworld.com, and we sadly found out that it was taken. So we went with hadoopinrealworld.com. A couple of years back we found that bigdatainrealworld.com was available, and we got it. Try www.bigdatainrealworld.com in your browser and see where it takes you.

You will get lifetime access to all our courses
You will get free cluster access on all our courses
You will get a 30-day money-back guarantee on all our courses

HADOOP STARTER KIT

I AM A BEGINNER AND I WANT TO LEARN HADOOP ESSENTIALS FOR FREE

This free course is better than others' paid courses, and you can try Hadoop in our cluster for free in less than 15 minutes.

This course is free, but that does not mean it only covers "What is Hadoop?". With this course you will get a deep understanding of HDFS & MapReduce. In fact, with our free cluster access you will get hands-on experience with HDFS and MapReduce.

In addition to HDFS and MapReduce, this course will also give you a very good introduction to Apache Pig & Hive. Again, you can try Pig & Hive in our cluster. Plus, you will learn what Cloudera Manager is and how to set up a Hadoop cluster in the cloud using Cloudera Manager.

HADOOP DEVELOPER IN REAL WORLD

I KNOW THE BASICS, I WANT TO BECOME A CONFIDENT HADOOP DEVELOPER

The only Hadoop Developer course in the market that will teach you how to survive in a real Hadoop production environment.

Target Audience

This is the course for you if you are new to the big data world and you want to start from the basics and get into development. If you don't know about distributed systems or MapReduce, then you should certainly start with this course.

Non-programmers can certainly take this course. You will see Java code in the course. We understand some of you may not have a Java background, so we break down every bit of code so you can follow along. We promise, no one will be left out.

Coverage

In Hadoop Developer In Real World, you will learn the basics of distributed systems, HDFS and MapReduce. Once you have conquered the basics, you will move on to learning important tools in the big data ecosystem like Apache Pig, Hive, Sqoop and Flume. You will also understand Hadoop architecture and learn to set up a Hadoop cluster, all in Amazon Web Services (AWS). You will also explore EMR.

In the real world, datasets come in different formats; formats like CSV and text are not the norm. So you will learn all about different file formats like Avro, ORC and Parquet. You will also learn how to use these file formats with Hive and Pig. To survive in real-world, high-pressure production environments, you will need to learn to troubleshoot issues and work on performance optimizations. We have a dedicated chapter to help you with troubleshooting and performance optimizations. Finally, we have a dedicated chapter on Kafka. It is an in-depth chapter covering all the basics of Kafka as well as advanced concepts like Kafka Schema Registry.

This course also covers a lot of questions that are asked in real interviews. Additionally, students who write a review of the course will receive our Hadoop Developer Interview Guide for free.

Hands-On Projects

Facebook Problem | Finding Mutual Friends
New York Times: Time Machine | Distributed Text to PDF conversion
Million Song Dataset | Finding Rare Artists with Great Songs
Wikipedia | Page Ranking
Twitter | Find Influential Users
Meetup.com | Streaming RSVP with Kafka
Meetup.com | Schema evolution with Kafka Schema Registry

HADOOP ADMINISTRATOR IN REAL WORLD

I WANT TO ADMINISTER CHAOTIC HADOOP CLUSTERS WITH CONFIDENCE

Learn all the skills necessary to manage and administer chaotic Hadoop production clusters stress-free.

Target Audience

This is the course for you if you are new to the big data world and you want to start from the basics and get into big data administration. Non-programmers can certainly take this course, and it is ideal for system engineers, DBAs, etc.

Coverage

You will learn all the admin essentials in the course: getting to know your cluster, starting and stopping services, adding nodes to and removing nodes from the cluster, protecting against and recovering from data loss, controlling disk usage, and assigning quotas to manage storage efficiently. Administration is more than just installing, starting and stopping services. You will learn critical functionality like installing and configuring Kerberos for authentication, High Availability, and schedulers like Fair and Capacity for resource management, just to highlight a few. Bottom line: we have not missed a topic that is critical.

We have a dedicated chapter on cluster planning. You will be able to estimate storage needs, computational needs and the number of nodes in the cluster, pick the right configuration for individual nodes, design a good network topology, choose between storage-intensive and compute-intensive nodes, and so on. Simply put, you will plan a cluster like a PRO.

Hadoop/Big Data administrators are paid top dollar to handle chaos when things are broken. When you know how things work under the hood, that is, the configuration details behind the tools and their functionality, you will be in a better position to fix issues efficiently. We show you how to install and configure services manually so you understand the details behind the scenes.

Hands-On Projects

Cluster setup with Cloudera Manager & Ambari
High availability configuration and setup
Protecting services with Kerberos 
Resource management with schedulers like Fair & Capacity
Cluster installation with Puppet
Cluster planning and tuning 
Troubleshooting and monitoring

SPARK STARTER KIT

I WANT TO LEARN WHAT'S SO SPECIAL ABOUT SPARK

NOT another "What is Spark?" course! Explore Spark in depth and get a strong foundation in Spark.

Most courses and other online help, including Spark's documentation, are not good at helping students understand the foundational concepts. They explain what Spark is, what an RDD is, what "this" is and what "that" is, but students are most interested in understanding the core fundamentals and, more importantly, in answering questions like:

Why do we need Spark when we have Hadoop?
What is the need for RDD?
How is Spark faster than Hadoop?
How does Spark achieve the speed and efficiency it claims?
How does memory get managed in Spark?
How does fault tolerance work in Spark?
That is exactly what you will learn in this free Spark Starter Kit course. The aim of this course is to give you a strong foundation in Spark.

SPARK DEVELOPER IN REAL WORLD

I WANT TO TAKE MY BIG DATA SKILLSET TO THE NEXT LEVEL

Finally, a Spark course designed to demystify Spark, prepare you for the real world and give you the confidence you need.

Target Audience

This course is for you if you are already familiar with the basics of big data and distributed systems and you want to become a master Spark developer. 

Coverage

Spark is something of a mystery even to those who work with it. This is because most don't understand how Spark works, how it achieves its efficiency in job execution and how it interacts with other sources. This is exactly what scares beginners away from getting into Spark as well. Don't worry. We have got your back. We will untangle the tangled parts of Spark and demystify it in a simple way that you can understand. You are in good hands with us.

We go beyond RDD, DataFrame and Dataset. Spark is much more than RDDs and Spark SQL. You will learn Shuffle in depth, and we cover both the Hash and Sort based shuffle managers in Spark. You will understand how your code gets converted into a Spark application, stages and tasks. Knowing this detail will help you write better, more optimized Spark applications. You will learn how to use Spark with different sources like databases, Kafka, HBase and Elasticsearch. You will also learn to use Spark with different file formats, from CSV to Avro, Parquet & ORC.

You will understand Spark's approach to optimization with Catalyst Optimizer and Project Tungsten. You will learn details you will find nowhere else. You will learn about setting up a Spark cluster and the different resource management options available to you. A Spark course is not complete until you learn about optimizations and troubleshooting. Spark is more than just a computation framework; it is a data analytics platform. So you will learn both Spark Streaming and Spark Machine Learning in depth.

Hands-On Projects

Page Ranking pages from Wikipedia | DataFrames & RDD
Analyzing Trending YouTube videos (CSV & JSON) | Datasources & Formats
Streaming activity data from IoT devices | Spark Streaming
Streaming data from Meetup.com with Kafka | Spark Streaming
Predicting Country’s Happiness Rank from Happiness Score | Machine Learning
Predicting 2016 US Elections | Machine Learning
Predicting Yelp Rating (+ve / -ve) | Machine Learning
Build a mini site with Stack Overflow data and Elasticsearch | End to End Project