Big Data In Real World

BLOG

May 8, 2023

How to fix Kafka Broker may not be available on 127.0.0.1 error?

This is a common error when you start working with Kafka. Pretty much every Kafka developer has seen it at least once. Problem: You might […]
May 4, 2023

How to use SnowSQL client to work with Snowflake?

In this post, we will discuss how to use the SnowSQL client on Windows to connect to and work with Snowflake. If you are new to Snowflake, we […]
May 1, 2023

Getting started with Snowflake in less than 10 minutes

Quick Introduction: Snowflake Inc. offers an advanced data platform that provides a fully self-managed service called Snowflake. This platform enables faster, more flexible data storage, processing, […]
April 27, 2023

How to transpose a DataFrame from columns to rows in Spark?

Unfortunately, there is no built-in function to transpose a DataFrame from columns to rows in Spark. In this post we will show an easy way […]
April 24, 2023

How to properly remove or decommission a node from an Elasticsearch cluster?

Shutting down a node abruptly is not the right way to decommission or remove a node from the Elasticsearch cluster. Doing so will cause your shards […]
April 20, 2023

How to rename an S3 bucket?

There is no single command that can rename an S3 bucket, mainly because S3 is not a filesystem and a bucket is not a typical folder […]
April 17, 2023

How to view a message in Kafka?

Sometimes we need to see the messages that we ingest into Kafka. This is typically needed for a quick sanity check or we recently […]
April 13, 2023

What is the difference between map and mapValues functions in Spark?

In this post we will look at the differences between the map and mapValues functions and when it is appropriate to use each. We have a […]
April 10, 2023

What is an alias and how to create an alias in Elasticsearch?

An alias, as the name suggests, is another name for an index in Elasticsearch. It is quite useful when you want to refer […]
April 6, 2023

How to check size of a bucket in S3?

This is a simple and common question with a simple answer. Solution: Do a directory listing with the recursive, human-readable, and summarize options. You will […]
April 3, 2023

How to transpose or convert columns to rows in Hive?

Let’s say we have a table named employee_multiple_depts and each employee in the table is mapped to 3 departments: dept1, dept2 and dept3. select * […]
March 30, 2023

How to read and write XML files with Spark?

We will be using the spark-xml package from Databricks to read and write XML files with Spark. Here is how we enter the spark shell to […]
March 27, 2023

How to automatically add timestamp to documents and find the latest document in Elasticsearch?

Elasticsearch used to automatically add a _timestamp field with the ingestion timestamp to every document added to an index. Unfortunately, this was removed in […]
March 23, 2023

How to search a file or objects by name inside an S3 bucket?

Each file we upload to S3 is assigned a key, and the key has the following structure: [FOLDERNAME]/[FILENAME]. A common problem is to search the objects […]
March 20, 2023

How to fail a Hive script based on a condition?

This is a very useful trick when you have a big Hive script as part of your production jobs and you want to check the consistency […]
March 16, 2023

How to read and write Excel files with Spark?

In this post we are going to see how to work with Excel files in Spark. We will be using the spark-excel package created by Crealytics. […]
March 13, 2023

What is a pipeline and how to create a pipeline in Elasticsearch?

A pipeline is a definition of a series of processors that are executed in the same order as they are declared. Think of a […]
March 9, 2023

How to rename files or objects in Amazon S3?

Amazon S3 is not a filesystem, so there is no rename command to rename objects or files. However, there is a well-known workaround. Workaround: Issue […]
March 7, 2023

How to delete duplicate data from the Hive table?

This is a common problem if you are bringing data from a legacy system or simply from a system which you don’t have control over. First […]
February 25, 2023

Hadoop In Real World is now Big Data In Real World!

www.bigdatainrealworld.com is now fully live. Hadoop In Real World is now Big Data In Real World! In case you missed our communication from last week, you can […]