Writing A File To HDFS – Java Program


Writing a file to HDFS is very easy; we can simply execute the hadoop fs -copyFromLocal command to copy a file from the local file system to HDFS. In this post we will write our own Java program to copy a file from the local file system to HDFS.
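For reference, here is the command line equivalent; the source and destination paths below are just placeholders:

hadoop fs -copyFromLocal /tmp/source.txt /user/hadoop/source.txt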

Here is the program – FileWriteToHDFS.java

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class FileWriteToHDFS {

    public static void main(String[] args) throws Exception {

        //Source file in the local file system
        String localSrc = args[0];
        //Destination file in HDFS
        String dst = args[1];

        //Input stream for the file in local file system to be written to HDFS
        InputStream in = new BufferedInputStream(new FileInputStream(localSrc));

        //Get configuration of Hadoop system
        Configuration conf = new Configuration();
        System.out.println("Connecting to -- "+conf.get("fs.defaultFS"));

        //File system handle and output stream for the destination file in HDFS
        FileSystem fs = FileSystem.get(URI.create(dst), conf);
        OutputStream out = fs.create(new Path(dst));

        //Copy file from local to HDFS
        IOUtils.copyBytes(in, out, 4096, true);

        System.out.println(dst + " copied to HDFS");
    }
}
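To try the program, you could package the class into a jar and launch it with the hadoop command, which puts the Hadoop libraries and the cluster configuration on the classpath. The jar name and paths below are just placeholders:

hadoop jar filewrite.jar FileWriteToHDFS /tmp/source.txt /user/hadoop/source.txt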

The program takes in 2 parameters. The first parameter is the location of the file in the local file system that will be copied; the second parameter is the destination location in HDFS.

//Source file in the local file system
String localSrc = args[0];
//Destination file in HDFS
String dst = args[1];

We will create an InputStream by wrapping a FileInputStream in a BufferedInputStream, using the first parameter, which is the location of the file in the local file system. These input stream objects are regular java.io stream objects and not Hadoop classes, because at this point we are still referencing a file in the local file system and not in HDFS.

//Input stream for the file in local file system to be written to HDFS
InputStream in = new BufferedInputStream(new FileInputStream(localSrc));

Now we need to create an output stream to the file location in HDFS where we can write the contents of the file from the local file system. To do that, we first need a few key pieces of information about the cluster, such as the NameNode address. These details are already specified in the configuration files during cluster setup.

The easiest way to get the configuration of the cluster is to instantiate a Configuration object; it reads the configuration files from the classpath and loads all the information the program needs.

//Get configuration of Hadoop system
Configuration conf = new Configuration();
System.out.println("Connecting to -- "+conf.get("fs.defaultFS"));

//File system handle and output stream for the destination file in HDFS
FileSystem fs = FileSystem.get(URI.create(dst), conf);
OutputStream out = fs.create(new Path(dst));
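If the configuration files are not on the classpath, the Configuration object falls back to its defaults and fs.defaultFS will point to the local file system rather than the cluster. In that case the NameNode address can be set on the Configuration object directly; a minimal sketch, where the hostname and port are placeholders for your cluster:

Configuration conf = new Configuration();
//Point the client at the cluster explicitly (hostname and port are hypothetical)
conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");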

Next we get the FileSystem object by calling FileSystem.get with the URI that we passed as the program's input and the configuration that we just created. Since the destination is in HDFS, the file system that is returned is a DistributedFileSystem object. Once we have the file system object, the next thing we need is the output stream to the file in HDFS to which we would like to write the contents of the file from the local file system.

We will then call the create method on the file system object, passing the location of the file in HDFS which we supplied to the program as the second parameter; it returns the output stream we will write to.
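As an aside, create also has an overload that accepts a Progressable callback (from org.apache.hadoop.util.Progressable), which Hadoop invokes periodically while data is being written out to the cluster. A minimal sketch that prints a dot on each callback, purely for illustration:

OutputStream out = fs.create(new Path(dst), new Progressable() {
    public void progress() {
        //Invoked by Hadoop from time to time as data is written to the cluster
        System.out.print(".");
    }
});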

//Copy file from local to HDFS
IOUtils.copyBytes(in, out, 4096, true);

Finally we use the copyBytes method from Hadoop's IOUtils class, supplying the input and output stream objects. It reads 4096 bytes at a time from the input stream and writes them to the output stream, which copies the entire file from the local file system to HDFS. The last argument, true, tells copyBytes to close both streams once the copy is complete.
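As a side note, the FileSystem API also offers a convenience method, copyFromLocalFile, which performs the same copy in a single call and manages the streams for you; a minimal sketch, reusing the fs, localSrc and dst variables from the program above:

//Copies the local file to HDFS in one call; the streams are handled internally
fs.copyFromLocalFile(new Path(localSrc), new Path(dst));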

 
