We hosted a webinar on Saturday, September 30th, 2017, on the topic of RCFile vs. ORC. More than 60 participants joined, so first of all we would like to thank everyone who attended. We always enjoy hosting webinars and going live because it gives us a great way to interact with the Hadoop In Real World community and students. Of the participants who shared where they are from, we had attendees from India, the US, Canada and the UK. The webinar was very engaging and interactive.
We started the discussion by explaining what row-major and column-major formats are. We went over the differences between the two, and then discussed the advantages and disadvantages of each.
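To make the row-major vs. column-major distinction concrete, here is a minimal sketch in Python. The sample records and the helper names (`to_row_major`, `to_column_major`) are made up for illustration and are not part of any Hadoop API. The sketch lays the same three records out both ways and then sums one column from each layout.

```python
# Three sample records with the same schema: (id, name, amount).
# The data and function names are hypothetical, for illustration only.
records = [
    (1, "alice", 10.0),
    (2, "bob", 25.5),
    (3, "carol", 7.25),
]

def to_row_major(rows):
    """Store values record by record: all fields of row 0, then row 1, ..."""
    return [value for row in rows for value in row]

def to_column_major(rows):
    """Store values column by column: all ids, then all names, then all amounts."""
    return [value for column in zip(*rows) for value in column]

row_major = to_row_major(records)
column_major = to_column_major(records)

num_rows = len(records)
num_cols = len(records[0])
amount_index = 2  # 'amount' is the third column

# Row-major: the amounts are scattered, one every num_cols values.
amount_from_rows = sum(row_major[amount_index::num_cols])

# Column-major: the amounts sit next to each other in one contiguous slice.
start = amount_index * num_rows
amount_from_columns = sum(column_major[start:start + num_rows])

print(row_major)
print(column_major)
print(amount_from_rows, amount_from_columns)
```

Both layouts give the same answer, but in the column-major layout the values of one column sit together, which is why columnar formats tend to suit queries that read only a few columns, while row-major layouts suit reading or writing whole records.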
RCFile is a joint effort from Facebook, Ohio State University, and the Institute of Computing Technology at the Chinese Academy of Sciences. We explained the motivation behind RCFile, how it combines the advantages of both row-oriented and columnar formats, and what benefits it brings to the big data space. A discussion of RCFile is not complete without Lazy Decompression, a technique RCFile uses to avoid unnecessary work during query execution. We explained what lazy decompression is and how it is used in RCFile.
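The idea behind lazy decompression can be sketched in a few lines of Python. This is a simplified model and not RCFile's actual implementation: the `LazyColumn` class, `make_row_group` and `select_where` are hypothetical names, and we assume each column of a row group is compressed on its own with zlib. A column is decompressed only when a query actually needs its values.

```python
import json
import zlib

class LazyColumn:
    """One column of a row group, kept compressed until its values are requested."""

    def __init__(self, values):
        self._compressed = zlib.compress(json.dumps(values).encode("utf-8"))
        self._values = None  # filled in on first access

    @property
    def is_decompressed(self):
        return self._values is not None

    def values(self):
        if self._values is None:
            raw = zlib.decompress(self._compressed)
            self._values = json.loads(raw.decode("utf-8"))
        return self._values

def make_row_group(rows, column_names):
    """Split a batch of rows into independently compressed columns."""
    return {
        name: LazyColumn([row[i] for row in rows])
        for i, name in enumerate(column_names)
    }

def select_where(row_groups, select_col, filter_col, predicate):
    """SELECT select_col WHERE predicate(filter_col), one row group at a time."""
    results = []
    for group in row_groups:
        filter_values = group[filter_col].values()
        matches = [i for i, v in enumerate(filter_values) if predicate(v)]
        if not matches:
            continue  # no row qualifies, so select_col stays compressed
        selected = group[select_col].values()
        results.extend(selected[i] for i in matches)
    return results

columns = ["id", "name"]
groups = [
    make_row_group([(1, "a"), (2, "b")], columns),
    make_row_group([(10, "x"), (20, "y")], columns),
]
result = select_where(groups, "name", "id", lambda v: v >= 10)
print(result)
print([g["name"].is_decompressed for g in groups])
```

In this sketch the `name` column of the first row group is never decompressed, because no `id` in that group satisfies the condition.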
Next, we moved on to ORC (Optimized Row Columnar). ORC is an open source file format that was originally developed at Hortonworks and is now an Apache project. We started by discussing the inefficiencies of RCFile and the need to optimize it. We briefly looked at the structure of an ORC file. One of the strong selling points of ORC is the statistics, or metadata, it keeps about the columns. ORC stores these statistics, or indexes, at three levels: file level, stripe level and row level. We took a very simple query with a WHERE condition and illustrated how ORC can skip files, stripes or row groups based on the WHERE condition and the statistics stored in the indexes. Finally, we finished the discussion by touching on the efficient compression and encoding in ORC.
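The skipping behaviour can be illustrated with a small Python sketch. It is a simplified model under our own assumptions, not the ORC reader API: the `Stripe` class, `write_stripes` and `scan_equal` are hypothetical names, and we track only the minimum and maximum of a single integer column per stripe. The same check applies in principle at the file level and the row group level.

```python
from dataclasses import dataclass

@dataclass
class Stripe:
    """A chunk of one column plus the min/max statistics recorded at write time."""
    values: list
    min_value: int
    max_value: int

def write_stripes(values, stripe_size):
    """Split column values into stripes and record min/max for each stripe."""
    stripes = []
    for start in range(0, len(values), stripe_size):
        chunk = values[start:start + stripe_size]
        stripes.append(Stripe(chunk, min(chunk), max(chunk)))
    return stripes

def scan_equal(stripes, target):
    """Evaluate WHERE col = target, skipping stripes whose range cannot match."""
    matches = []
    stripes_read = 0
    for stripe in stripes:
        if target < stripe.min_value or target > stripe.max_value:
            continue  # statistics rule this stripe out, so it is never read
        stripes_read += 1
        matches.extend(v for v in stripe.values if v == target)
    return matches, stripes_read

column = list(range(1, 13))           # values 1..12, already sorted
stripes = write_stripes(column, 4)    # three stripes: 1-4, 5-8, 9-12
matches, stripes_read = scan_equal(stripes, 7)
print(matches, stripes_read)
```

Here only one of the three stripes is read. Note that this kind of skipping pays off most when the data is sorted or clustered on the filtered column; if every stripe spans the full range of values, the statistics cannot rule anything out.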
Participants were very engaged. Here are some of the questions from our Q&A.
We host webinars like this quite often. Sign up below to get invitations to join one of our webinars.
Here is the full recording of the webinar. Enjoy!