Big Data Trendz


Showing posts from August, 2018

How-to: Use Parquet with Impala, Hive, Pig, and MapReduce

- August 20, 2018

Source: Cloudera Blog

The CDH software stack lets you use your tool of choice with the Parquet file format, offering the benefits of columnar storage at each phase of data processing. An open source project co-founded by Twitter and Cloudera, Parquet was designed from the ground up as a state-of-the-art, general-purpose, columnar file format for the Apache Hadoop ecosystem. In particular, Parquet has several features that make it highly suited to use with Cloudera Impala for data warehouse-style operations:

- Columnar storage layout: A query can examine and perform calculations on all values for a column while reading only a small fraction of the data from a data file or table.
- Flexible compression options: The data can be compressed with any of several codecs. Different data files can be compressed differently. The compression is transparent to applications that read the data files.
- Innovative encoding schemes: Sequences of ide...
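The excerpt describes these features only in prose. The following rough, stdlib-only sketch illustrates why a columnar layout, per-column compression, and run-length encoding pay off. It is not Parquet's file format or any Parquet API. The toy table, the column names, and the helpers (`to_row_layout`, `to_column_layout`, `count_country`, `compress_columns`, `run_length_encode`) are all hypothetical, and zlib stands in for the codecs a real Parquet writer would use (such as Snappy or gzip).

```python
import struct
import zlib

# Hypothetical toy table: (id, country, amount). Illustrative data only.
ROWS = [(i, "CA" if i % 3 == 0 else "US", i * 1.5) for i in range(1000)]

ROW_FORMAT = "<i2sd"  # int32 id, 2-byte country code, float64 amount; no padding


def to_row_layout(rows):
    """Serialize record by record, as a row-oriented file would."""
    return b"".join(
        struct.pack(ROW_FORMAT, rid, country.encode("ascii"), amount)
        for rid, country, amount in rows
    )


def to_column_layout(rows):
    """Store each column as its own contiguous chunk, as a columnar file does."""
    n = len(rows)
    return {
        "id": struct.pack(f"<{n}i", *(r[0] for r in rows)),
        "country": b"".join(r[1].encode("ascii") for r in rows),
        "amount": struct.pack(f"<{n}d", *(r[2] for r in rows)),
    }


def count_country(columns, code):
    """COUNT(*) WHERE country = code, touching only the 'country' chunk.

    Returns the count and the number of bytes that had to be read.
    """
    chunk = columns["country"]
    codes = [chunk[i:i + 2] for i in range(0, len(chunk), 2)]
    return codes.count(code.encode("ascii")), len(chunk)


def compress_columns(columns, levels):
    """Compress each column independently; levels maps column -> zlib level.

    Each column can use different settings, and a reader just decompresses
    the chunk it needs (zlib is a stand-in for real Parquet codecs).
    """
    return {name: zlib.compress(data, levels.get(name, 6))
            for name, data in columns.items()}


def run_length_encode(values):
    """Toy RLE: collapse runs of identical values into (value, count) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]


if __name__ == "__main__":
    row_bytes = to_row_layout(ROWS)
    cols = to_column_layout(ROWS)
    count, bytes_read = count_country(cols, "CA")
    print(f"rows with country CA: {count}")
    print(f"bytes read: {bytes_read} of {len(row_bytes)} in the row layout")
    packed = compress_columns(cols, {"country": 9, "amount": 1})
    for name, data in packed.items():
        print(f"{name}: {len(cols[name])} -> {len(data)} bytes compressed")
    print(run_length_encode(sorted(r[1] for r in ROWS)))
```

With three columns, the filter on `country` reads 2,000 of 14,000 bytes. The fraction shrinks further as a table gets wider. Sorting before run-length encoding is what produces long runs: on the sorted country column, RLE reduces 1,000 values to two pairs. On the unsorted column, the repeating CA, US, US pattern gains almost nothing.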
