Why does Hadoop use KV (Key/Value) pairs?

Hadoop implements the MapReduce paradigm, as described in the original Google MapReduce paper, in terms of key/value pairs. The output types of the Map must match the input types of the Reduce, as shown below:
(K1, V1) -> Map -> (K2, V2)
(K2, V2) -> Reduce -> (K3, V3)
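To make the type flow concrete, here is a minimal word-count sketch in plain Python (not Hadoop's actual API; the function and variable names are illustrative). The map step turns (line number, line text) into intermediate (word, 1) pairs, a shuffle step groups values by key, and the reduce step combines the grouped values:

```python
from collections import defaultdict
from typing import Iterable

def map_fn(_line_no: int, line: str) -> Iterable[tuple[str, int]]:
    # (K1, V1) = (line number, line text) -> (K2, V2) = (word, 1)
    for word in line.split():
        yield (word, 1)

def reduce_fn(word: str, counts: list[int]) -> tuple[str, int]:
    # (K2, [V2]) = (word, [1, 1, ...]) -> (K3, V3) = (word, total)
    return (word, sum(counts))

def run(lines: list[str]) -> dict[str, int]:
    # Shuffle phase: group all intermediate values that share a key.
    grouped: dict[str, list[int]] = defaultdict(list)
    for line_no, line in enumerate(lines):
        for k2, v2 in map_fn(line_no, line):
            grouped[k2].append(v2)
    return dict(reduce_fn(k, vs) for k, vs in grouped.items())

print(run(["a b a", "b a"]))  # {'a': 3, 'b': 2}
```

The key is what makes the shuffle possible: without it, there would be nothing to group the intermediate values by.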


The big question is: why use key/value pairs at all?
MapReduce is derived from the concepts of functional programming. Here is a tutorial on the map/fold primitives of functional programming, which underlie the MapReduce paradigm, and there is no mention of key/value pairs anywhere.
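For contrast, the bare map/fold primitives can be sketched in a few lines of Python; note that no keys appear anywhere (this example is illustrative, not from the tutorial):

```python
from functools import reduce

nums = [1, 2, 3, 4]
# map: apply a function to every element independently.
squares = list(map(lambda x: x * x, nums))          # [1, 4, 9, 16]
# fold (reduce): combine the mapped values into one result.
total = reduce(lambda acc, x: acc + x, squares, 0)  # 30
print(total)
```

The pairing with keys is what MapReduce adds on top of these primitives, so that independent reductions can run in parallel per key.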

According to the Google MapReduce paper:


We realized that most of our computations involved applying a map operation to each logical “record” in our input in order to compute a set of intermediate key/value pairs, and then applying a reduce operation to all the values that shared the same key, in order to combine the derived data appropriately.


The only reason I can see for Hadoop using key/value pairs is that the Google paper used them to meet Google's own requirements, and the same design was carried over into Hadoop. Now everyone is trying to fit their problem space into key/value pairs.
Here is an interesting article on using tuples in MapReduce.


Big Data Trendz