Why does Hadoop use KV (Key/Value) pairs?

Hadoop implements the MapReduce paradigm, as described in the original Google MapReduce paper, in terms of key/value pairs. The output types of the Map must match the input types of the Reduce, as shown below:
(K1, V1) -> Map -> (K2, V2)
(K2, V2) -> Reduce -> (K3, V3)
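To make the type flow concrete, here is a minimal word-count sketch in plain Python (not Hadoop's actual API; the function and variable names are illustrative). The map step turns (line number, line text) into intermediate (word, 1) pairs, a shuffle step groups values by key, and the reduce step combines the grouped values:

```python
from collections import defaultdict
from typing import Iterable

def map_fn(_line_no: int, line: str) -> Iterable[tuple[str, int]]:
    # (K1, V1) = (line number, line text) -> (K2, V2) = (word, 1)
    for word in line.split():
        yield (word, 1)

def reduce_fn(word: str, counts: list[int]) -> tuple[str, int]:
    # (K2, [V2]) = (word, [1, 1, ...]) -> (K3, V3) = (word, total)
    return (word, sum(counts))

def run(lines: list[str]) -> dict[str, int]:
    # Shuffle phase: group all intermediate values that share a key.
    grouped: dict[str, list[int]] = defaultdict(list)
    for line_no, line in enumerate(lines):
        for k2, v2 in map_fn(line_no, line):
            grouped[k2].append(v2)
    return dict(reduce_fn(k, vs) for k, vs in grouped.items())

print(run(["a b a", "b a"]))  # {'a': 3, 'b': 2}
```

The key is what makes the shuffle possible: without it, there would be nothing to group the intermediate values by.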


The big question is: why use key/value pairs at all?
MapReduce is derived from the concepts of functional programming. Here is a tutorial on the map/fold primitives of functional programming, which underlie the MapReduce paradigm, and there is no mention of key/value pairs anywhere.
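For contrast, the bare map/fold primitives can be sketched in a few lines of Python; note that no keys appear anywhere (this example is illustrative, not from the tutorial):

```python
from functools import reduce

nums = [1, 2, 3, 4]
# map: apply a function to every element independently.
squares = list(map(lambda x: x * x, nums))          # [1, 4, 9, 16]
# fold (reduce): combine the mapped values into one result.
total = reduce(lambda acc, x: acc + x, squares, 0)  # 30
print(total)
```

The pairing with keys is what MapReduce adds on top of these primitives, so that independent reductions can run in parallel per key.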

According to the Google MapReduce paper:


We realized that most of our computations involved applying a map operation to each logical “record” in our input in order to compute a set of intermediate key/value pairs, and then applying a reduce operation to all the values that shared the same key, in order to combine the derived data appropriately.


The only reason I can see for Hadoop using key/value pairs is that the Google paper used them to meet Google's own requirements, and the same design was carried over into Hadoop. Now everyone is trying to fit their problem space into key/value pairs.
Here is an interesting article on using tuples in MapReduce.


Big Data Trendz