Hadoop Research Tips
People who are interested in working on Hadoop often ask me: which Hadoop feature can I work on? Here are some items that I think are good topics for students who want to work on Hadoop.

- Make the Hadoop scheduler resource aware, especially with respect to CPU, memory and IO resources. The current implementation is based on statically configured slots.
- Allow a map-reduce job to take new input splits even after the job has already started.
- Dynamically increase the number of replicas of data in HDFS based on access patterns. This is needed to handle hot-spots of data.
- Extend the map-reduce framework to process data that resides partly in memory. The current implementation assumes that the map-reduce framework is used to scan data that resides on disk devices, but memory on commodity machines is becoming larger and larger. A cluster of 3000 machines with ...
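
To make the hot-spot replication idea above a little more concrete, here is a minimal sketch of how a replication planner might turn access counts into target replica counts. It is only an illustration, not how HDFS works today: the function names, the cap of 10 replicas and the threshold of 100 reads per extra replica are all assumptions made up for this example. The only fact taken from HDFS is its default replication factor of 3.

```python
from collections import Counter

DEFAULT_REPLICATION = 3        # HDFS default replication factor
MAX_REPLICATION = 10           # hypothetical cap chosen for this sketch
READS_PER_EXTRA_REPLICA = 100  # hypothetical threshold chosen for this sketch


def target_replication(read_count,
                       default=DEFAULT_REPLICATION,
                       maximum=MAX_REPLICATION,
                       reads_per_extra=READS_PER_EXTRA_REPLICA):
    """Return a replica count that grows with the number of reads, up to a cap."""
    extra = read_count // reads_per_extra
    return min(default + extra, maximum)


def plan_replication(access_log):
    """Map each path in an access log (one entry per read) to a target replica count."""
    counts = Counter(access_log)
    return {path: target_replication(n) for path, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical access log: one heavily read file and one rarely read file.
    log = ["/data/hot"] * 250 + ["/data/cold"] * 5
    for path, replicas in plan_replication(log).items():
        print(path, replicas)
```

With this sample log, the heavily read file is planned at 5 replicas while the rarely read file stays at the default of 3. On a real cluster, a plan like this could be applied with the existing `hdfs dfs -setrep` command or the `FileSystem.setReplication` call. The interesting research questions are how to collect the access counts cheaply and when to scale the replicas back down.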