Kojak // w9d2

Winter 2015
03/10/2015

Planned schedule and activities

9:00 am: Good morning!

9:30 am: Intro to Hadoop and MapReduce

10:30 am: Set up Hadoop on your own DigitalOcean droplets

11:30 am: Simple Map and Reduce on the command line

12:00 pm: Lunch

1:30 pm: Hadoop tutorial

5:00 pm: Stop Work

Challenge

By now you will have done word counts on the 3 Gutenberg texts. Now, instead of word counts, compute the tf-idf for all words in these texts.
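
As a starting point, here is a minimal single-machine sketch of tf-idf in plain Python, before porting the idea to map and reduce steps. It assumes one common variant of the formula: tf is a word's count divided by the total number of words in the document, and idf is the natural log of (number of documents / number of documents containing the word). Other weightings exist. The function names and the three toy documents are hypothetical stand-ins, not the actual Gutenberg texts.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(docs):
    """Return {doc_name: {word: tf-idf score}} for a dict of {doc_name: text}."""
    # Per-document word counts (the word count step you have already done).
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(counts)

    # Document frequency: the number of documents each word appears in.
    doc_freq = Counter()
    for word_counts in counts.values():
        doc_freq.update(word_counts.keys())

    scores = {}
    for name, word_counts in counts.items():
        total = sum(word_counts.values())
        scores[name] = {
            # tf = count / total words in doc; idf = ln(N / document frequency)
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in word_counts.items()
        }
    return scores


# Toy stand-ins for the three texts (hypothetical content, not the real books).
docs = {
    "a.txt": "the whale and the sea",
    "b.txt": "the city and the river",
    "c.txt": "the whale hunts",
}
scores = tf_idf(docs)
```

Note that with only 3 documents, any word that appears in all three (such as "the" above) gets an idf of ln(3/3) = 0, so its tf-idf is 0 regardless of how often it occurs.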

Reading

Hadoop: The Definitive Guide (3rd Ed.)
Good book by Apache Hadoop contributor Tom White
What is Apache?
What is Nutch?
TED Talk (Peter Diamandis: 90% of data created in the last 2 years)
Practical guide: "Hadoop in my IT department: How to plan a cluster?"
Great article comparing distributed file systems (GFS, HDFS, Amazon Dynamo, Microsoft Azure)
Big data article 1: the hard sell. "Addressing 5 objections to big data"
Big data article 2: the singularity is coming, a brief take by a creepy big data optimist

Other Slide Decks