Kojak // w9d4

Winter 2015

Planned schedule and activities

9:00 am: Morning!

9:15 am: Git branches

9:30 am: Hive!

10:30 am: Work

12:00 pm: Lunch

1:30 pm: Work

2:00 pm: Headshots. Strike a beauty pose.

3:00 pm: Work

5:00 pm: Chris Johnson, guest speaker

6:00 pm: Nothing

6:30 pm: (Optional) Women in Machine Learning meetup

Lecture Notes

Hive Setup and Tutorial

Hive Challenges

Challenge 1

Upload the AllstarFull, Appearances, TeamFranchises, and Batting tables to Hive.
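One possible way to prepare an upload is to generate the Hive statements from each CSV file's header line. The sketch below is only an illustration: the helper name hive_load_statements, the shortened header, and the path /path/to/Batting.csv are hypothetical. It assumes comma-delimited files with a header row and a Hive version that supports backtick-quoted column names and the skip.header.line.count table property. Every column is declared as STRING to avoid guessing types, so numeric columns may need a CAST in later queries.

```python
import csv
import io


def hive_load_statements(table, header_line, local_csv_path):
    """Build CREATE TABLE and LOAD DATA statements from a CSV header line."""
    # Parse the header with the csv module so quoted column names survive.
    columns = next(csv.reader(io.StringIO(header_line)))
    # Backticks allow column names that start with a digit (for example 2B).
    column_defs = ", ".join("`{}` STRING".format(c.strip()) for c in columns)
    create = (
        "CREATE TABLE IF NOT EXISTS {t} ({cols}) "
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
        "STORED AS TEXTFILE "
        "TBLPROPERTIES ('skip.header.line.count'='1');"
    ).format(t=table, cols=column_defs)
    load = "LOAD DATA LOCAL INPATH '{p}' OVERWRITE INTO TABLE {t};".format(
        p=local_csv_path, t=table
    )
    return create, load


# Hypothetical, shortened header and path, for illustration only.
create_sql, load_sql = hive_load_statements(
    "batting", "playerID,yearID,stint,teamID,HR", "/path/to/Batting.csv"
)
print(create_sql)
print(load_sql)
```

The printed statements can be pasted into the Hive shell once the real header line and file path are substituted.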

Challenge 2

For each year after 1985, calculate the average salary of all players. Then, for each year, calculate the average salary of All-Star players. Save these two outputs as separate files in HDFS. To write query results to a file, use INSERT OVERWRITE DIRECTORY '/path/to/output/dir' SELECT ...; where the ... after SELECT stands for the rest of your query and '/path/to/output/dir' is the HDFS directory that should receive the output.
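Before running a query on the cluster, it can help to check its logic against a few rows locally. The sketch below does this with Python's built-in sqlite3 module and then wraps the query text in INSERT OVERWRITE DIRECTORY. Everything specific in it is an assumption: the table names salaries and allstarfull, the column names yearid, playerid, and salary, the helper to_hdfs_export, and the toy rows, which are made up and not real salary data. The inner SELECT DISTINCT guards against a player being listed more than once in the same year.

```python
import sqlite3

# Query text intended to carry over to Hive; names are assumptions.
AVG_ALL = (
    "SELECT yearid, AVG(salary) AS avg_salary "
    "FROM salaries WHERE yearid > 1985 "
    "GROUP BY yearid"
)
AVG_ALLSTAR = (
    "SELECT s.yearid, AVG(s.salary) AS avg_salary "
    "FROM salaries s "
    "JOIN (SELECT DISTINCT playerid, yearid FROM allstarfull) a "
    "ON s.playerid = a.playerid AND s.yearid = a.yearid "
    "WHERE s.yearid > 1985 "
    "GROUP BY s.yearid"
)


def to_hdfs_export(query, out_dir):
    """Wrap a SELECT so that Hive writes its result to an HDFS directory."""
    return "INSERT OVERWRITE DIRECTORY '{}' {};".format(out_dir, query)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salaries (yearid INTEGER, playerid TEXT, salary REAL)")
conn.execute("CREATE TABLE allstarfull (playerid TEXT, yearid INTEGER)")
# Made-up rows, only for checking the query logic.
conn.executemany(
    "INSERT INTO salaries VALUES (?, ?, ?)",
    [(1985, "a", 100.0), (1986, "a", 100.0), (1986, "b", 300.0),
     (1987, "a", 200.0), (1987, "b", 400.0), (1987, "c", 600.0)],
)
conn.executemany(
    "INSERT INTO allstarfull VALUES (?, ?)",
    [("b", 1986), ("b", 1986), ("c", 1987)],  # duplicate row on purpose
)

all_rows = sorted(conn.execute(AVG_ALL).fetchall())
allstar_rows = sorted(conn.execute(AVG_ALLSTAR).fetchall())
export_sql = to_hdfs_export(AVG_ALL, "/path/to/output/dir")
print(all_rows)
print(allstar_rows)
print(export_sql)
```

The string in export_sql is what would be run in Hive, with the output directory replaced by a real HDFS path.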

Challenge 3

For the years 2000, 2005 and 2010, calculate the average salary of the New York Yankees, the New York Mets, the Chicago Cubs and the Chicago White Sox in each of these years. Also calculate the total salary for each of these teams in each year -- their salary budget.
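A per-team, per-year summary of this kind can be sketched as a single GROUP BY with two aggregates. The example below checks the idea locally with sqlite3. It assumes a salaries table with yearid, teamid, playerid, and salary columns, and it assumes the team codes NYA, NYN, CHN, and CHA for the four teams; verify those codes against your own team tables. The rows are made up for illustration.

```python
import sqlite3

# Query text intended to carry over to Hive; names and team codes are assumptions.
TEAM_SALARY = (
    "SELECT yearid, teamid, AVG(salary) AS avg_salary, SUM(salary) AS total_salary "
    "FROM salaries "
    "WHERE yearid IN (2000, 2005, 2010) "
    "AND teamid IN ('NYA', 'NYN', 'CHN', 'CHA') "
    "GROUP BY yearid, teamid"
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE salaries (yearid INTEGER, teamid TEXT, playerid TEXT, salary REAL)"
)
# Made-up rows: one other team and one other year should be filtered out.
conn.executemany(
    "INSERT INTO salaries VALUES (?, ?, ?, ?)",
    [(2000, "NYA", "p1", 10.0), (2000, "NYA", "p2", 30.0),
     (2000, "BOS", "p3", 50.0), (2001, "NYA", "p1", 70.0),
     (2005, "CHN", "p4", 20.0)],
)

team_rows = sorted(conn.execute(TEAM_SALARY).fetchall())
for row in team_rows:
    print(row)
```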

Challenge 4

In the history of baseball, who has the record for most home runs in a single year? Who are the top 10 for this statistic?
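One detail worth checking here is whether a player can have more than one batting row in a year, for example one row per team after a mid-season trade. If so, home runs need to be summed per player and year before ranking. The sketch below shows that pattern with sqlite3. It assumes a batting table with playerid, yearid, stint, and hr columns, and the rows are made up, not real statistics.

```python
import sqlite3

# Query text intended to carry over to Hive; names are assumptions.
TOP_HR = (
    "SELECT playerid, yearid, SUM(hr) AS total_hr "
    "FROM batting "
    "GROUP BY playerid, yearid "
    "ORDER BY total_hr DESC LIMIT 10"
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE batting (playerid TEXT, yearid INTEGER, stint INTEGER, hr INTEGER)"
)
# Made-up rows: player x has two stints in the same year.
conn.executemany(
    "INSERT INTO batting VALUES (?, ?, ?, ?)",
    [("x", 2001, 1, 4), ("x", 2001, 2, 3), ("y", 1998, 1, 6), ("z", 1961, 1, 5)],
)

top_rows = conn.execute(TOP_HR).fetchall()
for row in top_rows:
    print(row)
```

Without the SUM and GROUP BY, player x would appear as two separate rows and rank below the others.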