9:00 am: Morning!
9:15 am: Git branches
9:30 am: Hive!
10:30 am: Work
12:00 pm: Lunch
1:30 pm: Work
2:00 pm: Headshots. Strike a beauty pose.
3:00 pm: Work
5:00 pm: Chris Johnson, guest speaker
6:00 pm: Nothing
6:30 pm: (Optional) Women in Machine Learning meetup
Hive Setup and Tutorial
Upload the AllstarFull, Appearances, TeamFranchises, and Batting tables to Hive.
For each year after 1985, calculate the average salary of all players that year. Then, for each year, calculate the average salary of the All-Star players. Save the two outputs as two files in HDFS. To write query results to a file, use:
INSERT OVERWRITE DIRECTORY '/path/to/output/dir' SELECT ...;
(The ... after SELECT stands for the rest of your query, and '/path/to/output/dir' is the HDFS directory where the output should go.)
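If you want to check the query logic before running it on the cluster, one option is to prototype it locally. The sketch below uses Python's built-in sqlite3 module as a stand-in for Hive, with a handful of made-up rows. It assumes a Salaries table with yearID, playerID and salary columns, and that AllstarFull has playerID and yearID columns; check these names against your own tables. The INSERT OVERWRITE DIRECTORY wrapper is Hive-specific and is left out here.

```python
import sqlite3

# In-memory database with toy rows (made up for illustration, not real salaries).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Salaries (yearID INT, teamID TEXT, playerID TEXT, salary REAL);
CREATE TABLE AllstarFull (playerID TEXT, yearID INT);
INSERT INTO Salaries VALUES
  (1985, 'NYA', 'a', 100.0),
  (1986, 'NYA', 'a', 200.0),
  (1986, 'NYN', 'b', 400.0),
  (1987, 'NYA', 'a', 300.0),
  (1987, 'NYN', 'b', 500.0);
-- player 'a' is listed twice in 1987 to show why DISTINCT is used below
INSERT INTO AllstarFull VALUES ('b', 1986), ('a', 1987), ('a', 1987);
""")

# Average salary of all players, per year, for years after 1985.
avg_all_query = """
SELECT yearID, AVG(salary) AS avg_salary
FROM Salaries
WHERE yearID > 1985
GROUP BY yearID
ORDER BY yearID
"""

# Average salary of All-Star players only. The DISTINCT subquery keeps a
# player who has several All-Star rows in one year from being counted twice.
avg_allstar_query = """
SELECT s.yearID, AVG(s.salary) AS avg_salary
FROM Salaries s
JOIN (SELECT DISTINCT playerID, yearID FROM AllstarFull) a
  ON s.playerID = a.playerID AND s.yearID = a.yearID
WHERE s.yearID > 1985
GROUP BY s.yearID
ORDER BY s.yearID
"""

avg_all = conn.execute(avg_all_query).fetchall()
avg_allstar = conn.execute(avg_allstar_query).fetchall()
print(avg_all)
print(avg_allstar)
```

Both SELECT statements use only plain joins, a subquery in the FROM clause, and GROUP BY, so they should carry over to Hive with little or no change.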
For each of the years 2000, 2005, and 2010, calculate the average salary of the New York Yankees, New York Mets, Chicago Cubs, and Chicago White Sox. Also calculate the total salary for each of these teams in each year (their salary budget).
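The average and the total can come out of a single grouped query. Here is a minimal local sketch, again using sqlite3 with made-up rows in place of Hive. The team codes NYA, NYN, CHN and CHA (Yankees, Mets, Cubs, White Sox) and the Salaries column names are assumptions; confirm them against the teamID values in your data.

```python
import sqlite3

# Toy rows for illustration only; the salaries are not real figures.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Salaries (yearID INT, teamID TEXT, playerID TEXT, salary REAL);
INSERT INTO Salaries VALUES
  (2000, 'NYA', 'p1', 10.0),
  (2000, 'NYA', 'p2', 30.0),
  (2000, 'BOS', 'p3', 99.0),
  (2001, 'NYA', 'p1', 50.0),
  (2005, 'CHN', 'p4', 8.0);
""")

# One row per (year, team): average salary and total payroll.
# Team codes are assumed, see the note above.
team_query = """
SELECT yearID, teamID, AVG(salary) AS avg_salary, SUM(salary) AS total_salary
FROM Salaries
WHERE yearID IN (2000, 2005, 2010)
  AND teamID IN ('NYA', 'NYN', 'CHN', 'CHA')
GROUP BY yearID, teamID
ORDER BY yearID, teamID
"""

team_rows = conn.execute(team_query).fetchall()
for row in team_rows:
    print(row)
```

Rows for other teams (BOS) and other years (2001) are filtered out by the WHERE clause before grouping.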
In the history of baseball, who has the record for most home runs in a single year? Who are the top 10 for this statistic?
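One thing to watch for: if your Batting table has one row per player, year, and stint (as it would when a player changes teams mid-season), the home runs need to be summed per player and year before ranking. The sketch below shows that pattern locally with sqlite3 and made-up rows. The column names playerID, yearID, stint and HR are assumptions about the table layout.

```python
import sqlite3

# Toy rows for illustration only; player 'x' has two stints in 2001.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Batting (playerID TEXT, yearID INT, stint INT, HR INT);
INSERT INTO Batting VALUES
  ('x', 2001, 1, 20),
  ('x', 2001, 2, 15),
  ('y', 2001, 1, 30),
  ('z', 1999, 1, 33);
""")

# Sum home runs across stints, then keep the 10 largest season totals.
top_hr_query = """
SELECT playerID, yearID, SUM(HR) AS home_runs
FROM Batting
GROUP BY playerID, yearID
ORDER BY home_runs DESC, playerID, yearID
LIMIT 10
"""

top_hr = conn.execute(top_hr_query).fetchall()
for row in top_hr:
    print(row)
```

The first row of the result is the single-year record holder, and the full result is the top 10. With real data you may want to decide how to handle ties at tenth place.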