Luther // w2d2

Winter 2015
01/21/2015

Planned schedule and activities

9:00 am: Yo!

9:20 am: Unicode. Is it really a nightmare? Nah. Well maybe a little bit.

9:40 am: Pickling: Save your Python objects, as easy as 1-2-3.

9:50 am: Short break

10:00 am: More Web Scraping: Drive your browser with Selenium, fill forms, click buttons, go crazy

10:30 am: Pandas Challenges: Learn pandas by getting your hands dirty with it (one-on-ones continue from here on as well)

12:00 pm: Hunger Games: Eating Frenzy

1:30 pm: Continue with Pandas challenges and scraping data

Tentative: Linear Regression packages in Python
(depending on how the pandas challenges progress, we'll have this later today or tomorrow)

5:00 pm: The curtain falls

Lecture Notes

Python HOWTO tutorial for dealing with Unicode
Unicode on Wikipedia
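
As a quick companion to the links above, here is a minimal sketch of the encode/decode round trip that most Unicode trouble comes down to. It assumes Python 3, where `str` is Unicode by default (in Python 2 you would be working with `unicode` and `str` instead); the sample title is only an illustration.

```python
import unicodedata

title = "Am\u00e9lie"                  # "Amélie": a str is a sequence of code points
encoded = title.encode("utf-8")        # bytes, ready to write to a file or socket
decoded = encoded.decode("utf-8")      # back to str, identical to the original

# Decoding with the wrong codec is the classic source of garbled text.
garbled = encoded.decode("latin-1")    # the two UTF-8 bytes of "é" become two characters

# Strip accents: decompose (NFKD), then drop whatever does not fit in ASCII.
ascii_title = (
    unicodedata.normalize("NFKD", title)
    .encode("ascii", "ignore")
    .decode("ascii")
)

print(len(title), len(encoded))        # 6 7
print(ascii_title)                     # Amelie
```

The rule of thumb: decode bytes as soon as they enter your program, work with text inside it, and encode again only when writing out.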

w2d2_Web_Scraping_2_Selenium_Webdriver.ipynb (9.5 KB) Here is the IPython notebook for web scraping with Selenium WebDriver. Open it and follow along as we go through it (you won't need to type along).

To get started with Selenium, you need to install it:

pip install selenium

You will also need to put the chromedriver file in the same directory as the IPython notebook. Download the zip file for your platform (chromedriver_mac32.zip for most of you), unzip it, and move the chromedriver file into that directory.

Xpath tutorial

You can use this XPath selector tutorial when you need to construct an XPath selector.
You can also look here.
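
To get a feel for XPath before pointing it at a real page, here is a small sketch that runs selectors against a made-up snippet using the standard library's `xml.etree.ElementTree`. Two caveats: ElementTree supports only a subset of XPath (paths on an element must start with `.//` rather than `//`), and the table, ids, and class names below are invented for illustration.

```python
import xml.etree.ElementTree as ET

# A toy, well-formed page; the structure and values are made up.
page = """
<html><body>
  <table id="movies">
    <tr><td class="title"><a href="/movie/a">Movie A</a></td><td class="gross">100</td></tr>
    <tr><td class="title"><a href="/movie/b">Movie B</a></td><td class="gross">250</td></tr>
  </table>
</body></html>
"""

root = ET.fromstring(page)

# All <a> elements inside title cells of the table with id="movies"
title_links = root.findall(".//table[@id='movies']//td[@class='title']/a")
titles = [a.text for a in title_links]
hrefs = [a.get("href") for a in title_links]

# All cells with class="gross", converted to integers
gross = [int(td.text) for td in root.findall(".//td[@class='gross']")]

print(titles)
print(hrefs)
print(gross)
```

In Selenium you would pass the full XPath form (for example `//table[@id='movies']//td[@class='title']/a`) to the XPath locator instead.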

Finding elements in Selenium

(source): There are various strategies to locate elements in a page. You can use the most appropriate one for your case. Selenium can locate elements by id, name, XPath, link text, partial link text, tag name, class name, or CSS selector.
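
Here is a rough sketch of how locating elements might look in practice; it is not taken from the notebook. It uses the generic `find_elements(by, value)` call on the driver, and the helper names `find_first` and `search_demo`, along with the locator values `"q"` and `"search"`, are hypothetical. Inspect your own page to find the real ones.

```python
# Each locator is a (by, value) pair. The strings below match the constants in
# selenium.webdriver.common.by.By (for example, By.ID == "id").
BY_ID = "id"
BY_NAME = "name"
BY_XPATH = "xpath"
BY_CSS = "css selector"


def find_first(driver, locators):
    """Return the first element matched by any locator, or None if none match."""
    for by, value in locators:
        # find_elements returns an empty list instead of raising when nothing matches
        matches = driver.find_elements(by, value)
        if matches:
            return matches[0]
    return None


def search_demo(url, query):
    """Open `url` in Chrome, type `query` into a search box, and submit the form.

    The locator values used here are placeholders, not real page ids.
    """
    from selenium import webdriver  # imported here so find_first works without a browser

    driver = webdriver.Chrome()  # needs chromedriver where Selenium can find it
    try:
        driver.get(url)
        box = find_first(driver, [
            (BY_NAME, "q"),
            (BY_ID, "search"),
            (BY_XPATH, "//input[@type='text']"),
        ])
        if box is not None:
            box.send_keys(query)   # type into the field
            box.submit()           # submit the enclosing form
        return driver.title
    finally:
        driver.quit()              # always close the browser
```

Trying several locators in order is just one design choice; when a page has a stable id, a single id lookup is usually the simplest and most robust option.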

Pickling notebook:
w2d2_Pickling_Python_Objects.ipynb (2.5 KB)
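
If you want the gist before opening the notebook, here is a minimal sketch of saving a Python object to disk and loading it back with the standard `pickle` module. The movie records and the file name are made up for illustration, and the file goes into a temporary directory so nothing in your working folder is touched.

```python
import os
import pickle
import tempfile

# Any picklable Python object works: lists, dicts, most class instances, ...
movies = [
    {"title": "Movie A", "gross": 100},
    {"title": "Movie B", "gross": 250},
]

path = os.path.join(tempfile.mkdtemp(), "movies.pkl")

# Pickle files are binary, so open with "wb" / "rb"
with open(path, "wb") as f:
    pickle.dump(movies, f)

with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == movies)   # True: same contents
print(restored is movies)   # False: a fresh copy rebuilt from the file
```

This is handy for caching scraped results so you don't have to hit the website again. Only unpickle files you trust, since loading a pickle can execute arbitrary code.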

Challenges: Exploring and visualizing our scraped movie data in pandas

Pandas: Just getting started? Read this guide: [10 minutes to pandas](http://pandas.pydata.org/pandas-docs/stable/10min.html)

Don't have movie data to work on yet?
You can use this data on the 100 top-grossing movies from 2013 (scraped from Box Office Mojo):
2013_movies.csv (7.6 KB)