McNulty // w5d1

Winter 2015
02/09/2015

Planned schedule and activities

9:00 am: Arrivals

9:15 am: Error Analysis and Tree/Forest Challenges

10:15 am: SVMs

11:00 am: Challenges + Work on McNulty

12:00 pm: Lunch

1:30 pm: Work on McNulty

5:00 pm: Departures

Lecture Notes

w5d1_SVMs.pdf (2.5 MB)

Reading

SVM math
A tutorial on SVMs
Another tutorial on SVMs

An Idiot's Guide to SVMs
SVM lecture
How to tune SVM Parameters
Preprocessing data in sklearn
SVMs in sklearn
RBF Kernel

Error Analysis Challenges

We will go back to the original Supervised Learning Challenges.

Challenge 1

For the house representatives data set, calculate the accuracy, precision, recall and f1 scores of each classifier you built (on the test set).
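
As a reminder of what the four scores mean, here is a minimal sketch that computes them by hand from the confusion counts. The labels below are made-up toy values, not the house representatives data, and binary_metrics is a hypothetical helper name. The accuracy_score, precision_score, recall_score and f1_score functions in sklearn.metrics should give the same numbers on the same inputs.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy test-set labels and predictions (invented for illustration).
y_test = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = binary_metrics(y_test, y_pred)
print(metrics)
```

Here there are 3 true positives, 1 false positive, 1 false negative and 3 true negatives, so all four scores come out to 0.75.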

Challenge 2

For each, draw the ROC curve and calculate the AUC.
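
One possible starting point, assuming a binary problem and that you have a score (probability or decision value) for the positive class of each test point. The four labels and scores below are toy values chosen only to keep the arithmetic easy to check.

```python
from sklearn.metrics import auc, roc_curve

# Toy true labels and positive-class scores (invented for illustration).
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# roc_curve sweeps the decision threshold and returns one
# (false positive rate, true positive rate) point per threshold.
fpr, tpr, thresholds = roc_curve(y_true, y_score)

# auc integrates the curve with the trapezoidal rule.
roc_auc = auc(fpr, tpr)
print(roc_auc)
```

Three of the four positive/negative pairs are ranked correctly, so the AUC is 0.75. To draw the curve, the fpr and tpr arrays can be passed to a plotting call such as matplotlib's plt.plot(fpr, tpr).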

Challenge 3

Calculate the same metrics as in Challenge 1, but this time under cross-validation, using the cross_val_score function (as in Challenge 9).
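
A sketch of how the scoring argument of cross_val_score can select each metric in turn. The data is synthetic (make_classification) and the logistic regression model is just a stand-in for whichever classifier you built. The import path assumes a recent scikit-learn, where cross_val_score lives in sklearn.model_selection; older releases had it in sklearn.cross_validation.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic binary data standing in for your own feature matrix and labels.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
model = LogisticRegression(max_iter=1000)

cv_results = {}
for metric in ["accuracy", "precision", "recall", "f1"]:
    # One score per fold; the classifier is refit on each training split.
    scores = cross_val_score(model, X, y, cv=5, scoring=metric)
    cv_results[metric] = scores.mean()
    print(metric, round(scores.mean(), 3))
```

Reporting the per-fold spread (for example scores.std()) next to the mean gives a sense of how stable each metric is.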

Challenge 4

For your movie classifiers, calculate the precision and recall for each class.
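
With more than two classes, passing average=None to precision_score and recall_score returns one value per class instead of a single averaged number. The class names and predictions below are hypothetical toy values, not taken from any movie dataset.

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical class labels and toy predictions (invented for illustration).
labels = ["G", "PG", "R"]
y_true = ["G", "G", "PG", "PG", "R", "R"]
y_pred = ["G", "PG", "PG", "PG", "R", "G"]

# average=None keeps the classes separate; labels fixes their order.
per_class_precision = precision_score(y_true, y_pred, labels=labels, average=None)
per_class_recall = recall_score(y_true, y_pred, labels=labels, average=None)

for label, p, r in zip(labels, per_class_precision, per_class_recall):
    print(label, "precision:", round(p, 2), "recall:", round(r, 2))
```

sklearn.metrics.classification_report prints the same per-class numbers as a formatted table.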

Challenge 5

Draw the ROC curve (and calculate the AUC) for the logistic regression classifier from Challenge 12.
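
For a fitted logistic regression, the scores fed to the ROC curve are usually the predicted probabilities of the positive class. The sketch below assumes a binary target and uses synthetic data in place of the Challenge 12 dataset; the split sizes and random seeds are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic binary data standing in for your own features and labels.
X, y = make_classification(n_samples=300, n_features=6, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Column 1 of predict_proba is the probability of the positive class (label 1).
positive_scores = clf.predict_proba(X_test)[:, 1]

fpr, tpr, _ = roc_curve(y_test, positive_scores)
test_auc = roc_auc_score(y_test, positive_scores)
print(round(test_auc, 3))
```

Use the held-out test scores, not the training scores, so the curve reflects performance on unseen data.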

Installing pydot for the Tree challenges:

Note: If you already installed pydot and it is not working, uninstall it first:

pip uninstall pydot

Otherwise, you can start here:

pip uninstall pyparsing

pip install -Iv https://pypi.python.org/packages/source/p/pyparsing/pyparsing-1.5.7.tar.gz#md5=9be0fcdcc595199c646ab317c1d9a709

pip install pydot

brew install graphviz

Note: If you try to draw a tree and get an error about dot_parser not being found, the following should fix it:

pip install pyparsing==1.5.7

Tree / Forest Challenges

Challenge 1

For the house representatives data set, fit and plot a decision tree classifier
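
A minimal sketch of fitting a tree and exporting it in Graphviz DOT format. The data is synthetic, and the feature and class names are hypothetical placeholders for the real vote columns and party labels. With out_file=None, export_graphviz returns the DOT source as a string.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Synthetic data standing in for the vote matrix and party labels.
X, y = make_classification(
    n_samples=100, n_features=4, n_informative=2, n_redundant=0, random_state=0
)
feature_names = ["vote_1", "vote_2", "vote_3", "vote_4"]  # hypothetical names

# A shallow tree keeps the drawing readable.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

dot_source = export_graphviz(
    tree,
    out_file=None,  # return the DOT text instead of writing a file
    feature_names=feature_names,
    class_names=["democrat", "republican"],  # hypothetical class labels
    filled=True,
)
print(dot_source[:200])
```

The DOT string can then be rendered with Graphviz, for example by passing it to pydot's graph_from_dot_data and writing the result to an image file, assuming the pydot and graphviz installs from the notes above are working.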

Challenge 2

Fit and draw a decision tree classifier for your movie dataset

Challenge 3

Tackle the Titanic Survivors Kaggle competition with decision trees. Look at your splits: how does your tree decide?
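
One way to read the splits without drawing anything is export_text, which prints the fitted tree as nested rules (available in recent scikit-learn releases). The eight rows below are invented toy passengers, not the Kaggle data, and the column names are hypothetical; the labels are constructed so that survival follows is_female exactly.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

feature_names = ["is_female", "pclass", "age"]  # hypothetical columns

# Invented toy rows: [is_female, pclass, age].
X = [
    [1, 1, 29], [1, 3, 22], [1, 2, 35], [1, 3, 4],
    [0, 1, 54], [0, 3, 20], [0, 2, 40], [0, 3, 31],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # invented labels that mirror is_female

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Text rendering of every split and leaf in the fitted tree.
rules = export_text(tree, feature_names=feature_names)
print(rules)

# How much each feature contributed to reducing impurity.
importances = dict(zip(feature_names, tree.feature_importances_))
print(importances)
```

On this toy data the tree needs a single split on is_female. On the real competition data, expect several features to share the importance.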