MySQL setup and tutorial

let's get some data and start playing with it!

first, we'll play with some tennis data from here.

mkdir tennis
cd tennis
mkdir data
cd data
wget http://archive.ics.uci.edu/ml/machine-learning-databases/00300/Tennis-Major-Tournaments-Match-Statistics.zip

To unzip it, sudo apt-get install unzip, then unzip Tennis-Major-Tournaments-Match-Statistics.zip

Data: acquired.

Before we get into the sql client, open up the files and poke around.

Getting the data into mysql database

Your mysql server should still be running (it automatically started itself when you installed it), so type the following at the command line to join back in:

mysql --local-infile=1 -u root -p

(and then hit enter) (and then enter your mysql root password).

You only need to use the --local-infile=1 flag if you are going to put data into the database using a local file, (like we are about to do).

Now that we are in the mysql client, we can see what databases exist SHOW DATABASES; -- at this point there should not be anything interesting.

We create our database with CREATE DATABASE tennis; -- if we use SHOW TABLES;, we see there are not yet any tables in the database. So let's create one.

creating a mysql database.

Step 1: preparing the columns

CREATE TABLE  aus_ladies_2013(
      player1 VARCHAR(255),
      player2 VARCHAR(255),
      round INT,
      result INT,
      fnl1 DOUBLE,
      fnl2 DOUBLE,
      fsp_1 DOUBLE,
      fsw_1 DOUBLE,
      ssp_1 DOUBLE,
      ssw_1 DOUBLE,
      ace_1 INT,
      dbf_1 INT,
      wnr_1 INT,
      ufe_1 INT,
      bpc_1 INT,
      bpw_1 INT,
      npa_1 INT,
      npw_1 INT,
      tpw_1 INT,
      st1_1 INT,
      st2_1 INT,
      st3_1 INT,
      st4_1 INT,
      st5_1 INT,
      fsp_2 DOUBLE,
      fsw_2 DOUBLE,
      ssp_2 DOUBLE,
      ssw_2 DOUBLE,
      ace_2 INT,
      dbf_2 INT,
      wnr_2 INT,
      ufe_2 INT,
      bpc_2 INT,
      bpw_2 INT,
      npa_2 INT,
      npw_2 INT,
      tpw_2 INT,
      st1_2 INT,
      st2_2 INT,
      st3_2 INT,
      st4_2 INT,
      st5_2 INT);

Step 2: filling the table with data (back to mysql)

LOAD DATA LOCAL INFILE 'AusOpen-women-2013.csv' INTO TABLE
aus_ladies_2013 FIELDS TERMINATED BY ',' IGNORE 1 LINES;

The bit FIELDS TERMINATED BY tells us that the data is comma-separated, and IGNORE 1 LINES has us skip the header row and not include it as data.

Fun fact: your mysql command history is saved in your home directory in a hidden file called .mysql_history

Mysql tutorial continued

use tennis;
SHOW TABLES;

Load the other tables

sql CREATE TABLE french_men_2013( player1 VARCHAR(255), player2 VARCHAR(255), round INT, result INT, fnl1 DOUBLE, fnl2 DOUBLE, fsp_1 DOUBLE, fsw_1 DOUBLE, ssp_1 DOUBLE, ssw_1 DOUBLE, ace_1 INT, dbf_1 INT, wnr_1 INT, ufe_1 INT, bpc_1 INT, bpw_1 INT, npa_1 INT, npw_1 INT, tpw_1 INT, st1_1 INT, st2_1 INT, st3_1 INT, st4_1 INT, st5_1 INT, fsp_2 DOUBLE, fsw_2 DOUBLE, ssp_2 DOUBLE, ssw_2 DOUBLE, ace_2 INT, dbf_2 INT, wnr_2 INT, ufe_2 INT, bpc_2 INT, bpw_2 INT, npa_2 INT, npw_2 INT, tpw_2 INT, st1_2 INT, st2_2 INT, st3_2 INT, st4_2 INT, st5_2 INT); LOAD DATA LOCAL INFILE 'FrenchOpen-men-2013.csv' INTO TABLE french_men_2013 FIELDS TERMINATED BY "," IGNORE 1 LINES; SELECT * FROM french_men_2013 LIMIT 5; SELECT * FROM french_men_2013 LIMIT 5 \G SHOW TABLES; sql CREATE TABLE us_men_2013 LIKE french_men_2013; CREATE TABLE wimbledon_men_2013 LIKE us_men_2013; LOAD DATA LOCAL INFILE 'USOpen-men-2013.csv' INTO TABLE us_men_2013 FIELDS TERMINATED BY "," IGNORE 1 LINES; LOAD DATA LOCAL INFILE 'Wimbledon-men-2013.csv' INTO TABLE wimbledon_men_2013 FIELDS TERMINATED BY "," IGNORE 1 LINES; SHOW TABLES; DESCRIBE wimbledon_men_2013; SELECT * FROM wimbledon_men_2013 LIMIT 5; #### Let's look around a bit

SELECT player1, player2, result FROM us_men_2013 LIMIT 5;
SELECT player1, result FROM us_men_2013 WHERE player1="Richard Gasquet";
SELECT player1, player2, result FROM us_men_2013 WHERE player1="Richard Gasquet";
SELECT player1, player2, result FROM us_men_2013 WHERE player1="Richard Gasquet" OR player2="Richard Gasquet";
SELECT COUNT(*) FROM us_men_2013;
SELECT player1, COUNT(*) FROM us_men_2013 GROUP BY player1;
SELECT player1, AVG(result) FROM us_men_2013 GROUP BY player1;
SELECT player1, player2, result FROM us_men_2013 WHERE result=1 LIMIT 5;
SELECT COUNT(*) FROM us_men_2013 WHERE result=1;
SELECT player1, player2, result FROM french_men_2013 WHERE result=1 LIMIT 5;
SELECT us_men_2013.player1, us_men_2013.tpw_1, french_men_2013.tpw_1 FROM us_men_2013, french_men_2013 WHERE us_men_2013.player1 = french_men_2013.player1;
SELECT us_men_2013.player1, us_men_2013.tpw_1, french_men_2013.tpw_1 FROM us_men_2013, french_men_2013 WHERE us_men_2013.player1 = french_men_2013.player1 GROUP BY us_men_2013.player1;
SELECT us_men_2013.player1, us_men_2013.tpw_1, french_men_2013.tpw_1, us_men_2013.tpw_1 + french_men_2013.tpw_1 AS total_points FROM us_men_2013, french_men_2013 WHERE us_men_2013.player1 = french_men_2013.player1 GROUP BY us_men_2013.player1;

Python bindings

How can I access all this through python? Using the MySQLdb module, you can execute any query and get the results in python to do whatever you want to them (put it in pandas, throw it into scikit.learn algorithms, etc.)

import MySQLdb
import credentials

db = MySQLdb.connect(host="localhost",
                     user=credentials.user,
                     passwd=credentials.password,
                     db='tennis')

cursor = db.cursor()
cursor.execute("SELECT player1, player2, round, result FROM us_men_2013")

for row in cursor.fetchall():
    print row

By the way, credentials.py is just a local file I created that looks like this:

# credentials.py
user="bornwild"
password = "yolo"

Here is how you would use the power of python to manipulate sql queries:

for player, ace in [('player1','ace_1'), ('player2','ace_2')]:
    print 'PLAYER %s' % player
    cursor.execute('SELECT %s, %s FROM us_men_2013 '
                   'WHERE %s > (SELECT (AVG(ace_1)+AVG(ace_2))/2.0 FROM us_men_2013)' % (player, ace, ace))

    for row in cursor.fetchall():
        print row

Done? Move on to the SQL challenges