Posts Tagged ‘Programming’

Calculating NCAA March Madness Bracket using Chess ELO predictive scoring

March 13th, 2012

Hokay, so I had a lot of fun with this. Let me start by saying I’m not the first to do this.  However, after a lot of Googling, I found surprisingly few NCAA bracket predictions using the ELO system, and those that I did find weren’t transparent about the data they used.  I wanted to do it myself so I could see the result with data I knew, and as a good excuse to code some Ruby.

Overview

First, the ELO system. The ELO system is a way of calculating the relative skill between two players (and thus a probability for one to win in a future match). Wikipedia has an excellent write-up including the history and the math behind the scoring.  In a nutshell, it calculates an expected result based on the ratings of the two teams.  It then compares the actual result to the expected one, and adjusts each player’s rating accordingly (increasing it for the winner, decreasing it for the loser).  If a favored team wins, the adjustment is small.  If an underdog wins, the adjustment is larger.
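To make the nutshell version concrete, here is a minimal Ruby sketch of the standard expected-score and update formulas. It is not the script used for this post: the K-factor of 32, the example ratings, and the method names are all assumptions chosen for illustration.

```ruby
# Standard Elo formulas. K_FACTOR = 32 is an assumed value, not taken from the post.
K_FACTOR = 32.0

# Expected score (between 0 and 1) for a team rated `rating` against `opponent`.
def expected_score(rating, opponent)
  1.0 / (1.0 + 10.0**((opponent - rating) / 400.0))
end

# New rating after one game; `actual` is 1.0 for a win, 0.5 for a tie, 0.0 for a loss.
def updated_rating(rating, opponent, actual)
  rating + K_FACTOR * (actual - expected_score(rating, opponent))
end

# Made-up ratings for a favorite and an underdog.
favorite = 1600.0
underdog = 1400.0

# Favorite wins: the result was expected, so the adjustment is small.
small_gain = updated_rating(favorite, underdog, 1.0) - favorite
# Underdog wins: the result was a surprise, so the adjustment is larger.
large_gain = updated_rating(underdog, favorite, 1.0) - underdog

puts format('favorite wins: +%.1f, underdog wins: +%.1f', small_gain, large_gain)
```

The loser’s rating drops by the same amount the winner’s rises, so the total number of points in the system stays constant.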

Assumptions

Your conclusion is only as good as your assumptions, and we’ll need to make a few.  Most of the work is done by choosing the ELO system, since it’s one of the simpler systems for relative rankings.  For our data, we’re only interested in which two teams played, and which team won.  We ignore the final score, whether traveling or home, players used, fouls, timeouts, point distribution, etc.  Also, for the purposes of this calculation, if the game went into overtime, I count it as a tie.  That’s probably the most debatable assumption, but I feel it’s valid because it essentially means after an hour, the two teams displayed equal skill.
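As a sketch of how those assumptions could reduce a game to the single number ELO needs, the snippet below maps a game record to a score for each side. The `Game` struct, its field names, and the team names are hypothetical; the original scraper’s data format isn’t shown here.

```ruby
# Hypothetical game record; field names are assumptions for illustration.
Game = Struct.new(:team_a, :team_b, :winner, :overtime)

# Returns [score_for_a, score_for_b]: 1.0 for a win, 0.0 for a loss,
# and 0.5 each when the game went to overtime (treated as a tie).
def actual_scores(game)
  return [0.5, 0.5] if game.overtime

  game.winner == game.team_a ? [1.0, 0.0] : [0.0, 1.0]
end

# Made-up examples: one game decided in regulation, one that went to overtime.
regulation = Game.new('Team A', 'Team B', 'Team A', false)
overtime   = Game.new('Team A', 'Team B', 'Team B', true)

p actual_scores(regulation)
p actual_scores(overtime)
```

Everything else about the game (final score, venue, fouls) is simply never read.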

Data

So this turned out to be the hardest part. I wanted to use the 2011-2012 season as my dataset. After a half-hour of Googling, I couldn’t find the data in a well-structured format (read: CSV or XLS).  So I had to resort to web scraping.

The best website I could find was the official NCAA site.  They have a page listing the Men’s Division I teams, where you can click into each team to see a game history (amongst other things). Let’s grab it.

wget --mirror "http://stats.ncaa.org/team/inst_team_list?sport_code=MBB&division=1"

Well, that was fun.  wget was a little overzealous, so I moved all the relevant pages (those starting with 10740) into their own folder.  I then wrote a Ruby script to organize the data, clean it up, and write it to a file.

The output from that script is a beautifully structured file, if I do say so myself.  Well, at least from a data perspective.

Number Crunching

Okay, so now it’s time to actually calculate the ELO scores.  I basically wrote a straight implementation of the math as presented on Wikipedia. The second Ruby script reads in the scores, calculates the adjustments, and keeps track of the changes.

Here’s the output while it’s running:

Lastly, it sorts the results and writes them to a file.

Conclusions

Here it is:

Final ELO Predicted Bracket

Comparing our generated results to the seeded rankings, we can see there’s a lot of overlap.  The top three teams are predicted exactly as seeded.  However, from there the list diverges quite a bit. For example, Murray St. is expected to take the West, but didn’t get seeded so hot.

So, if this wins, I’ll get some money from our office bracket pool. Which is nice. And if it doesn’t, it will be proof that my computer messed up on the calculation.

----- EDIT -----

My initial calculations didn’t account for the order in which the games were played.  Although I didn’t think this would have a big influence, running the script on a computer that lists the data files in a different order actually made some big differences.  Thus, I changed the data scraping script to record the dates, and now calculate all ELO scores in the order the games were played.  This should give a more accurate and reproducible result.  Here is the updated script, and the updated final result.  Here’s my Final Bracket.
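To illustrate why processing order matters, here is a small Ruby sketch (not the updated script itself) that runs the same three games twice: once as listed and once sorted by date. The teams, dates, K-factor of 32, and starting rating of 1500 are all made-up assumptions.

```ruby
require 'date'

K_FACTOR = 32.0        # assumed K-factor
START_RATING = 1500.0  # assumed starting rating for every team

# Hypothetical records: [date, winner, loser]. Teams and dates are made up.
GAMES = [
  ['2012-01-14', 'B', 'C'],
  ['2011-11-20', 'A', 'B'],
  ['2011-12-05', 'C', 'A']
].freeze

# Applies the standard Elo update to each game in the order given.
def run_elo(games)
  ratings = Hash.new(START_RATING)
  games.each do |_date, winner, loser|
    expected = 1.0 / (1.0 + 10.0**((ratings[loser] - ratings[winner]) / 400.0))
    delta = K_FACTOR * (1.0 - expected)
    ratings[winner] += delta
    ratings[loser] -= delta
  end
  ratings
end

# Sorting by parsed date makes the result independent of file listing order.
chronological = GAMES.sort_by { |date, _winner, _loser| Date.parse(date) }

as_listed = run_elo(GAMES)
in_order  = run_elo(chronological)

puts "as listed:     #{as_listed.sort.map { |t, r| "#{t}=#{r.round(1)}" }.join(' ')}"
puts "chronological: #{in_order.sort.map { |t, r| "#{t}=#{r.round(1)}" }.join(' ')}"
```

Each update depends on the ratings at the moment the game is processed, so the same set of results can crown a different top team depending on the order.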

Dynamic Display of the Alphabet with PHP

December 23rd, 2008

Here’s a neat trick I recently used: Say you want to display the alphabet on your web page.  The most likely scenario is paging links to organize a directory of people or businesses. PHP has a chr() function, which returns the character for any given ASCII code.

Rather than looping through an array with 26 values, or worse yet, typing out 26 lines of code, just loop through the display code 26 times.

<?php
for ($i=65; $i<=90; $i++) {
 echo chr($i);
}
?>

For those not familiar with ASCII mappings, values 65-90 represent the uppercase letters A-Z. Alternatively, you could use the values 97-122 for lowercase a-z.  If you wanted to mix the two (say to display uppercase, but use lowercase in the link) just use the strtoupper() or strtolower() functions inside the loop.  Here’s a more applicable sample:

<?php
for ($i=97; $i<=122; $i++) {
 $x = chr($i);
 echo '<a href="memberlist.php?alpha=' . $x . '">' . strtoupper($x) . '</a>';
}
?>

You can see an example of both applied here.

Coding Music

December 10th, 2008

Music is a big part of programming.  Nothing gets me zen faster than immersion in a good back beat.  As such, I’d like to pass on a few genres and artists that I think stand out as being conducive to cranking out code.

Trance:

Eric Jordan

By far, I spend the most time listening to Eric Jordan.  His trance mixes are stunning.  They are always creative, drawing from his vast knowledge of obscure but impressive tracks.  He is a master at creating a mood, evoking emotion, and pulling the listener in.  Every month he posts a new mix on his website, neverrain.com, available to download for free.  His mixes tend to have more subtle melodies and fewer vocals, so I find them extra conducive to zoning out and programming complex algorithms.

Vocal Trance:

DJ GT

DJ GT takes a similar approach, creating hour-long mixes and posting them for free on his website, generationtrance.com. He is equally talented, although his track lists tend to be slightly more mainstream.  All of the songs he uses contain lyrics, which makes the mixes a little more structured and digestible for those not as used to electronic music.  I’ll throw one of his many mixes in the queue when I want something more upbeat to tap my feet to.

Psychedelic Trance:

Shpongle

Shpongle is a group out of the UK that defies description.  Generally, I’d label it Psy Trance, but you’ll find heavy influences of world beat, classical, opera, jazz, ska, punk, dub, and half a dozen other genres.  It all adds up to 100% awesome.  Lyrics in their music are not arranged into verses to tell a story, but rather sampled to become part of the ambience and reinforce the mood.  My only wish is that they had more than three albums.

Dub:

O.T.T.

If I had to describe Dub to someone who hadn’t heard it, I’d label it the offspring of Reggae and Trance.  It’s characterized by a slower tempo than most electronica, with a heavier bassline and more emphasis on the ambience of the music.  Ott’s first album Blumenkraft stands out as my favorite mix of any artist mentioned here.  It’s simply the most powerful weapon against the drone of office background noise I’ve found.  I save it for when I need to write that recursive function I’ve been putting off all week.

Industrial:

Front Line Assembly

Two of my favorite artists are Bill Leeb and Rhys Fulber.  They’ve been making electronic music for two decades, and they’ve made some of the best.  They’ve gone under several different names, the most popular being Delerium and Front Line Assembly.  Delerium, like Shpongle, defies description and has gone through several distinct transitions through the years.  I own and love the entire collection, but the music I find best to program to is on their oldest albums, found here and here.  Each of those is a two-CD set that compiles all of their earliest releases, which are (sadly) no longer available.  The music is very experimental, and it laid the groundwork for the industrial genre.

Well, that concludes the tour of my favorite programming music.  I hope you enjoyed the trip; be careful opening the overhead bins, as items may have shifted in flight.  I’m always open to recommendations, so if you hear a good tune that makes you stop and close your eyes to listen, let me know.

PHP Month View Calendar

December 3rd, 2008

I recently had to develop a month-view calendar for a website I’m building.  While such a thing is very common, it presents a number of twists:

  • You can’t start it on the first of the month – the first will likely fall mid-week
  • To find the actual first day of the first displayed week, you need to know how many days are in the previous month and count backwards
  • You cannot assume 4 weeks per month – most span 5 or 6
  • You will need to use the correct number of days in the month, and then add on the correct number of days of the next month to finish the week

After playing with a few algorithms, I decided to represent the month with a 2-dimensional array, one index for each week, and another for each day.  Then, you can put the actual string date in each value, or perhaps an array with that day’s events.  Without further ado:

<?php
// CreateMonthView -
// Takes one parameter: a Unix timestamp anywhere in the month.
// Returns an array of weeks, each holding seven 'Y-m-d' date strings.
function CreateMonthView( $now ) {

    // noon on the first of the month; anchoring at midday keeps the
    // 86400-second steps below on the right date across DST changes
    $monthStart = mktime(12, 0, 0,
                         (int) date("n", $now),
                         1,
                         (int) date("Y", $now));

    // get numeric day of week (0-6, Sunday is 0)
    $dayOfWeek = (int) date('w', $monthStart);

    // subtract appropriate number of days to get to the start of the week
    // this will usually be in the last part of the previous month
    $calMonthStart = $monthStart - (86400 * $dayOfWeek);

    // initialize variables for while loop
    $thisWeekStart = $calMonthStart;
    $week = 1;
    $monthArray = array();

    // last second of the month - test condition for while loop
    $lastDayOfMonth = mktime(23, 59, 59,
                             (int) date("n", $now),
                             (int) date("t", $now),
                             (int) date("Y", $now));

    // for each week, create a new array to hold the days
    while( $thisWeekStart <= $lastDayOfMonth ) {
        $monthArray[$week] = array();

        // iterate through week - adding each day as a value
        for( $i = 0; $i < 7; $i++ ) {
            // get timestamp for each day
            $dayStamp = $thisWeekStart + 86400 * $i;
            // convert to an ISO date - seconds are too precise
            $monthArray[$week][] = date('Y-m-d', $dayStamp);
        }

        // advance sentinel variable and week counter
        $thisWeekStart += 86400 * 7;
        $week++;
    }

    return $monthArray;
}
?>

Now you may be saying to yourself “okay, that’s all fine and good, but it doesn’t do anything on its own”, and you’d be right; it’s just a function.  To make something happen, you simply need to call it and display the result.  Here is another snippet that does just that:

<?php
$month = CreateMonthView( time() ); //create the current month

echo '<table>';
foreach( $month as $week ) {
    echo '<tr>';

    foreach ($week as $day ) {
        if( $day == date('Y-m-d') ){
            //apply selector to distinguish today's date
            echo '<td class="today">';
        } else {
            echo '<td>';
        }
        //reduce the complete ISO date down to the day and display it
        echo substr($day, 8, 2);
        echo '</td>';
    }
    echo "</tr>\n";
}
echo '</table>';
?>

Voilà! You can certainly copy and paste this code as is, but it’s not very visually exciting. Here’s an example. It could definitely use some styling – but I’ll leave that up to you.

Quick Hash Generator

December 2nd, 2008

I often find myself quickly needing the one-way hash of a string. Most often, it’s to securely store an individual password in a database or script. Rather than temporarily hard-coding the plain text to see what PHP will generate, I wrote a little reusable script to generate hash values. Anyone who might find it useful can find it here. If you’re paranoid about security, you can find a secure version here, although you may get a warning about my self-signed certificate.

Get a Random Row from Database

November 29th, 2008

I had an app the other day that required a single random row to be displayed from the database. It’s an easy task, but I tripped on it for a second. Here was my impulse:

//Get the total number of rows from the MySQL database
$query = "SELECT COUNT(*) as total FROM table";
$result = mysql_query( $query);
$rows = mysql_fetch_assoc( $result );

//Let PHP choose a random row
//pick a zero-based offset between 0 and total - 1
$row = rand(0, $rows['total'] - 1);

//Now get that row
$query = "SELECT * FROM table LIMIT $row, 1";
$result = mysql_query( $query);

After typing that out, I realized that I had taken the long way round the problem. Very similar to taking the Misty Mountain Path rather than the Mines of Moria. Here’s a better way:

//Get a random row from the database
$query = "SELECT * FROM table ORDER BY RAND() LIMIT 1";
$result = mysql_query( $query );

Much better. Gimli the dwarf would be proud.