Calculating NCAA March Madness Bracket using Chess ELO predictive scoring

March 13th, 2012 by Peter Anselmo

Hokay, so I had a lot of fun with this. Let me start by saying I’m not the first to do this. However, after a lot of Googling, I found surprisingly few NCAA bracket predictions using the ELO system. Those that I did find weren’t transparent about the data they used. I wanted to do it so I could see the result with data I knew, and as a good excuse to code some Ruby.

Overview

First, the ELO system. The ELO system is a way of calculating the relative skill between two players (and thus a probability for one to win in a future match). Wikipedia has an excellent write-up including the history and the math behind the scoring. In a nutshell, it calculates an expected result based on the ratings of the two teams. It then compares the actual result to the expected one, and adjusts each team’s rating accordingly (increasing it for the winner, decreasing it for the loser). If a favored team wins, the adjustment is small. If an underdog wins, the adjustment is larger.
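Here is a minimal sketch of that math in Ruby. The logistic formula with the 400-point scale is the standard one from the Wikipedia write-up; the K-factor of 32 and the sample ratings are assumptions for illustration, not values from my scripts.

```ruby
# Expected score (roughly, win probability) for a team rated `ra`
# against a team rated `rb`, using the standard 400-point scale.
def expected_score(ra, rb)
  1.0 / (1.0 + 10.0**((rb - ra) / 400.0))
end

# Returns the new [ra, rb] after one game.
# score_a is 1.0 if A won, 0.0 if A lost, 0.5 for a tie.
# k = 32 is an assumed K-factor (hypothetical, chosen for illustration).
def update_ratings(ra, rb, score_a, k = 32.0)
  delta = k * (score_a - expected_score(ra, rb))
  [ra + delta, rb - delta]
end

# A 1700-rated favorite against a 1500-rated underdog.
favorite_wins = update_ratings(1700.0, 1500.0, 1.0)
underdog_wins = update_ratings(1700.0, 1500.0, 0.0)
puts "favorite wins: #{favorite_wins.map { |r| r.round(1) }.inspect}"
puts "underdog wins: #{underdog_wins.map { |r| r.round(1) }.inspect}"
```

Running it shows the asymmetry: the favorite gains only a few points for an expected win, while an upset moves both ratings by several times as much.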

Assumptions

Your conclusion is only as good as your assumptions, and we’ll need to make a few. Most of the work is done by choosing the ELO system; it’s one of the simpler systems for relative rankings. For our data, we’re only interested in which two teams played, and which team won. We ignore the final score, whether the team was traveling or at home, players used, fouls, timeouts, point distribution, etc. Also, for the purposes of this calculation, if the game went into overtime, I count it as a tie. That’s probably the most debatable assumption, but I feel it’s valid because it essentially means that after an hour, the two teams displayed equal skill.
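To make those assumptions concrete, here is a small sketch of how a game record could be reduced to an ELO score. The Game struct and its field names are hypothetical, made up for this example; only the rules (winner takes all, margin ignored, overtime counts as a tie) come from the assumptions above.

```ruby
# Hypothetical game record; the field names are illustrative only.
Game = Struct.new(:team, :opponent, :team_points, :opponent_points, :overtime)

# Score from `team`'s point of view: 1.0 for a win, 0.0 for a loss.
# Overtime games count as a tie (0.5) regardless of the final score,
# and the margin of victory is ignored entirely.
def elo_score(game)
  return 0.5 if game.overtime
  game.team_points > game.opponent_points ? 1.0 : 0.0
end

blowout  = Game.new("Team A", "Team B", 90, 60, false)
squeaker = Game.new("Team A", "Team B", 71, 70, false)
overtime = Game.new("Team A", "Team B", 85, 80, true)

[blowout, squeaker, overtime].each do |game|
  puts "#{game.team_points}-#{game.opponent_points} (OT: #{game.overtime}) => #{elo_score(game)}"
end
```

A 30-point blowout and a 1-point squeaker both score the same, which is exactly the information this model throws away.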

Data

So this turned out to be the hardest part. I wanted to use the 2011-2012 season as my dataset. After a half-hour of Googling, I couldn’t find the data in a well-structured format (read: csv or xls).  So I had to resort to web scraping.

The best website I could find was the official NCAA site.  They have a page with the Men’s Division 1 listing by team, where you can click into each team, to see a game history (amongst other things). Let’s grab it.

wget --mirror "http://stats.ncaa.org/team/inst_team_list?sport_code=MBB&division=1"

Well, that was fun. wget was a little overzealous, so I moved all the relevant pages (those starting with 10740) into their own folder. I then wrote a Ruby script to organize the data, clean it up, and write it to a file.


The output from that script is a beautifully structured file, if I do say so myself.  Well, at least from a data perspective.


Number Crunching

Okay, so now it’s time to actually calculate the ELOs. I basically wrote a straight implementation of the math as presented on Wikipedia. The second Ruby script reads in the scores, calculates the adjustments, and keeps track of the changes.

Here’s the output while it’s running:


Lastly, it sorts the results and writes them to a file.
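The overall shape of that pass (read games, adjust ratings, sort) can be sketched as follows. This is not the actual script: the starting rating of 1500, the K-factor of 32, the team names, and the in-memory game list are all assumptions for illustration, and it prints instead of writing a file.

```ruby
START_RATING = 1500.0 # assumed starting rating for every team
K_FACTOR     = 32.0   # assumed K-factor

def expected_score(ra, rb)
  1.0 / (1.0 + 10.0**((rb - ra) / 400.0))
end

# games: array of [team_a, team_b, score_a], where score_a is
# 1.0 (A won), 0.0 (A lost) or 0.5 (tie), processed in order.
def rate(games)
  ratings = Hash.new(START_RATING)
  games.each do |team_a, team_b, score_a|
    ra = ratings[team_a]
    rb = ratings[team_b]
    delta = K_FACTOR * (score_a - expected_score(ra, rb))
    ratings[team_a] = ra + delta
    ratings[team_b] = rb - delta
  end
  ratings
end

# Made-up games, purely for illustration.
games = [
  ["Team A", "Team B", 1.0],
  ["Team B", "Team C", 1.0],
  ["Team A", "Team C", 0.5]
]

# Highest rating first.
ranked = rate(games).sort_by { |_team, rating| -rating }
ranked.each { |team, rating| puts format("%-8s %.1f", team, rating) }
```

Because every adjustment is added to one team and subtracted from the other, the total number of rating points never changes; only their distribution does.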

Conclusions

Here it is:

Final ELO Predicted Bracket


Comparing our generated results to the seeded rankings, we can see there’s a lot of overlap. The top three teams are predicted exactly as seeded. However, from there the list diverges quite a bit. For example, Murray St. is expected to take the West, but didn’t get seeded so hot.

So, if this wins, I’ll get some money from our office bracket pool. Which is nice. And if it doesn’t, it will be proof that my computer messed up on the calculation.

—–EDIT——

My initial calculations didn’t account for the order in which the games were played. Although I didn’t think this would have a big influence, running the script on a computer that lists the data files in a different order actually made some big differences. Thus, I changed the data scraping script to account for the dates, and now calculate all ELO scores in the order that the games were played. This should give a more accurate and reproducible result. Here is the updated script, and the updated final result. Here’s my Final Bracket.
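A tiny sketch of why the order matters: the same two games, rated once in the order they happen to be listed and once sorted by date, give different final ratings. The dates, team names, starting rating of 1500 and K-factor of 32 are made up for illustration.

```ruby
require "date"

START_RATING = 1500.0 # assumed
K_FACTOR     = 32.0   # assumed

def expected_score(ra, rb)
  1.0 / (1.0 + 10.0**((rb - ra) / 400.0))
end

# games: array of [iso_date, team_a, team_b, score_a], rated in the order given.
def rate(games)
  ratings = Hash.new(START_RATING)
  games.each do |_date, team_a, team_b, score_a|
    ra = ratings[team_a]
    rb = ratings[team_b]
    delta = K_FACTOR * (score_a - expected_score(ra, rb))
    ratings[team_a] = ra + delta
    ratings[team_b] = rb - delta
  end
  ratings
end

# Listed out of order, as a directory listing might return them.
games = [
  ["2012-01-14", "Team A", "Team B", 1.0], # A wins the later game
  ["2011-11-20", "Team A", "Team B", 0.0]  # A loses the earlier game
]

as_listed     = rate(games)
chronological = rate(games.sort_by { |date, *_rest| Date.iso8601(date) })

puts format("as listed:     Team A %.2f", as_listed["Team A"])
puts format("chronological: Team A %.2f", chronological["Team A"])
```

Each adjustment depends on the ratings at the moment the game is processed, so a later result carries slightly more weight than an earlier one. Sorting by date makes the outcome independent of how the filesystem lists the files.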

Fun with Mag Stripes – What’s on your card?

October 11th, 2011 by Peter Anselmo

I was speaking with a friend when he told me something interesting.  Apparently some hotels embed your personal information into your hotel room key-card.  Things like your name, and phone are written to the magnetic strip that you use to unlock your hotel door.  This is the same key-card that most people simply toss in the trash when they’re done with their stay.  Talk about a HUGE privacy hole!

In other news: you can buy a USB mag stripe reader – for cheap! I went ahead and picked one up, and it just arrived today:

Magnetic Strip Reader


Time to get Swiping!

Cards


Notes:
-For obvious reasons, I’ve substituted actual values with their meanings (CA => STATE)
-I’ve added brackets[] that were not present in the scan to group the information visually.
-I used a lowercase ‘d’ to stand for “digit” (a number)
-I used a lowercase “a” to mean “alphanumeric” (mixed letters and numbers)

First up, my CA Driver’s License.  Here’s what was embedded:

%[STATE][CITY]^[LASTNAME]$[FIRSTNAME]$[MIDDLE]^[ADDRESS]^?;
[WEIGHT][ddddd][DLNUMBER]=[dddddddddddd]?+!![ZIPCODE] [CLASS]
[SEX][HEIGHT][EYE][HAIR] [addddddddddd][aaaaaaaaa];<?

I suppose I was a bit surprised at just how much information was embedded. I expected just the DL number. It’s worth noting that this means anyone who swipes your ID (say, to sell you alcohol or let you into a nightclub) can store all of your personal information, including your address, height and eye color!

I decided to round up every card I could find with a Mag Strip.  Here’s a few results:

UCD ID Card:

:[IDNUMBER]=0?

UCD Gym Card:

%[ddd]^[STUDENTID]^[FIRST]^[LAST]?;[EXPDATE]?

IKEA Gift Card:

;[ddddddddddddddddddd]?

Safeway Club Card:

;[dddd][ACCOUNTNUM]=[dddd]?

AAA Member Card:

%B[ACCOUNTNUMBER]^[LAST]/[FIRST]^[ddddddddddddddddddddd][EXPDATE][dddd]?;
[d][ACCOUNTNUMBER]=[dddddddddddddddddddd]?

And some Finance Cards:

Wells Debit Card:

%[a][ACCOUNTNUMBER]^[LAST]/[FIRST] [MI]^[EXPDATE][ddddddddddddddd] [ddddddddddd]?;
[ACCOUNTNUMBER]=[EXPDATE][ddddddddddddd]?

REI Visa:

%[d][ACCOUNTNUMBER]^[LAST]/[FIRST] [MI] ^[EXPDATE][ddddddddddddddddddddddddddd]?;
[ACCOUNTNUMBER]=[EXPDATE][dddddddddddddddd]?+==[REIMEMBER]=?
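The dumps above share a pattern: track 1 runs from "%" to "?", and track 2 from ";" to "?". Here is a rough Ruby sketch that splits a raw swipe into tracks and pulls the fields out of a financial-style track 1. The sample swipe is entirely made up (a well-known test card number and a fake name), the helper names are hypothetical, and treating the first four digits after the name as a YYMM expiry is an assumption based on the usual bank card layout.

```ruby
# Split a raw swipe into tracks using the sentinels seen in the dumps:
# "%...?" for track 1 and ";...?" for track 2.
def split_tracks(swipe)
  {
    track1: swipe[/%([^?]*)\?/, 1],
    track2: swipe[/;([^?]*)\?/, 1]
  }
end

# Parse a financial-style track 1:
#   [format code][ACCOUNTNUMBER]^[LAST]/[FIRST] [MI]^[EXPDATE][other digits]
def parse_financial_track1(track1)
  account, name, rest = track1.split("^", 3)
  last, given = name.split("/", 2)
  {
    format_code:    account[0],
    account_number: account[1..-1],
    last_name:      last.strip,
    given_names:    given.to_s.strip,
    expiry:         rest[0, 4] # assumed to be YYMM
  }
end

# Made-up sample data, not a real card.
swipe = "%B4111111111111111^DOE/JOHN Q^2512101000000000?;4111111111111111=2512101000000000?"

tracks = split_tracks(swipe)
fields = parse_financial_track1(tracks[:track1])

puts "track 2: #{tracks[:track2]}"
fields.each { |key, value| puts "#{key}: #{value}" }
```

The point is less the parsing than how little of it there is: a dozen lines is enough to turn a swipe into a name, an account number and an expiry date.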

Conclusions:
Although I didn’t have any hotel cards around to test the original claim, the mag stripes of the cards I did have were interesting nonetheless. Club cards tend to be well-behaved and only showed your account number alongside other digits (which I suspect are store codes and such). Financial cards have plenty of sensitive data in the stripe, but that’s no surprise. I’ll keep the reader around, and I’ll update this post if I come across any cards with overly sensitive data embedded.

How to get your car registered in Maryland (in 24 easy steps)

June 8th, 2011 by Peter Anselmo

1. Visit the MVA (Maryland’s special name for the DMV), find you need an Inspection, the Title, and Forms filled out.

2. Fill out Forms.

3. Call Lender, request Title.

4. Get car inspected.

5. Fail inspection because there’s a tiny sub-bulb of the headlight out, windshield wipers are worn, and the windows are tinted.

6. Fix headlight and windshield wipers, need to have MVA police officer inspect tint.

7. Lender mails notice, saying DMV has Title and will send it within 21 business days.

8. (On a Friday) Visit MVA to have officer inspect tint, find out police officers are only on duty Wed & Thurs 8:00 – 12:00.

9. (On a Thursday) Visit MVA, officer measures tint. Too dark, must remove.

10. Go to tint shop, pay to have tint removed.

11. (On a Thursday) Visit MVA, officers are now only on duty Wed 8:00-12:00.

12. (The following Wednesday) Visit MVA, officer approves lack of tint.

13. Go back to inspection place; car needs all new inspection because it’s been more than 30 days.  Bring back when you have time to wait.

14. Get car inspected.

15. Pass inspection.

16. Call up the CA DMV and ask them why it’s taking so friggin’ long to get Title.  They say it’s in process.

17. Receive Title in mail, ~50 days after requested.

18. Go to MVA with Title and inspection certificate in hand.  Wait in two lines.  Find out you need proof of Maryland insurance.  CA insurance doesn’t cut it.

19. Call insurance, transfer isn’t simple, they need more info and signed docs.  Leave the MVA.

20. Set up new Insurance.

21. Cancel old insurance.

22. Re-fill out forms.

23. Visit MVA with Title, Inspection, Insurance and Forms.

24. GET MARYLAND LICENSE PLATES & REGISTRATION!!!


Re-Ordering select list elements with jQuery

February 21st, 2011 by Peter Anselmo

So, I came across an interesting development problem.  I was embedding a ‘select’ list from a 3rd party web service, and I wanted to re-order the items shown.  More specifically, I wanted the most commonly selected elements to be at the top of the list.  Because I couldn’t access the raw HTML, directly changing the order was not an option.  jQuery to the rescue!  Here’s a small snippet I came up with that did just the trick:

$('option[value="myVal"]').detach().prependTo('select.mySelect');

Voilà!  Way easy!

Concrete5 vs WordPress: Benchmarking Load Time

December 14th, 2010 by Peter Anselmo

I just discovered Concrete5 CMS recently when another developer in my area launched a site with it.  Always up for learning something new, I went to the website, read the sales pitch, and decided to give it a whirl.  Before I spend time learning yet another CMS, for kicks I thought I would benchmark it for speed against WordPress, my current go-to solution.  Here we go.

Preliminaries
Hardware: All tests will be conducted on my desktop computer: a custom-built PC with a 3.5GHz Core2 Duo, 4GB of RAM, and a 10,000 RPM hard drive.  Not identical, but similar to many server setups on the market.

Software: I’m running Ubuntu 10.04 with Apache 2.2, PHP 5.3, and MySQL 5.1.  Once again, apart from using a Desktop OS, this is almost identical to your usual LAMP server software. For benchmarking, I will be using Siege 2.68.

Step 1: Fresh Installs
I downloaded and installed the latest versions of Concrete5 (5.4.1.1) and WordPress (3.0.3).  Here are the screenshots of the home pages out of the box:

WordPress Fresh Install

Concrete5 Fresh Install


Step 2: Balancing
Okay, first thing: to be fair, we need to balance the page weights.  Since Siege will not load the linked resources (like CSS and JavaScript), I only care about the HTML page weights.  Out of the box, Concrete is 1771 bytes and WordPress is 2015 bytes.  Pretty close.  After removing several widgets from the WordPress sidebar (those extra queries weren’t fair anyway) and adding the right amount of Lorem Ipsum, the WordPress page is now exactly 1771 bytes as well.  Perfect.

Step 3: Attack!
To stress test my desktop server I am using the following siege command:

 siege -c 50 -r 40 http://localhost/[siteurl]

This will attempt to make 50 concurrent requests to the website, and will repeat each request 40 times.  This is a total of 2000 requests to each site.

Step 4: Results
Here is the raw data from the tests:

                                   WordPress   Concrete5
Total Requests:                    2000        2000
Average Response Time (seconds):   2.52        1.62
Transactions per second:           16.23       22.46
Longest Transaction (seconds):     4.8         3.13
Shortest Transaction (seconds):    .10         .07
Elapsed Time (seconds):            123.21      89.04

And the corresponding screenshots:

Results for WordPress

Results for Concrete5


Step 5: Conclusions
As we can see from the data, Concrete5 outperformed WordPress by roughly 28-38% in every measure. This is a significant amount.  What does this tell us?  For small sites with little content, Concrete5 will scale to additional concurrent users better than WordPress. What does this not tell us?  For one thing, the sites may not scale to additional content equally well.  This test also ignored all the static content, which will download from either CMS with equal speed.  Finally, WordPress also has some excellent caching plugins that may have closed the gap.
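As a sanity check, the relative differences can be worked out directly from the raw data in Step 4. This is just arithmetic on the numbers in the table; the hash layout and the helper name are made up for the example.

```ruby
# Raw numbers from the results table. For every row except
# "Transactions per second", a lower value is better.
RESULTS = {
  "Average response time (s)" => { wordpress: 2.52,   concrete5: 1.62,  lower_is_better: true },
  "Transactions per second"   => { wordpress: 16.23,  concrete5: 22.46, lower_is_better: false },
  "Longest transaction (s)"   => { wordpress: 4.8,    concrete5: 3.13,  lower_is_better: true },
  "Shortest transaction (s)"  => { wordpress: 0.10,   concrete5: 0.07,  lower_is_better: true },
  "Elapsed time (s)"          => { wordpress: 123.21, concrete5: 89.04, lower_is_better: true }
}

# How much better Concrete5 did, as a percentage of the WordPress figure.
def percent_better(row)
  diff = if row[:lower_is_better]
           row[:wordpress] - row[:concrete5]
         else
           row[:concrete5] - row[:wordpress]
         end
  (100.0 * diff / row[:wordpress]).round(1)
end

RESULTS.each do |label, row|
  puts format("%-28s %5.1f%%", label, percent_better(row))
end

# Cross-check: total requests / elapsed time should match the reported throughput.
puts format("WordPress: %.2f req/s", 2000 / 123.21)
puts format("Concrete5: %.2f req/s", 2000 / 89.04)
```

The throughput cross-check is a cheap way to confirm the table is internally consistent: 2000 requests divided by the elapsed time reproduces the transactions-per-second figures Siege reported.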

Other Considerations
Am I suggesting you ditch WordPress and port your sites to Concrete?  Not at all. WordPress has many good things going for it; it is the #1 blogging tool and is a finely tuned engine that powers millions of websites.  What I CAN say for sure, is that if you can outperform that, then you’re doing something right.  Hats off to the team at Concrete.

——————————————————-

Update (1 Day Later)
Okay, after posting this, I had a lot of people point out to me that this is not a fair fight.  Concrete has caching enabled by default, and WordPress does not.   After installing wp-super-cache, the elapsed time for 2000 requests from WordPress fell from 123 seconds to 27 seconds.  Wow, crazy plugin.  Either way, my original results stand when you consider out of the box performance.  Wp-super-cache is not bundled with WordPress, nor do its version numbers suggest it is stable.  Its technical options are overwhelming to all but advanced users.  Kudos still go to Concrete5 for integrating a simple, stable caching system into the framework.

Code Complete

December 12th, 2010 by Peter Anselmo

Whew!  This book took a little while to get through.  That was partly because it’s a healthy 862 pages, and partly because I became addicted to StarCraft II halfway through reading it 😉  Nonetheless, it is an awesome book.  Five out of five stars.  Jeff Atwood classifies this book as “The Joy of Cooking for software developers.” I would have to agree.  If there was such a thing as a modern programming Bible, this is it.

This book reads like a how-to for all aspects of software development.  It begins with preliminary things like gathering specifications, defining prerequisites, and planning.  It moves through very high-level decisions such as choice of language for a project, and how much project infrastructure to use (build tools, version control, automated testing, etc).  It covers estimation, and different techniques for planning the development and integration of large projects.

The author then moves through high level decisions within your code.  Things like class structure, interfaces, and guidelines for creating routines are covered.  He weighs pros and cons of breaking routines into sub-routines, choosing parameters to pass effectively, and vital concepts such as abstraction and encapsulation.  There is a very informative chapter on defensive programming that covers topics such as error-handling, assertions, exceptions, and debugging.

The bulk of the book is on the nitty-gritty of writing code.  Variables are covered in depth. He thoroughly covers topics such as variable scope, initialization, placement, and persistence.  He devotes an entire chapter to variable names.  Following variables is an excellent discussion of data types.  Moving right along, conditionals, loops, and other control structures are covered in great detail.

Next McConnell takes the reader through various other aspects of programming, including collaboration, testing, debugging, refactoring, and code tuning.  Finally, the book addresses topics such as the effect of project size on construction, using programming tools effectively, code documentation, and the considerations of programmer personalities themselves.

What stuck out most to me about this book (and what makes it stand out from the crowd) is the author’s effective merging of both professional experience and academic knowledge.  Take variable names: he won’t just tell you a variable name should be between 7 and 20 letters, he’ll cite a study showing that code using variable names of that length had fewer bugs.  The book is filled with pieces of “hard data” that are usually the result of a formal study confirming what he has learned through experience.  McConnell’s bibliography is extensive, and his command of sources is impressive.

Also, what impressed me about this book is how the small topics reinforce larger themes.  One of the primary tenets of good software construction (and one that is thoroughly covered here)  is managing complexity.  Concepts such as good class structure, good routine names, and good commenting are not effective for their own sake.  They are effective because they make code less complex and easier to understand.  Programs that are broken down into smaller, less complex pieces are programs that are easier to develop, debug, and maintain.

All in all, I would strongly recommend this book to any software developer.  Even if you don’t feel like reading through a 2″ thick book with thin pages and no pictures, you can jump to any particular topic of interest and absorb what tidbit you need.  The book is also structured well enough to serve as a reference.  It will likely serve as such for me until a year or two from now when I plan on re-reading it.  It’s that good.

Online Flashcards

May 29th, 2010 by Peter Anselmo

I’ve started a new side project – an online flashcard site.  It stems from four reasons:

1. There are currently (that I could find) no smart flashcard sites.  When I use flashcards I don’t just look at the front and back from start to finish.  This is how all existing sites online work.  I want to remove cards I know as I go, swap the front & back (starting with definitions), shuffle them, and much more.  This site will do all that.

2. There are no multi-platform flashcard sites.  I want to create the cards on my desktop computer, and be able to browse them later on my smartphone.  This site will have an iPhone and Android compatible web version, making it truly convenient.

3. I (personally) like to use flashcards to learn things, and I need a better system to do it.

4. I need a good project to get some jQtouch mobile code under my belt.

There is currently a beta version of the site available, though it’s more of a proof-of-concept at this stage.  Many features are not yet implemented; it’s just the basics. Feel free to check it out, and leave some feedback as a comment. I’ve started the mobile version as well, but it’s currently buggy and not worth visiting.

Online Flashcards

Cheers
-Peter

Book Review – Code

April 18th, 2010 by Peter Anselmo

I just finished reading Code by Charles Petzold.  This book is like no other I’ve ever read.  This book explains how computers work.   Think about that for a second.  Do you know how a computer actually works?  Really?  This isn’t about double clicking on the blue “E” to access the internet.  This book explains how a machine can take electrical 1’s and 0’s and use them to do math, save files, display graphics, and everything else a computer does.

I love the way the book progresses.  It starts with the most basic of electrical circuits.  Simple light bulb and battery type stuff.  It spends a few chapters building your circuit board chops and BAM!  – he shows you how you can wire a circuit to add binary numbers.  Wow, you can now build a very simple computer.  He continues to add to what you already know piece by piece.   Components are added to the circuit board so that it can now perform subtraction, multiplication, and division.  You learn how a circuit can remember data, the basis of memory.  You learn the issues surrounding floating point math and how they are resolved.  He explains machine code, and how it can be simplified by assembly language, and in turn, high-level languages.
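To give a flavor of the “wire a circuit to add binary numbers” step, here is a rough sketch of the same idea in Ruby rather than wires. It is my own illustration, not code or a circuit diagram from the book: three logic gates are combined into a full adder, and full adders are chained so the carry ripples from one bit to the next.

```ruby
# Logic gates operating on single bits (0 or 1).
def and_gate(a, b)
  a & b
end

def or_gate(a, b)
  a | b
end

def xor_gate(a, b)
  a ^ b
end

# Adds two bits plus an incoming carry; returns [sum_bit, carry_out].
def full_adder(a, b, carry_in)
  partial   = xor_gate(a, b)
  sum       = xor_gate(partial, carry_in)
  carry_out = or_gate(and_gate(a, b), and_gate(partial, carry_in))
  [sum, carry_out]
end

# Adds two equal-length bit arrays, least-significant bit first,
# by chaining full adders so that each carry feeds the next stage.
def ripple_add(a_bits, b_bits)
  carry = 0
  sum_bits = a_bits.zip(b_bits).map do |a, b|
    sum, carry = full_adder(a, b, carry)
    sum
  end
  sum_bits + [carry]
end

five  = [1, 0, 1, 0] # 5, least-significant bit first
three = [1, 1, 0, 0] # 3
puts ripple_add(five, three).reverse.join # most-significant bit first
```

Nothing in it does arithmetic in the usual sense; the addition falls out of AND, OR and XOR applied one bit at a time.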

Mixed in with the technical chapters, he adds sections on lighter topics such as Morse code, Braille, alternate (non-base 10) number systems, and more.  He covers how letters can be stored as a series of bits, and why there are 8 bits to a byte.

This book changed the way I think about computers.  I highly recommend it to anyone who wants to understand them better.  Although it assumes no prior knowledge, this book is not for the faint of heart; some of the chapters require tenacity to stay focused and comprehend.  However, I guarantee it will be well worth it.

Book Review – jQuery in Action

April 9th, 2010 by Peter Anselmo

I just recently finished reading jQuery in Action by Bear Bibeault and Yehuda Katz.  I had used very little jQuery before reading it, and even less AJAX.  I can’t recommend this book highly enough.  Five Stars.  To be fair, I may be biased, largely because jQuery is so awesome that anything about jQuery will inherit its awesomeness.  But either way, if you want to get into jQuery, this is a good place to start.

I’ve heard it pointed out that all of the information in this book is already online, on the jQuery website.  But that’s missing the point.  The book presents the information at a well-thought out pace and order, minimizing confusion.  For example, early on it spends a good amount of time introducing and explaining the various CSS3 selectors and getting your “Wrapped Set” of elements before it jumps into how to manipulate those elements.

The book proceeds to move through all of the awesomeness that jQuery offers with DOM & content manipulation, event handling, animations, plugins, and finally AJAX.  My first attempt at implementing AJAX was without a framework, using the WROX “Beginning AJAX” book.  I’ve decided that book isn’t worth its weight in lead, and that it should be pulled from the shelves.  jQuery makes it ridiculously easy to make POST and GET requests to the server, and handle the results.  I was using AJAX in production code within a week of reading this.

I don’t know what else to say.  The level of Awesome that jQuery exudes is matched only by other epic wins such as Dropbox and Vim.  If you haven’t tried jQuery, you need to. Now.

My Groups widget for BuddyPress

March 19th, 2010 by Peter Anselmo

I had a client request a modification of the “Groups” widget for BuddyPress: instead of listing groups from across the whole site, it should be limited to the groups for the logged-in user.  And so, I give you the “My Groups” widget.  It will display all the groups for the user, sorted alphabetically.  Cheers.

Download:
bp-my-groups-widget.php (zip)