Archive for the ‘Everything Else’ category

Calculating NCAA March Madness Bracket using Chess ELO predictive scoring

March 13th, 2012

Hokay, so I had a lot of fun with this. Let me start by saying I’m not the first to do this. However, after a lot of Googling, I found surprisingly few NCAA bracket predictions using the ELO system, and those I did find weren’t transparent about the data they used. I wanted to do it myself so I could see the result with data I knew, and as a good excuse to code some Ruby.

Overview

First, the ELO system. The ELO system is a way of calculating the relative skill of two players (and thus the probability that one will beat the other in a future match). Wikipedia has an excellent write-up including the history and the math behind the scoring. In a nutshell, it calculates an expected result based on the ratings of the two teams. It then compares the actual result to the expected one and adjusts each player’s rating accordingly (increasing it for the winner, decreasing it for the loser). If a favored team wins, the adjustment is small. If an underdog wins, the adjustment is larger.
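To make that concrete, here is a minimal Ruby sketch of the standard formulas as Wikipedia presents them. The K-factor of 32 and the example ratings are my assumptions for illustration, not necessarily the values used for the bracket.

```ruby
# Minimal Elo sketch. K_FACTOR = 32 is an assumed value.
K_FACTOR = 32.0

# Probability that a player rated `rating_a` beats one rated `rating_b`.
def expected_score(rating_a, rating_b)
  1.0 / (1.0 + 10.0**((rating_b - rating_a) / 400.0))
end

# Returns the new ratings [a, b] after one game.
# `score_a` is 1.0 if A won, 0.0 if A lost, and 0.5 for a tie.
def update_ratings(rating_a, rating_b, score_a)
  delta = K_FACTOR * (score_a - expected_score(rating_a, rating_b))
  [rating_a + delta, rating_b - delta]
end

# A 1600-rated favorite against a 1400-rated underdog (made-up ratings).
favorite_wins = update_ratings(1600.0, 1400.0, 1.0)
underdog_wins = update_ratings(1600.0, 1400.0, 0.0)
puts format("favorite wins: %.1f / %.1f", *favorite_wins)
puts format("underdog wins: %.1f / %.1f", *underdog_wins)
```

With these numbers the favorite gains only about 7.7 points for winning, while an upset moves both ratings by about 24.3 points.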

Assumptions

Your conclusion is only as good as your assumptions, and we’ll need to make a few. Most of the work is done by choosing the ELO system; it’s one of the simpler systems for relative rankings. For our data, we’re only interested in which two teams played and which team won. We ignore the final score, whether the team was home or away, players used, fouls, timeouts, point distribution, etc. Also, for the purposes of this calculation, if the game went into overtime, I count it as a tie. That’s probably the most debatable assumption, but I feel it’s valid because it essentially means that at the end of regulation, the two teams had displayed equal skill.
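In code, these assumptions boil down to turning each game into a single number. Here is a small sketch; the Game struct and its field names are hypothetical, made up for illustration rather than taken from the actual scripts.

```ruby
# Hypothetical game record; the field names are assumptions.
Game = Struct.new(:team, :opponent, :team_won, :overtime)

# Score for `team` under the assumptions above:
# overtime games count as ties, and the margin of victory is ignored.
def actual_score(game)
  return 0.5 if game.overtime
  game.team_won ? 1.0 : 0.0
end

regulation_win = Game.new("Team A", "Team B", true, false)
overtime_win = Game.new("Team A", "Team C", true, true)
puts actual_score(regulation_win) # 1.0
puts actual_score(overtime_win)   # 0.5
```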

Data

So this turned out to be the hardest part. I wanted to use the 2011-2012 season as my dataset. After a half-hour of Googling, I couldn’t find the data in a well-structured format (read: csv or xls).  So I had to resort to web scraping.

The best website I could find was the official NCAA site.  They have a page listing the Men’s Division 1 teams, where you can click into each team to see its game history (among other things). Let’s grab it.

wget --mirror "http://stats.ncaa.org/team/inst_team_list?sport_code=MBB&division=1"

Well, that was fun.  wget was a little overzealous, so I moved all the relevant pages (those starting with 10740) into their own folder.  I then wrote a Ruby script to organize the data, clean it up, and write it to a file.

The output from that script is a beautifully structured file, if I do say so myself.  Well, at least from a data perspective.

Number Crunching

Okay, so now it’s time to actually calculate the ELOs.  I basically wrote a straight implementation of the math as presented on Wikipedia.  The second Ruby script reads in the scores, calculates the adjustments, and keeps track of the changes.

Here’s the output while it’s running:

Lastly, it sorts the results and writes them to a file.
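The whole number-crunching step can be sketched in a few lines of Ruby. This is a rough outline, not the actual script: the K-factor of 32, the starting rating of 1500, the triple format for games, and the games themselves are all assumptions, and it prints the ranking instead of writing a file.

```ruby
K_FACTOR = 32.0          # assumed K-factor
STARTING_RATING = 1500.0 # assumed starting rating for every team

def expected_score(rating_a, rating_b)
  1.0 / (1.0 + 10.0**((rating_b - rating_a) / 400.0))
end

# `games` is a list of [team_a, team_b, score_a] triples, where score_a is
# 1.0 for a win by team_a, 0.0 for a loss, and 0.5 for a tie.
def rate_teams(games)
  ratings = Hash.new(STARTING_RATING)
  games.each do |team_a, team_b, score_a|
    delta = K_FACTOR * (score_a - expected_score(ratings[team_a], ratings[team_b]))
    ratings[team_a] += delta
    ratings[team_b] -= delta
  end
  ratings
end

# Made-up games, not real results.
games = [
  ["Team A", "Team B", 1.0],
  ["Team B", "Team C", 1.0],
  ["Team A", "Team C", 0.5],
]

# Highest rating first.
ranked = rate_teams(games).sort_by { |_team, rating| -rating }
ranked.each { |team, rating| puts format("%-8s %.1f", team, rating) }
```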

Conclusions

Here it is:

Final ELO Predicted Bracket

Comparing our generated results to the seeded rankings, we can see there’s a lot of overlap.  The top three teams are predicted exactly as seeded.  However, from there the list diverges quite a bit. For example, Murray St. is expected to take the West, but didn’t get seeded so hot.

So, if this wins, I’ll get some money from our office bracket pool. Which is nice. And if it doesn’t, it will be proof that my computer messed up on the calculation.

—–EDIT——

My initial calculations didn’t account for the order in which the games were played.  Although I didn’t think this would have a big influence, running the script on a computer that lists the data files in a different order actually made some big differences.  Thus, I changed the data scraping script to account for the dates, and to calculate all ELO scores in the order the games were played.  This should give a more accurate and reproducible result.  Here is the updated script, and the updated final result.  Here’s my Final Bracket.
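To see why the order matters, here is a small Ruby sketch that rates the same two games in two different orders. The hash keys, dates, teams, K-factor of 32, and starting rating of 1500 are all made up for illustration; this is not the updated script itself.

```ruby
require "date"

K_FACTOR = 32.0 # assumed K-factor

def expected_score(rating_a, rating_b)
  1.0 / (1.0 + 10.0**((rating_b - rating_a) / 400.0))
end

# Each game is a hash with hypothetical keys :date, :winner, and :loser.
def rate_teams(games)
  ratings = Hash.new(1500.0) # assumed starting rating
  games.each do |game|
    winner, loser = game[:winner], game[:loser]
    delta = K_FACTOR * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += delta
    ratings[loser] -= delta
  end
  ratings
end

# Made-up games, listed out of order the way a directory listing might return them.
games = [
  { date: Date.new(2012, 1, 14), winner: "Team B", loser: "Team A" },
  { date: Date.new(2011, 11, 20), winner: "Team A", loser: "Team B" },
]

as_listed = rate_teams(games)
chronological = rate_teams(games.sort_by { |game| game[:date] })

puts format("as listed:     Team A %.2f", as_listed["Team A"])
puts format("chronological: Team A %.2f", chronological["Team A"])
```

The two runs end with different ratings for Team A even though the games are identical, because each adjustment depends on the ratings at the moment the game is processed. Sorting by date first makes the result the same on every machine.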

How to get your car registered in Maryland (in 24 easy steps)

June 8th, 2011

1. Visit the MVA (Maryland’s special name for the DMV), find you need an Inspection, the Title, and Forms filled out.

2. Fill out Forms.

3. Call Lender, request Title.

4. Get car inspected.

5. Fail inspection because there’s a tiny sub-bulb of the headlight out, windshield wipers are worn, and the windows are tinted.

6. Fix headlight and windshield wipers, need to have MVA police officer inspect tint.

7. Lender mails notice, saying DMV has Title and will send it within 21 business days.

8. (On a Friday) Visit MVA to have officer inspect tint, find out police officers are only on duty Wed & Thurs 8:00 – 12:00.

9. (On a Thursday) Visit MVA, officer measures tint. Too dark, must remove.

10. Go to tint shop, pay to have tint removed.

11. (On a Thursday) Visit MVA, officers are now only on duty Wed 8:00-12:00.

12. (The following Wednesday) Visit MVA, officer approves lack of tint.

13. Go back to inspection place; car needs all new inspection because it’s been more than 30 days.  Bring back when you have time to wait.

14. Get car inspected.

15. Pass inspection.

16. Call up the CA DMV and ask them why it’s taking so friggin’ long to get Title.  They say it’s in process.

17. Receive Title in mail, ~50 days after requested.

18. Go to MVA with Title and inspection certificate in hand.  Wait in two lines.  Find out you need proof of Maryland insurance.  CA insurance doesn’t cut it.

19. Call insurance; transfer isn’t simple, they need more info and signed docs.  Leave the MVA.

20. Set up new Insurance.

21. Cancel old insurance.

22. Re-fill out forms.

23. Visit MVA with Title, Inspection, Insurance and Forms.

24. GET MARYLAND LICENSE PLATES & REGISTRATION!!!

Concrete5 vs WordPress: Benchmarking Load Time

December 14th, 2010

I just discovered Concrete5 CMS recently when another developer in my area launched a site with it.  Always up for learning something new, I went to the website, read the sales pitch, and decided to give it a whirl.  Before I spend time learning yet another CMS, for kicks I thought I would benchmark it for speed against WordPress, my current go-to solution.  Here we go.

Preliminaries
Hardware: All tests will be conducted on my desktop computer: a custom-built PC with a 3.5GHz Core2 Duo, 4GB of RAM, and a 10,000 RPM hard drive.  Not identical, but similar to many server setups on the market.

Software: I’m running Ubuntu 10.04 with Apache 2.2, PHP 5.3, and MySQL 5.1.  Once again, apart from using a Desktop OS, this is almost identical to your usual LAMP server software. For benchmarking, I will be using Siege 2.68.

Step 1: Fresh Installs
I downloaded and installed the latest versions of Concrete5 (5.4.1.1) and WordPress (3.0.3).  Here are the screenshots of the home pages out of the box:

WordPress Fresh Install

Concrete5 Fresh Install

Step 2: Balancing
Okay, first things first: to be fair, we need to balance the page weights.  Siege will not load the linked resources (like CSS and JavaScript), so I only care about the HTML page weights.  Out of the box Concrete5 is 1771 bytes and WordPress is 2015 bytes.  Pretty close.  After removing several widgets from the WordPress sidebar (those extra queries weren’t fair anyway) and adding the right amount of Lorem Ipsum, the WordPress page is now exactly 1771 bytes as well.  Perfect.

Step 3: Attack!
To stress test my desktop server I am using the following siege command:

 siege -c 50 -r 40 http://localhost/[siteurl]

This will attempt to make 50 concurrent requests to the website, and will repeat each request 40 times.  This is a total of 2000 requests to each site.

Step 4: Results
Here is the raw data from the tests:

                                  WordPress   Concrete5
Total Requests:                   2000        2000
Average Response Time (seconds):  2.52        1.62
Transactions per second:          16.23       22.46
Longest Transaction (seconds):    4.8         3.13
Shortest Transaction (seconds):   .10         .07
Elapsed Time (seconds):           123.21      89.04

And the corresponding screenshots:

Results for WordPress

Results for Concrete5

Step 5: Conclusions
As we can see from the data, Concrete5 outperformed WordPress by roughly 30-40% in every measure. This is a significant amount.  What does this tell us?  For small sites with little content, Concrete5 will scale to additional concurrent users better than WordPress. What does this not tell us?  For one thing, the sites may not scale to additional content equally well.  This test also ignored all the static content, which will download from either CMS with equal speed.  Finally, WordPress also has some excellent caching plugins that may have closed the gap.
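The size of the gap can be checked directly from the raw data in Step 4. Here is a quick Ruby sketch that computes the percentage change from WordPress to Concrete5 for each row; the numbers are copied from the results above, and only the helper name is mine.

```ruby
# Figures copied from the results above: [WordPress, Concrete5].
results = {
  "Average response time (s)" => [2.52, 1.62],
  "Transactions per second"   => [16.23, 22.46],
  "Longest transaction (s)"   => [4.8, 3.13],
  "Shortest transaction (s)"  => [0.10, 0.07],
  "Elapsed time (s)"          => [123.21, 89.04],
}

# Percentage change going from WordPress to Concrete5.
def percent_change(wordpress, concrete5)
  (concrete5 - wordpress) / wordpress * 100.0
end

results.each do |metric, (wordpress, concrete5)|
  puts format("%-26s %+.1f%%", metric, percent_change(wordpress, concrete5))
end
```

A negative change is better for the time-based rows, and a positive change is better for transactions per second.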

Other Considerations
Am I suggesting you ditch WordPress and port your sites to Concrete?  Not at all. WordPress has many good things going for it; it is the #1 blogging tool and is a finely tuned engine that powers millions of websites.  What I CAN say for sure, is that if you can outperform that, then you’re doing something right.  Hats off to the team at Concrete.

——————————————————-

Update (1 Day Later)
Okay, after posting this, I had a lot of people point out to me that this is not a fair fight.  Concrete5 has caching enabled by default, and WordPress does not.  After installing wp-super-cache, the elapsed time for 2000 requests from WordPress fell from 123 seconds to 27 seconds.  Wow, crazy plugin.  Either way, my original results stand when you consider out-of-the-box performance.  Wp-super-cache is not bundled with WordPress, nor do its version numbers suggest it is stable.  Its technical options are overwhelming to all but advanced users.  Kudos still go to Concrete5 for integrating a simple, stable caching system into the framework.

Rackspace Cloud Hosting – Unexpected Awesomeness

June 10th, 2009

Rands, an internet celebrity among the tech crowd, once wrote that for him to use a new application, it must:

  1. Look and feel like magic.
  2. Work flawlessly in the first 10 minutes.
  3. Provide additional, unexpected awesomeness.

Ever since reading that, I’ve held most new things I encounter up to that test.  Surprisingly few hold up.  Last week, however, one passed with flying colors.  I was shopping for a cost-effective way to consolidate my web hosting plans.  I was specifically looking for a virtual private server with strong uptime and speed numbers, Ubuntu Server OS, and a low price. Rackspace cloud servers fit the bill perfectly, so I signed up.  So how did it score? Let’s see:

Magic:  You start off by picking your server size.  It’s easier than shopping on Amazon.  256MB of RAM. Done.  Then you pick your OS.  Not only did they have Ubuntu, they had the latest three releases.  Next up, you see your virtual server created in real time with a nice little progress bar.  Create an entirely new server in 30 seconds?  Watch the whole process without a page load?  Good enough to be magic for me.

Work Flawlessly:  I spent the next 45 minutes going crazy with apt-get and wget.  I installed Apache, MySQL, PHP, Postfix, Dovecot, GD, WordPress & Roundcube.  Not a single hitch.

Unexpected Awesomeness:  Backups!  Daily, weekly and variable backups are included.  You can image the entire server without taking it offline, and restore a backup with one click.  Scaling!  I decided to bump my plan from 256MB to 512MB of RAM.  The entire process took about two minutes, and there was no need to re-configure anything.  DNS Hosting!  You can stack domain nameserver hosting on for free even if the domains are registered elsewhere.  Reverse DNS Lookup!  I’ve never found another host or ISP that makes it as easy to set your PTR record.  Support Chat!  I don’t feel like waiting on hold for support.  Twice I opened an IM window and got help from a human within a minute.  Basically, there was enough unexpected awesomeness to go around.

Cloud Hosting and Cloud Computing by Rackspace - Formerly Mosso

Hats off to Rackspace, my new favorite hosting company!

You Cannot Pass

December 29th, 2008

I stumbled across this image online.  I couldn’t resist re-posting it.

You Cannot Pass