Shredding, coding, arguing etc.

Birthday Paradox Calculator

... and the chance of unique keys clashing

13 July 2008

While programming for the web, the scenario often arises in which one wants a method to generate a short, readable 'unique' key. For .NET, I have been using this method and it does the job nicely.

I wanted to find out what is the best size for the keys so that there is minimal risk of a clash. I could just make the keys very long, but this takes more resources to generate and it is often preferable to have the keys readable by a human.

Even though a 4 letter combination of letters and numbers gives 557,845 different possibilities, there is still a high degree of risk that you will have a clash, even with a relatively small number of repitions. This may not be intuitive... people to tend to underestimate the probabilities of coincidences occurring.

This inability to accurately evaluate probability is commonly demonstrated with the pseudo-paradox, the Birthday Paradox. When you have two birthdays on the same day in your office, and the secretary starts acting like its a miracle, but the programmers all roll their eyes because they already know that, in an office of 30 people, there is a 70% chance that two people will have a birthday on the same day, this is the Birthday Paradox in action.

I was looking around Google for a birthday paradox calculator. There are plenty available, but I couldn't find a calculator that could be easily extended to work out the chance of generated keys clashing. Furthermore, a lot of the calculators that turned up in Google crashed my browser if the numbers got too large.

So here's one I made myself that does the job safely in Javascript. I've initialized it with the numbers required to calculate the birthday paradox for an office of 30 people.

Unique Key Collision Calculator (aka Birthday Paradox Calculator)

Number of items in set
For example, if the set includes all alphabetical characters and digits, then enter 36.
Length of key
Number of repetitions of key
Calculate

Results

So here are some example results:

Key length Key set Repetitions Chance of clash
4 Lowercase, uppercase letters, digits 100 1%
4 Lowercase, uppercase letters, digits 500 17%
5 Lowercase, digits 200 17%
5 Lowercase, uppercase letters, digits 5000 75%
6 Lowercase letters, digits 1000 11%
6 Lowercase letters, digits 10000 100%
6 Lowercase, uppercase letters, digits 10000 39%
8 Lowercase, uppercase letters, digits 100000 45%

Microsoft's GUID

By the way, to calculate the chance of Microsoft's GUIDs clashing, you would need to enter 16 for the number of items in the set and a key length of 32. Make sure you enter a large number of repitions, otherwise my program can't handle the numbers.

You may be surprised that there is a significant chance of two GUIDs clashing once you have a few hundred thousand repitions.

Notes

I used a Javascript library for dealing with big integers. You can find it here: Big Integer Javascript Library

Have I got my maths right? I'm not much of a mathematician, so I could have made a mistake. Leave a comment if I have.

Comments

 
   
Stupidity filter

Unfortunately, the internet is rife with automaton zombies that fill up blog threads with rubbish. There are also computer scripts that do this too. Please answer the following question.

Waht is the nmae of the plnaet clsoset to the cneter of the sloar ssyetm?    
  •  
  • del.icio.us  del.icio.us
  • reddit  reddit
  • StumbleUpon  StumbleUpon