Birthday Paradox Calculator
... and the chance of unique keys clashing
13 July 2008
While programming for the web, the scenario often arises in which one wants a method to generate a short, readable
'unique' key. For .NET, I have been using this
method and it does the job nicely.
I wanted to find out what is the best size for the keys so that there is minimal risk of a clash. I could just make
the keys very long, but this takes more resources to generate and it is often preferable to have the keys readable by a human.
Even though a 4 letter combination of letters and numbers gives 557,845 different possibilities, there is still a high
degree of risk that you will have a clash, even with a relatively small number of repitions. This may not be intuitive...
people to tend to underestimate the probabilities of coincidences occurring.
This inability to accurately evaluate probability is commonly demonstrated with the pseudo-paradox, the Birthday Paradox.
When you have two birthdays on the same day in your office, and the secretary starts acting like its a miracle, but the programmers
all roll their eyes because they already know that, in an office of 30 people, there is a 70% chance that two people will have a
birthday on the same day, this is the Birthday Paradox in action.
I was looking around Google for a birthday paradox calculator. There are plenty available, but I couldn't find a calculator
that could be easily extended to work out the chance of generated keys clashing. Furthermore, a lot of the calculators that
turned up in Google crashed my browser if the numbers got too large.
So here's one I made myself that does the job safely in Javascript. I've initialized it with the numbers required to calculate the
birthday paradox for an office of 30 people.
Unique Key Collision Calculator (aka Birthday Paradox Calculator)
Number of items in set
For example, if the set includes all alphabetical characters and digits, then enter 36.
Length of key
Number of repetitions of key
Results
So here are some example results:
| Key length |
Key set |
Repetitions |
Chance of clash |
| 4 |
Lowercase, uppercase letters, digits |
100 |
1% |
| 4 |
Lowercase, uppercase letters, digits |
500 |
17% |
| 5 |
Lowercase, digits |
200 |
17% |
| 5 |
Lowercase, uppercase letters, digits |
5000 |
75% |
| 6 |
Lowercase letters, digits |
1000 |
11% |
| 6 |
Lowercase letters, digits |
10000 |
100% |
| 6 |
Lowercase, uppercase letters, digits |
10000 |
39% |
| 8 |
Lowercase, uppercase letters, digits |
100000 |
45% |
Microsoft's GUID
By the way, to calculate the chance of Microsoft's GUIDs clashing, you would need to enter 16 for the number of items in the
set and a key length of 32. Make sure you enter a large number of repitions, otherwise my program can't handle the numbers.
You may be surprised that there is a significant chance of two GUIDs clashing once you have a few hundred thousand repitions.
Notes
I used a Javascript library for dealing with big integers.
You can find it here: Big Integer Javascript Library
Have I got my maths right? I'm not much of a mathematician, so I could have made a mistake. Leave a comment if I have.