If you download an app from the internet, how do you know it’s what it claims to be? Say you find a copy of an e-book or game in an online marketplace and you want to download it. How do you know your download is the intended media, and not a corrupted program stuffed with malware?
This is where checksums often come into play. A checksum (sometimes also called a “hash”) is a sequence of letters and numbers that you can use to detect errors or unexpected changes in data. Checksums are often used where data integrity is crucial because they let you compare a file’s checksum against the one published for the original.
Checksums help you ensure that files you’ve downloaded are legitimate, complete, and uncorrupted. As a developer, you should know when and how to use checksum algorithms in the applications you build. Ahead, we’ll take a closer look at checksums, their uses, and ways to calculate them.
How checksum programs work
A checksum is a sequence of letters and numbers that you use to check a file’s validity and integrity. Checksum programs work by running the contents of a file through an algorithm, which processes that data to produce the checksum value.
The algorithms used to produce checksums include:
- MD5
- SHA-1
- SHA-256
- SHA-512
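Each algorithm has its pros and cons, and the best one to use depends on your specific use case. For example, MD5 and SHA-1 are fast and widely supported, but they are no longer considered secure against deliberate tampering, so SHA-256 or SHA-512 is the safer choice when security matters.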
How algorithms render checksums
Algorithms such as MD5 and SHA-256 are cryptographic hash functions: formulas that take an input of any size and output a fixed-length string of letters and numbers. They are designed so that even minuscule differences between two otherwise identical files produce vastly different checksums.
The algorithm processes the entire contents of a file, so every byte affects the result. Because the number of possible checksums is enormous, it’s highly unlikely for two different files to produce the same checksum by accident.
Each algorithm always produces output of the same length (an MD5 checksum, for example, is always 32 hexadecimal characters), and it’s computationally infeasible to reconstruct the original file from the checksum alone.
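As a quick illustration of the fixed-length property, here is a minimal Python sketch using the standard library’s `hashlib` module. The three sample inputs are arbitrary: an empty string, a single word, and roughly 900 KB of repeated text.

```python
import hashlib

# Inputs of very different sizes: empty, one word, and about 900 KB of text.
inputs = [b"", b"the", b"checksum " * 100_000]

for data in inputs:
    md5_hex = hashlib.md5(data).hexdigest()
    sha256_hex = hashlib.sha256(data).hexdigest()
    # The digest length depends only on the algorithm, never on the input size.
    print(len(data), len(md5_hex), len(sha256_hex))
```

Every line prints 32 for MD5 and 64 for SHA-256, whether the input is zero bytes or nearly a megabyte.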
For instance, using an online checksum tool, we can upload a file containing this blog post, and it’ll produce the following checksum:
4f404820274682dcd6f66290c78b239b
And a document with only the word “the” in it produces this checksum:
fef6e7bcb47ab0764178d7f30e5f5bec
Now, let’s add a period to this article and see how that changes the checksum. The original checksum for this article was this:
4f404820274682dcd6f66290c78b239b
With only a period added, the article’s checksum becomes this:
5ee798757e220c1a5b9702b37cd4d611
These were produced using the MD5 checksum algorithm. But different algorithms would produce different checksums for the same file.
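You can reproduce this kind of experiment locally instead of with an online tool. The Python sketch below uses `hashlib` on a short sample sentence, which is an arbitrary stand-in rather than this article, so its checksums won’t match the ones above. (Online tools also hash a file’s exact bytes, including its text encoding and line endings.) It compares the MD5 checksums before and after adding one period, counts how many of the 128 output bits changed, and shows that SHA-256 gives a different checksum for the same text.

```python
import hashlib

def md5_hex(text: str) -> str:
    """Return the MD5 checksum of a string (encoded as UTF-8) as hex."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

original = "Checksums help you ensure that files are complete and uncorrupted"
edited = original + "."  # the only change: one added period

print(md5_hex(original))
print(md5_hex(edited))

# Count how many of the 128 output bits flipped because of the one-character edit.
a = int(md5_hex(original), 16)
b = int(md5_hex(edited), 16)
print("bits changed:", bin(a ^ b).count("1"), "of 128")

# A different algorithm produces a completely different checksum for the same text.
print(hashlib.sha256(original.encode("utf-8")).hexdigest())
```

For a well-designed hash function, a tiny edit typically flips around half of the output bits, which is why the two checksums look unrelated.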
How checksums are used
Because checksums can pick up even very minor variations in a file, they have many uses. While the average person might not encounter checksums in their day-to-day activities, they come up a lot in instances like:
- Checking whether an internet connection was interrupted while transmitting an important file.
- Analyzing data to identify hard drive problems that could have corrupted it.
- Checking a hard disk, thumb drive, or another external disk to see if it has been corrupted.
- Analyzing data being delivered to a recipient to see if a third party has intercepted, corrupted, or manipulated it.
- Checking whether a presumably trustworthy file has been tampered with, for example by having malware inserted into it.
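Several of these uses, such as checking a drive for corruption, come down to recording checksums while the data is known to be good and recomputing them later. Here is a minimal Python sketch of that idea, assuming the files live under a single directory. `build_manifest` and `find_changed` are hypothetical helper names, and the sketch uses SHA-256 from the standard library’s `hashlib` rather than MD5.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MB chunks so large files don't need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root: Path) -> dict[str, str]:
    """Hypothetical helper: map each file's path (relative to root) to its checksum."""
    return {
        str(p.relative_to(root)): sha256_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def find_changed(root: Path, manifest: dict[str, str]) -> list[str]:
    """Hypothetical helper: list recorded files that are missing or whose checksum changed."""
    changed = []
    for rel_path, expected in manifest.items():
        p = root / rel_path
        if not p.is_file() or sha256_of_file(p) != expected:
            changed.append(rel_path)
    return changed

# Usage demo with throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.txt").write_text("hello")
    (root / "b.txt").write_text("world")
    manifest = build_manifest(root)       # snapshot taken while the data is known to be good
    (root / "b.txt").write_text("w0rld")  # simulate silent corruption
    print(find_changed(root, manifest))   # prints ['b.txt']
```

In practice, you would store the manifest somewhere other than the drive you’re checking, because a failing drive could damage the manifest too.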
Example of how to use a checksum algorithm
Say you have to download a new version of a game from a site claiming to be the original provider. The site also gives you the checksum of the file. You download the file, and it takes several minutes because it’s huge. Your internet connection seems a little slower than usual, so you walk away for a few minutes while the file finishes downloading.
When you come back, it looks like the download was successful. You then open the file and install the new version of your game. Right away, something doesn’t seem right. Even though the game opens, the user interface lags, and some visual elements seem corrupted.
In this case, you could use a checksum algorithm to check whether there’s a problem with the file. The provider’s site will most likely specify which algorithm it used to generate the checksum, so all you have to do is copy the checksum and find a checksum generator that uses the same algorithm.
Run the file you downloaded from the provider’s site (the original download, not the installed game) through that checksum generator, then compare the result with the published checksum. At this point, one of two things will happen:
- The checksums match. This could mean one of the following:
  - The provider has been making a corrupted file available to customers, which is unlikely.
  - The problem is on your end; your computer isn’t running the file properly.
  - Your computer made a mistake while installing the game.
- The checksums do not match. This could mean one of the following:
  - Your internet connection was unreliable, and data was lost or damaged during the download.
  - Your connection wasn’t secure, and an attacker intercepted your transmission and altered the file’s contents in a man-in-the-middle attack.
  - You made a mistake when choosing which file to run through the checksum generator, perhaps picking one that you had already altered in some way.
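The comparison in this example is also easy to script instead of relying on an online generator. This Python sketch assumes the provider publishes a hex-encoded checksum and names an algorithm that Python’s `hashlib` supports, such as `"md5"` or `"sha256"`. `file_checksum` and `verify_download` are hypothetical names used for illustration.

```python
import hashlib

def file_checksum(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Compute a file's hex checksum with any algorithm name hashlib.new() accepts."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read in chunks so multi-gigabyte downloads don't have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_download(path: str, published: str, algorithm: str = "sha256") -> bool:
    """Return True if the file's checksum matches the one published by the provider."""
    # Published checksums sometimes use uppercase hex or carry stray whitespace.
    expected = published.strip().lower()
    return file_checksum(path, algorithm) == expected
```

For example, `verify_download("game-installer.zip", published_checksum, "sha256")`, with a hypothetical file name, returns `True` only if the download’s checksum matches the one copied from the provider’s site.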
Where are checksums used?
Checksums are useful for verifying the integrity of downloads and analyzing data for evidence of a cyberattack. If you want to learn more about algorithms and how they’re designed and used, check out our free course Java: Algorithms. You can also gain a solid understanding of algorithms with Learn Data Structures and Algorithms with Python.