Data integrity checks
A field guide to answering “Is this file intact and authentic?” when distributing or transferring data.
What to use when
Pick the tool by goal: detecting corruption vs detecting tampering.
Comparison
| Method | Strength | Weakness / caution | Typical use |
|---|---|---|---|
| Hash (SHA-256, etc.) | Collisions are computationally infeasible to find; fast equality check. | Does not detect tampering if an attacker can also replace the published hash. MD5 and SHA-1 are no longer collision-resistant. | Release checks, dedup, ETag generation. |
| Checksum (CRC32, etc.) | Very fast; good at catching accidental transmission errors. | No protection against intentional tampering; a 32-bit CRC is easy to collide deliberately. | Protocol error detection, lightweight log/file checks. |
| Digital signature (public key) | The publisher signs the file’s hash with a private key; verifying with the matching public key proves origin and integrity. | Key management required; heavier workflow. | Software distribution, package signing, important documents. |
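
To make the first two rows concrete, here is a minimal Python sketch that computes a CRC32 checksum and a SHA-256 hash over the same bytes using only the standard library. The helper names `crc32_hex` and `sha256_hex` and the sample payloads are illustrative assumptions, not part of any tool named in the table.

```python
import hashlib
import zlib


def crc32_hex(data: bytes) -> str:
    """Return the CRC32 checksum as 8 hex digits (32 bits)."""
    return f"{zlib.crc32(data) & 0xFFFFFFFF:08x}"


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest as 64 hex digits (256 bits)."""
    return hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    original = b"release-1.0 payload"  # hypothetical sample data
    damaged = b"release-1.0 paylaod"   # two adjacent bytes swapped in transit
    for label, blob in (("original", original), ("damaged", damaged)):
        print(label, crc32_hex(blob), sha256_hex(blob))
```

Both values flag this kind of accidental damage. The difference is the output size: with only 32 bits, someone who wants a matching CRC32 can construct one cheaply, which is not practical for SHA-256.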
Basic workflow for downloads/transfers
Reduce corruption and tampering risk with these steps.
On the receiving side
- Obtain the publisher’s hash (SHA-256, etc.) over HTTPS or a signed release note.
- Compute the file hash locally and compare.
- If tampering matters, verify a signature (PGP, minisign, sigstore, etc.) and validate the public key source.
- For archives, also verify the extracted files (for example, against a per-file manifest) to catch entries that were damaged before packaging or during extraction.
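
The first two receiving-side steps could be sketched as follows. This is a minimal example, assuming the publisher supplies a hex-encoded SHA-256 string; the function names `sha256_file` and `matches_published_hash` are hypothetical, and the file is read in chunks so that large downloads do not need to fit in memory.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in fixed-size chunks and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, published_hex: str) -> bool:
    """Compare the local digest with the published one, ignoring case and whitespace."""
    return sha256_file(path) == published_hex.strip().lower()
```

A match only shows that the file equals whatever the published hash describes, so the hash itself still has to arrive over a channel you trust.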
On the publishing side
- Publish at least SHA-256; avoid MD5/SHA-1 alone.
- For sensitive artifacts, sign them and offer multiple channels for the public key (site + repo + key server).
- For large sets, provide per-file hashes or a manifest so that only the damaged files need to be retransmitted.
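
A per-file manifest might be produced and checked along these lines. This is a sketch under stated assumptions: the manifest layout (hex digest, two spaces, relative path) is loosely modeled on `sha256sum` output, the function names are hypothetical, and file names containing newlines are not handled.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hash a file in 1 MiB chunks and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> str:
    """Return one '<digest>  <relative path>' line per file under root, sorted by path."""
    files = sorted(p for p in root.rglob("*") if p.is_file())
    lines = [f"{sha256_file(p)}  {p.relative_to(root).as_posix()}" for p in files]
    return "\n".join(lines) + "\n"


def find_mismatches(root: Path, manifest: str) -> list:
    """Return the relative paths that are missing or whose digest differs."""
    bad = []
    for line in manifest.splitlines():
        if not line.strip():
            continue
        expected, relative = line.split("  ", 1)
        target = root / relative
        if not target.is_file() or sha256_file(target) != expected:
            bad.append(relative)
    return bad
```

The receiver can then request only the paths returned by `find_mismatches` instead of the whole set. The manifest itself should be signed or delivered over a trusted channel, like any other published hash.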
Choosing a hash
Balance security and speed.
Guidelines
- Integrity for releases: SHA-256 is the standard; BLAKE2/3 for speed.
- Signatures: SHA-256 or SHA-384/512 with an appropriate signature scheme.
- MD5 and SHA-1 lack collision resistance; keep them only for legacy compatibility, and always alongside a stronger hash.
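
One way to apply these guidelines in a publishing script is to always emit SHA-256 and treat every other algorithm as an optional extra, so a legacy digest can never be published on its own. The sketch below uses Python’s standard `hashlib`; the function name `release_digests` and the `ALLOWED_EXTRAS` set are assumptions made for illustration. BLAKE3 is left out because it is not part of the Python standard library.

```python
import hashlib

STRONG_DEFAULT = "sha256"
# Extras a publisher might add next to SHA-256 (hypothetical policy).
ALLOWED_EXTRAS = {"sha384", "sha512", "blake2b", "blake2s", "md5", "sha1"}


def release_digests(data: bytes, extra_algorithms=()) -> dict:
    """Return hex digests keyed by algorithm name; SHA-256 is always included."""
    names = [STRONG_DEFAULT]
    for name in extra_algorithms:
        name = name.lower()
        if name != STRONG_DEFAULT and name not in ALLOWED_EXTRAS:
            raise ValueError(f"unsupported algorithm: {name}")
        if name not in names:
            names.append(name)
    return {name: hashlib.new(name, data).hexdigest() for name in names}
```

For example, `release_digests(payload, ["blake2b"])` returns both a `sha256` and a `blake2b` entry, while requesting only `"md5"` still yields `sha256` next to it.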
Common pitfalls
Integrity requires trustworthy channels, not just hashes.
Checklist
- If the hash is delivered over an untrusted channel, tampering can go undetected, because an attacker who alters the file can alter the hash too. Use HTTPS or signed notes.
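- A file’s hash changes with its line endings, and an archive’s hash changes with metadata such as permission bits; normalize these and pin the packaging settings (ZIP/TAR) so that repeated builds produce the same hash.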
- An S3 ETag is not a plain MD5 of the object for multipart uploads; know the provider’s rules before relying on it.
- Signature verification depends on trusting the public key (Web of Trust, pinned keys). A swapped key nullifies the check.
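
The line-ending pitfall is easy to reproduce. The sketch below hashes the same two lines of text with LF and with CRLF endings, then again after normalizing to LF; the helper `normalize_newlines` and the sample strings are illustrative assumptions.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def normalize_newlines(data: bytes) -> bytes:
    """Convert CRLF line endings to LF so text hashes match across platforms."""
    return data.replace(b"\r\n", b"\n")


unix_text = b"alpha\nbeta\n"          # as checked out on Linux or macOS
windows_text = b"alpha\r\nbeta\r\n"   # same content after a CRLF conversion

# The raw bytes differ, so the digests differ.
same_before = sha256_hex(unix_text) == sha256_hex(windows_text)

# After normalization both inputs are byte-identical, so the digests agree.
same_after = sha256_hex(normalize_newlines(unix_text)) == sha256_hex(
    normalize_newlines(windows_text)
)

if __name__ == "__main__":
    print("match before normalization:", same_before)
    print("match after normalization:", same_after)
```

Normalization like this is only appropriate for text files; applying it to binary data would change the content it is meant to protect.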
Takeaway
Use hashes/checksums to catch corruption, signatures to catch tampering and prove origin. Secure the delivery channel and key distribution to make the checks meaningful.