ゆるテックノート

Data integrity checks

A field guide for “Is this file intact and authentic?” when distributing or transferring data.

What to use when

Pick the tool by goal: detecting corruption vs detecting tampering.

Comparison

Method | Strength | Weakness / caution | Typical use
Hash (SHA-256, etc.) | Low collision, fast equality check. | Not tamper-proof if the hash itself is replaced. MD5/SHA-1 are weak. | Release checks, dedup, ETag generation.
Checksum (CRC32, etc.) | Very fast; good for transmission errors. | Weak against intentional tampering; CRC32 collides easily. | Protocol error detection, lightweight log/file checks.
Digital signature (public key) | Hash + private key proves origin and integrity. | Key management required; heavier workflow. | Software distribution, package signing, important documents.
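
To make the checksum and hash rows concrete, here is a minimal sketch using Python's standard `zlib` and `hashlib` modules. The helper names `crc32_hex` and `sha256_hex` and the sample payloads are illustrative assumptions, not part of any tool mentioned above.

```python
import hashlib
import zlib


def crc32_hex(data: bytes) -> str:
    """Return the CRC32 checksum as 8 hex digits (a 32-bit value)."""
    return format(zlib.crc32(data) & 0xFFFFFFFF, "08x")


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest as 64 hex digits (a 256-bit value)."""
    return hashlib.sha256(data).hexdigest()


# Illustrative payloads: the second differs by one character,
# as an accidental transmission error might produce.
payload = b"hello world\n"
flipped = b"hellp world\n"

# Both methods notice the accidental change.
print(crc32_hex(payload), crc32_hex(flipped))
print(sha256_hex(payload), sha256_hex(flipped))
```

Both values change for the corrupted payload. The difference is size: a 32-bit CRC leaves only about four billion possible values, so someone who wants a matching checksum can find one easily, while doing the same against SHA-256 is not considered feasible.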

Basic workflow for downloads/transfers

Reduce corruption and tampering risk with these steps.

On the receiving side

  • Obtain the publisher’s hash (SHA-256, etc.) over HTTPS or a signed release note.
  • Compute the file hash locally and compare.
  • If tampering matters, verify a signature (PGP, minisign, sigstore, etc.) and validate the public key source.
  • For archives, also hash the extracted files to detect corruption inside the archive.
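
The first two receiving-side steps can be sketched as follows, assuming the published SHA-256 value has already been obtained over a trusted channel. The function names `sha256_file` and `matches_published_hash` are hypothetical; only `hashlib`, `hmac`, and `pathlib` from the standard library are used.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large downloads do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, published_hex: str) -> bool:
    """Compare the local digest with the publisher's hex string.

    The published value is trimmed and lowercased first, because hash
    listings often differ in case or carry trailing whitespace.
    """
    actual = sha256_file(path)
    expected = published_hex.strip().lower()
    return hmac.compare_digest(actual, expected)
```

A call such as `matches_published_hash(Path("release.tar.gz"), published_value)` (both names are placeholders) returns `True` only when the digests agree. This says nothing about who produced the file; that is what the signature step is for.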

On the publishing side

  • Publish at least SHA-256; avoid MD5/SHA-1 alone.
  • For sensitive artifacts, sign them and offer multiple channels for the public key (site + repo + key server).
  • For large sets, provide per-file hashes or a manifest to narrow retransmits.
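
A per-file manifest might look like the sketch below. The line layout (`<hex digest>`, two spaces, relative path) is an assumption chosen to resemble what `sha256sum` prints; `build_manifest` and `changed_files` are hypothetical helper names.

```python
import hashlib
from pathlib import Path
from typing import List


def build_manifest(root: Path) -> str:
    """Return one '<sha256>  <relative path>' line per file under root."""
    lines = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        # read_bytes() loads the whole file; switch to chunked hashing for big files.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        lines.append(f"{digest}  {path.relative_to(root).as_posix()}")
    return "\n".join(lines) + "\n"


def changed_files(root: Path, manifest: str) -> List[str]:
    """Return the relative paths that are missing or whose hash differs."""
    bad = []
    for line in manifest.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        expected, _, rel = line.partition("  ")
        target = root / rel
        if not target.is_file():
            bad.append(rel)
        elif hashlib.sha256(target.read_bytes()).hexdigest() != expected:
            bad.append(rel)
    return bad
```

The receiver runs `changed_files` against its copy and requests only the listed paths again, instead of retransmitting the whole set. The manifest itself still needs a trusted channel or a signature.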

Choosing a hash

Balance security and speed.

Guidelines

  • Integrity for releases: SHA-256 is the usual default; consider BLAKE2 or BLAKE3 when speed matters.
  • Signatures: SHA-256 or SHA-384/512 with an appropriate signature scheme.
  • MD5/SHA-1 lack collision resistance; keep them only for legacy compatibility, and always alongside a stronger hash.
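
One way to apply these guidelines in code is to accept only an allow-list of algorithms. The sketch below uses Python's `hashlib`; the `ALLOWED` set and the `digest_hex` helper are illustrative assumptions. BLAKE3 is left out because it is not in the Python standard library and would need a third-party package.

```python
import hashlib

# Assumed policy for this sketch: strong algorithms only, all available in hashlib.
ALLOWED = {"sha256", "sha384", "sha512", "blake2b", "blake2s"}


def digest_hex(data: bytes, algorithm: str = "sha256") -> str:
    """Hash data with an allowed algorithm, rejecting MD5, SHA-1, and anything else."""
    name = algorithm.lower()
    if name not in ALLOWED:
        raise ValueError(f"algorithm not allowed for integrity checks: {algorithm}")
    return hashlib.new(name, data).hexdigest()


sample = b"release-1.0.tar.gz contents"  # placeholder bytes
for name in sorted(ALLOWED):
    print(name, len(digest_hex(sample, name)) * 4, "bits")
```

Legacy consumers that still expect MD5 can be served a second, separately computed value, as long as the strong hash is the one that verification relies on.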

Common pitfalls

Integrity requires trustworthy channels, not just hashes.

Checklist

  • If the hash is delivered over an untrusted channel, tampering goes undetected. Use HTTPS or signed notes.
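  • A file’s hash changes when its line endings change, and an archive’s hash also changes with metadata such as permission bits; fix the packaging settings (ZIP/TAR) so that repeated builds produce identical bytes.
  • An S3 ETag may not be an MD5 digest for multipart uploads; know the provider’s rules before relying on it.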
  • Signature verification depends on trusting the public key (Web of Trust, pinned keys). A swapped key nullifies the check.
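
The line-ending pitfall is easy to reproduce. In this minimal sketch, the sample text and the `normalize_newlines` helper are illustrative assumptions; converting CRLF to LF is one possible normalization, not the only one.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of data."""
    return hashlib.sha256(data).hexdigest()


def normalize_newlines(data: bytes) -> bytes:
    """Convert Windows CRLF line endings to LF before hashing."""
    return data.replace(b"\r\n", b"\n")


unix_text = b"line one\nline two\n"
windows_text = b"line one\r\nline two\r\n"

# Same visible text, different bytes, so the hashes differ.
print(digest(unix_text) == digest(windows_text))  # False

# After normalization the bytes, and therefore the hashes, agree.
print(digest(normalize_newlines(windows_text)) == digest(unix_text))  # True
```

Normalizing is only appropriate for text files where both sides agree on the convention; binary files must be hashed byte for byte.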

Takeaway

Use hashes/checksums to catch corruption, signatures to catch tampering and prove origin. Secure the delivery channel and key distribution to make the checks meaningful.