Practical Hash Tips
This page focuses on using hashes safely in real projects: collision risk, when to use HMAC or signatures, password hashing, and avoiding common pitfalls in encoding and parsing.
What This Covers
Overview
- Collision risk intuition and cautions when truncating hashes
- Checksum (CRC) vs cryptographic hash and where each fits
- HMAC vs digital signatures vs password hashing (KDF)
- Encoding/normalization pitfalls that change the hash
- Streaming large files and content-addressable examples
Collisions and Truncation
Shorter hashes collide sooner; choose bit length by threat model.
Bit-length intuition
| Output bits | Birthday bound (about 2^(bits/2) items) | Note |
|---|---|---|
| 64 | ~2^32 ≈ 4.3e9 | Temporary IDs or short hashes; not for long-term security. |
| 128 | ~2^64 ≈ 1.8e19 | MD5's output size, but MD5 has practical collisions far below this bound; avoid for security. |
| 160 | ~2^80 ≈ 1.2e24 | SHA-1's output size, but SHA-1 has practical collisions far below this bound; avoid for security. |
| 256 | ~2^128 ≈ 3.4e38 | SHA-256 or stronger is the standard choice for security. |
When truncating
- Truncating to N bits makes collisions feasible after about 2^(N/2) items (the birthday bound).
- If you shorten a hash for URLs or UI, keep enough bits for your item count, or store the full hash alongside the short form.
- "Safe enough" depends on attacker cost; for public IDs, prefer identifiers derived from a full-length SHA-256 digest.
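The birthday bound can be checked numerically. The sketch below uses the standard approximation p ≈ 1 - exp(-n² / 2^(N+1)) for n random N-bit values; the helper names `collision_probability` and `truncated_sha256_hex` are made up for this illustration.

```python
import hashlib
import math


def collision_probability(n_items: int, bits: int) -> float:
    """Approximate chance that n_items random `bits`-bit values contain a collision.

    Birthday approximation: p ~= 1 - exp(-n^2 / 2^(bits + 1)).
    """
    exponent = -(n_items ** 2) / float(2 ** (bits + 1))
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)


def truncated_sha256_hex(data: bytes, bits: int) -> str:
    """Return the first `bits` bits of SHA-256(data) as hex (whole bytes only)."""
    if bits % 8 != 0 or not 8 <= bits <= 256:
        raise ValueError("bits must be a multiple of 8 between 8 and 256")
    return hashlib.sha256(data).digest()[: bits // 8].hex()


if __name__ == "__main__":
    # How likely is a collision among one million items at each length?
    for bits in (32, 64, 128):
        p = collision_probability(1_000_000, bits)
        print(f"{bits:>3} bits, 1e6 items: p ~= {p:.3g}")
    print(truncated_sha256_hex(b"hello", 64))
```

This estimate only covers accidental collisions among random values. An attacker who can search for collisions deliberately changes the picture, which is why the threat model decides the bit length.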
Checksum vs Cryptographic Hash
CRC/Adler detect random errors; they do not resist intentional tampering.
Choose appropriately
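- Transfer integrity (non-adversarial) → CRC32/Adler32 can suffice.
- Tamper detection → cryptographic hashes (SHA-256 or stronger), compared against a reference hash from a trusted source; authenticating the sender additionally needs an HMAC or signature.
- For downloads, publish file hashes using SHA-256 or stronger.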
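One way to see why a CRC cannot resist tampering: CRC32 is affine over XOR, so for equal-length messages the checksum of a combined message can be predicted from the individual checksums without ever running the CRC on it. A cryptographic hash has no such structure. The sketch below is illustrative only; the messages and the `xor_bytes` helper are made up.

```python
import hashlib
import zlib


def xor_bytes(*chunks: bytes) -> bytes:
    """XOR several equal-length byte strings together."""
    length = len(chunks[0])
    if any(len(chunk) != length for chunk in chunks):
        raise ValueError("all inputs must have the same length")
    out = bytearray(length)
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            out[i] ^= byte
    return bytes(out)


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


a, b, c = b"message-A", b"message-B", b"message-C"
combined = xor_bytes(a, b, c)

# CRC32: the checksum of a^b^c equals crc(a)^crc(b)^crc(c) for equal lengths.
crc_predicted = zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c)
crc_actual = zlib.crc32(combined)

# SHA-256: the same trick gives no useful prediction.
sha_predicted = xor_bytes(sha256(a), sha256(b), sha256(c))
sha_actual = sha256(combined)

if __name__ == "__main__":
    print("CRC32 predictable:", crc_actual == crc_predicted)
    print("SHA-256 predictable:", sha_actual == sha_predicted)
```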
HMAC / Signatures / Password Hashing
Who holds the secret and what you protect determines the tool.
Roles
- HMAC: shared secret; detects tampering, but anyone holding the secret can forge tags, so a leaked key breaks it.
- Digital signature: anyone can verify with the public key; protecting the private key and canonicalizing the signed data (XML/JSON) both matter.
- Password hashing (KDF): PBKDF2/bcrypt/scrypt/Argon2 with a per-password salt and stretching; scrypt and Argon2 add memory hardness.
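For the HMAC role, a minimal sketch using Python's standard `hmac` module might look like this. The `sign` and `verify` names are illustrative, and key storage and distribution are out of scope here.

```python
import hashlib
import hmac
import secrets


def sign(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = sign(key, message)
    return hmac.compare_digest(expected, tag)


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # shared secret; both sides must hold it
    tag = sign(key, b"amount=100")
    print(verify(key, b"amount=100", tag))
    print(verify(key, b"amount=900", tag))
```

Using `hmac.compare_digest` rather than `==` avoids leaking how many leading bytes matched through timing differences.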
Migration tips
- After a successful login, rehash the password with the new scheme so accounts migrate gradually.
- A pepper (app-held secret) adds resilience if the database leaks, but needs rotation planning.
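The rehash-on-login idea can be sketched with PBKDF2 from the standard library. This is an illustration under stated assumptions, not a recommended parameter set: `CURRENT_ITERATIONS`, `PasswordRecord`, and `verify_and_upgrade` are hypothetical names and values, and a real deployment might prefer Argon2 or scrypt.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass
from typing import Tuple

CURRENT_ITERATIONS = 600_000  # assumed target work factor; tune for your hardware


@dataclass
class PasswordRecord:
    salt: bytes
    iterations: int
    digest: bytes


def hash_password(password: str, iterations: int = CURRENT_ITERATIONS) -> PasswordRecord:
    """Hash with a fresh random salt and the given PBKDF2 iteration count."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return PasswordRecord(salt, iterations, digest)


def verify_and_upgrade(password: str, record: PasswordRecord) -> Tuple[bool, PasswordRecord]:
    """Verify a login attempt; on success, rehash if the record uses old parameters."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), record.salt, record.iterations
    )
    if not hmac.compare_digest(candidate, record.digest):
        return False, record
    if record.iterations < CURRENT_ITERATIONS:
        # The plaintext is only available at login, so this is the moment to migrate.
        record = hash_password(password)
    return True, record
```

The caller would persist the returned record whenever it differs from the stored one. Accounts that never log in keep the old parameters, so a migration plan still needs a policy for dormant users.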
Encoding and Normalization Pitfalls
Different encodings or input normalization can change the hash entirely.
Common gotchas
- Hex case, `0x` prefixes, and separators make the same digest look like different strings.
- Base64 vs Base64url, and line breaks on/off (email tools often wrap lines).
- Newline style (LF vs CRLF) or Unicode normalization (NFC vs NFD) differences in the input yield different hashes.
Mitigations
- Normalize input before hashing and document the policy.
- Align on the encoding (hex/Base64url) when exchanging hashes externally.
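A normalization policy can be written down as code. The sketch below assumes one possible policy (NFC, LF newlines, UTF-8) and a lenient hex comparison; the function names are illustrative, and a different policy is equally valid as long as both sides agree on it.

```python
import hashlib
import unicodedata


def canonical_bytes(text: str) -> bytes:
    """Apply the assumed policy: Unicode NFC, LF newlines, UTF-8 bytes."""
    text = unicodedata.normalize("NFC", text)
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.encode("utf-8")


def canonical_sha256(text: str) -> str:
    """SHA-256 over the canonical form, as lowercase hex."""
    return hashlib.sha256(canonical_bytes(text)).hexdigest()


def same_hex_digest(a: str, b: str) -> bool:
    """Compare hex digests ignoring case, a leading 0x, and ':' or space separators."""
    def clean(value: str) -> str:
        value = value.strip().lower()
        if value.startswith("0x"):
            value = value[2:]
        return value.replace(":", "").replace(" ", "")
    return clean(a) == clean(b)


if __name__ == "__main__":
    composed = "caf\u00e9\r\nline"     # NFC form with CRLF
    decomposed = "cafe\u0301\nline"    # NFD form with LF
    raw_equal = (
        hashlib.sha256(composed.encode("utf-8")).hexdigest()
        == hashlib.sha256(decomposed.encode("utf-8")).hexdigest()
    )
    print("raw hashes equal:", raw_equal)
    print("canonical hashes equal:", canonical_sha256(composed) == canonical_sha256(decomposed))
```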
Streaming Hashes and Commands
Hash large files in chunks to avoid excessive memory use.
Practical points
- `hash_update` / `hash_file` (PHP) and `openssl dgst -sha256 file` process input incrementally instead of loading it all into memory.
- For many files, keep a manifest (filename + hash) for bulk verification.
- If you need partial verification, consider range-hashing designs for large storage.
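The same chunked approach, plus a simple manifest, can be sketched in Python with `hashlib`. The manifest here is just a dict of relative path to hex digest; the layout and the function names are assumptions made for this example.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in fixed-size chunks so memory use stays bounded."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root) -> dict:
    """Map each file under root (relative POSIX path) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root, manifest: dict) -> list:
    """Return the names whose file is missing or whose hash no longer matches."""
    root = Path(root)
    mismatched = []
    for name, expected in manifest.items():
        target = root / name
        if not target.is_file() or sha256_file(target) != expected:
            mismatched.append(name)
    return mismatched


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "a.txt").write_bytes(b"hello\n")
        manifest = build_manifest(tmp)
        Path(tmp, "a.txt").write_bytes(b"tampered\n")
        print(verify_manifest(tmp, manifest))  # a.txt is reported as changed
```

A manifest only helps against tampering if the manifest itself is protected, for example by signing it or distributing it over a trusted channel.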
Content-Addressable Examples
Addressing by content improves reproducibility and caching.
Examples
- Git object IDs (SHA-1 → SHA-256 migration)
- IPFS/CAS content IDs (Base58/Base32 hash encodings)
- Docker layer digests (SHA-256)
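The core idea behind these systems fits in a few lines: the key is derived from the content, so identical content always maps to the same address and can be verified on read. The toy in-memory `ContentStore` below is an illustration only; the `sha256:` key prefix is chosen for readability and does not reproduce any particular system's on-disk format.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressable store keyed by SHA-256."""

    def __init__(self):
        self._blobs = {}

    @staticmethod
    def address(data: bytes) -> str:
        return "sha256:" + hashlib.sha256(data).hexdigest()

    def put(self, data: bytes) -> str:
        """Store data and return its address; duplicates are stored once."""
        key = self.address(data)
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        """Fetch data and re-check that it still matches its address."""
        data = self._blobs[key]
        if self.address(data) != key:
            raise ValueError("stored content does not match its address")
        return data

    def __len__(self) -> int:
        return len(self._blobs)


if __name__ == "__main__":
    store = ContentStore()
    first = store.put(b"layer contents")
    second = store.put(b"layer contents")  # same bytes, same address
    print(first == second, len(store))
```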
FAQ
Q. Is MD5 or SHA-1 acceptable?
- A. Not where collision resistance matters. Restrict them to legacy or non-security contexts and plan a replacement.
Q. How many bits are safe?
- A. SHA-256 (256 bits) is the common baseline. For extra long-term security margin, consider the SHA-512 family.