ゆるテックノート

Practical Hash Tips

This page focuses on using hashes safely in real projects: collision risk, when to use HMAC or signatures, password hashing, and avoiding common pitfalls in encoding and parsing.

🧭 What This Covers

Overview

  • ✅ Collision risk intuition and cautions when truncating hashes
  • ✅ Checksum (CRC) vs cryptographic hash and where each fits
  • ✅ HMAC vs digital signatures vs password hashing (KDF)
  • ✅ Encoding/normalization pitfalls that change the hash
  • ✅ Streaming large files and content-addressable examples

⚠️ Collisions and Truncation

Shorter hashes collide sooner; choose the bit length according to your threat model.

Bit-length intuition

Bits | Items for ~50% collision chance (birthday bound) | Typical note
64   | ~5e9 (≈ 2^32)    | Temporary IDs or short hashes; not for long-term security.
128  | ~2e19 (≈ 2^64)   | MD5 is broken: practical collisions exist far below this bound. Avoid for security.
160  | ~1.4e24 (≈ 2^80) | SHA-1 is broken: practical collisions exist. Avoid for security.
256  | ~4e38 (≈ 2^128)  | SHA-256 or stronger is the standard choice for security.
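A quick way to reproduce the middle column, as a sketch that assumes an ideal hash with uniformly random outputs. This generic bound is the best case: real attacks on MD5 and SHA-1 find collisions far faster.

```python
import math

def items_for_half_collision_chance(bits: int) -> float:
    """Approximate number of items at which P(at least one collision) reaches 50%.

    Assumes an ideal `bits`-bit hash: n ≈ sqrt(2 · ln 2 · 2^bits) ≈ 1.18 · 2^(bits/2).
    """
    return math.sqrt(2 * math.log(2) * 2.0 ** bits)

# Print the estimate for the bit lengths in the table
for bits in (64, 128, 160, 256):
    print(f"{bits:>3} bits: ~{items_for_half_collision_chance(bits):.1e} items")
```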

When truncating

  • ✂️ Truncating to N bits makes collisions likely after roughly 2^(N/2) items (the birthday bound); a 64-bit prefix leaves only about 2^32 of headroom.
  • ✂️ If you shorten a hash for URLs or UI, keep enough bits, or store the full hash alongside the short form.
  • ✂️ “Safe enough” depends on what an attacker can afford to spend; for public IDs, prefer full SHA-256-length (256-bit) values.
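A minimal sketch of truncation and the matching collision estimate, again assuming uniformly random hash outputs. The helper names `truncated_sha256_hex` and `collision_probability` are illustrative, not from any library.

```python
import hashlib
import math

def truncated_sha256_hex(data: bytes, bits: int) -> str:
    """First `bits` bits of SHA-256 as hex; `bits` must be a multiple of 4 (one hex digit)."""
    if bits % 4 or not 0 < bits <= 256:
        raise ValueError("bits must be a multiple of 4 in (0, 256]")
    return hashlib.sha256(data).hexdigest()[: bits // 4]

def collision_probability(n_items: int, bits: int) -> float:
    """Approximate P(at least one collision) among n_items random `bits`-bit values."""
    # p ≈ 1 - exp(-n(n-1) / 2^(bits+1)); expm1 keeps precision when p is tiny
    return -math.expm1(-n_items * (n_items - 1) / 2.0 ** (bits + 1))

print(truncated_sha256_hex(b"example", 64))  # 16 hex characters
print(collision_probability(10**6, 32))      # close to 1
print(collision_probability(10**6, 64))      # on the order of 1e-8
```

For example, one million IDs cut to 32 bits almost certainly collide, while at 64 bits the estimate is roughly 3 in 100 million. Against a deliberate attacker the relevant number is the work needed to find a collision, about 2^(N/2) hash evaluations.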

🧮 Checksum vs Cryptographic Hash

CRC/Adler detect random errors; they do not resist intentional tampering.

Choose appropriately

  • 🔍 Transfer integrity (non-adversarial) → CRC32/Adler-32 can suffice.
  • 🔍 Tamper detection → cryptographic hashes (SHA-256+), combined with HMAC or a signature when you also need authentication.
  • 🔍 For downloads, publish file hashes using SHA-256 or stronger.
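One way to see why a CRC is not a tamper check: CRC-32 is affine over XOR, so for equal-length messages crc(a ⊕ b ⊕ c) = crc(a) ⊕ crc(b) ⊕ crc(c). That structure lets anyone predict and steer the checksum of deliberately altered data. The sketch below uses Python's `zlib.crc32` and `hashlib.sha256`; the sample messages are made up for illustration.

```python
import hashlib
import zlib

def crc32(data: bytes) -> int:
    return zlib.crc32(data) & 0xFFFFFFFF

def sha256_int(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def xor3(a: bytes, b: bytes, c: bytes) -> bytes:
    """Byte-wise XOR of three equal-length messages."""
    if not len(a) == len(b) == len(c):
        raise ValueError("messages must have equal length")
    return bytes(x ^ y ^ z for x, y, z in zip(a, b, c))

# Hypothetical 20-byte messages
a = b"amount=0100;to=alice"
b = b"amount=9100;to=mallo"
c = b"note=xxxxxxxxxxxxxxx"
forged = xor3(a, b, c)

# CRC-32's linear structure makes this hold for any equal-length inputs
print(crc32(forged) == crc32(a) ^ crc32(b) ^ crc32(c))  # True
# SHA-256 has no such structure; equality here would be astronomically unlikely
print(sha256_int(forged) == sha256_int(a) ^ sha256_int(b) ^ sha256_int(c))  # False
```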

🧰 HMAC / Signatures / Password Hashing

Who holds the secret and what you protect determines the tool.

Roles

  • 🧠 HMAC: shared secret; detects tampering but is forgeable if the secret leaks.
  • 🧠 Digital signature: public verification; private key protection and canonicalization matter (XML/JSON).
  • 🧠 Password hashing (KDF): PBKDF2/bcrypt/scrypt/Argon2 with salt, stretching, and often memory hardness.
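A minimal HMAC sketch using Python's `hmac` and `hashlib` modules. The key is a placeholder, and "canonical JSON" here simply means sorted keys and no optional whitespace via `json.dumps`; that is an assumption for illustration, not a full canonicalization standard (RFC 8785, the JSON Canonicalization Scheme, also pins down number formatting, for example).

```python
import hashlib
import hmac
import json

# Placeholder key for illustration; load the real shared secret from a secret store
SECRET_KEY = b"example-shared-secret"

def canonical_json(obj) -> bytes:
    # Assumed canonical form: sorted keys, no optional whitespace, UTF-8
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")

def sign(obj) -> str:
    return hmac.new(SECRET_KEY, canonical_json(obj), hashlib.sha256).hexdigest()

def verify(obj, tag: str) -> bool:
    # compare_digest avoids leaking, via timing, how many leading characters matched
    return hmac.compare_digest(sign(obj), tag)

tag = sign({"user": "alice", "amount": 100})
print(verify({"amount": 100, "user": "alice"}, tag))  # True: key order does not matter
print(verify({"user": "alice", "amount": 101}, tag))  # False: payload was altered
```

Without an agreed canonical form, two parties can serialize the same object to different bytes and get different tags; the same issue applies to signatures.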

Migration tips

  • 🔧 On successful login the plaintext password is available, so rehash it with the new scheme and replace the stored record; users migrate gradually.
  • 🔧 A pepper (app-held secret) adds resilience but needs rotation planning.
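A sketch of gradual migration using PBKDF2 from Python's standard library (`hashlib.pbkdf2_hmac`). The `scheme$...` storage format, the `legacy_sha256` scheme being retired, and the iteration count are assumptions for illustration; bcrypt, scrypt, or Argon2 would follow the same verify-then-upgrade pattern. A pepper is left out to keep it short.

```python
import hashlib
import hmac
import os

# Assumption: pick the cost for your hardware and check current guidance
ITERATIONS = 600_000

def hash_password(password: str, iterations: int = ITERATIONS) -> str:
    salt = os.urandom(16)  # fresh random salt per password
    dk = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"pbkdf2_sha256${iterations}${salt.hex()}${dk.hex()}"

def verify_password(password: str, stored: str) -> bool:
    scheme, *fields = stored.split("$")
    if scheme == "pbkdf2_sha256":
        iterations, salt_hex, dk_hex = fields
        dk = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations))
        return hmac.compare_digest(dk.hex(), dk_hex)
    if scheme == "legacy_sha256":
        # Hypothetical old scheme (unsalted SHA-256) that we are migrating away from
        (digest_hex,) = fields
        candidate = hashlib.sha256(password.encode("utf-8")).hexdigest()
        return hmac.compare_digest(candidate, digest_hex)
    raise ValueError(f"unknown scheme: {scheme}")

def needs_rehash(stored: str) -> bool:
    scheme, *fields = stored.split("$")
    return scheme != "pbkdf2_sha256" or int(fields[0]) < ITERATIONS

def login(password: str, stored: str) -> tuple[bool, str]:
    """Return (ok, record_to_store); upgrades the record only after a successful check."""
    if not verify_password(password, stored):
        return False, stored
    if needs_rehash(stored):
        stored = hash_password(password)  # caller persists the new record
    return True, stored
```

Only a successful login triggers the upgrade, because that is the only moment the plaintext password is available. Raising `ITERATIONS` later makes older PBKDF2 records upgrade the same way.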

🧾 Encoding and Normalization Pitfalls

Different encodings or input normalization can change the hash entirely.

Common gotchas

  • 📌 Hex case, `0x` prefixes, and separators (such as `:`) produce different strings for the same digest.
  • 📌 BASE64 vs BASE64URL, with or without `=` padding, and with or without line breaks (email/MIME tools often wrap).
  • 📌 Input newlines (LF vs CRLF) or Unicode normalization (NFC vs NFD) change the input bytes and therefore the hash.
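The sketch below, using only Python's standard library, shows one digest rendered several ways and two pairs of inputs that look identical on screen but hash differently. The sample strings are arbitrary.

```python
import base64
import hashlib
import unicodedata

digest = hashlib.sha256(b"hello world").digest()

# One 32-byte digest, several text forms: comparing these as strings fails
hex_lower = digest.hex()
hex_upper = hex_lower.upper()
hex_0x = "0x" + hex_lower
hex_colons = ":".join(f"{byte:02x}" for byte in digest)
b64 = base64.b64encode(digest).decode("ascii")  # standard alphabet, "=" padded
b64url = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")  # "-" and "_", no padding

print(len({hex_lower, hex_upper, hex_0x, hex_colons, b64, b64url}))  # 6 distinct strings

# Input-side differences: the text looks the same, but the bytes differ
lf = hashlib.sha256("line1\nline2".encode("utf-8")).hexdigest()
crlf = hashlib.sha256("line1\r\nline2".encode("utf-8")).hexdigest()
nfc = hashlib.sha256(unicodedata.normalize("NFC", "café").encode("utf-8")).hexdigest()
nfd = hashlib.sha256(unicodedata.normalize("NFD", "café").encode("utf-8")).hexdigest()
print(lf == crlf, nfc == nfd)  # False False
```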

Mitigations

  • 🛠️ Normalize input before hashing and document the policy.
  • 🛠️ Agree on one encoding (e.g., lowercase hex or BASE64URL) when exchanging hashes externally.
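One possible normalization policy, written as a sketch: the specific choices (CRLF/CR to LF, NFC, UTF-8, and lowercase hex or unpadded BASE64URL on output) are assumptions to adapt and document, and the function names are hypothetical. Being strict on output and lenient on input, then comparing bytes rather than strings, avoids most of the mismatches listed above.

```python
import base64
import hashlib
import unicodedata

def canonical_bytes(text: str) -> bytes:
    """Hypothetical policy: CRLF/CR -> LF, then Unicode NFC, then UTF-8."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return unicodedata.normalize("NFC", text).encode("utf-8")

def digest_hex(text: str) -> str:
    # Output convention: lowercase hex, no "0x" prefix, no separators
    return hashlib.sha256(canonical_bytes(text)).hexdigest()

def digest_b64url(text: str) -> str:
    # Output convention: BASE64URL without "=" padding or line breaks
    raw = hashlib.sha256(canonical_bytes(text)).digest()
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")

def parse_hex_digest(value: str) -> bytes:
    """Be lenient on input (case, 0x prefix, ':' '-' or ' ' separators) and compare as bytes."""
    value = value.strip().lower()
    if value.startswith("0x"):
        value = value[2:]
    return bytes.fromhex(value.replace(":", "").replace("-", "").replace(" ", ""))
```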

⏩ Streaming Hashes and Commands

Hash large files in chunks to avoid excessive memory use.

Practical points

  • 🚚 `hash_update` / `hash_file` (PHP) or `openssl dgst -sha256 file` support streaming.
  • 🚚 For many files, keep a manifest (filename + hash) for bulk verification.
  • 🚚 If you need partial verification, consider range-hashing designs for large storage.
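A streaming and manifest sketch in Python. `CHUNK_SIZE`, the manifest-as-dict format, and the per-block `range_hashes` scheme are illustrative assumptions; production range-hashing designs often build a Merkle tree over the block hashes rather than keeping a flat list.

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 1 << 20  # 1 MiB per read keeps memory use flat regardless of file size

def sha256_file(path: Path) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            h.update(chunk)  # incremental update: same result as hashing all bytes at once
    return h.hexdigest()

def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, '/'-separated) to its SHA-256."""
    return {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return paths that are missing or whose content no longer matches."""
    return [
        rel for rel, expected in manifest.items()
        if not (root / rel).is_file() or sha256_file(root / rel) != expected
    ]

def range_hashes(path: Path, block_size: int = CHUNK_SIZE) -> list[str]:
    """Hypothetical range scheme: one SHA-256 per fixed-size block, re-checkable individually."""
    with open(path, "rb") as f:
        return [hashlib.sha256(block).hexdigest() for block in iter(lambda: f.read(block_size), b"")]
```

If you write each manifest entry as `<hex digest>  <path>` (two spaces), the file matches the text-mode format of `sha256sum`, so `sha256sum -c` run from the root directory can check it too.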

📦 Content-Addressable Examples

Addressing by content improves reproducibility and caching.

Examples

  • 🧱 Git object IDs (SHA-1 → SHA-256 migration)
  • 🧱 IPFS/CAS content IDs (Base58/Base32 hash encodings)
  • 🧱 Docker layer digests (SHA-256)
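To make the idea concrete: Git's object ID for a file's contents is the hash of a short header (`blob`, a space, the size in bytes, and a NUL byte) followed by the bytes themselves, and a content-addressable store simply uses such a digest as its key. The `ContentStore` class below is a hypothetical toy, with keys in the `sha256:<hex>` form used by Docker/OCI digests.

```python
import hashlib

def git_blob_id(content: bytes, algo: str = "sha1") -> str:
    """Git-style blob ID: hash of the header b"blob <size>" plus a NUL byte, then the content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.new(algo, header + content).hexdigest()

class ContentStore:
    """Toy in-memory content-addressable store; keys look like 'sha256:<hex>'."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    @staticmethod
    def address(content: bytes) -> str:
        return "sha256:" + hashlib.sha256(content).hexdigest()

    def put(self, content: bytes) -> str:
        key = self.address(content)
        self._objects.setdefault(key, content)  # identical content is stored only once
        return key

    def get(self, key: str) -> bytes:
        content = self._objects[key]
        if self.address(content) != key:  # the address doubles as an integrity check
            raise ValueError("stored content does not match its address")
        return content

    def __len__(self) -> int:
        return len(self._objects)
```

For example, `git_blob_id(b"")` returns `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, the well-known ID of Git's empty blob. Storing the same bytes twice yields the same key, which is where the caching and deduplication benefits come from.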

❓ FAQ

Q. Is MD5 or SHA-1 acceptable?

  • A. Not where collision resistance matters: both have practical collisions. Restrict them to legacy or non-security contexts and plan a replacement.

Q. How many bits are safe?

  • A. SHA-256 (256 bits) is the common baseline. For long-term use or extra security margin, consider the SHA-512 family.