Choosing data formats
“Just use CSV or JSON” is common advice, but each format has its own sweet spots. Here’s a quick guide for data exchange, storage, and analytics.
When you need tabular data
For rows and columns, pick text or columnar binary based on how you’ll process it.
CSV / TSV
- Human-readable text. CSV separates values with commas; TSV uses tabs, which avoids quoting when values contain commas.
- Escaping and embedded newlines are the usual pain points in CSV; TSV sidesteps commas but still needs care when values contain tabs or newlines.
- Great for ad hoc exchange and spreadsheets; you still need to share the schema separately.
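The escaping pitfalls above are exactly what a real CSV library handles for you. A minimal sketch using Python's standard `csv` module, with made-up sample values, showing that quoted commas and embedded newlines round-trip correctly where a naive `split(",")` would break:

```python
import csv
import io

# Sample rows: the second value contains both a comma and a newline,
# the two classic failure modes of hand-rolled CSV parsing.
rows = [
    ["name", "note"],
    ["Ada", "likes commas, and\nnewlines"],
]

# The csv writer quotes fields that contain delimiters or newlines.
buf = io.StringIO()
csv.writer(buf).writerows(rows)

# Reading back recovers the original values, quotes and all.
parsed = list(csv.reader(io.StringIO(buf.getvalue())))
assert parsed == rows
```

The same module reads TSV by passing `delimiter="\t"`; the quoting machinery works identically.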
Parquet / Arrow
- Columnar binary with compression/encoding; fast for analytics and carries types.
- Standard in big data stacks, with broad support in Spark, Presto, BigQuery, and similar engines; Arrow also defines a standard in-memory columnar layout.
- Not human-readable, but Parquet is a good fit for long-term storage and analysis because the schema travels with the data.
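To see why columnar layout speeds up analytics, here is a toy pure-Python sketch (not Parquet itself, which adds compression, encodings, and an on-disk format on top of this idea) with invented sample data:

```python
# Toy illustration of columnar layout: pivot row-oriented records
# into one list per column, so a query touches only the columns
# it needs instead of every field of every record.
rows = [
    {"city": "Oslo", "temp_c": 4},
    {"city": "Lima", "temp_c": 19},
    {"city": "Pune", "temp_c": 28},
]

# Row store -> column store.
columns = {key: [r[key] for r in rows] for key in rows[0]}

# An aggregate now scans a single contiguous list; a columnar file
# would likewise read just the "temp_c" column from disk.
avg_temp = sum(columns["temp_c"]) / len(columns["temp_c"])
assert avg_temp == 17.0
```

In practice you would use a library such as `pyarrow` to read and write real Parquet files rather than rolling this by hand.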
When you need flexible structure
For nested objects and arrays, JSON variants are a good fit.
JSON / JSONL
- Key/value with nesting; common for APIs and configs.
- JSONL (one object per line) works well for streaming and incremental processing.
- Types are loose; pair with a schema (e.g., JSON Schema) to catch breaking changes.
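The one-object-per-line property is what makes JSONL stream-friendly. A minimal sketch with Python's standard `json` module and made-up event records, writing JSONL and reading it back one record at a time:

```python
import json
import io

# Sample events; each becomes one line of JSONL.
events = [
    {"event": "login", "user": "ada"},
    {"event": "click", "user": "ada", "target": "save"},
]

# Write: one compact JSON object per line.
buf = io.StringIO()
for e in events:
    buf.write(json.dumps(e) + "\n")

# Read: decode line by line, so a consumer never needs the
# whole file in memory and can resume after a partial read.
decoded = [json.loads(line) for line in io.StringIO(buf.getvalue())]
assert decoded == events
```

With a real file, the same loop over `open(path)` processes records incrementally, which is why JSONL suits logs and streaming pipelines.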
Quick guidance by use case
Balance readability, size, speed, and tool support.
Recommendations
- Human-friendly editing: TSV/CSV (mind escaping).
- APIs, configs, logs: JSON/JSONL (manage schema separately).
- Analytics and large datasets: columnar binaries like Parquet/Arrow with schema.