By 2026, the "Relational vs. Document" debate is effectively over. Major RDBMS engines have integrated binary JSON storage so deeply that the choice is no longer about capability—it's about engineering discipline. Here's the senior engineer's guide to doing it right.
TL;DR
- JSON is now first-class in SQL: Native JSON Index (SQL Server 2025) and enhanced JSONB (Postgres/SQLite) have closed the performance gap
- Don't "JSON-all-the-things": Use JSON for flexible metadata; use columns for core invariants, FKs, and high-frequency aggregations
- Index selectively: GIN indexes are powerful but expensive to maintain on high-write tables
- Schema-on-read is a trap: Validate your JSON on write using CHECK constraints
The State of JSON in Modern SQL (2026)
SQL Server 2025: The Native JSON Index
Microsoft's latest release introduces a native index for JSON paths. Instead of computed columns, you can now index JSON paths directly.
- Benefit: Massive speedup for
JSON_VALUEandJSON_PATH_EXISTS - Constraint: It's still an index—over-indexing high-churn JSON fields will kill your ingestion throughput
-- SQL Server 2025: Native JSON path indexing
CREATE INDEX idx_order_customer
ON orders (JSON_VALUE(data, '$.customerId'));
-- Query now uses the index
SELECT * FROM orders
WHERE JSON_VALUE(data, '$.customerId') = 'cust_123'; PostgreSQL 17/18: Refined JSONB
Postgres continues to dominate the hybrid space:
- GIN (Generalized Inverted Index): The "everything" index for JSON
- Subscripting:
data['user']['id']syntax is now the standard - Partial Indexes: The "Senior Pro Tip"—don't index the whole JSONB blob
-- Partial index: only index what you query
CREATE INDEX idx_active_orders ON orders ((data->>'status'))
WHERE data->>'status' IS NOT NULL;
-- Subscript syntax (Postgres 14+)
SELECT data['user']['email'] FROM users WHERE data['user']['verified'] = 'true';
-- GIN index for containment queries
CREATE INDEX idx_orders_gin ON orders USING GIN (data);
SELECT * FROM orders WHERE data @> '{"priority": "high"}'; SQLite: JSONB Everywhere
With the jsonb format in SQLite, edge and local-first apps can now store
semi-structured data without the text-parsing overhead. This is huge for mobile and IoT storage in 2026.
-- SQLite JSONB (binary JSON storage)
CREATE TABLE events (
id INTEGER PRIMARY KEY,
data BLOB -- Store as JSONB
);
-- Query with json_extract
SELECT json_extract(data, '$.event_type') FROM events
WHERE json_extract(data, '$.timestamp') > '2026-01-01'; The "Senior Engineer" Hybrid Storage Strategy
When to Use Columns (The "Rigid" Core)
- Foreign Keys: You can't (efficiently) have a FK point to a value inside a JSON blob
- Mandatory Fields: If every row must have a
created_atoruser_id, make them columns - Join Keys: If you're joining on it, columnize it
- High-Frequency Aggregations:
SUM(amount)is way faster on a column thanSUM(data->>'amount')
When to Use JSON (The "Flexible" Appendages)
- Extensibility: User-defined fields, integration metadata, legacy system snapshots
- Polymorphic Data: When different rows have widely different sets of attributes (e.g., event logs)
- Sparse Data: If a field is only present in 5% of rows, putting it in JSON saves storage and avoids "NULL-hell"
-- The hybrid pattern: columns for core, JSON for flex
CREATE TABLE orders (
-- Rigid core (indexed, FK-able, aggregatable)
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL REFERENCES users(id),
status TEXT NOT NULL CHECK (status IN ('pending', 'paid', 'shipped', 'cancelled')),
total_cents BIGINT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
-- Flexible appendages (extensible, sparse)
metadata JSONB DEFAULT '{}',
shipping_details JSONB,
-- Constraints on JSON
CONSTRAINT valid_metadata CHECK (jsonb_typeof(metadata) = 'object')
);
-- Index only the JSON paths you actually query
CREATE INDEX idx_orders_priority ON orders ((metadata->>'priority'))
WHERE metadata->>'priority' IS NOT NULL; Security & Integrity Pitfalls
1. Data Integrity Decay
Without CHECK constraints, your JSON column will eventually contain garbage.
-- Basic structure validation
ALTER TABLE user_profiles
ADD CONSTRAINT valid_settings CHECK (
jsonb_typeof(settings) = 'object'
AND settings->>'theme' IN ('light', 'dark')
);
-- More complex validation (Postgres)
CREATE OR REPLACE FUNCTION validate_user_settings(settings JSONB)
RETURNS BOOLEAN AS $$
BEGIN
RETURN (
settings ? 'theme'
AND settings->>'theme' IN ('light', 'dark')
AND (NOT settings ? 'notifications' OR jsonb_typeof(settings->'notifications') = 'boolean')
);
END;
$$ LANGUAGE plpgsql IMMUTABLE;
ALTER TABLE user_profiles
ADD CONSTRAINT valid_settings_v2 CHECK (validate_user_settings(settings)); 2. Index Bloat
A GIN index on a large JSONB column can easily exceed the size of the table itself.
-- ❌ Don't: Blanket GIN index on everything
CREATE INDEX idx_orders_all ON orders USING GIN (data);
-- ✅ Do: Targeted indexes on specific paths
CREATE INDEX idx_orders_customer ON orders ((data->>'customer_id'));
CREATE INDEX idx_orders_status ON orders ((data->>'status'));
-- ✅ Do: Partial indexes for common queries
CREATE INDEX idx_orders_pending ON orders ((data->>'created_at'))
WHERE data->>'status' = 'pending'; 3. Information Leakage
Storing "everything" in a JSON blob often leads to sensitive data (PII, tokens) being persisted where it shouldn't be.
metadata_public and
metadata_internal JSON columns. Apply column-level permissions.
Performance Checklist for 2026
- ☐ Binary Format: Ensure you're using
JSONB(Postgres) or native binary JSON (SQLite/SQL Server), not justTEXT - ☐ Partial Indexes: Index only what you query
- ☐ Functional Indexes: Use
CREATE INDEX idx ON orders ((data->>'user_id')) - ☐ Write Validation: Use
CHECKconstraints to enforce basic structure - ☐ Limit Depth: Deeply nested JSON (10+ levels) is a sign of poor modeling
- ☐ De-normalization Audit: Regularly check if a frequently queried JSON key should be promoted to a column
Common Patterns That Work
Pattern: Event Sourcing with JSON Payloads
CREATE TABLE events (
id BIGSERIAL PRIMARY KEY,
aggregate_id UUID NOT NULL,
event_type TEXT NOT NULL,
event_data JSONB NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
-- Index for replay
CONSTRAINT valid_event_data CHECK (jsonb_typeof(event_data) = 'object')
);
CREATE INDEX idx_events_aggregate ON events (aggregate_id, created_at);
CREATE INDEX idx_events_type ON events (event_type, created_at); Pattern: User-Defined Fields
CREATE TABLE contacts (
id UUID PRIMARY KEY,
-- Core fields (always present)
email TEXT NOT NULL,
name TEXT NOT NULL,
-- User-defined custom fields
custom_fields JSONB DEFAULT '{}',
-- Validate custom fields structure
CONSTRAINT valid_custom_fields CHECK (
jsonb_typeof(custom_fields) = 'object'
AND pg_column_size(custom_fields) < 65536 -- 64KB limit
)
);
-- Let users query their custom fields
CREATE INDEX idx_contacts_custom ON contacts USING GIN (custom_fields); Anti-Patterns to Avoid
Anti-Pattern: The "Everything Bag"
-- ❌ Don't do this
CREATE TABLE entities (
id UUID PRIMARY KEY,
type TEXT,
data JSONB -- Everything goes here!
);
-- Problems:
-- 1. No type safety
-- 2. No FK constraints
-- 3. Impossible to index efficiently
-- 4. Schema drift guaranteed Anti-Pattern: JSON for Relationships
-- ❌ Don't store arrays of IDs in JSON
{
"order_id": "ord_123",
"product_ids": ["prod_1", "prod_2", "prod_3"]
}
-- ✅ Do use a proper junction table
CREATE TABLE order_items (
order_id UUID REFERENCES orders(id),
product_id UUID REFERENCES products(id),
quantity INT NOT NULL,
PRIMARY KEY (order_id, product_id)
); References
Continue Learning
- JSON at the Edge: Local-First Storage — SQLite JSONB for offline-first apps
- Big JSON Storage & Compression — Zstd, Parquet, and data lakes
- JSON Schema Guide — Validate before you store
- JSON Tools — Format and validate JSON online