Storage 12 min read

JSON in Relational Databases (2026): The 'JSON Index' Era and Why You Still Need Columns

Learn when to use JSONB vs columns in PostgreSQL, SQL Server 2025's new JSON Index, SQLite JSONB, and the hybrid storage patterns that actually work in production.

#postgresql #jsonb #sql-server #sqlite #database #indexing

By 2026, the "Relational vs. Document" debate is effectively over. Major RDBMS engines have integrated binary JSON storage so deeply that the choice is no longer about capability—it's about engineering discipline. Here's the senior engineer's guide to doing it right.

TL;DR

  • JSON is now first-class in SQL: Native JSON Index (SQL Server 2025) and enhanced JSONB (Postgres/SQLite) have closed the performance gap
  • Don't "JSON-all-the-things": Use JSON for flexible metadata; use columns for core invariants, FKs, and high-frequency aggregations
  • Index selectively: GIN indexes are powerful but expensive to maintain on high-write tables
  • Schema-on-read is a trap: Validate your JSON on write using CHECK constraints

The State of JSON in Modern SQL (2026)

SQL Server 2025: The Native JSON Index

Microsoft's latest release introduces a native index for JSON paths. Instead of computed columns, you can now index JSON paths directly.

  • Benefit: Massive speedup for JSON_VALUE and JSON_PATH_EXISTS
  • Constraint: It's still an index—over-indexing high-churn JSON fields will kill your ingestion throughput
sql-server-json-index.sql
sql
-- SQL Server 2025: Native JSON path indexing
CREATE INDEX idx_order_customer 
ON orders (JSON_VALUE(data, '$.customerId'));

-- Query now uses the index
SELECT * FROM orders 
WHERE JSON_VALUE(data, '$.customerId') = 'cust_123';

PostgreSQL 17/18: Refined JSONB

Postgres continues to dominate the hybrid space:

  • GIN (Generalized Inverted Index): The "everything" index for JSON
  • Subscripting: data['user']['id'] syntax is now the standard
  • Partial Indexes: The "Senior Pro Tip"—don't index the whole JSONB blob
postgres-jsonb-patterns.sql
sql
-- Partial index: only index what you query
CREATE INDEX idx_active_orders ON orders ((data->>'status'))
WHERE data->>'status' IS NOT NULL;

-- Subscript syntax (Postgres 14+)
SELECT data['user']['email'] FROM users WHERE data['user']['verified'] = 'true';

-- GIN index for containment queries
CREATE INDEX idx_orders_gin ON orders USING GIN (data);
SELECT * FROM orders WHERE data @> '{"priority": "high"}';

SQLite: JSONB Everywhere

With the jsonb format in SQLite, edge and local-first apps can now store semi-structured data without the text-parsing overhead. This is huge for mobile and IoT storage in 2026.

sqlite-jsonb.sql
sql
-- SQLite JSONB (binary JSON storage)
CREATE TABLE events (
  id INTEGER PRIMARY KEY,
  data BLOB  -- Store as JSONB
);

-- Query with json_extract
SELECT json_extract(data, '$.event_type') FROM events
WHERE json_extract(data, '$.timestamp') > '2026-01-01';

The "Senior Engineer" Hybrid Storage Strategy

When to Use Columns (The "Rigid" Core)

  • Foreign Keys: You can't (efficiently) have a FK point to a value inside a JSON blob
  • Mandatory Fields: If every row must have a created_at or user_id, make them columns
  • Join Keys: If you're joining on it, columnize it
  • High-Frequency Aggregations: SUM(amount) is way faster on a column than SUM(data->>'amount')

When to Use JSON (The "Flexible" Appendages)

  • Extensibility: User-defined fields, integration metadata, legacy system snapshots
  • Polymorphic Data: When different rows have widely different sets of attributes (e.g., event logs)
  • Sparse Data: If a field is only present in 5% of rows, putting it in JSON saves storage and avoids "NULL-hell"
hybrid-table-design.sql
sql
-- The hybrid pattern: columns for core, JSON for flex
CREATE TABLE orders (
  -- Rigid core (indexed, FK-able, aggregatable)
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL REFERENCES users(id),
  status TEXT NOT NULL CHECK (status IN ('pending', 'paid', 'shipped', 'cancelled')),
  total_cents BIGINT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  
  -- Flexible appendages (extensible, sparse)
  metadata JSONB DEFAULT '{}',
  shipping_details JSONB,
  
  -- Constraints on JSON
  CONSTRAINT valid_metadata CHECK (jsonb_typeof(metadata) = 'object')
);

-- Index only the JSON paths you actually query
CREATE INDEX idx_orders_priority ON orders ((metadata->>'priority'))
WHERE metadata->>'priority' IS NOT NULL;

Security & Integrity Pitfalls

1. Data Integrity Decay

Without CHECK constraints, your JSON column will eventually contain garbage.

json-check-constraint.sql
sql
-- Basic structure validation
ALTER TABLE user_profiles
ADD CONSTRAINT valid_settings CHECK (
  jsonb_typeof(settings) = 'object'
  AND settings->>'theme' IN ('light', 'dark')
);

-- More complex validation (Postgres)
CREATE OR REPLACE FUNCTION validate_user_settings(settings JSONB)
RETURNS BOOLEAN AS $$
BEGIN
  RETURN (
    settings ? 'theme'
    AND settings->>'theme' IN ('light', 'dark')
    AND (NOT settings ? 'notifications' OR jsonb_typeof(settings->'notifications') = 'boolean')
  );
END;
$$ LANGUAGE plpgsql IMMUTABLE;

ALTER TABLE user_profiles
ADD CONSTRAINT valid_settings_v2 CHECK (validate_user_settings(settings));

2. Index Bloat

A GIN index on a large JSONB column can easily exceed the size of the table itself.

⚠️ Solution: Use specific B-Tree indexes on the most queried JSON paths instead of a blanket GIN index.
targeted-indexes.sql
sql
-- ❌ Don't: Blanket GIN index on everything
CREATE INDEX idx_orders_all ON orders USING GIN (data);

-- ✅ Do: Targeted indexes on specific paths
CREATE INDEX idx_orders_customer ON orders ((data->>'customer_id'));
CREATE INDEX idx_orders_status ON orders ((data->>'status'));

-- ✅ Do: Partial indexes for common queries
CREATE INDEX idx_orders_pending ON orders ((data->>'created_at'))
WHERE data->>'status' = 'pending';

3. Information Leakage

Storing "everything" in a JSON blob often leads to sensitive data (PII, tokens) being persisted where it shouldn't be.

Solution: Explicitly separate metadata_public and metadata_internal JSON columns. Apply column-level permissions.

Performance Checklist for 2026

  • Binary Format: Ensure you're using JSONB (Postgres) or native binary JSON (SQLite/SQL Server), not just TEXT
  • Partial Indexes: Index only what you query
  • Functional Indexes: Use CREATE INDEX idx ON orders ((data->>'user_id'))
  • Write Validation: Use CHECK constraints to enforce basic structure
  • Limit Depth: Deeply nested JSON (10+ levels) is a sign of poor modeling
  • De-normalization Audit: Regularly check if a frequently queried JSON key should be promoted to a column

Common Patterns That Work

Pattern: Event Sourcing with JSON Payloads

event-sourcing.sql
sql
CREATE TABLE events (
  id BIGSERIAL PRIMARY KEY,
  aggregate_id UUID NOT NULL,
  event_type TEXT NOT NULL,
  event_data JSONB NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  
  -- Index for replay
  CONSTRAINT valid_event_data CHECK (jsonb_typeof(event_data) = 'object')
);

CREATE INDEX idx_events_aggregate ON events (aggregate_id, created_at);
CREATE INDEX idx_events_type ON events (event_type, created_at);

Pattern: User-Defined Fields

user-defined-fields.sql
sql
CREATE TABLE contacts (
  id UUID PRIMARY KEY,
  -- Core fields (always present)
  email TEXT NOT NULL,
  name TEXT NOT NULL,
  
  -- User-defined custom fields
  custom_fields JSONB DEFAULT '{}',
  
  -- Validate custom fields structure
  CONSTRAINT valid_custom_fields CHECK (
    jsonb_typeof(custom_fields) = 'object'
    AND pg_column_size(custom_fields) < 65536  -- 64KB limit
  )
);

-- Let users query their custom fields
CREATE INDEX idx_contacts_custom ON contacts USING GIN (custom_fields);

Anti-Patterns to Avoid

Anti-Pattern: The "Everything Bag"

anti-pattern-everything-bag.sql
sql
-- ❌ Don't do this
CREATE TABLE entities (
  id UUID PRIMARY KEY,
  type TEXT,
  data JSONB  -- Everything goes here!
);

-- Problems:
-- 1. No type safety
-- 2. No FK constraints
-- 3. Impossible to index efficiently
-- 4. Schema drift guaranteed

Anti-Pattern: JSON for Relationships

anti-pattern-json-relationships.sql
sql
-- ❌ Don't store arrays of IDs in JSON
{
  "order_id": "ord_123",
  "product_ids": ["prod_1", "prod_2", "prod_3"]
}

-- ✅ Do use a proper junction table
CREATE TABLE order_items (
  order_id UUID REFERENCES orders(id),
  product_id UUID REFERENCES products(id),
  quantity INT NOT NULL,
  PRIMARY KEY (order_id, product_id)
);

References

Continue Learning

About the Author

AT

Adam Tse

Founder & Lead Developer · 10+ years experience

Full-stack engineer with 10+ years of experience building developer tools and APIs. Previously worked on data infrastructure at scale, processing billions of JSON documents daily. Passionate about creating privacy-first tools that don't compromise on functionality.

JavaScript/TypeScript Web Performance Developer Tools Data Processing