TL;DR
- Problem: JSON.parse() loads the entire file into memory and crashes on large files
- Solution: Use a streaming parser such as stream-json or bfj
- Best for logs: NDJSON (Newline Delimited JSON), one object per line
- Rule of thumb: If file > 100MB, always stream
- Memory savings: From 2GB+ to ~50MB for a 500MB file
The "Heap Out of Memory" Problem
You've probably seen this error at 3 AM when your production server decides to give up:
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
<--- Last few GCs --->
[12345:0x5555555] 12000 ms: Mark-sweep 1398.2 (1425.6) -> 1398.0 (1425.6) MB,
1520.0 / 0.0 ms (average mu = 0.089, current mu = 0.002)
This happens because JSON.parse() is synchronous and greedy.
It reads the entire file into memory, parses it all at once, and then hands you the result.
For a 500MB JSON file, that means 1-2GB of RAM just for parsing: the raw string and the full parsed object graph both have to fit on the heap at the same time.
The Naive Approach (Don't Do This)
Here's what most tutorials show you — and what will eventually break in production:
const fs = require('fs');
// ❌ This loads the ENTIRE file into memory
const data = fs.readFileSync('massive-file.json', 'utf8');
const parsed = JSON.parse(data);
// By the time you get here, you've already used 2GB of RAM
parsed.forEach(item => processItem(item));

This works fine for files under 50MB. Beyond that, you're playing Russian roulette with your server's memory.
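If you want to see the cost yourself, wrap the naive version in a quick heap measurement with Node's built-in process.memoryUsage(). A minimal sketch (the file name is a placeholder):

const fs = require('fs');

// Report heap usage in MB before and after the naive parse
const mb = (bytes) => Math.round(bytes / 1024 / 1024);
console.log(`Heap before: ${mb(process.memoryUsage().heapUsed)} MB`);

const text = fs.readFileSync('massive-file.json', 'utf8'); // whole file held as one string
const data = JSON.parse(text);                             // plus the full object graph
console.log(`Parsed ${data.length} items, heap now ${mb(process.memoryUsage().heapUsed)} MB`);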
The Streaming Solution
Streaming parsers read the file in chunks, parse incrementally, and emit objects one at a time. Your memory usage stays constant regardless of file size.
Option 1: stream-json (Most Popular)
stream-json is the gold standard for streaming JSON in Node.js. It handles nested structures, arrays, and complex objects.
npm install stream-json

const fs = require('fs');
const { parser } = require('stream-json');
const { streamArray } = require('stream-json/streamers/StreamArray');
const { chain } = require('stream-chain');
// ✅ Process a massive array of objects with constant memory
const pipeline = chain([
fs.createReadStream('massive-file.json'),
parser(),
streamArray(),
]);
let count = 0;
pipeline.on('data', ({ key, value }) => {
// 'value' is a single parsed object from the array
processItem(value);
count++;
if (count % 10000 === 0) {
console.log(`Processed ${count} items...`);
}
});
pipeline.on('end', () => {
console.log(`Done! Processed ${count} items total.`);
});
pipeline.on('error', (err) => {
console.error('Parsing error:', err);
});

- JSON.parse() on a 500MB file: ~2GB RAM
- stream-json on a 500MB file: ~50MB RAM (constant)
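One note on shape: streamArray() expects the top-level JSON value to be an array. If your records live under a key instead, stream-json's Pick filter can select that subtree before streaming it. A sketch, assuming a file shaped like {"results": [ ... ]} (the results key is made up for illustration):

const fs = require('fs');
const { parser } = require('stream-json');
const { pick } = require('stream-json/filters/Pick');
const { streamArray } = require('stream-json/streamers/StreamArray');
const { chain } = require('stream-chain');

// Pick the "results" array, then stream its elements one at a time
const pipeline = chain([
  fs.createReadStream('massive-file.json'),
  parser(),
  pick({ filter: 'results' }),
  streamArray(),
]);

pipeline.on('data', ({ value }) => processItem(value));
pipeline.on('error', (err) => console.error('Parsing error:', err));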
Option 2: bfj (Big Friendly JSON)
bfj provides a simpler API if you just need to read or write large JSON files.
const bfj = require('bfj');
const fs = require('fs');
// bfj.walk() emits low-level parse events (named on bfj.events),
// not assembled objects
const emitter = bfj.walk(fs.createReadStream('massive-file.json'));

emitter.on(bfj.events.object, () => {
  // a new object has started
});
emitter.on(bfj.events.property, (name) => {
  // property name inside the current object
});
emitter.on(bfj.events.string, (value) => {
  // a string value; bfj.events.number and bfj.events.literal cover the rest
});
emitter.on(bfj.events.error, (err) => {
  console.error('Parsing error:', err);
});
emitter.on(bfj.events.end, () => {
  console.log('Done parsing!');
});
// To receive whole items one at a time, stream-json's StreamArray (above)
// is more convenient; bfj also offers bfj.match() for selecting items
// Or use the promise-based API for simpler cases
async function readLargeFile() {
const data = await bfj.read('massive-file.json');
// Note: This still loads into memory, but does so asynchronously
// For true streaming, use bfj.walk() or bfj.match()
return data;
}

NDJSON: The Better Format for Large Data
If you control the data format, NDJSON (Newline Delimited JSON) is the way to go. Instead of one giant array, you have one JSON object per line:
{"id": 1, "name": "Alice", "score": 95}
{"id": 2, "name": "Bob", "score": 87}
{"id": 3, "name": "Charlie", "score": 92}
{"id": 4, "name": "Diana", "score": 88}

Why is this better? Because you can process it line by line, parsing one small object at a time instead of one giant document:
const fs = require('fs');
const readline = require('readline');
async function processNDJSON(filename) {
const fileStream = fs.createReadStream(filename);
const rl = readline.createInterface({
input: fileStream,
crlfDelay: Infinity
});
let count = 0;
for await (const line of rl) {
if (line.trim()) {
const obj = JSON.parse(line);
processItem(obj);
count++;
}
}
console.log(`Processed ${count} records`);
}
processNDJSON('data.ndjson');

- Each line is independent, so you can process lines in parallel
- Easy to append new records (just add a line; see the snippet after this list)
- Simple error recovery — skip bad lines, continue processing
- Used by: Elasticsearch, BigQuery, many logging systems
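Appending, for example, is just a file append: serialize the record, add a newline, and you're done (a minimal sketch with a made-up record):

const fs = require('fs');

// Append one record without touching anything already in the file
const record = { id: 5, name: 'Eve', score: 91 };
fs.appendFileSync('data.ndjson', JSON.stringify(record) + '\n');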
Converting JSON Array to NDJSON
const fs = require('fs');
const { parser } = require('stream-json');
const { streamArray } = require('stream-json/streamers/StreamArray');
const { chain } = require('stream-chain');
// Convert large JSON array to NDJSON
const input = chain([
fs.createReadStream('input.json'),
parser(),
streamArray(),
]);
const output = fs.createWriteStream('output.ndjson');
input.on('data', ({ value }) => {
output.write(JSON.stringify(value) + '\n');
});
input.on('end', () => {
output.end();
console.log('Conversion complete!');
});

Parallel Processing with Worker Threads
For CPU-intensive processing, combine streaming with Worker Threads:
const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');
const fs = require('fs');
const readline = require('readline');
if (isMainThread) {
// Main thread: distribute work to workers
const NUM_WORKERS = 4;
const workers = [];
let lineCount = 0;
for (let i = 0; i < NUM_WORKERS; i++) {
workers.push(new Worker(__filename));
}
const rl = readline.createInterface({
input: fs.createReadStream('huge-data.ndjson'),
crlfDelay: Infinity
});
rl.on('line', (line) => {
// Round-robin distribution to workers
const workerIndex = lineCount % NUM_WORKERS;
workers[workerIndex].postMessage(line);
lineCount++;
});
rl.on('close', () => {
workers.forEach(w => w.postMessage('DONE'));
});
} else {
// Worker thread: process items
parentPort.on('message', (line) => {
if (line === 'DONE') {
process.exit(0);
}
const obj = JSON.parse(line);
// Heavy processing here...
const result = expensiveOperation(obj);
parentPort.postMessage(result);
});
}

Performance Benchmarks
Real-world benchmarks on a 500MB JSON file (1 million records):
| Method | Time | Peak Memory | Notes |
|---|---|---|---|
| JSON.parse() | 8.2s | 2.1 GB | Crashes on default heap |
| stream-json | 12.5s | 52 MB | Constant memory |
| bfj.walk() | 14.1s | 48 MB | Simpler API |
| NDJSON + readline | 6.8s | 35 MB | Fastest, if you control format |
| NDJSON + 4 Workers | 2.1s | 180 MB | Best for CPU-heavy work |
When to Use What
- File < 50MB? → Just use JSON.parse(), you're fine
- File 50-500MB? → Use stream-json for safety
- File > 500MB? → NDJSON if possible, otherwise stream-json
- Real-time logs? → Always NDJSON
- Need parallel processing? → NDJSON + Worker Threads
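If you want that rule of thumb in code, a quick size check is enough to pick a strategy up front. A sketch using the thresholds above (chooseStrategy is a made-up helper name):

const fs = require('fs');

// Decide how to parse a file based on its size on disk
function chooseStrategy(filename) {
  const sizeMB = fs.statSync(filename).size / (1024 * 1024);
  if (sizeMB < 50) return 'JSON.parse';
  if (sizeMB <= 500) return 'stream-json';
  return 'ndjson-or-stream-json';
}

console.log(chooseStrategy('massive-file.json'));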
Common Pitfalls
1. Forgetting Backpressure
If you're writing to a database or file while streaming, you need to handle backpressure:
const pipeline = chain([
fs.createReadStream('data.json'),
parser(),
streamArray(),
]);
pipeline.on('data', async ({ value }) => {
// ❌ BAD: This doesn't wait, can overwhelm the database
db.insert(value);
});
// ✅ GOOD: Use a transform stream with proper async handling
const { Transform } = require('stream');
const dbWriter = new Transform({
objectMode: true,
async transform(chunk, encoding, callback) {
try {
await db.insert(chunk.value);
callback();
} catch (err) {
callback(err);
}
}
});
pipeline.pipe(dbWriter);
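One caveat with .pipe(): it doesn't propagate errors between stages, so a failure in the read stream won't tear down dbWriter. Node's built-in stream/promises pipeline() (Node 15+) handles error propagation and cleanup for you. A sketch of the same idea with a Writable sink; db.insert is the same placeholder as above:

const fs = require('fs');
const { Writable } = require('stream');
const { pipeline } = require('stream/promises');
const { parser } = require('stream-json');
const { streamArray } = require('stream-json/streamers/StreamArray');

// Backpressure still works: the sink only calls callback() once the insert finishes
const dbSink = new Writable({
  objectMode: true,
  write({ value }, _encoding, callback) {
    db.insert(value).then(() => callback(), callback);
  },
});

async function importFile() {
  // pipeline() destroys every stage and rejects if any of them errors
  await pipeline(
    fs.createReadStream('data.json'),
    parser(),
    streamArray(),
    dbSink
  );
  console.log('All records written');
}

importFile().catch((err) => console.error('Import failed:', err));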
2. Not Handling Partial Parses

With NDJSON, a line might be incomplete if the file is still being written to:
for await (const line of rl) {
if (!line.trim()) continue;
try {
const obj = JSON.parse(line);
processItem(obj);
} catch (err) {
// Log the error but continue processing
console.error('Skipping malformed line:', line.substring(0, 100));
}
}

Production Tips
- Monitor memory: Use process.memoryUsage() to track heap usage
- Set heap limits explicitly: node --max-old-space-size=4096 script.js
- Use compression: GZIP your JSON files; streaming works with compressed files too (see the sketch after this list)
- Consider alternatives: For truly massive datasets, look at Parquet, Avro, or databases
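The compression tip composes with everything above: pipe the read stream through zlib.createGunzip() and the rest of the pipeline stays the same. A minimal sketch, assuming a gzipped NDJSON file named logs.ndjson.gz:

const fs = require('fs');
const zlib = require('zlib');
const readline = require('readline');

// Decompress on the fly, then read line by line as before
const input = fs.createReadStream('logs.ndjson.gz').pipe(zlib.createGunzip());
const rl = readline.createInterface({ input, crlfDelay: Infinity });

rl.on('line', (line) => {
  if (!line.trim()) return;
  processItem(JSON.parse(line)); // same placeholder as in the earlier examples
});

rl.on('close', () => console.log('Done!'));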
What's Next?
Now you can handle JSON files of any size without breaking a sweat. Here's where to go next:
- Validate your streaming data with JSON Schema
- Master JSON.parse() edge cases
- Debug JSON parsing errors
- Try our JSON tools — format and validate JSON instantly
Go stream some data. Your server's RAM will thank you.