By Shyam Verma

Building a Rendering Proxy under 1,000 Lines of Code

When traditional crawlers hit modern JavaScript-heavy websites, they see nothing but empty divs and loading spinners. After dealing with this problem across multiple client projects, I built JS-Rendering-Proxy-Docker – a containerized solution that makes any crawler JavaScript-aware without changing a single line of existing code.

The result? A 1,000-line proxy that's been battle-tested across production environments, handling everything from React SPAs to complex e-commerce sites. Here's the technical breakdown of how it works and why it matters.

The Problem: When Your Crawler Sees Empty Pages

Picture this: You've built a sophisticated web scraping system for a client's competitive intelligence platform. Everything works perfectly until they want to scrape a React-based e-commerce site. Your crawler makes the request and gets this:

<!DOCTYPE html>
<html><body><div id="root"></div><script src="app.js"></script></body></html>

Meanwhile, customers see a fully loaded page with products, prices, reviews – everything your client needs. The JavaScript renders after page load, but your traditional HTTP client never sees it.

This exact scenario happened to me three times in 2024. Each time, the "solution" was to rewrite the entire crawler using headless browsers. Expensive, time-consuming, and honestly, overkill.

The Insight: Transparent Proxy Architecture

Instead of rebuilding crawlers, what if we could add JavaScript rendering as a transparent layer? Your existing crawler keeps working exactly as before, but requests get intercepted, JavaScript gets executed, and fully rendered HTML comes back.

That's the core insight behind JS-Rendering-Proxy-Docker: augment, don't replace.
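In practice, the only thing an existing crawler changes is the URL it requests. Here is a minimal sketch of the calling side, assuming the proxy listens on localhost:3000 (the port is an assumption) and takes the target page via the render_url query parameter used throughout the snippets below:

// Crawler side: the same GET as before, just pointed at the proxy
// (localhost:3000 is an assumed address; render_url carries the real target)
const target = 'https://shop.example.com/products'
const res = await fetch(`http://localhost:3000/?render_url=${encodeURIComponent(target)}`)
const html = await res.text() // fully rendered DOM, not an empty <div id="root">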

Building Block by Block: The Technical Journey

Version 1.0: The Minimum Viable Proxy

I started with a simple Fastify server + Puppeteer combination:

// The core concept in ~50 lines
app.get('/', async (request, reply) => {
  const targetUrl = request.query.render_url
  const page = await browser.newPage()
  await page.goto(targetUrl, { waitUntil: 'networkidle0' })
  const html = await page.content()
  await page.close()
  return html
})

This worked for basic SPAs, but production taught me harsh lessons quickly.

The Performance Problem: Version 1.4.x Series

In early March 2025, I hit the first major bottleneck. The proxy was spawning Chrome processes that never died, consuming memory until the container crashed. The commit history tells the story:

March 25, 2025: "Fix: Update browser launch settings... disable piping and specify executable path. Add delay for page stability and improve error handling"

This was the moment I learned that Puppeteer process management isn't just about await page.close(). You need zombie process detection, memory monitoring, and graceful degradation.

The Resource Optimization: Version 1.5.x Series

By late March, I was running concurrent load tests and discovered another bottleneck: pages were downloading far more than the crawler ever needed. Most crawlers don't need images, CSS, or media files – they just need the rendered DOM structure.

March 31, 2025: "Add Chrome cache size configurations... Adjust maxWorkers from 4 to 2 for better resource management"

The breakthrough was implementing selective resource blocking:

// Interception must be enabled before abort()/continue() take effect
await page.setRequestInterception(true)

page.on('request', (req) => {
  const resourceType = req.resourceType()
  if (['image', 'stylesheet', 'media', 'font'].includes(resourceType)) {
    req.abort() // Block non-essential resources
  } else {
    req.continue()
  }
})

This single change reduced rendering time from 8-10 seconds to 2-3 seconds per page.

The Reliability Push: Version 1.7.x Series

March-April 2025 was all about production reliability. Real-world usage revealed edge cases I'd never considered:

April 1, 2025: "Introduced average CPU usage calculation over a 10-second window... Adjusted logging to reflect average CPU usage instead of instantaneous values"

This commit represents a key insight: monitoring must be averaged, not instantaneous. CPU spikes are normal during page rendering, but sustained high usage indicates problems.

The helper.js file emerged during this phase:

// Real-time system monitoring
function getCPUUsage() {
  const cpuStat = fs.readFileSync('/sys/fs/cgroup/cpu.stat', 'utf8')
  // Parse cgroup v2 metrics for accurate CPU usage
}

function cleanupChromeProcesses() {
  // Identify and terminate zombie Chrome processes
}
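To make the first stub concrete, here is a minimal sketch of a cgroup v2 read, assuming a Linux container with cgroup v2 mounted at /sys/fs/cgroup; the real helper.js may differ in detail. Usage has to be computed as a delta between samples, which is also what feeds the 10-second averaging mentioned above:

const fs = require('fs')
const os = require('os')

let lastSample = null

// Cumulative CPU time consumed by the container, in microseconds (cgroup v2)
function readCpuUsec() {
  const stat = fs.readFileSync('/sys/fs/cgroup/cpu.stat', 'utf8')
  const line = stat.split('\n').find((l) => l.startsWith('usage_usec'))
  return Number(line.split(' ')[1])
}

// Approximate CPU usage (%) since the previous call, normalized across cores
// Note: os.cpus() reports host cores, not the cgroup CPU quota
function getCPUUsage() {
  const now = { usec: readCpuUsec(), at: Date.now() }
  if (!lastSample) { lastSample = now; return 0 }
  const cpuDelta = now.usec - lastSample.usec        // CPU-µs burned
  const wallDelta = (now.at - lastSample.at) * 1000  // wall-clock µs elapsed
  lastSample = now
  return Math.min(100, (cpuDelta / (wallDelta * os.cpus().length)) * 100)
}

// Average one-per-second samples to smooth out normal render-time spikes
function averageCPUUsage(samples) {
  return samples.reduce((sum, s) => sum + s, 0) / Math.max(samples.length, 1)
}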

The Architecture That Emerged

After 50+ commits and real-world battle-testing, here's what the final architecture looks like:

Request Flow

Client → Authentication → URL Validation → Resource Monitoring 
→ Chrome Launch → Page Render → Content Extract → Response

Key Components

Security Layer: SSRF protection blocks private IP ranges. I learned this the hard way when someone tried to scan internal networks through the proxy.

Resource Manager: CPU/memory monitoring with circuit breaker functionality. The system backs off when under pressure rather than crashing.

Browser Pool: Smart Chrome process management with cleanup routines for zombie processes.

Selective Rendering: Blocks images/CSS by default but allows override via headers.
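The SSRF check is conceptually simple: resolve the hostname first, then refuse to render anything that lands in a loopback, link-local, or private range. A minimal sketch of that idea (not the project's exact implementation; IPv6 targets are simply rejected here):

const dns = require('dns').promises
const net = require('net')

// Loopback, link-local, and RFC 1918 private IPv4 ranges
function isPrivateIPv4(ip) {
  const [a, b] = ip.split('.').map(Number)
  return (
    a === 10 ||
    a === 127 ||
    (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  )
}

async function assertSafeUrl(rawUrl) {
  const { protocol, hostname } = new URL(rawUrl)
  if (!['http:', 'https:'].includes(protocol)) throw new Error('Unsupported protocol')
  const { address, family } = await dns.lookup(hostname)
  if (family !== 4 || !net.isIPv4(address) || isPrivateIPv4(address)) {
    throw new Error('Refusing to render private or non-IPv4 address')
  }
}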

Docker: The Game-Changer

Containerization solved the notorious "headless browser dependency hell" problem. The Dockerfile handles everything:

FROM node:18-slim
RUN apt-get update && apt-get install -y \
    chromium \
    fonts-liberation \
    fonts-noto-color-emoji \
    libnss3 \
    libxss1

ENV PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium

Multi-architecture support (AMD64 + ARM64) means it runs anywhere Docker runs. No more "works on my machine" problems.
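Getting it running locally is then the usual two commands; the image name and port below are placeholders, not the published image:

docker build -t js-rendering-proxy .
docker run --rm -p 3000:3000 js-rendering-proxy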

The Lessons: What Production Taught Me

Resource Management is Everything

Chrome processes are resource-hungry beasts. Without proper cleanup and monitoring, they'll consume all available memory and crash your system. The helper functions that monitor CPU usage and clean up zombie processes aren't nice-to-haves – they're essential.

Performance vs Completeness Tradeoffs

Blocking CSS/images improves performance by 75%, but occasionally breaks sites whose JavaScript depends on stylesheets actually loading. The default configuration works for 95% of cases, but you need an override mechanism for the edge cases.
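The override can be as simple as an opt-out request header checked before interception is enabled. A sketch with a hypothetical header name (the project may use a different one):

// Hypothetical header: x-render-full-resources: true disables blocking for this request
const loadEverything = request.headers['x-render-full-resources'] === 'true'

if (!loadEverything) {
  await page.setRequestInterception(true)
  page.on('request', handleResourceBlocking)
}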

Error Transparency Matters

The proxy preserves original HTTP status codes (404, 500, etc.) so your crawler's error handling logic still works. This seems obvious, but many proxy solutions mask these crucial error signals.
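Puppeteer makes this straightforward because page.goto() resolves to the main document's response, so the upstream status can be copied onto the proxy's reply. A minimal sketch:

const response = await page.goto(targetUrl, { waitUntil: 'networkidle0', timeout: 10000 })
const html = await page.content()

// Mirror the upstream status (404, 500, ...) so the crawler's error handling still fires
reply.code(response ? response.status() : 502)
return html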

Stealth Mode Has Limits

The puppeteer-extra-plugin-stealth helps with basic bot detection, but sophisticated sites can still identify automated requests. It's an arms race, not a silver bullet.
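Wiring it in takes two lines with puppeteer-extra, as sketched below; just don't expect it to defeat fingerprinting-heavy targets:

const puppeteer = require('puppeteer-extra')
const StealthPlugin = require('puppeteer-extra-plugin-stealth')

// Patches common headless giveaways (navigator.webdriver, missing plugins, etc.)
puppeteer.use(StealthPlugin())

const browser = await puppeteer.launch({ headless: 'new' })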

Why This Matters: The Bigger Picture

This project represents more than solving a technical problem – it's about adaptation strategy for the modern web. As websites become increasingly JavaScript-dependent, our tooling must evolve accordingly.

The transparent proxy approach strikes the right balance: adding JavaScript rendering capabilities without requiring infrastructure overhauls. It makes modern web scraping accessible to teams with existing crawler investments.

Technical Deep-Dive: The Code That Matters

For developers wanting to understand the implementation, here are the critical pieces:

Browser Management

let browser // shared across requests

async function getBrowser() {
  // Reuse one Chrome instance; relaunch only if it has crashed or disconnected
  if (!browser || !browser.isConnected()) {
    browser = await puppeteer.launch({
      headless: 'new',
      args: [
        '--no-sandbox',            // required when running as root inside a container
        '--disable-setuid-sandbox',
        '--disable-dev-shm-usage', // /dev/shm is tiny in Docker; write shared memory to /tmp
        '--disable-background-timer-throttling',
        '--max-old-space-size=512'
      ]
    })
  }
  return browser
}

Resource Monitoring

function isUnderPressure() {
  const cpuUsage = getCPUUsage()
  const memUsage = getMemoryUsage()
  return cpuUsage > 80 || memUsage > 85
}
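getMemoryUsage() has a cgroup v2 counterpart to the CPU helper above. A minimal sketch, assuming memory.current and memory.max are readable inside the container and fs is already required:

// Percentage of the container's memory limit currently in use (cgroup v2)
function getMemoryUsage() {
  const current = Number(fs.readFileSync('/sys/fs/cgroup/memory.current', 'utf8'))
  const max = fs.readFileSync('/sys/fs/cgroup/memory.max', 'utf8').trim()
  if (max === 'max') return 0 // no limit configured; treat as unconstrained
  return (current / Number(max)) * 100
}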

Request Processing

app.get('/', async (request, reply) => {
  if (isUnderPressure()) {
    return reply.code(503).send({ error: 'Service temporarily unavailable' })
  }

  const targetUrl = request.query.render_url
  const browser = await getBrowser()
  const page = await browser.newPage()
  await page.setRequestInterception(true)

  page.on('request', handleResourceBlocking)

  try {
    await page.goto(targetUrl, { waitUntil: 'networkidle0', timeout: 10000 })
    return await page.content()
  } finally {
    await page.close() // always release the tab, even when navigation fails
  }
})

Production Deployment: Digital Ocean at Scale

Infrastructure: Deployed on Digital Ocean's App Platform with load balancer and auto-scaling Docker container pool
Scaling Configuration: Minimum 2 instances, maximum 8 instances, automatically scaling based on CPU utilization thresholds
Typical Load: Runs 3-4 instances during peak hours, scaling up for traffic spikes and down during low-demand periods
Cost Optimization: Auto-scaling ensures optimal resource usage while maintaining performance during traffic variations
Reliability: Containerized architecture provides consistent performance across instances with Digital Ocean's health checking and automatic failover
Production Metrics: Maintained 99.8% uptime over three months with zero manual intervention required

The Bottom Line

1,000 lines of code. 50+ production commits. Zero maintenance overhead after stabilization.

JS-Rendering-Proxy-Docker proves that sometimes the best solution isn't rebuilding everything from scratch, but building intelligent adaptation layers that preserve existing investments while adding new capabilities.

Whether you're scraping SPAs, analyzing dynamic content, or building next-generation crawlers, the challenge is similar: bridging the gap between traditional HTTP clients and the JavaScript-heavy modern web.

The complete source code is available on GitHub, battle-tested and ready for production deployment. It's containerized, well-documented, and actively maintained based on real-world usage patterns.

What JavaScript rendering challenges are you facing in your projects? The proxy might just solve them without changing your existing codebase.


Building something similar? I offer AI-augmented development services to help teams solve complex technical challenges like this. Get a senior developer with 20+ years of experience for a predictable monthly cost. Let's talk about your project →