Open Source Browserless Web Scraping API with Human-like Behavior
- Unified Solution: Website + API on a single domain
- Human-like Behavior: 40+ anti-detection techniques
- Deploy Anywhere: Docker, Node.js+PM2, or Development

- Unified Architecture: Website and API on one domain
- Human-like Intelligence: Natural mouse movements, smart scrolling, behavioral randomization
- Multiple Formats: HTML, text, screenshots, PDFs
- Batch Processing: Handle multiple URLs efficiently
- Production Ready: Docker, PM2, Nginx, SSL support
- Anti-Detection: 40+ stealth techniques for reliable scraping
# 1. Clone and configure
git clone https://github.com/SaifyXPRO/HeadlessX.git
cd HeadlessX
# Quick setup (makes scripts executable + creates .env)
chmod +x scripts/quick-setup.sh && ./scripts/quick-setup.sh
# Then edit: nano .env  # Update DOMAIN, SUBDOMAIN, and AUTH_TOKEN

Choose your deployment:
| Method | Command | Best For |
|---|---|---|
| Docker | `docker-compose up -d` | Production, easy deployment |
| Auto Setup | `chmod +x scripts/setup.sh && sudo ./scripts/setup.sh` | VPS/Server with full control |
| Development | `npm install && npm start` | Local development, testing |
Access your HeadlessX:
- Website: https://your-subdomain.yourdomain.com
- Health: https://your-subdomain.yourdomain.com/api/health
- Status: https://your-subdomain.yourdomain.com/api/status?token=YOUR_AUTH_TOKEN
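Right after a deployment or restart, the server can take a few seconds before `/api/health` answers. A minimal sketch, using only the Python standard library, that polls the health URL (no token needed) until it returns HTTP 200. The helper names `http_health_check` and `wait_until_healthy` are hypothetical, and the attempt count and delay are arbitrary defaults, not values taken from HeadlessX.

```python
import time
import urllib.request
from typing import Callable


def http_health_check(base_url: str, timeout_s: float = 5.0) -> Callable[[], bool]:
    """Hypothetical helper: returns a check that GETs <base_url>/api/health and reports HTTP 200."""
    def check() -> bool:
        url = base_url.rstrip("/") + "/api/health"
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    return check


def wait_until_healthy(check: Callable[[], bool], attempts: int = 10,
                       delay_s: float = 2.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Hypothetical helper: call `check` up to `attempts` times; True on the first success."""
    for attempt in range(attempts):
        try:
            if check():
                return True
        except OSError:
            # Connection refused, timeouts and HTTP errors (URLError/HTTPError) are all OSError subclasses.
            pass
        if attempt < attempts - 1:
            sleep(delay_s)  # no sleep after the final attempt
    return False
```

Usage: `wait_until_healthy(http_health_check("https://your-subdomain.yourdomain.com"))`. The `sleep` parameter is injectable so the retry logic can be exercised without waiting.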
HeadlessX v1.2.0 introduces a completely refactored modular architecture for better maintainability, scalability, and development experience.
- Separation of Concerns: Distinct modules for configuration, services, controllers, and middleware
- Better Performance: Optimized browser management and resource usage
- Developer Experience: Clear module boundaries and dependency injection
- Production Ready: Enhanced error handling and logging with correlation IDs
- Security: Improved authentication and rate limiting
- Monitoring: Structured logging and health monitoring
Layer overview (request path from left to right, with each layer's supporting module beneath it):
- Routes (`api.js`) → Controllers (`rendering.js`) → Services (`browser.js`)
- Beneath them, respectively: Middleware (`auth.js`), Utils (`logger.js`), Config (`index.js`)
Quick Migration from v1.1.0:
- The original `src/server.js` (3079 lines) has been broken down into 20+ focused modules
- Environment variable `TOKEN` is now `AUTH_TOKEN`
- PM2 config moved from `config/ecosystem.config.js` to `ecosystem.config.js`
- All functionality preserved with improved performance and maintainability

Detailed Documentation: MODULAR_ARCHITECTURE.md
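For the `TOKEN` → `AUTH_TOKEN` rename, a small sketch that rewrites the contents of an existing v1.1.0 `.env`. The function `migrate_env` is hypothetical (not part of HeadlessX); it assumes plain `KEY=value` lines, optionally prefixed with `export`, and returns the text unchanged if an `AUTH_TOKEN` entry already exists.

```python
import re

# Hypothetical migration helper. Matches a TOKEN= key at the start of a line,
# optionally indented or prefixed with "export".
_OLD_KEY = re.compile(r"^([ \t]*)(export[ \t]+)?TOKEN=", re.MULTILINE)
_NEW_KEY = re.compile(r"^[ \t]*(export[ \t]+)?AUTH_TOKEN=", re.MULTILINE)


def migrate_env(text: str) -> str:
    """Rename TOKEN to AUTH_TOKEN in .env contents."""
    if _NEW_KEY.search(text):
        # Already migrated: do not create a second AUTH_TOKEN entry.
        return text
    # Keep the original indentation and any "export " prefix.
    return _OLD_KEY.sub(r"\1\2AUTH_TOKEN=", text)
```

Keys such as `MY_TOKEN` are left alone because the pattern is anchored at the start of a line. To apply it to a file: `Path(".env").write_text(migrate_env(Path(".env").read_text()))` with `pathlib.Path`.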
# Install Docker (if needed)
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER  # Log out and back in for the group change to take effect
# Deploy HeadlessX
git clone https://github.com/SaifyXPRO/HeadlessX.git
cd HeadlessX
cp .env.example .env
nano .env # Configure DOMAIN, SUBDOMAIN, AUTH_TOKEN
# Start services
docker-compose up -d
# Optional: Set up SSL
sudo apt install certbot
# --standalone runs its own temporary web server on port 80, so that port must be free
sudo certbot certonly --standalone -d your-subdomain.yourdomain.com

Docker Management:
docker-compose ps # Check status
docker-compose logs headlessx # View logs
docker-compose restart # Restart services
docker-compose down  # Stop services

# Automated setup (recommended)
git clone https://github.com/SaifyXPRO/HeadlessX.git
cd HeadlessX
cp .env.example .env
nano .env # Configure environment
chmod +x scripts/setup.sh
sudo ./scripts/setup.sh  # Installs dependencies, builds website, starts PM2

Nginx Configuration (auto-handled by setup script):
The setup script automatically configures nginx, but if you need to manually configure:
# Copy and configure nginx site
sudo cp nginx/headlessx.conf /etc/nginx/sites-available/headlessx
# Replace placeholders with your actual domain
sudo sed -i 's/SUBDOMAIN.DOMAIN.COM/your-subdomain.yourdomain.com/g' /etc/nginx/sites-available/headlessx
# Enable the site
sudo ln -sf /etc/nginx/sites-available/headlessx /etc/nginx/sites-enabled/
sudo rm -f /etc/nginx/sites-enabled/default
# Test and reload nginx
sudo nginx -t && sudo systemctl reload nginx

Manual setup (if not using setup script):
sudo apt update && sudo apt upgrade -y
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt install -y nodejs build-essential
npm install && npm run build
sudo npm install -g pm2
npm run pm2:start

PM2 Management:
npm run pm2:status # Check status
npm run pm2:logs # View logs
npm run pm2:restart # Restart server
npm run pm2:stop  # Stop server

git clone https://github.com/SaifyXPRO/HeadlessX.git
cd HeadlessX
cp .env.example .env
nano .env # Set AUTH_TOKEN, DOMAIN=localhost, SUBDOMAIN=headlessx
# Make scripts executable
chmod +x scripts/*.sh
# Install dependencies
npm install
cd website && npm install && npm run build && cd ..
# Start development server
npm start  # Access at http://localhost:3000

HeadlessX Routes:
- `/favicon.ico` → Favicon
- `/robots.txt` → SEO robots file
- `/api/health` → Health check (no auth required)
- `/api/status` → Server status (requires token)
- `/api/render` → Full page rendering
- `/api/html` → HTML extraction
- `/api/content` → Clean text extraction
- `/api/screenshot` → Screenshot generation
- `/api/pdf` → PDF generation
- `/api/batch` → Batch URL processing
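The `/api/batch` route processes several URLs in one request; the batch examples in this README send a JSON body with a `urls` array plus shared options such as `timeout` and `humanBehavior`. For a long URL list, a client might split it across several batch requests. A minimal sketch, assuming a chunk size of 10 (an arbitrary choice; no per-request limit is documented here); `batch_payloads` is a hypothetical helper, not part of HeadlessX.

```python
from typing import Iterable, Iterator, List


def batch_payloads(urls: Iterable[str], chunk_size: int = 10,
                   timeout_ms: int = 30000,
                   human_behavior: bool = True) -> Iterator[dict]:
    """Hypothetical helper: yield /api/batch request bodies holding at most `chunk_size` URLs each."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")  # raised when iteration starts
    chunk: List[str] = []
    for url in urls:
        chunk.append(url)
        if len(chunk) == chunk_size:
            yield {"urls": chunk, "timeout": timeout_ms, "humanBehavior": human_behavior}
            chunk = []  # start a fresh list so yielded payloads are not mutated
    if chunk:  # trailing partial chunk
        yield {"urls": chunk, "timeout": timeout_ms, "humanBehavior": human_behavior}
```

Each yielded dict can then be sent as the JSON body of a `POST /api/batch?token=...` request, for example with `requests.post(..., json=payload)`.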
Request Flow:
- Nginx receives the request on port 80/443
- Nginx proxies it to the Node.js server on port 3000
- The server routes based on path:
  - `/api/*` → API endpoints
  - `/*` → Website files (built Next.js app)
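The path-based routing in the last step can be modelled in a few lines. This is an illustrative sketch only: HeadlessX itself does this with Express routes (`src/routes/api.js` and `src/routes/static.js`), and `route_for` is a hypothetical name. Query strings such as `?token=...` do not affect which route is chosen.

```python
from urllib.parse import urlsplit


def route_for(request_target: str) -> str:
    """Hypothetical model of the routing rule: API endpoints vs. website files."""
    path = urlsplit(request_target).path or "/"  # drop scheme, host and query string
    if path.startswith("/api/"):
        return "api"       # /api/* -> API endpoints
    return "website"       # /*     -> built Next.js files
```

Note that `/apiary` is classified as a website path: the rule matches the `/api/` prefix including the trailing slash.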
curl https://your-subdomain.yourdomain.com/api/health

curl -X POST "https://your-subdomain.yourdomain.com/api/html?token=YOUR_AUTH_TOKEN" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com", "timeout": 30000}'

curl "https://your-subdomain.yourdomain.com/api/screenshot?token=YOUR_AUTH_TOKEN&url=https://example.com&fullPage=true" \
-o screenshot.png

curl -X POST "https://your-subdomain.yourdomain.com/api/content?token=YOUR_AUTH_TOKEN" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com", "waitForSelector": "main"}'

curl -X POST "https://your-subdomain.yourdomain.com/api/pdf?token=YOUR_AUTH_TOKEN" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com", "format": "A4"}' \
-o document.pdf

HTTP Request Module Configuration:
{
"url": "https://your-subdomain.yourdomain.com/api/html",
"method": "POST",
"headers": {
"Content-Type": "application/json"
},
"qs": {
"token": "YOUR_AUTH_TOKEN"
},
"body": {
"url": "{{url_to_scrape}}",
"timeout": 30000,
"waitForSelector": "{{optional_selector}}"
}
}

Webhooks by Zapier Setup:
- URL: `https://your-subdomain.yourdomain.com/api/html?token=YOUR_AUTH_TOKEN`
- Method: POST
- Headers: `Content-Type: application/json`
- Body:
{
"url": "{{url_from_trigger}}",
"timeout": 30000,
"humanBehavior": true
}

HTTP Request Node:
{
"url": "https://your-subdomain.yourdomain.com/api/html",
"method": "POST",
"authentication": "queryAuth",
"query": {
"token": "YOUR_AUTH_TOKEN"
},
"headers": {
"Content-Type": "application/json"
},
"body": {
"url": "={{$json.url}}",
"timeout": 30000,
"humanBehavior": true
}
}

Available via n8n Community Node:
- Install: `npm install n8n-nodes-headlessx`
- GitHub Repository
import requests

def scrape_with_headlessx(url, token):
    response = requests.post(
        "https://your-subdomain.yourdomain.com/api/html",
        params={"token": token},
        json={
            "url": url,
            "timeout": 30000,  # page-load timeout sent to the server (milliseconds)
            "humanBehavior": True,
        },
        timeout=60,  # client-side timeout in seconds, longer than the server-side one
    )
    response.raise_for_status()  # fail loudly on 4xx/5xx, e.g. a wrong token
    return response.json()

# Usage
result = scrape_with_headlessx("https://example.com", "YOUR_TOKEN")
print(result['html'])

const axios = require('axios');
async function scrapeWithHeadlessX(url, token) {
  try {
    const response = await axios.post(
      `https://your-subdomain.yourdomain.com/api/html?token=${token}`,
      {
        url: url,
        timeout: 30000,
        humanBehavior: true
      }
    );
    return response.data;
  } catch (error) {
    console.error('Scraping failed:', error.message);
    throw error;
  }
}

// Usage
scrapeWithHeadlessX('https://example.com', 'YOUR_TOKEN')
  .then(result => console.log(result.html))
  .catch(error => console.error(error));

curl -X POST "https://your-subdomain.yourdomain.com/api/batch?token=YOUR_AUTH_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"urls": [
    "https://example1.com",
    "https://example2.com",
    "https://example3.com"
],
"timeout": 30000,
"humanBehavior": true
}'

curl -X POST "https://your-subdomain.yourdomain.com/api/batch?token=YOUR_AUTH_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"urls": ["https://example.com", "https://httpbin.org"],
"format": "text",
"options": {"timeout": 30000}
}'

HeadlessX v1.2.0 - Modular Architecture/
├── src/                        # Modular application source
│   ├── config/                 # Configuration management
│   │   ├── index.js            # Main configuration loader
│   │   └── browser.js          # Browser-specific settings
│   ├── utils/                  # Utility functions
│   │   ├── errors.js           # Error handling & categorization
│   │   ├── logger.js           # Structured logging
│   │   └── helpers.js          # Common utilities
│   ├── services/               # Business logic services
│   │   ├── browser.js          # Browser lifecycle management
│   │   ├── stealth.js          # Anti-detection techniques
│   │   ├── interaction.js      # Human-like behavior
│   │   └── rendering.js        # Core rendering logic
│   ├── middleware/             # Express middleware
│   │   ├── auth.js             # Authentication
│   │   └── error.js            # Error handling
│   ├── controllers/            # Request handlers
│   │   ├── system.js           # Health & status endpoints
│   │   ├── rendering.js        # Main rendering endpoints
│   │   ├── batch.js            # Batch processing
│   │   └── get.js              # GET endpoints & docs
│   ├── routes/                 # Route definitions
│   │   ├── api.js              # API route mappings
│   │   └── static.js           # Static file serving
│   ├── app.js                  # Main application setup
│   ├── server.js               # Entry point for PM2
│   └── rate-limiter.js         # Rate limiting implementation
├── website/                    # Next.js website (unchanged)
│   ├── app/                    # Next.js 13+ app directory
│   ├── components/             # React components
│   ├── .env.example            # Website environment template
│   ├── next.config.js          # Next.js configuration
│   └── package.json            # Website dependencies
├── scripts/                    # Deployment & management scripts
│   ├── setup.sh                # Automated installation (updated)
│   ├── update_server.sh        # Server update script (updated)
│   ├── verify-domain.sh        # Domain verification
│   └── test-routing.sh         # Integration testing
├── nginx/                      # Nginx configuration
│   └── headlessx.conf          # Nginx proxy config
├── docker/                     # Docker deployment (updated)
│   ├── Dockerfile              # Container definition
│   └── docker-compose.yml      # Docker Compose setup
├── ecosystem.config.js         # PM2 configuration (moved to root)
├── .env.example                # Environment template (updated)
├── package.json                # Server dependencies (updated)
├── MODULAR_ARCHITECTURE.md     # Architecture documentation
└── README.md                   # This file
# 1. Install dependencies
npm install
# 2. Build website
cd website
npm install
npm run build
cd ..
# 3. Set environment variables
export AUTH_TOKEN="development_token_123"
export DOMAIN="localhost"
export SUBDOMAIN="headlessx"
# 4. Start server
npm start # Uses src/app.js
# 5. Access locally
# Website: http://localhost:3000
# API: http://localhost:3000/api/health

# Test server and website integration
bash scripts/test-routing.sh localhost
# Test with environment variables
bash scripts/verify-domain.sh

Create your .env file from the template:
cp .env.example .env
nano .env

Required configuration:
# Security Token (Generate a secure random string)
AUTH_TOKEN=your_secure_token_here
# Domain Configuration
DOMAIN=yourdomain.com
SUBDOMAIN=headlessx
# Optional: Browser Settings
BROWSER_TIMEOUT=60000
MAX_CONCURRENT_BROWSERS=5
# Optional: Server Settings
PORT=3000
NODE_ENV=production

Option 1: Automatic (Recommended)
# The setup script automatically replaces domain placeholders
sudo ./scripts/setup.sh

Option 2: Manual Configuration
# Copy nginx configuration
sudo cp nginx/headlessx.conf /etc/nginx/sites-available/headlessx
# Replace domain placeholders (replace with your actual domain)
sudo sed -i 's/SUBDOMAIN.DOMAIN.COM/headlessx.yourdomain.com/g' /etc/nginx/sites-available/headlessx
# Example: if your hostname is "api.example.com", run this instead of the line above:
# sudo sed -i 's/SUBDOMAIN.DOMAIN.COM/api.example.com/g' /etc/nginx/sites-available/headlessx
# Enable site and reload nginx
sudo ln -sf /etc/nginx/sites-available/headlessx /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx

Your final URLs will be:
- Website: https://your-subdomain.yourdomain.com
- API Health: https://your-subdomain.yourdomain.com/api/health
- API Endpoints: https://your-subdomain.yourdomain.com/api/*
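The nginx placeholder `SUBDOMAIN.DOMAIN.COM` and the final URLs are both derived from one hostname. A sketch, assuming that hostname is `SUBDOMAIN.DOMAIN` as in the `headlessx.yourdomain.com` example above; `hostname`, `render_nginx_conf` and `final_urls` are hypothetical helpers, not functions of the setup script. Unlike the `sed` pattern, where each unescaped `.` matches any character, `str.replace` matches the placeholder literally.

```python
PLACEHOLDER = "SUBDOMAIN.DOMAIN.COM"  # placeholder used in nginx/headlessx.conf


def hostname(subdomain: str, domain: str) -> str:
    """Hypothetical: join SUBDOMAIN and DOMAIN from .env (assumed convention subdomain.domain)."""
    return f"{subdomain}.{domain}" if subdomain else domain


def render_nginx_conf(template: str, subdomain: str, domain: str) -> str:
    """Hypothetical: replace every placeholder occurrence literally, like the sed step."""
    return template.replace(PLACEHOLDER, hostname(subdomain, domain))


def final_urls(subdomain: str, domain: str) -> dict:
    """Hypothetical: public URLs once nginx proxies the host to the Node.js server."""
    base = f"https://{hostname(subdomain, domain)}"
    return {"website": base, "health": f"{base}/api/health", "api": f"{base}/api/"}
```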
| Endpoint | Method | Description | Auth Required |
|---|---|---|---|
| `/api/health` | GET | Health check | No |
| `/api/status` | GET | Server status | Yes |
| `/api/render` | POST | Full page rendering (JSON) | Yes |
| `/api/html` | GET/POST | Raw HTML extraction | Yes |
| `/api/content` | GET/POST | Clean text extraction | Yes |
| `/api/screenshot` | GET | Screenshot generation | Yes |
| `/api/pdf` | GET | PDF generation | Yes |
| `/api/batch` | POST | Batch URL processing | Yes |
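`/api/screenshot` is listed as a GET endpoint, so the page to capture travels in the query string. The curl example earlier passes it unencoded, which works for `https://example.com` but would break for a target that carries its own `?` or `&`. A sketch using the standard library's `urlencode` to percent-encode every value; `screenshot_url` is a hypothetical helper, and the parameter names (`token`, `url`, `fullPage`) are taken from that curl example.

```python
from urllib.parse import urlencode


def screenshot_url(base: str, token: str, target: str, full_page: bool = True) -> str:
    """Hypothetical helper: build a GET URL for /api/screenshot with encoded query values."""
    query = urlencode({
        "token": token,
        "url": target,  # percent-encoded, so '&' or '?' inside the target cannot split the query
        "fullPage": "true" if full_page else "false",
    })
    return f"{base.rstrip('/')}/api/screenshot?{query}"
```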
All endpoints (except `/api/health`) require a token via one of:
- Query parameter: `?token=YOUR_TOKEN`
- Header: `X-Token: YOUR_TOKEN`
- Header: `Authorization: Bearer YOUR_TOKEN`
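The three transports differ only in where the token goes. A sketch that builds matching keyword arguments for the `requests` library, plus a token generator using Python's `secrets` module (the `.env` template only asks for "a secure random string"; 32 random bytes is an arbitrary but common choice). `auth_kwargs` and `generate_token` are hypothetical helpers. Headers are usually preferable to `?token=`, because nginx's default access-log format records the full request line, query string included.

```python
import secrets


def generate_token(nbytes: int = 32) -> str:
    """Hypothetical helper: random URL-safe string suitable for AUTH_TOKEN in .env."""
    return secrets.token_urlsafe(nbytes)


def auth_kwargs(token: str, method: str = "bearer") -> dict:
    """Hypothetical helper: kwargs for requests.get/post carrying the token one of three ways."""
    if method == "query":
        return {"params": {"token": token}}                       # ?token=...
    if method == "header":
        return {"headers": {"X-Token": token}}                    # X-Token: ...
    if method == "bearer":
        return {"headers": {"Authorization": f"Bearer {token}"}}  # Authorization: Bearer ...
    raise ValueError(f"unknown auth method: {method!r}")
```

Usage: `requests.get("https://your-subdomain.yourdomain.com/api/status", **auth_kwargs(token, "header"))`.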
Visit your HeadlessX website for full API documentation with examples, or check:
curl https://your-subdomain.yourdomain.com/api/health
curl "https://your-subdomain.yourdomain.com/api/status?token=YOUR_TOKEN"

# PM2 logs
npm run pm2:logs
pm2 logs headlessx --lines 100
# Docker logs
docker-compose logs -f headlessx
# Nginx logs
sudo tail -f /var/log/nginx/access.log

git pull origin main
npm run build # Rebuild website
npm run pm2:restart # PM2
# OR
docker-compose restart  # Docker

"npm ci" Error (missing package-lock.json):
chmod +x scripts/generate-lockfiles.sh
./scripts/generate-lockfiles.sh # Generate lock files
# OR
npm install --production  # Use install instead

"Cannot find module 'express'":
npm install  # Install dependencies

System dependency errors (Ubuntu):
sudo apt update && sudo apt install -y \
libatk1.0-0t64 libatk-bridge2.0-0t64 libcups2t64 \
libatspi2.0-0t64 libasound2t64 libxcomposite1

PM2 not starting:
sudo npm install -g pm2
chmod +x scripts/setup.sh # Make script executable
pm2 start ecosystem.config.js  # PM2 config lives in the repo root since v1.2.0
pm2 logs headlessx  # Check errors

Script permission errors:
# Make all scripts executable
chmod +x scripts/*.sh
# Or use the quick setup
chmod +x scripts/quick-setup.sh && ./scripts/quick-setup.sh

Playwright browser installation errors:
# Use dedicated Playwright setup script
chmod +x scripts/setup-playwright.sh
./scripts/setup-playwright.sh
# Or install manually:
sudo apt update && sudo apt install -y \
libgtk-3-0t64 libpangocairo-1.0-0 libcairo-gobject2 \
libgdk-pixbuf-2.0-0 libdrm2 libxss1 libxrandr2 \
libasound2t64 libatk1.0-0t64 libnss3
# Install only Chromium (most stable)
npx playwright install chromium
# Alternative: Use Docker (avoids dependency issues)
docker-compose up -d

- Token Authentication: Secure API access with custom tokens
- Rate Limiting: Nginx-level request throttling
- Security Headers: XSS, CSRF, and clickjacking protection
- Bot Protection: Common attack vector blocking
- SSL/TLS: Automatic HTTPS with Let's Encrypt
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Documentation: Visit your deployed website for full API docs
- Issues: GitHub Issues
- Discussions: GitHub Discussions
HeadlessX v1.2.0 - The most advanced open-source browserless web scraping solution.
Made with ❤️ for the developer community.