API Endpoints
GET /health
Check the health of the BrowserElf service. Returns the service status, a status message, and the current timestamp.
Response Example
{
  "status": "ok",
  "message": "BrowserElf is alive 🧝‍♂️",
  "timestamp": "2024-01-15T10:30:00.000Z"
}
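A client can use this endpoint as a readiness probe. The sketch below uses only the Python standard library and assumes that a healthy instance reports `"status": "ok"`, as in the example above. The function names and the `base_url` argument are illustrative and not part of BrowserElf.

```python
import json
import urllib.request


def is_healthy(body: str) -> bool:
    """Return True if a /health response body reports status "ok"."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and data.get("status") == "ok"


def check_health(base_url: str, timeout: float = 5.0) -> bool:
    """Call GET /health on a running instance (base_url is an assumption)."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return is_healthy(resp.read().decode("utf-8"))
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False
```

For a local instance this might be called as `check_health("http://127.0.0.1:3000")`.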
GET /scrape
Convenience endpoint for quick testing from a browser. Pass the target URL and any options as query parameters.
Browser URL Example
https://your-domain.com/scrape?url=https://example.com&stealth=true&format=json
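The example URL passes the target address unencoded, which works for simple URLs. If the target has its own query string, its `&` and `=` characters need percent-encoding so they are not read as parameters of `/scrape`. A minimal sketch, assuming the GET endpoint accepts the `url`, `stealth`, and `format` parameters shown above (`build_scrape_url` is a hypothetical helper):

```python
from urllib.parse import urlencode


def build_scrape_url(base_url: str, target: str, **options: object) -> str:
    """Build a GET /scrape URL with percent-encoded query parameters."""
    params = {"url": target}
    for key, value in options.items():
        # Booleans are written as lowercase "true"/"false", as in the example URL.
        params[key] = str(value).lower() if isinstance(value, bool) else str(value)
    return f"{base_url.rstrip('/')}/scrape?{urlencode(params)}"


link = build_scrape_url(
    "https://your-domain.com",
    "https://example.com/search?q=elf&page=2",
    stealth=True,
    format="json",
)
print(link)
```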
POST /scrape
The main scraping endpoint. Extract content, take screenshots, and get structured data from any website.
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Required | The URL to scrape (must include protocol) |
| screenshot | boolean | Optional | Whether to take a screenshot (default: false) |
| screenshotOptions | object | Optional | Puppeteer screenshot configuration |
| format | string | Optional | Response format: "html", "json", or "raw" |
| selector | string | Optional | CSS selector for specific content extraction |
| headers | object | Optional | Custom HTTP headers to send with request |
| extract | array | Optional | Content types to extract: ["text", "links", "images", "metadata"] |
| stealth | boolean | Optional | Enable stealth mode for bypassing security (default: true) |
| forceProxy | boolean | Optional | Force proxy usage (bypasses smart proxy logic, default: false) |
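The table above maps directly onto a JSON request body. The following sketch builds that body and checks the constraints the table states before anything is sent. It is a client-side illustration only: `build_scrape_payload` is a hypothetical helper, limiting the protocol to `http`/`https` is an assumption, and the server's own validation behavior is not described here.

```python
import json
from typing import Any
from urllib.parse import urlsplit

ALLOWED_FORMATS = {"html", "json", "raw"}
ALLOWED_EXTRACT = {"text", "links", "images", "metadata"}


def build_scrape_payload(url: str, **options: Any) -> str:
    """Validate options against the parameter table and return a JSON body."""
    parts = urlsplit(url)
    # Assumption: "must include protocol" means an http(s) scheme plus a host.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("url must include a protocol, e.g. https://example.com")

    fmt = options.get("format")
    if fmt is not None and fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}")

    unknown = set(options.get("extract", [])) - ALLOWED_EXTRACT
    if unknown:
        raise ValueError(f"unsupported extract values: {sorted(unknown)}")

    return json.dumps({"url": url, **options})


body = build_scrape_payload(
    "https://example.com",
    screenshot=True,
    extract=["text", "links", "metadata"],
    selector="h1",
    format="json",
)
```

Omitted options are left out of the body so that the documented defaults (`screenshot: false`, `stealth: true`, `forceProxy: false`) apply.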
Request Example
curl -X POST http://127.0.0.1:3000/scrape \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "screenshot": true,
    "extract": ["text", "links", "metadata"],
    "selector": "h1",
    "headers": {"User-Agent": "CustomBot/1.0"},
    "format": "json",
    "stealth": true
  }'
Response Example
{
  "url": "https://example.com",
  "status": 200,
  "timestamp": "2024-01-15T10:30:00.000Z",
  "loadTimeMs": 1250,
  "size": 45230,
  "html": "<!DOCTYPE html>...",
  "text": "Example Domain...",
  "links": ["https://www.iana.org/"],
  "images": ["https://example.com/image.jpg"],
  "metadata": {
    "title": "Example Domain",
    "description": "This domain is for use in examples",
    "canonical": "https://example.com/",
    "headers": ["Example Domain"],
    "favicon": "/favicon.ico"
  },
  "screenshot": "iVBORw0KGgoAAAANSUhEUgAA..."
}
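The `screenshot` value in the example begins with `iVBORw0KGgo`, which is the Base64 encoding of the PNG file signature, so the field appears to carry a Base64-encoded image. The sketch below decodes it under that assumption. The helper names are illustrative, and the image type may differ if `screenshotOptions` requests another format.

```python
import base64
import json
from typing import Optional

# The first eight bytes of every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_screenshot(response_body: str) -> Optional[bytes]:
    """Return the decoded screenshot bytes, or None if the field is absent."""
    data = json.loads(response_body)
    encoded = data.get("screenshot")
    if not encoded:
        return None
    return base64.b64decode(encoded)


def is_png(image: bytes) -> bool:
    """Check the file signature rather than trusting the assumed format."""
    return image.startswith(PNG_SIGNATURE)
```

The decoded bytes can then be written to disk, for example with `pathlib.Path("page.png").write_bytes(image)`.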
GET /logs
Retrieve the last 20 scraping requests with detailed information including status, timing, and any errors.
Response Example
{
  "logs": [
    {
      "url": "https://example.com",
      "timestamp": "2024-01-15T10:30:00.000Z",
      "options": { "screenshot": true },
      "status": "success",
      "loadTime": 1250,
      "error": null
    }
  ],
  "total": 42,
  "timestamp": "2024-01-15T10:30:00.000Z"
}
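A log response like the one above can be condensed into a quick summary. This sketch relies on the field names from the example (`logs`, `status`, `loadTime`, `error`) and assumes that `"success"` is the only non-failure status. The unit of `loadTime` is not stated, so the average is reported in whatever unit the service uses.

```python
import json
from statistics import mean


def summarize_logs(response_body: str) -> dict:
    """Count failures and average the load time across returned log entries."""
    entries = json.loads(response_body).get("logs", [])
    failures = [e for e in entries if e.get("status") != "success"]
    times = [
        e["loadTime"]
        for e in entries
        if isinstance(e.get("loadTime"), (int, float))
    ]
    return {
        "returned": len(entries),
        "failed": len(failures),
        "avg_load_time": mean(times) if times else None,
        "errors": [e["error"] for e in failures if e.get("error")],
    }
```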
Features & Capabilities
Screenshot Capture
Take full-page or viewport screenshots with customizable Puppeteer options including viewport size, format, and quality settings.
Smart Content Extraction
Extract text, links, images, and metadata. Use CSS selectors for precise content targeting with Cheerio parsing.
Metadata Parsing
Automatically extract page title, description, canonical URLs, favicons, and heading structure (H1-H3).
High Performance
Optimized for speed with timeout handling, connection pooling, and efficient memory management for large-scale scraping.
Detailed Logging
Comprehensive request logging with performance metrics, error tracking, and request history (last 100 requests).
Flexible Output
Multiple response formats: JSON for structured data, HTML for raw content, or raw format with headers.
Robust Error Handling
Comprehensive error handling for timeouts, invalid URLs, network issues, and screenshot failures with detailed error messages.
Custom Headers
Send custom HTTP headers including User-Agent strings, authentication tokens, and other request modifications.
Stealth Mode
Advanced stealth features to bypass Cloudflare and other security measures with human-like behavior patterns.