Overview
SpiderSite is an intelligent website crawler with AI-powered lead generation. It crawls websites, extracts contact information, and optionally applies AI analysis for company insights, team identification, and lead scoring.

Version 2.10.0: All AI features are now combined into a single efficient API call, including custom prompts for tailored analysis.
How SpiderSite Works
The 5 Request Types
SpiderSite supports five levels of extraction, from basic scraping to full AI analysis:

| Type | Description | AI Used | Cost |
|---|---|---|---|
| 1. Basic Scraping | URL → markdown compendium only | No | Free |
| 2. Contact Extraction | Scrape + contacts/social media | No | Free |
| 3. AI Lead Intelligence | + team, company info, pain points | Yes | AI tokens |
| 4. CHAMP Lead Scoring | + lead scoring with product/ICP | Yes | AI tokens |
| 5. Custom AI Prompts | + your own analysis prompts | Yes | AI tokens |
Example 1: Basic Contact Extraction (No AI)
The simplest request: just provide a URL. The response includes:
- Emails, phones, and addresses
- Social media links (14 platforms)
- Markdown compendium (fit level)
- No AI tokens used
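As a minimal sketch, a basic request body might look like the following; the `url` field name is an illustrative assumption, not a confirmed API parameter.

```python
import json

# Hypothetical request body for basic contact extraction.
# No AI-related fields are set, so no AI tokens are consumed.
payload = {
    "url": "https://example.com",
}

print(json.dumps(payload, indent=2))
```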
Example 2: Full Lead Intelligence (AI Enabled)
Extract company information and team members.

Request Body
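A sketch of what such a request body might look like; the boolean flag names below are assumptions for illustration, not confirmed API parameters.

```python
import json

# Hypothetical request body enabling AI lead intelligence.
# The flag names are assumed for illustration only.
payload = {
    "url": "https://example.com",
    "extract_company_info": True,  # company vitals: name, summary, industry, services
    "extract_team": True,          # team members: names, titles, emails, LinkedIn
    "extract_pain_points": True,   # pain points analysis
}

print(json.dumps(payload, indent=2))
```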
- All contact info
- Company vitals (name, summary, industry, services, target audience)
- Team members (names, titles, emails, LinkedIn)
- Pain points analysis
- Markdown compendium
Example 3: CHAMP Lead Scoring
Complete lead scoring with the CHAMP framework.

Request Body
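CHAMP scoring is described as matching pain points to your product and scoring against your ICP, so a request presumably supplies both. The field names in this sketch are assumptions, not confirmed API parameters.

```python
import json

# Hypothetical request body for CHAMP lead scoring.
# Field names are assumed for illustration; your product description and
# ideal customer profile (ICP) are passed in so the AI can score fit.
payload = {
    "url": "https://example.com",
    "lead_scoring": True,
    "product_description": "B2B analytics platform for sales teams",
    "icp_description": "Mid-market SaaS companies with 50-500 employees",
}

print(json.dumps(payload, indent=2))
```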
- Everything from Example 2, plus:
- CHAMP Analysis:
- Challenges: Specific pain points matched to your solution
- Authority: Decision makers and buying process
- Money: Budget indicators and funding status
- Prioritization: Urgency signals and priority level
- ICP fit score (0-1)
- Personalization hooks for outreach
Example 4: Custom AI Analysis (v2.10.0)
Extract specific information using your own prompts.

Request Body
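A sketch of a custom-prompt request. The `json_schema`, `output_field_name`, and `temperature` parameters are mentioned in the tips later in this document; the `custom_prompt` field name itself is an assumption.

```python
import json

# Hypothetical custom-prompt request body.
payload = {
    "url": "https://example.com",
    "custom_prompt": "List every pricing tier mentioned on the site.",
    "json_schema": {  # helps the AI return structured data
        "type": "object",
        "properties": {
            "tiers": {"type": "array", "items": {"type": "string"}},
        },
    },
    "output_field_name": "pricing_tiers",  # where the result is placed
    "temperature": 0.1,  # low temperature for factual extraction
}

print(json.dumps(payload, indent=2))
```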
Example 5: Combined AI + Custom Prompt (ONE Call!)
All AI features in a single API call for maximum efficiency.

Request Body
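A sketch combining everything in one call, as v2.10.0 allows. All field names here are illustrative assumptions, not confirmed API parameters.

```python
import json

# Hypothetical combined request: lead intelligence, CHAMP scoring, and a
# custom prompt in a single API call.
payload = {
    "url": "https://example.com",
    "extract_company_info": True,
    "extract_team": True,
    "extract_pain_points": True,
    "lead_scoring": True,
    "product_description": "B2B analytics platform for sales teams",
    "icp_description": "Mid-market SaaS companies with 50-500 employees",
    "custom_prompt": "Identify the main competitors named on the site.",
    "output_field_name": "competitive_intel",
}

print(json.dumps(payload, indent=2))
```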
- Team members
- Company info
- Pain points
- Lead scoring (CHAMP)
- Custom competitive intel
Example 6: Minimal Compendium for LLM Context
Optimize for RAG/LLM applications with minimal token usage.

Request Body
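A sketch selecting the minimal compendium level from the table below; the `cleanup_level` field name is an assumption, though the level values come from this document.

```python
import json

# Hypothetical request selecting the minimal compendium level (~15% of raw
# size) for LLM/RAG consumption. "cleanup_level" is an assumed field name.
payload = {
    "url": "https://example.com",
    "cleanup_level": "minimal",  # one of: raw, fit, citations, minimal
}

print(json.dumps(payload, indent=2))
```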
| Level | Size | Best For |
|---|---|---|
| raw | 100% | Full fidelity, archival |
| fit | ~60% | General purpose (default) |
| citations | ~35% | Academic format with sources |
| minimal | ~15% | LLM consumption, token savings |
Example 7: SPA-Heavy Site
For React/Vue/Angular sites that need JavaScript rendering.

Request Body
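A sketch of an SPA-oriented request. The `timeout`, `spa_timeout`, and `max_pages` parameters appear in the troubleshooting notes later in this document; the `render_js` flag is an assumed field name.

```python
import json

# Hypothetical request body for a JavaScript-heavy single-page app.
payload = {
    "url": "https://example.com",
    "render_js": True,   # assumed flag: force browser rendering
    "spa_timeout": 30,   # extra seconds for client-side rendering
    "timeout": 90,       # overall page load timeout (max 120s)
    "max_pages": 25,     # keep crawls of heavy sites bounded
}

print(json.dumps(payload, indent=2))
```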
Response Structure
Large Compendiums (R2 Storage)
When compendiums are too large, they’re stored in Cloudflare R2.

Complete Workflow Example
Here’s a complete workflow from submission to result retrieval.

Best Practices
When to use AI features
Use AI features when:
- Qualifying high-value leads
- Building targeted outreach campaigns
- Identifying decision makers
- Scoring leads by ICP fit

Skip AI features when:
- Doing bulk contact extraction
- Running budget-sensitive scraping
- You only need contact info
Optimizing crawl strategy
- bestfirst (default): best for most use cases, with intelligent prioritization
- Sitemap-first (automatic): used automatically when a sitemap.xml is discovered
- bfs: when you need broad coverage across sections
- dfs: when you need deep coverage of specific sections
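A strategy might be pinned in the request body like this; the `crawl_strategy` field name is an assumption, though the strategy values come from the list above.

```python
import json

# Hypothetical request forcing a depth-first crawl of one section.
payload = {
    "url": "https://example.com/docs",
    "crawl_strategy": "dfs",  # bestfirst (default) | bfs | dfs
    "max_pages": 50,
}

print(json.dumps(payload, indent=2))
```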
Choosing cleanup level
| Level | Use Case |
|---|---|
| raw | Academic research, legal compliance |
| fit | General purpose (default) |
| citations | Research documents with sources |
| minimal | LLM/RAG applications |
Custom AI prompt tips
- Be specific: clearly define what data you want extracted
- Use json_schema: helps the AI return structured data
- Set output_field_name: organize multiple custom analyses
- Adjust temperature: lower (0.1) for factual extraction, higher (0.5+) for creative analysis
Error Handling
URL Not Accessible
Error: “Failed to connect to target URL”

Causes:
- Invalid URL
- Site blocking bots
- Site requires authentication

Solutions:
- Verify the URL is correct and publicly accessible
- Check if the site blocks automated access
Timeout
Error: “Page load timeout exceeded”

Causes:
- Slow-loading site
- Heavy JavaScript rendering

Solutions:
- Increase the timeout parameter (max 120s)
- Increase spa_timeout for SPA sites
- Reduce max_pages
Rate Limit Exceeded
Error: “Rate limit exceeded”

Solutions:
- Implement delays between requests
- Use exponential backoff
- Contact support for higher limits
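The delay and backoff suggestions above can be sketched as a generic client-side retry pattern; this is not part of the SpiderSite API itself.

```python
import random
import time

def backoff_delays(max_retries=5, base=1.0, cap=60.0):
    """Exponentially growing delays with jitter for retrying rate-limited calls."""
    delays = []
    for attempt in range(max_retries):
        delay = min(cap, base * (2 ** attempt))          # 1, 2, 4, 8, 16, ...
        delays.append(delay * random.uniform(0.5, 1.0))  # jitter spreads retries out
    return delays

def retry_with_backoff(send_request, max_retries=5):
    """Call send_request() until it succeeds or retries are exhausted.

    send_request should return True on success and False when rate-limited.
    """
    for delay in backoff_delays(max_retries):
        if send_request():
            return True
        time.sleep(delay)
    return False
```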
Limitations
robots.txt: SpiderSite respects robots.txt directives
