Overview
**v2.15.0 Feature:** Orchestrated campaigns automatically chain three SpiderIQ services together:
- **SpiderMaps**: Scrape businesses from Google Maps
- **SpiderSite**: Crawl each business website for emails and company info
- **SpiderVerify**: Verify extracted email addresses
One API call, complete lead data. Instead of managing three separate job types, the orchestrator handles everything automatically. You just call `/next` in a loop and retrieve aggregated results.
How It Works
The Chain
| Step | Service | What Happens |
|------|---------|--------------|
| 1 | SpiderMaps | Searches Google Maps for businesses matching your query |
| 2 | Domain Filter | Removes social media, review sites, directories (configurable) |
| 3 | SpiderSite | Crawls each valid business website |
| 4 | Email Extract | Pulls emails from crawled pages |
| 5 | SpiderVerify | Verifies each email via SMTP |
| 6 | Aggregate | Combines all data per business |
Automatic Domain Filtering
The orchestrator automatically filters out non-scrapable domains (a sketch of the matching logic follows these lists):
**Social Media** (`filter_social_media`)

- facebook.com, instagram.com, and similar social platforms

**Review Sites** (`filter_review_sites`)

- yelp.com
- tripadvisor.com
- trustpilot.com
- g2.com
- capterra.com

**Directories** (`filter_directories`)

- yellowpages.com
- bbb.org
- manta.com
- booking.com
- doordash.com
- ubereats.com
- linktr.ee

**Maps** (`filter_maps`)

- google.com/maps
- maps.google.com
- waze.com
- apple.com/maps
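For illustration, the filtering behaves roughly like the Python sketch below. This is a sketch only: the block lists shown are the subset documented above, and the orchestrator's actual implementation is internal.

```python
from urllib.parse import urlparse

# Illustrative subset; the orchestrator maintains the full lists internally.
# Path-based entries like google.com/maps would need URL-path checks as well.
BLOCKED_DOMAINS = {
    "yelp.com", "tripadvisor.com", "trustpilot.com",  # review sites
    "yellowpages.com", "bbb.org", "manta.com",        # directories
    "maps.google.com", "waze.com",                    # maps
}

def is_scrapable(website_url: str) -> bool:
    """Return False for domains the orchestrator would filter out."""
    host = urlparse(website_url).netloc.lower()
    host = host.removeprefix("www.")
    # Match the domain itself and any subdomain of a blocked domain
    return not any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)

print(is_scrapable("http://www.cafedestramways.lu/"))           # True
print(is_scrapable("https://www.yelp.com/biz/some-restaurant"))  # False
```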
Creating an Orchestrated Campaign
Basic Example
```bash
curl -X POST https://spideriq.di-atomic.com/api/v1/jobs/spiderMaps/campaigns/submit \
  -H "Authorization: Bearer <your_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "restaurants",
    "country_code": "LU",
    "workflow": {
      "spidersite": {
        "enabled": true,
        "max_pages": 5,
        "extract_company_info": true
      },
      "spiderverify": {
        "enabled": true,
        "max_emails_per_business": 5
      }
    }
  }'
```
Response:
```json
{
  "campaign_id": "camp_lu_restaurants_20251223_abc123",
  "status": "active",
  "query": "restaurants",
  "country_code": "LU",
  "total_locations": 14,
  "has_workflow": true,
  "workflow_config": {
    "spidersite": {
      "enabled": true,
      "max_pages": 5,
      "extract_company_info": true
    },
    "spiderverify": {
      "enabled": true,
      "max_emails_per_business": 5
    }
  }
}
```
Full Configuration Example
Complete Workflow Payload
```json
{
  "query": "restaurants",
  "country_code": "FR",
  "name": "France Restaurant Lead Gen",
  "filter": {
    "mode": "population",
    "min_population": 50000
  },
  "workflow": {
    "spidersite": {
      "enabled": true,
      "max_pages": 10,
      "crawl_strategy": "bestfirst",
      "target_pages": ["contact", "about", "team"],
      "enable_spa": true,
      "spa_timeout": 30,
      "extract_team": true,
      "extract_company_info": true,
      "extract_pain_points": false,
      "product_description": "AI-powered restaurant management software",
      "icp_description": "Restaurant owners looking to streamline operations",
      "compendium": {
        "enabled": true,
        "cleanup_level": "fit",
        "max_chars": 100000
      },
      "timeout": 30
    },
    "spiderverify": {
      "enabled": true,
      "check_gravatar": false,
      "check_dnsbl": false,
      "smtp_timeout_secs": 45,
      "max_emails_per_business": 5
    },
    "filter_social_media": true,
    "filter_review_sites": true,
    "filter_directories": true,
    "filter_maps": true
  }
}
```
Workflow Configuration Reference
SpiderSite Options
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `enabled` | boolean | `false` | Enable SpiderSite for each business |
| `max_pages` | integer | `10` | Pages to crawl per website (1-50) |
| `crawl_strategy` | string | `"bestfirst"` | `bestfirst`, `bfs`, or `dfs` |
| `target_pages` | array | `["contact", "about", "team", "news", "blog"]` | Priority page types |
| `enable_spa` | boolean | `true` | Enable SPA/JavaScript rendering |
| `spa_timeout` | integer | `30` | SPA rendering timeout (10-120s) |
| `extract_team` | boolean | `false` | Extract team members with AI |
| `extract_company_info` | boolean | `false` | Extract company info with AI |
| `extract_pain_points` | boolean | `false` | Analyze pain points with AI |
| `product_description` | string | `null` | Your product (for CHAMP scoring) |
| `icp_description` | string | `null` | Your ICP (for CHAMP scoring) |
| `timeout` | integer | `30` | Overall timeout (10-120s) |
**CHAMP Scoring:** If you provide `product_description`, you must also provide `icp_description` (and vice versa).
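Since the API enforces this pairing, a minimal client-side guard (a hypothetical helper, not part of any SpiderIQ SDK) can catch the mistake before submission:

```python
def validate_champ_config(spidersite: dict) -> None:
    """Raise early if only one of the two CHAMP scoring fields is set."""
    has_product = spidersite.get("product_description") is not None
    has_icp = spidersite.get("icp_description") is not None
    if has_product != has_icp:
        raise ValueError(
            "product_description and icp_description must be provided together"
        )
```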
SpiderVerify Options
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `enabled` | boolean | `false` | Enable email verification |
| `check_gravatar` | boolean | `false` | Check for Gravatar images |
| `check_dnsbl` | boolean | `false` | Check spam blacklists |
| `smtp_timeout_secs` | integer | `45` | SMTP timeout (10-120s) |
| `max_emails_per_business` | integer | `10` | Max emails to verify (1-50) |
**Email Prioritization:** The orchestrator automatically prioritizes business mailboxes such as contact@, info@, and sales@ over other extracted addresses.
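The exact ordering rules are internal to the orchestrator, but the documented behavior is roughly equivalent to this sketch (the priority list here is an assumption based on the examples in the note above):

```python
# Assumed priority prefixes, based on the examples in the note above.
PRIORITY_LOCAL_PARTS = ("contact", "info", "sales")

def prioritize_emails(emails: list[str], limit: int) -> list[str]:
    """Put business mailboxes first, then truncate to the verification limit."""
    def rank(email: str) -> int:
        local_part = email.split("@", 1)[0].lower()
        return 0 if local_part in PRIORITY_LOCAL_PARTS else 1
    # sorted() is stable, so the original order is preserved within each group
    return sorted(emails, key=rank)[:limit]
```

With `max_emails_per_business: 3`, addresses like info@ and contact@ would be verified before personal mailboxes found on the same site.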
Domain Filter Options
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `filter_social_media` | boolean | `true` | Filter Facebook, Instagram, etc. |
| `filter_review_sites` | boolean | `true` | Filter Yelp, TripAdvisor, etc. |
| `filter_directories` | boolean | `true` | Filter YellowPages, BBB, etc. |
| `filter_maps` | boolean | `true` | Filter Google Maps links, Waze |
Compendium Options
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `enabled` | boolean | `true` | Generate markdown compendium |
| `cleanup_level` | string | `"fit"` | `raw`, `fit`, `citations`, or `minimal` |
| `max_chars` | integer | `100000` | Max compendium size |
| `include_in_response` | boolean | `true` | Include in API response |
| `remove_duplicates` | boolean | `true` | Deduplicate content |
Monitoring Progress
Status Endpoint
Check real-time progress with the `/status` endpoint:

```bash
curl https://spideriq.di-atomic.com/api/v1/jobs/spiderMaps/campaigns/{campaign_id}/status \
  -H "Authorization: Bearer <your_token>"
```
Response with Workflow Progress:
```json
{
  "campaign_id": "camp_lu_restaurants_20251223_abc123",
  "status": "active",
  "progress": {
    "completed": 5,
    "failed": 0,
    "pending": 9,
    "total": 14,
    "percentage": 35.7
  },
  "workflow_progress": {
    "businesses_total": 150,
    "sites_queued": 5,
    "sites_completed": 120,
    "sites_failed": 2,
    "verifies_queued": 10,
    "verifies_completed": 80,
    "verifies_failed": 1,
    "emails_found": 350,
    "emails_verified": 280
  },
  "has_workflow": true
}
```
Workflow Progress Fields
| Field | Description |
|-------|-------------|
| `businesses_total` | Total businesses with valid domains |
| `sites_queued` | SpiderSite jobs waiting |
| `sites_completed` | SpiderSite jobs finished |
| `sites_failed` | SpiderSite jobs failed |
| `verifies_queued` | SpiderVerify jobs waiting |
| `verifies_completed` | SpiderVerify jobs finished |
| `verifies_failed` | SpiderVerify jobs failed |
| `emails_found` | Total emails extracted |
| `emails_verified` | Total emails verified |
Getting Results
Per-Job Blocking Results (v2.16.0)
For real-time integrations where you need to wait for a specific job to complete, use the blocking endpoint:
```bash
# Wait for job completion (blocks up to 10 minutes)
curl "https://spideriq.di-atomic.com/api/v1/jobs/spiderMaps/campaigns/{campaign_id}/jobs/{job_id}/results" \
  -H "Authorization: Bearer <your_token>"

# Or poll without blocking
curl "https://spideriq.di-atomic.com/api/v1/jobs/spiderMaps/campaigns/{campaign_id}/jobs/{job_id}/results?wait=false" \
  -H "Authorization: Bearer <your_token>"
```
**Use Case:** Perfect for n8n/Xano webhooks where you need to wait for results before proceeding to the next step.
**Timeouts & Partial Results:**

- SpiderSite: 5 minutes per business
- SpiderVerify: 2 minutes per business
- Maximum wait: 10 minutes
- If a timeout occurs, the endpoint returns `status: "partial"` with all available data
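In Python, a minimal blocking call might look like this (a sketch assuming the endpoint and `status: "partial"` behavior described above):

```python
import requests

API_URL = "https://spideriq.di-atomic.com/api/v1"
HEADERS = {"Authorization": "Bearer <your_token>"}

def wait_for_job_results(campaign_id: str, job_id: str) -> dict:
    """Block on the per-job results endpoint and handle partial results."""
    url = (f"{API_URL}/jobs/spiderMaps/campaigns/"
           f"{campaign_id}/jobs/{job_id}/results")
    # The server holds the request open for up to 10 minutes, so allow a
    # slightly longer client-side timeout.
    response = requests.get(url, headers=HEADERS, timeout=660)
    response.raise_for_status()
    data = response.json()
    if data.get("status") == "partial":
        print("Server-side timeout reached; proceeding with partial data")
    return data
```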
Workflow Results Endpoint
Get aggregated results combining SpiderMaps + SpiderSite + SpiderVerify data:
```bash
curl https://spideriq.di-atomic.com/api/v1/jobs/spiderMaps/campaigns/{campaign_id}/workflow-results \
  -H "Authorization: Bearer <your_token>"
```
Response Structure:
```json
{
  "campaign_id": "camp_lu_restaurants_20251223_abc123",
  "status": "active",
  "query": "restaurants",
  "country_code": "LU",
  "workflow_progress": {
    "businesses_total": 69,
    "sites_completed": 69,
    "emails_found": 201,
    "emails_verified": 84
  },
  "total_businesses": 69,
  "total_with_domains": 69,
  "total_emails_found": 201,
  "total_valid_emails": 8,
  "locations": [
    {
      "location_id": 843,
      "search_string": "Luxembourg, Luxembourg",
      "status": "completed",
      "businesses_count": 69,
      "businesses": [
        {
          "business_name": "Café des Tramways",
          "business_place_id": "0x47954f2add89aa79:0x74c726ae28575bec",
          "business_address": "79 Av. Pasteur, 2311 Luxembourg",
          "business_phone": "35226201136",
          "business_rating": 4.4,
          "business_reviews_count": 706,
          "business_categories": ["Bar", "Coffee shop"],
          "original_website": "http://www.cafedestramways.lu/",
          "domain": "cafedestramways.lu",
          "domain_filtered": false,
          "spidersite_status": "completed",
          "pages_crawled": 2,
          "emails_found": ["info@cafedestramways.lu"],
          "company_info": {
            "industry": "Restaurant/Bar",
            "key_services": ["Flammekueches", "Burgers", "Cocktails"],
            "target_audience": "Locals and tourists in Luxembourg",
            "one_sentence_summary": "Cozy bar offering drinks and homemade food"
          },
          "spiderverify_status": "completed",
          "emails_verified": [
            {
              "email": "info@cafedestramways.lu",
              "status": "risky",
              "score": 90,
              "is_deliverable": true,
              "is_role_account": true
            }
          ],
          "valid_emails_count": 0,
          "workflow_stage": "complete"
        }
      ]
    }
  ]
}
```
Complete Python Example
```python
import requests
import time
import csv

# Configuration
API_URL = "https://spideriq.di-atomic.com/api/v1"
TOKEN = "your_token_here"
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

# 1. Create campaign with workflow
campaign_response = requests.post(
    f"{API_URL}/jobs/spiderMaps/campaigns/submit",
    headers=HEADERS,
    json={
        "query": "restaurants",
        "country_code": "LU",
        "name": "Luxembourg Restaurant Leads",
        "workflow": {
            "spidersite": {
                "enabled": True,
                "max_pages": 5,
                "extract_company_info": True
            },
            "spiderverify": {
                "enabled": True,
                "max_emails_per_business": 3
            }
        }
    }
)
campaign = campaign_response.json()
campaign_id = campaign['campaign_id']
print(f"Created campaign: {campaign_id}")
print(f"Total locations: {campaign['total_locations']}")

# 2. Process all locations
while True:
    next_response = requests.post(
        f"{API_URL}/jobs/spiderMaps/campaigns/{campaign_id}/next",
        headers=HEADERS
    )
    next_data = next_response.json()

    if next_data.get('current_task'):
        task = next_data['current_task']
        print(f"Processing: {task['search_string']} (job: {task['job_id']})")

    progress = next_data['progress']
    print(f"Progress: {progress['completed']}/{progress['total']} "
          f"({progress['percentage']:.1f}%)")

    if not next_data['has_more']:
        print("All locations processed!")
        break

    time.sleep(2)  # Rate limit between calls

# 3. Wait for workflow jobs to complete
print("\nWaiting for SpiderSite and SpiderVerify jobs to complete...")
while True:
    status_response = requests.get(
        f"{API_URL}/jobs/spiderMaps/campaigns/{campaign_id}/status",
        headers=HEADERS
    )
    status = status_response.json()
    wp = status.get('workflow_progress', {})

    sites_done = wp.get('sites_completed', 0) + wp.get('sites_failed', 0)
    sites_total = wp.get('businesses_total', 0)
    verifies_done = wp.get('verifies_completed', 0) + wp.get('verifies_failed', 0)
    print(f"Sites: {sites_done}/{sites_total} | Verifies: {verifies_done} | "
          f"Emails: {wp.get('emails_found', 0)}")

    # Guard against the brief window before workflow_progress is populated,
    # then exit once all sites are done and no verifies remain queued
    if sites_total > 0 and sites_done >= sites_total \
            and wp.get('verifies_queued', 0) == 0:
        break

    time.sleep(5)

# 4. Get aggregated results
results_response = requests.get(
    f"{API_URL}/jobs/spiderMaps/campaigns/{campaign_id}/workflow-results",
    headers=HEADERS
)
results = results_response.json()

# 5. Export to CSV
with open('leads.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow([
        'Business Name', 'Address', 'Phone', 'Website', 'Rating',
        'Industry', 'Email', 'Email Status', 'Email Score'
    ])
    for location in results['locations']:
        for biz in location['businesses']:
            for email in biz.get('emails_verified', []):
                writer.writerow([
                    biz['business_name'],
                    biz.get('business_address', ''),
                    biz.get('business_phone', ''),
                    biz.get('domain', ''),
                    biz.get('business_rating', ''),
                    # company_info may be null if extraction was skipped
                    (biz.get('company_info') or {}).get('industry', ''),
                    email['email'],
                    email['status'],
                    email['score']
                ])

print(f"\nExported {results['total_valid_emails']} valid emails to leads.csv")
print(f"Total businesses: {results['total_businesses']}")
print(f"Total emails found: {results['total_emails_found']}")
```
Best Practices
1. **Start Small**: Test with a small country like Luxembourg (14 locations) before running large campaigns.
2. **Use Population Filters**: For large countries, filter by population to focus on major cities first.
3. **Monitor Progress**: Check `/status` periodically to track SpiderSite and SpiderVerify completion.
4. **Rate Limiting**: Add 1-2 second delays between `/next` calls to avoid rate limits.
Recommended Settings for Lead Generation
```json
{
  "workflow": {
    "spidersite": {
      "enabled": true,
      "max_pages": 5,
      "crawl_strategy": "bestfirst",
      "extract_company_info": true,
      "compendium": {
        "enabled": false
      }
    },
    "spiderverify": {
      "enabled": true,
      "max_emails_per_business": 3
    }
  }
}
```
Disable compendium for lead gen campaigns to speed up processing. Compendiums are useful for content analysis but add overhead.
Next Steps