Overview

v2.15.0 Feature: Orchestrated campaigns automatically chain three SpiderIQ services together:
  1. SpiderMaps - Scrape businesses from Google Maps
  2. SpiderSite - Crawl each business website for emails and company info
  3. SpiderVerify - Verify extracted email addresses
One API call, complete lead data. Instead of managing three separate job types, the orchestrator handles everything automatically. You just call /next in a loop and retrieve aggregated results.

How It Works

The Chain

| Step | Service | What Happens |
|------|---------|--------------|
| 1 | SpiderMaps | Searches Google Maps for businesses matching your query |
| 2 | Domain Filter | Removes social media, review sites, directories (configurable) |
| 3 | SpiderSite | Crawls each valid business website |
| 4 | Email Extract | Pulls emails from crawled pages |
| 5 | SpiderVerify | Verifies each email via SMTP |
| 6 | Aggregate | Combines all data per business |

Automatic Domain Filtering

The orchestrator automatically filters out non-scrapable domains:
  • facebook.com
  • instagram.com
  • linkedin.com
  • twitter.com / x.com
  • tiktok.com
  • youtube.com
  • pinterest.com
  • yelp.com
  • tripadvisor.com
  • trustpilot.com
  • g2.com
  • capterra.com
  • yellowpages.com
  • bbb.org
  • manta.com
  • booking.com
  • doordash.com
  • ubereats.com
  • linktr.ee
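The same filter can be mirrored client-side if you want to pre-check domains before relying on the orchestrator. This is a hypothetical helper, not part of the API; the real filtering happens server-side, and the list below simply matches the one documented above:

```python
from urllib.parse import urlparse

# Client-side mirror of the orchestrator's domain filter (assumption: the
# server matches blocked domains and their subdomains).
BLOCKED_DOMAINS = {
    "facebook.com", "instagram.com", "linkedin.com", "twitter.com", "x.com",
    "tiktok.com", "youtube.com", "pinterest.com", "yelp.com", "tripadvisor.com",
    "trustpilot.com", "g2.com", "capterra.com", "yellowpages.com", "bbb.org",
    "manta.com", "booking.com", "doordash.com", "ubereats.com", "linktr.ee",
}

def is_scrapable(website_url: str) -> bool:
    """Return False when the URL's domain is on the block list."""
    host = urlparse(website_url).netloc.lower().removeprefix("www.")
    # Match the domain itself or any subdomain of a blocked domain.
    return not any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)

print(is_scrapable("http://www.cafedestramways.lu/"))    # True
print(is_scrapable("https://www.facebook.com/somecafe")) # False
```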

Creating an Orchestrated Campaign

Basic Example

curl -X POST https://spideriq.di-atomic.com/api/v1/jobs/spiderMaps/campaigns/submit \
  -H "Authorization: Bearer <your_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "restaurants",
    "country_code": "LU",
    "workflow": {
      "spidersite": {
        "enabled": true,
        "max_pages": 5,
        "extract_company_info": true
      },
      "spiderverify": {
        "enabled": true,
        "max_emails_per_business": 5
      }
    }
  }'
Response:
{
  "campaign_id": "camp_lu_restaurants_20251223_abc123",
  "status": "active",
  "query": "restaurants",
  "country_code": "LU",
  "total_locations": 14,
  "has_workflow": true,
  "workflow_config": {
    "spidersite": {
      "enabled": true,
      "max_pages": 5,
      "extract_company_info": true
    },
    "spiderverify": {
      "enabled": true,
      "max_emails_per_business": 5
    }
  }
}

Full Configuration Example

{
  "query": "restaurants",
  "country_code": "FR",
  "name": "France Restaurant Lead Gen",
  "filter": {
    "mode": "population",
    "min_population": 50000
  },
  "workflow": {
    "spidersite": {
      "enabled": true,
      "max_pages": 10,
      "crawl_strategy": "bestfirst",
      "target_pages": ["contact", "about", "team"],
      "enable_spa": true,
      "spa_timeout": 30,
      "extract_team": true,
      "extract_company_info": true,
      "extract_pain_points": false,
      "product_description": "AI-powered restaurant management software",
      "icp_description": "Restaurant owners looking to streamline operations",
      "compendium": {
        "enabled": true,
        "cleanup_level": "fit",
        "max_chars": 100000
      },
      "timeout": 30
    },
    "spiderverify": {
      "enabled": true,
      "check_gravatar": false,
      "check_dnsbl": false,
      "smtp_timeout_secs": 45,
      "max_emails_per_business": 5
    },
    "filter_social_media": true,
    "filter_review_sites": true,
    "filter_directories": true,
    "filter_maps": true
  }
}

Workflow Configuration Reference

SpiderSite Options

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| enabled | boolean | false | Enable SpiderSite for each business |
| max_pages | integer | 10 | Pages to crawl per website (1-50) |
| crawl_strategy | string | "bestfirst" | bestfirst, bfs, or dfs |
| target_pages | array | ["contact", "about", "team", "news", "blog"] | Priority page types |
| enable_spa | boolean | true | Enable SPA/JavaScript rendering |
| spa_timeout | integer | 30 | SPA rendering timeout (10-120s) |
| extract_team | boolean | false | Extract team members with AI |
| extract_company_info | boolean | false | Extract company info with AI |
| extract_pain_points | boolean | false | Analyze pain points with AI |
| product_description | string | null | Your product (for CHAMP scoring) |
| icp_description | string | null | Your ICP (for CHAMP scoring) |
| timeout | integer | 30 | Overall timeout (10-120s) |
CHAMP Scoring: If you provide product_description, you must also provide icp_description (and vice versa).
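The pairing rule can be checked before submitting a campaign. A minimal client-side sketch (hypothetical helper, not part of the API):

```python
# Pre-submit check mirroring the documented CHAMP pairing rule:
# product_description and icp_description must be provided together.
def validate_champ(spidersite_config: dict) -> None:
    has_product = bool(spidersite_config.get("product_description"))
    has_icp = bool(spidersite_config.get("icp_description"))
    if has_product != has_icp:
        raise ValueError(
            "CHAMP scoring requires both product_description and icp_description"
        )

validate_champ({"enabled": True})  # OK: neither field provided
validate_champ({
    "product_description": "AI-powered restaurant management software",
    "icp_description": "Restaurant owners looking to streamline operations",
})  # OK: both fields provided
```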

SpiderVerify Options

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| enabled | boolean | false | Enable email verification |
| check_gravatar | boolean | false | Check for Gravatar images |
| check_dnsbl | boolean | false | Check spam blacklists |
| smtp_timeout_secs | integer | 45 | SMTP timeout (10-120s) |
| max_emails_per_business | integer | 10 | Max emails to verify (1-50) |
Email Prioritization: The orchestrator automatically prioritizes business emails like contact@, info@, sales@ over generic addresses.
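The effect of prioritization combined with max_emails_per_business can be illustrated with a small sketch. The exact server-side ordering is an implementation detail; the prefix list and ranking below are assumptions for illustration only:

```python
# Illustrative ranking: role addresses such as contact@, info@ and sales@
# come first; everything else keeps its relative order after them.
PRIORITY_PREFIXES = ("contact", "info", "sales")

def prioritize(emails: list[str], limit: int = 5) -> list[str]:
    def rank(email: str) -> int:
        local = email.split("@", 1)[0].lower()
        if local in PRIORITY_PREFIXES:
            return PRIORITY_PREFIXES.index(local)
        return len(PRIORITY_PREFIXES)
    # sorted() is stable, so non-priority addresses keep their input order.
    return sorted(emails, key=rank)[:limit]

emails = ["john.doe@example.lu", "info@example.lu", "sales@example.lu"]
print(prioritize(emails, limit=2))  # ['info@example.lu', 'sales@example.lu']
```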

Domain Filter Options

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| filter_social_media | boolean | true | Filter Facebook, Instagram, etc. |
| filter_review_sites | boolean | true | Filter Yelp, TripAdvisor, etc. |
| filter_directories | boolean | true | Filter YellowPages, BBB, etc. |
| filter_maps | boolean | true | Filter Google Maps links, Waze |

Compendium Options

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| enabled | boolean | true | Generate markdown compendium |
| cleanup_level | string | "fit" | raw, fit, citations, minimal |
| max_chars | integer | 100000 | Max compendium size |
| include_in_response | boolean | true | Include in API response |
| remove_duplicates | boolean | true | Deduplicate content |

Monitoring Progress

Status Endpoint

Check real-time progress with the /status endpoint:
curl https://spideriq.di-atomic.com/api/v1/jobs/spiderMaps/campaigns/{campaign_id}/status \
  -H "Authorization: Bearer <your_token>"
Response with Workflow Progress:
{
  "campaign_id": "camp_lu_restaurants_20251223_abc123",
  "status": "active",
  "progress": {
    "completed": 5,
    "failed": 0,
    "pending": 9,
    "total": 14,
    "percentage": 35.7
  },
  "workflow_progress": {
    "businesses_total": 150,
    "sites_queued": 5,
    "sites_completed": 120,
    "sites_failed": 2,
    "verifies_queued": 10,
    "verifies_completed": 80,
    "verifies_failed": 1,
    "emails_found": 350,
    "emails_verified": 280
  },
  "has_workflow": true
}

Workflow Progress Fields

| Field | Description |
|-------|-------------|
| businesses_total | Total businesses with valid domains |
| sites_queued | SpiderSite jobs waiting |
| sites_completed | SpiderSite jobs finished |
| sites_failed | SpiderSite jobs failed |
| verifies_queued | SpiderVerify jobs waiting |
| verifies_completed | SpiderVerify jobs finished |
| verifies_failed | SpiderVerify jobs failed |
| emails_found | Total emails extracted |
| emails_verified | Total emails verified |
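These fields are enough to decide when the workflow phase is finished. A minimal sketch (the "crawling finishes before verification drains" assumption mirrors the completion check used in the full Python example later in this page):

```python
# Decide whether the workflow phase is done, given a workflow_progress dict.
# Assumption: complete means every site job finished (success or failure) and
# the verification queue has drained.
def workflow_complete(wp: dict) -> bool:
    total = wp.get("businesses_total", 0)
    sites_done = wp.get("sites_completed", 0) + wp.get("sites_failed", 0)
    # total > 0 guards against the window before any business is registered.
    return total > 0 and sites_done >= total and wp.get("verifies_queued", 0) == 0

wp = {"businesses_total": 150, "sites_completed": 148, "sites_failed": 2,
      "verifies_queued": 0}
print(workflow_complete(wp))  # True
```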

Getting Results

Per-Job Blocking Results (v2.16.0)

For real-time integrations where you need to wait for a specific job to complete, use the blocking endpoint:
# Wait for job completion (blocks up to 10 minutes)
curl "https://spideriq.di-atomic.com/api/v1/jobs/spiderMaps/campaigns/{campaign_id}/jobs/{job_id}/results" \
  -H "Authorization: Bearer <your_token>"

# Or poll without blocking
curl "https://spideriq.di-atomic.com/api/v1/jobs/spiderMaps/campaigns/{campaign_id}/jobs/{job_id}/results?wait=false" \
  -H "Authorization: Bearer <your_token>"
Use Case: Perfect for n8n/Xano webhooks where you need to wait for results before proceeding to the next step.
Timeouts & Partial Results:
  • SpiderSite: 5 minutes per business
  • SpiderVerify: 2 minutes per business
  • Maximum wait: 10 minutes
  • If timeout occurs, returns status: "partial" with all available data
See the Get Job Results (Blocking) endpoint for complete documentation.
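When the non-blocking variant (?wait=false) fits your integration better, the polling side can be sketched as below. The fetch function is injected so the loop logic stands alone; in real use it would wrap requests.get(...) with your Authorization header. Treating "partial" as terminal follows the timeout behavior documented above; the terminal status names "completed" and "failed" are assumptions here:

```python
import time

# Poll the non-blocking results endpoint until a terminal status appears
# or a client-side deadline passes.
def poll_job_results(fetch, interval: float = 5.0, max_wait: float = 600.0) -> dict:
    deadline = time.monotonic() + max_wait
    while True:
        data = fetch()  # callable returning the parsed JSON response
        if data.get("status") in ("completed", "failed", "partial"):
            return data
        if time.monotonic() >= deadline:
            return data  # hand back whatever we have at the deadline
        time.sleep(interval)
```

In practice, fetch would be something like `lambda: requests.get(url, headers=HEADERS, params={"wait": "false"}).json()`.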

Workflow Results Endpoint

Get aggregated results combining SpiderMaps + SpiderSite + SpiderVerify data:
curl https://spideriq.di-atomic.com/api/v1/jobs/spiderMaps/campaigns/{campaign_id}/workflow-results \
  -H "Authorization: Bearer <your_token>"
Response Structure:
{
  "campaign_id": "camp_lu_restaurants_20251223_abc123",
  "status": "active",
  "query": "restaurants",
  "country_code": "LU",
  "workflow_progress": {
    "businesses_total": 69,
    "sites_completed": 69,
    "emails_found": 201,
    "emails_verified": 84
  },
  "total_businesses": 69,
  "total_with_domains": 69,
  "total_emails_found": 201,
  "total_valid_emails": 8,
  "locations": [
    {
      "location_id": 843,
      "search_string": "Luxembourg, Luxembourg",
      "status": "completed",
      "businesses_count": 69,
      "businesses": [
        {
          "business_name": "Café des Tramways",
          "business_place_id": "0x47954f2add89aa79:0x74c726ae28575bec",
          "business_address": "79 Av. Pasteur, 2311 Luxembourg",
          "business_phone": "35226201136",
          "business_rating": 4.4,
          "business_reviews_count": 706,
          "business_categories": ["Bar", "Coffee shop"],
          "original_website": "http://www.cafedestramways.lu/",
          "domain": "cafedestramways.lu",
          "domain_filtered": false,
          "spidersite_status": "completed",
          "pages_crawled": 2,
          "emails_found": ["info@cafedestramways.lu"],
          "company_info": {
            "industry": "Restaurant/Bar",
            "key_services": ["Flammekueches", "Burgers", "Cocktails"],
            "target_audience": "Locals and tourists in Luxembourg",
            "one_sentence_summary": "Cozy bar offering drinks and homemade food"
          },
          "spiderverify_status": "completed",
          "emails_verified": [
            {
              "email": "info@cafedestramways.lu",
              "status": "risky",
              "score": 90,
              "is_deliverable": true,
              "is_role_account": true
            }
          ],
          "valid_emails_count": 0,
          "workflow_stage": "complete"
        }
      ]
    }
  ]
}

Complete Python Example

import requests
import time
import csv

# Configuration
API_URL = "https://spideriq.di-atomic.com/api/v1"
TOKEN = "your_token_here"
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

# 1. Create Campaign with Workflow
campaign_response = requests.post(
    f"{API_URL}/jobs/spiderMaps/campaigns/submit",
    headers=HEADERS,
    json={
        "query": "restaurants",
        "country_code": "LU",
        "name": "Luxembourg Restaurant Leads",
        "workflow": {
            "spidersite": {
                "enabled": True,
                "max_pages": 5,
                "extract_company_info": True
            },
            "spiderverify": {
                "enabled": True,
                "max_emails_per_business": 3
            }
        }
    }
)
campaign = campaign_response.json()
campaign_id = campaign['campaign_id']
print(f"Created campaign: {campaign_id}")
print(f"Total locations: {campaign['total_locations']}")

# 2. Process all locations
while True:
    next_response = requests.post(
        f"{API_URL}/jobs/spiderMaps/campaigns/{campaign_id}/next",
        headers=HEADERS
    )
    next_data = next_response.json()

    if next_data.get('current_task'):
        task = next_data['current_task']
        print(f"Processing: {task['search_string']} (job: {task['job_id']})")

    progress = next_data['progress']
    print(f"Progress: {progress['completed']}/{progress['total']} ({progress['percentage']:.1f}%)")

    if not next_data['has_more']:
        print("All locations processed!")
        break

    time.sleep(2)  # Rate limit between calls

# 3. Wait for workflow jobs to complete
print("\nWaiting for SpiderSite and SpiderVerify jobs to complete...")
while True:
    status_response = requests.get(
        f"{API_URL}/jobs/spiderMaps/campaigns/{campaign_id}/status",
        headers=HEADERS
    )
    status = status_response.json()
    wp = status.get('workflow_progress', {})

    sites_done = wp.get('sites_completed', 0) + wp.get('sites_failed', 0)
    sites_total = wp.get('businesses_total', 0)
    verifies_done = wp.get('verifies_completed', 0) + wp.get('verifies_failed', 0)

    print(f"Sites: {sites_done}/{sites_total} | Verifies: {verifies_done} | Emails: {wp.get('emails_found', 0)}")

    # Check if workflow is complete (all sites done and verifies caught up)
    if sites_total > 0 and sites_done >= sites_total and wp.get('verifies_queued', 0) == 0:
        break

    time.sleep(5)

# 4. Get aggregated results
results_response = requests.get(
    f"{API_URL}/jobs/spiderMaps/campaigns/{campaign_id}/workflow-results",
    headers=HEADERS
)
results = results_response.json()

# 5. Export to CSV
with open('leads.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow([
        'Business Name', 'Address', 'Phone', 'Website', 'Rating',
        'Industry', 'Email', 'Email Status', 'Email Score'
    ])

    for location in results['locations']:
        for biz in location['businesses']:
            for email in biz.get('emails_verified', []):
                writer.writerow([
                    biz['business_name'],
                    biz.get('business_address', ''),
                    biz.get('business_phone', ''),
                    biz.get('domain', ''),
                    biz.get('business_rating', ''),
                    biz.get('company_info', {}).get('industry', ''),
                    email['email'],
                    email['status'],
                    email['score']
                ])

print(f"\nExported {results['total_valid_emails']} valid emails to leads.csv")
print(f"Total businesses: {results['total_businesses']}")
print(f"Total emails found: {results['total_emails_found']}")

Best Practices

Start Small: Test with a small country like Luxembourg (14 locations) before running large campaigns.
Use Population Filters: For large countries, filter by population to focus on major cities first.
Monitor Progress: Check /status periodically to track SpiderSite and SpiderVerify completion.
Rate Limiting: Add 1-2 second delays between /next calls to avoid rate limits.
Disable the Compendium for Lead Gen: Compendiums are useful for content analysis but add overhead; turning them off speeds up processing:
{
  "workflow": {
    "spidersite": {
      "enabled": true,
      "max_pages": 5,
      "crawl_strategy": "bestfirst",
      "extract_company_info": true,
      "compendium": {
        "enabled": false
      }
    },
    "spiderverify": {
      "enabled": true,
      "max_emails_per_business": 3
    }
  }
}

Next Steps