GET /api/v1/jobs/{job_id}/results
Get Job Results
curl --request GET \
  --url https://spideriq.di-atomic.com/api/v1/jobs/{job_id}/results \
  --header 'Authorization: Bearer <token>'
{
  "success": true,
  "job_id": "<string>",
  "type": "<string>",
  "status": "<string>",
  "processing_time_seconds": 123,
  "worker_id": "<string>",
  "completed_at": "<string>",
  "message": "<string>",
  "data": {
    "url": "<string>",
    "pages_crawled": 123,
    "crawl_status": "<string>",
    "emails": ["<string>"],
    "phones": ["<string>"],
    "addresses": ["<string>"],
    "linkedin": "<string>",
    "twitter": "<string>",
    "facebook": "<string>",
    "instagram": "<string>",
    "youtube": "<string>",
    "github": "<string>",
    "tiktok": "<string>",
    "pinterest": "<string>",
    "medium": "<string>",
    "discord": "<string>",
    "whatsapp": "<string>",
    "telegram": "<string>",
    "snapchat": "<string>",
    "reddit": "<string>",
    "markdown_compendium": "<string>",
    "compendium": {},
    "company_vitals": {},
    "pain_points": [{}],
    "team_members": [{}],
    "lead_scoring": {},
    "personalization_hooks": {},
    "metadata": {},
    "query": "<string>",
    "results_count": 123,
    "businesses": [
      {
        "name": "<string>",
        "place_id": "<string>",
        "rating": 123,
        "reviews_count": 123,
        "address": "<string>",
        "phone": "<string>",
        "website": "<string>",
        "categories": [{}],
        "coordinates": {},
        "link": "<string>",
        "business_status": "<string>",
        "price_range": "<string>",
        "working_hours": {}
      }
    ]
  },
  "error_message": "<string>"
}

Overview

Retrieve the complete results for a scraping job. This endpoint returns different status codes based on job state.

Path Parameters

job_id
string
required
The unique identifier of the job (UUID format). Example: 550e8400-e29b-41d4-a716-446655440000

Response Status Codes

200 OK
Job completed successfully - results available
202 Accepted
Job still processing - poll again later
410 Gone
Job failed or was cancelled
404 Not Found
Job ID does not exist
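
Only 202 signals "poll again"; the other three codes are terminal. A minimal helper capturing that distinction (illustrative, not part of any client library):

def is_terminal(status_code: int) -> bool:
    """200, 410, and 404 are final states; only 202 means the job is still running."""
    return status_code != 202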

Response Structure

Flat Structure (v2.7.1): Responses now use a simplified 2-3 level nesting structure (previously 5 levels). All fields are always present - fields not applicable to your request will be null.

Top-Level Response Fields

success
boolean
required
true if job completed successfully, false if failed
job_id
string
required
Unique job identifier (UUID format)
type
string
required
Job type: spiderSite or spiderMaps
status
string
required
Job status: completed, failed, processing, queued, or cancelled
processing_time_seconds
number
Time taken to process the job (null if not completed)
worker_id
string
Worker identifier that processed the job
completed_at
string
Completion timestamp in ISO 8601 format
message
string
Additional context about job state (e.g., “Job is being processed”)
data
object
Job results data (structure varies by job type, see below)
error_message
string
Error message if job failed (null otherwise)
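
If you want static typing on the client side, the envelope above maps naturally onto a TypedDict. This is an illustrative sketch derived from the field list, not a type shipped with the API:

from typing import Any, Dict, Optional, TypedDict

class JobResults(TypedDict):
    success: bool
    job_id: str
    type: str                              # "spiderSite" or "spiderMaps"
    status: str                            # completed, failed, processing, queued, cancelled
    processing_time_seconds: Optional[float]
    worker_id: Optional[str]
    completed_at: Optional[str]            # ISO 8601 timestamp
    message: Optional[str]
    data: Optional[Dict[str, Any]]
    error_message: Optional[str]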

SpiderSite Data Fields

Flat Structure: Social media fields are at the top level of data (e.g., data.linkedin), not nested under data.contact_info.social_media.linkedin.

Basic Information

data.url
string
Website URL that was crawled
data.pages_crawled
integer
Number of pages successfully crawled
data.crawl_status
string
Crawl result: success, partial, or failed

Contact Information (Flat - Top Level)

data.emails
array
Email addresses found (filtered - tracking emails removed)
data.phones
array
Phone numbers found
data.addresses
array
Physical addresses found

Social Media Profiles (All Flat - Top Level)

data.linkedin
string
LinkedIn company/profile URL (null if not found)
data.twitter
string
Twitter/X profile URL (null if not found)
data.facebook
string
Facebook page URL (null if not found)
data.instagram
string
Instagram profile URL (null if not found)
data.youtube
string
YouTube channel URL (null if not found)
data.github
string
GitHub organization/user URL (null if not found)
data.tiktok
string
TikTok profile URL (null if not found)
data.pinterest
string
Pinterest profile URL (null if not found)
data.medium
string
Medium profile URL (null if not found)
data.discord
string
Discord server invite URL (null if not found)
data.whatsapp
string
WhatsApp contact/business URL (null if not found)
data.telegram
string
Telegram contact/channel URL (null if not found)
data.snapchat
string
Snapchat profile URL (null if not found)
data.reddit
string
Reddit profile/subreddit URL (null if not found)
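
Because every social field is flat and always present (null when not found), the populated profiles can be collected generically. A small sketch, assuming results is a parsed 200 response:

SOCIAL_FIELDS = (
    "linkedin", "twitter", "facebook", "instagram", "youtube", "github",
    "tiktok", "pinterest", "medium", "discord", "whatsapp", "telegram",
    "snapchat", "reddit",
)

def found_socials(data: dict) -> dict:
    """Return only the social profiles that were actually found (non-null)."""
    return {name: data[name] for name in SOCIAL_FIELDS if data.get(name)}

socials = found_socials(results["data"])
# e.g. {"linkedin": "https://linkedin.com/company/example", ...}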

Markdown Compendium

data.markdown_compendium
string
AI-generated markdown summary of the website (if enabled)
data.compendium
object
Compendium metadata including size, cleanup level, and storage location
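
The compendium metadata keys used below (available, storage_location) are taken from the example response later on this page rather than a formal schema; a guard like this avoids treating a missing compendium as text:

from typing import Optional

def get_compendium_text(data: dict) -> Optional[str]:
    """Return the markdown compendium only if it was generated and stored inline."""
    meta = data.get("compendium") or {}
    if meta.get("available") and meta.get("storage_location") == "inline":
        return data.get("markdown_compendium")
    return None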

AI Features (Always Present - Null If Not Enabled)

data.company_vitals
object
Company information extracted with AI (null if extract_company_info: false)
data.pain_points
array
Business pain points identified by AI (null if extract_pain_points: false)
data.team_members
array
Team members found with AI extraction (empty array if extract_team: false)
data.lead_scoring
object
CHAMP framework lead scoring (null if product/ICP not provided)
data.personalization_hooks
object
Personalization data for outreach (null if not available)
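
Since these fields are always present but null when the corresponding feature was not enabled, guard before dereferencing. A minimal sketch, assuming results is a parsed 200 response:

data = results["data"]

vitals = data["company_vitals"]          # None if extract_company_info was false
if vitals is not None:
    print("Company vitals:", vitals)

pain_points = data["pain_points"] or []  # treat null as "no pain points"
for point in pain_points:
    print("Pain point:", point)

team = data["team_members"]              # already [] when extract_team is false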

Technical Metadata

data.metadata
object
Crawl metadata and statistics including:
  • browser_rendering_available: Whether SPA rendering was used
  • spa_enabled: Whether SPA detection was enabled
  • sitemap_used: Whether sitemap-first crawling was used
  • crawl_strategy: Strategy used (sitemap, bestfirst, bfs, dfs)
  • total_emails_found: Total emails before filtering
  • total_phones_found: Total phone numbers found
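
Since data.emails holds the post-filter list and total_emails_found is the pre-filter count, the difference tells you how many tracking emails were dropped. A quick sketch:

meta = data["metadata"]
filtered_out = meta["total_emails_found"] - len(data["emails"])
print(f"Strategy: {meta['crawl_strategy']}; "
      f"{filtered_out} tracking email(s) filtered out")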

SpiderMaps Data Fields

Basic Information

data.query
string
Search query used for the scrape
data.results_count
integer
Number of business listings returned
data.businesses
array
Array of business listings (see structure below)
data.metadata
object
Search metadata (max_results, extract_reviews, language, etc.)

Business Listing Structure

Each business in the businesses array contains:
name
string
Business name
place_id
string
Google Place ID
rating
number
Average rating (1.0-5.0)
reviews_count
integer
Number of reviews
address
string
Full street address
phone
string
Phone number
website
string
Business website URL
categories
array
Business categories/types
coordinates
object
Latitude and longitude coordinates
link
string
Google Maps link to the business
business_status
string
Status: OPERATIONAL, CLOSED_TEMPORARILY, etc.
price_range
string
Price range: $, $$, $$$, or $$$$
working_hours
object
Working hours by day of week
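
For downstream processing, the listings flatten cleanly into tabular form. A sketch that exports the core fields to CSV (the field selection here is illustrative):

import csv

def businesses_to_csv(data: dict, path: str) -> None:
    """Write the core fields of each business listing to a CSV file."""
    fields = ["name", "rating", "reviews_count", "address", "phone", "website"]
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        for business in data["businesses"]:
            writer.writerow({k: business.get(k) for k in fields})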

Example Request

curl https://spideriq.di-atomic.com/api/v1/jobs/550e8400-e29b-41d4-a716-446655440000/results \
  -H "Authorization: Bearer <your_token>"

Example Responses

Basic contact extraction without AI features:
200 OK - Minimal Request
{
  "success": true,
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "type": "spiderSite",
  "status": "completed",
  "processing_time_seconds": 12.4,
  "worker_id": "spider-site-main-1",
  "completed_at": "2025-10-27T14:30:15Z",
  "message": null,
  "data": {
    "url": "https://example.com",
    "pages_crawled": 5,
    "crawl_status": "success",
    "emails": ["contact@example.com", "sales@example.com"],
    "phones": ["+1-555-123-4567"],
    "addresses": ["123 Main St, San Francisco, CA 94105"],
    "linkedin": "https://linkedin.com/company/example",
    "twitter": "https://twitter.com/example",
    "facebook": "https://facebook.com/example",
    "instagram": null,
    "youtube": null,
    "github": "https://github.com/example",
    "tiktok": null,
    "pinterest": null,
    "medium": null,
    "discord": null,
    "whatsapp": null,
    "telegram": null,
    "snapchat": null,
    "reddit": null,
    "markdown_compendium": "# Example Company\n\nLeading provider of...",
    "compendium": {
      "chars": 8450,
      "available": true,
      "cleanup_level": "fit",
      "storage_location": "inline"
    },
    "company_vitals": null,
    "pain_points": null,
    "lead_scoring": null,
    "team_members": [],
    "personalization_hooks": null,
    "metadata": {
      "spa_enabled": true,
      "sitemap_used": true,
      "browser_rendering_available": true,
      "crawl_strategy": "sitemap",
      "total_emails_found": 2,
      "total_phones_found": 1
    }
  },
  "error_message": null
}

Handling Different Status Codes

import requests
import time

def get_job_results(job_id, auth_token, max_retries=60):
    """Get job results with automatic polling"""
    url = f"https://spideriq.di-atomic.com/api/v1/jobs/{job_id}/results"
    headers = {"Authorization": f"Bearer {auth_token}"}

    for attempt in range(max_retries):
        response = requests.get(url, headers=headers)

        if response.status_code == 200:
            # Success - return results
            return response.json()

        elif response.status_code == 202:
            # Still processing - wait and retry
            print(f"Job processing... (attempt {attempt + 1}/{max_retries})")
            time.sleep(3)
            continue

        elif response.status_code == 410:
            # Job failed
            error_data = response.json()
            raise Exception(f"Job failed: {error_data.get('error_message')}")

        elif response.status_code == 404:
            raise Exception("Job not found")

        else:
            response.raise_for_status()

    raise TimeoutError("Job did not complete within maximum retries")

# Usage
try:
    results = get_job_results(
        "550e8400-e29b-41d4-a716-446655440000",
        "<your_token>"
    )
    print("Results:", results["data"])
except Exception as e:
    print(f"Error: {e}")

Data Storage

Screenshot Storage: SpiderSite job screenshots are stored in Cloudflare R2 and accessible via CDN at cdn.spideriq.di-atomic.com. URLs are permanent and do not expire.

Best Practices

Don’t poll too frequently: Respect the 100 requests/minute rate limit. Poll every 3-5 seconds for optimal balance between responsiveness and rate limit compliance.
Save job IDs: Store job IDs in your database to retrieve results later. Results remain available indefinitely.
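
A minimal sketch of the "save job IDs" practice using SQLite (the table name and columns are illustrative, not prescribed by the API):

import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect("jobs.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS jobs (job_id TEXT PRIMARY KEY, type TEXT, submitted_at TEXT)"
)

def remember_job(job_id: str, job_type: str) -> None:
    """Persist a job ID so results can be fetched later."""
    conn.execute(
        "INSERT OR REPLACE INTO jobs VALUES (?, ?, ?)",
        (job_id, job_type, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()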