Overview
SpiderMaps scrapes business listings from Google Maps using search queries, the same way you would search on Google Maps itself. A single query can return up to 100-120 businesses, making it well suited to bulk lead generation, market research, and building comprehensive business directories.
Primary Method: Search queries like “Restaurant Berlin, Germany” or “Coffee shops 10711, Germany”
Alternative Method: Individual business URLs (for specific businesses you already know)
Each business listing includes:
Basic Information
Business name
Full address
Phone number
Website URL
Google Place ID
Ratings & Reviews
Google rating (1-5 stars)
Total review count
Business categories
Price level ($ to $$$$)
Operational Data
Business hours (by day)
Business status (open/closed)
Popular times (if available)
Location Data
Latitude/longitude coordinates
Google Maps link
Photo URLs (optional)
Understanding Search Queries
Search queries follow this simple pattern:
[Category/Keyword] + [Location]
Examples:
"Restaurant Berlin, Germany"
"Coffee shops 10711, Germany" (with postal code)
"Non-Profit Organization in Randers, Denmark"
"Italian restaurant Madrid, España"
"Hotels near Times Square, New York"
Why Search Queries?
Up to 120 Results Per Query
Each search query can return 100-120 business listings, making it far more efficient than scraping individual business URLs.
Comparison:
❌ Individual URLs: 1 request = 1 business
✅ Search Query: 1 request = 100-120 businesses
Strategic Query Design
Small Cities (< 100,000 people)
For smaller cities, use Keyword + City name:
queries = [
    "Restaurant Randers, Denmark",
    "Hotel Randers, Denmark",
    "Cafe Randers, Denmark"
]
Small cities typically have fewer than 100-120 businesses per category, so a single query will capture all results.
Large Cities (> 100,000 people)
For large cities like New York, Berlin, or London, use Keyword + Postal Code or Keyword + Neighborhood:
# Berlin postal codes
queries = [
    "Restaurant 10711, Germany",  # Wilmersdorf
    "Restaurant 10115, Germany",  # Mitte
    "Restaurant 10247, Germany",  # Friedrichshain
    "Restaurant 10178, Germany",  # Mitte (Alexanderplatz)
]
Why Postal Codes Matter: NYC has 25,000+ restaurants. Using just “Restaurant New York” returns only 100-120 results, missing over 99% of businesses. Breaking the city down by postal codes ensures complete coverage.
Multi-Language Support
Use the local language for better results:
# German
"Restaurants in Berlin, Deutschland"
# Spanish
"Restaurantes en Madrid, España"
# French
"Restaurants à Paris, France"
# Danish
"Restauranter i København, Danmark"
Specify the lang parameter to match:
{
    "search_query": "Restaurantes en Madrid, España",
    "lang": "es"  # Spanish
}
Finding the Right Keywords
1. Google Business Categories
Use official Google Maps categories for best results:
Restaurant
Italian restaurant
Chinese restaurant
Fast food restaurant
Cafe
Bakery
Bar
Pizza restaurant
Lawyer
Dentist
Hair salon
Real estate agency
Insurance agency
Accounting firm
Marketing agency
Clothing store
Electronics store
Grocery store
Pharmacy
Bookstore
Furniture store
Hospital
Gym
Yoga studio
Spa
Physical therapist
Chiropractor
Hotel
Event venue
Coworking space
Non-profit organization
Government office
2. Category Subcategories
Combine a main category with a subcategory for more targeted results:
queries = [
    "Italian restaurant Berlin",      # Specific cuisine
    "Boutique hotel Paris",           # Specific hotel type
    "Organic grocery store Munich",   # Specific store type
    "Corporate law firm Frankfurt",   # Specific practice area
]
3. Custom Search Terms
You can also use descriptive search terms:
queries = [
    "vegan restaurants Berlin",
    "24 hour pharmacy Munich",
    "pet friendly hotels Barcelona",
    "disability rights organization Denmark",
]
Basic Usage
Submit a Search Query Job
import requests

url = "https://spideriq.di-atomic.com/api/v1/jobs/spiderMaps/submit"

headers = {
    "Authorization": "Bearer <your_token>",
    "Content-Type": "application/json"
}

data = {
    "payload": {
        "search_query": "Restaurant Berlin, Germany",
        "max_results": 100,
        "lang": "en"
    }
}

response = requests.post(url, headers=headers, json=data)
job = response.json()

print(f"Job submitted: {job['job_id']}")
Response:
{
  "success": true,
  "job_id": "660e8400-e29b-41d4-a716-446655440001",
  "type": "spiderMaps",
  "status": "queued",
  "message": "SpiderMaps job queued successfully"
}
Retrieve Results
import requests
import time

job_id = "660e8400-e29b-41d4-a716-446655440001"
headers = {"Authorization": "Bearer <your_token>"}

while True:
    response = requests.get(
        f"https://spideriq.di-atomic.com/api/v1/jobs/{job_id}/results",
        headers=headers
    )

    if response.status_code == 200:
        # Job completed!
        result = response.json()
        businesses = result['data']['businesses']
        print(f"Found {len(businesses)} businesses")

        for biz in businesses[:5]:  # Show first 5
            print(f" - {biz['name']}")
            print(f"   {biz['address']}")
            print(f"   Rating: {biz.get('rating', 'N/A')} ⭐ ({biz.get('reviews_count', 0)} reviews)")
            print(f"   Phone: {biz.get('phone', 'N/A')}")
            print()
        break
    elif response.status_code == 202:
        # Still processing
        print("Waiting for results...")
        time.sleep(3)
    else:
        print(f"Error: {response.json()}")
        break
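Later examples in this guide use submit_job(data) and get_results(job_id) as shorthand. Here is a minimal sketch of those helpers, assembled from the submit and polling calls shown above (the poll interval and timeout values are assumptions):
import time
import requests

API_BASE = "https://spideriq.di-atomic.com/api/v1"
headers = {"Authorization": "Bearer <your_token>"}

def submit_job(data):
    """Submit a SpiderMaps job and return its job_id."""
    response = requests.post(f"{API_BASE}/jobs/spiderMaps/submit", headers=headers, json=data)
    response.raise_for_status()
    return response.json()['job_id']

def get_results(job_id, poll_interval=5, timeout=300):
    """Poll the results endpoint until the job completes (200) or the timeout is hit."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        response = requests.get(f"{API_BASE}/jobs/{job_id}/results", headers=headers)
        if response.status_code == 200:
            return response.json()
        if response.status_code == 202:  # still processing
            time.sleep(poll_interval)
            continue
        response.raise_for_status()
    raise TimeoutError(f"Job {job_id} did not complete within {timeout} seconds")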
Results Structure
When your job completes, you’ll receive an array of businesses:
{
  "success": true,
  "job_id": "660e8400-e29b-41d4-a716-446655440001",
  "type": "spiderMaps",
  "status": "completed",
  "processing_time_seconds": 45.2,
  "data": {
    "query": "Restaurant Berlin, Germany",
    "results_count": 100,
    "businesses": [
      {
        "name": "Restaurant Zur letzten Instanz",
        "place_id": "ChIJN1t_tDeuEmsRUsoyG83frY4",
        "rating": 4.3,
        "reviews_count": 2847,
        "address": "Waisenstraße 14-16, 10179 Berlin, Germany",
        "phone": "+49 30 2425528",
        "website": "https://www.zurletzteninstanz.de/",
        "categories": ["German restaurant", "Traditional restaurant"],
        "coordinates": {
          "latitude": 52.5170365,
          "longitude": 13.4174634
        },
        "link": "https://www.google.com/maps/place/...",
        "business_status": "OPERATIONAL",
        "price_range": "$$",
        "working_hours": {
          "Monday": "12:00 PM - 11:00 PM",
          "Tuesday": "12:00 PM - 11:00 PM",
          "Wednesday": "12:00 PM - 11:00 PM",
          "Thursday": "12:00 PM - 11:00 PM",
          "Friday": "12:00 PM - 11:30 PM",
          "Saturday": "12:00 PM - 11:30 PM",
          "Sunday": "12:00 PM - 11:00 PM"
        }
      },
      // ... 99 more businesses
    ],
    "metadata": {
      "max_results": 100,
      "extract_reviews": false,
      "extract_photos": false,
      "language": "en"
    }
  }
}
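Optional fields such as coordinates, working_hours, or phone can be missing on some listings, so guard your lookups. A short sketch over the structure above:
for biz in result['data']['businesses']:
    # Use .get() with fallbacks so listings with missing fields don't raise errors
    coords = biz.get('coordinates') or {}
    hours = biz.get('working_hours') or {}

    print(biz['name'])
    print(f"  Location: {coords.get('latitude')}, {coords.get('longitude')}")
    print(f"  Monday hours: {hours.get('Monday', 'unknown')}")
    print(f"  Categories: {', '.join(biz.get('categories', []))}")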
Bulk Scraping Strategy
Complete City Coverage
For comprehensive coverage of large cities, break down by postal codes:
import requests

# Berlin postal codes (example subset)
berlin_postal_codes = [
    "10115", "10117", "10119", "10178", "10179",  # Mitte
    "10243", "10245", "10247", "10249",           # Friedrichshain
    "10551", "10553", "10555", "10557",           # Tiergarten
    "10711", "10713", "10715", "10717",           # Wilmersdorf
]

category = "Restaurant"
headers = {"Authorization": "Bearer <your_token>"}
submit_url = "https://spideriq.di-atomic.com/api/v1/jobs/spiderMaps/submit"

job_ids = []

for postal_code in berlin_postal_codes:
    query = f"{category} {postal_code}, Germany"
    data = {
        "payload": {
            "search_query": query,
            "max_results": 100,
            "lang": "de"
        }
    }

    response = requests.post(submit_url, headers=headers, json=data)
    job_id = response.json()['job_id']
    job_ids.append((postal_code, job_id))
    print(f"✓ Submitted: {query} (Job ID: {job_id})")

print(f"\nTotal jobs submitted: {len(job_ids)}")
print(f"Expected businesses: {len(job_ids) * 100} (assuming 100 per zone)")
Coverage Calculation
Coverage Estimation:
Single query: 100-120 businesses
Small city (< 100k people): 1-5 queries for complete coverage
Large city (> 100k people): 10-50+ queries (by postal code)
Example:
Berlin has ~95 postal codes
“Restaurant” query per postal code = 95 queries
95 queries × 100 businesses = ~9,500 restaurants
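The same estimate in code (both numbers are rough assumptions; actual counts vary by zone):
# Rough coverage estimate for Berlin restaurants
postal_codes = 95             # approximate number of Berlin postal codes
avg_results_per_query = 100   # assumed average results per zone

estimated_restaurants = postal_codes * avg_results_per_query
print(f"{postal_codes} queries -> ~{estimated_restaurants:,} restaurants")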
Common Use Cases
1. Lead Generation by Category & Location
Extract businesses for B2B outreach:
# Target: Marketing agencies in Munich
query = "Marketing agency Munich, Germany"
data = {
"payload" : {
"search_query" : query,
"max_results" : 100 ,
"lang" : "de"
}
}
# Submit and retrieve
job_id = submit_job(data)
results = get_results(job_id)
# Extract contact info for CRM
for biz in results[ 'data' ][ 'businesses' ]:
lead = {
'company_name' : biz[ 'name' ],
'phone' : biz.get( 'phone' ),
'website' : biz.get( 'website' ),
'address' : biz.get( 'address' ),
'rating' : biz.get( 'rating' ),
'google_maps_link' : biz[ 'link' ]
}
# Add to your CRM
add_to_crm(lead)
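add_to_crm is a placeholder for your own integration. As a stand-in, here is a sketch that appends each lead to a CSV file:
import csv

def add_to_crm(lead, path='munich_marketing_agencies.csv'):
    """Stand-in for a real CRM integration: append the lead to a CSV file."""
    with open(path, 'a', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=lead.keys())
        if f.tell() == 0:  # empty file: write the header row first
            writer.writeheader()
        writer.writerow(lead)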
2. Market Research for Specific Industries
Analyze competition in target markets:
import json

# Research hotel market in major European cities
cities = [
    ("Hotel Barcelona, Spain", "es"),
    ("Hotel Paris, France", "fr"),
    ("Hotel Berlin, Germany", "de"),
    ("Hotel Amsterdam, Netherlands", "nl"),
    ("Hotel Rome, Italy", "it")
]

market_data = {}

for query, lang in cities:
    city_name = query.split()[1].rstrip(',')
    data = {
        "payload": {
            "search_query": query,
            "max_results": 100,
            "lang": lang
        }
    }

    # Submit and collect
    job_id = submit_job(data)
    results = get_results(job_id)
    businesses = results['data']['businesses']

    # Analyze
    market_data[city_name] = {
        'total_hotels': len(businesses),
        'avg_rating': sum(b.get('rating', 0) for b in businesses) / len(businesses) if businesses else 0,
        'price_distribution': count_price_ranges(businesses),
        'top_rated': sorted(businesses, key=lambda x: x.get('rating', 0), reverse=True)[:10]
    }

print(json.dumps(market_data, indent=2))
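count_price_ranges is not defined above; one possible implementation tallies the price_range field:
from collections import Counter

def count_price_ranges(businesses):
    """Tally listings by price range, e.g. {'$$': 42, '$$$': 17, 'unknown': 5}."""
    return dict(Counter(b.get('price_range') or 'unknown' for b in businesses))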
3. Building Comprehensive Directories
Create location-based business directories:
# Build restaurant directory for tourist districts
districts = [
    "Restaurant Kreuzberg, Berlin",
    "Restaurant Mitte, Berlin",
    "Restaurant Prenzlauer Berg, Berlin",
    "Restaurant Charlottenburg, Berlin"
]

directory = []

for district_query in districts:
    # District name is everything between the keyword and ", Berlin"
    # (handles multi-word districts like "Prenzlauer Berg")
    district_name = district_query.replace("Restaurant ", "").replace(", Berlin", "")
    data = {
        "payload": {
            "search_query": district_query,
            "max_results": 100,
            "lang": "de",
            "extract_photos": True  # Include photos for directory
        }
    }

    job_id = submit_job(data)
    results = get_results(job_id)

    for biz in results['data']['businesses']:
        directory.append({
            'name': biz['name'],
            'district': district_name,
            'address': biz.get('address'),
            'phone': biz.get('phone'),
            'website': biz.get('website'),
            'rating': biz.get('rating'),
            'price_range': biz.get('price_range'),
            'categories': biz.get('categories', []),
            'photo_url': (biz.get('photos') or [None])[0]  # First photo, if any
        })

# Export to CSV or database
export_to_csv(directory, 'berlin_restaurants.csv')
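export_to_csv is left to you; a minimal version using only the standard library:
import csv

def export_to_csv(rows, path):
    """Write a list of dicts to CSV, using the keys of the first row as columns.
    List values (e.g. categories) are written using their string representation."""
    if not rows:
        return
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)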
4. Competitor Analysis by Region
Monitor competitor locations and ratings:
# Track competitor coffee chain locations
competitor_name = "Starbucks"
cities_to_monitor = [
    ("Starbucks Berlin, Germany", "de"),
    ("Starbucks Munich, Germany", "de"),
    ("Starbucks Hamburg, Germany", "de")
]

competitor_report = {}

for query, lang in cities_to_monitor:
    city = query.split()[1].rstrip(',')
    data = {
        "payload": {
            "search_query": query,
            "max_results": 100,
            "lang": lang
        }
    }

    job_id = submit_job(data)
    results = get_results(job_id)
    businesses = results['data']['businesses']

    competitor_report[city] = {
        'location_count': len(businesses),
        'average_rating': sum(b.get('rating', 0) for b in businesses) / len(businesses) if businesses else 0,
        'locations': [
            {
                'address': b.get('address'),
                'rating': b.get('rating'),
                'reviews': b.get('reviews_count')
            }
            for b in businesses
        ]
    }

print("Competitor Analysis Report:")
for city, data in competitor_report.items():
    print(f"\n{city}:")
    print(f"  Locations: {data['location_count']}")
    print(f"  Avg Rating: {data['average_rating']:.2f} ⭐")
Advanced Strategies
Combining Multiple Keywords
Cast a wider net by combining related keywords:
keywords = [
    "Italian restaurant",
    "Pizza restaurant",
    "Pasta restaurant",
    "Trattoria"
]

location = "Berlin, Germany"
all_results = []

for keyword in keywords:
    query = f"{keyword} {location}"
    data = {
        "payload": {
            "search_query": query,
            "max_results": 100,
            "lang": "de"
        }
    }

    job_id = submit_job(data)
    results = get_results(job_id)
    all_results.extend(results['data']['businesses'])

# Deduplicate by place_id
unique_businesses = {biz['place_id']: biz for biz in all_results}

print(f"Found {len(unique_businesses)} unique Italian restaurants")
Language Optimization
Use local language for better, more complete results:
# Compare English vs local language results
queries = [
( "Restaurant Copenhagen, Denmark" , "en" ),
( "Restauranter København, Danmark" , "da" ) # Danish
]
for query, lang in queries:
data = {
"payload" : {
"search_query" : query,
"max_results" : 100 ,
"lang" : lang
}
}
job_id = submit_job(data)
results = get_results(job_id)
print ( f " { query } ( { lang } ): { len (results[ 'data' ][ 'businesses' ]) } results" )
# Danish query typically returns MORE results
Language Best Practice: Always use the local language when possible. For example:
Copenhagen: Use Danish ("da")
Berlin: Use German ("de")
Barcelona: Use Spanish or Catalan ("es", "ca")
Reviews & Photos Extraction
Extract additional data for richer insights:
data = {
    "payload": {
        "search_query": "Hotel Paris, France",
        "max_results": 50,
        "lang": "fr",
        "extract_reviews": True,  # Include customer reviews
        "extract_photos": True    # Include photo URLs
    }
}

# Processing time increases with these options:
# Base query: ~30-60 seconds
# With reviews: +20-40 seconds
# With photos: +10-20 seconds
Processing Time: Enabling extract_reviews and extract_photos significantly increases processing time:
Base query: 30-60 seconds
With reviews: 50-100 seconds
With both: 60-120 seconds
Only enable when you need this data.
Individual Business URL Method
For specific businesses you already know:
# When you have a specific Google Maps URL
data = {
"payload" : {
"url" : "https://www.google.com/maps/place/Googleplex/@37.4220656,-122.0840897" ,
"max_results" : 1
}
}
# Or use Place ID directly
data = {
"payload" : {
"url" : "ChIJN1t_tDeuEmsRUsoyG83frY4" , # Place ID
"max_results" : 1
}
}
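URL and Place ID payloads are submitted the same way as search queries (assuming the same spiderMaps submit endpoint accepts them, as described in Basic Usage). A short sketch:
import requests

response = requests.post(
    "https://spideriq.di-atomic.com/api/v1/jobs/spiderMaps/submit",
    headers={"Authorization": "Bearer <your_token>", "Content-Type": "application/json"},
    json=data  # one of the URL / Place ID payloads above
)
print(response.json()['job_id'])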
When to Use URLs:
You already have specific business URLs
Verifying/updating existing business data
Single business lookups
Prefer search queries for:
Bulk scraping
Lead generation
Market research
Building directories
Best Practices
Start Small, Scale Up
Test with max_results: 20 first to verify your query, then increase to 100 for production runs.
Use Postal Codes for Large Cities
Break down cities with more than 100,000 people by postal code for complete coverage.
Respect Rate Limits
SpiderIQ allows 100 requests per minute. For bulk scraping, batch your submissions:

# Submit in batches of 10
for i in range(0, len(queries), 10):
    batch = queries[i:i + 10]
    for query in batch:
        submit_job(query)

    # Wait 6 seconds between batches
    if i + 10 < len(queries):
        time.sleep(6)
Cache Results by Place ID
Store Place IDs in your database to avoid duplicate scraping:

existing_place_ids = load_from_database()

for biz in results['data']['businesses']:
    if biz['place_id'] not in existing_place_ids:
        save_to_database(biz)
        existing_place_ids.add(biz['place_id'])
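load_from_database and save_to_database depend on your storage layer. A minimal sketch using sqlite3, keyed on place_id:
import json
import sqlite3

conn = sqlite3.connect('spidermaps_cache.db')
conn.execute("CREATE TABLE IF NOT EXISTS businesses (place_id TEXT PRIMARY KEY, data TEXT)")

def load_from_database():
    """Return the set of place_ids already scraped."""
    return {row[0] for row in conn.execute("SELECT place_id FROM businesses")}

def save_to_database(biz):
    """Store the full listing as JSON, skipping duplicates."""
    conn.execute(
        "INSERT OR IGNORE INTO businesses (place_id, data) VALUES (?, ?)",
        (biz['place_id'], json.dumps(biz))
    )
    conn.commit()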
Use Local Language
Always specify the local language (lang parameter) for:
More complete results
Better category matching
Local business names
Terms of Service
Ensure your use case complies with:
Google Maps Terms of Service
SpiderIQ Acceptable Use Policy
Local data protection laws (GDPR, CCPA, etc.)
Do not use for spam, unauthorized marketing, or malicious purposes.
Complete Workflow Example
Here’s a complete workflow for bulk lead generation:
import requests
import time
import csv
from concurrent.futures import ThreadPoolExecutor

# Configuration
API_BASE = "https://spideriq.di-atomic.com/api/v1"
AUTH_TOKEN = "<your_token>"
headers = {"Authorization": f"Bearer {AUTH_TOKEN}"}

# Step 1: Define your target queries
queries = [
    ("Marketing agency Berlin, Germany", "de"),
    ("Marketing agency Munich, Germany", "de"),
    ("Marketing agency Hamburg, Germany", "de"),
    ("Marketing agency Frankfurt, Germany", "de"),
]

print(f"🎯 Target: {len(queries)} cities for marketing agency leads\n")

# Step 2: Submit all jobs
job_mapping = []

for query, lang in queries:
    data = {
        "payload": {
            "search_query": query,
            "max_results": 100,
            "lang": lang
        }
    }

    response = requests.post(
        f"{API_BASE}/jobs/spiderMaps/submit",
        headers=headers,
        json=data
    )

    job_id = response.json()['job_id']
    job_mapping.append((query, job_id))
    print(f"✓ Submitted: {query} (Job ID: {job_id})")

print(f"\n⏳ Waiting for {len(job_mapping)} jobs to complete...\n")

# Step 3: Poll for results (parallel)
def get_job_results(query_and_job):
    query, job_id = query_and_job
    max_wait = 120
    start_time = time.time()

    while time.time() - start_time < max_wait:
        response = requests.get(
            f"{API_BASE}/jobs/{job_id}/results",
            headers=headers
        )

        if response.status_code == 200:
            result = response.json()
            businesses = result['data']['businesses']
            print(f"✓ {query}: {len(businesses)} businesses retrieved")
            return (query, businesses)
        elif response.status_code == 202:
            time.sleep(3)
        else:
            print(f"✗ {query}: Error {response.status_code}")
            return (query, [])

    print(f"⏱️ {query}: Timeout")
    return (query, [])

# Fetch results in parallel (max 5 concurrent)
with ThreadPoolExecutor(max_workers=5) as executor:
    results = list(executor.map(get_job_results, job_mapping))

# Step 4: Process and export
all_leads = []

for query, businesses in results:
    city = query.split()[2].rstrip(',')

    for biz in businesses:
        lead = {
            'company_name': biz['name'],
            'city': city,
            'address': biz.get('address', ''),
            'phone': biz.get('phone', ''),
            'website': biz.get('website', ''),
            'rating': biz.get('rating', ''),
            'reviews': biz.get('reviews_count', 0),
            'categories': ', '.join(biz.get('categories', [])),
            'google_maps': biz['link']
        }
        all_leads.append(lead)

# Step 5: Export to CSV
output_file = 'marketing_agencies_germany.csv'

with open(output_file, 'w', newline='', encoding='utf-8') as f:
    if all_leads:
        writer = csv.DictWriter(f, fieldnames=all_leads[0].keys())
        writer.writeheader()
        writer.writerows(all_leads)

print("\n✅ Complete!")
print(f"📊 Total leads extracted: {len(all_leads)}")
print(f"💾 Exported to: {output_file}")
print(f"📈 Average leads per city: {len(all_leads) / len(queries):.0f}")
Processing Times
Typical processing times per query:
Basic query (max 20 results): 20-30 seconds
Standard query (max 100 results): 45-75 seconds
With reviews extraction: 60-100 seconds
With photos extraction: 50-90 seconds
With both reviews & photos: 80-120 seconds
Optimal Polling Interval
# Recommended polling strategy
time.sleep(5)   # First check after 5 seconds
time.sleep(5)   # Then check every 5 seconds

# For queries with reviews/photos, poll less frequently
time.sleep(10)  # Check every 10 seconds
Parallel Processing
Process multiple results concurrently:
from concurrent.futures import ThreadPoolExecutor

def fetch_result(job_id):
    # Your polling logic here
    return get_result(job_id)

# Process up to 10 jobs in parallel
with ThreadPoolExecutor(max_workers=10) as executor:
    results = list(executor.map(fetch_result, job_ids))
Next Steps