Google Maps Scraper with Playwright + Python
Why I Built This
Every lead generation agency I spoke to had the same problem: they needed business contact data from Google Maps, but existing tools were either too expensive, too unreliable, or couldn't extract emails. I decided to build something better. You can see the full ScraperBot project case study to learn more about the end result.
The Architecture
ScraperBot has three main layers:
1. Browser Automation Layer

Using Playwright with stealth plugins, the scraper navigates Google Maps like a human: random scroll speeds, natural mouse movements, and realistic timing between actions. This was critical because Google's anti-bot detection is sophisticated.
```python
import asyncio
import random

# Human-like scrolling pattern: vary both the wheel distance and the pause
async def scroll_results(page):
    for _ in range(random.randint(3, 6)):
        await page.mouse.wheel(0, random.randint(300, 600))
        await asyncio.sleep(random.uniform(0.8, 2.0))
```

2. Data Extraction Layer

For each business, we extract: name, address, phone, website, rating, review count, category, and operating hours. The tricky part was handling Google's dynamic DOM: elements load asynchronously, and class names change frequently.
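One way to cope with churning class names is to try a list of selectors in priority order, preferring aria- and data-attributes over classes. Here is a minimal sketch of that fallback pattern; the selector strings and the stub page are illustrative, not the scraper's real ones, and the stub only stands in for a Playwright page so the logic runs without a browser:

```python
import asyncio

async def first_text(page, selectors):
    """Return the text of the first selector that matches, or None.
    Attribute-based selectors tend to outlive Google's class names."""
    for sel in selectors:
        el = await page.query_selector(sel)
        if el:
            return (await el.inner_text()).strip()
    return None

class _StubElement:
    def __init__(self, text):
        self._text = text

    async def inner_text(self):
        return self._text

class _StubPage:
    """Stand-in for a Playwright page: maps selector -> element text."""
    def __init__(self, dom):
        self._dom = dom

    async def query_selector(self, sel):
        return _StubElement(self._dom[sel]) if sel in self._dom else None

page = _StubPage({'[role="heading"]': " Joe's Diner "})
name = asyncio.run(first_text(page, ["h1.missing", '[role="heading"]']))
# name == "Joe's Diner"
```

Against a real Playwright page the same `first_text` helper works unchanged, since it only relies on `query_selector` and `inner_text`.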
3. Email Harvesting Pipeline

This is where it gets interesting. For every business with a website, we run a four-method email extraction pipeline:
- `mailto:` links - the most reliable source when present
- Cloudflare email obfuscation decoding - many sites use it, and we reverse the XOR-based hex encoding
- JSON-LD structured data - business schemas often include email addresses
- Regex scanning - a fallback, with validation to filter junk addresses
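The Cloudflare step deserves a closer look. Cloudflare's email obfuscation stores the address as a hex string (the `data-cfemail` attribute) whose first byte is an XOR key for the remaining bytes, so decoding is a one-liner once you know the scheme:

```python
def decode_cfemail(encoded: str) -> str:
    """Decode a Cloudflare-obfuscated email address.

    The first hex byte is the XOR key; every following byte is a
    character of the address XORed with that key.
    """
    key = int(encoded[:2], 16)
    return "".join(
        chr(int(encoded[i:i + 2], 16) ^ key)
        for i in range(2, len(encoded), 2)
    )

# Example: "a@b.c" encoded with key 0x42
decode_cfemail("422302206c21")  # -> "a@b.c"
```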
Parallel Processing
To handle 10 categories in a single run, I used a ThreadPoolExecutor with up to three workers. Each worker manages its own browser instance, and results merge into a shared DataFrame with thread-safe locking.
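The fan-out/merge shape can be sketched as follows. `scrape_category` is a hypothetical stand-in for the per-category worker (in the real scraper it drives its own Playwright browser), and a plain list plus lock stands in for the shared DataFrame:

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed

results = []                      # shared result store
results_lock = threading.Lock()   # guards concurrent merges

def scrape_category(category: str) -> list[dict]:
    # Hypothetical worker: each call would own one browser instance.
    return [{"category": category, "name": f"{category} business"}]

def run_all(categories: list[str], max_workers: int = 3) -> list[dict]:
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(scrape_category, c) for c in categories]
        for future in as_completed(futures):
            rows = future.result()
            with results_lock:    # merge each worker's rows safely
                results.extend(rows)
    return results
```

Capping the pool at three workers keeps memory bounded, since each worker holds a full browser instance.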
The Checkpoint System
The most important reliability feature: every 10 businesses, we save progress to a JSON checkpoint file. If the scrape is interrupted (network drop, crash, rate limit), running the same command again picks up exactly where it left off.
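A minimal sketch of that resume logic, with hypothetical file and helper names (the real checkpoint format may differ): track which business IDs are done, persist periodically, and reload the set on startup so finished work is skipped.

```python
import json
from pathlib import Path

CHECKPOINT = Path("checkpoint.json")  # hypothetical checkpoint file

def save_checkpoint(done_ids: set[str], rows: list[dict]) -> None:
    CHECKPOINT.write_text(json.dumps({"done": sorted(done_ids), "rows": rows}))

def load_checkpoint() -> tuple[set[str], list[dict]]:
    if not CHECKPOINT.exists():
        return set(), []
    state = json.loads(CHECKPOINT.read_text())
    return set(state["done"]), state["rows"]

def scrape_all(business_ids, scrape_one, every=10):
    done, rows = load_checkpoint()          # resume where we left off
    for i, biz in enumerate(b for b in business_ids if b not in done):
        rows.append(scrape_one(biz))
        done.add(biz)
        if (i + 1) % every == 0:            # persist every N businesses
            save_checkpoint(done, rows)
    save_checkpoint(done, rows)             # final flush
    return rows
```

Rerunning `scrape_all` after an interruption only processes the IDs missing from the checkpoint, which is what makes "run the same command again" safe.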
Results
- 5,000+ businesses extracted in under 30 minutes
- 98% data accuracy verified against manual sampling
- Multi-format export to both Excel (with per-category sheets) and JSON
- Dual interface - full CLI with interactive prompts and a tkinter GUI with a live results table
Lessons Learned
- Anti-detection is an arms race. Human-like behaviour patterns matter more than proxy rotation.
- Build your checkpoint system first, not last. Data pipelines will fail - plan for it.
- Decouple your core logic from presentation. The same scraping engine powers both CLI and GUI without any code duplication.
The project reinforced my belief that the best automation tools are the ones that handle failure gracefully. Users don't care how fast your scraper is if it can't recover from a network timeout.
---
*Need a custom scraper or data extraction tool? View services and pricing or book a free discovery call.*