Google Maps Scraper with Playwright + Python
Why I Built This
Every lead generation agency I spoke to had the same problem: they needed business contact data from Google Maps, but existing tools were either too expensive, too unreliable, or couldn't extract emails. I decided to build something better. You can see the full ScraperBot project case study to learn more about the end result.
The Architecture
ScraperBot has three main layers:
1. Browser Automation Layer

Using Playwright with stealth plugins, the scraper navigates Google Maps like a human: random scroll speeds, natural mouse movements, and realistic timing between actions. This was critical because Google's anti-bot detection is sophisticated.
```python
import asyncio
import random

# Human-like scrolling pattern: vary both the wheel distance and the pause
async def scroll_results(page):
    for _ in range(random.randint(3, 6)):
        await page.mouse.wheel(0, random.randint(300, 600))
        await asyncio.sleep(random.uniform(0.8, 2.0))
```

2. Data Extraction Layer

For each business, we extract: name, address, phone, website, rating, review count, category, and operating hours. The tricky part was handling Google's dynamic DOM: elements load asynchronously, and class names change frequently.
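One way to cope with churning class names is to try a list of selectors in priority order, preferring aria- and data-attributes over classes. Here is a minimal sketch of that fallback pattern; the selector strings and the stub page are illustrative, not the scraper's real ones, and the stub only stands in for a Playwright page so the logic runs without a browser:

```python
import asyncio

async def first_text(page, selectors):
    """Return the text of the first selector that matches, or None.
    Attribute-based selectors tend to outlive Google's class names."""
    for sel in selectors:
        el = await page.query_selector(sel)
        if el:
            return (await el.inner_text()).strip()
    return None

class _StubElement:
    def __init__(self, text):
        self._text = text

    async def inner_text(self):
        return self._text

class _StubPage:
    """Stand-in for a Playwright page: maps selector -> element text."""
    def __init__(self, dom):
        self._dom = dom

    async def query_selector(self, sel):
        return _StubElement(self._dom[sel]) if sel in self._dom else None

page = _StubPage({'[role="heading"]': " Joe's Diner "})
name = asyncio.run(first_text(page, ["h1.missing", '[role="heading"]']))
# name == "Joe's Diner"
```

Against a real Playwright page the same `first_text` helper works unchanged, since it only relies on `query_selector` and `inner_text`.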
3. Email Harvesting Pipeline

This is where it gets interesting. For every business with a website, we run a four-method email extraction pipeline:
- `mailto:` links - the most reliable source when present
- Cloudflare email obfuscation decoding - many sites use it, and we reverse the XOR-based hex encoding
- JSON-LD structured data - business schemas often include email addresses
- Regex scanning - a fallback, with validation to filter junk addresses
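The Cloudflare step deserves a closer look. Cloudflare's email obfuscation stores the address as a hex string (the `data-cfemail` attribute) whose first byte is an XOR key for the remaining bytes, so decoding is a one-liner once you know the scheme:

```python
def decode_cfemail(encoded: str) -> str:
    """Decode a Cloudflare-obfuscated email address.

    The first hex byte is the XOR key; every following byte is a
    character of the address XORed with that key.
    """
    key = int(encoded[:2], 16)
    return "".join(
        chr(int(encoded[i:i + 2], 16) ^ key)
        for i in range(2, len(encoded), 2)
    )

# Example: "a@b.c" encoded with key 0x42
decode_cfemail("422302206c21")  # -> "a@b.c"
```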
Parallel Processing
To handle 10 categories in a single run, I used a ThreadPoolExecutor with up to three workers. Each worker manages its own browser instance, and results merge into a shared DataFrame with thread-safe locking.
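The fan-out/merge shape can be sketched as follows. `scrape_category` is a hypothetical stand-in for the per-category worker (in the real scraper it drives its own Playwright browser), and a plain list plus lock stands in for the shared DataFrame:

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed

results = []                      # shared result store
results_lock = threading.Lock()   # guards concurrent merges

def scrape_category(category: str) -> list[dict]:
    # Hypothetical worker: each call would own one browser instance.
    return [{"category": category, "name": f"{category} business"}]

def run_all(categories: list[str], max_workers: int = 3) -> list[dict]:
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(scrape_category, c) for c in categories]
        for future in as_completed(futures):
            rows = future.result()
            with results_lock:    # merge each worker's rows safely
                results.extend(rows)
    return results
```

Capping the pool at three workers keeps memory bounded, since each worker holds a full browser instance.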
The Checkpoint System
The most important reliability feature: every 10 businesses, we save progress to a JSON checkpoint file. If the scrape is interrupted (network drop, crash, rate limit), running the same command again picks up exactly where it left off.
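A minimal sketch of that resume logic, with hypothetical file and helper names (the real checkpoint format may differ): track which business IDs are done, persist periodically, and reload the set on startup so finished work is skipped.

```python
import json
from pathlib import Path

CHECKPOINT = Path("checkpoint.json")  # hypothetical checkpoint file

def save_checkpoint(done_ids: set[str], rows: list[dict]) -> None:
    CHECKPOINT.write_text(json.dumps({"done": sorted(done_ids), "rows": rows}))

def load_checkpoint() -> tuple[set[str], list[dict]]:
    if not CHECKPOINT.exists():
        return set(), []
    state = json.loads(CHECKPOINT.read_text())
    return set(state["done"]), state["rows"]

def scrape_all(business_ids, scrape_one, every=10):
    done, rows = load_checkpoint()          # resume where we left off
    for i, biz in enumerate(b for b in business_ids if b not in done):
        rows.append(scrape_one(biz))
        done.add(biz)
        if (i + 1) % every == 0:            # persist every N businesses
            save_checkpoint(done, rows)
    save_checkpoint(done, rows)             # final flush
    return rows
```

Rerunning `scrape_all` after an interruption only processes the IDs missing from the checkpoint, which is what makes "run the same command again" safe.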
Results
- 5,000+ businesses extracted in under 30 minutes
- 98% data accuracy verified against manual sampling
- Multi-format export to both Excel (with per-category sheets) and JSON
- Dual interface - full CLI with interactive prompts and a tkinter GUI with a live results table
Lessons Learned
- Anti-detection is an arms race. Human-like behaviour patterns matter more than proxy rotation.
- Build your checkpoint system first, not last. Data pipelines will fail - plan for it.
- Decouple your core logic from presentation. The same scraping engine powers both CLI and GUI without any code duplication.
The project reinforced my belief that the best automation tools are the ones that handle failure gracefully. Users don't care how fast your scraper is if it can't recover from a network timeout.
---
*Need a custom scraper or data extraction tool? View services and pricing or book a free discovery call.*