Dynamic Web Scraper for E‑Commerce Sites
A Python tool that navigates JavaScript‑heavy e‑commerce pages, extracts product data, and saves it to JSON. It uses Selenium to render each page and BeautifulSoup to parse the result, balancing reliability against speed.
The Idea
I started this project when I needed a quick way to pull product listings from a new online store. The site relied heavily on JavaScript, but my first instinct was still to try plain requests and BeautifulSoup. It failed, because requests only fetches the raw HTML and never runs the page's JavaScript. I switched to Selenium.
How It Works
The scraper has four parts:
- scrape_url_script.py sets up the driver, scrolls to load content, and saves the page.
- extract_product_information.py parses the saved page and grabs each product's title, price, and link.
- detailed_product_information.py visits each product link to collect extras such as color, size, and images.
- navigator.py stitches everything together, starting from a cached home page.
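The scroll-and-save step in scrape_url_script.py could be sketched as below, assuming the common approach of scrolling to the bottom until the page height stops growing. The function names and the max_scrolls and pause parameters are hypothetical; the only Selenium features used are WebDriver.execute_script and WebDriver.page_source, so any driver object (or a test fake) that provides them will work.

```python
import time
from pathlib import Path


def scroll_to_load(driver, max_scrolls=10, pause=1.5):
    """Scroll to the bottom repeatedly until the page height stops growing.

    `driver` is anything with Selenium's `execute_script` method, such as a
    `selenium.webdriver.Chrome` instance. Returns the number of scrolls made.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    scrolls = 0
    while scrolls < max_scrolls:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        scrolls += 1
        time.sleep(pause)  # give lazy-loaded products time to render
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so stop early
        last_height = new_height
    return scrolls


def save_page(driver, path):
    """Write the rendered DOM (driver.page_source) to disk for offline parsing."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(driver.page_source, encoding="utf-8")
    return out
```

With a real browser you would create the driver (for example webdriver.Chrome()), call driver.get(url), then scroll_to_load(driver) and save_page(driver, "cache/home.html") (a hypothetical path), and finally driver.quit(). Saving the rendered DOM lets the parsing step be rerun offline without reopening the browser.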
I built a small testing mode that limits scrolling, the number of categories, and the number of product pages visited. That keeps early runs fast and avoids accidentally flooding the target site with requests.
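How testing mode is configured isn't shown here. One plausible shape, with hypothetical names throughout, is a small frozen dataclass of limits that the crawler consults before each stage, so switching between a quick test run and a full run is a one-line change.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ScrapeLimits:
    """Hypothetical testing-mode knobs; None means "no limit"."""
    max_scrolls: Optional[int] = None                # passed to the scrolling loop
    max_categories: Optional[int] = None             # categories taken from the home page
    max_products_per_category: Optional[int] = None  # product pages visited for details


TESTING = ScrapeLimits(max_scrolls=2, max_categories=1, max_products_per_category=3)
FULL = ScrapeLimits()


def take(items, limit):
    """Return at most `limit` items, or all of them when `limit` is None."""
    return list(items) if limit is None else list(islice(items, limit))


def plan_crawl(categories: Dict[str, List[str]], limits: ScrapeLimits) -> Dict[str, List[str]]:
    """Trim a {category_url: [product_url, ...]} mapping to the given limits."""
    return {
        category: take(categories[category], limits.max_products_per_category)
        for category in take(categories, limits.max_categories)
    }
```

Under these assumptions, plan_crawl(categories, TESTING) keeps only the first category and its first three product URLs, while FULL leaves the mapping untouched.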
Why It Matters
Many e‑commerce sites now render their content with JavaScript, so a scraper that only downloads the raw HTML will miss it. By combining Selenium for rendering and BeautifulSoup for parsing, this tool pulls data reliably while keeping the code readable.
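The parsing half of that combination could look like the sketch below. In the tool itself this is BeautifulSoup code, roughly soup.select("div.product") followed by card.select_one(".title").get_text(strip=True) for each card; to keep the example free of third-party packages, it swaps in the standard library's html.parser instead. The card, title, and price class names are hypothetical, and the input is the rendered HTML that Selenium exposes as driver.page_source (or a page saved from it).

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin


class ProductParser(HTMLParser):
    """Collect title, price, and link from product cards (hypothetical class names)."""

    def __init__(self, base_url, card_class="product", title_class="title", price_class="price"):
        super().__init__()
        self.base_url = base_url
        self.card_class = card_class
        self.title_class = title_class
        self.price_class = price_class
        self.products = []
        # Field the current text belongs to; any closing tag ends it,
        # which is good enough for flat card markup.
        self._field = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self.card_class in classes:
            # Each product card starts a new record.
            self.products.append({"title": "", "price": "", "link": None})
            return
        if not self.products:
            return  # ignore markup before the first card (headers, menus)
        current = self.products[-1]
        if tag == "a" and current["link"] is None and attrs.get("href"):
            current["link"] = urljoin(self.base_url, attrs["href"])
        if self.title_class in classes:
            self._field = "title"
        elif self.price_class in classes:
            self._field = "price"

    def handle_endtag(self, tag):
        self._field = None

    def handle_data(self, data):
        if self._field and self.products:
            self.products[-1][self._field] += data


def extract_products(html, base_url):
    parser = ProductParser(base_url)
    parser.feed(html)
    parser.close()
    # Trim whitespace picked up between tags.
    return [{k: v.strip() if isinstance(v, str) else v for k, v in p.items()}
            for p in parser.products]


def to_json(products):
    """Serialize the records for the JSON output file."""
    return json.dumps(products, indent=2, ensure_ascii=False)
```

Relative links are resolved against base_url with urljoin, so the saved JSON holds absolute product URLs that the detail step can visit directly.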
Next Steps
When the site structure changes, just tweak the config variables in navigator.py. That is all it takes to adapt the scraper to a new layout.
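The actual config variables in navigator.py aren't listed here. Assuming they hold things like the base URL and the CSS class of category links (the names below are hypothetical), the idea is that the crawling code stays fixed and only the config changes. This sketch reads category URLs from a cached home page, using the standard library's html.parser rather than BeautifulSoup so it runs without extra packages.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from urllib.parse import urljoin


@dataclass
class SiteConfig:
    """Hypothetical layout settings; ideally the only thing edited per site."""
    base_url: str
    category_link_class: str
    product_card_class: str = "product"  # would be used by the extraction step (not shown)


class CategoryLinkParser(HTMLParser):
    """Collect unique, absolute category URLs from a cached home page."""

    def __init__(self, config):
        super().__init__()
        self.config = config
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        href = attrs.get("href")
        if tag == "a" and href and self.config.category_link_class in classes:
            url = urljoin(self.config.base_url, href)
            if url not in self.links:  # menus often repeat the same link
                self.links.append(url)


def category_urls(home_html, config):
    parser = CategoryLinkParser(config)
    parser.feed(home_html)
    parser.close()
    return parser.links
```

If a redesign renames the menu links' class (say, from a hypothetical nav-cat to menu__link), only the SiteConfig value changes; category_urls and the rest of the pipeline stay the same.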