Python is the go-to language for web scraping: requests, Scrapy and Selenium cover almost every data-collection task. But without proxies, any scraper quickly hits IP bans, CAPTCHAs and rate limits. Let's look at how to plug proxies into each library, how to set up rotation and how to avoid getting blocked — with code examples.
Why a scraper needs proxies
- Bypass IP limits. Sites count requests per address and return 429/CAPTCHA once you exceed them.
- Geo access. Prices, search results and content often depend on country — you need an IP from the right region.
- Concurrency. A pool of IPs lets you run many threads without a ban.
- Reliability. If one IP goes down, rotation switches to a live one.
Which proxy to choose for scraping
The IP type depends on how aggressive the site's anti-bot is:
| Site / task | Proxy type | Rotation |
|---|---|---|
| Simple sites, APIs, catalogs | Datacenter | By timer / on error |
| Marketplaces, aggregators | Residential | Per request or session |
| Social networks, anti-bot | Mobile | Per request |
For more on the differences, see "Mobile vs Residential vs Datacenter Proxies". Ready-made plans for data collection are on our proxies for scraping page.
requests: adding a proxy
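The basic approach is a proxies dictionary keyed by the protocol of the target URL. Proxy string format: scheme://login:password@ip:port. Both keys below point to the same http:// proxy URL because the scheme in the value describes how to connect to the proxy, not the site being requested.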
import requests
proxies = {
    "http": "http://login:password@45.86.1.10:8000",
    "https": "http://login:password@45.86.1.10:8000",
}
r = requests.get("https://httpbin.org/ip", proxies=proxies, timeout=15)
print(r.json())  # you'll see the proxy IP, not your own
For SOCKS5, install the extra with pip install "requests[socks]" and use the socks5:// scheme in the proxy URL. On protocol differences, see "SOCKS5 or HTTP".
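A login or password containing characters such as @ or : breaks the scheme://login:password@ip:port format unless it is percent-encoded. The sketch below builds the proxies dictionary from separate parts. The helper name build_proxies and the sample password are illustrative assumptions, not part of requests. Passing scheme="socks5h" instead of "socks5" asks the proxy to resolve DNS names as well.

```python
from urllib.parse import quote


def build_proxies(host, port, login=None, password=None, scheme="http"):
    """Return a requests-style proxies dict for a single proxy endpoint.

    Hypothetical helper: the name and parameters are illustrative.
    """
    auth = ""
    if login is not None:
        # Percent-encode credentials so "@" or ":" cannot break the URL.
        auth = quote(login, safe="")
        if password is not None:
            auth += ":" + quote(password, safe="")
        auth += "@"
    url = f"{scheme}://{auth}{host}:{port}"
    # The same proxy URL serves both plain-HTTP and HTTPS targets.
    return {"http": url, "https": url}


proxies = build_proxies("45.86.1.10", 8000, "login", "p@ss:word")
print(proxies["https"])  # http://login:p%40ss%3Aword@45.86.1.10:8000
```

The resulting dictionary can be passed as the proxies argument of requests.get, exactly like the hand-written one.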
Rotating IPs from a pool
If you have a list of addresses, pick a random one for each request so load is spread across the pool:
import random
import requests
# Proxy pool: each entry is a full proxy URL
pool = [
    "http://user:pass@45.86.1.10:8000",
    "http://user:pass@45.86.1.11:8000",
    "http://user:pass@45.86.1.12:8000",
]
def get(url):
    # Pick a random proxy per request to spread load across the pool
    p = random.choice(pool)
    return requests.get(url, proxies={"http": p, "https": p}, timeout=15)
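Random choice alone keeps sending traffic to addresses that have stopped responding. One possible extension, sketched here under assumptions, counts consecutive failures per proxy, skips addresses that exceed a threshold, and retries the request through another IP. The names ProxyPool and fetch_with_rotation, and the fetch(url, proxy) callable that raises on failure, are hypothetical and not part of any library.

```python
import random


class ProxyPool:
    """Tracks consecutive failures per proxy and hides 'dead' ones."""

    def __init__(self, proxies, max_failures=3):
        self.failures = {p: 0 for p in proxies}
        self.max_failures = max_failures

    def alive(self):
        return [p for p, n in self.failures.items() if n < self.max_failures]

    def pick(self):
        alive = self.alive()
        if not alive:
            raise RuntimeError("no live proxies left in the pool")
        return random.choice(alive)

    def report(self, proxy, ok):
        # A success resets the counter; a failure increments it.
        self.failures[proxy] = 0 if ok else self.failures[proxy] + 1


def fetch_with_rotation(url, pool, fetch, attempts=3):
    """Try up to `attempts` proxies; fetch(url, proxy) must raise on failure."""
    last_error = None
    for _ in range(attempts):
        proxy = pool.pick()
        try:
            result = fetch(url, proxy)
        except Exception as exc:
            pool.report(proxy, ok=False)
            last_error = exc
        else:
            pool.report(proxy, ok=True)
            return result
    raise RuntimeError(f"all {attempts} attempts failed") from last_error
```

With requests, fetch could be a small wrapper that calls requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=15) and then raise_for_status() on the response, so that HTTP errors also count against the proxy.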
Scrapy: proxy middleware
In Scrapy the proxy is set per request via meta["proxy"], which the built-in HttpProxyMiddleware picks up, or centrally in a downloader middleware. The simplest option is to pass the address into every request:
import scrapy
class PriceSpider(scrapy.Spider):
    name = "prices"
    start_urls = ["https://example.com/catalog"]
    def start_requests(self):
        proxy = "http://user:pass@45.86.1.10:8000"
        for url in self.start_urls:
            yield scrapy.Request(url, meta={"proxy": proxy})
For automatic rotation, use packages like scrapy-rotating-proxies or your own middleware that picks an IP from the pool and bans "dead" addresses.
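A custom middleware of this kind might look roughly like the sketch below. It is a minimal illustration, not the implementation used by scrapy-rotating-proxies. The class name, the PROXY_POOL setting and the proxy_retries meta key are assumptions made for this example. It relies only on the standard downloader-middleware hooks process_request and process_exception.

```python
import random


class RandomProxyMiddleware:
    """Downloader middleware sketch: random proxy per request, skip failing ones."""

    def __init__(self, proxies, max_failures=3, max_retries=2):
        self.failures = {p: 0 for p in proxies}
        self.max_failures = max_failures
        self.max_retries = max_retries

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_POOL is a hypothetical custom setting, not a built-in one.
        return cls(crawler.settings.getlist("PROXY_POOL"))

    def process_request(self, request, spider=None):
        alive = [p for p, n in self.failures.items() if n < self.max_failures]
        if alive:
            request.meta["proxy"] = random.choice(alive)
        # If every proxy is marked dead, the request is left untouched.
        return None  # let Scrapy continue processing the request

    def process_exception(self, request, exception, spider=None):
        proxy = request.meta.get("proxy")
        if proxy in self.failures:
            self.failures[proxy] += 1
        retries = request.meta.get("proxy_retries", 0)
        if retries >= self.max_retries:
            return None  # give up and let other middlewares handle the error
        # Reschedule a copy; process_request will assign it a new proxy.
        retry = request.replace(dont_filter=True)
        retry.meta["proxy_retries"] = retries + 1
        return retry
```

To try it, the class would be registered in DOWNLOADER_MIDDLEWARES under your project's module path with a priority below 750, so that it runs before the built-in HttpProxyMiddleware, and PROXY_POOL would hold the list of proxy URLs in settings.py.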
Selenium: a proxy in the browser
For dynamic JS sites you need a real browser. A proxy without authentication is set via a launch argument:
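from selenium import webdriver
opts = webdriver.ChromeOptions()
# Route all browser traffic through the proxy (no credentials here)
opts.add_argument("--proxy-server=http://45.86.1.10:8000")
driver = webdriver.Chrome(options=opts)
driver.get("https://httpbin.org/ip")
driver.quit()  # close the browser when done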
If the proxy uses a login and password, Chrome won't accept them in the string — you need a small auth extension or an anti-detect browser that handles proxies out of the box.
How to avoid bans: the rules
- Add delays and a random User-Agent: don't blast hundreds of requests per second from one IP.
- Handle 429 and CAPTCHA: when they appear, switch the IP and slow down.
- Test your proxies for speed and leaks before work: "How to Test a Proxy".
- Respect robots.txt and don't collect personal data.
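The first two rules can be combined into one request helper. The sketch below is a starting point under stated assumptions. The names polite_get and looks_blocked, the send(url, headers, proxy) callable returning (status, body), the sample User-Agent strings and the delay values are all illustrative. The CAPTCHA check is a naive text match that real sites may not satisfy.

```python
import random
import time

# Sample User-Agent strings for illustration; replace with a current, larger list.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.4 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
]


def looks_blocked(status, body):
    # Naive heuristic (an assumption): HTTP 429 or the word "captcha" in the page.
    return status == 429 or "captcha" in body.lower()


def polite_get(url, send, proxies, max_attempts=4, delay_range=(0.5, 2.0),
               base_backoff=2.0, sleep=time.sleep):
    """Call send(url, headers, proxy) -> (status, body) with delays and rotation."""
    for attempt in range(max_attempts):
        sleep(random.uniform(*delay_range))  # random pause before every request
        proxy = random.choice(proxies)       # random IP for every attempt
        headers = {"User-Agent": random.choice(USER_AGENTS)}
        status, body = send(url, headers, proxy)
        if not looks_blocked(status, body):
            return status, body
        # Blocked: back off exponentially before retrying through another IP.
        sleep(base_backoff * 2 ** attempt)
    raise RuntimeError(f"still blocked after {max_attempts} attempts: {url}")
```

Keeping send as a parameter lets the same logic wrap requests, a Scrapy-free script or a test double. The sleep parameter exists only so that the delays can be replaced in tests.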
Case study: scraping prices in 20 threads
A team was collecting 500,000 products a day, and its single datacenter IP was getting banned within 10–15 minutes. After switching to a pool of residential IPs with per-request rotation and load balancing, the scraper ran in 20 threads without CAPTCHAs, and a full catalog crawl dropped from 9 hours to 40 minutes. Separate static mobile IPs were assigned for dashboards and authorized areas.
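A multi-threaded crawl with per-request rotation can be approximated with the standard library's thread pool. This is a generic sketch, not the team's actual code. The crawl function and the fetch(url, proxy) callable are hypothetical names, and 20 workers simply mirrors the number in the case study.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def crawl(urls, proxies, fetch, workers=20):
    """Fetch every URL in a thread pool, choosing a random proxy per request.

    Returns a list of (url, result, error) tuples in the same order as `urls`.
    """
    def task(url):
        proxy = random.choice(proxies)
        try:
            return url, fetch(url, proxy), None
        except Exception as exc:
            # Keep going; the caller decides which URLs to retry.
            return url, None, exc

    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(task, urls))
```

Because errors are returned rather than raised, one failing product page does not stop the rest of the catalog crawl. The failed URLs can be collected from the result list and queued again.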
FAQ
Which proxies are best for requests and Scrapy?
For simple sites — datacenter; for marketplaces and aggregators — residential with rotation; for social networks — mobile. Pick yours on the proxies for scraping page.
How do I rotate IPs in Python?
Either pick a random IP from a pool per request, or use a mobile proxy with auto-rotation by timer/link — then you don't need rotation in code.
Why doesn't a password-protected proxy work in Selenium?
Chrome won't accept login:password in --proxy-server. You need an auth extension or an anti-detect browser.
Where can I get proxies for scraping?
In the PROXYLEET catalog or on the proxies for scraping page — choose the type based on anti-bot strength.