Fix: Python requests.exceptions.ConnectionError: Max retries exceeded
Quick Answer
How to fix Python requests ConnectionError Max retries exceeded caused by wrong URL, DNS failure, server down, SSL errors, connection pool exhaustion, and firewall blocks.
The Retry Budget Ran Out
Personally, I rate this error in my top three Python misdiagnosis traps. The outer message looks like one thing (“retries failed”) but the actual cause is always buried in the parenthetical. I have watched smart engineers chase the wrong fix for an hour because they did not read past the first line of the traceback. You run a Python script using the requests library and get:
```text
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='api.example.com', port=443):
Max retries exceeded with url: /endpoint
(Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x...>:
Failed to establish a new connection: [Errno -2] Name or service not known'))
```
Or variations:
```text
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8080):
Max retries exceeded with url: /api/data
(Caused by NewConnectionError('... [Errno 111] Connection refused'))
```
```text
requests.exceptions.SSLError: HTTPSConnectionPool(host='api.example.com', port=443):
Max retries exceeded with url: /endpoint
(Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED]')))
```
```text
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='...', port=443):
Max retries exceeded with url: /path
```
The wording suggests that requests tried several times, but with default settings it makes a single attempt: the default adapter allows zero retries, so one failure exhausts the budget. SSL failures surface as requests.exceptions.SSLError, which is a subclass of ConnectionError. In every variant, the underlying cause is in the parenthetical message after “Caused by.”
Quick Reference Before You Dive In
If you arrived here from Google with a fresh traceback, the five facts that resolve roughly 90 percent of cases:
- READ THE “Caused by” MESSAGE FIRST. The outer “Max retries exceeded” is generic; the inner message names the actual cause (DNS failure, connection refused, SSL error, timeout). The requests exceptions reference and the urllib3 exceptions docs are the canonical sources.
- Connection refused means nothing is listening at that host and port. Either the process is down, it is listening on a different port or interface, or a firewall is actively rejecting the connection. Test with nc -zv host port.
- Name or service not known is DNS. The hostname does not resolve. Test with nslookup or dig. Inside Docker, check the container’s /etc/resolv.conf.
- A slow “Max retries exceeded” (multi-second) usually means downstream slowness or 5xx retries, NOT a hard misconfiguration. A fast one (milliseconds) usually means DNS or connection refused. The timing is a strong diagnostic signal.
- requests.get(url, timeout=N) is NOT optional in production. Without timeout, the call can hang forever. Always set both connect and read timeouts: timeout=(5, 30).
The rest of this article walks through each cause in detail, plus the failure modes most other guides skip.
Why the Outer Message Hides the Real Cause
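The requests library hands the actual connection work to urllib3, which counts every failed attempt against a retry budget. When that budget is exhausted, urllib3 raises MaxRetryError and requests re-raises it as ConnectionError (or a subclass such as SSLError or ConnectTimeout) carrying the “Max retries exceeded” text. The default budget in requests is zero retries, so a single failure is enough to trigger it. The real error is the nested cause wrapped inside the exception, not the outer message itself. Reading only the top line is the most common reason developers get stuck; every fix below depends on which inner error you actually have.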
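Reading the inner cause programmatically is handy when you only have logs. The sketch below is a hypothetical helper, cause_chain, that walks from the outer exception inward. It assumes the usual nesting: requests raises its ConnectionError while handling urllib3's MaxRetryError (so the MaxRetryError lands in `__context__`), MaxRetryError keeps the underlying error in `.reason`, and the low-level socket error is chained below that. Confirm the exact shape against the versions you run.
```python
def cause_chain(exc: BaseException, max_depth: int = 20) -> list:
    """Return [outer, ..., innermost] by following .reason, __cause__ and __context__."""
    chain = [exc]
    seen = {id(exc)}
    current = exc
    for _ in range(max_depth):
        reason = getattr(current, "reason", None)
        if isinstance(reason, BaseException):
            nxt = reason  # urllib3's MaxRetryError stores the real error here
        else:
            nxt = current.__cause__ or current.__context__  # standard Python chaining
        if nxt is None or id(nxt) in seen:
            break  # reached the bottom of the chain, or a cycle
        chain.append(nxt)
        seen.add(id(nxt))
        current = nxt
    return chain
```
Inside `except requests.exceptions.ConnectionError as err:`, logging `cause_chain(err)[-1]` next to the URL usually gives you the errno or SSL message that decides which fix applies.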
Common causes (look at the “Caused by” message):
- Name or service not known / getaddrinfo failed: DNS resolution failed. The hostname does not exist or DNS is unreachable.
- Connection refused: the server is not running or not listening on that port.
- Connection timed out: the server is unreachable (firewall, wrong IP, network issue).
- SSLError / CERTIFICATE_VERIFY_FAILED: SSL/TLS certificate verification failed.
- Too many open files: connection pool or file descriptor exhaustion.
- Network is unreachable: no network connectivity at all.
There is also a subtler structural cause. By default requests performs no urllib3 retries at all (the default HTTPAdapter uses max_retries=0), so the message can appear after a single failed attempt. Once you mount a custom adapter with Retry(total=5, ...), failures are retried quietly, with backoff, before anything is raised. If a downstream service starts returning 503 and 503 is in status_forcelist, requests retries five times and then raises requests.exceptions.RetryError, whose message also reads “Max retries exceeded”. That means a “Max retries exceeded” error that took 30+ seconds to appear is usually downstream slowness or an intermittent failure, not a hard misconfiguration. Distinguishing fast failures (milliseconds: DNS, refused) from slow ones (seconds: timeout, downstream 5xx) is the single best heuristic for picking the right fix.
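To make the fast-versus-slow heuristic measurable, you can time the failing call itself. This is a minimal sketch with a hypothetical helper name; the one-second boundary between “fast” and “slow” is an assumption, not a standard, so tune it to your network.
```python
import time
from typing import Callable, Optional, Tuple


def time_failure(call: Callable[[], object],
                 fast_threshold_s: float = 1.0) -> Tuple[str, float, Optional[BaseException]]:
    """Run call() and label a failure 'fast' or 'slow' by how long it took."""
    start = time.monotonic()
    try:
        call()
    except Exception as exc:  # inspect any failure rather than handling it
        elapsed = time.monotonic() - start
        label = "fast" if elapsed < fast_threshold_s else "slow"
        return label, elapsed, exc
    return "ok", time.monotonic() - start, None
```
Wrapping a request as `time_failure(lambda: requests.get(url, timeout=(5, 30)))` gives you the label, the elapsed seconds, and the exception to feed into the cause table below.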
In Production: Incident Lens
In production this error means a dependency has degraded. The downstream service is slow, returning 503s, or unreachable; your client exhausts its retry budget and fails the request. The blast radius depends entirely on which dependency. If the failed call is to your auth service, every request fails and the incident is global. If it is to a recommendations service that you wrap in a cache fallback, the user sees stale recommendations and no one pages.
The monitoring signal starts at the dependency: error rate and p99 latency of the service the client is calling. On the client side, watch outbound HTTP error rate per host, retry count per request, and connection pool saturation. A spike in retry count without a corresponding spike in outright failures means your client is masking the issue and burning latency budget: the request eventually succeeds but can take many times longer. Alert on the rate of retried-then-succeeded requests, not just on outright failures.
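A sketch of the counters behind that alert, assuming you can tell how many attempts each request took. With a urllib3 Retry adapter, `len(response.raw.retries.history) + 1` is one place to look, but treat that as an assumption to confirm on your urllib3 version. RetryStats is a hypothetical name, not a library class.
```python
from dataclasses import dataclass


@dataclass
class RetryStats:
    """Per-dependency counters for the 'retried but eventually succeeded' signal."""
    first_try_ok: int = 0
    retried_then_ok: int = 0
    failed: int = 0

    def record(self, attempts: int, succeeded: bool) -> None:
        if not succeeded:
            self.failed += 1
        elif attempts > 1:
            self.retried_then_ok += 1  # success that the retries were hiding
        else:
            self.first_try_ok += 1

    @property
    def masked_ratio(self) -> float:
        """Share of all requests that only succeeded after at least one retry."""
        total = self.first_try_ok + self.retried_then_ok + self.failed
        return self.retried_then_ok / total if total else 0.0
```
Alerting on a rising masked_ratio catches a degrading dependency while the outright failure count still looks healthy.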
Recovery is circuit breaking: detect the elevated error rate and stop sending new requests for a cool-down window, returning a fallback or a fast 503 instead. Without a breaker, every caller piles into the dying dependency, exhausts your thread pool, and the failure spreads to unrelated endpoints. The usual postmortem action items are a strict timeout budget (every outbound call has both a connect and a read timeout, and their sum is less than the parent request’s deadline), retries with jitter (do not retry instantly; spread retries with random backoff so all clients do not slam the recovering service at the same moment), and a bulkhead (cap concurrent calls per dependency so one slow service cannot exhaust your whole worker pool).
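A minimal, single-threaded circuit breaker sketch to illustrate the state machine: it opens after a run of consecutive failures, fails fast during the cool-down, then lets a trial call through. CircuitBreaker and CircuitOpenError are hypothetical names; production libraries add thread safety, rolling error-rate windows, and metrics on top of this idea.
```python
import time
from typing import Callable, Optional


class CircuitOpenError(Exception):
    """Raised instead of calling the dependency while the breaker is open."""


class CircuitBreaker:
    """Consecutive-failure breaker: closed -> open -> half-open -> closed."""

    def __init__(self, failure_threshold: int = 5, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.cooldown_s:
                raise CircuitOpenError("dependency marked down; failing fast")
            # Cool-down elapsed: half-open, let this trial call through
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # (re)open and restart the cool-down
            raise
        self._failures = 0
        self._opened_at = None  # a success closes the breaker
        return result
```
In a request handler you would call `breaker.call(session.get, url, timeout=(5, 30))` and catch CircuitOpenError to return the cached fallback or a fast 503.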
When to Use Which Fix
The next eight sections cover the fixes in detail. The table below maps your “Caused by” message to the recommended fix.
| Inner cause message | Recommended fix | Why |
|---|---|---|
| Any message, when the URL itself is suspect | Fix 1: verify the URL with curl / nslookup | Most common is a typo, wrong scheme, or wrong port |
| Connection refused | Fix 2: confirm the server is listening | Server down or wrong port |
| Transient 5xx or dropped connection | Fix 3: retry with backoff, jitter, status_forcelist | Use the urllib3 Retry adapter |
| SSLError / CERTIFICATE_VERIFY_FAILED | Fix 4: fix the CA bundle, NOT verify=False | Trust chain problem |
| Name or service not known / getaddrinfo failed | Fix 5: check DNS resolver, /etc/hosts, Docker --dns | DNS layer broken |
| Connection pool exhaustion / Too many open files | Fix 6: reuse a Session, increase pool size | Per-request connection creation exhausts file descriptors |
| Behind a corporate proxy or firewall | Fix 7: configure the proxies dict or env vars | Network requires a proxy |
| Connection timed out | Fix 8: tune timeout per call, distinguish connect vs read | Need explicit timeouts |
If multiple rows apply, pick the topmost match for your inner cause.
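If you want the table in code, for example to tag errors in logs, a minimal sketch is a substring lookup over `str(exc)`. The match strings come from the messages quoted in this article plus one assumption (glibc's “Temporary failure in name resolution”, another common DNS wording). The order is a judgment call: ProxyError is checked first because a proxy failure can also mention “Connection refused”.
```python
# (needle, fix) pairs checked in order; the first needle found in the message wins.
FIX_BY_CAUSE = [
    ("ProxyError", "Fix 7: configure proxies"),
    ("CERTIFICATE_VERIFY_FAILED", "Fix 4: fix the CA bundle"),
    ("SSLError", "Fix 4: fix the CA bundle"),
    ("Name or service not known", "Fix 5: check DNS"),
    ("getaddrinfo failed", "Fix 5: check DNS"),
    ("Temporary failure in name resolution", "Fix 5: check DNS"),
    ("Connection refused", "Fix 2: confirm the server is listening"),
    ("Too many open files", "Fix 6: reuse a Session, size the pool"),
    ("timed out", "Fix 8: set and tune connect/read timeouts"),
    ("error responses", "Fix 3: retry with backoff and jitter"),  # urllib3 ResponseError
]


def suggest_fix(message: str) -> str:
    """Map the text of a requests/urllib3 error to the matching fix in the table."""
    lowered = message.lower()
    for needle, fix in FIX_BY_CAUSE:
        if needle.lower() in lowered:
            return fix
    return "Fix 1: re-check the URL, then read the full traceback"
```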
Fix 1: Check the URL
The most common cause is a wrong URL:
```python
import requests

# Wrong: typo in hostname
response = requests.get("https://api.exmple.com/data")  # "exmple" not "example"

# Wrong: HTTPS against a server that only speaks HTTP
response = requests.get("https://localhost:8080/api")
response = requests.get("http://localhost:8080/api")  # Fixed

# Wrong: missing port
response = requests.get("http://localhost/api")  # Tries port 80, server is on 8080
response = requests.get("http://localhost:8080/api")  # Fixed

# Trailing slash matters for some APIs (a mismatch usually shows up as a 404 or redirect, not this error)
response = requests.get("https://api.example.com/users")
response = requests.get("https://api.example.com/users/")  # Try with and without
```
Verify the URL is reachable:
```bash
# Test from the command line
curl -v https://api.example.com/endpoint
ping api.example.com  # many hosts drop ICMP, so no reply does not prove the host is down
nslookup api.example.com
```
A small habit that has saved me hours: always log the full URL one line above the failing requests.get call during local debugging. Empty variables interpolated into f-strings, double slashes, and wrong schemes are responsible for an embarrassing share of “Max retries exceeded” reports. One print(url) would have caught them at the source.
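Going one step past print(url), a hypothetical url_problems helper can flag the usual suspects with the standard library's urllib.parse before the request is sent. It only inspects the string; it does not check DNS or reachability.
```python
from urllib.parse import urlsplit


def url_problems(url: str) -> list:
    """Return human-readable warnings about url; an empty list means nothing obvious."""
    problems = []
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        problems.append(f"scheme is {parts.scheme!r}, expected 'http' or 'https'")
    if not parts.hostname:
        problems.append("no hostname")
    if "//" in parts.path:
        problems.append("double slash in path")
    if url != url.strip():
        problems.append("leading or trailing whitespace")
    try:
        parts.port  # accessing .port raises ValueError for non-numeric or out-of-range ports
    except ValueError:
        problems.append("invalid port")
    return problems
```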
Fix 2: Check if the Server is Running
If the error says “Connection refused”:
```python
# The server at localhost:8080 is not running
requests.get("http://localhost:8080/api")
# ConnectionError: ... Connection refused
```
Check the server:
```bash
# Is the process running?
ps aux | grep my_server

# Is something listening on the port?
ss -tlnp | grep 8080
# or
netstat -tlnp | grep 8080

# Start the server (Django example)
python manage.py runserver 0.0.0.0:8080
```
For Docker services:
```bash
docker ps                  # Check if the container is running
docker logs my-container   # Check for startup errors
```
Common in development: you started your client script before the server finished starting up. Add a startup delay or retry logic.
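For the startup race, a small poll loop is usually better than a fixed sleep. This standard-library sketch (wait_for_port is a hypothetical name) doubles as a rough Python equivalent of nc -zv: it returns True once a TCP connect succeeds and False if the deadline passes first.
```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 30.0,
                  interval_s: float = 0.5) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        try:
            # DNS lookup + TCP connect; the socket is closed again immediately
            with socket.create_connection((host, port), timeout=min(interval_s, remaining)):
                return True
        except OSError:
            # Refused, timed out, or unresolvable: wait a little and try again
            time.sleep(min(interval_s, max(0.0, deadline - time.monotonic())))
```
Calling `wait_for_port("localhost", 8080)` before the first request turns a flaky “Connection refused” at startup into a bounded wait.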
Fix 3: Add Retry Logic with Backoff
For transient network issues, add proper retry handling:
```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retries = Retry(
    total=5,                                # Total number of retries
    backoff_factor=1,                       # Sleeps roughly 0, 2, 4, 8, 16 s between retries
    status_forcelist=[500, 502, 503, 504],  # Retry on these HTTP status codes
    allowed_methods=["GET", "POST"],        # Which methods to retry (POST only if the endpoint is idempotent)
)
adapter = HTTPAdapter(max_retries=retries)
session.mount("http://", adapter)
session.mount("https://", adapter)

response = session.get("https://api.example.com/data", timeout=10)
```
Simple retry with exponential backoff:
```python
import time
import requests

def fetch_with_retry(url, max_retries=3, timeout=10):
    for attempt in range(max_retries):
        try:
            response = requests.get(url, timeout=timeout)
            response.raise_for_status()
            return response
        except requests.exceptions.ConnectionError:
            if attempt < max_retries - 1:
                wait = 2 ** attempt  # 1 s, then 2 s (no sleep after the final attempt)
                print(f"Connection failed, retrying in {wait}s...")
                time.sleep(wait)
            else:
                raise
```
A specific failure I have shipped once and watched colleagues ship many times: omitting timeout on requests.get(). Without it, the call can hang indefinitely, which in a worker thread means that worker is stuck until the process is restarted or killed. Make timeout=(connect, read) a baseline; for most calls timeout=(5, 30) is sensible.
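Neither version above adds jitter, so clients that fail together also retry together. Below is a standard-library sketch of “full jitter” (sleep a random time between zero and the exponential cap); retry_with_jitter is a hypothetical helper. One detail matters when you wrap requests with it: requests.exceptions.ConnectionError is not a subclass of Python's built-in ConnectionError, so pass `retry_on=(requests.exceptions.ConnectionError,)` explicitly.
```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_jitter(
    fn: Callable[[], T],
    attempts: int = 4,
    base_s: float = 0.5,
    cap_s: float = 10.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on a retryable error sleep a random 'full jitter' delay and retry."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # budget spent: surface the original error
            # Full jitter: uniform in [0, min(cap, base * 2**attempt)]
            sleep(random.uniform(0, min(cap_s, base_s * 2 ** attempt)))
    raise ValueError("attempts must be >= 1")
```
Usage would look like `retry_with_jitter(lambda: requests.get(url, timeout=(5, 30)), retry_on=(requests.exceptions.ConnectionError,))`. The sleep parameter exists so tests can record delays instead of waiting.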
Fix 4: Fix SSL Certificate Errors
If the error contains SSLError or CERTIFICATE_VERIFY_FAILED:
```python
# Quick fix for development only: disable SSL verification
response = requests.get("https://api.example.com/data", verify=False)
# Warning: this disables ALL certificate checks; never use in production!
```
Proper fix: specify the CA bundle:
```python
response = requests.get("https://api.example.com/data", verify="/path/to/ca-bundle.crt")
```
Fix: update certifi (the CA bundle requests uses by default):
```bash
pip install --upgrade certifi
```
Fix: install system certificates:
```bash
# macOS (python.org installers)
/Applications/Python\ 3.x/Install\ Certificates.command

# Debian/Ubuntu
sudo apt install ca-certificates
sudo update-ca-certificates

# pip behind corporate proxy with custom CA
pip install --cert /path/to/corporate-ca.pem requests
```
For self-signed certificates:
```python
import os
import requests

# Option 1: point verify at the self-signed certificate (or the CA that signed it)
response = requests.get("https://internal-api.company.com", verify="/path/to/self-signed.crt")

# Option 2: set the environment variable requests reads when verify is left at its default
os.environ["REQUESTS_CA_BUNDLE"] = "/path/to/custom-ca-bundle.crt"
```
For general SSL certificate issues, see Fix: Python SSL certificate verify failed.
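When you are unsure which bundle is actually in play, the lookup order can be approximated in a few lines. This sketch mirrors how recent requests versions pick a bundle when Session.trust_env is left at True (an explicit verify path, then REQUESTS_CA_BUNDLE, then CURL_CA_BUNDLE, then certifi); treat it as a diagnostic aid rather than requests' own code, and confirm against the version you run. effective_ca_bundle is a hypothetical name.
```python
import os
from typing import Optional, Tuple, Union


def effective_ca_bundle(verify: Union[bool, str] = True) -> Tuple[str, Optional[str]]:
    """Approximate which CA bundle requests will use, as (source, path)."""
    if verify is False:
        return ("disabled", None)  # verify=False: no certificate checks at all
    if isinstance(verify, str):
        return ("verify argument", verify)  # an explicit path wins over the environment
    for var in ("REQUESTS_CA_BUNDLE", "CURL_CA_BUNDLE"):
        value = os.environ.get(var)
        if value:
            return (var, value)
    try:
        import certifi  # requests' default bundle
    except ImportError:
        return ("certifi (not installed)", None)
    return ("certifi", certifi.where())
```
Printing `effective_ca_bundle()` at startup makes “works on my machine” SSL failures much quicker to explain, because a stray REQUESTS_CA_BUNDLE in one environment shows up immediately.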
Fix 5: Fix DNS Resolution Issues
If the error says “Name or service not known” or “getaddrinfo failed”:
```python
# Test DNS resolution
import socket

try:
    ip = socket.gethostbyname("api.example.com")
    print(f"Resolved to: {ip}")
except socket.gaierror as e:
    print(f"DNS resolution failed: {e}")
```
Common DNS fixes:
```bash
# Check DNS resolution
nslookup api.example.com
dig api.example.com

# Flush DNS cache
# Linux (systemd-resolved; newer releases use: resolvectl flush-caches)
sudo systemd-resolve --flush-caches
# macOS
sudo dscacheutil -flushcache; sudo killall -HUP mDNSResponder
# Windows
ipconfig /flushdns
```
For Docker containers (DNS often broken):
```yaml
# docker-compose.yml
services:
  app:
    dns:
      - 8.8.8.8
      - 8.8.4.4
```
Use IP address instead of hostname as a workaround:
```python
# If DNS is the issue, connect directly to the IP and send the real hostname in the Host header.
# This works for plain HTTP. Over HTTPS the certificate is checked against the IP in the URL,
# so verification fails; use an /etc/hosts entry instead.
response = requests.get("http://93.184.216.34/data",
                        headers={"Host": "api.example.com"})
```
Fix 6: Fix Connection Pool Exhaustion
If you make many requests in rapid succession, the connection pool can run out:
```python
# Wrong: requests.get() builds (and tears down) a new session for every call
for url in thousands_of_urls:
    response = requests.get(url)  # Each call opens a new connection

# Fixed: reuse a session
session = requests.Session()
for url in thousands_of_urls:
    response = session.get(url)  # Reuses connections via keep-alive
```
Increase the pool size for concurrent requests:
```python
import requests
from requests.adapters import HTTPAdapter

session = requests.Session()
adapter = HTTPAdapter(
    pool_connections=20,  # Number of per-host connection pools to cache
    pool_maxsize=20,      # Max connections kept in each pool
)
session.mount("http://", adapter)
session.mount("https://", adapter)
```
Close connections properly:
```python
# Use a context manager
with requests.Session() as session:
    response = session.get("https://api.example.com/data")
# The session and its pooled connections are closed when the block exits
```
Fix 7: Fix Firewall and Proxy Issues
Check if a firewall is blocking the connection:
```bash
# Test TCP connectivity
nc -zv api.example.com 443
# or
telnet api.example.com 443
```
Configure proxy settings:
```python
proxies = {
    "http": "http://proxy.company.com:8080",
    "https": "http://proxy.company.com:8080",
}
response = requests.get("https://api.example.com/data", proxies=proxies)
```
Or set environment variables:
```bash
export HTTP_PROXY="http://proxy.company.com:8080"
export HTTPS_PROXY="http://proxy.company.com:8080"
export NO_PROXY="localhost,127.0.0.1,.internal.company.com"
```
Bypass proxy for local connections:
```python
response = requests.get("http://localhost:8080/api", proxies={"http": None, "https": None})
```
Fix 8: Fix Timeout Issues
If the error mentions “timed out”, the server is too slow or unreachable:
```python
# Set explicit timeouts as (connect_timeout, read_timeout)
response = requests.get("https://slow-api.example.com/data", timeout=(5, 30))
# 5 seconds to establish the connection
# 30 seconds max wait for the server to send data (per read, not for the whole response)
```
For very slow APIs:
```python
response = requests.get("https://slow-api.example.com/large-export", timeout=(10, 300))
# 10 seconds to connect, 5 minutes read timeout for slow responses
```
With streaming for large responses:
```python
with requests.get("https://example.com/large-file.zip", stream=True, timeout=10) as r:
    r.raise_for_status()
    with open("large-file.zip", "wb") as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)
```
For deeper timeout tuning and read vs connect distinctions, see Fix: Python requests timeout.
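To follow the timeout-budget rule from the incident section (connect plus read must fit inside the caller's deadline), one option is to derive the tuple from the time left. This is a hypothetical sketch; the (5, 30) caps and the half-and-half split when time is short are assumptions. Because the read timeout applies to each wait for data rather than the whole download, it bounds one response wait, not a long streaming transfer.
```python
import time
from typing import Callable, Tuple


def timeout_for(deadline: float, connect_cap: float = 5.0, read_cap: float = 30.0,
                clock: Callable[[], float] = time.monotonic) -> Tuple[float, float]:
    """Build a (connect, read) timeout that fits in the time left before `deadline`."""
    remaining = deadline - clock()
    if remaining <= 0:
        raise TimeoutError("parent deadline already passed; skip the call")
    connect = min(connect_cap, remaining / 2)  # never spend more than half on connecting
    read = min(read_cap, remaining - connect)  # keeps connect + read <= remaining
    return (connect, read)
```
A handler that must answer within 10 seconds would set `deadline = time.monotonic() + 10` once, then pass `timeout=timeout_for(deadline)` to every outbound call it makes.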
Stranger Causes I Have Tracked Down
Check for rate limiting. Some APIs block you after too many requests:
```python
import time
import requests

response = requests.get("https://api.example.com/data")
if response.status_code == 429:
    # Retry-After is usually seconds but may also be an HTTP date; fall back to 60 s then
    retry_after = response.headers.get("Retry-After", "60")
    time.sleep(int(retry_after) if retry_after.isdigit() else 60)
```
Check for IPv6 issues. If the hostname resolves to both IPv4 and IPv6, and IPv6 is not configured properly:
```python
# Force IPv4 by monkey-patching urllib3 (affects every urllib3 connection in the process)
import socket
import urllib3.util.connection

urllib3.util.connection.allowed_gai_family = lambda: socket.AF_INET
```
Check system resource limits:
```bash
# Check file descriptor limit
ulimit -n
# Increase if needed (applies to the current shell and the processes it starts)
ulimit -n 65536
```
Check for a stale keep-alive connection. Long-lived requests.Session objects keep connections in a pool and reuse them. If the server, an intermediate load balancer, or a NAT device silently closes the connection after an idle period (commonly 60 to 300 seconds), the next request you send on that connection hits a half-closed socket and fails with a connection-reset variant of ConnectionError, such as ('Connection aborted.', RemoteDisconnected(...)). A retry usually succeeds because it opens a fresh connection. Either send Connection: close on long-lived sessions, periodically rebuild the session, or allow one retry on connection errors for idempotent requests. In my experience this is the most common cause of intermittent ConnectionError failures in long-running background workers.
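The “periodically rebuild the session” option can be sketched as a small wrapper that replaces the session once it has sat idle longer than a threshold you choose below the middlebox idle timeout. RecyclingSession is a hypothetical helper, the 50-second default is an assumption (pick a value under your load balancer's or NAT's idle timeout), and it is not thread-safe as written.
```python
import time
from typing import Any, Callable


class RecyclingSession:
    """Hand out a session, replacing it if it sat idle for max_idle_s seconds or more."""

    def __init__(self, factory: Callable[[], Any], max_idle_s: float = 50.0,
                 clock: Callable[[], float] = time.monotonic):
        self._factory = factory
        self._max_idle_s = max_idle_s
        self._clock = clock
        self._session = factory()
        self._last_used = clock()

    def session(self) -> Any:
        now = self._clock()
        if now - self._last_used >= self._max_idle_s:
            # Pooled connections may already be dead at the middlebox: drop them all
            self._session.close()
            self._session = self._factory()
        self._last_used = now
        return self._session
```
With requests this would be `client = RecyclingSession(requests.Session, max_idle_s=50)` followed by `client.session().get(url, timeout=(5, 30))` for each call.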
Check for MTU mismatches in containerized environments. If your container is on a Docker bridge network with default 1500-byte MTU but the underlying host network uses a smaller MTU (commonly 1450 on cloud VPNs or overlay networks), large request bodies hang and eventually time out. The connection establishes (small SYN/ACK fits) but the first big POST never completes. Set the container network MTU explicitly to match the host:
```yaml
# docker-compose.yml
networks:
  default:
    driver: bridge
    driver_opts:
      com.docker.network.driver.mtu: "1450"
```
Check for DNS resolution that succeeds but returns the wrong IP. Split-horizon DNS, stale entries in a local resolver cache, or a misconfigured /etc/hosts entry can make nslookup show the right answer while Python connects to the wrong IP. nslookup and dig query DNS servers directly, whereas Python resolves through getaddrinfo(), which also consults /etc/hosts and local caches. Capture what Python actually resolves:
```python
import socket
print(socket.getaddrinfo("api.example.com", 443))
```
If the returned IP differs from the dig output, you have a name-resolution mismatch, usually an /etc/hosts entry or an nscd or systemd-resolved cache. Fix the hosts entry or restart the resolver.
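A more readable variant of that check groups the unique addresses by family, keeping the order getaddrinfo() returns them in (which is roughly the order urllib3 tries them). It also makes the IPv6 case above visible at a glance. resolved_addresses is a hypothetical helper.
```python
import socket


def resolved_addresses(host: str, port: int = 443) -> dict:
    """Group the unique addresses getaddrinfo() returns for host by IP family."""
    labels = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    grouped = {"IPv4": [], "IPv6": []}
    for family, _type, _proto, _canonname, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        label = labels.get(family)
        if label and sockaddr[0] not in grouped[label]:
            grouped[label].append(sockaddr[0])  # preserve getaddrinfo's order
    return grouped
```
Compare `resolved_addresses("api.example.com")` with the dig output; a mismatch, or an IPv6 entry on a host without working IPv6, points straight at the resolver layer.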
Debug the exact connection failure:
```python
import logging
import requests

logging.basicConfig(level=logging.DEBUG)
logging.getLogger("urllib3").setLevel(logging.DEBUG)

response = requests.get("https://api.example.com/data")
# Shows detailed connection attempts and failures
```
What Other Tutorials Get Wrong About This Error
Most Python networking tutorials list the same fixes but frame them in ways that produce subtle bugs.
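They recommend wrapping every call in try/except ConnectionError: retry. Without backoff, and without restricting retries to failures worth retrying (which is what status_forcelist does for HTTP statuses), you create retry storms that overwhelm a recovering service. The right pattern is the urllib3 Retry adapter with exponential backoff and jitter (urllib3 2.x exposes a backoff_jitter parameter), not a bare loop.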
They omit the “read the inner Caused by” rule. The outer message is generic; the inner cause names the actual problem. Tutorials that show “Max retries exceeded” without explaining how to find the real error send readers chasing the wrong fix for each cause.
They recommend verify=False for SSL errors. This is the same antipattern as in Python SSL tutorials: it disables certificate checks entirely. The right fix is to install the CA bundle or point REQUESTS_CA_BUNDLE at the corporate root cert. Articles that show verify=False train readers to ship insecure scripts.
They omit timeout from every example. Without timeout, the call can hang indefinitely. Tutorials that copy requests.get(url) snippets without timeouts produce production code that locks up worker threads on the first slow server.
They confuse retry storms with helpful resilience. Setting total=10 retries with no backoff means a misbehaving downstream gets hammered ten times in a second. This makes the outage worse. Use Retry(total=5, backoff_factor=1, status_forcelist=[500,502,503,504]) so retries actually space out.
They miss the keep-alive staleness pattern. A requests.Session reuses connections. NAT and load balancers silently close idle connections after a minute or two. The next request on the stale connection raises this exact error. Tutorials that recommend sessions for performance without flagging this make intermittent bugs harder to diagnose.
Frequently Asked Questions
How do I know which inner cause my error has?
Read the parenthetical after “Caused by.” Common inner messages: NewConnectionError (DNS or connection-level failure), SSLError (certificate problem), ReadTimeoutError (server too slow), ProtocolError (server closed the connection). The fix differs by cause; the outer “Max retries exceeded” is just the wrapper.
Why does my call succeed in a browser but fail in Python?
Three common reasons. First, browsers send headers (User-Agent, Accept, cookies) that many APIs require; Python’s defaults are different. Second, browsers fetch missing TLS intermediate certificates via AIA; Python does not. Third, browsers use the system trust store; Python uses certifi or its own CA bundle. Set explicit headers and check the trust chain.
Should I set a connect timeout shorter than the read timeout?
Yes. TCP connection establishment should be fast (sub-second on healthy networks); a slow connect almost always means an unreachable host. Reading a response body can legitimately take longer for large payloads. A typical pattern is timeout=(5, 30): fail fast on connect, allow slow downloads.
What is the difference between requests.get(timeout=N) and Retry(total=N)?
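timeout is per attempt: it bounds the connect phase and each wait for data from the server, not the total duration of the call. Retry(total=N) is across attempts: how many times urllib3 will retry the request if it fails. They are complementary: set a tight timeout so each attempt fails fast, then let Retry handle transient failures. Keep in mind that they multiply: the worst case is N + 1 attempts, each running up to the full timeout, plus the backoff sleeps.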
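To see how the two multiply, here is a rough worst-case estimate. It is a sketch under stated assumptions: urllib3's documented backoff schedule (no sleep before the first retry, then backoff_factor times 2, 4, 8, and so on, capped at 120 s by default), one read wait per attempt, and no Retry-After headers. A server that trickles bytes can exceed it, because the read timeout resets on every chunk. worst_case_seconds is a hypothetical helper.
```python
def worst_case_seconds(total: int, backoff_factor: float, connect_s: float,
                       read_s: float, backoff_max: float = 120.0) -> float:
    """Rough upper bound on wall time for Retry(total=...) with timeout=(connect_s, read_s)."""
    attempts = total + 1  # the first try plus `total` retries
    sleeps = sum(
        0.0 if n <= 1 else min(backoff_max, backoff_factor * 2 ** (n - 1))
        for n in range(1, total + 1)  # n = number of errors so far
    )
    return attempts * (connect_s + read_s) + sleeps
```
For the settings used in this article, Retry(total=5, backoff_factor=1) with timeout=(5, 30), the estimate is 6 attempts × 35 s plus 30 s of backoff, about 240 seconds, which is far longer than most callers will wait.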
Why does my Docker container fail with Name or service not known when my host works?
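The container’s DNS is likely broken. Docker builds the container’s /etc/resolv.conf from the host’s, but a loopback resolver such as systemd-resolved’s 127.0.0.53 is unreachable from inside the container, so Docker has to substitute other nameservers, and those often cannot resolve VPN or corporate-internal names. Pass --dns 8.8.8.8 (or your internal resolver’s address) to docker run, or add dns: to your docker-compose.yml service.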
Is verify=False ever acceptable?
For one-off debugging against a known endpoint, briefly, yes. For scripts that handle real data or are checked into version control, never. A verify=False copied from a quick Stack Overflow answer is a common seed of security incidents in small Python codebases.
For Python import errors when installing requests, see Fix: Python ModuleNotFoundError: No module named. For general connection refused errors, see Fix: ERR_CONNECTION_REFUSED localhost.
Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.