Fix: Python requests.exceptions.ConnectionError: Max retries exceeded

Q: How do I fix "Python requests.exceptions.ConnectionError: Max retries exceeded"?

How to fix Python requests ConnectionError Max retries exceeded caused by wrong URL, DNS failure, server down, SSL errors, connection pool exhaustion, and firewall blocks.

The Retry Budget Ran Out

Personally, I rate this error in my top three Python misdiagnosis traps. The outer message looks like one thing (“retries failed”), but the actual cause is always buried in the parenthetical. I have watched smart engineers chase the wrong fix for an hour because they did not read past the first line of the traceback.

You run a Python script using the requests library and get:

requests.exceptions.ConnectionError: HTTPSConnectionPool(host='api.example.com', port=443):
Max retries exceeded with url: /endpoint
(Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x...>:
Failed to establish a new connection: [Errno -2] Name or service not known'))

Or variations:

requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8080):
Max retries exceeded with url: /api/data
(Caused by NewConnectionError('... [Errno 111] Connection refused'))

requests.exceptions.ConnectionError: HTTPSConnectionPool(host='api.example.com', port=443):
Max retries exceeded with url: /endpoint
(Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED]')))

urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='...', port=443):
Max retries exceeded with url: /path

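The requests library could not reach the server, and urllib3 ran out of retry attempts. With the default configuration that budget is zero retries, so a single failed attempt is enough to produce this message. The underlying cause is in the parenthetical message after “Caused by.”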

Quick Reference Before You Dive In

If you arrived here from Google with a fresh traceback, the five facts that resolve roughly 90 percent of cases:

READ THE “Caused by” MESSAGE FIRST. The outer “Max retries exceeded” is generic; the inner message names the actual cause (DNS failure, connection refused, SSL error, timeout). The requests exceptions reference and the urllib3 exceptions docs are the canonical sources.
Connection refused means the server is not listening. Either the process is down, listening on a different port, or a firewall is blocking the local-to-server path. Test with nc -zv host port.
Name or service not known is DNS. The hostname does not resolve. Test with nslookup or dig. Inside Docker, check the container’s /etc/resolv.conf.
A slow “Max retries exceeded” (multi-second) usually means downstream slowness or 5xx retries, NOT a hard misconfiguration. A fast one (milliseconds) usually means DNS or connection refused. The timing is a strong diagnostic signal.
requests.get(url, timeout=N) is NOT optional in production. Without timeout, the call can hang forever. Always set both connect and read timeouts: timeout=(5, 30).

The rest of this article walks through each cause in detail, plus the failure modes most other guides skip.

Why the Outer Message Hides the Real Cause

The requests library hands every connection to urllib3, which tracks a retry budget. By default that budget is zero: requests does not retry failed connections unless you mount an adapter with max_retries set. When the budget is exhausted (immediately, by default), requests raises ConnectionError with “Max retries exceeded.” The real error is the nested cause wrapped inside the exception, not the outer message itself. Reading only the top line is the most common reason developers get stuck; every fix below depends on which inner error you actually have.

Common causes (look at the “Caused by” message):

Name or service not known / getaddrinfo failed: DNS resolution failed. The hostname does not exist or DNS is unreachable.
Connection refused: the server is not running or not listening on that port.
Connection timed out: the server is unreachable (firewall, wrong IP, network issue).
SSLError / CERTIFICATE_VERIFY_FAILED: SSL/TLS certificate verification failed.
Too many open files: connection pool or file descriptor exhaustion.
Network is unreachable: no network connectivity at all.
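To read the inner cause in code rather than by eye, the wrapped exception can be unpacked. The sketch below assumes the usual layout, where requests passes urllib3's MaxRetryError as the first argument of the exception and the “Caused by” error sits in its reason attribute. The helper names inner_cause and describe are illustrative, and the string matching is a rough heuristic, not an exhaustive classifier.

```python
import requests
from urllib3.exceptions import MaxRetryError


def inner_cause(exc):
    """Return the error wrapped inside a requests exception, if there is one."""
    wrapped = exc.args[0] if exc.args else None
    # requests normally passes the urllib3 MaxRetryError as the first argument;
    # its .reason attribute holds the "Caused by" error.
    if isinstance(wrapped, MaxRetryError) and wrapped.reason is not None:
        return wrapped.reason
    if isinstance(wrapped, BaseException):
        return wrapped
    return exc


def describe(exc):
    """Map the inner cause to a rough category using its message text."""
    cause = inner_cause(exc)
    text = f"{type(cause).__name__}: {cause}"
    hints = [
        ("Name or service not known", "dns"),
        ("getaddrinfo failed", "dns"),
        ("Connection refused", "refused"),
        ("CERTIFICATE_VERIFY_FAILED", "ssl"),
        ("timed out", "timeout"),
    ]
    for needle, label in hints:
        if needle in text:
            return label
    return "other"
```

In an except requests.exceptions.ConnectionError as exc block, logging describe(exc) next to the full traceback makes the category visible in one line.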

There is also a subtler structural cause. By default requests gives urllib3 a budget of zero retries, so the message appears after the first failed attempt. If you mount a custom adapter with Retry(total=5, ...) whose status_forcelist includes 503, and a downstream service starts returning 503, requests quietly retries five times before raising (as requests.exceptions.RetryError in that case, still carrying the “Max retries exceeded” text). That means a “Max retries exceeded” error that took 30+ seconds to appear usually points to downstream slowness or intermittent failure, not a hard misconfiguration. Distinguishing fast failures (milliseconds: DNS, refused) from slow ones (multi-second: timeout, downstream 5xx) is the single best heuristic for picking the right fix.
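One way to apply the fast-versus-slow heuristic is to time the failing call. This is a minimal sketch; the name time_failure and the one-second threshold are assumptions for illustration, so tune the threshold to your network.

```python
import time

import requests


def time_failure(call, fast_threshold=1.0):
    """Run call() and report whether a requests failure was fast or slow."""
    start = time.monotonic()
    try:
        call()
    except requests.exceptions.RequestException as exc:
        elapsed = time.monotonic() - start
        kind = "fast" if elapsed < fast_threshold else "slow"
        return kind, elapsed, exc
    return "ok", time.monotonic() - start, None
```

A typical call would be time_failure(lambda: requests.get(url, timeout=(5, 30))). A "fast" result suggests starting with DNS or a refused connection; a "slow" one suggests timeouts or retried 5xx responses.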

In Production: Incident Lens

In production this error means a dependency has degraded. The downstream service is slow, returning 503s, or unreachable; your client exhausts its retry budget and fails the request. The blast radius depends entirely on which dependency. If the failed call is to your auth service, every request fails and the incident is global. If it is to a recommendations service that you wrap in a cache fallback, the user sees stale recommendations and no one pages.

The monitoring signal is dependency-side first: error rate and p99 latency on the service the client is calling. On the client side, watch outbound HTTP error rate per host, retry count per request, and saturation of the connection pool. A spike in retry count without a corresponding spike in actual failures means your client is masking the issue and burning latency budget; the request eventually succeeds but takes 10x longer. The correct alert is on the ratio of retried-then-succeeded requests, not just on outright failures.

Recovery is circuit breaking: detect the elevated error rate and stop sending new requests for a cool-down window, returning a fallback or a fast 503 instead. Without a breaker, every caller piles into the dying dependency, exhausts your thread pool, and the failure spreads to unrelated endpoints. Postmortem preventives are a strict timeout budget (every outbound call has both a connect and read timeout, and the sum is less than the parent request’s deadline), retry with jitter (do not retry instantly; spread the retries with random backoff so all clients do not slam the recovering service simultaneously), and a bulkhead (cap concurrent calls per dependency so one slow service cannot exhaust your whole worker pool).
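The circuit-breaking idea can be sketched in a few lines. This is a simplified, single-threaded illustration, not a production implementation: the class names, the failure threshold, and the cool-down length are all assumptions, and a real service would likely use an established library and add locking.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker refuses a call during the cool-down window."""


class CircuitBreaker:
    def __init__(self, max_failures=5, cooldown=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown:
                raise CircuitOpenError("dependency is cooling down")
            # Cool-down elapsed: let one trial call through (half-open).
            # A single further failure reopens the breaker.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result
```

Wrapping an outbound call as breaker.call(session.get, url, timeout=(5, 30)) lets the caller catch CircuitOpenError and return a fallback immediately instead of waiting on a dependency that is known to be failing.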

When to Use Which Fix

The next eight sections cover the fixes in detail. The table below maps your “Caused by” message to the recommended fix.

Inner cause message	Recommended fix	Why
Any cause (check this first)	Fix 1: verify the URL with curl / nslookup	Most common is a typo or wrong scheme
`Connection refused`	Fix 2: confirm server is listening	Server down or wrong port
Transient 5xx or dropped connection	Fix 3: retry with backoff, jitter, status_forcelist	Use urllib3 `Retry` adapter
`SSLError` / `CERTIFICATE_VERIFY_FAILED`	Fix 4: fix CA bundle, NOT `verify=False`	Trust chain problem
`Name or service not known` / `getaddrinfo`	Fix 5: check DNS resolver, `/etc/hosts`, Docker `--dns`	DNS layer broken
Connection pool / too many open files	Fix 6: reuse `Session`, increase pool size	Per-request connection creation exhausts FDs
Behind corporate proxy or firewall	Fix 7: configure `proxies` dict or env vars	Network requires proxy
`Connection timed out`	Fix 8: tune `timeout` per call, distinguish connect vs read	Need explicit timeouts

If multiple rows apply, pick the topmost match for your inner cause.

Fix 1: Check the URL

The most common cause is a wrong URL:

import requests

# Wrong: typo in hostname
response = requests.get("https://api.exmple.com/data")  # "exmple" not "example"

# Wrong: HTTP vs HTTPS
response = requests.get("https://localhost:8080/api")  # Server only supports HTTP
response = requests.get("http://localhost:8080/api")   # Fixed

# Wrong: missing port
response = requests.get("http://localhost/api")   # Tries port 80, server is on 8080
response = requests.get("http://localhost:8080/api")  # Fixed

# Wrong: trailing slash matters for some APIs
response = requests.get("https://api.example.com/users")
response = requests.get("https://api.example.com/users/")  # Try with/without

Verify the URL is reachable:

# Test from the command line
curl -v https://api.example.com/endpoint
ping api.example.com
nslookup api.example.com

A small habit that has saved me hours: always log the full URL one line above the failing requests.get call during local debugging. Empty f-string variables, double slashes, and missing schemes are responsible for an embarrassing share of “Max retries exceeded” reports. One print(url) would have caught them at the source.
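Beyond printing the URL, a small pre-flight check can flag the mistakes listed above. The function name url_problems and its set of checks are illustrative assumptions; it only catches structural slips, not a wrong but well-formed hostname.

```python
from urllib.parse import urlsplit


def url_problems(url):
    """Return a list of likely mistakes in a URL before it is requested."""
    problems = []
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        problems.append("missing or unsupported scheme")
    if not parts.hostname:
        problems.append("missing hostname")
    if "//" in parts.path:
        problems.append("double slash in path")
    if url != url.strip():
        problems.append("leading or trailing whitespace")
    return problems
```

During local debugging, calling url_problems(url) just before requests.get(url, ...) and printing a non-empty result catches an empty f-string variable (which leaves the hostname blank) at the source.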

Fix 2: Check if the Server is Running

If the error says “Connection refused”:

# The server at localhost:8080 is not running
requests.get("http://localhost:8080/api")
# ConnectionError: ... Connection refused

Check the server:

# Is the process running?
ps aux | grep my_server

# Is something listening on the port?
ss -tlnp | grep 8080
# or
netstat -tlnp | grep 8080

# Start the server
python manage.py runserver 0.0.0.0:8080

For Docker services:

docker ps  # Check if the container is running
docker logs my-container  # Check for startup errors

Common in development: You started your client script before the server finished starting up. Add a startup delay or retry logic.
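Rather than a fixed sleep, the client can poll until the port accepts connections. A minimal sketch using only the standard library; the name wait_for_port and the default timings are assumptions.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once host:port accepts TCP connections, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling wait_for_port("localhost", 8080) before the first request turns a startup race into an explicit, bounded wait. It confirms only that the port is open, not that the application behind it is ready to serve.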

Fix 3: Add Retry Logic with Backoff

For transient network issues, add proper retry handling:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()

retries = Retry(
    total=5,              # Total number of retries
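    backoff_factor=1,     # Sleeps about 0, 2, 4, 8, 16 seconds (urllib3 skips the delay before the first retry)
    status_forcelist=[500, 502, 503, 504],  # Retry on these HTTP status codes
    allowed_methods=["GET", "POST"],         # Methods to retry (urllib3 >= 1.26); include POST only if the endpoint is idempotent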
)

adapter = HTTPAdapter(max_retries=retries)
session.mount("http://", adapter)
session.mount("https://", adapter)

response = session.get("https://api.example.com/data", timeout=10)

Simple retry with exponential backoff:

import time
import requests

def fetch_with_retry(url, max_retries=3, timeout=10):
    for attempt in range(max_retries):
        try:
            response = requests.get(url, timeout=timeout)
            response.raise_for_status()
            return response
        except requests.exceptions.ConnectionError as e:
            if attempt < max_retries - 1:
                wait = 2 ** attempt  # 1, 2, 4 seconds
                print(f"Connection failed, retrying in {wait}s...")
                time.sleep(wait)
            else:
                raise
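The loop above sleeps a fixed 1, 2, 4 seconds, so many clients that fail together also retry together. A common remedy is to randomize each delay ("full jitter"). The generator below is a sketch of that idea; the name backoff_delays and the default base and cap are assumptions.

```python
import random


def backoff_delays(attempts, base=1.0, cap=30.0, rng=random.random):
    """Yield one randomized delay per retry, using "full jitter" backoff."""
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        # Pick a random point between 0 and the exponential ceiling so that
        # many clients retrying at once do not wake up in lockstep.
        yield rng() * ceiling
```

Inside a retry loop, each value from backoff_delays(max_retries) would replace the fixed wait = 2 ** attempt before time.sleep(wait).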

A specific failure I have shipped once and watched colleagues ship many times: omitting timeout on requests.get(). Without it, the call can hang indefinitely, which in a worker thread means that worker is dead until the OS kills the process. Make timeout=(connect, read) a baseline; for most calls timeout=(5, 30) is sensible.

Fix 4: Fix SSL Certificate Errors

If the error contains SSLError or CERTIFICATE_VERIFY_FAILED:

# Quick fix for development only: disable SSL verification
response = requests.get("https://api.example.com/data", verify=False)
# Warning: this disables ALL certificate checks; never use in production!

Proper fix: specify the CA bundle:

response = requests.get("https://api.example.com/data", verify="/path/to/ca-bundle.crt")

Fix: update certifi (Python’s CA bundle):

pip install --upgrade certifi

Fix: install system certificates:

# macOS
/Applications/Python\ 3.x/Install\ Certificates.command

# Linux
sudo apt install ca-certificates
sudo update-ca-certificates

# pip behind corporate proxy with custom CA
pip install --cert /path/to/corporate-ca.pem requests

For self-signed certificates:

# Trust a self-signed certificate without disabling verification
import os
import requests

# Option 1: Point to your certificate
response = requests.get("https://internal-api.company.com", verify="/path/to/self-signed.crt")

# Option 2: Set environment variable
os.environ["REQUESTS_CA_BUNDLE"] = "/path/to/custom-ca-bundle.crt"

For general SSL certificate issues, see Fix: Python SSL certificate verify failed.
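When it is unclear which CA bundle is actually in play, it helps to print it. The sketch below mirrors, as I understand it, the order requests follows when verify is left at its default and environment settings are trusted: REQUESTS_CA_BUNDLE, then CURL_CA_BUNDLE, then certifi's bundle. The function name effective_ca_bundle is illustrative.

```python
import os

import certifi


def effective_ca_bundle(environ=None):
    """Return the CA bundle path requests is expected to use by default."""
    environ = os.environ if environ is None else environ
    return (
        environ.get("REQUESTS_CA_BUNDLE")
        or environ.get("CURL_CA_BUNDLE")
        or certifi.where()
    )
```

Printing effective_ca_bundle() on the failing machine shows quickly whether a stale environment variable or an outdated certifi install is the bundle being consulted.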

Fix 5: Fix DNS Resolution Issues

If the error says “Name or service not known” or “getaddrinfo failed”:

# Test DNS resolution
import socket

try:
    ip = socket.gethostbyname("api.example.com")
    print(f"Resolved to: {ip}")
except socket.gaierror as e:
    print(f"DNS resolution failed: {e}")

Common DNS fixes:

# Check DNS resolution
nslookup api.example.com
dig api.example.com

# Flush DNS cache
# Linux (systemd-resolved; older releases use: sudo systemd-resolve --flush-caches)
sudo resolvectl flush-caches
# macOS
sudo dscacheutil -flushcache; sudo killall -HUP mDNSResponder
# Windows
ipconfig /flushdns

For Docker containers (DNS often broken):

# docker-compose.yml
services:
  app:
    dns:
      - 8.8.8.8
      - 8.8.4.4

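Use the IP address instead of the hostname as a temporary workaround. This only works cleanly over plain HTTP; over HTTPS the certificate is validated against the host in the URL, so a bare IP fails verification:

# If DNS is the issue, connect directly to the IP (plain HTTP only)
response = requests.get("http://93.184.216.34/data",
                        headers={"Host": "api.example.com"},
                        timeout=(5, 30))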

Fix 6: Fix Connection Pool Exhaustion

If you make many requests in rapid succession, the connection pool can run out:

# Wrong: creates a new session for every request
for url in thousands_of_urls:
    response = requests.get(url)  # Each creates a new connection

# Fixed: reuse a session
session = requests.Session()
for url in thousands_of_urls:
    response = session.get(url)  # Reuses connections via keep-alive

Increase the pool size for concurrent requests:

from requests.adapters import HTTPAdapter

session = requests.Session()
adapter = HTTPAdapter(
    pool_connections=20,   # Number of per-host connection pools to cache
    pool_maxsize=20,       # Max connections kept in each pool
)
session.mount("http://", adapter)
session.mount("https://", adapter)

Close connections properly:

# Use context manager
with requests.Session() as session:
    response = session.get("https://api.example.com/data")
    # Session is closed when the block exits

Fix 7: Fix Firewall and Proxy Issues

Check if a firewall is blocking the connection:

# Test TCP connectivity
nc -zv api.example.com 443
# or
telnet api.example.com 443

Configure proxy settings:

proxies = {
    "http": "http://proxy.company.com:8080",
    "https": "http://proxy.company.com:8080",
}

response = requests.get("https://api.example.com/data", proxies=proxies)

Or set environment variables:

export HTTP_PROXY="http://proxy.company.com:8080"
export HTTPS_PROXY="http://proxy.company.com:8080"
export NO_PROXY="localhost,127.0.0.1,.internal.company.com"

Bypass proxy for local connections:

response = requests.get("http://localhost:8080/api", proxies={"http": None, "https": None})
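To see which proxy settings requests would pick up from the environment for a given URL, its own helper functions can be queried. This sketch relies on requests.utils.get_environ_proxies and requests.utils.should_bypass_proxies, which exist in current releases but are utility functions, not a heavily documented public API; proxy_report is an illustrative name.

```python
import requests


def proxy_report(url):
    """Show which environment proxy settings requests would apply to url."""
    # True when NO_PROXY (or the platform settings) exempt this host.
    bypass = requests.utils.should_bypass_proxies(url, no_proxy=None)
    # Empty dict when the host is bypassed, otherwise the proxies from the environment.
    proxies = requests.utils.get_environ_proxies(url)
    return {"bypass": bypass, "proxies": proxies}
```

Running proxy_report on both the failing URL and a localhost URL shows whether an inherited HTTPS_PROXY or a missing NO_PROXY entry is routing traffic somewhere unexpected.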

Fix 8: Fix Timeout Issues

If the error mentions “timed out”, the server is too slow or unreachable:

# Set explicit timeouts (connect_timeout, read_timeout)
response = requests.get("https://slow-api.example.com/data", timeout=(5, 30))

# 5 seconds to establish the connection
# 30 seconds to receive the response

For very slow APIs:

response = requests.get("https://slow-api.example.com/large-export", timeout=(10, 300))
# 5 minutes read timeout for large responses

With streaming for large responses:

with requests.get("https://example.com/large-file.zip", stream=True, timeout=10) as r:
    r.raise_for_status()
    with open("large-file.zip", "wb") as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)

For deeper timeout tuning and read vs connect distinctions, see Fix: Python requests timeout.
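One caveat with streaming: the read timeout limits how long requests waits for the next bytes from the server, not the total download time, so a server that trickles data can keep a transfer alive far longer than expected. A wall-clock budget can be enforced around the chunk loop. The sketch below is an assumption-laden illustration; copy_with_deadline and DeadlineExceeded are hypothetical names.

```python
import time


class DeadlineExceeded(Exception):
    """Raised when a streamed download runs past its total time budget."""


def copy_with_deadline(chunks, out, max_seconds, clock=time.monotonic):
    """Write chunks to out, aborting once max_seconds of wall time elapse."""
    deadline = clock() + max_seconds
    written = 0
    for chunk in chunks:
        # Checked once per chunk, so the overrun is bounded by the read timeout.
        if clock() > deadline:
            raise DeadlineExceeded(f"download exceeded {max_seconds}s")
        out.write(chunk)
        written += len(chunk)
    return written
```

In a streaming download, copy_with_deadline(r.iter_content(chunk_size=8192), f, max_seconds=600) would take the place of the inner for loop.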

Stranger Causes I Have Tracked Down

Check for rate limiting. Some APIs block you after too many requests:

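import time
import requests

response = requests.get("https://api.example.com/data", timeout=(5, 30))
if response.status_code == 429:
    # Retry-After is normally a number of seconds; fall back to 60 if absent
    retry_after = int(response.headers.get("Retry-After", 60))
    time.sleep(retry_after)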
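The Retry-After header may carry either a number of seconds or an HTTP date, and int() raises on the date form. A more tolerant parser might look like the sketch below; retry_after_seconds and its 60-second default are assumptions.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, default=60.0, now=None):
    """Convert a Retry-After header value to a non-negative delay in seconds."""
    if not value:
        return default
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Neither a number nor a parseable date: fall back to the default.
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

The result of retry_after_seconds(response.headers.get("Retry-After")) can be passed straight to time.sleep.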

Check for IPv6 issues. If the hostname resolves to both IPv4 and IPv6, and IPv6 is not configured properly:

# Force IPv4 by overriding the address family urllib3 asks the resolver for.
# This is a process-wide monkey-patch; use it for diagnosis, not as a permanent fix.
import socket
import urllib3.util.connection as urllib3_connection

urllib3_connection.allowed_gai_family = lambda: socket.AF_INET

Check system resource limits:

# Check file descriptor limit
ulimit -n

# Increase if needed
ulimit -n 65536

Check for a stale keep-alive connection. Long-lived requests.Session objects keep connections in a pool and reuse them. If the server, an intermediate load balancer, or a NAT device closes the connection silently after an idle period (commonly 60 to 300 seconds), the next request you send on that connection hits a half-closed socket and fails with a connection-reset variant of ConnectionError. The retry usually succeeds because it opens a fresh connection. Either send Connection: close on long-lived sessions, periodically rebuild the session, or mount a Retry adapter so idempotent requests get one more attempt on a fresh connection. In my experience this is by far the most common cause of intermittent “Max retries exceeded” in long-running background workers.
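Rebuilding the session after an idle period can be sketched as a thin wrapper. This is an illustration under assumptions: RecyclingSession is a hypothetical name, the 60-second idle limit should sit below whatever idle timeout your load balancer or NAT enforces, and the class is not thread-safe as written.

```python
import time

import requests


class RecyclingSession:
    """Hand out a requests.Session, rebuilding it after a long idle gap."""

    def __init__(self, max_idle=60.0, clock=time.monotonic):
        self.max_idle = max_idle
        self.clock = clock
        self._session = None
        self._last_used = 0.0

    def get(self):
        now = self.clock()
        idle = now - self._last_used
        if self._session is None or idle > self.max_idle:
            if self._session is not None:
                self._session.close()  # drop pooled, possibly stale sockets
            self._session = requests.Session()
        self._last_used = now
        return self._session
```

A background worker would call pool.get().get(url, timeout=(5, 30)) for each job instead of holding one Session for the life of the process.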

Check for MTU mismatches in containerized environments. If your container is on a Docker bridge network with default 1500-byte MTU but the underlying host network uses a smaller MTU (commonly 1450 on cloud VPNs or overlay networks), large request bodies hang and eventually time out. The connection establishes (small SYN/ACK fits) but the first big POST never completes. Set the container network MTU explicitly to match the host:

# docker-compose.yml
networks:
  default:
    driver: bridge
    driver_opts:
      com.docker.network.driver.mtu: "1450"

Check for DNS resolution that succeeds but returns the wrong IP. Split-horizon DNS, a stale entry in a local resolver cache, or a misconfigured /etc/hosts entry can make nslookup show the right answer while Python connects to the wrong IP. nslookup and dig query the DNS server directly, whereas Python goes through the system resolver (getaddrinfo), which also consults /etc/hosts and any local cache. Capture what Python actually resolves:

import socket
print(socket.getaddrinfo("api.example.com", 443))

If the returned IP differs from dig output, you have a name resolution mismatch, usually a stale /etc/hosts entry or an nscd or systemd-resolved cache. Fix the hosts entry or restart the resolver.

Debug the exact connection failure:

import logging
import requests

logging.basicConfig(level=logging.DEBUG)
logging.getLogger("urllib3").setLevel(logging.DEBUG)

response = requests.get("https://api.example.com/data", timeout=(5, 30))
# Shows detailed connection attempts and failures

What Other Tutorials Get Wrong About This Error

Most Python networking tutorials list the same fixes but frame them in ways that produce subtle bugs.

They recommend wrapping every call in try/except ConnectionError: retry. Without backoff and a status_forcelist, you create retry storms that overwhelm a recovering service. The right pattern is the urllib3 Retry adapter with exponential backoff and jitter, not a bare loop.

They omit the “read the inner Caused by” rule. The outer message is generic; the inner cause names the actual problem. Tutorials that show “Max retries exceeded” without explaining how to find the real error send readers chasing the wrong fix for each cause.

They recommend verify=False for SSL errors. This is the same antipattern as in Python SSL tutorials: it disables certificate checks entirely. The right fix is to install the CA bundle or point REQUESTS_CA_BUNDLE at the corporate root cert. Articles that show verify=False train readers to ship insecure scripts.

They omit timeout from every example. Without timeout, the call can hang indefinitely. Tutorials that copy requests.get(url) snippets without timeouts produce production code that locks up worker threads on first slow server.

They confuse retry storms with helpful resilience. Setting total=10 retries with no backoff means a misbehaving downstream gets hammered ten times in a second. This makes the outage worse. Use Retry(total=5, backoff_factor=1, status_forcelist=[500,502,503,504]) so retries actually space out.

They miss the keep-alive staleness pattern. A requests.Session reuses connections. NAT and load balancers silently close idle connections after a minute or two. The next request on the stale connection raises this exact error. Tutorials that recommend sessions for performance without flagging this make intermittent bugs harder to diagnose.

Frequently Asked Questions

How do I know which inner cause my error has?

Read the parenthetical after “Caused by.” Common inner messages: NewConnectionError (DNS or connection-level failure), SSLError (certificate problem), ReadTimeoutError (server too slow), ProtocolError (server closed the connection). The fix differs by cause; the outer “Max retries exceeded” is just the wrapper.

Why does my call succeed in a browser but fail in Python?

Three common reasons. First, browsers send headers (User-Agent, Accept, cookies) that many APIs require; Python’s defaults are different. Second, browsers fetch missing TLS intermediate certificates via AIA; Python does not. Third, browsers rely on their own or the operating system’s trust store, which may include corporate roots installed on the machine; requests uses the certifi bundle unless told otherwise. Set explicit headers and check the trust chain.

Should I set a connect timeout shorter than the read timeout?

Yes. TCP connection establishment should be fast (sub-second on healthy networks); a slow connect almost always means an unreachable host. Reading a response body can legitimately take longer for large payloads. A typical pattern is timeout=(5, 30): fail fast on connect, allow slow downloads.

What is the difference between requests.get(timeout=N) and Retry(total=N)?

timeout is per-attempt: how long one request can take. Retry(total=N) is across attempts: how many times urllib3 will retry the request if it fails. They are complementary: set a tight timeout to fail fast per attempt, then let Retry handle transient failures.
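The two settings multiply, which is easy to underestimate. The arithmetic below gives a rough upper bound on how long one logical call can take. It assumes urllib3's documented backoff behaviour (no sleep before the first retry, then backoff_factor * 2 ** (retry number - 1), capped at a maximum that defaults to 120 seconds) and treats the read timeout as a per-attempt limit, which a slowly trickling response can exceed. The function name worst_case_seconds is illustrative.

```python
def worst_case_seconds(connect, read, total_retries, backoff_factor, backoff_max=120.0):
    """Rough upper bound on wall time for one call with timeouts and retries."""
    attempts = total_retries + 1          # the first try plus each retry
    per_attempt = connect + read          # each attempt can use both timeouts
    sleeps = 0.0
    for retry_number in range(1, total_retries + 1):
        if retry_number > 1:              # assumed: no sleep before the first retry
            sleeps += min(backoff_max, backoff_factor * 2 ** (retry_number - 1))
    return attempts * per_attempt + sleeps
```

With timeout=(5, 30) and Retry(total=5, backoff_factor=1), worst_case_seconds(5, 30, 5, 1) comes to 240 seconds: six attempts of up to 35 seconds each, plus 2 + 4 + 8 + 16 seconds of backoff. If the parent request's deadline is shorter than that, one of the two settings needs to come down.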

Why does my Docker container fail with Name or service not known when my host works?

The container’s DNS is likely broken. Docker copies /etc/resolv.conf from the host on container start; if the host’s resolver points to 127.0.0.53 (systemd-resolved) the container cannot reach it. Pass --dns 8.8.8.8 to docker run or add dns: to your docker-compose.yml service.

Is verify=False ever acceptable?

For one-off debugging against a known endpoint, briefly, yes. For scripts that handle data or are checked into version control, never. The same Stack Overflow answer that recommends verify=False is the seed of half the security incidents in small Python codebases.

For Python import errors when installing requests, see Fix: Python ModuleNotFoundError: No module named. For general connection refused errors, see Fix: ERR_CONNECTION_REFUSED localhost.