Fix: Kubernetes ImagePullBackOff - Failed to Pull Image
Quick Answer
How to fix the Kubernetes ImagePullBackOff and ErrImagePull errors when a pod fails to pull a container image from a registry.
The Pod That Will Not Start
I have triaged this in two flavors: the first deployment on a new cluster (where it is almost always a private-registry credential issue) and a deployment that worked yesterday but suddenly fails (where it is almost always a tag that was overwritten, a registry rate limit, or a node that lost its credential refresh). Both classes look identical in kubectl get pods, so the first diagnostic question I always ask is: did this image ever pull successfully on this cluster?

You deploy a pod to Kubernetes. It never starts. You check the status:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
myapp-7c4b6d9f8-k3m2n 0/1 ImagePullBackOff 0 2m15s
Or you see the closely related status:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
myapp-7c4b6d9f8-k3m2n 0/1 ErrImagePull 0 45s
You describe the pod and find something like:
$ kubectl describe pod myapp-7c4b6d9f8-k3m2n
...
Events:
Warning Failed 12s (x3 over 58s) kubelet Failed to pull image "myapp:latest": rpc error: code = Unknown desc = Error response from daemon: pull access denied
Warning Failed 12s (x3 over 58s) kubelet Error: ErrImagePull
Normal BackOff 1s (x4 over 57s) kubelet Back-off pulling image "myapp:latest"
Warning Failed 1s (x4 over 57s) kubelet Error: ImagePullBackOff
The pod is stuck. Kubernetes cannot pull the container image, and after repeated failures it backs off with increasing delays before retrying.
Quick Reference Before You Dive In
If you arrived here from Google with a stuck pod, the five facts that resolve roughly 90 percent of the cases I have triaged:
- ErrImagePull and ImagePullBackOff are the same problem at different stages. The former is the most recent pull attempt failing; the latter is the back-off state after several failures. The fix is identical.
- Run kubectl describe pod <name> first. The Events section contains the literal error message from the kubelet, which is far more diagnostic than the status field alone. Status tells you “something is wrong”; Events tells you what.
- Almost every cause falls into one of four buckets: wrong image reference (typo, tag, missing registry prefix), missing or wrong authentication (imagePullSecrets), network reachability (firewall, proxy, DNS), or CPU architecture mismatch (most often arm64 nodes pulling amd64-only images).
- imagePullSecrets are namespace-scoped. A secret in default does not work for a pod in production. Recreate it in every namespace that needs it, or attach it to the namespace’s default ServiceAccount.
- For ECR/GCR/ACR, the cloud-native path is the kubelet CredentialProvider plugin (stable since Kubernetes 1.26), not static imagePullSecrets. The plugin mints short-lived tokens automatically and avoids the 12-hour ECR token expiry problem entirely.
The rest of this article walks through each of those in detail, plus the failure modes most other guides skip.
What the Kubelet Is Actually Doing
ImagePullBackOff is Kubernetes telling you that the kubelet tried to pull a container image and failed. After the initial failure (ErrImagePull), Kubernetes applies an exponential back-off (10s, 20s, 40s, up to 5 minutes) before retrying. The pod stays in this state until the pull succeeds or you fix the underlying problem. The official image-pulling reference on kubernetes.io covers the full state machine.
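To reason about how long a stuck pod waits between attempts, here is a minimal sketch of that schedule. It assumes only what is stated above (a 10-second initial delay that doubles up to a 5-minute cap) and is a simplified model, not the kubelet's code; it ignores details such as when the kubelet forgets an old back-off entry.

```python
from typing import List


def image_pull_backoff_schedule(initial: float = 10.0, cap: float = 300.0, retries: int = 8) -> List[float]:
    """Waits (in seconds) before each retry: double the previous wait, never exceeding `cap`."""
    delays: List[float] = []
    delay = initial
    for _ in range(retries):
        delays.append(delay)
        delay = min(delay * 2, cap)
    return delays


if __name__ == "__main__":
    schedule = image_pull_backoff_schedule()
    print(schedule)       # [10.0, 20.0, 40.0, 80.0, 160.0, 300.0, 300.0, 300.0]
    print(sum(schedule))  # 1210.0 seconds of waiting across eight retries
```

Once the cap is reached, retries are five minutes apart, which is why, after fixing the cause, deleting the stuck pod so its controller creates a fresh one is usually quicker than waiting out the timer.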
The kubelet on the node is responsible for pulling images. When you create a pod, the kubelet checks if the image already exists locally. If it does not (or if imagePullPolicy forces a pull), it contacts the container registry, authenticates if needed, and downloads the image layers. Any failure in this chain results in ErrImagePull, which quickly becomes ImagePullBackOff after a few retries.
The root cause is almost always one of these:
- Wrong image name or tag: a typo, a tag that does not exist, or a missing registry prefix
- Private registry without credentials: the registry requires authentication and the pod has no imagePullSecrets
- Docker Hub rate limits: anonymous or free-tier pulls exceeded the rate limit
- Network issues: the node cannot reach the registry due to firewalls, proxies, or DNS
- Architecture mismatch: the image exists but not for the node’s CPU architecture (e.g., arm64 vs amd64)
- Local image with wrong pull policy: the image exists on the node but Kubernetes tries to pull it from a remote registry anyway
Fix 1: Check the Image Name and Tag
This is the most common cause. A single typo in the image name, registry URL, or tag breaks the pull.
Run kubectl describe pod and look at the exact image reference in the error:
kubectl describe pod myapp-7c4b6d9f8-k3m2n | grep -i image
You see something like:
Image: myapp:latest
Check for these common mistakes:
- Typos in the image name: ngixn instead of nginx, postgress instead of postgres.
- Missing registry prefix: if your image is on a private registry like ghcr.io or registry.example.com, you need the full path, for example ghcr.io/myorg/myapp:v1.2.0. Without a prefix, the runtime assumes Docker Hub.
- Tag does not exist: you specified myapp:v2.0 but only v2.0.0 exists. Tags are case-sensitive.
- Using latest when no latest tag exists: not every image has a latest tag. Some projects only publish versioned tags.
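To see what a short reference actually expands to, here is a simplified sketch of the normalization Docker-compatible tooling applies: default registry docker.io, a library/ prefix for single-name Docker Hub images, and latest when neither a tag nor a digest is given. It performs no validation, so treat it as an illustration rather than a parser to rely on. It shows why a bare myapp:latest ends up asking Docker Hub for library/myapp, which produces a "pull access denied" error for an image that only exists in your private registry.

```python
from typing import NamedTuple, Optional


class ImageRef(NamedTuple):
    registry: str
    repository: str
    tag: Optional[str]
    digest: Optional[str]


def normalize_image(ref: str) -> ImageRef:
    """Expand a short image reference the way Docker-compatible tooling does (simplified, no validation)."""
    digest: Optional[str] = None
    if "@" in ref:
        ref, digest = ref.split("@", 1)
    tag: Optional[str] = None
    # A colon after the last slash starts a tag; a colon before it belongs to a registry port.
    last_colon, last_slash = ref.rfind(":"), ref.rfind("/")
    if last_colon > last_slash:
        ref, tag = ref[:last_colon], ref[last_colon + 1:]
    first, _, rest = ref.partition("/")
    # The first path component counts as a registry host only if it looks like one.
    if rest and ("." in first or ":" in first or first == "localhost"):
        registry, repository = first, rest
    else:
        registry, repository = "docker.io", ref
    if registry == "docker.io" and "/" not in repository:
        repository = "library/" + repository  # official Docker Hub images live under library/
    if tag is None and digest is None:
        tag = "latest"
    return ImageRef(registry, repository, tag, digest)


if __name__ == "__main__":
    print(normalize_image("myapp:latest"))
    # ImageRef(registry='docker.io', repository='library/myapp', tag='latest', digest=None)
```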
Verify the image exists by pulling it manually on your local machine:
docker pull myapp:latest
If it fails locally, the image reference is wrong. Fix it in your deployment:
spec:
  containers:
    - name: myapp
      image: registry.example.com/myapp:v1.2.0 # full path with valid tag
Apply the fix:
kubectl apply -f deployment.yaml
My standing rule on every cluster I manage: no latest tag in production manifests, ever. The tag is ambiguous and rollbacks become guesswork because you cannot tell which build is actually running. I pin to either a semantic version (v1.2.0) or, for the strict cases, a SHA digest (myapp@sha256:abc123...). The digest is the only form that survives an upstream registry tag overwrite, which I have seen happen during a contractor incident that took an afternoon to unwind.
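The reason a digest survives a tag overwrite is content addressing: the digest is a sha256 hash of the manifest bytes, while a tag is only a name the registry can repoint at any time. The toy registry below is a hypothetical in-memory model, not a real registry client. One caveat it does not show: a real registry can garbage-collect a manifest once no tag references it, so a digest pin survives an overwrite but not a deletion.

```python
import hashlib
from typing import Dict


def content_digest(manifest: bytes) -> str:
    """Content address of a manifest: sha256 over its exact bytes."""
    return "sha256:" + hashlib.sha256(manifest).hexdigest()


class ToyRegistry:
    """Hypothetical in-memory registry: tags are mutable pointers, manifests are keyed by digest."""

    def __init__(self) -> None:
        self.tags: Dict[str, str] = {}
        self.manifests: Dict[str, bytes] = {}

    def push(self, tag: str, manifest: bytes) -> str:
        digest = content_digest(manifest)
        self.manifests[digest] = manifest
        self.tags[tag] = digest  # silently repoints the tag if it already existed
        return digest

    def resolve(self, ref: str) -> bytes:
        """Resolve either a digest (immutable) or a tag (whatever it points at right now)."""
        digest = ref if ref.startswith("sha256:") else self.tags[ref]
        return self.manifests[digest]


if __name__ == "__main__":
    reg = ToyRegistry()
    pinned = reg.push("v1.2.0", b'{"build": "original"}')
    reg.push("v1.2.0", b'{"build": "overwritten"}')  # the tag overwrite
    print(reg.resolve("v1.2.0"))  # b'{"build": "overwritten"}'
    print(reg.resolve(pinned))    # b'{"build": "original"}'
```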
If you have dealt with image reference issues in Docker before, the troubleshooting steps overlap. See Fix: Docker Image Not Found for more on resolving image name and registry URL problems.
Kubernetes Version History for Image Pulling
The image-pull surface in Kubernetes has changed enough over the past few years that the right fix depends on which version your cluster runs. The table below lines up the milestones I track on incidents.
| Kubernetes version | Released | What changed for image pulling |
|---|---|---|
| 1.24 | May 2022 | dockershim removed. Nodes must use containerd, CRI-O, or another CRI-compliant runtime. Many pre-1.24 troubleshooting guides reference Docker daemon commands that no longer apply on modern clusters. |
| 1.26 | Dec 2022 | Kubelet CredentialProvider plugins reach stable. ECR, GCR, ACR, and custom registry auth can now use short-lived tokens that the kubelet fetches on demand, replacing static imagePullSecrets and, for ECR, the 12-hour token expiry problem. |
| 1.29 | Dec 2023 | SidecarContainers feature reaches beta and is enabled by default (it graduated to stable in 1.33). Worth knowing because the pull behavior of restartable init containers (sidecars) differs subtly from regular init containers, and “pod stuck because sidecar will not pull” is a new failure shape. |
| 1.31 | Aug 2024 | ImageVolume source (alpha). Pods can mount OCI images as read-only volumes without running them as containers. Opens a new path for shipping ML model weights and shared assets, with its own pull-related failure modes. |
Cross-check exact dates and feature stage transitions against the official Kubernetes changelog before relying on them for a specific patch version; the table above tracks the milestones I have personally referenced on incidents.
The single most important date in the table for this bug is 1.24 (dockershim removal). If a tutorial tells you to “SSH into the node and run docker pull,” and your cluster is on 1.24 or later, that command will either not exist or talk to a Docker daemon the kubelet no longer uses. The equivalent is crictl pull, which talks to whichever CRI runtime the node runs (containerd or CRI-O). Pre-1.24 advice that assumes a Docker daemon on every node is silently outdated.
How Other Tools Handle This
Kubernetes is not the only container platform that fails on image pulls, but each runtime surfaces the error differently, and the recovery path varies.
Docker (docker run) fails immediately with Error response from daemon: pull access denied or manifest unknown and exits non-zero. There is no back-off loop. You re-run the command after fixing credentials with docker login. Credentials live in ~/.docker/config.json as base64 (or in a credential helper like osxkeychain, wincred, or secretservice). Kubernetes essentially wraps this file into a dockerconfigjson secret.
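To make that relationship concrete, here is a sketch of the document a kubernetes.io/dockerconfigjson Secret carries. It assumes the layout that kubectl create secret docker-registry produces: an auths map keyed by server, with username, password, and an auth field holding base64 of user:password. The server and credentials are placeholders.

```python
import base64
import json
from typing import Dict


def dockerconfigjson(server: str, username: str, password: str) -> str:
    """Build the ~/.docker/config.json-style document that a pull secret wraps."""
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    return json.dumps({"auths": {server: {"username": username, "password": password, "auth": auth}}})


def pull_secret_data(server: str, username: str, password: str) -> Dict[str, str]:
    """Secret .data values are base64 too, so the whole JSON document is encoded once more."""
    payload = dockerconfigjson(server, username, password)
    return {".dockerconfigjson": base64.b64encode(payload.encode()).decode()}


if __name__ == "__main__":
    data = pull_secret_data("registry.example.com", "myuser", "mypassword")
    # Decoding the outer layer gives back the config.json-style document.
    print(base64.b64decode(data[".dockerconfigjson"]).decode())
```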
Podman behaves like Docker but stores credentials in ${XDG_RUNTIME_DIR}/containers/auth.json by default. A rootless Podman pull failing with errors: denied usually means the user-scoped auth file is missing. Running podman login writes the right path. Unlike Docker, Podman has no daemon, so a pull failure is purely a client-side issue and cannot be caused by daemon state.
containerd via ctr uses ctr image pull --user user:password. It does not read Docker’s config file by default, so a working docker pull does not guarantee a working ctr pull on the same machine. When Kubernetes uses containerd as its CRI, the kubelet resolves credentials itself (from imagePullSecrets, a credential provider plugin, or a node-level config.json such as /var/lib/kubelet/config.json) and hands them to containerd through the CRI, so credentials set up for ctr or docker login on the node are not involved.
CRI-O reads /etc/containers/auth.json and ${XDG_RUNTIME_DIR}/containers/auth.json, plus per-registry policy in /etc/containers/policy.json that can outright block unsigned images. An ImagePullBackOff on a CRI-O node sometimes traces back to a signature policy denial rather than auth. Check the kubelet logs for policy rejected messages.
Image pull secrets vs ServiceAccount tokens vs registry mirrors. imagePullSecrets on a pod and on the pod’s ServiceAccount are not merged: the ServiceAccount admission controller copies the ServiceAccount’s imagePullSecrets into the pod only when the pod spec lists none of its own. For cloud registries (ECR, GCR, ACR), the cleaner path since Kubernetes 1.26 is the CredentialProvider plugin interface: the kubelet calls an external binary (like ecr-credential-provider) that returns a short-lived token, which the kubelet caches and refreshes as needed. This avoids the 12-hour ECR token expiry problem entirely. For repeated pulls of the same image across many nodes, a registry mirror (Harbor, AWS ECR pull-through cache, Google Artifact Registry remote repository) eliminates rate-limit and bandwidth issues at the cluster level rather than per-pod.
Fix 2: Configure imagePullSecrets for Private Registries
If your image is in a private registry (Docker Hub private repos, AWS ECR, GCR, Azure ACR, GitHub Container Registry, or a self-hosted registry), Kubernetes needs credentials to pull it.
First, confirm this is the issue. Run kubectl describe pod and look for an error like:
Failed to pull image "registry.example.com/myapp:v1": unauthorized: authentication required
or:
pull access denied for myapp, repository does not exist or may require 'docker login'
Step 1: Create a Docker registry secret:
kubectl create secret docker-registry my-registry-creds \
--docker-server=registry.example.com \
--docker-username=myuser \
--docker-password=mypassword \
--docker-email=myuser@example.com \
-n my-namespace
For Docker Hub, use https://index.docker.io/v1/ as the server:
kubectl create secret docker-registry dockerhub-creds \
--docker-server=https://index.docker.io/v1/ \
--docker-username=myuser \
--docker-password=mypassword \
-n my-namespace
Step 2: Reference the secret in your pod spec:
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
    - name: myapp
      image: registry.example.com/myapp:v1.2.0
  imagePullSecrets:
    - name: my-registry-creds
For Deployments, imagePullSecrets goes inside spec.template.spec:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: registry.example.com/myapp:v1.2.0
      imagePullSecrets:
        - name: my-registry-creds
Step 3: Verify the secret exists in the correct namespace:
kubectl get secret my-registry-creds -n my-namespace
Common Mistake: The secret must be in the same namespace as the pod. If your pod is in production but the secret is in default, the pull will fail with the same authentication error. Secrets are namespace-scoped, so they do not span namespaces.
If you want every pod in a namespace to use the same registry credentials without adding imagePullSecrets to every spec, attach the secret to the namespace’s default service account:
kubectl patch serviceaccount default -n my-namespace \
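-p '{"imagePullSecrets": [{"name": "my-registry-creds"}]}'
This only affects pods created after the patch that run under the default ServiceAccount and do not list imagePullSecrets of their own.
AWS ECR, GCR, and Azure ACR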
Cloud-managed registries have their own authentication methods:
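AWS ECR tokens expire every 12 hours. If you use a static imagePullSecret, you need a CronJob or controller that regenerates it (for example from aws ecr get-login-password). On EKS, the usual path is to let the kubelet authenticate with the node’s IAM role (for example with the AmazonEC2ContainerRegistryReadOnly policy attached), so no secret is needed at all. IAM roles for service accounts (IRSA) grant AWS permissions to pods, not to the kubelet’s image pulls.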
GCR / Artifact Registry works automatically if your GKE nodes have the cloud-platform scope or the appropriate IAM permissions. For non-GKE clusters, create a service account key and use it as a Docker registry secret.
Azure ACR integrates with AKS through managed identity. For non-AKS clusters, create a service principal and use it as registry credentials.
Fix 3: Handle Docker Hub Rate Limits
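Docker Hub enforces pull rate limits. These are the figures I have worked with; Docker has revised them over time, so check its current rate-limit documentation before relying on the exact numbers:
- Anonymous users: 100 pulls per 6 hours per IP
- Authenticated free users: 200 pulls per 6 hours
- Paid subscriptions: higher or unlimited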
If your cluster has many nodes pulling images through the same public IP (common with NAT gateways), you hit these limits fast.
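A toy model makes the NAT problem visible: every node behind the gateway spends the same per-IP budget. The sketch assumes the anonymous figure above (100 pulls per 6 hours) and models it as a sliding window; Docker Hub's real accounting is its own, and the node count, image count, and IP address are made-up numbers for illustration.

```python
from collections import deque
from typing import Deque, Dict


class SlidingWindowLimiter:
    """Toy per-IP limit: at most `limit` pulls in any `window`-second span."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self.history: Dict[str, Deque[float]] = {}

    def allow(self, ip: str, now: float) -> bool:
        pulls = self.history.setdefault(ip, deque())
        while pulls and now - pulls[0] >= self.window:
            pulls.popleft()  # forget pulls that fell out of the window
        if len(pulls) >= self.limit:
            return False  # the registry would answer "toomanyrequests"
        pulls.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(limit=100, window=6 * 3600)
    nat_ip = "203.0.113.10"  # hypothetical NAT gateway address shared by every node
    # Hypothetical rollout: 30 nodes each pull 4 images at roughly the same moment.
    results = [limiter.allow(nat_ip, now=0.0) for _ in range(30 * 4)]
    print(results.count(True), results.count(False))  # 100 20
```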
The error in kubectl describe pod looks like:
Failed to pull image "nginx:latest": toomanyrequests: You have reached your pull rate limit
Solutions:
- Authenticate to Docker Hub even for public images: this raises your limit to 200 pulls. Create a secret and add imagePullSecrets as shown in Fix 2.
- Use a pull-through cache: set up a registry mirror (like Harbor or a cloud-provider registry proxy) that caches Docker Hub images. This significantly reduces direct pulls to Docker Hub.
- Pre-pull images on nodes: use a DaemonSet to pull commonly used images to every node during off-peak hours. Once an image is cached locally, the kubelet does not need to pull it again (unless imagePullPolicy: Always is set).
- Switch registries: many popular images are mirrored on other registries. For example, the official nginx image is also published as public.ecr.aws/nginx/nginx, so you can pull it from there instead of Docker Hub.
Fix 4: Fix Network Issues Blocking the Pull
If the node cannot reach the container registry over the network, every pull fails. This is common in air-gapped environments, clusters behind corporate proxies, or when firewall rules and security groups are too restrictive.
Check if the node can reach the registry:
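# Ideally run these checks on the node itself: the runtime pulls over the node's network.
# A debug pod is a quicker approximation, but it uses the pod network and cluster DNS.
kubectl run debug --rm -it --image=busybox -- sh
# Inside the debug pod, test connectivity
wget -qO- -T 5 https://registry.example.com/v2/
If this times out, investigate the points below. An HTTP 401 Unauthorized response is not a network failure: the registry answered and is asking for credentials.
- Firewall rules: ensure the node’s egress allows HTTPS traffic (port 443) to the registry. In cloud environments, check security groups and firewall rules.
- Proxy configuration: if your cluster requires an HTTP proxy, configure the container runtime (containerd or CRI-O on modern clusters) to use it. For containerd, add proxy settings in /etc/systemd/system/containerd.service.d/http-proxy.conf.
- Network policies: a Kubernetes NetworkPolicy governs pod traffic, not the node’s own image pulls, so it does not block the kubelet’s pull. It can, however, make the debug-pod test fail while the node reaches the registry fine. Check with:
kubectl get networkpolicies -n my-namespace
If a policy restricts egress, repeat the test from the node before concluding the registry is unreachable.
- DNS resolution: the node must resolve the registry hostname. Test from the node with:
nslookup registry.example.com
If DNS fails, check the node’s /etc/resolv.conf. CoreDNS serves pods, not the container runtime, so changing CoreDNS will not fix the pull.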
If broader cluster connectivity is failing too, the registry pull is downstream of that — fix the node-to-control-plane network first, then revisit the image pull.
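To run the reachability check from the node itself (where the runtime actually pulls from), a small script can probe the registry's /v2/ base endpoint, which the Docker Registry HTTP API V2 defines as answering 200 or 401 depending on whether authentication is required. This sketch assumes python3 is available on the node and uses a placeholder hostname. Python's urllib honors HTTPS_PROXY from the environment, so run it with the same proxy settings you gave the runtime.

```python
import urllib.error
import urllib.request


def classify_v2_status(status: int) -> str:
    """Interpret the status of GET /v2/: both 200 and 401 prove the registry is reachable."""
    if status == 200:
        return "reachable: anonymous access allowed"
    if status == 401:
        return "reachable: authentication required"
    return f"reachable: unexpected HTTP {status}"


def probe_registry(host: str, timeout: float = 5.0) -> str:
    """GET https://<host>/v2/ and report whether the registry answered at all."""
    url = f"https://{host}/v2/"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_v2_status(resp.status)
    except urllib.error.HTTPError as err:  # non-2xx responses still mean the registry answered
        return classify_v2_status(err.code)
    except OSError as err:  # DNS failures, TLS errors, refused connections, and timeouts
        return f"unreachable: {err}"
```

Call probe_registry("registry.example.com") from a python3 shell on the node. An "unreachable" result points at firewall, proxy, or DNS; either "reachable" result means the network path is fine and the problem is elsewhere.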
Fix 5: Fix Architecture Mismatches (arm64 vs amd64)
You pull an image that exists, the credentials are correct, but the pull still fails. The error might say:
no matching manifest for linux/arm64 in the manifest list entries
This happens when the image was built only for one CPU architecture (usually amd64) but your node runs a different one (usually arm64). This is increasingly common with the rise of ARM-based nodes like AWS Graviton, Apple Silicon development environments, and ARM-based cloud instances.
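What the runtime does with a multi-arch image can be sketched as a lookup in the image index: it picks the entry whose platform matches the node and fails with the error above when none does. This is a simplified illustration with a made-up, amd64-only index and placeholder digests; real runtimes also match CPU variants (such as arm64/v8) and skip non-runnable entries like build attestations.

```python
from typing import Any, Dict


def select_manifest(index: Dict[str, Any], os_name: str, architecture: str) -> str:
    """Return the digest of the index entry matching os/architecture (simplified platform match)."""
    for entry in index.get("manifests", []):
        platform = entry.get("platform", {})
        if platform.get("os") == os_name and platform.get("architecture") == architecture:
            return entry["digest"]
    raise LookupError(f"no matching manifest for {os_name}/{architecture} in the manifest list entries")


# Hypothetical amd64-only image index; the digest is a placeholder, not a real image.
AMD64_ONLY = {
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {"digest": "sha256:placeholder-amd64", "platform": {"os": "linux", "architecture": "amd64"}},
    ],
}

if __name__ == "__main__":
    print(select_manifest(AMD64_ONLY, "linux", "amd64"))  # sha256:placeholder-amd64
    try:
        select_manifest(AMD64_ONLY, "linux", "arm64")
    except LookupError as err:
        print(err)  # no matching manifest for linux/arm64 in the manifest list entries
```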
Check your node’s architecture:
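kubectl get nodes -L kubernetes.io/arch
The -L flag adds an ARCH column built from each node’s kubernetes.io/arch label (plain -o wide does not show the architecture). Or inspect a specific node:
kubectl describe node <node-name> | grep -i arch
You see something like kubernetes.io/arch=arm64.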
Solutions:
- Use multi-arch images: many popular images support multiple architectures. Check the image’s registry page or inspect the manifest:
docker manifest inspect nginx:latest
This shows which platforms the image supports. If linux/arm64 is listed, the image works on ARM nodes.
- Build your image for multiple architectures using Docker Buildx:
docker buildx build --platform linux/amd64,linux/arm64 -t myapp:v1.2.0 --push .
- Use node affinity to schedule the pod only on nodes with the matching architecture:
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values:
                  - amd64
- Add a nodeSelector for simpler cases:
spec:
  nodeSelector:
    kubernetes.io/arch: amd64
Fix 6: Set the Correct imagePullPolicy for Local Images
If you built an image locally (for example during development with Minikube or kind) and the image only exists on the node, not in any remote registry, Kubernetes might still try to pull it.
The imagePullPolicy controls this behavior:
- Always: check the registry on every container start and download any layers that are not already cached. This is the default when the tag is latest or omitted.
- IfNotPresent: only pull if the image is not already on the node. This is the default for images with a specific tag (e.g., myapp:v1.2.0) or a digest.
- Never: never pull. Only use the local image.
If you are using a local image with the latest tag, Kubernetes defaults to Always and tries to pull from a registry — which fails because the image is not there.
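The defaulting rule and the per-policy behavior can be sketched in a few lines. This is a simplified model assuming only the rules stated above plus the documented digest case; the real defaulting is done by the API server when the object is first created, and the value is not recomputed if you later change the tag. ErrImageNeverPull is the reason the kubelet reports when Never meets a missing image.

```python
def default_pull_policy(image: str) -> str:
    """Default imagePullPolicy for a container image reference (simplified)."""
    name, has_digest = image.split("@", 1)[0], "@" in image
    last_colon, last_slash = name.rfind(":"), name.rfind("/")
    tag = name[last_colon + 1:] if last_colon > last_slash else None  # ignore registry ports
    if tag == "latest" or (tag is None and not has_digest):
        return "Always"
    return "IfNotPresent"


def pull_action(policy: str, present_on_node: bool) -> str:
    """What the kubelet does at container start under each policy (simplified)."""
    if policy == "Always":
        return "pull"  # check the registry; layers already on the node are reused
    if policy == "IfNotPresent":
        return "use local image" if present_on_node else "pull"
    if policy == "Never":
        return "use local image" if present_on_node else "ErrImageNeverPull"
    raise ValueError(f"unknown imagePullPolicy: {policy!r}")


if __name__ == "__main__":
    policy = default_pull_policy("myapp:latest")
    print(policy, pull_action(policy, present_on_node=True))  # Always pull
```

This is the trap with a locally built myapp:latest: the default resolves to Always, so the kubelet goes to the registry even though the image is sitting on the node.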
Fix it by setting the policy explicitly:
spec:
  containers:
    - name: myapp
      image: myapp:latest
      imagePullPolicy: Never # or IfNotPresent
For Minikube, point your Docker client to Minikube’s Docker daemon so images you build are available inside the cluster:
eval $(minikube docker-env)
docker build -t myapp:latest .
For kind, load images into the cluster:
kind load docker-image myapp:latest --name my-cluster
In production I set imagePullPolicy: IfNotPresent with pinned version tags as the default, because it is the only combination where I can reason about which image is actually running on the node. Never is only sensible for local development. Always with latest in production is what produces those “the deployment worked last hour but is now ImagePullBackOff” incidents when the registry has a transient outage, because the pull happens on every pod restart instead of using the local cache.
Fix 7: Debug with kubectl describe pod
When none of the above fixes are obvious, kubectl describe pod is your best diagnostic tool. It shows the full event history and reveals exactly what went wrong.
kubectl describe pod myapp-7c4b6d9f8-k3m2n
Focus on three sections:
1. The Containers section, which shows the exact image reference being used:
Containers:
  myapp:
    Image: registry.example.com/myapp:v1.2.0
    Image ID:
    State: Waiting
      Reason: ImagePullBackOff
2. The Events section, which shows the timeline of what happened:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 2m default-scheduler Successfully assigned default/myapp to node-1
Normal Pulling 90s (x4 over 2m) kubelet Pulling image "registry.example.com/myapp:v1.2.0"
Warning Failed 89s (x4 over 2m) kubelet Failed to pull image "registry.example.com/myapp:v1.2.0": ...
Warning Failed 89s (x4 over 2m) kubelet Error: ErrImagePull
Normal BackOff 65s (x6 over 2m) kubelet Back-off pulling image "registry.example.com/myapp:v1.2.0"
Warning Failed 65s (x6 over 2m) kubelet Error: ImagePullBackOff
The Message column contains the actual error from the container runtime. Read it carefully: it tells you whether the problem is authentication, a missing tag, a network timeout, or something else.
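3. The imagePullSecrets field, which shows whether secrets are attached:
...
Image Pull Secrets: my-registry-creds
...
If this is empty and you are pulling from a private registry, that is your problem. You can also read the field straight from the spec with kubectl get pod myapp-7c4b6d9f8-k3m2n -o jsonpath='{.spec.imagePullSecrets}'.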
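If you triage these often, the Message text maps onto the cause buckets listed earlier fairly mechanically. The helper below is hypothetical and heuristic: its patterns come from error messages like the ones quoted in this article, and real wording varies by runtime and registry, so treat a match as a hint rather than a diagnosis.

```python
import re
from typing import List, Optional, Tuple

# Heuristic (pattern, bucket) pairs, checked in order; the first match wins.
RULES: List[Tuple[str, str]] = [
    (r"toomanyrequests", "rate limit"),
    (r"no matching manifest for", "architecture mismatch"),
    # Docker Hub also reports a missing repository as "pull access denied", so check the name too.
    (r"unauthorized|authentication required|pull access denied", "credentials / imagePullSecrets"),
    (r"manifest unknown|not found", "wrong image name or tag"),
    (r"no such host|i/o timeout|connection refused|TLS handshake timeout", "network, DNS, or proxy"),
    (r"ErrImageNeverPull", "imagePullPolicy: Never without a local image"),
]


def classify_pull_error(message: str) -> Optional[str]:
    """Map a kubelet Failed event message to a likely cause bucket, or None if nothing matches."""
    for pattern, bucket in RULES:
        if re.search(pattern, message, re.IGNORECASE):
            return bucket
    return None


if __name__ == "__main__":
    print(classify_pull_error('Failed to pull image "nginx:latest": toomanyrequests: You have reached your pull rate limit'))
    # rate limit
```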
Additional debugging commands:
Check if the secret is correct by decoding it:
kubectl get secret my-registry-creds -n my-namespace -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d
This prints the stored credentials. Verify the server URL, username, and password are correct.
Check events across the namespace for broader patterns:
kubectl get events -n my-namespace --sort-by='.lastTimestamp'
Try pulling the image directly on the node (if you have SSH access):
crictl pull registry.example.com/myapp:v1.2.0
crictl does not read imagePullSecrets, so for a private registry pass the credentials explicitly with crictl pull --creds <user>:<password> <image>. If the pull still fails with the same error, the issue is at the container runtime or node level, not Kubernetes. If it succeeds, the problem is likely with the imagePullSecrets configuration.
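Eyeballing the decoded secret works, but the two mistakes I look for are mechanical: an auths key for the wrong host, and an auth field that does not decode to user:password. Here is a sketch that checks both, assuming you feed it the output of the decode command above and the registry host from the image reference. The host matching is simplified (scheme and path stripped, Docker Hub aliases folded into docker.io); the kubelet's own matching rules are more detailed.

```python
import base64
import binascii
import json
from typing import List


def registry_host(key: str) -> str:
    """Reduce an auths key or image registry to a bare host; treat Docker Hub aliases as docker.io."""
    host = key.split("://", 1)[-1].split("/", 1)[0]
    return "docker.io" if host in ("index.docker.io", "registry-1.docker.io") else host


def check_dockerconfigjson(decoded: str, image_registry: str) -> List[str]:
    """Return problems found in a decoded .dockerconfigjson for pulling from image_registry."""
    problems: List[str] = []
    auths = json.loads(decoded).get("auths", {})
    wanted = registry_host(image_registry)
    matches = [key for key in auths if registry_host(key) == wanted]
    if not matches:
        problems.append(f"no auths entry for {wanted}; found {sorted(auths)}")
    for key in matches:
        entry = auths[key]
        if "auth" in entry:
            try:
                user_pass = base64.b64decode(entry["auth"], validate=True).decode()
            except (binascii.Error, UnicodeDecodeError):
                problems.append(f"{key}: auth is not valid base64 text")
                continue
            if ":" not in user_pass:
                problems.append(f"{key}: auth does not decode to user:password")
        elif not (entry.get("username") and entry.get("password")):
            problems.append(f"{key}: no auth field and no username/password pair")
    return problems
```

For example, save the decoded output to a hypothetical config.json and call check_dockerconfigjson(open("config.json").read(), "registry.example.com"); an empty list means neither mistake was found.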
If your pod gets past the image pull but then crashes on startup, the issue is different — see Fix: Kubernetes CrashLoopBackOff for debugging container crashes.
Less Obvious Pull Failures I Have Hit in the Wild
If you have checked everything above and the pod is still stuck in ImagePullBackOff, these are the less obvious causes I have personally tracked down:
Expired credentials: Registry tokens and service account keys expire. ECR tokens last 12 hours. GCR service account keys can be rotated or disabled. Regenerate the credentials and recreate the Kubernetes secret.
Image was deleted from the registry: Someone may have deleted the tag or the entire repository. Check the registry’s web UI or API to confirm the image still exists.
Registry is down: Check the registry’s status page. Docker Hub has had outages. Your private registry’s storage backend might be full or unreachable.
Node disk pressure: If the node’s disk is full, the container runtime cannot download image layers. Check node conditions:
kubectl describe node <node-name> | grep -i pressure
If DiskPressure is True, free up space or add more disk.
Containerd or Docker daemon issues: Restart the container runtime on the node:
sudo systemctl restart containerd
Image manifest corruption: Rarely, an image manifest in the registry can be corrupted. Try pushing the image again with a new tag and updating your deployment.
Pod security policies or admission webhooks: A webhook might be mutating or rejecting the pod spec before the kubelet sees it. Check for any admission controllers that modify image references:
kubectl get mutatingwebhookconfigurations
kubectl get validatingwebhookconfigurations
Image too large for the runtime’s pull timeout: Very large images (multi-gigabyte ML model containers, for example) can exceed the kubelet’s runtime-request-timeout (default 2 minutes for some operations) on slow networks. The pull starts, runs, then aborts midway. Increase the timeout in the kubelet config or split the image into a smaller base plus a sidecar that pulls model weights at startup.
Container runtime garbage collection mid-pull: If the node is under disk pressure, the runtime may garbage-collect layers from a previous (related) image while the new pull is still resolving shared blobs. This shows up as failed to register layer or layer does not exist. Free disk first, then retry.
Node DNS vs cluster DNS: Image pulls are resolved by the container runtime on the node, using the node’s /etc/resolv.conf rather than CoreDNS. If the node points at an internal DNS server that does not resolve your registry hostname, the pull fails even though pods on the same node (which use cluster DNS) resolve the name without trouble. This is also why a DNS test from a debug pod can pass while the pull still fails.
What Other Tutorials Get Wrong About ImagePullBackOff
Most tutorials I have read on this error list the same fixes but frame them in ways that mislead. The gaps I see most often:
They recommend docker login on the node as the fix. This was correct on pre-1.24 clusters that used Docker as the container runtime. After dockershim removal in Kubernetes 1.24, modern clusters run containerd or CRI-O and there is no Docker daemon on the node. The equivalent commands are crictl pull and per-runtime auth files (containerd reads /etc/containerd/config.toml; CRI-O reads /etc/containers/auth.json). Tutorials that have not updated for the dockershim removal silently mislead readers on every modern cluster.
They suggest imagePullPolicy: Never as a workaround. This only works for images already loaded on the node (Minikube, kind, or pre-pulled via a DaemonSet). In production it produces a worse failure mode: pods schedule onto nodes that do not have the image yet and fail with ErrImageNeverPull instead, a different error that sends people looking in the wrong place. Anti-pattern outside local development.
They treat ErrImagePull and ImagePullBackOff as different problems. They are the same problem at different stages: ErrImagePull is the most recent attempt failing, ImagePullBackOff is the back-off state. The fix is identical. Articles that separate them confuse readers into thinking two different things are happening.
They skip the namespace scoping of imagePullSecrets. This is the single most common silent failure I have seen on private-registry pulls. The secret exists in default and the pod is in production; nothing in the error message hints at the namespace mismatch, and the fix (recreate the secret per namespace, or attach to the namespace’s default ServiceAccount) is not in the first ten Google results.
They miss the architecture mismatch from Apple Silicon developer machines. A Mac user builds an image locally with docker build (default platform: linux/arm64), pushes it to a registry, and watches it fail on amd64 cluster nodes, either with no matching manifest for linux/amd64 or, if the single-platform image pulls anyway, with exec format error once the container starts. The fix is multi-platform builds with docker buildx, not anything in the cluster. Many tutorials skip this entirely because they were written before Apple Silicon was common.
They omit the kubelet CredentialProvider plugin as the modern alternative to static imagePullSecrets. For ECR, GCR, and ACR, the credential provider plugin (stable since 1.26) is the cloud-native path. Static secrets work but require manual or scripted rotation; the credential provider fetches fresh short-lived tokens automatically. Articles that only show imagePullSecrets are stuck in the 2019 mental model.
Frequently Asked Questions
What is the difference between ErrImagePull and ImagePullBackOff?
They are two states of the same problem. ErrImagePull is the kubelet reporting that the most recent pull attempt failed. ImagePullBackOff is the kubelet pausing before retrying, with an exponentially growing delay (10s, 20s, 40s, up to 5 minutes). A pod can flip between the two as it retries: each retry shows ErrImagePull again briefly, then returns to ImagePullBackOff while it waits for the next attempt. The fix is the same for both.
What is the difference between Always, IfNotPresent, and Never for imagePullPolicy?
Always makes the kubelet contact the registry on every pod start to check whether the image has changed (it then downloads only changed layers). This is the default for the latest tag. IfNotPresent only pulls when the image is not on the node, and is the default for pinned tags like v1.2.0. Never skips the pull entirely and fails if the image is not present locally; this is only useful for Minikube / kind local-development workflows where you load images directly into the node’s runtime. In production I default to pinned tags with the implicit IfNotPresent policy.
Why does my pod work but my teammate’s identical-looking pod does not?
Three usual suspects, in roughly the order I check them:
- Different namespace (the imagePullSecret exists in yours but not theirs).
- Different node (yours has the image cached from a previous pull; theirs has not pulled yet and the credentials are wrong).
- Different architecture (yours is on an amd64 node, theirs is on an arm64 node, and the image is not multi-arch).
kubectl get pod -o wide shows the node, and kubectl describe pod shows the actual pull error. Comparing those between the two pods identifies the cause in under a minute.
Should I use imagePullSecrets or the kubelet CredentialProvider plugin?
For ECR, GCR, and ACR, use the CredentialProvider plugin whenever you are on Kubernetes 1.26 or later. It eliminates the manual or scripted rotation of static secrets and avoids the 12-hour ECR token expiry problem. For self-hosted registries with long-lived credentials (Harbor, Nexus, GitHub Container Registry with a PAT), static imagePullSecrets are still fine. The two are not mutually exclusive: you can have both in the same cluster.
How do I cache images across nodes to avoid rate limits?
Three patterns, each with different trade-offs:
- Registry mirror (Harbor, AWS ECR pull-through cache, Google Artifact Registry remote repository): the cluster pulls once from upstream, then serves every node from the mirror. Best for shared base images like nginx or alpine.
- DaemonSet pre-pull: a DaemonSet runs on every node and pulls the images you know you will need at off-peak times. Works for a small static set of images; does not scale to dynamic workloads.
- Image cloning at deploy time: your CI copies the image into your own registry, and your manifests reference that copy instead of pulling from upstream at pod start. Sidesteps rate limits entirely but moves the bandwidth cost to your registry.
Most production clusters end up with a combination of all three.
Does this happen the same way on OpenShift, GKE, EKS, AKS?
The error itself is identical across all managed Kubernetes distributions because it comes from the kubelet, which is shared code. The differences are in how each distribution handles registry auth: GKE authorizes Artifact Registry pulls through the node’s service account, EKS through the node IAM role, and AKS through the kubelet’s managed identity once the ACR is attached to the cluster. OpenShift adds its own ImageStream abstraction that introduces an extra layer of redirection. The fixes in this article apply unchanged; the cloud-specific credential setup differs and is best read from each provider’s docs.
If you are also having trouble with your kubectl context configuration while debugging, see Fix: kubectl context not found. For Docker socket permission issues that might affect local builds, check Fix: Docker Permission Denied Socket.
Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.