Fix: Kubernetes ImagePullBackOff - Failed to Pull Image

Q: How do I fix "Kubernetes ImagePullBackOff - Failed to Pull Image"?

How to fix the Kubernetes ImagePullBackOff and ErrImagePull errors when a pod fails to pull a container image from a registry.

The Pod That Will Not Start

I have triaged this in two flavors: the first deployment of a new cluster (where it is almost always a private-registry credential issue) and a deployment that worked yesterday but suddenly fails (where it is almost always a tag that was overwritten, a registry rate limit, or a node that lost its credential refresh). Both classes look identical in kubectl get pods. The diagnostic question I ask first is always: did this image ever pull successfully on this cluster?

You deploy a pod to Kubernetes. It never starts. You check the status:

$ kubectl get pods
NAME                     READY   STATUS             RESTARTS   AGE
myapp-7c4b6d9f8-k3m2n   0/1     ImagePullBackOff   0          2m15s

Or you see the closely related status:

$ kubectl get pods
NAME                     READY   STATUS         RESTARTS   AGE
myapp-7c4b6d9f8-k3m2n   0/1     ErrImagePull   0          45s

You describe the pod and find something like:

$ kubectl describe pod myapp-7c4b6d9f8-k3m2n
...
Events:
  Warning  Failed     12s (x3 over 58s)  kubelet  Failed to pull image "myapp:latest": rpc error: code = Unknown desc = Error response from daemon: pull access denied
  Warning  Failed     12s (x3 over 58s)  kubelet  Error: ErrImagePull
  Normal   BackOff    1s (x4 over 57s)   kubelet  Back-off pulling image "myapp:latest"
  Warning  Failed     1s (x4 over 57s)   kubelet  Error: ImagePullBackOff

The pod is stuck. Kubernetes cannot pull the container image, and after repeated failures it backs off with increasing delays before retrying.

Quick Reference Before You Dive In

If you arrived here from Google with a stuck pod, the five facts that resolve roughly 90 percent of the cases I have triaged:

ErrImagePull and ImagePullBackOff are the same problem at different stages. The former is the most recent pull attempt failing; the latter is the back-off state after several failures. The fix is identical.
Run kubectl describe pod <name> first. The Events section contains the literal error message from the kubelet, which is far more diagnostic than the status field alone. Status tells you “something is wrong”; Events tells you what.
Almost every cause falls into one of four buckets: wrong image reference (typo, tag, missing registry prefix), missing or wrong authentication (imagePullSecrets), network reachability (firewall, proxy, DNS), or CPU architecture mismatch (most often arm64 nodes pulling amd64-only images).
imagePullSecrets are namespace-scoped. A secret in default does not work for a pod in production. Recreate it in every namespace that needs it, or attach it to the namespace’s default ServiceAccount.
For ECR/GCR/ACR the cloud-native path is the kubelet CredentialProvider plugin (stable since Kubernetes 1.26), not static imagePullSecrets. The plugin mints short-lived tokens automatically and avoids the 12-hour ECR token expiry problem entirely.

The rest of this article walks through each of those in detail, plus the failure modes most other guides skip.

What the Kubelet Is Actually Doing

ImagePullBackOff is Kubernetes telling you that the kubelet tried to pull a container image and failed. After the initial failure (ErrImagePull), Kubernetes applies an exponential back-off (10s, 20s, 40s, up to 5 minutes) before retrying. The pod stays in this state until the pull succeeds or you fix the underlying problem. The official image-pulling reference on kubernetes.io covers the full state machine.
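
To make the retry timing concrete, here is a minimal shell sketch of the schedule described above: start at 10 seconds, double after each failure, cap at 5 minutes. The function name backoff_schedule is made up for illustration, and the real gap between attempts also includes however long each pull takes to fail, so treat the numbers as approximate.

```shell
#!/bin/sh
# Print the first N back-off delays: start at 10s, double each time, cap at 300s.
# backoff_schedule is a hypothetical helper, not a kubectl or kubelet command.
backoff_schedule() {
  attempts=$1
  delay=10
  cap=300
  out=""
  i=0
  while [ "$i" -lt "$attempts" ]; do
    out="$out${out:+ }${delay}s"
    delay=$((delay * 2))
    if [ "$delay" -gt "$cap" ]; then
      delay=$cap
    fi
    i=$((i + 1))
  done
  echo "$out"
}

backoff_schedule 7   # prints: 10s 20s 40s 80s 160s 300s 300s
```

In practice this means a pod that has been failing for a while may wait up to five minutes after you fix the problem before it retries; deleting the pod forces an immediate new attempt.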

The kubelet on the node is responsible for pulling images. When you create a pod, the kubelet checks if the image already exists locally. If it does not (or if imagePullPolicy forces a pull), it contacts the container registry, authenticates if needed, and downloads the image layers. Any failure in this chain results in ErrImagePull, which quickly becomes ImagePullBackOff after a few retries.

The root cause is almost always one of these:

Wrong image name or tag — a typo, a tag that does not exist, or a missing registry prefix
Private registry without credentials — the registry requires authentication and the pod has no imagePullSecrets
Docker Hub rate limits — anonymous or free-tier pulls exceeded the rate limit
Network issues — the node cannot reach the registry due to firewalls, proxies, or network policies
Architecture mismatch — the image exists but not for the node’s CPU architecture (e.g., arm64 vs amd64)
Local image with wrong pull policy — the image exists on the node but Kubernetes tries to pull it from a remote registry anyway
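
As a rough triage aid, the list above can be turned into a small classifier that maps the message from the Events section to one of these causes. This is only a sketch: classify_pull_error and its bucket names are hypothetical, the match strings are the ones quoted in this article plus two common network errors ("no such host", "i/o timeout") and a generic "not found" that I am assuming your runtime emits, and real messages vary by runtime and registry.

```shell
#!/bin/sh
# Map a kubelet pull-failure message to a likely cause.
# classify_pull_error is a hypothetical helper; the patterns are heuristics.
classify_pull_error() {
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$msg" in
    *toomanyrequests*)                     echo "rate-limit" ;;
    *"no matching manifest"*)              echo "architecture" ;;
    *unauthorized*|*"pull access denied"*) echo "auth-or-wrong-name" ;;
    *"manifest unknown"*|*"not found"*)    echo "wrong-tag-or-name" ;;
    *"no such host"*|*"i/o timeout"*)      echo "network-or-dns" ;;
    *)                                     echo "unknown" ;;
  esac
}

# Example with the message from the describe output shown earlier.
classify_pull_error 'Failed to pull image "myapp:latest": rpc error: code = Unknown desc = Error response from daemon: pull access denied'
```

Note that "pull access denied" is deliberately mapped to a combined bucket: registries return the same error for a private repository without credentials and for a repository that does not exist.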

Fix 1: Check the Image Name and Tag

This is the most common cause. A single typo in the image name, registry URL, or tag breaks the pull.

Run kubectl describe pod and look at the exact image reference in the error:

kubectl describe pod myapp-7c4b6d9f8-k3m2n | grep -i image

You see something like:

Image:         myapp:latest

Check for these common mistakes:

Typos in the image name: ngixn instead of nginx, postgress instead of postgres
Missing registry prefix: If your image is on a private registry like ghcr.io or registry.example.com, you need the full path: ghcr.io/myorg/myapp:v1.2.0
Tag does not exist: You specified myapp:v2.0 but only v2.0.0 exists. Tags must match exactly and are case-sensitive.
Using latest when no latest tag exists: Not every image has a latest tag. Some projects only publish versioned tags.
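
The missing-prefix mistake is easier to spot once you see what a short reference expands to. The sketch below approximates the Docker-style normalization rules (no registry means docker.io, a single-component name gets the library/ prefix, no tag means latest). normalize_image is a hypothetical helper written for this article, not part of any CLI, and it ignores edge cases such as uppercase names.

```shell
#!/bin/sh
# Expand a short image reference roughly the way Docker-style clients do.
# normalize_image is a hypothetical helper for illustration only.
normalize_image() {
  ref=$1
  digest=""
  tag=""
  # Split off a digest (@sha256:...) if present.
  case "$ref" in
    *@*) digest=${ref#*@}; ref=${ref%%@*} ;;
  esac
  # A tag is a ":" after the last "/", so a registry port is not mistaken for one.
  last=${ref##*/}
  case "$last" in
    *:*) tag=${last##*:}; ref=${ref%:*} ;;
  esac
  # The first component is a registry only if it has a "." or ":" or is "localhost".
  first=${ref%%/*}
  case "$ref" in
    */*)
      case "$first" in
        *.*|*:*|localhost) registry=$first; repo=${ref#*/} ;;
        *) registry=docker.io; repo=$ref ;;
      esac ;;
    *) registry=docker.io; repo=library/$ref ;;
  esac
  if [ -n "$digest" ]; then
    echo "$registry/$repo@$digest"
  else
    echo "$registry/$repo:${tag:-latest}"
  fi
}

normalize_image myapp:latest                 # docker.io/library/myapp:latest
normalize_image ghcr.io/myorg/myapp:v1.2.0   # ghcr.io/myorg/myapp:v1.2.0
```

The first line of output shows why a bare myapp:latest fails for a private image: without a registry prefix, the runtime looks for it on Docker Hub.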

Verify the image exists by pulling it manually on your local machine:

docker pull myapp:latest

If it fails locally, the image reference is wrong. Fix it in your deployment:

spec:
  containers:
    - name: myapp
      image: registry.example.com/myapp:v1.2.0  # full path with valid tag

Apply the fix:

kubectl apply -f deployment.yaml

My standing rule on every cluster I manage: no latest tag in production manifests, ever. The tag is ambiguous and rollbacks become guesswork because you cannot tell which build is actually running. I pin to either a semantic version (v1.2.0) or, for the strict cases, a SHA digest (myapp@sha256:abc123...). The digest is the only form that survives an upstream registry tag overwrite, which I have seen happen during a contractor incident that took an afternoon to unwind.
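
A rule like this is easy to enforce mechanically. The sketch below scans a manifest for image: lines and prints any reference that uses latest or has no tag at all. find_floating_images is a hypothetical name, and the line-based parsing assumes a conventionally formatted YAML file with one image: key per line; a real policy check would use an admission controller or a YAML-aware linter.

```shell
#!/bin/sh
# Print image references in a manifest that are not pinned to a tag or digest.
# find_floating_images is a hypothetical helper; $1 is the path to a YAML file.
find_floating_images() {
  sed -n 's/^[[:space:]-]*image:[[:space:]]*//p' "$1" |
    sed -e 's/[[:space:]]*#.*$//' -e "s/[\"']//g" |
    while read -r ref; do
      case "$ref" in
        *@sha256:*) continue ;;   # pinned by digest
      esac
      last=${ref##*/}
      case "$last" in
        *:latest) echo "$ref" ;;  # explicit latest tag
        *:*) ;;                   # explicit version tag, fine
        *) echo "$ref" ;;         # no tag, so latest is implied
      esac
    done
}

# Usage: find_floating_images deployment.yaml
```

An empty result means every image in the file is pinned; anything printed is a candidate for the rollback guesswork described above.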

If you have dealt with image reference issues in Docker before, the troubleshooting steps overlap. See Fix: Docker Image Not Found for more on resolving image name and registry URL problems.

Kubernetes Version History for Image Pulling

The image-pull surface in Kubernetes has changed enough over the past few years that the right fix depends on which version your cluster runs. The table below lines up the milestones I track on incidents.

Kubernetes version	Released	What changed for image pulling
1.24	May 2022	dockershim removed. Nodes must use containerd, CRI-O, or another CRI-compliant runtime. Many pre-1.24 troubleshooting guides reference Docker daemon commands that no longer apply on modern clusters.
1.26	Dec 2022	Kubelet `CredentialProvider` plugins reach stable. ECR, GCR, ACR, and custom registry auth can now use short-lived tokens minted on each pull, replacing static `imagePullSecrets` with their 12-hour expiry problem.
1.29	Dec 2023	`SidecarContainers` feature reaches beta and is enabled by default (it went stable later, in 1.33). Worth knowing because the pull behavior of restartable init containers (sidecars) differs subtly from regular init containers, and “pod stuck because sidecar will not pull” is a new failure shape.
1.31	Aug 2024	`ImageVolume` source (alpha). Pods can mount OCI images as read-only volumes without running them as containers. Opens a new path for shipping ML model weights and shared assets, with its own pull-related failure modes.

Cross-check exact dates and feature stage transitions against the official Kubernetes changelog before relying on them for a specific patch version; the table above tracks the milestones I have personally referenced on incidents.

The single most important milestone in the table for this bug is 1.24 (dockershim removal). If a tutorial tells you to “SSH into the node and run docker pull,” and your cluster is on 1.24 or later, that command will not exist. The equivalent is crictl pull, which works the same way on containerd and CRI-O nodes. Pre-1.24 advice that assumes a Docker daemon on every node is silently outdated.

How Other Tools Handle This

Kubernetes is not the only container platform that fails on image pulls, but each runtime surfaces the error differently, and the recovery path varies.

Docker (docker run) fails immediately with Error response from daemon: pull access denied or manifest unknown and exits non-zero. There is no back-off loop. You re-run the command after fixing credentials with docker login. Credentials live in ~/.docker/config.json as base64 (or in a credential helper like osxkeychain, wincred, or secretservice). Kubernetes essentially wraps this file into a dockerconfigjson secret.

Podman behaves like Docker but stores credentials in ${XDG_RUNTIME_DIR}/containers/auth.json by default. A rootless Podman pull failing with errors: denied usually means the user-scoped auth file is missing. Running podman login writes the right path. Unlike Docker, Podman has no daemon, so a pull failure is purely a client-side issue and cannot be caused by daemon state.

containerd via ctr uses ctr image pull --user user:password. It does not read Docker’s config file by default, so a working docker pull does not guarantee a working ctr pull on the same machine. When Kubernetes uses containerd as its CRI, the kubelet hands credentials to containerd through the CRI interface; the on-disk auth files are bypassed entirely.

CRI-O reads /etc/containers/auth.json and ${XDG_RUNTIME_DIR}/containers/auth.json, plus per-registry policy in /etc/containers/policy.json that can outright block unsigned images. An ImagePullBackOff on a CRI-O node sometimes traces back to a signature policy denial rather than auth. Check the kubelet logs for policy rejected messages.

Image pull secrets vs ServiceAccount tokens vs registry mirrors. imagePullSecrets on the pod’s ServiceAccount are copied into the pod at admission only when the pod spec lists none of its own; the two lists are not merged. Every secret that ends up in the pod’s list is tried. For cloud registries (ECR, GCR, ACR), the cleaner path since Kubernetes 1.26 is the CredentialProvider plugin interface: the kubelet calls an external binary (like ecr-credential-provider) that mints a short-lived token on every pull. This avoids the 12-hour ECR token expiry problem entirely. For repeated pulls of the same image across many nodes, a registry mirror (Harbor, AWS ECR pull-through cache, Google Artifact Registry remote repository) eliminates rate-limit and bandwidth issues at the cluster level rather than per-pod.
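
For orientation, this is roughly what the kubelet-side configuration for such a plugin looks like, written out by a small shell helper. Treat it as a sketch under assumptions: write_credential_provider_config is a hypothetical function, the matchImages pattern and cache duration are illustrative values for ECR, and on managed clusters the node image usually ships this file already.

```shell
#!/bin/sh
# Write an illustrative kubelet CredentialProviderConfig for ECR to the given path.
# write_credential_provider_config is a hypothetical helper; values are examples.
write_credential_provider_config() {
  cat > "$1" <<'EOF'
apiVersion: kubelet.config.k8s.io/v1
kind: CredentialProviderConfig
providers:
  - name: ecr-credential-provider
    matchImages:
      - "*.dkr.ecr.*.amazonaws.com"
    defaultCacheDuration: "12h"
    apiVersion: credentialprovider.kubelet.k8s.io/v1
EOF
}

# Written to a scratch location here rather than a real kubelet directory.
write_credential_provider_config "${TMPDIR:-/tmp}/credential-provider-config.yaml"
```

The kubelet picks the file up through its --image-credential-provider-config flag and looks for the named binary in the directory given by --image-credential-provider-bin-dir. Images that do not match any matchImages pattern still fall back to imagePullSecrets.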

Fix 2: Configure imagePullSecrets for Private Registries

If your image is in a private registry (Docker Hub private repos, AWS ECR, GCR, Azure ACR, GitHub Container Registry, or a self-hosted registry), Kubernetes needs credentials to pull it.

First, confirm this is the issue. Run kubectl describe pod and look for an error like:

Failed to pull image "registry.example.com/myapp:v1": unauthorized: authentication required

or:

pull access denied for myapp, repository does not exist or may require 'docker login'

Step 1: Create a Docker registry secret:

kubectl create secret docker-registry my-registry-creds \
  --docker-server=registry.example.com \
  --docker-username=myuser \
  --docker-password=mypassword \
  --docker-email=myuser@example.com \
  -n my-namespace

For Docker Hub, use https://index.docker.io/v1/ as the server:

kubectl create secret docker-registry dockerhub-creds \
  --docker-server=https://index.docker.io/v1/ \
  --docker-username=myuser \
  --docker-password=mypassword \
  -n my-namespace
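
It helps to know what these commands actually store, because that is what you will be reading when you decode the secret later. The sketch below builds the same kind of JSON document by hand: an auths map keyed by registry server, with an auth field that is just base64 of username:password. make_dockerconfigjson is a hypothetical helper, and it assumes the values contain no quotes or backslashes since it does no JSON escaping.

```shell
#!/bin/sh
# Build a minimal .dockerconfigjson payload for one registry.
# make_dockerconfigjson is a hypothetical helper; it does not escape JSON,
# so it assumes plain values without quotes or backslashes.
make_dockerconfigjson() {
  server=$1
  user=$2
  password=$3
  # The auth field is base64("user:password"), encoding only, not encryption.
  auth=$(printf '%s:%s' "$user" "$password" | base64 | tr -d '\n')
  printf '{"auths":{"%s":{"username":"%s","password":"%s","auth":"%s"}}}\n' \
    "$server" "$user" "$password" "$auth"
}

make_dockerconfigjson registry.example.com myuser mypassword
```

The key under auths must match the registry host in the image reference. A secret created for registry.example.com does nothing for an image hosted on ghcr.io, which is a common reason a secret "exists but does not work".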

Step 2: Reference the secret in your pod spec:

apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
    - name: myapp
      image: registry.example.com/myapp:v1.2.0
  imagePullSecrets:
    - name: my-registry-creds

For Deployments, imagePullSecrets goes inside spec.template.spec:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: registry.example.com/myapp:v1.2.0
      imagePullSecrets:
        - name: my-registry-creds

Step 3: Verify the secret exists in the correct namespace:

kubectl get secret my-registry-creds -n my-namespace

Common Mistake: The secret must be in the same namespace as the pod. If your pod is in production but the secret is in default, the pull will fail with the same authentication error. Secrets are namespace-scoped, so they do not span across namespaces.

If you want every pod in a namespace to use the same registry credentials without adding imagePullSecrets to every spec, attach the secret to the namespace’s default service account:

kubectl patch serviceaccount default -n my-namespace \
  -p '{"imagePullSecrets": [{"name": "my-registry-creds"}]}'

AWS ECR, GCR, and Azure ACR

Cloud-managed registries have their own authentication methods:

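AWS ECR tokens expire every 12 hours. If you store one in a static secret, you need a cron job or an in-cluster controller to refresh it. On EKS the simpler path is to attach an ECR read policy to the node’s IAM role, because the kubelet pulls images with the node’s credentials. IAM roles for service accounts (IRSA) cover the workload’s own AWS API calls, not image pulls.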

GCR / Artifact Registry works automatically if your GKE nodes have the cloud-platform scope or the appropriate IAM permissions. For non-GKE clusters, create a service account key and use it as a Docker registry secret.

Azure ACR integrates with AKS through managed identity. For non-AKS clusters, create a service principal and use it as registry credentials.

Fix 3: Handle Docker Hub Rate Limits

Docker Hub enforces pull rate limits:

Anonymous users: 100 pulls per 6 hours per IP
Authenticated free users: 200 pulls per 6 hours
Paid subscriptions: Higher or unlimited

If your cluster has many nodes pulling images through the same public IP (common with NAT gateways), you hit these limits fast.
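
A quick back-of-the-envelope check shows how fast. The sketch below multiplies node count by distinct images per node and compares the result with the anonymous limit quoted above. pulls_per_window is a hypothetical helper, and the model is simplified: it assumes every node pulls every image once in the window and that each image pull counts as one pull.

```shell
#!/bin/sh
# Estimate Docker Hub pulls in one 6-hour window behind a single NAT IP.
# pulls_per_window is a hypothetical helper: nodes x images pulled per node.
pulls_per_window() {
  nodes=$1
  images_per_node=$2
  echo $((nodes * images_per_node))
}

limit=100   # anonymous limit per IP quoted in this article
total=$(pulls_per_window 30 4)
if [ "$total" -gt "$limit" ]; then
  echo "$total pulls exceeds the limit of $limit"
else
  echo "$total pulls fits within the limit of $limit"
fi
# prints: 120 pulls exceeds the limit of 100
```

A single node-pool rollout on a 30-node cluster is enough to cross the line, which is why this error tends to appear during upgrades and autoscaling bursts rather than steady state.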

The error in kubectl describe pod looks like:

Failed to pull image "nginx:latest": toomanyrequests: You have reached your pull rate limit

Solutions:

Authenticate to Docker Hub even for public images: this raises your limit to 200 pulls per 6 hours and counts pulls against your account instead of the shared IP. Create a secret and add imagePullSecrets as shown in Fix 2.
Use a pull-through cache: Set up a registry mirror (like Harbor or a cloud-provider registry proxy) that caches Docker Hub images. This reduces direct pulls to Docker Hub significantly.
Pre-pull images on nodes: Use a DaemonSet to pull commonly used images to every node during off-peak hours. Once cached locally, the kubelet does not need to pull again (unless imagePullPolicy: Always is set).
Switch registries: Many popular images are mirrored on other registries. For example, use public.ecr.aws/nginx/nginx, or Google’s Docker Hub mirror at mirror.gcr.io/library/nginx, instead of pulling from Docker Hub directly.

Fix 4: Fix Network Issues Blocking the Pull

If the node cannot reach the container registry over the network, every pull fails. This is common in air-gapped environments, clusters behind corporate proxies, or when network policies are too restrictive.

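Check if the node can reach the registry. The kubelet pulls over the node’s own network, so test from the node when you can; a regular debug pod only tests the pod network:

# SSH into the node, or start a debug pod if you have no node access
kubectl run debug --rm -it --image=busybox -- sh

# Inside the debug pod, test connectivity (an HTTP 401 still proves the registry is reachable)
wget -qO- --timeout=5 https://registry.example.com/v2/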

If this times out, investigate:

Firewall rules: Ensure the node’s egress allows HTTPS traffic (port 443) to the registry. In cloud environments, check security groups and firewall rules.
Proxy configuration: If your cluster requires an HTTP proxy, configure the container runtime (Docker or containerd) to use it. For containerd, add proxy settings in /etc/systemd/system/containerd.service.d/http-proxy.conf.
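Network policies: A Kubernetes NetworkPolicy applies to pod traffic, not to the kubelet or the container runtime, so it cannot block the image pull itself. It can, however, block the debug pod above and make a reachable registry look unreachable. Check with:

kubectl get networkpolicies -n my-namespace

If a policy restricts egress in that namespace, run the connectivity test from the node instead.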

DNS resolution: The node must resolve the registry hostname. Test with:

nslookup registry.example.com

If DNS fails, check the node’s /etc/resolv.conf and any custom CoreDNS configuration.
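
For the proxy case in particular, the drop-in file is short enough to show in full. The helper below writes one to a directory you pass in. It is a sketch under assumptions: write_proxy_dropin is a hypothetical function, the proxy URL and NO_PROXY list are placeholders, and your cluster’s pod and service CIDRs need to go into NO_PROXY so in-cluster traffic is not sent through the proxy.

```shell
#!/bin/sh
# Write a systemd drop-in that gives containerd its proxy settings.
# write_proxy_dropin is a hypothetical helper.
# $1: drop-in directory, $2: proxy URL, $3: comma-separated NO_PROXY list
write_proxy_dropin() {
  mkdir -p "$1"
  cat > "$1/http-proxy.conf" <<EOF
[Service]
Environment="HTTP_PROXY=$2"
Environment="HTTPS_PROXY=$2"
Environment="NO_PROXY=$3"
EOF
}

# Written to a scratch directory here. On a real node the directory is
# /etc/systemd/system/containerd.service.d, followed by:
#   sudo systemctl daemon-reload && sudo systemctl restart containerd
write_proxy_dropin "${TMPDIR:-/tmp}/containerd.service.d" \
  "http://proxy.example.com:3128" \
  "localhost,127.0.0.1,10.0.0.0/8,.svc,.cluster.local"
```

The settings only take effect after the daemon-reload and restart shown in the comment, and they apply to the runtime’s pulls only, not to the containers it starts.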

If broader cluster connectivity is failing too, the registry pull is downstream of that — fix the node-to-control-plane network first, then revisit the image pull.

Fix 5: Fix Architecture Mismatches (arm64 vs amd64)

You pull an image that exists, the credentials are correct, but the pull still fails. The error might say:

no matching manifest for linux/arm64 in the manifest list entries

This happens when the image was built only for one CPU architecture (usually amd64) but your node runs a different one (usually arm64). This is increasingly common with the rise of ARM-based nodes like AWS Graviton, Apple Silicon development environments, and ARM-based cloud instances.

Check your node’s architecture:

kubectl get nodes -L kubernetes.io/arch

Look at the ARCH column. Or inspect a specific node:

kubectl describe node <node-name> | grep -i arch

You see something like kubernetes.io/arch=arm64.

Solutions:

Use multi-arch images: Many popular images support multiple architectures. Check the image’s registry page or inspect the manifest:

docker manifest inspect nginx:latest

This shows which platforms the image supports. If linux/arm64 is listed, the image works on ARM nodes.
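
The JSON from that command is long, so a filter that reduces it to a list of architectures can help. The sketch below reads manifest-list JSON on stdin and prints the distinct architecture values. list_architectures is a hypothetical helper, and the text matching assumes the pretty-printed layout with one key per line; with jq available, a proper JSON query would be more robust.

```shell
#!/bin/sh
# Print the distinct architectures named in manifest-list JSON read from stdin.
# list_architectures is a hypothetical helper that relies on one key per line.
list_architectures() {
  sed -n 's/.*"architecture":[[:space:]]*"\([^"]*\)".*/\1/p' | sort -u
}

# Example with a trimmed, made-up manifest list.
# On a real image: docker manifest inspect nginx:latest | list_architectures
list_architectures <<'EOF'
{
  "manifests": [
    { "platform": { "architecture": "arm64", "os": "linux" } },
    { "platform": { "architecture": "amd64", "os": "linux" } }
  ]
}
EOF
```

If the list is missing the architecture your node reports, no credential or network fix will help; the image has to be rebuilt or the pod rescheduled.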

Build your image for multiple architectures using Docker Buildx:

docker buildx build --platform linux/amd64,linux/arm64 -t myapp:v1.2.0 --push .

Use node affinity to schedule the pod only on nodes with the matching architecture:

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values:
                  - amd64

Add a nodeSelector for simpler cases:

spec:
  nodeSelector:
    kubernetes.io/arch: amd64

Fix 6: Set the Correct imagePullPolicy for Local Images

If you built an image locally (for example during development with Minikube or kind) and the image only exists on the node, not in any remote registry, Kubernetes might still try to pull it.

The imagePullPolicy controls this behavior:

Always: Always pull from the registry. This is the default when using the latest tag.
IfNotPresent: Only pull if the image is not already on the node. This is the default for images with a specific tag (e.g., myapp:v1.2.0).
Never: Never pull. Only use the local image.

If you are using a local image with the latest tag, Kubernetes defaults to Always and tries to pull from a registry — which fails because the image is not there.

Fix it by setting the policy explicitly:

spec:
  containers:
    - name: myapp
      image: myapp:latest
      imagePullPolicy: Never       # or IfNotPresent
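
The defaulting rules described above are simple enough to express as a function, which makes it easy to check what policy a given reference gets when imagePullPolicy is omitted. default_pull_policy is a hypothetical helper that mirrors the documented behavior as I understand it: a digest or a non-latest tag gives IfNotPresent, while latest or no tag gives Always.

```shell
#!/bin/sh
# Report the imagePullPolicy Kubernetes applies when the field is omitted.
# default_pull_policy is a hypothetical helper for illustration.
default_pull_policy() {
  ref=$1
  case "$ref" in
    *@*) echo "IfNotPresent"; return ;;   # pinned by digest
  esac
  # Look for a tag only after the last "/", so registry ports are ignored.
  last=${ref##*/}
  case "$last" in
    *:latest) echo "Always" ;;            # explicit latest
    *:*)      echo "IfNotPresent" ;;      # explicit version tag
    *)        echo "Always" ;;            # no tag, latest implied
  esac
}

default_pull_policy myapp:latest    # Always
default_pull_policy myapp:v1.2.0    # IfNotPresent
```

One caveat: the default is filled in when the object is first created and is not recalculated if you later change the tag, so an object created with latest keeps Always until you set the field explicitly.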

For Minikube, point your Docker client to Minikube’s Docker daemon so images you build are available inside the cluster:

eval $(minikube docker-env)
docker build -t myapp:latest .

For kind, load images into the cluster:

kind load docker-image myapp:latest --name my-cluster

In production I set imagePullPolicy: IfNotPresent with pinned version tags as the default, because it is the only combination where I can reason about which image is actually running on the node. Never is only sensible for local development. Always with latest in production is what produces those “the deployment worked last hour but is now ImagePullBackOff” incidents when the registry has a transient outage, because the pull happens on every pod restart instead of using the local cache.

Fix 7: Debug with kubectl describe pod

When none of the above fixes are obvious, kubectl describe pod is your best diagnostic tool. It shows the full event history and reveals exactly what went wrong.

kubectl describe pod myapp-7c4b6d9f8-k3m2n

Focus on three sections:

1. The Containers section — shows the exact image reference being used:

Containers:
  myapp:
    Image:          registry.example.com/myapp:v1.2.0
    Image ID:
    State:          Waiting
      Reason:       ImagePullBackOff

2. The Events section — shows the timeline of what happened:

Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  2m                 default-scheduler  Successfully assigned default/myapp to node-1
  Normal   Pulling    90s (x4 over 2m)   kubelet            Pulling image "registry.example.com/myapp:v1.2.0"
  Warning  Failed     89s (x4 over 2m)   kubelet            Failed to pull image "registry.example.com/myapp:v1.2.0": ...
  Warning  Failed     89s (x4 over 2m)   kubelet            Error: ErrImagePull
  Normal   BackOff    65s (x6 over 2m)   kubelet            Back-off pulling image "registry.example.com/myapp:v1.2.0"
  Warning  Failed     65s (x6 over 2m)   kubelet            Error: ImagePullBackOff

The Message column contains the actual error from the container runtime. Read it carefully — it tells you if the problem is authentication, a missing tag, a network timeout, or something else.

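3. The imagePullSecrets field: kubectl describe pod does not list pull secrets, so read them from the pod spec:

kubectl get pod myapp-7c4b6d9f8-k3m2n -o jsonpath='{.spec.imagePullSecrets[*].name}'

If this prints nothing and you are pulling from a private registry, that is your problem. Secrets attached to the pod’s ServiceAccount are copied into the pod spec at admission when the pod lists none of its own, so they show up here too.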

Additional debugging commands:

Check if the secret is correct by decoding it:

kubectl get secret my-registry-creds -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d

This prints the stored credentials. Verify the server URL, username, and password are correct.

Check events across the namespace for broader patterns:

kubectl get events -n my-namespace --sort-by='.lastTimestamp'

Try pulling the image directly on the node (if you have SSH access):

crictl pull registry.example.com/myapp:v1.2.0

If crictl pull fails with the same error, the issue is at the container runtime or node level, not Kubernetes. If it succeeds, the problem is likely with imagePullSecrets configuration.

If your pod gets past the image pull but then crashes on startup, the issue is different — see Fix: Kubernetes CrashLoopBackOff for debugging container crashes.

Less Obvious Pull Failures I Have Hit in the Wild

If you have checked everything above and the pod is still stuck in ImagePullBackOff, these are the less obvious causes I have personally tracked down:

Expired credentials: Registry tokens and service account keys expire. ECR tokens last 12 hours. GCR service account keys can be rotated or disabled. Regenerate the credentials and recreate the Kubernetes secret.

Image was deleted from the registry: Someone may have deleted the tag or the entire repository. Check the registry’s web UI or API to confirm the image still exists.

Registry is down: Check the registry’s status page. Docker Hub has had outages. Your private registry’s storage backend might be full or unreachable.

Node disk pressure: If the node’s disk is full, the container runtime cannot download image layers. Check node conditions:

kubectl describe node <node-name> | grep -i pressure

If DiskPressure is True, free up space or add more disk.

Containerd or Docker daemon issues: Restart the container runtime on the node:

sudo systemctl restart containerd

Image manifest corruption: Rarely, an image manifest in the registry can be corrupted. Try pushing the image again with a new tag and updating your deployment.

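Admission webhooks (PodSecurityPolicy itself was removed in 1.25): A mutating webhook might be rewriting the image reference before the kubelet sees it, for example to redirect pulls to an internal mirror. Compare the image in your manifest with the one in kubectl describe pod, and check for any admission controllers that modify image references: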

kubectl get mutatingwebhookconfigurations
kubectl get validatingwebhookconfigurations

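Image too large for a slow network: Very large images (multi-gigabyte ML model containers, for example) can stall on slow or flaky links until the runtime gives up. The pull starts, runs, then aborts midway. The kubelet’s runtime-request-timeout (default 2 minutes) is not the limit here, because it explicitly excludes image pulls; on containerd the relevant setting is image_pull_progress_timeout, which cancels a pull that has received no data for the configured period. Raise that timeout in the runtime config, or split the image into a smaller base plus a sidecar that pulls model weights at startup.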

Container runtime garbage collection mid-pull: If the node is under disk pressure, the runtime may garbage-collect layers from a previous (related) image while the new pull is still resolving shared blobs. This shows up as failed to register layer or layer does not exist. Free disk first, then retry.

Node DNS versus cluster DNS: The container runtime performs the pull on the node, so it resolves the registry through the node’s /etc/resolv.conf, not through CoreDNS, and the pod’s own hostNetwork or dnsPolicy settings make no difference. If the node points at an internal DNS server that does not resolve your registry hostname, the pull fails even though nslookup from a regular pod on the same node succeeds.

What Other Tutorials Get Wrong About ImagePullBackOff

Most tutorials I have read on this error list the same fixes but frame them in ways that mislead. The gaps I see most often:

They recommend docker login on the node as the fix. This was correct on pre-1.24 clusters that used Docker as the container runtime. After dockershim removal in Kubernetes 1.24, modern clusters run containerd or CRI-O and there is no Docker daemon on the node. The equivalent commands are crictl pull and per-runtime auth files (containerd reads /etc/containerd/config.toml; CRI-O reads /etc/containers/auth.json). Tutorials that have not updated for the dockershim removal silently mislead readers on every modern cluster.

They suggest imagePullPolicy: Never as a workaround. This only works for images already loaded on the node (Minikube, kind, or pre-pulled via a DaemonSet). In production it produces the worse failure mode of pods scheduling on nodes that do not have the image yet and failing silently in a different way. Anti-pattern outside local development.

They treat ErrImagePull and ImagePullBackOff as different problems. They are the same problem at different stages: ErrImagePull is the most recent attempt failing, ImagePullBackOff is the back-off state. The fix is identical. Articles that separate them confuse readers into thinking two different things are happening.

They skip the namespace scoping of imagePullSecrets. This is the single most common silent failure I have seen on private-registry pulls. The secret exists in default and the pod is in production; nothing in the error message hints at the namespace mismatch, and the fix (recreate the secret per namespace, or attach to the namespace’s default ServiceAccount) is not in the first ten Google results.

They miss the architecture mismatch that Apple Silicon developers hit. A Mac user builds an image locally with docker build (default platform: arm64), pushes it to a registry, and watches it fail on amd64 cluster nodes with no matching manifest for linux/amd64. The fix is multi-platform builds with docker buildx, not anything in the cluster. Many tutorials skip this entirely because they were written before Apple Silicon was common.

They omit the kubelet CredentialProvider plugin as the modern alternative to static imagePullSecrets. For ECR, GCR, and ACR, the credential provider plugin (stable since 1.26) is the cloud-native path. Static secrets work but require manual or scripted rotation; the credential provider mints fresh tokens on every pull. Articles that only show imagePullSecrets are stuck in the 2019 mental model.

Frequently Asked Questions

What is the difference between ErrImagePull and ImagePullBackOff?

They are two states of the same problem. ErrImagePull is the kubelet reporting that the most recent pull attempt failed. ImagePullBackOff is the kubelet pausing before retrying, with an exponentially growing delay (10s, 20s, 40s, up to 5 minutes). A pod can flip between the two as it retries: each retry shows ErrImagePull again briefly, then returns to ImagePullBackOff while it waits for the next attempt. The fix is the same for both.

What is the difference between Always, IfNotPresent, and Never for imagePullPolicy?

Always makes the kubelet contact the registry on every pod start to check whether the image has changed (it then downloads only changed layers). This is the default for the latest tag. IfNotPresent only pulls when the image is not on the node, and is the default for pinned tags like v1.2.0. Never skips the pull entirely and fails if the image is not present locally; this is only useful for Minikube / kind local-development workflows where you load images directly into the node’s runtime. In production I default to pinned tags with the implicit IfNotPresent policy.

Why does my pod work but my teammate’s identical-looking pod does not?

Three usual suspects, in roughly the order I check them:

Different namespace (the imagePullSecret exists in yours but not theirs).
Different node (yours has the image cached from a previous pull; theirs has not pulled yet and the credentials are wrong).
Different architecture (yours is on an amd64 node, theirs is on an arm64 node, and the image is not multi-arch).

kubectl get pod -o wide shows the node, and kubectl describe pod shows the actual pull error. Comparing those between the two pods identifies the cause in under a minute.

Should I use imagePullSecrets or the kubelet CredentialProvider plugin?

For ECR, GCR, and ACR, use the CredentialProvider plugin whenever you are on Kubernetes 1.26 or later. It eliminates the manual or scripted rotation of static secrets and avoids the 12-hour ECR token expiry problem. For self-hosted registries with long-lived credentials (Harbor, Nexus, GitHub Container Registry with a PAT), static imagePullSecrets are still fine. The two are not mutually exclusive: you can have both in the same cluster.

How do I cache images across nodes to avoid rate limits?

Three patterns, each with different trade-offs:

Registry mirror (Harbor, AWS ECR pull-through cache, Google Artifact Registry remote repository): the cluster pulls once from upstream, then serves to every node from the mirror. Best for shared base images like nginx or alpine.
DaemonSet pre-pull: a DaemonSet runs on every node and pulls the images you know you will need at off-peak times. Works for a small static set of images; does not scale to dynamic workloads.
Image cloning at deploy time: your CI pushes the image to your own registry instead of pulling from upstream at pod start. Sidesteps rate limits entirely but moves the bandwidth cost to your registry.

Most production clusters end up with a combination of all three.

Does this happen the same way on OpenShift, GKE, EKS, AKS?

The error itself is identical across all managed Kubernetes distributions because it comes from the kubelet, which is shared code. The differences are in how each distribution handles registry auth for the node: GKE uses the node’s service account for GCR/Artifact Registry, EKS uses the node’s IAM role for ECR, and AKS uses a managed identity for ACR. OpenShift adds its own ImageStream abstraction that introduces an extra layer of redirection. The fixes in this article apply unchanged; the cloud-specific credential setup differs and is best read from each provider’s docs.

If you are also having trouble with your kubectl context configuration while debugging, see Fix: kubectl context not found. For Docker socket permission issues that might affect local builds, check Fix: Docker Permission Denied Socket.