Fix: Kubernetes Pod CrashLoopBackOff (Back-off restarting failed container)

The Error

Your Kubernetes pod won’t start. You check the status and see:

$ kubectl get pods
NAME                     READY   STATUS             RESTARTS      AGE
myapp-6d8f7b4c5-x9z2k   0/1     CrashLoopBackOff   5 (47s ago)   4m12s

If you describe the pod, you see:

$ kubectl describe pod myapp-6d8f7b4c5-x9z2k
...
Warning  BackOff  2s (x7 over 3m)  kubelet  Back-off restarting failed container

The pod starts, crashes immediately, and Kubernetes tries to restart it, waiting longer before each attempt. The back-off delay doubles each time (10s, 20s, 40s, 80s, ...) up to a maximum of 5 minutes. The pod is stuck in this loop.
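The delays form a simple doubling sequence capped at 300 seconds, which you can sketch in a few lines of shell (an illustration of the arithmetic, not a kubelet API):

```shell
# Back-off sketch: the delay doubles from 10s and is capped at 300s
d=10
for i in 1 2 3 4 5 6 7; do
  printf '%ss ' "$d"
  d=$((d * 2))
  if [ "$d" -gt 300 ]; then d=300; fi
done
echo
```

This prints the same sequence you will observe between restarts: once the cap is reached, every further restart waits the full five minutes.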

Why This Happens

CrashLoopBackOff is not an error by itself. It is Kubernetes telling you that a container keeps crashing and it is applying an exponential back-off delay before restarting it again. The real question is: why is the container crashing?

The container’s process exits with a non-zero exit code (or gets killed by a signal), Kubernetes detects the failure, and the restart policy (restartPolicy: Always by default) kicks in. After repeated failures, Kubernetes slows down the restart attempts to avoid wasting resources.

The root cause is almost always one of these:

  • The application crashes on startup (unhandled exception, missing dependency, bad config)
  • The container runs out of memory (OOMKilled)
  • A health check (liveness probe) is failing and Kubernetes kills the container
  • The entrypoint or command is wrong
  • Required resources (ConfigMaps, Secrets, services) are missing
  • Init containers are failing before the main container starts
  • The container image is broken or missing
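If you want to see the loop in isolation, a minimal, deliberately broken pod reproduces it (hypothetical names; the command simply exits non-zero):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: crashloop-demo
spec:
  restartPolicy: Always      # the default; restart on every failure
  containers:
  - name: crash
    image: busybox
    command: ["sh", "-c", "echo starting; exit 1"]   # exits non-zero immediately
```

Within a minute or two, kubectl get pods shows this pod cycling through Error and CrashLoopBackOff with a growing RESTARTS count.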

Fix 1: Read the Logs

The first step is always to check the container logs. This tells you what the application printed before it crashed.

Get logs from the current (or most recently crashed) container:

kubectl logs myapp-6d8f7b4c5-x9z2k

If the container has already restarted and you want to see the previous container’s logs:

kubectl logs myapp-6d8f7b4c5-x9z2k --previous

If the pod has multiple containers, specify which one:

kubectl logs myapp-6d8f7b4c5-x9z2k -c myapp

Get more context with describe:

kubectl describe pod myapp-6d8f7b4c5-x9z2k

Look at the Last State section:

Last State:     Terminated
  Reason:       Error
  Exit Code:    1
  Started:      Sun, 13 Apr 2026 10:22:01 +0000
  Finished:     Sun, 13 Apr 2026 10:22:01 +0000

The exit code tells you a lot:

  • Exit code 1 — generic application error. Read the logs.
  • Exit code 137 — the container was killed by SIGKILL (usually OOMKilled). See Fix: Docker Container Exited (137) OOMKilled for a deep dive.
  • Exit code 139 — segmentation fault (SIGSEGV). The application crashed due to a memory access violation.
  • Exit code 126 — command not executable (permission issue).
  • Exit code 127 — command not found (wrong entrypoint or missing binary).
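The signal-related codes follow a convention worth knowing: a process killed by signal N exits with code 128 + N. You can verify this locally without a cluster:

```shell
# SIGKILL is signal 9, so an OOM-killed container reports 128 + 9 = 137
sh -c 'kill -KILL $$'; echo "after SIGKILL: $?"

# SIGTERM is signal 15, the signal Kubernetes sends on normal pod shutdown,
# so a container that ignores graceful shutdown still exits 128 + 15 = 143
sh -c 'kill -TERM $$'; echo "after SIGTERM: $?"
```

This is why 137 almost always means "killed from outside" (OOM killer or kubelet) rather than an application-level error.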

If the logs are empty, the container is crashing before it can write anything. Skip to Fix 8 (wrong command/entrypoint) or Fix 10 (init container failures).

Fix 2: OOMKilled (Out of Memory)

If kubectl describe pod shows:

Last State:     Terminated
  Reason:       OOMKilled
  Exit Code:    137

The container exceeded its memory limit and Kubernetes killed it.

Check current memory usage:

kubectl top pod myapp-6d8f7b4c5-x9z2k

Fix it by increasing the memory limit:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  template:
    spec:
      containers:
      - name: myapp
        image: myapp:latest
        resources:
          requests:
            memory: "256Mi"
          limits:
            memory: "512Mi"

If your application is a Java process, the JVM may not be respecting container limits (container support requires JDK 8u191+ or JDK 10+). Set heap size explicitly. Note that JAVA_OPTS is only honored if your image's entrypoint passes it to java; JAVA_TOOL_OPTIONS is picked up by the JVM automatically:

env:
- name: JAVA_OPTS
  value: "-XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0"

For more on Java memory issues in containers, see Fix: Java OutOfMemoryError: Java heap space.

Fix 3: Application Crash on Startup

The most common cause of CrashLoopBackOff is the application itself failing to start. The logs (Fix 1) will usually tell you exactly what went wrong. Common patterns:

Node.js:

Error: Cannot find module '/app/server.js'

The file path in your Dockerfile CMD doesn’t match the actual file location. Verify your Dockerfile’s COPY instructions and working directory.

Python:

ModuleNotFoundError: No module named 'flask'

Dependencies aren’t installed in the image. Make sure your Dockerfile runs pip install -r requirements.txt and that the requirements file is copied into the image.
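A minimal Dockerfile pattern that avoids both the missing-module and missing-file problems (illustrative paths and filenames):

```dockerfile
FROM python:3.12-slim
WORKDIR /app
# Copy and install dependencies first so this layer is cached
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Then copy the application code
COPY . .
CMD ["python", "app.py"]
```

If CMD points at app.py, make sure that file actually lands in /app; a mismatch between WORKDIR, COPY, and CMD is the usual culprit.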

General:

Error: connect ECONNREFUSED 10.96.0.1:443

The application is trying to connect to a service that isn’t available yet. See Fix 7 (missing dependencies).

When startup errors reference undefined variables, check that your environment configuration is correct. See Fix: Environment Variable Is Undefined for common causes.

Fix 4: Missing ConfigMaps or Secrets

If the pod references a ConfigMap or Secret that doesn’t exist, the container fails to start. (A missing environment-variable reference usually surfaces as CreateContainerConfigError; a missing volume mount leaves the pod stuck in ContainerCreating with FailedMount events.)

Check for events about missing resources:

kubectl describe pod myapp-6d8f7b4c5-x9z2k | grep -A 5 Events

You might see:

Warning  Failed  kubelet  Error: configmap "myapp-config" not found

or:

Warning  Failed  kubelet  Error: secret "myapp-secret" not found

Fix: Create the missing ConfigMap or Secret:

# Create a ConfigMap from a file
kubectl create configmap myapp-config --from-file=config.yaml

# Create a Secret
kubectl create secret generic myapp-secret \
  --from-literal=DB_PASSWORD=mysecretpassword

Verify the resources exist in the right namespace:

kubectl get configmaps -n <namespace>
kubectl get secrets -n <namespace>

A common mistake is creating the ConfigMap in the default namespace while the pod runs in a different namespace. ConfigMaps and Secrets are namespace-scoped — they must be in the same namespace as the pod.

You can also mark them as optional so the pod starts even if they’re missing:

env:
- name: DB_PASSWORD
  valueFrom:
    secretKeyRef:
      name: myapp-secret
      key: DB_PASSWORD
      optional: true

Fix 5: Wrong Image or Tag

If the container image doesn’t exist or the tag is wrong, the pod might briefly enter CrashLoopBackOff (though it more commonly shows ImagePullBackOff or ErrImagePull).

Check for image pull errors:

kubectl describe pod myapp-6d8f7b4c5-x9z2k | grep -i image

Look for:

Warning  Failed   kubelet  Failed to pull image "myapp:v2.1": rpc error: code = NotFound
Warning  Failed   kubelet  Error: ErrImagePull

Common causes:

  • Typo in the image name or tag
  • The tag latest was overwritten with a broken build
  • Private registry credentials are missing or expired
  • The image was built for a different architecture (e.g., amd64 image on an arm64 node)

Fix: Verify the image exists and is pullable:

# Check if the image exists in the registry
docker pull myapp:v2.1

# Check imagePullSecrets in the pod spec
kubectl get pod myapp-6d8f7b4c5-x9z2k -o jsonpath='{.spec.imagePullSecrets}'

Avoid using latest in production. Pin to a specific tag or digest:

containers:
- name: myapp
  image: myapp:v2.1.0@sha256:abc123...

Fix 6: Health Check Failures (Liveness and Readiness Probes)

A liveness probe that fails repeatedly causes Kubernetes to kill and restart the container, which looks exactly like CrashLoopBackOff. The application might actually be running fine but the probe is misconfigured.

Check if probes are causing the restarts:

kubectl describe pod myapp-6d8f7b4c5-x9z2k | grep -A 10 "Liveness\|Readiness"

Look for:

Warning  Unhealthy  kubelet  Liveness probe failed: HTTP probe failed with statuscode: 503
Warning  Unhealthy  kubelet  Liveness probe failed: Get "http://10.244.0.5:8080/health": dial tcp 10.244.0.5:8080: connect: connection refused

Common probe problems:

  1. The probe starts too early. The application needs 30 seconds to start, but the probe begins checking after 10 seconds.
  2. The probe endpoint is wrong. The app serves health on /healthz but the probe checks /health.
  3. The probe timeout is too short. The app takes 5 seconds to respond under load but the timeout is 1 second.

Fix: Adjust the probe timing:

containers:
- name: myapp
  image: myapp:latest
  livenessProbe:
    httpGet:
      path: /healthz
      port: 8080
    initialDelaySeconds: 30    # Wait 30s before first check
    periodSeconds: 10           # Check every 10s
    timeoutSeconds: 5           # Wait 5s for a response
    failureThreshold: 3         # Allow 3 failures before killing
  startupProbe:
    httpGet:
      path: /healthz
      port: 8080
    failureThreshold: 30        # Allow 30 failures during startup
    periodSeconds: 10           # Check every 10s (30 x 10 = 300s max startup time)

Use a startupProbe for slow-starting applications. The startup probe runs first and only hands off to the liveness probe once it succeeds. This is much better than setting a large initialDelaySeconds on the liveness probe.

Quick debugging shortcut — temporarily remove the probes:

kubectl edit deployment myapp

Comment out or delete the livenessProbe section. If the pod stabilizes, the probe was the problem.

Fix 7: Missing Dependencies or Services

The application starts but immediately crashes because it can’t connect to a required service (database, cache, message queue, external API).

Common log patterns:

Error: connect ECONNREFUSED 10.96.14.200:5432
FATAL: password authentication failed for user "myapp"
redis.exceptions.ConnectionError: Error 111 connecting to redis:6379

Check if the dependent services are running:

kubectl get pods -n <namespace>
kubectl get svc -n <namespace>

Check if DNS resolution works from the pod:

kubectl run debug --rm -it --image=busybox -- nslookup postgres.default.svc.cluster.local

Check if the service endpoint has ready pods:

kubectl get endpoints <service-name>

If the endpoints list is empty, the service has no matching pods or the matching pods aren’t ready.
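An empty endpoints list is most often a selector/label mismatch: the Service's selector must match the pods' labels exactly. A hypothetical example for a postgres backend:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres
spec:
  selector:
    app: postgres      # must match the pod template's labels exactly
  ports:
  - port: 5432
    targetPort: 5432
```

Compare this against the pod labels with kubectl get pods --show-labels; a typo like app: postgresql on one side produces an empty endpoints list with no error message.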

Fix: Add an init container that waits for the dependency:

initContainers:
- name: wait-for-db
  image: busybox
  command: ['sh', '-c', 'until nc -z postgres 5432; do echo "waiting for db"; sleep 2; done']

This ensures the main container doesn’t start until the database is reachable.

Fix 8: Wrong Command or Entrypoint

If the container’s command or args field is wrong, the process exits immediately.

Check what command the pod is running:

kubectl get pod myapp-6d8f7b4c5-x9z2k -o jsonpath='{.spec.containers[0].command}'
kubectl get pod myapp-6d8f7b4c5-x9z2k -o jsonpath='{.spec.containers[0].args}'

Common mistakes:

  1. Shell form vs exec form confusion:

# WRONG -- this tries to run the entire string as a single binary
command: ["node server.js"]

# CORRECT -- each argument is a separate element
command: ["node", "server.js"]

  2. Overriding the image’s entrypoint unintentionally:

# This REPLACES the Dockerfile ENTRYPOINT entirely
command: ["/bin/sh", "-c", "echo hello"]

If you only want to pass arguments, use args instead of command:

# This uses the image's ENTRYPOINT and passes args
args: ["--port", "8080"]

  3. Wrong working directory:

containers:
- name: myapp
  image: myapp:latest
  workingDir: /app  # Make sure this directory exists in the image
  command: ["node", "server.js"]
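You can reproduce the exec-form mistake locally without a cluster: a one-element command is treated as a single program name, spaces and all, and fails with exit code 127 (command not found):

```shell
# One string: the kernel looks for a program literally named "echo hello"
sh -c 'exec "echo hello"' 2>/dev/null; echo "single string: exit $?"

# Separate words: the program and its argument are distinct, so this prints "hello"
sh -c 'exec echo hello'
```

This is exactly what happens inside the container when command: ["node server.js"] is used, which is why such pods crash with empty logs.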

Debug by overriding the command to keep the container alive:

kubectl run debug --image=myapp:latest --command -- sleep 3600
kubectl exec -it debug -- /bin/sh
# Now you can explore the filesystem and test commands manually

Fix 9: Resource Limits Too Low

CPU throttling doesn’t directly cause CrashLoopBackOff, but extremely low CPU limits can make the application so slow that it fails health checks or times out during startup.

Check if CPU is being throttled:

kubectl top pod myapp-6d8f7b4c5-x9z2k

If the CPU usage is constantly at the limit, the process is being throttled.

Fix: Set reasonable resource requests and limits:

resources:
  requests:
    cpu: "250m"        # 0.25 CPU cores
    memory: "256Mi"
  limits:
    cpu: "1000m"       # 1 CPU core
    memory: "512Mi"

Some guidelines:

  • Set requests to what the application needs under normal load.
  • Set limits to the maximum the application should ever need.
  • For CPU, consider not setting a limit at all. CPU is compressible — the process gets throttled, not killed. Many teams set only CPU requests and memory limits.
  • For memory, always set a limit. Memory is incompressible — if there’s no limit and the node runs out, the OOM killer picks a victim unpredictably.

Check for throttling in cgroup metrics (cgroup v1 path shown; on cgroup v2 nodes the file is /sys/fs/cgroup/cpu.stat and the time field is throttled_usec):

kubectl exec myapp-6d8f7b4c5-x9z2k -- cat /sys/fs/cgroup/cpu/cpu.stat

Look at nr_throttled and throttled_time. High values mean the container is frequently being slowed down.
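The raw counters are easier to interpret as a ratio. A small awk filter computes what fraction of scheduler periods were throttled; here it runs against a hardcoded sample, but on a real pod you would pipe the kubectl exec output in instead:

```shell
# Sample cpu.stat contents (hypothetical numbers) piped through awk:
printf 'nr_periods 1000\nnr_throttled 250\nthrottled_time 9000000\n' \
  | awk '/nr_periods/ {p=$2} /nr_throttled/ {t=$2}
         END {printf "throttled %.0f%% of periods\n", 100*t/p}'
```

Anything sustained above a few percent under normal load suggests the CPU limit is too tight for the workload.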

Fix 10: Init Container Failures

If an init container fails, the main container never starts. The pod status may show Init:CrashLoopBackOff instead of just CrashLoopBackOff.

Check init container status:

kubectl describe pod myapp-6d8f7b4c5-x9z2k | grep -A 20 "Init Containers"

Get init container logs:

kubectl logs myapp-6d8f7b4c5-x9z2k -c <init-container-name>

Common init container failures:

  • Waiting for a service that will never become available (wrong hostname, wrong port)
  • Permission errors when running setup scripts
  • Migration scripts that fail against the database

Debug the init container interactively:

# Override the init container command to keep it alive
kubectl run debug --image=<init-container-image> --command -- sleep 3600
kubectl exec -it debug -- /bin/sh

Fix 11: Image Pull Errors Leading to CrashLoopBackOff

Sometimes image pull issues manifest as CrashLoopBackOff rather than the more obvious ImagePullBackOff. This happens when the image pulls successfully but is corrupt, built for the wrong architecture, or has a broken entrypoint.

Check the image architecture:

docker manifest inspect myapp:latest | grep architecture

If you’re running arm64 nodes (e.g., AWS Graviton) but the image was built for amd64, the binary will fail to execute immediately. The error in logs may be cryptic:

exec format error

or simply no output at all.

Fix: Build multi-architecture images:

docker buildx build --platform linux/amd64,linux/arm64 -t myapp:latest --push .

Check for corrupt layers:

kubectl describe pod myapp-6d8f7b4c5-x9z2k | grep -i "image"

If you suspect corruption, force a fresh pull:

containers:
- name: myapp
  image: myapp:latest
  imagePullPolicy: Always

For more on Docker permission issues that can affect image pulls and builds, see Fix: Docker Permission Denied While Trying to Connect to the Docker Daemon Socket.

Still Not Working?

The pod crashes too fast to inspect

If the container exits in under a second and the logs are empty, override the command to keep it alive:

kubectl run debug --image=myapp:latest --restart=Never --command -- sleep infinity
kubectl exec -it debug -- /bin/sh

Now you can manually run the application’s entrypoint and see the error:

# Inside the debug pod
cd /app
node server.js  # or whatever the original command was

CrashLoopBackOff after a deployment update

If the pod was working before and broke after a deployment, compare the old and new configurations:

kubectl rollout history deployment/myapp
kubectl rollout history deployment/myapp --revision=2

Roll back to the previous working version while you investigate:

kubectl rollout undo deployment/myapp

YAML syntax issues in ConfigMaps or manifests

A surprisingly common cause of mysterious crashes is malformed YAML in ConfigMaps that get mounted as configuration files. The application reads a config file that has invalid syntax and exits immediately. Check your YAML files carefully — see Fix: YAML Mapping Values Are Not Allowed Here for common YAML pitfalls.

CrashLoopBackOff only in one namespace or on one node

If the same pod works in one namespace but not another, check:

# Compare resource quotas
kubectl get resourcequota -n <namespace>

# Compare limit ranges
kubectl get limitrange -n <namespace>

# Check for network policies blocking traffic
kubectl get networkpolicy -n <namespace>

If it only fails on specific nodes, check node taints and available resources:

kubectl describe node <node-name> | grep -A 5 Taints
kubectl top node <node-name>

The pod keeps crashing in a CI/CD pipeline

If the crash only happens during automated deployments and works fine when you deploy manually, the issue is often with how environment variables or secrets are injected. CI/CD tools may set variables differently than your local setup. Check that all required environment variables are present in the deployment manifest, not just in your local environment. For CI/CD debugging, see Fix: GitHub Actions Process Completed with Exit Code 1.

Back-off time is too long during development

During development, the 5-minute maximum back-off delay is painful. You can reset it by deleting and recreating the pod:

kubectl delete pod myapp-6d8f7b4c5-x9z2k

If the pod is managed by a Deployment, Kubernetes creates a new one automatically with the back-off timer reset. For a faster feedback loop during development, use kubectl logs -f to follow logs in real time as the pod restarts:

kubectl logs -f deployment/myapp

Related: If your Docker containers are exiting with code 137, see Fix: Docker Container Exited (137) OOMKilled for a detailed guide on memory-related container kills.
