Fix: Kubernetes Pod CrashLoopBackOff (Back-off restarting failed container)
The Error
Your Kubernetes pod won’t start. You check the status and see:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
myapp-6d8f7b4c5-x9z2k 0/1 CrashLoopBackOff 5 (47s ago) 4m12s
If you describe the pod, you see:
$ kubectl describe pod myapp-6d8f7b4c5-x9z2k
...
Warning BackOff 2s (x7 over 3m) kubelet Back-off restarting failed container
The pod starts, crashes immediately, and Kubernetes tries to restart it — but each restart takes longer. The back-off delay doubles each time: 10s, 20s, 40s, 80s, up to a maximum of 5 minutes. The pod is stuck in this loop.
Why This Happens
CrashLoopBackOff is not an error by itself. It is Kubernetes telling you that a container keeps crashing and it is applying an exponential back-off delay before restarting it again. The real question is: why is the container crashing?
The container’s process exits with a non-zero exit code (or gets killed by a signal), Kubernetes detects the failure, and the restart policy (restartPolicy: Always by default) kicks in. After repeated failures, Kubernetes slows down the restart attempts to avoid wasting resources.
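The doubling schedule described above is easy to see with a quick calculation. This is a minimal sketch of the documented behavior (10s base, 5-minute cap), not kubelet's actual implementation:

```python
def backoff_delays(restarts, base=10, cap=300):
    """Approximate kubelet's crash back-off: starts at 10s, doubles, caps at 5 min."""
    return [min(base * 2 ** i, cap) for i in range(restarts)]

# Wait (in seconds) before each of the first 7 restarts:
print(backoff_delays(7))  # [10, 20, 40, 80, 160, 300, 300]
```

After the fifth restart the delay hits the cap, which is why a crashing pod can sit idle for minutes between attempts.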
The root cause is almost always one of these:
- The application crashes on startup (unhandled exception, missing dependency, bad config)
- The container runs out of memory (OOMKilled)
- A health check (liveness probe) is failing and Kubernetes kills the container
- The entrypoint or command is wrong
- Required resources (ConfigMaps, Secrets, services) are missing
- Init containers are failing before the main container starts
- The container image is broken or missing
Fix 1: Read the Logs
The first step is always to check the container logs. This tells you what the application printed before it crashed.
Get logs from the current (or most recently crashed) container:
kubectl logs myapp-6d8f7b4c5-x9z2k
If the container has already restarted and you want to see the previous container’s logs:
kubectl logs myapp-6d8f7b4c5-x9z2k --previous
If the pod has multiple containers, specify which one:
kubectl logs myapp-6d8f7b4c5-x9z2k -c myapp
Get more context with describe:
kubectl describe pod myapp-6d8f7b4c5-x9z2k
Look at the Last State section:
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Sun, 13 Apr 2026 10:22:01 +0000
Finished: Sun, 13 Apr 2026 10:22:01 +0000
The exit code tells you a lot:
- Exit code 1 — generic application error. Read the logs.
- Exit code 137 — the container was killed by SIGKILL (usually OOMKilled). See Fix: Docker Container Exited (137) OOMKilled for a deep dive.
- Exit code 139 — segmentation fault (SIGSEGV). The application crashed due to a memory access violation.
- Exit code 126 — command not executable (permission issue).
- Exit code 127 — command not found (wrong entrypoint or missing binary).
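As the list above suggests, codes above 128 encode a fatal signal (128 + signal number). A small helper, purely illustrative, that decodes the common cases:

```python
import signal

def explain_exit_code(code):
    """Map a container exit code to a likely cause; codes > 128 are 128 + signal."""
    known = {
        1: "generic application error",
        126: "command not executable",
        127: "command not found",
    }
    if code in known:
        return known[code]
    if code > 128:
        sig = code - 128
        return f"killed by signal {sig} ({signal.Signals(sig).name})"
    return "unknown"

print(explain_exit_code(137))  # killed by signal 9 (SIGKILL)
print(explain_exit_code(139))  # killed by signal 11 (SIGSEGV)
```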
If the logs are empty, the container is crashing before it can write anything. Skip to Fix 8 (wrong command/entrypoint) or Fix 10 (init container failures).
Fix 2: OOMKilled (Out of Memory)
If kubectl describe pod shows:
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
The container exceeded its memory limit and Kubernetes killed it.
Check current memory usage:
kubectl top pod myapp-6d8f7b4c5-x9z2k
Fix it by increasing the memory limit:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  template:
    spec:
      containers:
      - name: myapp
        image: myapp:latest
        resources:
          requests:
            memory: "256Mi"
          limits:
            memory: "512Mi"
If your application is a Java process, the JVM may not be respecting container limits. Set heap size explicitly:
env:
- name: JAVA_OPTS
  value: "-XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0"
For more on Java memory issues in containers, see Fix: Java OutOfMemoryError: Java heap space.
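As a sanity check, the heap ceiling implied by MaxRAMPercentage is simple arithmetic. The numbers here are taken from the example manifest above (512Mi limit, 75%):

```python
def jvm_max_heap_mib(limit_mib, max_ram_percentage):
    """Heap ceiling the JVM derives from the container limit via -XX:MaxRAMPercentage."""
    return limit_mib * max_ram_percentage / 100

# 512Mi limit at 75% leaves ~128Mi for metaspace, thread stacks, and native memory.
print(jvm_max_heap_mib(512, 75.0))  # 384.0
```

If the remainder is too small for the JVM's non-heap overhead, the container can still be OOMKilled even though the heap itself never fills.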
Fix 3: Application Crash on Startup
The most common cause of CrashLoopBackOff is the application itself failing to start. The logs (Fix 1) will usually tell you exactly what went wrong. Common patterns:
Node.js:
Error: Cannot find module '/app/server.js'
The file path in your Dockerfile CMD doesn’t match the actual file location. Verify your Dockerfile’s COPY instructions and working directory.
Python:
ModuleNotFoundError: No module named 'flask'
Dependencies aren’t installed in the image. Make sure your Dockerfile runs pip install -r requirements.txt and that the requirements file is copied into the image.
General:
Error: connect ECONNREFUSED 10.96.0.1:443
The application is trying to connect to a service that isn’t available yet. See Fix 7 (missing dependencies).
When startup errors reference undefined variables, check that your environment configuration is correct. See Fix: Environment Variable Is Undefined for common causes.
Fix 4: Missing ConfigMaps or Secrets
If the pod references a ConfigMap or Secret that doesn’t exist, the container will fail to start.
Check for events about missing resources:
kubectl describe pod myapp-6d8f7b4c5-x9z2k | grep -A 5 Events
You might see:
Warning Failed kubelet Error: configmap "myapp-config" not found
or:
Warning Failed kubelet Error: secret "myapp-secret" not found
Fix: Create the missing ConfigMap or Secret:
# Create a ConfigMap from a file
kubectl create configmap myapp-config --from-file=config.yaml
# Create a Secret
kubectl create secret generic myapp-secret \
  --from-literal=DB_PASSWORD=mysecretpassword
Verify the resources exist in the right namespace:
kubectl get configmaps -n <namespace>
kubectl get secrets -n <namespace>
A common mistake is creating the ConfigMap in the default namespace while the pod runs in a different namespace. ConfigMaps and Secrets are namespace-scoped — they must be in the same namespace as the pod.
You can also mark them as optional so the pod starts even if they’re missing:
env:
- name: DB_PASSWORD
  valueFrom:
    secretKeyRef:
      name: myapp-secret
      key: DB_PASSWORD
      optional: true
Fix 5: Wrong Image or Tag
If the container image doesn’t exist or the tag is wrong, the pod might briefly enter CrashLoopBackOff (though it more commonly shows ImagePullBackOff or ErrImagePull).
Check for image pull errors:
kubectl describe pod myapp-6d8f7b4c5-x9z2k | grep -i image
Look for:
Warning Failed kubelet Failed to pull image "myapp:v2.1": rpc error: code = NotFound
Warning Failed kubelet Error: ErrImagePull
Common causes:
- Typo in the image name or tag
- The tag latest was overwritten with a broken build
- Private registry credentials are missing or expired
- The image was built for a different architecture (e.g., amd64 image on an arm64 node)
Fix: Verify the image exists and is pullable:
# Check if the image exists in the registry
docker pull myapp:v2.1
# Check imagePullSecrets in the pod spec
kubectl get pod myapp-6d8f7b4c5-x9z2k -o jsonpath='{.spec.imagePullSecrets}'
Avoid using latest in production. Pin to a specific tag or digest:
containers:
- name: myapp
  image: myapp:v2.1.0@sha256:abc123...
Fix 6: Health Check Failures (Liveness and Readiness Probes)
A liveness probe that fails repeatedly causes Kubernetes to kill and restart the container, which looks exactly like CrashLoopBackOff. The application might actually be running fine but the probe is misconfigured.
Check if probes are causing the restarts:
kubectl describe pod myapp-6d8f7b4c5-x9z2k | grep -A 10 "Liveness\|Readiness"
Look for:
Warning Unhealthy kubelet Liveness probe failed: HTTP probe failed with statuscode: 503
Warning Unhealthy kubelet Liveness probe failed: Get "http://10.244.0.5:8080/health": dial tcp 10.244.0.5:8080: connect: connection refused
Common probe problems:
- The probe starts too early. The application needs 30 seconds to start, but the probe begins checking after 10 seconds.
- The probe endpoint is wrong. The app serves health on /healthz but the probe checks /health.
- The probe timeout is too short. The app takes 5 seconds to respond under load but the timeout is 1 second.
Fix: Adjust the probe timing:
containers:
- name: myapp
  image: myapp:latest
  livenessProbe:
    httpGet:
      path: /healthz
      port: 8080
    initialDelaySeconds: 30   # Wait 30s before first check
    periodSeconds: 10         # Check every 10s
    timeoutSeconds: 5         # Wait 5s for a response
    failureThreshold: 3       # Allow 3 failures before killing
  startupProbe:
    httpGet:
      path: /healthz
      port: 8080
    failureThreshold: 30      # Allow 30 failures during startup
    periodSeconds: 10         # Check every 10s (30 x 10 = 300s max startup time)
Use a startupProbe for slow-starting applications. The startup probe runs first and only hands off to the liveness probe once it succeeds. This is much better than setting a large initialDelaySeconds on the liveness probe.
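The time budget implied by those fields is quick arithmetic. A sketch of how the settings combine, using the worst case where every check fails:

```python
def max_tolerated_seconds(initial_delay, period, failure_threshold):
    """Worst-case time a container can stay unhealthy before the kubelet acts."""
    return initial_delay + period * failure_threshold

# Liveness probe from the example: 30s delay + 3 failures x 10s period
print(max_tolerated_seconds(30, 10, 3))   # 60
# Startup probe budget: 30 failures x 10s period
print(max_tolerated_seconds(0, 10, 30))   # 300
```

If your application legitimately needs longer than this budget to start or recover, raise failureThreshold or periodSeconds rather than disabling the probe.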
Quick debugging shortcut — temporarily remove the probes:
kubectl edit deployment myapp
Comment out or delete the livenessProbe section. If the pod stabilizes, the probe was the problem.
Fix 7: Missing Dependencies or Services
The application starts but immediately crashes because it can’t connect to a required service (database, cache, message queue, external API).
Common log patterns:
Error: connect ECONNREFUSED 10.96.14.200:5432
FATAL: password authentication failed for user "myapp"
redis.exceptions.ConnectionError: Error 111 connecting to redis:6379
Check if the dependent services are running:
kubectl get pods -n <namespace>
kubectl get svc -n <namespace>
Check if DNS resolution works from the pod:
kubectl run debug --rm -it --image=busybox -- nslookup postgres.default.svc.cluster.local
Check if the service endpoint has ready pods:
kubectl get endpoints <service-name>
If the endpoints list is empty, the service has no matching pods or the matching pods aren’t ready.
Fix: Add an init container that waits for the dependency:
initContainers:
- name: wait-for-db
  image: busybox
  command: ['sh', '-c', 'until nc -z postgres 5432; do echo "waiting for db"; sleep 2; done']
This ensures the main container doesn’t start until the database is reachable.
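The same wait loop can be written in Python if your init image ships an interpreter but no nc. A sketch; the postgres hostname and port 5432 are the assumed values from the example above:

```python
import socket
import time

def wait_for_port(host, port, timeout=60.0, interval=2.0):
    """Poll until a TCP connection to host:port succeeds, like `nc -z` in a loop."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2.0):
                return True  # port is accepting connections
        except OSError:
            time.sleep(interval)  # not up yet; retry
    return False

# Example (assumed service name from the init-container snippet):
# ok = wait_for_port("postgres", 5432)
```

Exit non-zero when the wait fails so Kubernetes restarts the init container rather than starting the app against a dead dependency.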
Fix 8: Wrong Command or Entrypoint
If the container’s command or args field is wrong, the process exits immediately.
Check what command the pod is running:
kubectl get pod myapp-6d8f7b4c5-x9z2k -o jsonpath='{.spec.containers[0].command}'
kubectl get pod myapp-6d8f7b4c5-x9z2k -o jsonpath='{.spec.containers[0].args}'
Common mistakes:
- Shell form vs exec form confusion:
# WRONG -- this tries to run the entire string as a single binary
command: ["node server.js"]
# CORRECT -- each argument is a separate element
command: ["node", "server.js"]
- Overriding the image’s entrypoint unintentionally:
# This REPLACES the Dockerfile ENTRYPOINT entirely
command: ["/bin/sh", "-c", "echo hello"]
If you only want to pass arguments, use args instead of command:
# This uses the image's ENTRYPOINT and passes args
args: ["--port", "8080"]
- Wrong working directory:
containers:
- name: myapp
  image: myapp:latest
  workingDir: /app    # Make sure this directory exists in the image
  command: ["node", "server.js"]
Debug by overriding the command to keep the container alive:
kubectl run debug --image=myapp:latest --command -- sleep 3600
kubectl exec -it debug -- /bin/sh
# Now you can explore the filesystem and test commands manually
Fix 9: Resource Limits Too Low
CPU throttling doesn’t directly cause CrashLoopBackOff, but extremely low CPU limits can make the application so slow that it fails health checks or times out during startup.
Check if CPU is being throttled:
kubectl top pod myapp-6d8f7b4c5-x9z2k
If the CPU usage is constantly at the limit, the process is being throttled.
Fix: Set reasonable resource requests and limits:
resources:
  requests:
    cpu: "250m"      # 0.25 CPU cores
    memory: "256Mi"
  limits:
    cpu: "1000m"     # 1 CPU core
    memory: "512Mi"
Some guidelines:
- Set requests to what the application needs under normal load.
- Set limits to the maximum the application should ever need.
- For CPU, consider not setting a limit at all. CPU is compressible — the process gets throttled, not killed. Many teams set only CPU requests and memory limits.
- For memory, always set a limit. Memory is incompressible — if there’s no limit and the node runs out, the OOM killer picks a victim unpredictably.
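Kubernetes quantity suffixes (millicores, binary memory units) trip people up when comparing requests against kubectl top output. A small parser covering only the forms used in this article, as an illustration rather than the full Kubernetes quantity grammar:

```python
def parse_cpu(quantity):
    """Convert a CPU quantity like '250m' or '1' to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000  # millicores
    return float(quantity)

def parse_memory_mib(quantity):
    """Convert a memory quantity like '512Mi' or '1Gi' to MiB."""
    units = {"Ki": 1 / 1024, "Mi": 1, "Gi": 1024}
    for suffix, factor in units.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor
    return float(quantity) / (1024 * 1024)  # plain bytes

print(parse_cpu("250m"), parse_memory_mib("512Mi"))  # 0.25 512.0
```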
Check for throttling in cgroup metrics:
kubectl exec myapp-6d8f7b4c5-x9z2k -- cat /sys/fs/cgroup/cpu/cpu.stat
Look at nr_throttled and throttled_time. High values mean the container is frequently being slowed down. (On nodes using cgroup v2, the file is /sys/fs/cgroup/cpu.stat and the field is throttled_usec.)
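That output is easier to judge as a ratio of throttled periods to total periods. A sketch using the cgroup v1 field names shown above (the sample values are made up for illustration):

```python
def throttle_ratio(cpu_stat_text):
    """Fraction of CPU enforcement periods in which the container was throttled."""
    stats = {}
    for line in cpu_stat_text.splitlines():
        key, _, value = line.partition(" ")
        if value.strip().isdigit():
            stats[key] = int(value)
    periods = stats.get("nr_periods", 0)
    return stats.get("nr_throttled", 0) / periods if periods else 0.0

# Hypothetical cpu.stat contents:
sample = "nr_periods 1000\nnr_throttled 400\nthrottled_time 95000000"
print(throttle_ratio(sample))  # 0.4
```

A ratio this high (40% of periods throttled) usually means the CPU limit is far below what the workload needs.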
Fix 10: Init Container Failures
If an init container fails, the main container never starts. The pod status may show Init:CrashLoopBackOff instead of just CrashLoopBackOff.
Check init container status:
kubectl describe pod myapp-6d8f7b4c5-x9z2k | grep -A 20 "Init Containers"
Get init container logs:
kubectl logs myapp-6d8f7b4c5-x9z2k -c <init-container-name>
Common init container failures:
- Waiting for a service that will never become available (wrong hostname, wrong port)
- Permission errors when running setup scripts
- Migration scripts that fail against the database
Debug the init container interactively:
# Override the init container command to keep it alive
kubectl run debug --image=<init-container-image> --command -- sleep 3600
kubectl exec -it debug -- /bin/sh
Fix 11: Image Pull Errors Leading to CrashLoopBackOff
Sometimes image pull issues manifest as CrashLoopBackOff rather than the more obvious ImagePullBackOff. This happens when the image pulls successfully but is corrupt, built for the wrong architecture, or has a broken entrypoint.
Check the image architecture:
docker manifest inspect myapp:latest | grep architecture
If you’re running arm64 nodes (e.g., AWS Graviton) but the image was built for amd64, the binary will fail to execute immediately. The error in logs may be cryptic:
exec format error
or simply no output at all.
Fix: Build multi-architecture images:
docker buildx build --platform linux/amd64,linux/arm64 -t myapp:latest --push .
Check for corrupt layers:
kubectl describe pod myapp-6d8f7b4c5-x9z2k | grep -i "image"
If you suspect corruption, force a fresh pull:
containers:
- name: myapp
  image: myapp:latest
  imagePullPolicy: Always
For more on Docker permission issues that can affect image pulls and builds, see Fix: Docker Permission Denied While Trying to Connect to the Docker Daemon Socket.
Still Not Working?
The pod crashes too fast to inspect
If the container exits in under a second and the logs are empty, override the command to keep it alive:
kubectl run debug --image=myapp:latest --restart=Never --command -- sleep infinity
kubectl exec -it debug -- /bin/sh
Now you can manually run the application’s entrypoint and see the error:
# Inside the debug pod
cd /app
node server.js # or whatever the original command was
CrashLoopBackOff after a deployment update
If the pod was working before and broke after a deployment, compare the old and new configurations:
kubectl rollout history deployment/myapp
kubectl rollout history deployment/myapp --revision=2
Roll back to the previous working version while you investigate:
kubectl rollout undo deployment/myapp
YAML syntax issues in ConfigMaps or manifests
A surprisingly common cause of mysterious crashes is malformed YAML in ConfigMaps that get mounted as configuration files. The application reads a config file that has invalid syntax and exits immediately. Check your YAML files carefully — see Fix: YAML Mapping Values Are Not Allowed Here for common YAML pitfalls.
CrashLoopBackOff only in one namespace or on one node
If the same pod works in one namespace but not another, check:
# Compare resource quotas
kubectl get resourcequota -n <namespace>
# Compare limit ranges
kubectl get limitrange -n <namespace>
# Check for network policies blocking traffic
kubectl get networkpolicy -n <namespace>
If it only fails on specific nodes, check node taints and available resources:
kubectl describe node <node-name> | grep -A 5 Taints
kubectl top node <node-name>
The pod keeps crashing in a CI/CD pipeline
If the crash only happens during automated deployments and works fine when you deploy manually, the issue is often with how environment variables or secrets are injected. CI/CD tools may set variables differently than your local setup. Check that all required environment variables are present in the deployment manifest, not just in your local environment. For CI/CD debugging, see Fix: GitHub Actions Process Completed with Exit Code 1.
Back-off time is too long during development
During development, the 5-minute maximum back-off delay is painful. You can reset it by deleting and recreating the pod:
kubectl delete pod myapp-6d8f7b4c5-x9z2k
If the pod is managed by a Deployment, Kubernetes creates a new one automatically with the back-off timer reset. For a faster feedback loop during development, use kubectl logs -f to follow logs in real time as the pod restarts:
kubectl logs -f deployment/myapp
Related: If your Docker containers are exiting with code 137, see Fix: Docker Container Exited (137) OOMKilled for a detailed guide on memory-related container kills.
Related Articles
Fix: YAML 'mapping values are not allowed here' and Other YAML Syntax Errors
How to fix 'mapping values are not allowed here', 'could not find expected :', 'did not find expected key', and other YAML indentation and syntax errors in Docker Compose, Kubernetes manifests, GitHub Actions, and config files.
Fix: Docker Container Exited (137) OOMKilled / Killed Signal 9
How to fix Docker container 'Exited (137)', OOMKilled, and 'Killed' signal 9 errors caused by out-of-memory conditions in Docker, Docker Compose, and Kubernetes.
Fix: The Connection to the Server localhost:8080 Was Refused (kubectl)
How to fix 'the connection to the server localhost:8080 was refused' and other kubectl connection errors when the Kubernetes API server is unreachable.
Fix: Docker Volume Permission Denied – Cannot Write to Mounted Volume
How to fix Docker permission denied errors on mounted volumes caused by UID/GID mismatch, read-only mounts, or SELinux labels.