Fix: Kubernetes Pod OOMKilled (Exit Code 137)

Quick Answer

Your container used more memory than its limit allows, so the Linux kernel's OOM killer sent it SIGKILL. Fix it by right-sizing memory requests and limits, configuring runtime heap sizes (JVM, Node.js) to fit inside the limit, and tracking down memory leaks.

The Error

You check your pod status and see:

$ kubectl get pods
NAME                     READY   STATUS      RESTARTS   AGE
my-app-7b9f4d8c5-x2k9l  0/1     OOMKilled   3          5m

Or in the pod description:

$ kubectl describe pod my-app-7b9f4d8c5-x2k9l
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137

Or the pod is in CrashLoopBackOff and the previous termination reason is OOMKilled:

    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137

Kubernetes killed your container because it exceeded its memory limit. The Linux kernel’s OOM (Out Of Memory) killer terminated the process, and Kubernetes reports it as OOMKilled with exit code 137 (128 + signal 9 = SIGKILL).
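You can reproduce the exit-code arithmetic locally: any process killed with SIGKILL reports 128 plus the signal number.

```shell
# A SIGKILLed process exits with 128 + 9 = 137
sleep 30 &
pid=$!
kill -9 "$pid"
wait "$pid"      # wait reports the killed process's exit status
code=$?
echo "exit code: $code"   # 137
```

This is why 137 by itself only tells you the process received SIGKILL; the `OOMKilled` reason in the pod status is what confirms the kernel's OOM killer sent it.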

Why This Happens

Kubernetes enforces memory limits set in the pod specification. When a container tries to use more memory than its limit, the Linux kernel kills it immediately. There is no warning, no graceful shutdown — the process is killed with SIGKILL.

Common causes:

  • Memory limit is too low. Your application genuinely needs more memory than the limit allows.
  • Memory leak. The application gradually consumes more memory until it hits the limit.
  • JVM heap misconfiguration. The JVM’s max heap exceeds the container’s memory limit, or non-heap memory (metaspace, thread stacks, native memory) is not accounted for.
  • No memory limit set. Without a limit, the container uses node memory until the node’s OOM killer steps in, which is worse.
  • Sidecar containers. Init containers or sidecars (like Istio’s envoy) consume memory that was not accounted for in the limit.
  • Temporary spikes. The application handles a burst of traffic that causes temporary memory spikes.

Fix 1: Increase the Memory Limit

The simplest fix. If your application legitimately needs more memory, increase the limit:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    spec:
      containers:
        - name: my-app
          image: my-app:latest
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"

Key concepts:

  • requests — the amount of memory Kubernetes guarantees to the container. Used for scheduling.
  • limits — the maximum memory the container can use. Exceeding this triggers OOMKill.

Set requests to what the application typically uses and limits to handle peak usage:

resources:
  requests:
    memory: "512Mi"    # Normal usage
  limits:
    memory: "1Gi"      # Peak usage ceiling

Pro Tip: Do not set requests equal to limits unless you want the Guaranteed QoS (quality of service) class. A pod is Guaranteed only when every container's requests equal its limits for both CPU and memory; such pods are the last to be evicted under node pressure, but they also get no burst headroom. For most workloads, set limits to 1.5–2x the requests value.
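As a quick worked example of the 1.5–2x rule of thumb (the 512Mi request here is illustrative):

```shell
# Illustrative sizing: limits at 1.5x-2x of requests
request_mi=512
low=$(( request_mi * 3 / 2 ))   # 1.5x
high=$(( request_mi * 2 ))      # 2x
echo "requests: ${request_mi}Mi, limits: ${low}Mi-${high}Mi"
```

For a 512Mi request, that gives a limits range of 768Mi–1Gi.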

Fix 2: Monitor Actual Memory Usage

Before changing limits, understand how much memory your application actually uses:

Real-time memory usage (requires metrics-server):

kubectl top pod my-app-7b9f4d8c5-x2k9l

Per-container breakdown:

kubectl top pod --containers

Check the OOM event details:

kubectl describe pod my-app-7b9f4d8c5-x2k9l | grep -A 5 "Last State"

Check node-level memory pressure:

kubectl describe node <node-name> | grep -A 5 "Conditions"
kubectl top node

If kubectl top does not work, install the metrics-server:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Use this data to set appropriate limits. If the application uses 400Mi normally and peaks at 600Mi, set requests to 512Mi and limits to 768Mi–1Gi.
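The sizing step above can be scripted. `suggest_limits` below is a hypothetical helper (not a kubectl feature) that adds ~25% headroom over normal usage for requests and ~30% over peak usage for limits:

```shell
# Hypothetical helper: derive requests/limits from observed usage (MiB)
suggest_limits() {
  local normal_mi=$1 peak_mi=$2
  local request=$(( normal_mi * 125 / 100 ))  # ~1.25x normal usage
  local limit=$(( peak_mi * 130 / 100 ))      # ~1.3x peak usage
  echo "requests: ${request}Mi, limits: ${limit}Mi"
}
suggest_limits 400 600   # requests: 500Mi, limits: 780Mi
```

The exact multipliers are a judgment call; the point is that requests track typical usage and limits sit comfortably above observed peaks.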

Fix 3: Fix JVM Memory in Containers

Java applications are the most common source of OOMKilled in Kubernetes. The JVM allocates memory for heap, metaspace, thread stacks, code cache, and native memory — all of which count toward the container limit.

Use container-aware JVM flags:

containers:
  - name: my-app
    image: my-java-app:latest
    resources:
      limits:
        memory: "1Gi"
    env:
      - name: JAVA_OPTS
        value: "-XX:MaxRAMPercentage=75.0 -XX:InitialRAMPercentage=50.0"

-XX:MaxRAMPercentage=75.0 sets the max heap to 75% of the container’s memory limit (supported since JDK 8u191 and JDK 10; the JVM reads the cgroup limit automatically). The remaining 25% is for metaspace, thread stacks, and native memory.
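For the 1Gi limit in the example, the arithmetic works out as follows (values in MiB):

```shell
limit_mib=1024                            # container limit: 1Gi
heap_mib=$(( limit_mib * 75 / 100 ))      # -XX:MaxRAMPercentage=75.0
overhead_mib=$(( limit_mib - heap_mib ))  # metaspace, stacks, native memory
echo "max heap: ${heap_mib}MiB, non-heap headroom: ${overhead_mib}MiB"
```

That leaves 256MiB of headroom; if your application uses many threads or native libraries, consider lowering the percentage further.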

Do NOT use -Xmx with a value close to the memory limit:

# WRONG — leaves no room for non-heap memory
env:
  - name: JAVA_OPTS
    value: "-Xmx1g"  # Container limit is also 1Gi — OOMKilled!

The JVM uses more than just the heap. With -Xmx1g in a 1Gi container, total memory exceeds the limit and the container gets killed.

Rule of thumb: Set -Xmx to about 70–75% of the container memory limit. Or better, use -XX:MaxRAMPercentage which calculates automatically.

For Java OutOfMemoryError within the JVM itself (not container-level), see Fix: Java OutOfMemoryError.

Fix 4: Fix Node.js Memory in Containers

Node.js has a default heap limit (historically around 1.5–2GB on 64-bit systems; newer versions size the heap from total system memory but are not cgroup-aware). In a container with a lower memory limit, Node.js might try to use more memory than allowed.

Set the Node.js heap limit explicitly:

containers:
  - name: my-app
    image: my-node-app:latest
    resources:
      limits:
        memory: "512Mi"
    env:
      - name: NODE_OPTIONS
        value: "--max-old-space-size=384"

Set --max-old-space-size to about 75% of the container memory limit (in MB). The remaining 25% handles native memory, buffers, and other overhead.
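That is where the 384 in the example comes from:

```shell
limit_mib=512                          # container limit: 512Mi
old_space=$(( limit_mib * 75 / 100 ))  # 75% of the limit for the V8 old space
echo "--max-old-space-size=${old_space}"   # --max-old-space-size=384
```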

For JavaScript heap out of memory errors specifically, see Fix: JavaScript heap out of memory.

Fix 5: Debug Memory Leaks

If the container is OOMKilled after running for hours or days (not immediately on startup), you likely have a memory leak.

Identify the pattern:

# Watch memory usage over time
watch kubectl top pod my-app-7b9f4d8c5-x2k9l --containers

If memory grows steadily without dropping, it is a leak.

For Node.js:

# Snapshot the running app (Node.js 12+, started with --heapsnapshot-signal=SIGUSR2)
kubectl exec my-pod -- kill -USR2 1
kubectl exec my-pod -- ls /app   # note the exact Heap.*.heapsnapshot filename
kubectl cp my-pod:/app/<snapshot-file> ./heapdump.heapsnapshot

Open the heapsnapshot in Chrome DevTools (Memory tab) to identify leaked objects.

For Java:

# Trigger a heap dump
kubectl exec my-pod -- jcmd 1 GC.heap_dump /tmp/heapdump.hprof
kubectl cp my-pod:/tmp/heapdump.hprof ./heapdump.hprof

Analyze with Eclipse MAT or VisualVM.

For Python:

Use tracemalloc or objgraph to track memory allocations.

Common leak sources:

  • Unbounded caches or in-memory stores
  • Event listeners that are never removed
  • Database connection pools that grow without cleanup
  • Large objects stored in session state
  • Circular references preventing garbage collection

Fix 6: Handle Sidecar Container Memory

If your pod has sidecar containers (Istio envoy, log collectors, monitoring agents), their memory counts toward the pod’s total. But each container has its own resources section:

spec:
  containers:
    - name: my-app
      resources:
        limits:
          memory: "512Mi"
    - name: istio-proxy
      resources:
        limits:
          memory: "128Mi"
    - name: log-collector
      resources:
        limits:
          memory: "64Mi"

Check which container was OOMKilled:

kubectl describe pod my-app-pod | grep -B 2 "OOMKilled"

The output shows which specific container was terminated. Fix the limit for that container, not the others.
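If the describe output is long, you can extract the culprit programmatically. The fragment below uses a simulated describe snippet; on a real cluster, pipe `kubectl describe pod my-app-pod` into the same awk program:

```shell
# Simulated `kubectl describe pod` fragment with two containers
describe='Containers:
  my-app:
    Last State:     Terminated
      Reason:       Completed
  istio-proxy:
    Last State:     Terminated
      Reason:       OOMKilled'
# Track the current container name; print it when an OOMKilled reason appears
oom_container=$(echo "$describe" | awk '/^  [a-z]/ { name=$1; sub(/:$/, "", name) } /Reason:.*OOMKilled/ { print name }')
echo "OOMKilled container: $oom_container"
```

Here the sidecar, not the main container, is the one hitting its limit.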

Common Mistake: Increasing the main container’s memory limit when the sidecar is the one being OOMKilled. Always check kubectl describe pod to identify which container hit its limit.

Fix 7: Set LimitRange and ResourceQuota

LimitRange sets default and maximum limits for containers in a namespace:

apiVersion: v1
kind: LimitRange
metadata:
  name: memory-limits
  namespace: default
spec:
  limits:
    - default:
        memory: "512Mi"
      defaultRequest:
        memory: "256Mi"
      max:
        memory: "2Gi"
      min:
        memory: "64Mi"
      type: Container

This ensures that every container has reasonable memory limits even if the deployment spec does not specify them.

ResourceQuota limits total memory for an entire namespace:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: memory-quota
  namespace: default
spec:
  hard:
    requests.memory: "8Gi"
    limits.memory: "16Gi"

Fix 8: Use Vertical Pod Autoscaler (VPA)

If you are unsure what the right memory limit should be, use VPA to automatically recommend or set memory limits based on actual usage:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"  # "Off" = recommend only, "Auto" = apply automatically

With updateMode: "Off", check the recommendations:

kubectl describe vpa my-app-vpa

VPA analyzes historical memory usage and recommends appropriate requests and limits.

Still Not Working?

If the pod continues to be OOMKilled after increasing limits:

Check for node-level memory pressure. The node itself might be running out of memory, causing the kubelet to evict pods:

kubectl describe node <node-name> | grep -A 10 "Conditions"

Look for MemoryPressure: True. If the node is under pressure, scale up the cluster or add more nodes.

Check for init container memory. Init containers run before the main container and have their own memory limits. If an init container uses too much memory, the pod fails before the main container starts.
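A sketch of giving an init container its own limit (the `migrate-db` name and the image tags are placeholders):

```yaml
spec:
  initContainers:
    - name: migrate-db           # placeholder init container
      image: my-app:latest
      resources:
        limits:
          memory: "256Mi"        # init containers need their own limits too
  containers:
    - name: my-app
      image: my-app:latest
      resources:
        limits:
          memory: "512Mi"
```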

Check for ephemeral storage OOM. Kubernetes also evicts pods that exceed ephemeral storage limits (ephemeral-storage in resources). The symptoms look similar to OOMKill but the reason in kubectl describe will say Evicted rather than OOMKilled.

Check kernel overcommit settings. The node’s vm.overcommit_memory kernel parameter affects allocation behavior: 0 (the heuristic default) leaves terminations to the OOM killer’s heuristics, while 2 (strict) makes large allocations fail outright instead of overcommitting. Unusual settings can make OOM behavior hard to predict.

If the pod is crashing for reasons other than OOM, see Fix: Kubernetes CrashLoopBackOff. If the pod cannot start because the image cannot be pulled, see Fix: Kubernetes ImagePullBackOff.

For the Docker-level equivalent of this error (outside Kubernetes), see Fix: Docker exited with code 137 OOMKilled.

FixDevs

Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.
