
Fix: GitHub Actions Runner Failed to Start or Connect

FixDevs


Quick Answer

Fix GitHub Actions self-hosted runner failures including connection issues, version mismatches, and registration problems with step-by-step solutions.

Runner Failed to Start or Connect

I learned to split this problem in two before doing anything else, because a self-hosted runner that “fails” is really two different failures wearing one face. Either the agent on your host genuinely cannot reach GitHub (offline, red dot), or the agent is perfectly healthy and idle but GitHub never sends it a job because of a label, a runner group, or a disabled-Actions setting. I once wasted hours SSH-ing into a runner box that was completely fine, when the real fix was three clicks deep in org settings. The first question is always: does the Runners page show this runner as Offline, or as Idle?

You set up a self-hosted runner for GitHub Actions and see one of these messages:

Error: The self-hosted runner lost communication with the server.
Could not resolve host: github.com
Runner connect error: The HTTP request timed out after 00:01:00.

Or the runner appears offline in your repository’s Settings > Actions > Runners page, even though you believe it’s running.

Why a Runner Goes Offline

Self-hosted runners maintain a persistent long-poll HTTPS connection to GitHub’s job-dispatch service. The runner agent (Runner.Listener) keeps that connection open, waits for a job, leases it, then hands the work to a worker process (Runner.Worker). When the listener can’t reach GitHub or can’t authenticate, the runner shows offline. When it is connected but no queued job can be matched to it, it shows idle. In both cases your workflow sits in a “Waiting for a runner” state indefinitely. The error message you see is downstream of the failure; the root cause is almost always one of four categories: network reachability, runner agent state, label or group misrouting, or host resource exhaustion.

GitHub regularly updates the runner application and stops accepting connections from versions that fall outside the supported window. The agent has auto-update logic, but that update path itself depends on the runner being able to reach GitHub at the moment a new release ships. If your runner was offline when an update was pushed, it can be stuck on a version GitHub no longer accepts, which then prevents it from coming back online: a chicken-and-egg loop that requires a manual upgrade.

There’s also a class of failures that has nothing to do with the runner itself: the GitHub-side configuration can silently keep jobs from ever reaching the runner. Organization spending limits, restricted runner groups, missing labels, and disabled Actions at the repo or org level all produce the same surface symptom (“job pending, runner idle”) but require completely different fixes. The diagnostic timeline below walks through how to separate these cases in order of likelihood.

Diagnostic Timeline

Use this sequence the moment a self-hosted runner stops picking up jobs. Each step takes under a minute and rules out one root cause.

  • Minute 0: Confirm the runner row in Settings > Actions > Runners. Green dot with “Idle” means the agent is connected and waiting for work; the issue is label/group routing or repo permissions, not the runner. Red dot with “Offline” means the agent itself can’t talk to GitHub; jump to the network and version checks.
  • Minute 1: Compare the workflow’s runs-on: value against the runner’s labels. Open the failing workflow file, note the label(s), then click into the runner row and compare. A typo (self-hosted-linux vs self-hosted,linux) leaves jobs waiting for a runner that doesn’t exist.
  • Minute 2: Check organization Actions billing and runner groups. Org Settings > Billing > Plans and add-ons shows whether you’ve exhausted the included Actions usage with a $0 spending limit; in private repositories that can stop self-hosted jobs too, because artifact and package storage still count against the quota. Org Settings > Actions > Runner groups shows whether the runner’s group is restricted to specific repositories.
  • Minute 3: On the runner host, check the listener process. Run ps aux | grep Runner.Listener (or Get-Process Runner.Listener on Windows). If the process is missing, the service crashed or was stopped; if it’s running but the runner still shows offline, the agent thinks it’s connected but GitHub disagrees, which usually means a version or token mismatch.
  • Minute 4: Tail the diagnostic log. tail -50 _diag/Runner_*.log shows the exact handshake. A “401 Unauthorized” points at an expired or revoked registration; “Connect timeout” points at network or DNS; “Version not supported” points at an upgrade.
  • Minute 5: Check disk and inode usage with df -h and df -i. A full disk silently kills the worker after job start, which looks identical to a connection drop in the UI.
  • Minute 6: Restart the service interactively, not as a daemon. sudo ./svc.sh stop && ./run.sh. Running the agent in the foreground surfaces handshake errors that the systemd journal sometimes truncates.
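Minutes 3 and 5 can be folded into one check you run on the host. This is a minimal bash sketch, assuming a Linux runner with pgrep (procps) and a df that supports -P and -i (GNU coreutils); the function names and the 90% threshold are my own choices, not anything the runner ships with:

```shell
#!/usr/bin/env bash
# Print mount points whose usage is at or above a threshold (default 90%).
# Reads `df -P` (blocks) or `df -Pi` (inodes) output on stdin; in both
# formats column 5 is the percentage and column 6 the mount point.
# Mount points containing spaces are not handled.
full_mounts() {
  local threshold=${1:-90}
  awk -v t="$threshold" 'NR > 1 { p = $5; sub(/%/, "", p); if (p + 0 >= t) print $6 }'
}

# Minute 3 (listener process) and minute 5 (disk and inodes) in one pass.
runner_host_triage() {
  if pgrep -f Runner.Listener > /dev/null; then
    echo "listener: running"
  else
    echo "listener: not running (service crashed or was stopped)"
  fi
  df -P  | full_mounts 90 | sed 's/^/disk nearly full: /'
  df -Pi | full_mounts 90 | sed 's/^/inodes nearly exhausted: /'
}
```

Call runner_host_triage on the runner box. A “not running” line sends you to the service logs; any “nearly full” line explains a worker that dies right after a job starts.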

If you reach minute 6 without a clear cause, you’re almost certainly looking at a corporate proxy doing TLS inspection, a DNS split-horizon issue, or an outbound firewall change that the runner host can’t see. Move to Fix 1.

Fix 1: Check Network Connectivity

The runner needs outbound HTTPS access to several GitHub domains. Test connectivity from the runner machine:

curl -v https://github.com
curl -v https://api.github.com
curl -v https://codeload.github.com
curl -v https://objects.githubusercontent.com

What you are checking is that each host answers over TLS at all, not the specific status code. github.com and api.github.com return 200, but codeload.github.com and objects.githubusercontent.com return a 4xx on a bare request because they expect a path, and that is fine: a 404 still proves you reached GitHub. The real firewall or DNS symptoms are a connection timeout, Could not resolve host, or Connection refused. If you see any of those, fix the network path. The runner communicates exclusively over HTTPS (port 443).
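The same four probes can be scripted so each host gets a one-line verdict. A minimal bash sketch: the function names are hypothetical, and the diagnosis relies on curl’s documented exit codes (5 proxy host not resolved, 6 host not resolved, 7 connection failed, 28 timeout, 35 and 60 TLS failures) rather than on the HTTP status, for the reason given above:

```shell
#!/usr/bin/env bash
# Map a curl exit code to a short diagnosis. Exit 0 means curl got an HTTP
# response of any status (200, 404, ...), which proves the host is reachable.
classify_curl_exit() {
  case "$1" in
    0)     echo "reachable" ;;
    5)     echo "proxy: could not resolve proxy host" ;;
    6)     echo "dns: could not resolve host" ;;
    7)     echo "refused: could not connect" ;;
    28)    echo "timeout" ;;
    35|60) echo "tls: handshake or certificate failure" ;;
    *)     echo "curl error $1" ;;
  esac
}

# Probe each GitHub endpoint the runner needs and print one line per host.
check_github_hosts() {
  local host code rc
  for host in github.com api.github.com codeload.github.com objects.githubusercontent.com; do
    # Discard the body, print only the status code; capture curl's exit code
    code=$(curl -sS -o /dev/null -w '%{http_code}' --max-time 15 "https://$host/" 2>/dev/null) && rc=0 || rc=$?
    printf '%-32s http=%s  %s\n' "$host" "$code" "$(classify_curl_exit "$rc")"
  done
}
```

Run check_github_hosts from the runner machine, under the same user and proxy environment the runner service uses, so the result reflects what the agent sees.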

For runners behind a corporate proxy:

export https_proxy=http://proxy.company.com:8080
export http_proxy=http://proxy.company.com:8080
export no_proxy=localhost,127.0.0.1

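Add the same variables to the runner’s .env file (located in the runner directory) as plain KEY=value lines, without export, then restart the runner service so it picks them up; they then persist across restarts.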
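If you script the runner setup, a small helper keeps .env idempotent so repeated runs don’t pile up duplicate lines. A minimal bash sketch; set_runner_env is a hypothetical name, and it assumes you run it from the runner directory (or pass the .env path as the third argument):

```shell
#!/usr/bin/env bash
# Set KEY=VALUE in the runner's .env (default: ./.env), replacing any existing
# definition of KEY. Writes KEY=value lines only: .env does not use "export".
set_runner_env() {
  local key=$1 value=$2 file=${3:-.env} tmp
  touch "$file"
  tmp=$(mktemp)
  # Keep every line except an existing definition of this key
  grep -v "^${key}=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  cat "$tmp" > "$file"   # rewrite in place so the file keeps its owner and mode
  rm -f "$tmp"
}
```

For the proxy above, call set_runner_env https_proxy http://proxy.company.com:8080 (and likewise for http_proxy and no_proxy), then restart with sudo ./svc.sh stop followed by sudo ./svc.sh start.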

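When I have to get a runner working behind a strict firewall, I start from GitHub’s meta API (https://api.github.com/meta), with one caveat: its actions key lists the IP ranges GitHub-hosted runners use, not the endpoints a self-hosted runner has to reach. For outbound rules, allowlist the domains GitHub documents for self-hosted runner communication. Many of those hostnames sit behind a CDN whose IPs rotate, so a rule pinned to today’s resolved addresses will break later.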

Fix 2: Update the Runner Version

GitHub does not just recommend a recent runner; it now enforces a hard minimum version. Runners older than the floor are refused at registration and connection time, so an old agent that drifts below it goes offline and cannot come back without a manual upgrade. GitHub raises this floor periodically; the March 2026 enforcement set it at v2.329.0, and it climbs from there. Check the runner releases page and the GitHub Actions changelog for the current required version.

Check your installed version (both commands work):

./run.sh --version
./config.sh --version
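To check the installed version against a required floor in a script, compare versions component by component rather than as strings (2.99.0 is older than 2.329.0). A minimal bash sketch assuming GNU sort (for -V); the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Succeed (exit 0) if version $1 is strictly older than version $2.
# sort -V orders dotted version numbers numerically per component.
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Pull the first x.y.z version string out of `./config.sh --version`.
installed_runner_version() {
  ./config.sh --version | grep -Eo '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}
```

For example, if version_lt "$(installed_runner_version)" 2.329.0; then echo "below the minimum, upgrade"; fi. The 2.329.0 is only the March 2026 floor mentioned above; substitute the current value from the changelog.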

Compare it with the current minimum and the latest release on GitHub. If you are below the minimum, or more than a few minor versions behind, update:

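# Stop the runner service (run these from the runner directory)
sudo ./svc.sh stop

# Download the release you need and extract it over the existing install.
# 2.XXX.X is a placeholder: take the version from the runner releases page.
RUNNER_VERSION=2.XXX.X
curl -o actions-runner-linux-x64.tar.gz -L \
  "https://github.com/actions/runner/releases/download/v${RUNNER_VERSION}/actions-runner-linux-x64-${RUNNER_VERSION}.tar.gz"
tar xzf actions-runner-linux-x64.tar.gz

# Restart
sudo ./svc.sh start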

The runner has auto-update capability, but it sometimes fails if the runner process isn’t running when an update is published.

Fix 3: Re-register the Runner

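Registration tokens expire after one hour, so config.sh rejects a stale one. After registration the runner authenticates with its own stored credentials, and those can stop working too: GitHub automatically removes a self-hosted runner that has not connected for more than 14 days, and a runner deleted from the settings page is refused even though the host still has its old configuration. In either case, re-register: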

# Remove existing registration
./config.sh remove --token YOUR_REMOVAL_TOKEN

# Generate a new token from:
# Settings > Actions > Runners > New self-hosted runner

# Re-configure
./config.sh --url https://github.com/OWNER/REPO --token NEW_TOKEN

For organization-level runners, use the organization settings page instead. You can also generate tokens via the GitHub API:

curl -X POST \
  -H "Authorization: token YOUR_PAT" \
  https://api.github.com/repos/OWNER/REPO/actions/runners/registration-token
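The response is JSON containing a token field (plus an expires_at timestamp). A minimal bash sketch that feeds it straight into config.sh, assuming a PAT in a GITHUB_PAT environment variable with admin access to the repository; the function names are hypothetical, the sed extraction avoids a jq dependency but only handles this simple response shape, and the removal step uses the sibling remove-token endpoint:

```shell
#!/usr/bin/env bash
# Extract the "token" string from a registration/removal token response (stdin).
extract_token() {
  sed -n 's/.*"token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# POST to .../actions/runners/<kind>, kind = registration-token or remove-token.
runner_api_token() {
  local repo=$1 kind=$2
  curl -sS -X POST \
    -H "Authorization: Bearer ${GITHUB_PAT:?set GITHUB_PAT first}" \
    -H "Accept: application/vnd.github+json" \
    "https://api.github.com/repos/${repo}/actions/runners/${kind}" | extract_token
}

# Remove the old registration, then configure again non-interactively.
reregister_runner() {
  local repo=$1 token
  # Ignore failure here: GitHub may already have deleted the runner
  ./config.sh remove --token "$(runner_api_token "$repo" remove-token)" || true
  token=$(runner_api_token "$repo" registration-token)
  if [ -z "$token" ]; then
    echo "no registration token returned; check the PAT's access" >&2
    return 1
  fi
  ./config.sh --unattended --url "https://github.com/${repo}" --token "$token"
}
```

Stop and uninstall the service first (sudo ./svc.sh stop, then sudo ./svc.sh uninstall), run reregister_runner OWNER/REPO from the runner directory, then reinstall with sudo ./svc.sh install and start it. The sketch does not handle the case where remove fails and leaves the old local configuration behind; config.sh is likely to refuse to configure until that is cleaned up.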

Fix 4: Fix Label and Group Mismatches

Jobs target runners using labels. If your workflow specifies a label the runner doesn’t have, the job queues forever:

# Workflow expects this label
runs-on: self-hosted-gpu

# But runner was configured with
# ./config.sh --labels self-hosted,linux,x64

Check runner labels in Settings > Actions > Runners. Add missing labels:

# One way to change labels: remove and re-register (custom labels can also be edited on the runner settings page)
./config.sh remove --token TOKEN
./config.sh --url https://github.com/OWNER/REPO \
  --token NEW_TOKEN \
  --labels self-hosted,linux,x64,self-hosted-gpu
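Comparing a runs-on list against a runner’s labels by eye is where typos slip through, so here is a minimal bash sketch that does it for you. It assumes a job with several labels needs a runner carrying all of them, and it compares case-insensitively because GitHub treats labels that way; runner_matches is a hypothetical helper that takes comma-separated lists:

```shell
#!/usr/bin/env bash
# $1: the job's runs-on labels, $2: the runner's labels (both comma-separated).
# Prints each label the runner lacks; succeeds only if none are missing.
runner_matches() {
  local have missing=0 label
  # Normalise runner labels to ",a,b,c," so we match whole labels only
  have=",$(printf '%s' "$2" | tr -d ' ' | tr '[:upper:]' '[:lower:]'),"
  for label in $(printf '%s' "$1" | tr -d ' ' | tr '[:upper:]' '[:lower:]' | tr ',' ' '); do
    case "$have" in
      *",$label,"*) ;;                                  # label present
      *) echo "missing label: $label"; missing=1 ;;
    esac
  done
  return "$missing"
}
```

With the example above, runner_matches self-hosted-gpu self-hosted,linux,x64 prints missing label: self-hosted-gpu and returns non-zero, while runner_matches "self-hosted, linux" self-hosted,linux,x64 succeeds.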

A misroute that has fooled me more than once: runner groups (an enterprise/organization feature) restrict which repositories a runner will serve. The runner shows a healthy green “Idle” dot, but if its group does not include your repository, GitHub never routes a job to it, so it looks broken while being perfectly fine. Check Organization Settings > Actions > Runner groups before touching the runner host.

Fix 5: Fix Docker-Based Runner Issues

If you run the GitHub Actions runner inside a Docker container, several issues can arise:

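# Common mistake: the image runs as root by default,
# and the runner refuses to run as root
FROM ubuntu:22.04
# One fix: create an unprivileged user and switch to it
RUN useradd --create-home runner
USER runner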

The runner won’t start as root unless you set RUNNER_ALLOW_RUNASROOT=1:

docker run -e RUNNER_ALLOW_RUNASROOT=1 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  your-runner-image

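For workflows that run Docker commands, mount the host’s Docker socket. Strictly speaking this is Docker-outside-of-Docker: jobs use the host daemon rather than a nested one, so paths they bind-mount are resolved on the host, which is why shared directories such as /tmp are usually mounted at the same path: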

docker run -v /var/run/docker.sock:/var/run/docker.sock \
  -v /tmp:/tmp \
  your-runner-image

Make sure the runner container has enough disk space for workspace files and Docker layer caching.
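If you build your own runner image, a preflight check in the entrypoint turns both failure modes above into a clear message instead of a silent exit. A minimal bash sketch; runner_preflight is hypothetical, and it only checks the two conditions discussed in this section (root without RUNNER_ALLOW_RUNASROOT=1, and a missing Docker socket):

```shell
#!/usr/bin/env bash
# $1: uid to check (default: current user), $2: Docker socket path.
# Prints "ok" or the first problem found; non-zero exit on a problem.
runner_preflight() {
  local uid=${1:-$(id -u)} sock=${2:-/var/run/docker.sock}
  if [ "$uid" -eq 0 ] && [ "${RUNNER_ALLOW_RUNASROOT:-}" != "1" ]; then
    echo "running as root without RUNNER_ALLOW_RUNASROOT=1"
    return 1
  fi
  # -S: the path exists and is a socket (i.e. the host socket was mounted)
  if [ ! -S "$sock" ]; then
    echo "docker socket $sock not mounted"
    return 1
  fi
  echo "ok"
}
```

Call runner_preflight || exit 1 at the top of the entrypoint, before config.sh and run.sh.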

Fix 6: Address Resource Limits

The runner may crash or hang if the machine runs out of resources. Check:

# Memory
free -h

# Disk space
df -h

# CPU
top -bn1 | head -5

# Check if runner process is alive
ps aux | grep Runner.Listener

Common resource issues:

  • Disk full: Old workflow artifacts and Docker images accumulate. Clean up with docker system prune -af and clear the runner’s _work directory.
  • Memory exhaustion: The runner itself uses ~200MB, but your workflows may need much more. Check for OOM kills with dmesg | grep -i oom.
  • Too many concurrent jobs: By default, a runner processes one job at a time. Running multiple runners on the same machine requires enough resources for all concurrent jobs.
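To see whether the OOM killer took out the runner’s worker or one of your build tools, note that the kernel log names each killed process. A minimal bash sketch; oom_victims is a hypothetical helper, and it assumes the “Killed process PID (name)” wording that current Linux kernels write to dmesg:

```shell
#!/usr/bin/env bash
# Read kernel log text on stdin and print the name of each OOM-killed process.
oom_victims() {
  sed -n 's/.*[Kk]illed process [0-9]* (\([^)]*\)).*/\1/p'
}
```

Run sudo dmesg | oom_victims | sort | uniq -c. Runner.Worker in the list means the job’s memory use took down the worker; a compiler or test runner name points at the workflow itself.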

Fix 7: Fix GITHUB_TOKEN Permissions

Every workflow run gets an automatically generated GITHUB_TOKEN, whether it runs on a GitHub-hosted or a self-hosted runner. If its permissions are too restrictive, steps that interact with the repository may fail:

permissions:
  contents: read
  packages: write
  issues: write

For organization repositories with restrictive default permissions, set permissions explicitly in your workflow:

jobs:
  build:
    runs-on: self-hosted
    permissions:
      contents: write
      pull-requests: write

Check your organization settings under Settings > Actions > General > Workflow permissions. “Read repository contents and packages permissions” is the restrictive option and blocks operations like pushing commits or creating releases unless the workflow grants those permissions explicitly.
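To confirm the repository’s effective default without clicking through settings, you can query the REST endpoint GET /repos/OWNER/REPO/actions/permissions/workflow, which returns default_workflow_permissions as read or write. A minimal bash sketch, assuming a PAT in GITHUB_PAT that can read the repository’s Actions settings; json_field is a hypothetical sed helper that only handles flat string fields:

```shell
#!/usr/bin/env bash
# Extract a flat string field ($1) from JSON on stdin, without jq.
json_field() {
  sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" | head -n 1
}

# Print "read" or "write" for repository $1 (OWNER/REPO).
default_workflow_permissions() {
  curl -sS \
    -H "Authorization: Bearer ${GITHUB_PAT:?set GITHUB_PAT first}" \
    -H "Accept: application/vnd.github+json" \
    "https://api.github.com/repos/$1/actions/permissions/workflow" \
    | json_field default_workflow_permissions
}
```

If default_workflow_permissions OWNER/REPO prints read, any job that pushes or creates releases needs the explicit permissions block shown above.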

Fix 8: Debug Using Runner Logs

The runner writes detailed logs that reveal exactly why it can’t connect:

# Service logs (if installed as service)
journalctl -u actions.runner.OWNER-REPO.RUNNER_NAME -f

# Or check the log files directly
cat _diag/Runner_*.log | tail -100
cat _diag/Worker_*.log | tail -100

Look for these key messages:

  • "Authentication failed": token expired or invalid. Re-register.
  • "Http response code: Unauthorized": PAT or app token lacks required scopes.
  • "Connect timeout": network issue. Check firewall and DNS.
  • "Version not supported": runner too old. Update.
  • "No free disk space": clean up the _work directory.
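Those five messages can drive a first-pass triage of the newest diagnostic log. A minimal bash sketch; classify_runner_log is hypothetical, the patterns are only the ones listed above (not an exhaustive catalogue of runner errors), and the first match wins:

```shell
#!/usr/bin/env bash
# Read runner log text on stdin and print the likely cause and next step.
classify_runner_log() {
  local log
  log=$(cat)
  case "$log" in
    *"Version not supported"*)             echo "runner too old: update (Fix 2)" ;;
    *"Authentication failed"*)             echo "registration invalid: re-register (Fix 3)" ;;
    *"Http response code: Unauthorized"*)  echo "token lacks required scopes" ;;
    *"Connect timeout"*)                   echo "network: check firewall, proxy, DNS (Fix 1)" ;;
    *"No free disk space"*)                echo "disk full: clean up _work (Fix 6)" ;;
    *)                                     echo "no known pattern: run ./run.sh in the foreground" ;;
  esac
}
```

Feed it the tail of the newest log, for example tail -n 200 "$(ls -t _diag/Runner_*.log | head -n 1)" | classify_runner_log.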

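Enable debug logging for workflow runs by setting repository secrets or variables (Settings > Secrets and variables > Actions) with these names and values, or by re-running a job with “Enable debug logging” checked: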

ACTIONS_RUNNER_DEBUG=true
ACTIONS_STEP_DEBUG=true

Rarer Reasons a Runner Won’t Connect

  • Check if GitHub is down. Visit githubstatus.com before deep-diving into your configuration.

  • Verify DNS resolution. Run nslookup github.com from the runner machine. Corporate DNS servers sometimes block or redirect GitHub domains.

  • Check TLS certificates. Corporate proxies that perform SSL inspection can break the runner’s HTTPS connection. Add your corporate CA certificate to the runner’s trust store at the OS level so the .NET runtime that ships with the agent picks it up.

  • Try running interactively. Stop the service and run ./run.sh directly. This shows real-time errors that the service logs might not capture.

  • Check Docker image compatibility. If using a container-based runner, ensure the base image has all required dependencies (libicu, libssl, git).

  • Monitor the runner process. Use systemctl status 'actions.runner.*' (quoted so the shell does not expand the pattern) to check whether the service is actually running or crashed silently.

  • Check organization spending limits (private repos). Self-hosted runner minutes are free, so people assume billing can’t be the cause, but artifact and package storage still counts against your quota even when every job runs on your own hardware. If the included storage is exhausted and the spending limit is $0, GitHub stops running workflows in private repositories, self-hosted jobs included, until the next cycle or until you raise the limit. (Public repositories get Actions for free, so this never applies there.) Org Settings > Billing > Plans and add-ons shows usage; bumping the limit frees pending jobs immediately, with no restart needed.

  • Confirm Actions is enabled at every level. Repo Settings > Actions > General, then Org Settings > Actions > General. A “Disabled” setting at the org level overrides every repo and silently keeps workflows from running. The runner stays “Idle” because it’s healthy; there just isn’t a job allowed to reach it.

  • Look for ephemeral runner exhaustion. If you registered the runner with --ephemeral, it accepts exactly one job and de-registers. A workflow that uses runs-on: self-hosted after that finds zero matching runners. Add a re-registration loop or switch to a non-ephemeral configuration if you need persistent capacity.

  • Audit _work/_temp ownership. A previous job that ran as root can leave files the runner user can’t delete on the next checkout. The next job fails before any of your steps execute. sudo chown -R runner:runner _work (substituting the account the runner runs as) resolves it.

  • Check for IPv6 surprises. Some runner hosts resolve github.com to an IPv6 address by default but only have IPv4 outbound through the corporate firewall. The TCP connection silently times out instead of failing fast. Make the resolver prefer IPv4 by adding precedence ::ffff:0:0/96 100 to /etc/gai.conf, or edit the firewall to allow IPv6 egress on 443.

  • Watch for clock skew. TLS handshakes fail with cryptic errors when the runner clock drifts more than five minutes from real time. Enable chronyd or systemd-timesyncd and confirm with timedatectl status. Hosts that have been suspended or paused (common with VM-based runners) frequently come back with a stale clock.
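For the clock-skew case, GitHub’s own HTTP Date header makes a convenient reference clock. A minimal bash sketch, assuming GNU date (for -d) and curl; clock_skew_seconds and check_clock are hypothetical names, and the 300-second limit simply mirrors the five minutes mentioned above:

```shell
#!/usr/bin/env bash
# Absolute difference, in seconds, between an HTTP Date header value ($1)
# and local epoch time ($2, default: now). Requires GNU date for -d.
clock_skew_seconds() {
  local remote now diff
  remote=$(date -u -d "$1" +%s) || return 2
  now=${2:-$(date -u +%s)}
  diff=$(( now - remote ))
  echo "${diff#-}"   # strip a leading minus sign to get the absolute value
}

# Fetch the Date header from api.github.com and warn if skew exceeds 300 s.
check_clock() {
  local hdr skew
  hdr=$(curl -sSI https://api.github.com | sed -n 's/^[Dd]ate: //p' | tr -d '\r')
  # An empty string would parse as midnight today, so reject it explicitly
  [ -n "$hdr" ] || { echo "no Date header received" >&2; return 2; }
  skew=$(clock_skew_seconds "$hdr") || { echo "could not parse Date header" >&2; return 2; }
  if [ "$skew" -gt 300 ]; then
    echo "clock off by ${skew}s: enable time sync (chronyd or systemd-timesyncd)"
    return 1
  fi
  echo "clock ok (skew ${skew}s)"
}
```

Run check_clock right after resuming a paused VM runner, before restarting the service.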


FixDevs

Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.

