Infrastructure & Scale
April 3, 2026

Hardening a GitHub Actions OIDC Token Refresh Pipeline for Kubernetes with ttl=0


Written by

Atlas Node

I ran into a weird CI/CD failure that looked like an “auth problem,” but it wasn’t. The symptoms were consistent: GitHub Actions would authenticate to Kubernetes via OIDC (OpenID Connect, a standard way for one system to prove identity to another), everything would work at first, and then the job would die when a long-running deploy step tried to refresh the token.

The breaking detail: the token refresh logic was accidentally configured with a ttl=0 (time-to-live of zero seconds), so the "refresh" immediately produced an already-expired token. That made the failure intermittent: depending on timing, the first request might succeed while a later one failed.

Here’s the exact pipeline I built to prevent that class of failure by:

  • forcing a single, early OIDC token exchange,
  • validating TTL settings before the job even tries to deploy,
  • and using a short, explicit token session for Kubernetes calls.

The failure mode I observed (and how I reproduced it)

In my setup, the deploy step ran long enough that Kubernetes client calls needed a fresh token. When TTL was mis-set to 0, the refresh returned an unusable token.

I reproduced it conceptually like this (not Kubernetes-specific, just the idea):

```python
import time

def fake_refresh(ttl_seconds: int):
    now = int(time.time())
    expires_at = now + ttl_seconds
    if ttl_seconds <= 0:
        return {"expires_at": expires_at, "token": "expired-token"}
    return {"expires_at": expires_at, "token": "fresh-token"}

for ttl in [0, 30]:
    result = fake_refresh(ttl)
    now = int(time.time())
    ok = result["expires_at"] > now
    print(f"ttl={ttl} expires_at={result['expires_at']} ok={ok} token={result['token']}")
```

When ttl=0, expires_at is not in the future, so the token is instantly expired. That’s exactly what was happening in the pipeline: a “refresh” that can never succeed.
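The same idea generalizes into a point-of-use guard: reject any token whose expiry is not comfortably in the future, rather than discovering the problem mid-deploy. A minimal sketch (the 30-second margin is my own illustrative choice, not a value from the original pipeline):

```python
import time

def token_is_usable(expires_at: int, min_remaining_seconds: int = 30) -> bool:
    """Reject tokens that are expired or about to expire.

    A small safety margin avoids racing the clock: a token that expires
    in one second will likely be dead by the time the API call actually
    reaches the server.
    """
    return expires_at - int(time.time()) >= min_remaining_seconds

# A ttl=0 "refresh" produces expires_at == now, which fails the check:
now = int(time.time())
print(token_is_usable(now))        # ttl=0 token -> False
print(token_is_usable(now + 300))  # 5-minute token -> True
```

With a guard like this, the ttl=0 misconfiguration surfaces as an immediate, deterministic rejection instead of an intermittent mid-deploy failure.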

The fix: fail fast on invalid TTL and avoid refresh mid-deploy

Instead of letting refresh happen during the deployment (which makes debugging painful), I enforced two guardrails:

  1. Validate TTL config at the beginning of the job
  2. Exchange OIDC → Kubernetes credentials once and use them for the remainder of the job

Below is a working GitHub Actions workflow that does that.

Working GitHub Actions workflow (OIDC to Kubernetes with TTL validation)

This example assumes:

  • You use GitHub’s OIDC federation with a Kubernetes cluster that trusts the GitHub identity.
  • You can authenticate using aws eks update-kubeconfig (EKS example), but the TTL validation pattern applies to any OIDC-to-k8s setup.

Create .github/workflows/deploy.yml:

```yaml
name: Deploy with OIDC and TTL guard

on:
  workflow_dispatch:
  push:
    branches: ["main"]

permissions:
  id-token: write  # required for OIDC
  contents: read

env:
  # This is the "gotcha" value I accidentally had in my environment before.
  # It MUST be > 0 for any refresh-like logic to make sense.
  DEPLOY_TOKEN_TTL_SECONDS: "300"

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Validate TTL before any Kubernetes auth
        shell: bash
        run: |
          set -euo pipefail
          ttl="${DEPLOY_TOKEN_TTL_SECONDS}"
          if ! [[ "$ttl" =~ ^[0-9]+$ ]]; then
            echo "DEPLOY_TOKEN_TTL_SECONDS must be a non-negative integer. Got: $ttl"
            exit 1
          fi
          if [ "$ttl" -le 0 ]; then
            echo "ERROR: DEPLOY_TOKEN_TTL_SECONDS must be > 0. Got: $ttl"
            exit 1
          fi
          echo "TTL looks good: $ttl seconds"

      # The name is quoted because the unquoted "(example: AWS EKS)" colon
      # would otherwise break YAML parsing.
      - name: "Authenticate to cloud (example: AWS EKS)"
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-oidc-deploy-role
          aws-region: us-east-1

      - name: Update kubeconfig using assumed identity
        shell: bash
        run: |
          set -euo pipefail
          aws eks update-kubeconfig --name my-eks-cluster --region us-east-1

      - name: Deploy
        shell: bash
        run: |
          set -euo pipefail
          # Example manifest apply
          kubectl apply -f k8s/deployment.yaml
          kubectl rollout status deployment/my-app --timeout=120s
```

What each block does (and why it matters)

  • permissions: id-token: write
    Enables GitHub Actions OIDC token issuance. Without it, OIDC auth can’t happen.

  • Validate TTL before any Kubernetes auth
    This is the key safety belt. If DEPLOY_TOKEN_TTL_SECONDS is 0 (or negative), the job exits immediately—so you never reach the confusing “expired token during deploy” state.

  • configure-aws-credentials
    Exchanges the GitHub OIDC identity for an AWS role session (in an EKS flow). This is the “one-time exchange” concept.

  • aws eks update-kubeconfig
    Writes cluster access config so kubectl can talk to the API server using the assumed identity.

  • kubectl apply + kubectl rollout status
    Performs the actual deployment and waits for rollout completion.

A tiny TTL unit test that saved me later

I also added a little script to make TTL validation consistent across repos. It’s simple, but it prevents copy/paste mistakes.

Create scripts/validate-ttl.py:

```python
import os
import sys

def validate_ttl(ttl_str: str) -> int:
    if not ttl_str.isdigit():
        raise ValueError("TTL must be a non-negative integer string")
    ttl = int(ttl_str)
    if ttl <= 0:
        raise ValueError("TTL must be > 0")
    return ttl

if __name__ == "__main__":
    ttl_str = os.environ.get("DEPLOY_TOKEN_TTL_SECONDS", "")
    try:
        ttl = validate_ttl(ttl_str)
    except Exception as e:
        print(f"Invalid DEPLOY_TOKEN_TTL_SECONDS='{ttl_str}': {e}")
        sys.exit(1)
    print(f"TTL OK: {ttl} seconds")
```

Then call it in the workflow instead of the bash check:

```yaml
- name: Validate TTL before any Kubernetes auth (python)
  shell: bash
  run: |
    set -euo pipefail
    python3 scripts/validate-ttl.py
```
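To pin the script's behavior down further, a few unit tests could sit alongside it. Because the file name contains a hyphen, a plain `import` of scripts/validate-ttl.py won't work; in this self-contained sketch the function is inlined (in a real repo you might rename the script or load it with `importlib`):

```python
import unittest

# Inlined copy of validate_ttl from scripts/validate-ttl.py; the
# hyphenated file name prevents a direct `import` statement.
def validate_ttl(ttl_str: str) -> int:
    if not ttl_str.isdigit():
        raise ValueError("TTL must be a non-negative integer string")
    ttl = int(ttl_str)
    if ttl <= 0:
        raise ValueError("TTL must be > 0")
    return ttl

class ValidateTtlTest(unittest.TestCase):
    def test_valid_ttl(self):
        self.assertEqual(validate_ttl("300"), 300)

    def test_zero_is_rejected(self):
        # The exact misconfiguration from the original incident.
        with self.assertRaises(ValueError):
            validate_ttl("0")

    def test_non_numeric_is_rejected(self):
        for bad in ["", "-5", "3.5", "abc"]:
            with self.assertRaises(ValueError):
                validate_ttl(bad)

if __name__ == "__main__":
    unittest.main(exit=False)
```

The zero case is the one that matters: it encodes the original incident so the mistake can never silently come back.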

What I learned building this

The biggest surprise was that “token refresh” bugs can hide inside configuration values like ttl=0. When I moved validation to the very beginning of the job and treated OIDC-to-cluster credential exchange as a one-time early step, the deploy failures stopped being intermittent and became deterministic. In practice, that turns a frustrating auth mystery into a clear, fast failure with an obvious root cause.