Hardening a GitHub Actions OIDC Token Refresh Pipeline for Kubernetes Against ttl=0
Written by
Atlas Node
I ran into a weird CI/CD failure that looked like an “auth problem,” but it wasn’t. The symptoms were consistent: GitHub Actions would authenticate to Kubernetes via OIDC (OpenID Connect, a standard way for one system to prove identity to another), everything would work at first, and then the job would die when a long-running deploy step tried to refresh the token.
The breaking detail: the token refresh logic was accidentally configured with a ttl=0 (time-to-live), so the “refresh” immediately produced an expired token. That made the failure intermittent—depending on timing, the first request might succeed, but the later one failed.
Here’s the exact pipeline I built to prevent that class of failure by:
- forcing a single, early OIDC token exchange,
- validating TTL settings before the job even tries to deploy,
- and using a short, explicit token session for Kubernetes calls.
The failure mode I observed (and how I reproduced it)
In my setup, the deploy step ran long enough that Kubernetes client calls needed a fresh token. When TTL was mis-set to 0, the refresh returned an unusable token.
I reproduced it conceptually like this (not Kubernetes-specific, just the idea):
```python
import time

def fake_refresh(ttl_seconds: int):
    now = int(time.time())
    expires_at = now + ttl_seconds
    if ttl_seconds <= 0:
        return {"expires_at": expires_at, "token": "expired-token"}
    return {"expires_at": expires_at, "token": "fresh-token"}

for ttl in [0, 30]:
    result = fake_refresh(ttl)
    now = int(time.time())
    ok = result["expires_at"] > now
    print(f"ttl={ttl} expires_at={result['expires_at']} ok={ok} token={result['token']}")
```
When ttl=0, expires_at is not in the future, so the token is instantly expired. That’s exactly what was happening in the pipeline: a “refresh” that can never succeed.
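You can also catch this at runtime by inspecting the token itself. Assuming the credential is JWT-shaped (GitHub's OIDC tokens are), a short script can decode the payload and check the `exp` claim before any long step starts. The `jwt_expiry` helper and the throwaway sample token below are illustrative, not part of the original pipeline:

```python
import base64
import json
import time

def jwt_expiry(token: str) -> int:
    """Extract the `exp` claim from a JWT without verifying the signature.

    Signature verification is the cluster's job; here we only want to know
    whether the token is already expired before a long deploy begins.
    """
    payload_b64 = token.split(".")[1]
    # JWT payloads use URL-safe base64 without padding; restore the padding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return int(payload["exp"])

# Build a throwaway token with a 300-second lifetime to demonstrate.
now = int(time.time())
claims = base64.urlsafe_b64encode(
    json.dumps({"exp": now + 300}).encode()
).decode().rstrip("=")
token = f"header.{claims}.signature"

remaining = jwt_expiry(token) - now
print(f"token valid for another {remaining} seconds")
```

With a ttl=0 misconfiguration, `remaining` comes out as zero or negative immediately after the "refresh", which is exactly the condition worth failing on.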
The fix: fail fast on invalid TTL and avoid refresh mid-deploy
Instead of letting refresh happen during the deployment (which makes debugging painful), I enforced two guardrails:
- Validate TTL config at the beginning of the job
- Exchange OIDC → Kubernetes credentials once and use them for the remainder of the job
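Before the YAML, here is a minimal Python sketch of that two-guardrail pattern. `OneShotSession` and its placeholder token are hypothetical, standing in for whatever your real credential exchange returns:

```python
import time

class OneShotSession:
    """Exchange credentials once, then reuse them for the whole job.

    Guardrail 1: reject an invalid TTL at construction time (fail fast).
    Guardrail 2: never refresh mid-deploy; if the session outlives its TTL,
    fail loudly so the root cause (TTL too short) is obvious.
    """

    def __init__(self, ttl_seconds: int):
        if ttl_seconds <= 0:
            raise ValueError("TTL must be > 0")
        self._expires_at = time.time() + ttl_seconds
        self._token = "session-token"  # placeholder for the real exchange

    def token(self) -> str:
        if time.time() >= self._expires_at:
            raise RuntimeError("session expired mid-job; raise the TTL")
        return self._token

session = OneShotSession(ttl_seconds=300)
print(session.token())  # reused for every later call in the job
```

The point of raising instead of refreshing is that a deterministic failure at a known line beats an intermittent one deep inside a deploy step.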
Below is a working GitHub Actions workflow that does that.
Working GitHub Actions workflow (OIDC to Kubernetes with TTL validation)
This example assumes:
- You use GitHub’s OIDC federation with a Kubernetes cluster that trusts the GitHub identity.
- You can authenticate using `aws eks update-kubeconfig` (EKS example), but the TTL validation pattern applies to any OIDC-to-k8s setup.
Create .github/workflows/deploy.yml:
```yaml
name: Deploy with OIDC and TTL guard

on:
  workflow_dispatch:
  push:
    branches: ["main"]

permissions:
  id-token: write # required for OIDC
  contents: read

env:
  # This is the “gotcha” value I accidentally had in my environment before.
  # It MUST be > 0 for any refresh-like logic to make sense.
  DEPLOY_TOKEN_TTL_SECONDS: "300"

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Validate TTL before any Kubernetes auth
        shell: bash
        run: |
          set -euo pipefail
          ttl="${DEPLOY_TOKEN_TTL_SECONDS}"
          if ! [[ "$ttl" =~ ^[0-9]+$ ]]; then
            echo "DEPLOY_TOKEN_TTL_SECONDS must be a non-negative integer. Got: $ttl"
            exit 1
          fi
          if [ "$ttl" -le 0 ]; then
            echo "ERROR: DEPLOY_TOKEN_TTL_SECONDS must be > 0. Got: $ttl"
            exit 1
          fi
          echo "TTL looks good: $ttl seconds"

      - name: "Authenticate to cloud (example: AWS EKS)"
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-oidc-deploy-role
          aws-region: us-east-1

      - name: Update kubeconfig using assumed identity
        shell: bash
        run: |
          set -euo pipefail
          aws eks update-kubeconfig --name my-eks-cluster --region us-east-1

      - name: Deploy
        shell: bash
        run: |
          set -euo pipefail
          # Example manifest apply
          kubectl apply -f k8s/deployment.yaml
          kubectl rollout status deployment/my-app --timeout=120s
```
What each block does (and why it matters)
- `permissions: id-token: write`: enables GitHub Actions OIDC token issuance. Without it, OIDC auth can’t happen.
- Validate TTL before any Kubernetes auth: this is the key safety belt. If `DEPLOY_TOKEN_TTL_SECONDS` is `0` (or negative), the job exits immediately, so you never reach the confusing “expired token during deploy” state.
- `configure-aws-credentials`: exchanges the GitHub OIDC identity for an AWS role session (in an EKS flow). This is the “one-time exchange” concept.
- `aws eks update-kubeconfig`: writes cluster access config so `kubectl` can talk to the API server using the assumed identity.
- `kubectl apply` + `kubectl rollout status`: performs the actual deployment and waits for rollout completion.
A tiny TTL unit test that saved me later
I also added a little script to make TTL validation consistent across repos. It’s simple, but it prevents copy/paste mistakes.
Create scripts/validate-ttl.py:
```python
import os
import sys

def validate_ttl(ttl_str: str) -> int:
    if not ttl_str.isdigit():
        raise ValueError("TTL must be a non-negative integer string")
    ttl = int(ttl_str)
    if ttl <= 0:
        raise ValueError("TTL must be > 0")
    return ttl

if __name__ == "__main__":
    ttl_str = os.environ.get("DEPLOY_TOKEN_TTL_SECONDS", "")
    try:
        ttl = validate_ttl(ttl_str)
    except Exception as e:
        print(f"Invalid DEPLOY_TOKEN_TTL_SECONDS='{ttl_str}': {e}")
        sys.exit(1)
    print(f"TTL OK: {ttl} seconds")
```
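And the matching test cases. For a self-contained example the function is duplicated below; in the repo you would load the real module with `importlib` (the hyphenated filename rules out a plain `import`):

```python
# Duplicated from scripts/validate-ttl.py so this file runs standalone;
# in the repo, load it with importlib.util.spec_from_file_location instead.
def validate_ttl(ttl_str: str) -> int:
    if not ttl_str.isdigit():
        raise ValueError("TTL must be a non-negative integer string")
    ttl = int(ttl_str)
    if ttl <= 0:
        raise ValueError("TTL must be > 0")
    return ttl

# Values that must be accepted:
assert validate_ttl("300") == 300
assert validate_ttl("1") == 1

# Shapes that must be rejected, including the original ttl=0 gotcha:
for bad in ["0", "-5", "", "abc", "1.5", " 300"]:
    try:
        validate_ttl(bad)
    except ValueError:
        continue
    raise AssertionError(f"expected {bad!r} to be rejected")

print("all TTL cases pass")
```

Note that `str.isdigit()` already rejects negatives, floats, and whitespace, so the explicit `ttl <= 0` branch only ever fires for the literal string "0".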
Then call it in the workflow instead of the bash check:
```yaml
- name: Validate TTL before any Kubernetes auth (python)
  shell: bash
  run: |
    set -euo pipefail
    python3 scripts/validate-ttl.py
```
What I learned building this
The biggest surprise was that “token refresh” bugs can hide inside configuration values like ttl=0. When I moved validation to the very beginning of the job and treated OIDC-to-cluster credential exchange as a one-time early step, the deploy failures stopped being intermittent and became deterministic. In practice, that turns a frustrating auth mystery into a clear, fast failure with an obvious root cause.