Working with GKE Autopilot

caution

This workflow is in preview. Please share feedback in our Slack community.

GKE Autopilot is Google's fully managed mode for GKE. Autopilot blocks privileged workloads by default through its Warden admission controller. The Speedscale eBPF capture agent (nettap) needs Linux capabilities, host namespace access, and a few hostPath mounts that Warden rejects unless the workload is explicitly allowed.

Capture Options on Autopilot

Autopilot's restrictions affect how Speedscale captures traffic. There are two supported approaches — pick one:

  • eBPF capture (the main path on this page). A privileged nettap DaemonSet captures traffic for whole namespaces. It needs a customer-owned WorkloadAllowlist to admit the privileged pods, which requires a one-time Google Cloud eligibility request (usually 3 to 7 business days of lead time). Best for broad, low-touch capture across many workloads. This path is needed until the Speedscale Autopilot partner allowlist is published globally.
  • Sidecar capture (dual proxy mode). A per-workload sidecar proxy — no privileged DaemonSet and no WorkloadAllowlist, so no eligibility request. Autopilot forbids the sidecar's transparent proxy, so you run it in dual proxy mode and point your app's outbound traffic at it. Best when you only need a few workloads and want to skip the eligibility step.

The rest of this page walks through the eBPF path. Jump to Sidecar Capture for the sidecar alternative.

Prerequisites

  • A GCP project with billing enabled.
  • A GCP Organization with a paid Customer Care tier (Standard or higher). Google requires an Organization resource to purchase any paid support tier, and you need a support case to request WorkloadAllowlist eligibility. A standalone billing account with no Organization (for example a personal account) cannot buy support and cannot complete Step 1.
  • Permission to set project org policies.
  • gcloud, kubectl, and helm.
  • A Speedscale API key.
  • Two files from Speedscale support, matched to a specific operator chart version:
    • workload-allowlist.yaml
    • allowlist-synchronizer.yaml

Contact Speedscale support to get the allowlist files. The workload-allowlist.yaml pins exact container image SHA-256 digests, so it is tied to one operator chart version. Install that exact chart version (see Step 6).

Set these variables for the commands below:

export PROJECT_ID="YOUR_PROJECT_ID"
export REGION="us-central1"
export CLUSTER_NAME="YOUR_CLUSTER_NAME"
export BUCKET_NAME="${PROJECT_ID}-speedscale-autopilot-allowlists"
export SPEEDSCALE_API_KEY="YOUR_SPEEDSCALE_API_KEY"
export CHART_VERSION="VERSION_MATCHED_TO_YOUR_ALLOWLIST"
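
Later commands fail in confusing ways when one of these variables is empty. The following is a minimal preflight sketch; the helper name require_vars is illustrative and is not part of any Speedscale or Google tooling.

```shell
# Hypothetical helper: report which of the named variables are unset or empty.
# The body runs in a subshell, so its working variables do not leak out.
require_vars() (
  missing=0
  for name in "$@"; do
    # Read the value of the variable whose name is stored in $name.
    eval "value=\"\${${name}:-}\""
    if [ -z "${value}" ]; then
      echo "missing: ${name}" >&2
      missing=1
    fi
  done
  return "${missing}"
)

# Usage, after the exports above:
# require_vars PROJECT_ID REGION CLUSTER_NAME BUCKET_NAME SPEEDSCALE_API_KEY CHART_VERSION
```

The function returns non-zero if anything is missing, so it can gate a script with `require_vars ... || exit 1`.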

Enable the required APIs:

gcloud services enable \
container.googleapis.com \
storage.googleapis.com \
cloudresourcemanager.googleapis.com \
orgpolicy.googleapis.com \
serviceusage.googleapis.com \
--project "${PROJECT_ID}"

1. Request WorkloadAllowlist Eligibility

Create a Google Cloud support case.

  • Category: GKE / Autopilot
  • Priority: P3
  • Subject: Grant WorkloadAllowlist eligibility for project PROJECT_ID

Description:

We need to run Speedscale's eBPF-based traffic capture agent, nettap, as a
privileged DaemonSet on GKE Autopilot. The agent requires capabilities BPF,
PERFMON, NET_ADMIN, SYS_ADMIN, SYS_PTRACE, and SYS_RESOURCE, plus hostNetwork,
hostPID, and hostPath mounts for /proc, /sys, and /var/run/netns.

We are requesting eligibility to use customer-owned WorkloadAllowlists.

Speedscale is a GCP partner currently in the Autopilot Partner Allowlist review
process. This customer-owned path is needed until the partner allowlist is
published globally.

Project ID: PROJECT_ID
Cluster: CLUSTER_NAME
Region: REGION

Stop here until Google confirms the project is eligible for customer-owned WorkloadAllowlists. This usually takes 3 to 7 business days.

2. Upload the Speedscale Allowlist

After Google approves the project, create a bucket and upload the allowlist:

gcloud storage buckets create "gs://${BUCKET_NAME}" \
--project "${PROJECT_ID}" \
--location "${REGION}"

gcloud storage cp workload-allowlist.yaml \
"gs://${BUCKET_NAME}/nettap/workload-allowlist.yaml"

Grant the GKE service agent read access:

export PROJECT_NUMBER="$(gcloud projects describe "${PROJECT_ID}" --format='value(projectNumber)')"
export GKE_SERVICE_AGENT="service-${PROJECT_NUMBER}@container-engine-robot.iam.gserviceaccount.com"

gcloud storage buckets add-iam-policy-binding "gs://${BUCKET_NAME}" \
--member="serviceAccount:${GKE_SERVICE_AGENT}" \
--role="roles/storage.objectViewer"

gcloud storage buckets add-iam-policy-binding "gs://${BUCKET_NAME}" \
--member="serviceAccount:${GKE_SERVICE_AGENT}" \
--role="roles/storage.bucketViewer"

The member must be the GKE service agent, whose address ends in @container-engine-robot.iam.gserviceaccount.com.
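
A quick format check can catch a wrong member before the IAM bindings are applied, for example when the Compute Engine default service account (PROJECT_NUMBER-compute@developer.gserviceaccount.com) is pasted by mistake. This is a sketch only; is_gke_service_agent is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if the address has the form
# service-<digits>@container-engine-robot.iam.gserviceaccount.com.
is_gke_service_agent() {
  case "$1" in
    # Reject anything with a non-digit between "service-" and the domain.
    service-*[!0-9]*@container-engine-robot.iam.gserviceaccount.com) return 1 ;;
    # Accept at least one character (digits only, given the rule above).
    service-?*@container-engine-robot.iam.gserviceaccount.com) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
# is_gke_service_agent "${GKE_SERVICE_AGENT}" || echo "unexpected service agent address" >&2
```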

3. Allow the Bucket Path

Set the container.managed.autopilotPrivilegedAdmission org policy at the project level. Keep allowAnyGKEPath: true to preserve the default GKE-approved allowlists while adding the Speedscale bucket path.

cat > /tmp/speedscale-autopilot-policy.yaml <<EOF
name: projects/${PROJECT_ID}/policies/container.managed.autopilotPrivilegedAdmission
spec:
  rules:
  - enforce: true
    parameters:
      allowAnyGKEPath: true
      allowPaths:
      - gs://${BUCKET_NAME}/*
EOF

gcloud org-policies set-policy /tmp/speedscale-autopilot-policy.yaml \
--update-mask=spec

Wait about 15 minutes before creating or updating the cluster.
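
While waiting, you can at least confirm that the policy was stored with the expected bucket path. The sketch below uses gcloud org-policies describe and searches its output for the path; policy_has_bucket is a hypothetical helper name. A match shows the policy is saved, not that it has finished propagating.

```shell
# Hypothetical helper: succeed if the project-level policy output
# contains the allowlist path for the given bucket.
policy_has_bucket() (
  project="$1"
  bucket="$2"
  gcloud org-policies describe container.managed.autopilotPrivilegedAdmission \
    --project "${project}" 2>/dev/null \
    | grep -qF "gs://${bucket}/*"   # -F: match the path literally, including "*"
)

# Usage:
# policy_has_bucket "${PROJECT_ID}" "${BUCKET_NAME}" && echo "policy lists the bucket path"
```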

4. Create or Update the Autopilot Cluster

For a new cluster:

gcloud container clusters create-auto "${CLUSTER_NAME}" \
--project "${PROJECT_ID}" \
--region "${REGION}" \
--release-channel rapid \
--autopilot-privileged-admission="gke://*,gs://${BUCKET_NAME}/*"

For an existing cluster:

gcloud container clusters update "${CLUSTER_NAME}" \
--project "${PROJECT_ID}" \
--region "${REGION}" \
--autopilot-privileged-admission="gke://*,gs://${BUCKET_NAME}/*"

The gke://* entry keeps the default GKE-approved allowlist source enabled. The gs://${BUCKET_NAME}/* entry adds your Speedscale allowlist source.

Connect kubectl:

gcloud container clusters get-credentials "${CLUSTER_NAME}" \
--project "${PROJECT_ID}" \
--region "${REGION}"

5. Install the AllowlistSynchronizer

Create a cluster-specific copy of the synchronizer file and apply it:

sed \
-e "s/YOUR_PROJECT_NUMBER/${PROJECT_NUMBER}/g" \
-e "s/YOUR_BUCKET_NAME/${BUCKET_NAME}/g" \
allowlist-synchronizer.yaml > /tmp/speedscale-allowlist-synchronizer.yaml

kubectl apply -f /tmp/speedscale-allowlist-synchronizer.yaml
kubectl wait --for=condition=Ready allowlistsynchronizer/speedscale-nettap-sync --timeout=60s
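
If PROJECT_NUMBER or BUCKET_NAME is empty when sed runs, the rendered file ends up with blank fields, and if the sed step is skipped the placeholders remain. A small check before kubectl apply can catch the second case. This is a sketch; check_rendered is a hypothetical helper name.

```shell
# Hypothetical helper: fail if the rendered file is missing or empty, or if
# either placeholder from the original file survived the sed step.
check_rendered() (
  file="$1"
  if [ ! -s "${file}" ]; then
    echo "missing or empty file: ${file}" >&2
    return 1
  fi
  # Print offending lines with line numbers so they are easy to find.
  if grep -nE 'YOUR_(PROJECT_NUMBER|BUCKET_NAME)' "${file}"; then
    echo "unreplaced placeholders in ${file}" >&2
    return 1
  fi
)

# Usage:
# check_rendered /tmp/speedscale-allowlist-synchronizer.yaml && \
#   kubectl apply -f /tmp/speedscale-allowlist-synchronizer.yaml
```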

Verify the synchronizer and allowlist were created:

kubectl get allowlistsynchronizer speedscale-nettap-sync -o yaml
kubectl get workloadallowlist speedscale-nettap -o yaml

If the synchronizer is not Ready, inspect status.conditions[*].message and status.managedAllowlistStatus[*].lastError.
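
Those two status fields can be pulled out with kubectl's jsonpath output instead of reading the full YAML. The sketch below assumes the conditions carry the usual Kubernetes type field alongside message; show_sync_errors is a hypothetical helper name.

```shell
# Hypothetical helper: print each condition as "type: message", then any
# lastError recorded per managed allowlist.
show_sync_errors() {
  kubectl get allowlistsynchronizer speedscale-nettap-sync \
    -o jsonpath='{range .status.conditions[*]}{.type}{": "}{.message}{"\n"}{end}'
  kubectl get allowlistsynchronizer speedscale-nettap-sync \
    -o jsonpath='{range .status.managedAllowlistStatus[*]}{.lastError}{"\n"}{end}'
}

# Usage:
# show_sync_errors
```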

6. Install Speedscale With eBPF

Add the Speedscale Helm repo and install the operator with eBPF enabled.

Pin the chart version

Install the chart version that matches your workload-allowlist.yaml. The allowlist pins exact image SHA-256 digests; a different chart version pulls different nettap/goproxy images whose digests will not match, and Warden will reject the nettap pods. To move to a newer version, get a re-pinned workload-allowlist.yaml from Speedscale first.
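
To see which digests your copy of the allowlist pins, you can extract them directly from the file. This sketch assumes only that digests appear in the standard sha256:<64 hex characters> form; list_pinned_digests is a hypothetical helper name.

```shell
# Hypothetical helper: print each distinct sha256 digest found in the file.
list_pinned_digests() {
  grep -oE 'sha256:[0-9a-f]{64}' "$1" | sort -u
}

# Usage:
# list_pinned_digests workload-allowlist.yaml
```

Comparing this list before and after Speedscale sends a re-pinned file is a quick way to confirm you are holding the updated version.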

helm repo add speedscale https://speedscale.github.io/operator-helm/
helm repo update

helm upgrade --install speedscale-operator speedscale/speedscale-operator \
--version "${CHART_VERSION}" \
--namespace speedscale --create-namespace \
--set apiKey="${SPEEDSCALE_API_KEY}" \
--set clusterName="${CLUSTER_NAME}" \
--set image.registry=gcr.io/speedscale \
--set ebpf.enabled=true \
--set 'sidecar.resources.limits.cpu=500m' \
--set 'sidecar.resources.limits.memory=512Mi' \
--set 'sidecar.resources.limits.ephemeral-storage=100Mi' \
--set 'sidecar.resources.requests.cpu=500m' \
--set 'sidecar.resources.requests.memory=512Mi' \
--set 'sidecar.resources.requests.ephemeral-storage=100Mi'

Autopilot requires that resource requests and limits match, and that every container declares ephemeral-storage. The ephemeral-storage values are required for Speedscale replay init containers to pass Warden admission.

Verify the install:

kubectl get pods -n speedscale
kubectl get pods -n speedscale -l app=speedscale-nettap

Each node should run one nettap pod, and each pod should show 2/2 Running.
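
On a cluster with many nodes it is easier to compare counts than to read the pod list. The sketch below counts nodes and nettap pods that report 2/2 Running, assuming the DaemonSet is expected on every node; nettap_ready is a hypothetical helper name.

```shell
# Hypothetical helper: succeed when the number of nettap pods showing
# "2/2 Running" equals the number of nodes.
nettap_ready() (
  nodes="$(kubectl get nodes --no-headers | wc -l | tr -d ' ')"
  # Default columns are NAME READY STATUS RESTARTS AGE.
  ready="$(kubectl get pods -n speedscale -l app=speedscale-nettap --no-headers \
    | awk '$2 == "2/2" && $3 == "Running"' | wc -l | tr -d ' ')"
  echo "nettap pods ready: ${ready} of ${nodes} nodes"
  [ "${nodes}" -gt 0 ] && [ "${ready}" -eq "${nodes}" ]
)

# Usage:
# nettap_ready || kubectl get pods -n speedscale -l app=speedscale-nettap
```

Autopilot adds and removes nodes on its own, so a brief mismatch right after scaling is not necessarily a failure.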

7. Configure a Capture Target

Set the namespace and app label for the workload you want to capture, then add it as an eBPF capture target:

export APP_NAMESPACE="YOUR_APP_NAMESPACE"
export APP_NAME="YOUR_APP_LABEL"

helm upgrade speedscale-operator speedscale/speedscale-operator \
--version "${CHART_VERSION}" \
--namespace speedscale --reuse-values \
--set "ebpf.configuration.capture.targets[0].name=${APP_NAME}" \
--set "ebpf.configuration.capture.targets[0].namespaceSelector.matchLabels.kubernetes\\.io/metadata\\.name=${APP_NAMESPACE}" \
--set "ebpf.configuration.capture.targets[0].podSelector.matchLabels.app=${APP_NAME}"

--reuse-values preserves your install values but not the chart version, so pin --version again here. Generate traffic against the workload, then confirm it appears in Speedscale.
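
To confirm that the target landed in the release values, you can print only the ebpf section of what Helm has stored. This sketch uses helm get values with YAML output and a small awk filter; show_ebpf_values is a hypothetical helper name.

```shell
# Hypothetical helper: print the top-level "ebpf:" block from the
# user-supplied values of the speedscale-operator release.
show_ebpf_values() {
  helm get values speedscale-operator --namespace speedscale --output yaml \
    | awk '/^ebpf:/ {p=1; print; next} /^[^[:space:]]/ {p=0} p'
  # awk: start printing at "ebpf:", stop at the next top-level key.
}

# Usage:
# show_ebpf_values
```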

Updating Speedscale Versions

The WorkloadAllowlist pins container image digests, so it must be updated in lockstep with the chart. When Speedscale provides an updated workload-allowlist.yaml (and its matching chart version), upload it to the same bucket path:

gcloud storage cp workload-allowlist.yaml \
"gs://${BUCKET_NAME}/nettap/workload-allowlist.yaml"

Force a sync:

kubectl annotate allowlistsynchronizer speedscale-nettap-sync \
force-sync="$(date +%s)" --overwrite

Then upgrade the chart to the matching --version.
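
The order matters: if the chart is upgraded before the new allowlist is synced, the new images will not match the pinned digests. The sketch below chains the three steps and stops at the first failure. update_speedscale is a hypothetical wrapper, and the kubectl wait step only confirms the synchronizer reports Ready, not that the new file was already picked up.

```shell
# Hypothetical wrapper: upload the allowlist, force a sync, then upgrade.
# Expects BUCKET_NAME to be set as in the Prerequisites section.
update_speedscale() (
  allowlist="$1"
  chart_version="$2"
  gcloud storage cp "${allowlist}" \
    "gs://${BUCKET_NAME}/nettap/workload-allowlist.yaml" || return 1
  kubectl annotate allowlistsynchronizer speedscale-nettap-sync \
    force-sync="$(date +%s)" --overwrite || return 1
  kubectl wait --for=condition=Ready \
    allowlistsynchronizer/speedscale-nettap-sync --timeout=60s || return 1
  helm upgrade speedscale-operator speedscale/speedscale-operator \
    --version "${chart_version}" --namespace speedscale --reuse-values
)

# Usage:
# update_speedscale workload-allowlist.yaml "${CHART_VERSION}"
```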

Java Agent Notes

For workloads that make outbound HTTPS calls, the Speedscale Java Agent instruments SSLSocketImpl and SSLEngineImpl to decrypt TLS traffic.

On Autopilot, the operator's Java Agent init container can be rejected if the injected container does not declare explicit ephemeral-storage resources. Speedscale support can provide a workload-specific patch when outbound HTTPS capture is required. If you manually patch a deployment for the Java Agent, re-apply the patch after each replay, since replay cleanup restores the target deployment to its pre-replay state.

Troubleshooting

| Symptom | Cause | Fix |
| --- | --- | --- |
| Pods rejected by autopilot-allowlist-synchronizer-limitation | Google has not enabled customer-owned WorkloadAllowlists on the project | Wait for support approval (Step 1), then retry |
| Cluster create/update fails with Cluster is not authorized to use custom allowlist paths | Project eligibility missing, or the org policy has not propagated | Confirm support approval, verify the org policy, wait 15 minutes, retry |
| AllowlistSynchronizer not Ready | Bucket IAM, path mismatch, invalid YAML, or incompatible GKE version | Confirm container-engine-robot has objectViewer and bucketViewer, then inspect synchronizer status |
| Nettap pods rejected by Warden | Resource requests and limits do not match | Reinstall with matching requests and limits |
| Replay init containers rejected by Warden | Missing ephemeral-storage values | Reinstall with the sidecar.resources.*.ephemeral-storage=100Mi values |
| Nettap image digest mismatch after a chart upgrade | Installed chart version does not match the allowlist | Install the chart --version that matches your workload-allowlist.yaml, or upload the updated allowlist from Speedscale |
| Forwarder crashes with FATAL: failed to get filter rule | filterRule=none in the configmap | Patch it: kubectl patch cm speedscale-forwarder -n speedscale --type merge -p '{"data":{"SPEEDSCALE_FILTER_RULE":"standard"}}' |

Sidecar Capture (Dual Proxy Mode)

If you do not want to run the privileged eBPF DaemonSet (or want to avoid the WorkloadAllowlist eligibility request), you can capture per workload with the Speedscale sidecar instead. Autopilot does not allow the sidecar to make the networking changes a transparent proxy needs, so the sidecar must run in dual proxy mode, and your application must send its own outbound traffic to the sidecar's forward proxy.

Transparent proxy mode is not supported on Autopilot. See Proxy Modes for how dual mode works.

Operator values

Autopilot also blocks the sidecar's smart reverse DNS (it needs NET_ADMIN), and it enforces that every container's resource requests equal its limits, including ephemeral-storage. Set these operator values when you install the operator:

disableSidecarSmartReverseDNS: true

sidecar:
  resources:
    limits:
      cpu: 500m
      memory: 512Mi
      ephemeral-storage: 100Mi
    requests:
      cpu: 500m
      memory: 512Mi
      ephemeral-storage: 100Mi

See the Helm reference for disableSidecarSmartReverseDNS.

Workload annotations

Inject the sidecar in dual mode with matching request/limit annotations:

sidecar.speedscale.com/inject: "true"
sidecar.speedscale.com/proxy-type: "dual"
sidecar.speedscale.com/proxy-protocol: "tcp:http"
sidecar.speedscale.com/proxy-port: "8080"
sidecar.speedscale.com/cpu-request: 500m
sidecar.speedscale.com/cpu-limit: 500m
sidecar.speedscale.com/memory-request: 1Gi
sidecar.speedscale.com/memory-limit: 1Gi
sidecar.speedscale.com/ephemeral-storage-request: 100Mi
sidecar.speedscale.com/ephemeral-storage-limit: 100Mi

Configure the application

Dual mode does not reconfigure your runtime — the app must route outbound traffic to the sidecar's forward proxy on 127.0.0.1:4140 (unless you changed proxy-out-port):

  • Java: add -Dhttp.proxyHost, -Dhttp.proxyPort, -Dhttps.proxyHost, and -Dhttps.proxyPort via JAVA_TOOL_OPTIONS. If you enable tls-out, add the truststore flags from the Java reference.
  • Runtimes that honor proxy env vars: set HTTP_PROXY and HTTPS_PROXY to http://127.0.0.1:4140.
  • Clients with custom proxy behavior: configure the library directly so outbound traffic actually uses the sidecar.
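
As an illustration of the Java case, the four flags can be assembled into one JAVA_TOOL_OPTIONS value. This is a sketch that assumes the default forward proxy address 127.0.0.1:4140; java_proxy_opts is a hypothetical helper name, and the kubectl set env lines in the comments are one possible way to apply the result, not the only one.

```shell
# Hypothetical helper: print the JVM proxy flags for the sidecar's forward
# proxy. Host and port default to 127.0.0.1 and 4140.
java_proxy_opts() (
  host="${1:-127.0.0.1}"
  port="${2:-4140}"
  printf '%s' "-Dhttp.proxyHost=${host} -Dhttp.proxyPort=${port}"
  printf ' %s\n' "-Dhttps.proxyHost=${host} -Dhttps.proxyPort=${port}"
)

# Usage (Java workload; changes the pod template and triggers a rollout):
# kubectl set env deployment/"${APP_NAME}" -n "${APP_NAMESPACE}" \
#   JAVA_TOOL_OPTIONS="$(java_proxy_opts)"
#
# Usage (runtimes that honor proxy env vars):
# kubectl set env deployment/"${APP_NAME}" -n "${APP_NAMESPACE}" \
#   HTTP_PROXY=http://127.0.0.1:4140 HTTPS_PROXY=http://127.0.0.1:4140
```

If the workload already sets JAVA_TOOL_OPTIONS, append the flags to the existing value rather than replacing it.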

See TLS Support for outbound TLS decryption details.

Getting Help

If you run into issues with this guide or have further questions, please reach out to us on the community Slack.