Getting Started

Install CastSlice into your Kubernetes cluster and start sharing GPUs across your AI workloads in under five minutes.

Prerequisites

Before installing CastSlice, ensure you have the following:

- Kubernetes ≥ 1.25: any CNCF-conformant cluster (EKS, GKE, AKS, or on-prem).
- cert-manager: required for automatic TLS certificate injection into the webhook.
- NVIDIA GPU Operator: or an equivalent device plugin exposing nvidia.com/gpu resources.
- kubectl: with cluster-admin access to apply manifests and inspect Pods.

ℹ️ Don't have a GPU yet? You can still verify the webhook mutation logic using the Local Testing (No GPU) guide.
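Before moving on, you can sanity-check each prerequisite from the command line. This is a minimal sketch using standard kubectl queries (nothing CastSlice-specific); run it against your own cluster:

```shell
# Server version must be >= 1.25
kubectl version

# cert-manager pods should be Running in the cert-manager namespace
kubectl get pods -n cert-manager

# At least one node should advertise allocatable nvidia.com/gpu
kubectl get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
```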

Install CastSlice

1. Install cert-manager (if not already installed)

CastSlice uses cert-manager to inject TLS certs into the Mutating Webhook configuration automatically.

```bash
# Install cert-manager v1
$ kubectl apply -f https://github.com/cert-manager/cert-manager/releases/latest/download/cert-manager.yaml

# Wait for it to be ready
$ kubectl rollout status deployment/cert-manager -n cert-manager
deployment "cert-manager" successfully rolled out
```
2. Apply the CastSlice install manifest

The single-file install.yaml includes the Namespace, Deployment, Service, Certificate (cert-manager), and MutatingWebhookConfiguration.

```bash
# Install latest release
$ kubectl apply -f https://github.com/castops/cast-slice/releases/latest/download/install.yaml
namespace/cast-slice created
serviceaccount/cast-slice created
deployment.apps/cast-slice created
service/cast-slice created
certificate.cert-manager.io/cast-slice-cert created
mutatingwebhookconfiguration.admissionregistration.k8s.io/cast-slice created
```
3. Verify the webhook Pod is running

```bash
$ kubectl get pods -n cast-slice
NAME                         READY   STATUS    RESTARTS   AGE
cast-slice-7d8f9b5c4-xk2n9   1/1     Running   0          42s

$ kubectl rollout status deployment/cast-slice -n cast-slice
deployment "cast-slice" successfully rolled out
```
✅ 1/1 Running means the cert-manager certificate was issued, TLS was injected into the webhook, and the readiness probe at /readyz passed.
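To confirm the TLS injection itself, one way (assuming the webhook configuration is named cast-slice, matching the install output above) is to check that cert-manager's CA injector has populated the webhook's caBundle field:

```shell
# A non-empty caBundle means the CA certificate was injected into the webhook;
# a count of 0 or 1 bytes here indicates injection has not happened yet
kubectl get mutatingwebhookconfiguration cast-slice \
  -o jsonpath='{.webhooks[0].clientConfig.caBundle}' | wc -c
```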

Enable GPU Slicing on a Workload

4. Add the opt-in annotation to your Pod or Deployment

Add castops.io/optimize: "true" to the Pod's metadata.annotations. Optionally set castops.io/workload-type to control the slice count. For Deployments, add the annotations to the Pod template (spec.template.metadata.annotations), not to the Deployment's own metadata.

deployment.yaml — inference workload (slice ratio: 2)

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-ai-workload
spec:
  selector:
    matchLabels:
      app: my-ai-workload
  template:
    metadata:
      labels:
        app: my-ai-workload
      annotations:
        castops.io/optimize: "true"            # ← required: opt-in
        castops.io/workload-type: "inference"  # ← optional: training/inference/batch/dev
    spec:
      containers:
        - name: inference
          image: ollama/ollama
          resources:
            limits:
              nvidia.com/gpu: 1  # CastSlice rewrites this → gpu-shared: 2
```

Or use an explicit ratio for fine-grained control:

explicit ratio override

```yaml
annotations:
  castops.io/optimize: "true"
  castops.io/slice-ratio: "8"  # explicit override → gpu-shared: 8
```
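Putting the two annotations together, the ratio selection can be sketched as a small shell function. This is an illustrative reconstruction from the values shown in this guide (dev=1, inference=2, training=4, and an explicit castops.io/slice-ratio winning over the workload-type default), not CastSlice's actual implementation; the batch ratio and the fallback default are assumptions.

```shell
# Hypothetical sketch of how the final slice ratio is chosen.
# $1 = castops.io/workload-type, $2 = castops.io/slice-ratio (may be empty)
slice_ratio() {
  if [ -n "$2" ]; then
    # An explicit slice-ratio annotation overrides the workload-type default
    echo "$2"
    return
  fi
  case "$1" in
    dev)       echo 1 ;;
    inference) echo 2 ;;
    training)  echo 4 ;;
    *)         echo 1 ;;  # assumed fallback; not documented in this guide
  esac
}

slice_ratio inference ""  # → 2
slice_ratio training ""   # → 4
slice_ratio inference 8   # → 8 (explicit override)
```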
5. Verify the mutation happened

After the Pod is created, inspect its actual resource spec to confirm the rewrite:

```bash
$ kubectl get pod my-ai-workload-xxx -o jsonpath='{.spec.containers[0].resources}' | jq .
{
  "limits": {
    "nvidia.com/gpu-shared": "2"   ← inference workload (ratio 2); dev=1, training=4
  }
}
```

✓ nvidia.com/gpu was rewritten to nvidia.com/gpu-shared with the correct ratio — GPU slicing is active.

Uninstall

To remove CastSlice from your cluster:

```bash
$ kubectl delete -f https://github.com/castops/cast-slice/releases/latest/download/install.yaml
```
⚠️ This removes the MutatingWebhookConfiguration, so new Pods will no longer be mutated. Existing running Pods are unaffected, since admission happens at creation time.
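If you installed cert-manager solely for CastSlice, you can remove it the same way. Skip this step if any other component in the cluster depends on cert-manager:

```shell
# Remove cert-manager only if nothing else in the cluster uses it
kubectl delete -f https://github.com/cert-manager/cert-manager/releases/latest/download/cert-manager.yaml
```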