Getting Started

Install CastSlice into your Kubernetes cluster and start sharing GPUs across your AI workloads in under five minutes.

Prerequisites

Before installing CastSlice, ensure you have the following:

- Kubernetes ≥ 1.25: any CNCF-conformant cluster (EKS, GKE, AKS, or on-prem).
- cert-manager: required for automatic TLS certificate injection into the webhook.
- NVIDIA GPU Operator: or an equivalent device plugin exposing nvidia.com/gpu resources.
- kubectl: with cluster-admin access to apply manifests and inspect Pods.

ℹ️ Don't have a GPU yet? You can still verify the webhook mutation logic using the Local Testing (No GPU) guide.
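Before moving on, you can sanity-check each prerequisite from the command line. This is a minimal sketch using standard kubectl queries (nothing CastSlice-specific); run it against your own cluster:

```shell
# Server version must be >= 1.25
kubectl version

# cert-manager pods should be Running in the cert-manager namespace
kubectl get pods -n cert-manager

# At least one node should advertise allocatable nvidia.com/gpu
kubectl get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
```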

Install CastSlice

1. Install cert-manager (if not already installed)

CastSlice uses cert-manager to inject TLS certs into the Mutating Webhook configuration automatically.

```bash
# Install cert-manager v1
$ kubectl apply -f https://github.com/cert-manager/cert-manager/releases/latest/download/cert-manager.yaml

# Wait for it to be ready
$ kubectl rollout status deployment/cert-manager -n cert-manager
deployment "cert-manager" successfully rolled out
```
2. Apply the CastSlice install manifest

The single-file install.yaml includes the Namespace, Deployment, Service, Certificate (cert-manager), and MutatingWebhookConfiguration.

```bash
# Install latest release
$ kubectl apply -f https://github.com/castops/cast-slice/releases/latest/download/install.yaml
namespace/cast-slice created
serviceaccount/cast-slice created
deployment.apps/cast-slice created
service/cast-slice created
certificate.cert-manager.io/cast-slice-cert created
mutatingwebhookconfiguration.admissionregistration.k8s.io/cast-slice created
```
3. Verify the webhook Pod is running

```bash
$ kubectl get pods -n cast-slice
NAME                         READY   STATUS    RESTARTS   AGE
cast-slice-7d8f9b5c4-xk2n9   1/1     Running   0          42s

$ kubectl rollout status deployment/cast-slice -n cast-slice
deployment "cast-slice" successfully rolled out
```
✅ 1/1 Running means the cert-manager certificate was issued, TLS was injected into the webhook, and the readiness probe at /readyz passed.
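To confirm the TLS injection itself, one way (assuming the webhook configuration is named cast-slice, matching the install output above) is to check that cert-manager's CA injector has populated the webhook's caBundle field:

```shell
# A non-empty caBundle means the CA certificate was injected into the webhook;
# a count of 0 or 1 bytes here indicates injection has not happened yet
kubectl get mutatingwebhookconfiguration cast-slice \
  -o jsonpath='{.webhooks[0].clientConfig.caBundle}' | wc -c
```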

Enable GPU Slicing on a Workload

4. Add the opt-in annotation to your Pod or Deployment

Add castops.io/optimize: "true" to the Pod's metadata.annotations. Optionally set castops.io/workload-type to control the slice count. For Deployments, add the annotations to the Pod template (spec.template.metadata.annotations), not to the Deployment's own metadata.

deployment.yaml — inference workload (slice ratio: 2)

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-ai-workload
spec:
  selector:
    matchLabels:
      app: my-ai-workload
  template:
    metadata:
      labels:
        app: my-ai-workload
      annotations:
        castops.io/optimize: "true"            # ← required: opt-in
        castops.io/workload-type: "inference"  # ← optional: training/inference/batch/dev
    spec:
      containers:
        - name: inference
          image: ollama/ollama
          resources:
            limits:
              nvidia.com/gpu: 1  # CastSlice rewrites this → gpu-shared: 2
```

Or use an explicit ratio for fine-grained control:

explicit ratio override

```yaml
annotations:
  castops.io/optimize: "true"
  castops.io/slice-ratio: "8"  # explicit override → gpu-shared: 8
```
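Putting the two annotations together, the ratio selection can be sketched as a small shell function. This is an illustrative reconstruction from the values shown in this guide (dev=1, inference=2, training=4, and an explicit castops.io/slice-ratio winning over the workload-type default), not CastSlice's actual implementation; the batch ratio and the fallback default are assumptions.

```shell
# Hypothetical sketch of how the final slice ratio is chosen.
# $1 = castops.io/workload-type, $2 = castops.io/slice-ratio (may be empty)
slice_ratio() {
  if [ -n "$2" ]; then
    # An explicit slice-ratio annotation overrides the workload-type default
    echo "$2"
    return
  fi
  case "$1" in
    dev)       echo 1 ;;
    inference) echo 2 ;;
    training)  echo 4 ;;
    *)         echo 1 ;;  # assumed fallback; not documented in this guide
  esac
}

slice_ratio inference ""  # → 2
slice_ratio training ""   # → 4
slice_ratio inference 8   # → 8 (explicit override)
```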
5. Verify the mutation happened

After the Pod is created, inspect its actual resource spec to confirm the rewrite:

```bash
$ kubectl get pod my-ai-workload-xxx -o jsonpath='{.spec.containers[0].resources}' | jq .
{
  "limits": {
    "nvidia.com/gpu-shared": "2"   ← inference workload (ratio 2); dev=1, training=4
  }
}
```

✓ nvidia.com/gpu was rewritten to nvidia.com/gpu-shared with the correct ratio — GPU slicing is active.

Uninstall

To remove CastSlice from your cluster:

```bash
$ kubectl delete -f https://github.com/castops/cast-slice/releases/latest/download/install.yaml
```
⚠️ This removes the MutatingWebhookConfiguration, so new Pods will no longer be mutated. Existing running Pods are unaffected, since admission happens at creation time.
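If you installed cert-manager solely for CastSlice, you can remove it the same way. Skip this step if any other component in the cluster depends on cert-manager:

```shell
# Remove cert-manager only if nothing else in the cluster uses it
kubectl delete -f https://github.com/cert-manager/cert-manager/releases/latest/download/cert-manager.yaml
```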