CastSlice is a zero-touch Kubernetes mutating admission webhook that automatically converts whole-GPU requests into shared GPU slices, with no application changes required.
CastSlice registers with the Kubernetes API server as an admission webhook and transparently intercepts Pod creation requests.
Install with a single kubectl command. cert-manager injects TLS automatically.
Add one annotation to any Pod or Deployment template. No code changes. No image rebuilds.
CastSlice intercepts the CREATE request and rewrites nvidia.com/gpu to a shared resource — before the Pod is scheduled.
Everything you need to start sharing GPUs across your AI workloads today.
The webhook intercepts Pod CREATE requests and rewrites resource specs on the fly. Your application never knows its spec was changed.
Only Pods with castops.io/optimize: "true" are mutated. Everything else passes through unchanged.
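A minimal sketch of that opt-in check, assuming the webhook receives the raw Pod object: it decodes only metadata.annotations and mutates on an exact, case-sensitive "true". The case-sensitivity and the name ShouldMutate are assumptions. For a Deployment, the annotation belongs on the Pod template, because the webhook sees the Pods created from that template, not the Deployment object itself.

```go
package main

import "encoding/json"

// OptInAnnotation is the annotation key CastSlice looks for.
const OptInAnnotation = "castops.io/optimize"

// podMeta decodes only metadata.annotations from a Pod.
type podMeta struct {
	Metadata struct {
		Annotations map[string]string `json:"annotations"`
	} `json:"metadata"`
}

// ShouldMutate (hypothetical) reports whether a raw Pod has opted in.
// Only the exact string "true" counts; a missing annotation, "false",
// or "True" all pass through unchanged.
func ShouldMutate(rawPod []byte) (bool, error) {
	var p podMeta
	if err := json.Unmarshal(rawPod, &p); err != nil {
		return false, err
	}
	// Indexing a nil map is safe in Go and yields "".
	return p.Metadata.Annotations[OptInAnnotation] == "true", nil
}
```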
Works on any CNCF-conformant Kubernetes cluster — EKS, GKE, AKS, or on-prem bare metal.
Leverages cert-manager for automatic TLS certificate injection and rotation. No manual cert management.
Exposes /healthz and /readyz endpoints. The Pod only becomes Ready once the webhook server is fully up.
The webhook is configured with failurePolicy: Ignore, so if CastSlice is down, Pods are still admitted and scheduled normally (they simply keep their original nvidia.com/gpu request).
Add castops.io/optimize: "true" to any Pod or Deployment template.
CastSlice detects it at admission time and rewrites nvidia.com/gpu limits into nvidia.com/gpu-shared. Nodes configured for NVIDIA MPS or time-slicing and advertising nvidia.com/gpu-shared can then pack multiple Pods onto a single physical card.
No changes to your container image, entrypoint, or business logic. The application is completely unaware.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama-inference
spec:
  selector:
    matchLabels:
      app: ollama-inference
  template:
    metadata:
      labels:
        app: ollama-inference
      annotations:
        castops.io/optimize: "true"   # ← opt-in
    spec:
      containers:
        - name: ollama
          image: ollama/ollama
          resources:
            limits:
              # CastSlice rewrites this ↓
              nvidia.com/gpu: 1
```
CastSlice is actively developed. Here's what's planned.
Static slicing: rewrites nvidia.com/gpu to nvidia.com/gpu-shared on annotated Pods.
Dynamic ratios based on workload type — training, inference, batch, dev — or an explicit slice-ratio override.
Live GPU utilization metrics and a "dollars saved" counter.
Planned: Namespace-level and label-based slicing rules without annotation changes.
Deploy CastSlice in under five minutes with a single kubectl command.