Quick Start

Install CastSlice on your Kubernetes cluster and start sharing GPUs across AI workloads in under five minutes.

Prerequisites

Before installing CastSlice, make sure the following components are in place:

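Kubernetes ≥ 1.25

Any CNCF-conformant cluster: EKS, GKE, AKS, or on-premises.

cert-manager

Automatically injects TLS certificates into the webhook configuration.

Installation guide →

NVIDIA GPU Operator

Or an equivalent device plugin that exposes the nvidia.com/gpu resource.

kubectl

Requires cluster-admin permissions to apply manifests and inspect Pod status.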
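
The prerequisites above can be checked before installing. The following is a minimal sketch, not part of CastSlice: the function names version_at_least and preflight are hypothetical, the script assumes jq is available, and the cert-manager check will fail if you have not yet installed cert-manager (it is installed in the first step below).

```shell
#!/bin/sh
# Preflight sketch for the prerequisites above. Set KUBECTL to override the binary.

# version_at_least HAVE WANT: succeeds if HAVE >= WANT, comparing major.minor only.
# Accepts forms such as "v1.27.3" or "1.27+".
version_at_least() (
  have=${1#v}; want=${2#v}
  have_major=${have%%.*}; have_rest=${have#*.}; have_minor=${have_rest%%[!0-9]*}
  want_major=${want%%.*}; want_rest=${want#*.}; want_minor=${want_rest%%[!0-9]*}
  [ "$have_major" -gt "$want_major" ] ||
    { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }
)

# preflight: checks the server version, the cert-manager CRDs, and GPU resources.
preflight() (
  k=${KUBECTL:-kubectl}
  server=$($k version -o json | jq -r '.serverVersion.gitVersion') || exit 1
  version_at_least "$server" 1.25 ||
    { echo "Kubernetes $server is older than 1.25" >&2; exit 1; }
  $k get crd certificates.cert-manager.io >/dev/null ||
    { echo "cert-manager CRDs not found" >&2; exit 1; }
  gpus=$($k get nodes -o jsonpath='{.items[*].status.allocatable.nvidia\.com/gpu}')
  [ -n "$gpus" ] || { echo "no node exposes nvidia.com/gpu" >&2; exit 1; }
  echo "preflight OK"
)
```

Source the file and run preflight against your current kubectl context; it stops at the first failing check.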

ℹ️

No GPU yet? Use the local testing (no GPU) guide to validate the webhook's mutation logic.

Install CastSlice

Install cert-manager (if not already installed)

CastSlice uses cert-manager to automatically inject TLS certificates into the mutating webhook configuration.

bash
# Install cert-manager v1
$ kubectl apply -f https://github.com/cert-manager/cert-manager/releases/latest/download/cert-manager.yaml

# Wait for it to be ready
$ kubectl rollout status deployment/cert-manager -n cert-manager
deployment "cert-manager" successfully rolled out
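
The rollout check above covers only the cert-manager controller. The static cert-manager manifest also creates the cert-manager-cainjector and cert-manager-webhook Deployments, and certificate injection depends on them. A small sketch that waits for all three; the function name wait_for_cert_manager and the 180s timeout are illustrative choices:

```shell
#!/bin/sh
# Wait for every cert-manager Deployment, not just the controller.
# Set KUBECTL to override the binary.
wait_for_cert_manager() (
  k=${KUBECTL:-kubectl}
  for deploy in cert-manager cert-manager-cainjector cert-manager-webhook; do
    # Stop at the first Deployment that fails to become ready in time.
    $k rollout status "deployment/$deploy" -n cert-manager --timeout=180s || exit 1
  done
)
```

Call wait_for_cert_manager before applying the CastSlice manifest so the Certificate resource can be issued right away.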

Apply the CastSlice install manifest

The single-file install.yaml contains the Namespace, ServiceAccount, Deployment, Service, Certificate (cert-manager), and MutatingWebhookConfiguration.

bash
# Install latest release
$ kubectl apply -f https://github.com/castops/cast-slice/releases/latest/download/install.yaml

namespace/cast-slice created
serviceaccount/cast-slice created
deployment.apps/cast-slice created
service/cast-slice created
certificate.cert-manager.io/cast-slice-cert created
mutatingwebhookconfiguration.admissionregistration.k8s.io/cast-slice created

Verify the webhook Pod is running

bash
$ kubectl get pods -n cast-slice
NAME                          READY   STATUS    RESTARTS   AGE
cast-slice-7d8f9b5c4-xk2n9   1/1     Running   0          42s

$ kubectl rollout status deployment/cast-slice -n cast-slice
deployment "cast-slice" successfully rolled out

✅

1/1 Running means the cert-manager certificate has been issued, TLS has been injected into the webhook, and the /readyz readiness probe is passing.
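
To confirm the injection directly rather than inferring it from Pod readiness, you can check whether the webhook configuration carries a CA bundle. This sketch assumes the configuration is named cast-slice, as in the install output above; the function name webhook_ca_injected is hypothetical.

```shell
#!/bin/sh
# Succeeds if the MutatingWebhookConfiguration has a non-empty caBundle.
# Set KUBECTL to override the binary.
webhook_ca_injected() (
  k=${KUBECTL:-kubectl}
  ca=$($k get mutatingwebhookconfiguration cast-slice \
        -o jsonpath='{.webhooks[*].clientConfig.caBundle}') || exit 1
  [ -n "$ca" ]
)
```

For example: webhook_ca_injected && echo "CA bundle present". An empty bundle usually means the certificate has not been issued yet.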

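Enable GPU slicing for a workload

Add the opt-in annotation to a Pod or Deployment

Add castops.io/optimize: "true" to the Pod's metadata.annotations, and optionally set castops.io/workload-type to control the number of slices. For a Deployment, put the annotations on the Pod template (spec.template.metadata.annotations), not on the Deployment's own metadata.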

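deployment.yaml (inference workload, slice ratio: 2)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-ai-workload
spec:
  selector:
    matchLabels:
      app: my-ai-workload   # illustrative label; apps/v1 requires a selector
  template:
    metadata:
      labels:
        app: my-ai-workload # must match spec.selector
      annotations:
        castops.io/optimize: "true"           # ← required: enables optimization
        castops.io/workload-type: "inference" # ← optional: training/inference/batch/dev
    spec:
      containers:
      - name: inference
        image: ollama/ollama
        resources:
          limits:
            nvidia.com/gpu: 1   # CastSlice rewrites this to gpu-shared: 2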

Or use an explicit ratio for fine-grained control:

Explicit ratio override
annotations:
  castops.io/optimize:    "true"
  castops.io/slice-ratio: "8"   # explicit override → gpu-shared: 8
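
For a Deployment that already exists, the same Pod-template annotations can be added with a JSON merge patch instead of editing the manifest. A minimal sketch; the function name enable_slicing is hypothetical, and my-ai-workload is the example Deployment from above. Patching the Pod template triggers a rollout, so the replacement Pods pass through the webhook.

```shell
#!/bin/sh
# enable_slicing DEPLOYMENT [WORKLOAD_TYPE]
# Adds castops.io/optimize (and optionally castops.io/workload-type) to the
# Pod template annotations. Set KUBECTL to override the binary.
enable_slicing() (
  k=${KUBECTL:-kubectl}
  deployment=$1
  annotations='"castops.io/optimize":"true"'
  if [ -n "${2:-}" ]; then
    annotations="$annotations,\"castops.io/workload-type\":\"$2\""
  fi
  # Merge patch touches only spec.template.metadata.annotations.
  patch=$(printf '{"spec":{"template":{"metadata":{"annotations":{%s}}}}}' "$annotations")
  $k patch deployment "$deployment" --type merge -p "$patch"
)
```

For example: enable_slicing my-ai-workload inference.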

Verify the mutation took effect

After the Pod is created, inspect its actual resource spec to confirm the rewrite:

bash
$ kubectl get pod my-ai-workload-xxx -o jsonpath='{.spec.containers[0].resources}' | jq .
{
  "limits": {
    "nvidia.com/gpu-shared": "2"   ← inference workload (ratio 2); dev=1, training=4
  }
}

✓ nvidia.com/gpu has been rewritten to nvidia.com/gpu-shared at the correct ratio. GPU slicing is active.
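
The check can be scripted by comparing the Pod's actual limit with the value the annotations should produce. This sketch encodes only the ratios stated on this page (dev=1, inference=2, training=4, plus the explicit slice-ratio override) and assumes a request of one GPU. The ratio for batch is not stated here, so the helper reports it as unknown rather than guessing. Both function names are hypothetical.

```shell
#!/bin/sh
# expected_gpu_shared WORKLOAD_TYPE [SLICE_RATIO]
# Prints the expected nvidia.com/gpu-shared limit for a container requesting 1 GPU.
expected_gpu_shared() (
  workload_type=$1
  slice_ratio=${2:-}
  # An explicit castops.io/slice-ratio overrides the workload type.
  if [ -n "$slice_ratio" ]; then echo "$slice_ratio"; exit 0; fi
  case "$workload_type" in
    dev)       echo 1 ;;
    inference) echo 2 ;;
    training)  echo 4 ;;
    *) echo "ratio not documented for workload type: $workload_type" >&2; exit 1 ;;
  esac
)

# pod_gpu_shared POD: prints the first container's nvidia.com/gpu-shared limit.
pod_gpu_shared() (
  k=${KUBECTL:-kubectl}
  $k get pod "$1" \
    -o jsonpath='{.spec.containers[0].resources.limits.nvidia\.com/gpu-shared}'
)
```

For example: [ "$(pod_gpu_shared my-ai-workload-xxx)" = "$(expected_gpu_shared inference)" ] && echo "mutation OK".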

Uninstall

Remove CastSlice from the cluster:

bash
$ kubectl delete -f https://github.com/castops/cast-slice/releases/latest/download/install.yaml

⚠️

This deletes the MutatingWebhookConfiguration, so Pods created afterwards are no longer mutated. Running Pods are unaffected because admission happens at creation time.
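
To confirm the webhook is actually gone after the delete, a short check such as the following may help. It assumes the configuration name cast-slice from the install output; the function name webhook_removed is hypothetical.

```shell
#!/bin/sh
# Succeeds if no MutatingWebhookConfiguration named cast-slice remains.
# Set KUBECTL to override the binary.
webhook_removed() (
  k=${KUBECTL:-kubectl}
  found=$($k get mutatingwebhookconfiguration cast-slice \
           --ignore-not-found -o name) || exit 1
  [ -z "$found" ]
)
```

For example: webhook_removed && echo "CastSlice webhook removed".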