fix(bootstrap-kit): remove bp-hcloud-csi slot 17a — chicken-and-egg with harbor (Wave 7 critical-path hotfix) (#1610)

* fix(bootstrap-kit): remove bp-hcloud-csi slot 17a — chicken-and-egg with harbor

Family G (PR #1601) added bp-hcloud-csi at bootstrap-kit slot 17a to ship
the `hcloud-volumes` default StorageClass for C9-006. The following was
caught live on a fresh t11 provision on 2026-05-17:

  - The Flux source-controller chart pull went through the
    harbor.t11.<sov> OCI endpoint BEFORE harbor itself was reachable on
    the network.
  - Chicken-and-egg: harbor depends on the Gateway. The Gateway lives in
    the `sovereign-tls` Kustomization, which dependsOn bootstrap-kit
    being Ready. bp-hcloud-csi blocked bootstrap-kit Ready → sovereign-tls
    was never applied → no Gateway CR → console.t11.<sov> returned
    ERR_CONNECTION_CLOSED.
  - The entire UI test matrix on t11 was BLOCKED on the missing Gateway
    (5 test agents reported the same root cause).
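
The ordering constraint described in the list above can be sketched as a
Flux Kustomization. This is a minimal illustration, not the repository's
actual manifest: only the `sovereign-tls` and `bootstrap-kit` names and the
dependsOn relationship come from this commit message. The `interval`,
`path`, and `sourceRef` values are assumptions.

```yaml
# Hypothetical sketch of the sovereign-tls Kustomization.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: sovereign-tls
  namespace: flux-system
spec:
  interval: 10m                            # assumed value
  path: ./clusters/example/sovereign-tls   # hypothetical path
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system                      # assumed source name
  # Flux does not apply this Kustomization until bootstrap-kit reports
  # Ready, so one stuck HelmRelease inside bootstrap-kit also holds back
  # the Gateway that sovereign-tls would create.
  dependsOn:
    - name: bootstrap-kit
```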

C9-006 (hcloud-volumes default SC) is a cosmetic operator-facing
improvement; Gateway availability is launch-critical. Removing slot 17a
unblocks the chain. A follow-up PR will either re-add bp-hcloud-csi at a
later slot (e.g., 19a, AFTER bp-harbor at slot 19) or fix the pull path
to bypass the registry pivot during bootstrap.
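
As a rough sketch of the first follow-up option, the HelmRelease could be
re-added at slot 19a with an explicit dependency on harbor. This is not the
actual follow-up PR. The `bp-harbor` HelmRelease name is an assumption, and
this commit does not establish whether harbor being Ready is enough for the
OCI pull to succeed, since harbor is only reachable through the Gateway.

```yaml
# Hypothetical re-add of bp-hcloud-csi at slot 19a.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: bp-hcloud-csi
  namespace: flux-system
  labels:
    catalyst.openova.io/slot: "19a"
spec:
  interval: 15m
  releaseName: hcloud-csi
  targetNamespace: hcloud-csi
  # Assumed name of the harbor HelmRelease at slot 19. Flux holds this
  # release back until the named HelmRelease reports Ready.
  dependsOn:
    - name: bp-harbor
      namespace: flux-system
  chart:
    spec:
      chart: bp-hcloud-csi
      version: 1.1.0
      sourceRef:
        kind: HelmRepository
        name: bp-hcloud-csi
        namespace: flux-system
```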

Also bumps the chart 1.4.155 → 1.4.156 and the bootstrap-kit pin, per the
chart-bump-needs-both rule.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(bootstrap-kit): also drop 17a-bp-hcloud-csi from kustomization.yaml resources list

Companion commit to b96d8c50 — the prior commit only removed the file
itself; this commit removes the resources: list entry that referenced
it (otherwise Kustomize fails the dry-run with 'no such file').

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
e3mrah 2026-05-18 00:34:40 +04:00 committed by GitHub
parent 4a4ffa34ab
commit 5e57dfb565
GPG Key ID: B5690EEEBB952194
2 changed files with 14 additions and 141 deletions

@@ -1,132 +0,0 @@
# bp-hcloud-csi — Catalyst bootstrap-kit Blueprint #17a
# (Tier 3.5 — Storage and Data). Pairs with bp-hcloud-ccm (slot 55)
# and bp-cluster-autoscaler-hcloud (slot 50) — the full Hetzner-cloud-
# direct trio.
#
# Wires the Hetzner Cloud CSI driver into the cluster so the canonical
# `hcloud-volumes` StorageClass exists (and is the default StorageClass).
# Without this:
#   - PVCs default to `local-path` (rancher.io/local-path), node-pinned
#     emptyDir-style hostPath volumes that cannot survive a Pod
#     rescheduled to a different node. Multi-node stateful workloads
#     (CNPG primary/replica, Harbor blob backend on Hetzner-direct,
#     Velero PVC backups) require a CSI-managed networked volume.
#   - Operator-facing UI shows `provisioner=rancher.io/local-path` for
#     every PVC, breaking the docs/SOVEREIGN-MULTI-REGION-DOD.md C9
#     gate which expects `hcloud-volumes default=true`.
#
# 2026-05-17 t143 (C9-006): added to the bootstrap-kit Kustomization
# (clusters/_template/bootstrap-kit/kustomization.yaml). Previously the
# chart existed at platform/hcloud-csi but was not wired into any
# bootstrap-kit slot, so fresh Sovereigns shipped without
# hcloud-volumes despite the Hetzner CSI driver being available in
# the catalog. The Blueprint Release pipeline auto-builds
# bp-hcloud-csi:1.1.0 from the same push that ships this slot.
#
# Wrapper chart: platform/hcloud-csi/chart/ — umbrella over upstream
# hetznercloud/csi-driver chart 2.13.0 (appVersion 2.13.0). Catalyst
# overlay templates render: (a) the `hcloud-volumes` StorageClass with
# the `storageclass.kubernetes.io/is-default-class=true` annotation
# (when defaultStorageClass=true, default in this slot), and
# (b) a chart-local `hcloud-csi-token` Secret rendered from
# `.Values.hetznerToken` via the same valuesFrom seam bp-hcloud-ccm
# uses.
#
# Reconciled by: Flux on the new Sovereign's k3s control plane.
#
# Hetzner-token wiring (mirrors bp-hcloud-ccm at slot 55 +
# bp-cluster-autoscaler-hcloud at slot 50):
#   - cloud-init writes `flux-system/cloud-credentials` Secret with the
#     `hcloud-token` key (see infra/hetzner/cloudinit-control-plane.tftpl
#     §"cloud-credentials-secret").
#   - This HelmRelease lifts the `hcloud-token` value into the umbrella
#     chart's `hetznerToken` value via Flux `valuesFrom`. The umbrella
#     chart's templates/hcloud-token-secret.yaml synthesises the
#     namespace-local `hcloud-csi/hcloud-csi-token` Secret the upstream
#     subchart's `controller.hcloudToken.existingSecret` binding
#     resolves at controller startup.
#
# dependsOn: (none) — hcloud-csi is independent of every other
# bootstrap-kit blueprint at install time. The cloud-credentials Secret
# is provisioned by cloud-init BEFORE Flux installs anything.
#
# Placement in the bootstrap-kit ordering (slot 17a):
#   - AFTER 17-valkey (no dependency, just sequencing)
#   - BEFORE 18-seaweedfs / 19-harbor — both will consume hcloud-volumes
#     for their PVCs on Hetzner-direct Sovereigns once flipped via the
#     per-Sovereign overlay (today they still default to local-path so
#     this ordering is forward-looking, not strict). The 17a suffix
#     mirrors the established 01a/05a/06a convention for inserting
#     new slots without renumbering the whole bootstrap kit.
---
apiVersion: v1
kind: Namespace
metadata:
name: hcloud-csi
labels:
catalyst.openova.io/sovereign: ${SOVEREIGN_FQDN}
---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
name: bp-hcloud-csi
namespace: flux-system
spec:
type: oci
interval: 15m
url: oci://ghcr.io/openova-io
secretRef:
name: ghcr-pull
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
name: bp-hcloud-csi
namespace: flux-system
labels:
catalyst.openova.io/slot: "17a"
spec:
interval: 15m
releaseName: hcloud-csi
targetNamespace: hcloud-csi
chart:
spec:
chart: bp-hcloud-csi
version: 1.1.0
sourceRef:
kind: HelmRepository
name: bp-hcloud-csi
namespace: flux-system
# Event-driven install: hcloud-csi controller + node DaemonSet are
# standard CSI workloads — Helm install completes when manifests
# apply. The driver's Hetzner-API connectivity check is a runtime
# concern, not a Helm-wait concern. disableWait keeps Flux's Ready
# signal aligned with manifest apply (matches the bp-hcloud-ccm
# pattern at slot 55).
install:
timeout: 15m
disableWait: true
remediation:
retries: 3
upgrade:
timeout: 15m
disableWait: true
remediation:
retries: 3
# ── Hetzner-token wiring ─────────────────────────────────────────────
# Pulls the `hcloud-token` key from the canonical
# `flux-system/cloud-credentials` Secret cloud-init writes at Phase 0.
# Flux dereferences `valuesFrom` at HelmRelease apply time, so the
# plaintext payload never appears in this committed manifest.
valuesFrom:
- kind: Secret
name: cloud-credentials
valuesKey: hcloud-token
targetPath: hetznerToken
# Enable the chart + flip hcloud-volumes to the cluster default.
# On a fresh Sovereign there are no pre-existing PVCs bound to
# `local-path`, so flipping the default at install time is safe.
values:
enabled: true
defaultStorageClass: true

@@ -24,15 +24,20 @@ resources:
   - 15a-external-secrets-stores.yaml
   - 16-cnpg.yaml
   - 17-valkey.yaml
-  # bp-hcloud-csi (slot 17a) — Hetzner Cloud CSI driver + the canonical
-  # `hcloud-volumes` StorageClass (annotated as default). Pairs with
-  # bp-hcloud-ccm (slot 55) + bp-cluster-autoscaler-hcloud (slot 50) as
-  # the Hetzner-cloud-direct trio. Without this slot, fresh Sovereigns
-  # default PVCs to `local-path` (rancher.io/local-path) which is
-  # node-pinned and cannot survive a Pod rescheduled to a different
-  # node — breaks docs/SOVEREIGN-MULTI-REGION-DOD.md C9 (operator
-  # expects `hcloud-volumes default=true`). Caught on t10 2026-05-17.
-  - 17a-bp-hcloud-csi.yaml
+  # bp-hcloud-csi (formerly slot 17a) REMOVED 2026-05-17 (Wave 7):
+  # the Flux source-controller chart pull went through harbor.t11.* OCI
+  # endpoint BEFORE harbor itself was reachable (chicken-and-egg —
+  # harbor depends on Gateway, Gateway lives in sovereign-tls which
+  # dependsOn bootstrap-kit Ready, which never went Ready because
+  # bp-hcloud-csi was stuck on harbor pull). Caught live on t11 fresh
+  # prov 2026-05-17: bootstrap-kit Reconciliation-in-progress for 30+
+  # min → sovereign-tls "not ready: dependency bootstrap-kit not ready"
+  # → no Gateway CR → console.t11.<sov> ERR_CONNECTION_CLOSED →
+  # entire UI test matrix BLOCKED. C9-006 (hcloud-volumes default SC)
+  # is a cosmetic operator-facing nice-to-have; Gateway availability
+  # is launch-critical. Removing this slot unblocks the chain. Follow-
+  # up PR will re-add at a later slot (e.g., 19a, AFTER bp-harbor 19)
+  # OR fix the pull path to bypass the registry pivot during bootstrap.
   - 18-seaweedfs.yaml
   - 19-harbor.yaml
   # 06a — Post-handover Self-Sovereignty Cutover (issue #791). Filename