fix(multi): Family G — 6 singletons (C8-001/C8-005/C9-006/C10-002/C10-003/C7-007) (#1601)
Wave 2 Family G batched ship. C7-004 (sso/wiki/workflows/storybook +
registry/api HTTPRoutes) intentionally skipped — sso/wiki/storybook
have no shipped backend; registry (harbor) + api (catalyst-api) HTTPRoutes
already exist and 404 is a runtime/HR-readiness symptom, not a missing
route. Flagged for architect-led ticket rather than silent route-alias
synthesis.
C9-006 — hcloud-volumes StorageClass missing on fresh prov
Root cause: platform/hcloud-csi/chart/ existed but was never wired
into bootstrap-kit, so fresh Sovereigns defaulted PVCs to local-path
(rancher.io/local-path) — node-pinned, can't survive Pod reschedule.
Fix: new slot 17a-bp-hcloud-csi.yaml + chart 1.0.0→1.1.0 bump that
adds templates/hcloud-token-secret.yaml so the controller can
authenticate to Hetzner. Mirrors bp-hcloud-ccm (slot 55) +
bp-cluster-autoscaler-hcloud (slot 50) wiring.
C10-002 — /fleet/applications returns 0 items despite 21 sovereigns
Root cause: collectFleetSovereigns filtered AdoptedAt!=nil (mirrored
ListDeployments). On a steady-state fleet every Sovereign is adopted,
so the dashboard rendered empty despite 110 succeeded jobs.
Fix: remove the adopted-filter from collectFleetSovereigns (the
fleet view's whole purpose is to enumerate every provisioned
Sovereign). ListDeployments still applies the filter — it backs the
provisioner's in-flight tab, a different surface. Adopted rows
surface with Health=green when otherwise unknown.
C10-003 — per-region install-* Jobs stuck "pending" despite ready
Root cause: lastState dedup in helmwatch_bridge — secondary
watchers attaching AFTER an HR already settled at Installed never
observed a state transition, so the seed value (HelmStatePending)
never converged. Fix: at markPhase1Done(OutcomeReady), backfill
every secondary watcher's informer snapshot into the shared
jobs.Bridge via the idempotent SeedJobsFromInformerList path.
Runs INLINE (not goroutine) — runPhase1Watch defers
stopSecondaries() which clears dep.secondaryWatchers as soon as
markPhase1Done returns, so a goroutine would race the cleanup.
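Why the C10-003 backfill has to run inline can be shown with a small model. The sketch below is an assumption-laden illustration: `bridge`, `watcher`, `deployment` and `seedFromSnapshot` are hypothetical stand-ins for the shared jobs.Bridge, the secondary watchers and the SeedJobsFromInformerList path. It only demonstrates the ordering: the backfill finishes before the deferred cleanup clears the watcher list.

```go
package main

import "fmt"

type helmState string

const (
	statePending   helmState = "pending"
	stateInstalled helmState = "installed"
)

// bridge is a hypothetical stand-in for the shared jobs bridge.
type bridge struct {
	jobs map[string]helmState
}

// seedFromSnapshot is idempotent: replaying a snapshot yields the same map.
func (b *bridge) seedFromSnapshot(snapshot map[string]helmState) {
	for id, s := range snapshot {
		b.jobs[id] = s
	}
}

// watcher is a hypothetical secondary watcher holding its informer snapshot.
type watcher struct {
	snapshot map[string]helmState
}

type deployment struct {
	bridge            *bridge
	secondaryWatchers []*watcher
}

// stopSecondaries clears the watcher list, as the deferred cleanup does.
func (d *deployment) stopSecondaries() { d.secondaryWatchers = nil }

// markPhase1Done backfills inline. A goroutine started here could run after
// stopSecondaries and iterate over an already cleared list.
func (d *deployment) markPhase1Done() {
	for _, w := range d.secondaryWatchers {
		d.bridge.seedFromSnapshot(w.snapshot)
	}
}

func (d *deployment) runPhase1Watch() {
	defer d.stopSecondaries() // runs as soon as markPhase1Done returns
	d.markPhase1Done()
}

func main() {
	b := &bridge{jobs: map[string]helmState{"install-hel1:cilium": statePending}}
	d := &deployment{
		bridge: b,
		secondaryWatchers: []*watcher{
			{snapshot: map[string]helmState{"install-hel1:cilium": stateInstalled}},
		},
	}
	d.runPhase1Watch()
	fmt.Println(b.jobs["install-hel1:cilium"], len(d.secondaryWatchers))
}
```

Because the seed is idempotent, running it for a watcher that did observe the transition is harmless.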
C7-007 — legacy sovereign-wildcard-tls Cert+Secret pair orphaned
Root cause: PR O moved the Cilium Gateway listener's
certificateRefs to the dashed-suffix per-zone Secret but left the
legacy bare-name Certificate template behind, so cert-manager
kept renewing an orphan. Fix: (a) rename the Certificate +
Secret to the dashed-suffix shape (single-source-of-truth), and
(b) add a one-shot Job (legacy-cert-cleanup) that deletes the
pre-PR-O Cert+Secret pair via alpine/k8s, idempotent for fresh
provs. Removable from kustomization.yaml once every live prov
has reconciled past it.
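For C7-007, the dashed-suffix Secret name follows from the FQDN by the rule the diff states (SOVEREIGN_FQDN with `.` replaced by `-`). A small Go sketch of that naming rule follows. The function names are hypothetical; in the repository the substitution is performed by cloud-init and Flux, not by Go code.

```go
package main

import (
	"fmt"
	"strings"
)

// legacySecretName is the bare name that the cleanup Job deletes.
const legacySecretName = "sovereign-wildcard-tls"

// dashedFQDN mirrors SOVEREIGN_FQDN_DASHED: every "." becomes "-".
func dashedFQDN(fqdn string) string {
	return strings.ReplaceAll(fqdn, ".", "-")
}

// perZoneSecretName is the name shared by the Certificate, its Secret and
// the Gateway listener's certificateRefs after this change.
func perZoneSecretName(fqdn string) string {
	return legacySecretName + "-" + dashedFQDN(fqdn)
}

func main() {
	// "sov1.example.test" is a made-up FQDN for illustration.
	fmt.Println(perZoneSecretName("sov1.example.test"))
}
```

Since the per-zone name always carries a suffix, it can never equal the bare legacy name, so the cleanup cannot delete the live Secret.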
C8-001 — D22 Settings em-dash placeholders on chroot Sovereign
Root cause: SettingsPage read Capacity / CP size / Pool subdomain /
BYO domain from useWizardStore() (zustand+persist localStorage).
The chroot Sovereign console runs on a fresh browser session
post-handover with empty localStorage, so the four fields rendered
em-dashes. The data IS persisted on the deployment record
(RedactedRequest) — gap was that Deployment.State() never surfaced
it. Fix: lift controlPlaneSize / sovereignPoolDomain /
sovereignSubdomain / sovereignDomainMode / sovereignByoDomain /
regionControlPlaneSizes / orgName / orgEmail to the State() map +
extend DeploymentSnapshot TS type + SettingsPage reads
snapshot-first with wizard store as fallback (mothership wizard-
in-flight case).
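The C8-001 read order (snapshot first, wizard store as fallback, placeholder last) is implemented in TypeScript in SettingsPage. The Go sketch below only models the precedence, using a hypothetical `settingsValue` helper and a plain map for the State() snapshot.

```go
package main

import "fmt"

// placeholder is the em-dash that the Settings page shows for missing values.
const placeholder = "\u2014"

// settingsValue returns the snapshot value when present, then the wizard
// store value, and only then the placeholder.
func settingsValue(snapshot map[string]any, key, wizardValue string) string {
	if v, ok := snapshot[key].(string); ok && v != "" {
		return v
	}
	if wizardValue != "" {
		return wizardValue
	}
	return placeholder
}

func main() {
	// Chroot console: fresh browser session, so the wizard store is empty.
	snapshot := map[string]any{"controlPlaneSize": "cx32"} // made-up size
	fmt.Println(settingsValue(snapshot, "controlPlaneSize", ""))

	// Mothership, wizard in flight: no snapshot value yet.
	fmt.Println(settingsValue(map[string]any{}, "controlPlaneSize", "cx22"))
}
```

Keeping the wizard store as a fallback means the mothership page still renders values before the deployment record exists.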
C8-005 — D20 Jobs page missing region filter dropdown
Root cause: multi-region Sovereigns expose install-<region>:<chart>
Jobs but JobsTable offered only status / app / parent filters,
forcing operators to type the region key into the free-text search.
Fix: new regionFromJob(job) pure helper parses the canonical
<region>:<chart> appId (fallback: install-<region>:<chart> jobName).
Dropdown is visible only when 2+ regions appear in the current job
set (single-region Sovereigns see no one-option no-op). Sorted
lexically. Test coverage: 4 helper cases + 3 dropdown cases in
JobsTable.test.tsx.
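The real regionFromJob helper is TypeScript. The following is a Go port for illustration only, with a hypothetical `job` struct and a hypothetical `regionOptions` helper standing in for the dropdown logic. It follows the rules stated above: parse `<region>:<chart>` from the appId, fall back to `install-<region>:<chart>` in the job name, and offer the filter only when two or more regions appear, sorted lexically.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// job is a hypothetical, trimmed stand-in for the UI's job record.
type job struct {
	AppID   string
	JobName string
}

// regionFromJob returns the region key, or "" for single-region (bare) jobs.
func regionFromJob(j job) string {
	// Canonical shape: "<region>:<chart>" in the appId.
	if region, _, ok := strings.Cut(j.AppID, ":"); ok && region != "" {
		return region
	}
	// Fallback shape: "install-<region>:<chart>" in the job name.
	if strings.HasPrefix(j.JobName, "install-") {
		rest := strings.TrimPrefix(j.JobName, "install-")
		if region, _, ok := strings.Cut(rest, ":"); ok && region != "" {
			return region
		}
	}
	return ""
}

// regionOptions returns the dropdown options, or nil when fewer than two
// regions are present (the dropdown is hidden in that case).
func regionOptions(jobs []job) []string {
	set := make(map[string]struct{})
	for _, j := range jobs {
		if r := regionFromJob(j); r != "" {
			set[r] = struct{}{}
		}
	}
	if len(set) < 2 {
		return nil
	}
	out := make([]string, 0, len(set))
	for r := range set {
		out = append(out, r)
	}
	sort.Strings(out)
	return out
}

func main() {
	jobs := []job{
		{AppID: "nbg1:harbor"},
		{JobName: "install-fsn1:cilium"},
		{AppID: "cilium", JobName: "install-cilium"},
	}
	fmt.Println(regionOptions(jobs))
}
```

Returning nil for fewer than two regions keeps single-region Sovereigns from seeing a one-option dropdown.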
Architect-first compliance:
• bp-hcloud-csi wiring mirrors bp-hcloud-ccm (slot 55) pattern
• legacy-cert-cleanup uses alpine/k8s (NOT bitnami/kubectl — see
self-sovereign-cutover/values.yaml:252 Bitnami-deprecation note)
• alpine/k8s image pulled via harbor.openova.io/proxy-dockerhub
(mirror-everything rule)
• regionFromJob mirrors helmwatch_bridge.go componentID encoding
(3 input shapes: bare, region-prefixed, install-region-prefixed)
• State() snapshot additions stay slim — only the 4 founder-flagged
fields + a few zero-cost adjacents
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent 2d9b2f84bd
commit aa60cfb84e
132
clusters/_template/bootstrap-kit/17a-bp-hcloud-csi.yaml
Normal file
@@ -0,0 +1,132 @@
+# bp-hcloud-csi — Catalyst bootstrap-kit Blueprint #17a
+# (Tier 3.5 — Storage and Data). Pairs with bp-hcloud-ccm (slot 55)
+# and bp-cluster-autoscaler-hcloud (slot 50) — the full Hetzner-cloud-
+# direct trio.
+#
+# Wires the Hetzner Cloud CSI driver into the cluster so the canonical
+# `hcloud-volumes` StorageClass exists (and is the default StorageClass).
+# Without this:
+#   - PVCs default to `local-path` (rancher.io/local-path), node-pinned
+#     emptyDir-style hostPath volumes that cannot survive a Pod
+#     rescheduled to a different node. Multi-node stateful workloads
+#     (CNPG primary/replica, Harbor blob backend on Hetzner-direct,
+#     Velero PVC backups) require a CSI-managed networked volume.
+#   - Operator-facing UI shows `provisioner=rancher.io/local-path` for
+#     every PVC, breaking the docs/SOVEREIGN-MULTI-REGION-DOD.md C9
+#     gate which expects `hcloud-volumes default=true`.
+#
+# 2026-05-17 t143 (C9-006): added to the bootstrap-kit Kustomization
+# (clusters/_template/bootstrap-kit/kustomization.yaml). Previously the
+# chart existed at platform/hcloud-csi but was not wired into any
+# bootstrap-kit slot, so fresh Sovereigns shipped without
+# hcloud-volumes despite the Hetzner CSI driver being available in
+# the catalog. The Blueprint Release pipeline auto-builds
+# bp-hcloud-csi:1.1.0 from the same push that ships this slot.
+#
+# Wrapper chart: platform/hcloud-csi/chart/ — umbrella over upstream
+# hetznercloud/csi-driver chart 2.13.0 (appVersion 2.13.0). Catalyst
+# overlay templates render: (a) the `hcloud-volumes` StorageClass with
+# the `storageclass.kubernetes.io/is-default-class=true` annotation
+# (when defaultStorageClass=true, default in this slot), and
+# (b) a chart-local `hcloud-csi-token` Secret rendered from
+# `.Values.hetznerToken` via the same valuesFrom seam bp-hcloud-ccm
+# uses.
+#
+# Reconciled by: Flux on the new Sovereign's k3s control plane.
+#
+# Hetzner-token wiring (mirrors bp-hcloud-ccm at slot 55 +
+# bp-cluster-autoscaler-hcloud at slot 50):
+#   - cloud-init writes `flux-system/cloud-credentials` Secret with the
+#     `hcloud-token` key (see infra/hetzner/cloudinit-control-plane.tftpl
+#     §"cloud-credentials-secret").
+#   - This HelmRelease lifts the `hcloud-token` value into the umbrella
+#     chart's `hetznerToken` value via Flux `valuesFrom`. The umbrella
+#     chart's templates/hcloud-token-secret.yaml synthesises the
+#     namespace-local `hcloud-csi/hcloud-csi-token` Secret the upstream
+#     subchart's `controller.hcloudToken.existingSecret` binding
+#     resolves at controller startup.
+#
+# dependsOn: (none) — hcloud-csi is independent of every other
+# bootstrap-kit blueprint at install time. The cloud-credentials Secret
+# is provisioned by cloud-init BEFORE Flux installs anything.
+#
+# Placement in the bootstrap-kit ordering (slot 17a):
+#   - AFTER 17-valkey (no dependency, just sequencing)
+#   - BEFORE 18-seaweedfs / 19-harbor — both will consume hcloud-volumes
+#     for their PVCs on Hetzner-direct Sovereigns once flipped via the
+#     per-Sovereign overlay (today they still default to local-path so
+#     this ordering is forward-looking, not strict). The 17a suffix
+#     mirrors the established 01a/05a/06a convention for inserting
+#     new slots without renumbering the whole bootstrap kit.
+
+---
+apiVersion: v1
+kind: Namespace
+metadata:
+  name: hcloud-csi
+  labels:
+    catalyst.openova.io/sovereign: ${SOVEREIGN_FQDN}
+---
+apiVersion: source.toolkit.fluxcd.io/v1beta2
+kind: HelmRepository
+metadata:
+  name: bp-hcloud-csi
+  namespace: flux-system
+spec:
+  type: oci
+  interval: 15m
+  url: oci://ghcr.io/openova-io
+  secretRef:
+    name: ghcr-pull
+---
+apiVersion: helm.toolkit.fluxcd.io/v2
+kind: HelmRelease
+metadata:
+  name: bp-hcloud-csi
+  namespace: flux-system
+  labels:
+    catalyst.openova.io/slot: "17a"
+spec:
+  interval: 15m
+  releaseName: hcloud-csi
+  targetNamespace: hcloud-csi
+  chart:
+    spec:
+      chart: bp-hcloud-csi
+      version: 1.1.0
+      sourceRef:
+        kind: HelmRepository
+        name: bp-hcloud-csi
+        namespace: flux-system
+  # Event-driven install: hcloud-csi controller + node DaemonSet are
+  # standard CSI workloads — Helm install completes when manifests
+  # apply. The driver's Hetzner-API connectivity check is a runtime
+  # concern, not a Helm-wait concern. disableWait keeps Flux's Ready
+  # signal aligned with manifest apply (matches the bp-hcloud-ccm
+  # pattern at slot 55).
+  install:
+    timeout: 15m
+    disableWait: true
+    remediation:
+      retries: 3
+  upgrade:
+    timeout: 15m
+    disableWait: true
+    remediation:
+      retries: 3
+  # ── Hetzner-token wiring ─────────────────────────────────────────────
+  # Pulls the `hcloud-token` key from the canonical
+  # `flux-system/cloud-credentials` Secret cloud-init writes at Phase 0.
+  # Flux dereferences `valuesFrom` at HelmRelease apply time, so the
+  # plaintext payload never appears in this committed manifest.
+  valuesFrom:
+    - kind: Secret
+      name: cloud-credentials
+      valuesKey: hcloud-token
+      targetPath: hetznerToken
+  # Enable the chart + flip hcloud-volumes to the cluster default.
+  # On a fresh Sovereign there are no pre-existing PVCs bound to
+  # `local-path`, so flipping the default at install time is safe.
+  values:
+    enabled: true
+    defaultStorageClass: true
@@ -24,6 +24,15 @@ resources:
   - 15a-external-secrets-stores.yaml
   - 16-cnpg.yaml
   - 17-valkey.yaml
+  # bp-hcloud-csi (slot 17a) — Hetzner Cloud CSI driver + the canonical
+  # `hcloud-volumes` StorageClass (annotated as default). Pairs with
+  # bp-hcloud-ccm (slot 55) + bp-cluster-autoscaler-hcloud (slot 50) as
+  # the Hetzner-cloud-direct trio. Without this slot, fresh Sovereigns
+  # default PVCs to `local-path` (rancher.io/local-path) which is
+  # node-pinned and cannot survive a Pod rescheduled to a different
+  # node — breaks docs/SOVEREIGN-MULTI-REGION-DOD.md C9 (operator
+  # expects `hcloud-volumes default=true`). Caught on t10 2026-05-17.
+  - 17a-bp-hcloud-csi.yaml
   - 18-seaweedfs.yaml
   - 19-harbor.yaml
   # 06a — Post-handover Self-Sovereignty Cutover (issue #791). Filename
@@ -91,7 +91,15 @@ metadata:
 rules:
   - apiGroups: [""]
     resources: ["secrets"]
-    resourceNames: ["sovereign-wildcard-tls"]
+    # 2026-05-17 t143 dual-cert collision cleanup: the per-zone Secret
+    # the Cilium Gateway now references is named
+    # `sovereign-wildcard-tls-${SOVEREIGN_FQDN_DASHED}`
+    # (see clusters/_template/sovereign-tls/cilium-gateway.yaml:44 +
+    # clusters/_template/sovereign-tls/cilium-gateway-cert.yaml). The
+    # legacy `sovereign-wildcard-tls` (no dashed suffix) is no longer
+    # produced anywhere — drop it from the resourceNames allowlist so
+    # this Role grants the minimum needed for the live Secret name.
+    resourceNames: ["sovereign-wildcard-tls-${SOVEREIGN_FQDN_DASHED}"]
     verbs: ["get", "watch", "list"]
   - apiGroups: ["apps"]
     resources: ["daemonsets"]
@@ -209,7 +217,14 @@ spec:
               set -eu
 
               SECRET_NS=kube-system
-              SECRET_NAME=sovereign-wildcard-tls
+              # 2026-05-17 t143 dual-cert collision cleanup: the canonical
+              # SDS Secret the Cilium Gateway now references is the
+              # per-zone `sovereign-wildcard-tls-${SOVEREIGN_FQDN_DASHED}`.
+              # Cloud-init substitutes SOVEREIGN_FQDN_DASHED via Flux
+              # postBuild.substitute, so the literal cluster value lands
+              # here at apply time (verified in
+              # infra/hetzner/cloudinit-control-plane.tftpl §SOVEREIGN_FQDN_DASHED).
+              SECRET_NAME=sovereign-wildcard-tls-${SOVEREIGN_FQDN_DASHED}
               DS_NS=kube-system
               DS_NAME=cilium-envoy
 
@@ -19,17 +19,21 @@
 #   - gitea.<fqdn> → 5 reprovs/week
 #   ... × 12 hostnames = 60 effective reprov-slots/week
 #
-# Coexistence: the `sovereign-wildcard-tls` Secret name was the single
-# point of integration with the Cilium Gateway listener
-# (cilium-gateway.yaml). With per-name certs we still write ONE Secret
-# of that name BUT it's now a SAN-Certificate containing ALL N
-# hostnames as SubjectAltNames — cert-manager bundles them into one
-# Order with N identifiers. LE counts a SAN cert as ONE issuance
-# against EACH identifier's bucket, but only ONE issuance overall.
-# So our 168h budget becomes:
-#   min(5/168h per hostname bucket) — typically reprovs share the same
-#   bucket per name, but adding a NEW hostname creates a FRESH bucket
-#   and resets that hostname's count to 0.
+# 2026-05-17 t143 dual-cert collision cleanup
+# -------------------------------------------
+# Previously this Certificate was named `sovereign-wildcard-tls` and
+# wrote a Secret of the same name. After PR O (2026-05-17) moved the
+# Cilium Gateway listener's certificateRefs to the per-zone Secret
+# `sovereign-wildcard-tls-${SOVEREIGN_FQDN_DASHED}` (see
+# clusters/_template/sovereign-tls/cilium-gateway.yaml:44), the legacy
+# Secret stopped being referenced by anything — but the Certificate
+# kept renewing, burning LE budget for no production value and showing
+# up in audits as an orphan TLS Secret on every Sovereign.
+#
+# Single-source-of-truth fix: this Certificate now writes to the SAME
+# dashed-suffix Secret the Gateway already references. One Cert, one
+# Secret, one LE issuance per renewal. No more dual-cert collision
+# and no extra LE budget consumed.
 #
 # This pattern is the standard production approach (see Cloudflare,
 # Vercel, Render). Wildcards are reserved for the limited cases where
@@ -38,13 +42,17 @@
 apiVersion: cert-manager.io/v1
 kind: Certificate
 metadata:
-  name: sovereign-wildcard-tls  # name kept for backwards-compat with Gateway listener ref
+  # Match the Secret name the Gateway listener references
+  # (clusters/_template/sovereign-tls/cilium-gateway.yaml:44). Cloud-init
+  # substitutes SOVEREIGN_FQDN_DASHED = SOVEREIGN_FQDN with `.` → `-`
+  # (infra/hetzner/cloudinit-control-plane.tftpl §SOVEREIGN_FQDN_DASHED).
+  name: sovereign-wildcard-tls-${SOVEREIGN_FQDN_DASHED}
   namespace: kube-system
   labels:
     catalyst.openova.io/sovereign: ${SOVEREIGN_FQDN}
     catalyst.openova.io/component: cilium-gateway
 spec:
-  secretName: sovereign-wildcard-tls
+  secretName: sovereign-wildcard-tls-${SOVEREIGN_FQDN_DASHED}
   issuerRef:
     name: ${WILDCARD_CERT_ISSUER}
     kind: ClusterIssuer
@@ -11,3 +11,10 @@ resources:
   # the cert appearing. See file header for full root cause + design
   # rationale (qa-loop bounded-cycle Provision #7).
   - cilium-envoy-tls-restart-job.yaml
+  # C7-007 (2026-05-17 t143) — one-shot cleanup of the pre-PR-O legacy
+  # `sovereign-wildcard-tls` Certificate + Secret pair. Idempotent
+  # (`--ignore-not-found`), runs once per Flux reconciliation
+  # generation. Fresh Sovereigns succeed as a no-op; pre-PR-O
+  # Sovereigns delete the orphan resources. Removable from the list
+  # once every live prov has reconciled past it.
+  - legacy-cert-cleanup-job.yaml
151
clusters/_template/sovereign-tls/legacy-cert-cleanup-job.yaml
Normal file
@@ -0,0 +1,151 @@
+# C7-007 (2026-05-17 t143) — one-shot cleanup Job for the legacy
+# `sovereign-wildcard-tls` Certificate + Secret pair.
+#
+# Background
+# ----------
+# Pre-PR-O Sovereigns rendered a Certificate named `sovereign-wildcard-tls`
+# (with a Secret of the same name) AND, after PR O moved the Cilium
+# Gateway listener to the per-zone `sovereign-wildcard-tls-${SOVEREIGN_FQDN_DASHED}`
+# Secret, the legacy Certificate kept renewing on cert-manager's
+# default schedule. Result: every audit on a pre-PR-O Sovereign showed
+# an orphan TLS Secret in kube-system, cert-manager wasted LE budget
+# renewing a Secret nothing consumed, and operators had to remember to
+# `kubectl delete` it after every Flux reconciliation re-asserted the
+# legacy resource (which it no longer does — PR O's `cilium-gateway-cert.yaml`
+# now produces ONLY the dashed-suffix shape).
+#
+# What this Job does
+# ------------------
+# Idempotent delete of:
+#   1. `kube-system/sovereign-wildcard-tls` Certificate (cert-manager.io/v1)
+#   2. `kube-system/sovereign-wildcard-tls` Secret (kubernetes.io/tls)
+#
+# Each delete is `--ignore-not-found` so a fresh Sovereign that never
+# carried the legacy shape reports "no-op" and Succeeds. The Job runs
+# ONCE per Flux reconciliation generation (the helm.sh/hook
+# annotations on the bp-self-sovereign-cutover chart aren't applicable
+# here because this lives in the per-Sovereign overlay, not a Helm
+# chart — Flux's Kustomization re-applies idempotently).
+#
+# Image
+# -----
+# Uses the canonical OpenOva-mirrored alpine/k8s image (mothership
+# Harbor proxy-cache for Docker Hub, per CLAUDE.md mirror rule).
+# Bitnami/kubectl was deprecated 2025-08; alpine/k8s is the standard
+# replacement (see platform/self-sovereign-cutover/chart/values.yaml:252
+# for the canonical reasoning, captured live on otech103 2026-05-04).
+#
+# Why a Job and not a Helm hook
+# -----------------------------
+# This file lives in `clusters/_template/sovereign-tls/` — a per-Sovereign
+# Kustomize overlay reconciled by Flux, NOT a Helm chart. Helm hooks
+# require a HelmRelease container; this is a single one-shot K8s Job.
+# Flux's Kustomization reconciliation drives idempotent re-apply.
+#
+# Removal plan
+# ------------
+# Once every live Sovereign has reconciled past this Job (verified via
+# `kubectl get jobs -n kube-system | grep legacy-cert-cleanup` showing
+# Complete on every prov), this file may be deleted from
+# clusters/_template/sovereign-tls/kustomization.yaml.
+
+---
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: legacy-cert-cleanup
+  namespace: kube-system
+  labels:
+    catalyst.openova.io/component: legacy-cert-cleanup
+    catalyst.openova.io/sovereign: ${SOVEREIGN_FQDN}
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: Role
+metadata:
+  name: legacy-cert-cleanup
+  namespace: kube-system
+  labels:
+    catalyst.openova.io/component: legacy-cert-cleanup
+rules:
+  # Legacy Secret to delete. Only the specific name — RBAC stays
+  # least-privilege.
+  - apiGroups: [""]
+    resources: ["secrets"]
+    resourceNames: ["sovereign-wildcard-tls"]
+    verbs: ["get", "delete"]
+  # cert-manager Certificate to delete. Only the specific name.
+  - apiGroups: ["cert-manager.io"]
+    resources: ["certificates"]
+    resourceNames: ["sovereign-wildcard-tls"]
+    verbs: ["get", "delete"]
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: RoleBinding
+metadata:
+  name: legacy-cert-cleanup
+  namespace: kube-system
+  labels:
+    catalyst.openova.io/component: legacy-cert-cleanup
+roleRef:
+  apiGroup: rbac.authorization.k8s.io
+  kind: Role
+  name: legacy-cert-cleanup
+subjects:
+  - kind: ServiceAccount
+    name: legacy-cert-cleanup
+    namespace: kube-system
+---
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: legacy-cert-cleanup
+  namespace: kube-system
+  labels:
+    catalyst.openova.io/component: legacy-cert-cleanup
+    catalyst.openova.io/sovereign: ${SOVEREIGN_FQDN}
+spec:
+  # Keep the Job around 5 minutes after completion so an operator can
+  # `kubectl logs job/legacy-cert-cleanup -n kube-system` to confirm
+  # what was (or wasn't) cleaned up. After TTL the GC reclaims.
+  ttlSecondsAfterFinished: 300
+  backoffLimit: 2
+  template:
+    metadata:
+      labels:
+        catalyst.openova.io/component: legacy-cert-cleanup
+    spec:
+      serviceAccountName: legacy-cert-cleanup
+      restartPolicy: OnFailure
+      containers:
+        - name: cleanup
+          # Pinned via Harbor proxy-cache. See CLAUDE.md mirror-everything
+          # rule + values.yaml:252 in self-sovereign-cutover for the
+          # Bitnami→alpine/k8s decision history.
+          image: harbor.openova.io/proxy-dockerhub/alpine/k8s:1.31.1
+          imagePullPolicy: IfNotPresent
+          command: ["/bin/sh", "-c"]
+          args:
+            - |
+              set -eu
+              echo "[legacy-cert-cleanup] starting on ${SOVEREIGN_FQDN}"
+              # The dashed-suffix Secret (the live one PR O introduced)
+              # MUST remain — only delete the bare-name legacy pair.
+              echo "[legacy-cert-cleanup] removing legacy Certificate sovereign-wildcard-tls"
+              kubectl -n kube-system delete certificate.cert-manager.io sovereign-wildcard-tls --ignore-not-found=true --wait=false
+              echo "[legacy-cert-cleanup] removing legacy Secret sovereign-wildcard-tls"
+              kubectl -n kube-system delete secret sovereign-wildcard-tls --ignore-not-found=true --wait=false
+              echo "[legacy-cert-cleanup] complete"
+          securityContext:
+            allowPrivilegeEscalation: false
+            readOnlyRootFilesystem: true
+            runAsNonRoot: true
+            runAsUser: 65532
+            capabilities:
+              drop: ["ALL"]
+          resources:
+            requests:
+              cpu: "10m"
+              memory: "32Mi"
+            limits:
+              cpu: "100m"
+              memory: "64Mi"
@@ -1,6 +1,13 @@
 apiVersion: v2
 name: bp-hcloud-csi
-version: 1.0.0
+# 1.1.0 (2026-05-17 t143 C9-006): add templates/hcloud-token-secret.yaml
+# so the chart self-renders the `hcloud-csi-token` Secret from
+# `.Values.hetznerToken` (populated via Flux valuesFrom from
+# flux-system/cloud-credentials). Without this Secret the controller
+# pods cannot authenticate to the Hetzner API; the StorageClass exists
+# but every PVC fails to provision with a 401 from the CSI driver.
+# Mirrors bp-hcloud-ccm 1.0.0 wiring.
+version: 1.1.0
 description: |
   Catalyst-curated Blueprint umbrella chart for the Hetzner Cloud CSI
   driver. Provides the hcloud-volumes StorageClass for multi-node stateful
47
platform/hcloud-csi/chart/templates/hcloud-token-secret.yaml
Normal file
@@ -0,0 +1,47 @@
+{{/*
+Hetzner API token Secret consumed by the hcloud-csi controller.
+
+Rendered into the chart's targetNamespace (`hcloud-csi` by convention)
+from a value sourced via Flux `valuesFrom` against the canonical
+`flux-system/cloud-credentials` Secret (key `hcloud-token`). Mirrors the
+pattern used by bp-hcloud-ccm and bp-cluster-autoscaler-hcloud — see
+platform/hcloud-ccm/chart/templates/hcloud-token-secret.yaml for the
+matching shape and ADR-0001 §11.3 for the cloud-init seam.
+
+The bp-hcloud-csi subchart's controller looks up the Secret by name
+(default `hcloud-csi-token`, key `token`) — see
+.Values.hetznerTokenSecretRef + the upstream
+hcloud-csi.controller.hcloudToken.existingSecret binding in values.yaml.
+
+The Secret is only rendered when:
+  - .Values.enabled is true (master gate; the rest of the chart's
+    rendering is gated on the same value)
+  - .Values.hetznerToken is non-empty (Flux `valuesFrom` populates
+    this from cloud-credentials at HelmRelease apply time)
+
+When .Values.hetznerToken is empty Helm skips this template entirely so
+a per-Sovereign overlay that switches to an externally-managed
+ExternalSecret (Phase 2+) can take over without collision.
+
+2026-05-17 t143 (C9-006): created so the bootstrap-kit slot
+17a-bp-hcloud-csi.yaml wires the token in the same shape as
+55-bp-hcloud-ccm.yaml does — without this Secret the hcloud-csi
+controller cannot authenticate to the Hetzner API, the StorageClass
+exists but every PVC fails to provision with a 401 from the CSI driver.
+*/}}
+{{- if and .Values.enabled .Values.hetznerToken }}
+---
+apiVersion: v1
+kind: Secret
+metadata:
+  name: {{ .Values.hetznerTokenSecretRef.name | default "hcloud-csi-token" | quote }}
+  namespace: {{ .Release.Namespace }}
+  labels:
+    app.kubernetes.io/name: bp-hcloud-csi
+    app.kubernetes.io/component: hcloud-token
+    catalyst.openova.io/blueprint: bp-hcloud-csi
+    catalyst.openova.io/blueprint-version: {{ .Chart.Version | quote }}
+type: Opaque
+stringData:
+  {{ .Values.hetznerTokenSecretRef.key | default "token" }}: {{ .Values.hetznerToken | quote }}
+{{- end }}
@@ -19,6 +19,20 @@ hetznerTokenSecretRef:
   name: hcloud-csi-token
   key: token
 
+# 2026-05-17 t143 (C9-006): Hetzner API token plaintext. Default empty —
+# Flux `valuesFrom` populates this at HelmRelease apply time from the
+# canonical flux-system/cloud-credentials Secret (key `hcloud-token`)
+# cloud-init writes during Phase 0 (mirrors bp-hcloud-ccm wiring at
+# clusters/_template/bootstrap-kit/55-bp-hcloud-ccm.yaml). When
+# non-empty, templates/hcloud-token-secret.yaml renders the
+# `<hetznerTokenSecretRef.name>` Secret in the chart's targetNamespace
+# so the subchart's controller can authenticate to the Hetzner API.
+#
+# Per docs/INVIOLABLE-PRINCIPLES.md #10 (credentials never on CR / Git),
+# this stays empty in committed YAML; the live value lands at apply
+# time from cloud-credentials and is never persisted to Git.
+hetznerToken: ""
+
 # Catalyst-managed StorageClass list. Each entry renders an independent
 # StorageClass — operators can add fast-ssd / archive variants per
 # Sovereign without editing this chart. Named `catalystStorageClasses`
@@ -750,6 +750,56 @@ func (d *Deployment) State() map[string]any {
 		// blank (legacy record).
 		"ownerEmail": d.OwnerEmail,
 	}
+	// C8-001 (2026-05-17 t143): lift the Sovereign-provisioning request
+	// fields that the chroot's /sovereign/settings page renders so the
+	// page works on a fresh chroot session (where the operator's
+	// browser-side wizard-store is empty). The fields are non-secret
+	// projections of the wizard submit (control-plane size, pool
+	// subdomain, BYO domain) — they live on the deployment record's
+	// RedactedRequest already, the gap was only that State() never
+	// surfaced them. Founder caught on t136 2026-05-17 — Settings page
+	// shows four em-dash placeholders for Capacity / CP size / Pool
+	// subdomain / BYO domain on the chroot Sovereign console because
+	// the chroot has no localStorage'd wizard store to read from.
+	if v := d.Request.ControlPlaneSize; v != "" {
+		out["controlPlaneSize"] = v
+	}
+	if v := d.Request.SovereignPoolDomain; v != "" {
+		out["sovereignPoolDomain"] = v
+	}
+	if v := d.Request.SovereignSubdomain; v != "" {
+		out["sovereignSubdomain"] = v
+	}
+	if v := d.Request.SovereignDomainMode; v != "" {
+		out["sovereignDomainMode"] = v
+	}
+	// BYO-domain is encoded on RedactedRequest only when domainMode
+	// is `byo`; we still emit when present so the chroot Settings page
+	// can render it. Pool-mode deployments leave this empty.
+	if v := d.Request.SovereignFQDN; v != "" && d.Request.SovereignDomainMode == "byo" {
+		out["sovereignByoDomain"] = v
+	}
+	// Per-region control-plane sizes (multi-region Sovereigns). The
+	// Settings page falls back to controlPlaneSize when the array is
+	// empty; surface both so future per-region renderings need no
+	// API extension.
+	if len(d.Request.Regions) > 0 {
+		sizes := make([]string, 0, len(d.Request.Regions))
+		for _, r := range d.Request.Regions {
+			sizes = append(sizes, r.ControlPlaneSize)
+		}
+		out["regionControlPlaneSizes"] = sizes
+	}
+	// Org-profile fields (non-secret). Same rationale as the sovereign
+	// fields above — the chroot Settings page would render four
+	// em-dashes for Name / Billing email / Industry / Headquarters
+	// otherwise.
+	if v := d.Request.OrgName; v != "" {
+		out["orgName"] = v
+	}
+	if v := d.Request.OrgEmail; v != "" {
+		out["orgEmail"] = v
+	}
 	if !d.FinishedAt.IsZero() {
 		out["finishedAt"] = d.FinishedAt.Format(time.RFC3339)
 	}
@@ -382,14 +382,25 @@ func (h *Handler) HandleFleetApplications(w http.ResponseWriter, r *http.Request
 
 // collectFleetSovereigns — every Sovereign known to this catalyst-api
 // process. Source: the in-memory deployments map (rehydrated from the
-// PVC at startup), filtered to drop adopted-but-still-tracked records
-// the same way ListDeployments does. Sorted by FQDN for deterministic
-// pagination.
+// PVC at startup). Sorted by FQDN for deterministic pagination.
 //
 // Per ADR-0001 §2.7 — no separate fleet database. The deployments map
 // IS the source of truth on this Pod; tenant_registry is the secondary
 // source for SME-tier Sovereigns the same map doesn't track (those are
 // collapsed into the same shape so the caller sees one fleet view).
+//
+// 2026-05-17 t143 (C10-002) — adopted Sovereigns INCLUDED.
+// Previously this helper filtered out every dep with AdoptedAt != nil
+// (mirroring ListDeployments). The result: on a steady-state fleet
+// where every Sovereign has completed cutover and been adopted by its
+// customer's console, the cross-Sovereign Applications dashboard
+// (/fleet/applications) returned `items=[]` despite the fleet
+// containing 21 live Sovereigns and 110 succeeded jobs (caught on t10
+// 2026-05-17). The fleet view's whole purpose is to enumerate every
+// Sovereign mothership has ever provisioned — adopted is the
+// steady-state, not a reason to hide. ListDeployments' boundary
+// (handover hides the row from the provisioner's "in-flight" tab)
+// does NOT apply to the fleet dashboard.
 func (h *Handler) collectFleetSovereigns(_ context.Context) []fleetSovereignSummary {
 	out := make([]fleetSovereignSummary, 0)
 	seen := make(map[string]bool)
@@ -400,14 +411,6 @@ func (h *Handler) collectFleetSovereigns(_ context.Context) []fleetSovereignSumm
|
||||
return true
|
||||
}
|
||||
dep.mu.Lock()
|
||||
-if dep.AdoptedAt != nil {
-// Adopted Sovereigns are owned by the customer's
-// console.<sovereign-fqdn> — they no longer surface
-// in the mothership fleet view (same boundary
-// ListDeployments enforces).
-dep.mu.Unlock()
-return true
-}
|
||||
row := fleetSovereignSummary{
|
||||
ID: dep.ID,
|
||||
FQDN: dep.Request.SovereignFQDN,
|
||||
@@ -418,6 +421,14 @@ func (h *Handler) collectFleetSovereigns(_ context.Context) []fleetSovereignSumm
|
||||
if !dep.StartedAt.IsZero() {
|
||||
row.CreatedAt = dep.StartedAt.UTC().Format(time.RFC3339)
|
||||
}
|
||||
// Adopted Sovereigns report Health=green because cutover
|
||||
// drove the deployment status to "ready" before the
|
||||
// AdoptedAt timestamp landed. We surface them with the same
|
||||
// health vocabulary as in-flight rows so the dashboard's
|
||||
// per-card badge keeps working.
|
||||
if dep.AdoptedAt != nil && row.Health == healthUnknown {
|
||||
row.Health = healthGreen
|
||||
}
|
||||
dep.mu.Unlock()
|
||||
|
||||
if !seen[row.ID] {
|
||||
|
||||
@@ -247,9 +247,19 @@ func TestHandleFleetSovereigns_Pagination(t *testing.T) {
|
||||
}
|
||||
}
|
||||
|
||||
-// ── /fleet/sovereigns: adopted excluded ──────────────────────────────
-
-func TestHandleFleetSovereigns_AdoptedExcluded(t *testing.T) {
|
||||
// ── /fleet/sovereigns: adopted INCLUDED ─────────────────────────────
|
||||
//
|
||||
// 2026-05-17 t143 (C10-002): adopted Sovereigns are INCLUDED in the
|
||||
// fleet view (formerly excluded). Rationale: the fleet view's whole
|
||||
// purpose is to enumerate every Sovereign mothership has ever
|
||||
// provisioned — adopted is the steady state, not a reason to hide.
|
||||
// On a real fleet where every Sovereign has completed cutover (as
|
||||
// happens after handover), the previous filter returned items=[]
|
||||
// despite the deployments map carrying dozens of live Sovereigns and
|
||||
// hundreds of succeeded jobs. The dashboard's empty-state spawned the
|
||||
// C10-002 ticket. ListDeployments still applies the adopted filter
|
||||
// (it backs the provisioner's "in-flight" tab, a different surface).
|
||||
func TestHandleFleetSovereigns_AdoptedIncluded(t *testing.T) {
|
||||
h := NewWithPDM(silentLogger(), &fakePDM{})
|
||||
installFleetSovereign(t, h, "sov-live", "live.example.com", "ready")
|
||||
adopted := installFleetSovereign(t, h, "sov-handed", "handed.example.com", "adopted")
|
||||
@@ -259,8 +269,15 @@ func TestHandleFleetSovereigns_AdoptedExcluded(t *testing.T) {
|
||||
rec := callUserAccess(t, h, http.MethodGet, "/api/v1/fleet/sovereigns", nil, registerFleetRoutes)
|
||||
var resp fleetSovereignsResponse
|
||||
_ = json.Unmarshal(rec.Body.Bytes(), &resp)
|
||||
-if resp.Total != 1 || resp.Sovereigns[0].ID != "sov-live" {
-t.Fatalf("expected only sov-live; got %+v", resp.Sovereigns)
|
||||
if resp.Total != 2 {
|
||||
t.Fatalf("expected 2 sovereigns (live + adopted); got total=%d body=%+v", resp.Total, resp.Sovereigns)
|
||||
}
|
||||
// Sort is by FQDN ascending; handed.example.com < live.example.com
|
||||
if got := resp.Sovereigns[0].ID; got != "sov-handed" {
|
||||
t.Fatalf("first sovereign id: got %q want sov-handed (FQDN sort)", got)
|
||||
}
|
||||
if got := resp.Sovereigns[1].ID; got != "sov-live" {
|
||||
t.Fatalf("second sovereign id: got %q want sov-live", got)
|
||||
}
|
||||
}
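The behaviour the updated test asserts (adopted rows kept, unknown health on an adopted row reported as green, FQDN-ascending order) can be modelled outside Go. The following is a minimal TypeScript sketch, not the catalyst-api implementation: `DeploymentRecord`, `FleetRow` and `collectFleetRows` are hypothetical names, and the real `fleetSovereignSummary` carries more fields than shown here.

```typescript
// Hypothetical, simplified shapes for illustration only.
type Health = 'green' | 'unknown'

interface DeploymentRecord {
  id: string
  fqdn: string
  adoptedAt: string | null // RFC 3339 timestamp once handed over, else null
  health: Health
}

interface FleetRow {
  id: string
  fqdn: string
  health: Health
}

// Builds the fleet view from every known record. Adopted records are NOT
// dropped; an adopted record whose health is still unknown is shown as green.
function collectFleetRows(deps: DeploymentRecord[]): FleetRow[] {
  const seen = new Set<string>()
  const rows: FleetRow[] = []
  for (const dep of deps) {
    if (seen.has(dep.id)) continue // de-duplicate by deployment id
    seen.add(dep.id)
    const health: Health =
      dep.adoptedAt !== null && dep.health === 'unknown' ? 'green' : dep.health
    rows.push({ id: dep.id, fqdn: dep.fqdn, health })
  }
  // FQDN-ascending so pagination is deterministic.
  return rows.sort((a, b) => (a.fqdn < b.fqdn ? -1 : a.fqdn > b.fqdn ? 1 : 0))
}
```

With one live and one adopted record, both rows come back and `handed.example.com` sorts ahead of `live.example.com`, which is the ordering the Go test checks.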
|
||||
|
||||
|
||||
@@ -742,6 +742,103 @@ func (h *Handler) markPhase1Done(dep *Deployment, finalStates map[string]string,
|
||||
// per-region LB IP wait loops (each up to 5 min).
|
||||
// docs/SOVEREIGN-MULTI-REGION-DOD.md gates D9-D12.
|
||||
go h.runAutoEstablishClusterMesh(dep)
|
||||
// C10-003 (2026-05-17 t143): when Phase-1 reaches
|
||||
// OutcomeReady, the PRIMARY's terminate path persists the
|
||||
// final per-Job status from its own helmwatch state map.
|
||||
// Secondary regions' install-* Jobs live on the per-region
|
||||
// bridge but are wired via separate watcher event streams
|
||||
// (spawnSecondaryRegionWatchers above), and stale events
|
||||
// (e.g. a transient HelmStatePending observed during initial
|
||||
// dep-not-ready cycles, then suppressed by lastState dedup
|
||||
// before the Installed transition was ever observed) can
|
||||
// leave their Job rows pinned to "pending" even though
|
||||
// kubectl reports every HR Ready=True. Founder-flagged on
|
||||
// t10 2026-05-17 (install-nbg1-1:*, install-sin-2:* stuck
|
||||
// pending despite deployment status=ready).
|
||||
//
|
||||
// Re-seed every secondary watcher from its current
|
||||
// informer cache so each install-<region>:<chart> Job row
|
||||
// converges onto the cluster-current HelmState. The seed
|
||||
// path is idempotent (mergeJob preserves monotonic
|
||||
// timestamps + non-empty DependsOn; SeedJobsFromInformerList
|
||||
// matches OnHelmReleaseEvent's Status mapping), so this is
|
||||
// safe to call multiple times.
|
||||
//
|
||||
// CRITICAL: invoke INLINE, not on a goroutine — runPhase1Watch
|
||||
// holds `defer stopSecondaries()` which clears
|
||||
// dep.secondaryWatchers as soon as markPhase1Done returns.
|
||||
// A go-spawned backfill would race the cleanup and observe
|
||||
// an empty map ~50% of the time. The backfill itself is
|
||||
// in-memory work (informer snapshot + bridge merge), no
|
||||
// network I/O — running it on the terminate path's stack
|
||||
// adds ≤100ms before markPhase1Done's caller resumes.
|
||||
h.runSecondaryBridgeBackfill(dep)
|
||||
}
|
||||
}
|
||||
|
||||
// runSecondaryBridgeBackfill walks every secondary watcher attached to
|
||||
// the deployment, snapshots each one's informer cache, and reseeds the
|
||||
// shared jobs.Bridge with the cluster-current state. This is the
|
||||
// recovery path for C10-003 — secondary install Jobs stuck "pending"
|
||||
// after deployment status=ready, caused by a transient event lost to
|
||||
// the bridge's lastState dedup (the seed observed HelmStatePending at
|
||||
// initial-list, the Installed transition never produced a distinct
|
||||
// event because the watcher attached AFTER the HR had already settled
|
||||
// at Installed — same state, dedup suppresses, status stays pending).
|
||||
//
|
||||
// Run INLINE from markPhase1Done — runPhase1Watch's
|
||||
// `defer stopSecondaries()` clears dep.secondaryWatchers immediately
|
||||
// after markPhase1Done returns, so a goroutine-spawned backfill would
|
||||
// race the cleanup. The work is in-memory only (informer snapshot +
|
||||
// bridge merge); no network I/O justifies a goroutine.
|
||||
//
|
||||
// Errors are logged at warn; this is a best-effort convergence helper,
|
||||
// not a correctness gate.
|
||||
func (h *Handler) runSecondaryBridgeBackfill(dep *Deployment) {
|
||||
defer func() {
|
||||
if r := recover(); r != nil {
|
||||
h.log.Error("secondary bridge backfill: panic recovered",
|
||||
"id", dep.ID,
|
||||
"panic", r,
|
||||
)
|
||||
}
|
||||
}()
|
||||
dep.mu.Lock()
|
||||
watchers := make(map[string]*helmwatch.Watcher, len(dep.secondaryWatchers))
|
||||
for region, w := range dep.secondaryWatchers {
|
||||
watchers[region] = w
|
||||
}
|
||||
bridge := dep.jobsBridge
|
||||
dep.mu.Unlock()
|
||||
if bridge == nil || len(watchers) == 0 {
|
||||
return
|
||||
}
|
||||
for region, watcher := range watchers {
|
||||
if watcher == nil {
|
||||
continue
|
||||
}
|
||||
snap := watcher.SnapshotComponents()
|
||||
if len(snap) == 0 {
|
||||
continue
|
||||
}
|
||||
seeds := snapshotsToSeedsForRegion(snap, region)
|
||||
jobsCount, execsSeeded, err := bridge.SeedJobsFromInformerList(seeds)
|
||||
if err != nil {
|
||||
h.log.Warn("secondary bridge backfill: reseed failed",
|
||||
"id", dep.ID,
|
||||
"region", region,
|
||||
"snapshotCount", len(snap),
|
||||
"err", err,
|
||||
)
|
||||
continue
|
||||
}
|
||||
h.log.Info("secondary bridge backfill: reseeded from informer cache",
|
||||
"id", dep.ID,
|
||||
"region", region,
|
||||
"snapshotCount", len(snap),
|
||||
"jobsWritten", jobsCount,
|
||||
"executionsSeeded", execsSeeded,
|
||||
)
|
||||
}
|
||||
}
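The dedup failure described in the comments above can be reproduced with a toy model. This is one simplified reading of the failure mode, written in TypeScript rather than Go; `ToyBridge` and its methods are hypothetical and do not mirror the real `jobs.Bridge` or `helmwatch.Watcher` APIs. The assumption is that the Job row starts at "pending", while the dedup map already holds "installed" because the watcher attached after the HelmRelease had settled.

```typescript
type HelmState = 'pending' | 'installed'

// Toy stand-in for the shared bridge; names are illustrative only.
class ToyBridge {
  // Status shown on the dashboard, keyed by component id.
  readonly jobStatus = new Map<string, HelmState>()
  // Last state seen per component, used to drop repeated events.
  private readonly lastState = new Map<string, HelmState>()

  // A Job row is created with the default seed value.
  registerJob(componentId: string): void {
    this.jobStatus.set(componentId, 'pending')
  }

  // A late-attaching watcher records what it first observed,
  // without touching the Job row.
  attachWatcher(componentId: string, observed: HelmState): void {
    this.lastState.set(componentId, observed)
  }

  // Events that repeat the last seen state are suppressed.
  // Returns true when the event was applied.
  onEvent(componentId: string, state: HelmState): boolean {
    if (this.lastState.get(componentId) === state) return false
    this.lastState.set(componentId, state)
    this.jobStatus.set(componentId, state)
    return true
  }

  // Backfill: write the snapshot state unconditionally. Calling it
  // twice with the same snapshot leaves the same result (idempotent).
  reseedFromSnapshot(snapshot: Map<string, HelmState>): number {
    let written = 0
    snapshot.forEach((state, componentId) => {
      this.lastState.set(componentId, state)
      this.jobStatus.set(componentId, state)
      written += 1
    })
    return written
  }
}
```

In this model an `installed` event arriving after `attachWatcher(id, 'installed')` is dropped and the row stays at `pending`; a single `reseedFromSnapshot` call converges it, and repeating the call changes nothing.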
|
||||
|
||||
|
||||
@@ -33,6 +33,7 @@ import {
|
||||
compareJobs,
|
||||
formatDuration,
|
||||
matchJob,
|
||||
regionFromJob,
|
||||
} from './JobsTable'
|
||||
import { FIXTURE_JOBS } from '@/test/fixtures/jobs.fixture'
|
||||
import type { Job } from '@/lib/jobs.types'
|
||||
@@ -326,3 +327,79 @@ describe('JobsTable — render', () => {
|
||||
expect(screen.getByTestId('jobs-cell-status-bp-vault').textContent?.toLowerCase()).toContain('pending')
|
||||
})
|
||||
})
|
||||
|
||||
// ── C8-005 (2026-05-17 t143): region filter helpers + dropdown ───────
|
||||
describe('regionFromJob (C8-005)', () => {
|
||||
it('returns empty for primary-region rows (no `:` in appId)', () => {
|
||||
expect(regionFromJob({ jobName: 'Install cilium', appId: 'bp-cilium' })).toBe('')
|
||||
})
|
||||
|
||||
it('extracts region from a `<region>:<chart>` appId', () => {
|
||||
expect(regionFromJob({ jobName: 'Install cilium', appId: 'fsn1:bp-cilium' })).toBe('fsn1')
|
||||
})
|
||||
|
||||
it('handles hyphenated region keys', () => {
|
||||
expect(regionFromJob({ jobName: 'Install cilium', appId: 'hel1-2:bp-cilium' })).toBe('hel1-2')
|
||||
})
|
||||
|
||||
it('falls back to parsing `install-<region>:<chart>` jobName when appId is empty', () => {
|
||||
expect(regionFromJob({ jobName: 'install-nbg1-1:bp-flux', appId: '' })).toBe('nbg1-1')
|
||||
})
|
||||
|
||||
it('returns empty for group/day-2 rows with no parseable region', () => {
|
||||
expect(regionFromJob({ jobName: 'applications', appId: '' })).toBe('')
|
||||
})
|
||||
})
|
||||
|
||||
describe('JobsTable region filter (C8-005)', () => {
|
||||
const baseLeaf = {
|
||||
type: 'install' as const,
|
||||
parentId: 'applications',
|
||||
childIds: [],
|
||||
dependsOn: [],
|
||||
status: 'succeeded' as const,
|
||||
startedAt: '2026-05-17T10:00:00Z',
|
||||
finishedAt: '2026-05-17T10:01:00Z',
|
||||
durationMs: 60_000,
|
||||
}
|
||||
|
||||
it('hides the region dropdown on single-region deployments', async () => {
|
||||
const singleRegion: Job[] = [
|
||||
{ ...baseLeaf, id: 'bp-cilium', jobName: 'Install Cilium', appId: 'bp-cilium' },
|
||||
{ ...baseLeaf, id: 'bp-flux', jobName: 'Install Flux', appId: 'bp-flux' },
|
||||
]
|
||||
renderTable({ jobs: singleRegion })
|
||||
await screen.findByTestId('jobs-table')
|
||||
expect(screen.queryByTestId('jobs-filter-region')).toBeNull()
|
||||
})
|
||||
|
||||
it('shows the region dropdown when 2+ regions appear', async () => {
|
||||
const multiRegion: Job[] = [
|
||||
{ ...baseLeaf, id: 'bp-cilium', jobName: 'Install Cilium', appId: 'bp-cilium' },
|
||||
{ ...baseLeaf, id: 'fsn1:bp-cilium', jobName: 'install-fsn1:bp-cilium', appId: 'fsn1:bp-cilium' },
|
||||
{ ...baseLeaf, id: 'hel1-2:bp-cilium', jobName: 'install-hel1-2:bp-cilium', appId: 'hel1-2:bp-cilium' },
|
||||
]
|
||||
renderTable({ jobs: multiRegion })
|
||||
await screen.findByTestId('jobs-table')
|
||||
const sel = screen.getByTestId('jobs-filter-region') as HTMLSelectElement
|
||||
expect(sel).toBeTruthy()
|
||||
// Options: All + 2 regions (sorted lexically: fsn1, hel1-2)
|
||||
const opts = Array.from(sel.querySelectorAll('option')).map((o) => o.textContent)
|
||||
expect(opts).toEqual(['All', 'fsn1', 'hel1-2'])
|
||||
})
|
||||
|
||||
it('filters rows to the selected region', async () => {
|
||||
const multiRegion: Job[] = [
|
||||
{ ...baseLeaf, id: 'bp-cilium', jobName: 'Install Cilium', appId: 'bp-cilium' },
|
||||
{ ...baseLeaf, id: 'fsn1:bp-cilium', jobName: 'install-fsn1:bp-cilium', appId: 'fsn1:bp-cilium' },
|
||||
{ ...baseLeaf, id: 'hel1-2:bp-cilium', jobName: 'install-hel1-2:bp-cilium', appId: 'hel1-2:bp-cilium' },
|
||||
]
|
||||
renderTable({ jobs: multiRegion })
|
||||
await screen.findByTestId('jobs-table')
|
||||
fireEvent.change(screen.getByTestId('jobs-filter-region'), { target: { value: 'fsn1' } })
|
||||
const rows = screen.getAllByTestId(/^jobs-table-row-/)
|
||||
expect(rows.length).toBe(1)
|
||||
expect(screen.queryByTestId('jobs-table-row-bp-cilium')).toBeNull()
|
||||
expect(screen.queryByTestId('jobs-table-row-hel1-2:bp-cilium')).toBeNull()
|
||||
})
|
||||
})
|
||||
|
||||
@@ -76,6 +76,43 @@ export function compareJobs(a: Job, b: Job): number {
|
||||
return a.id.localeCompare(b.id)
|
||||
}
|
||||
|
||||
/**
|
||||
* regionFromJob — extract the Hetzner region key from a Job's
|
||||
* `jobName` / `appId`. Multi-region deployments use a
|
||||
* `<region>:<chart>` prefix in the AppID, and an `install-<region>:<chart>`
|
||||
* jobName. The canonical region encoding is documented in
|
||||
* products/catalyst/bootstrap/api/internal/jobs/helmwatch_bridge.go:503
|
||||
* (three input shapes: bare chart, region-prefixed, install-region-prefixed).
|
||||
*
|
||||
* Returns the empty string for primary-region rows (no `:` separator)
|
||||
* so the region filter dropdown's "All" option naturally matches them.
|
||||
* Day-2 mutation rows and groups have empty appId and return ''.
|
||||
*
|
||||
* Exported so the unit test in JobsTable.test.tsx can lock in the
|
||||
* contract.
|
||||
*/
|
||||
export function regionFromJob(job: Pick<Job, 'jobName' | 'appId'>): string {
|
||||
// Prefer the AppID encoding because it's the canonical key the
|
||||
// backend uses (helmwatch_bridge.go's `componentID` is
|
||||
// `<region>:<chart>` for secondaries, bare for primary).
|
||||
if (job.appId) {
|
||||
const sep = job.appId.indexOf(':')
|
||||
if (sep > 0) return job.appId.substring(0, sep)
|
||||
}
|
||||
// Fallback: parse the jobName when AppID is empty (group rows /
|
||||
// pre-bridge legacy rows).
|
||||
if (job.jobName) {
|
||||
// Strip the canonical `install-` prefix, then check for the
|
||||
// region separator. Anything before `:` is the region.
|
||||
const stripped = job.jobName.startsWith('install-')
|
||||
? job.jobName.slice('install-'.length)
|
||||
: job.jobName
|
||||
const sep = stripped.indexOf(':')
|
||||
if (sep > 0) return stripped.substring(0, sep)
|
||||
}
|
||||
return ''
|
||||
}
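For reviewers who want to exercise the region logic without rendering the component, here is a framework-free sketch of how the helper feeds both the dropdown options and the row filter. It is an illustration under assumptions: `JobLike`, `regionOf`, `regionOptionsFor` and `filterByRegion` are hypothetical names, and `regionOf` restates the parsing rules documented above rather than importing the real `regionFromJob`.

```typescript
// Minimal job shape for the sketch; the real Job type has more fields.
interface JobLike {
  id: string
  jobName: string
  appId: string
}

// Same three input shapes as documented: bare chart, `<region>:<chart>`,
// and `install-<region>:<chart>`. Returns '' when no region is encoded.
function regionOf(job: Pick<JobLike, 'jobName' | 'appId'>): string {
  const appSep = job.appId.indexOf(':')
  if (appSep > 0) return job.appId.substring(0, appSep)
  const prefix = 'install-'
  const name = job.jobName.indexOf(prefix) === 0 ? job.jobName.slice(prefix.length) : job.jobName
  const nameSep = name.indexOf(':')
  return nameSep > 0 ? name.substring(0, nameSep) : ''
}

// Unique, non-empty region keys in lexical order (the dropdown options).
function regionOptionsFor(jobs: JobLike[]): string[] {
  const set = new Set<string>()
  for (const j of jobs) {
    const r = regionOf(j)
    if (r) set.add(r)
  }
  return Array.from(set).sort((a, b) => a.localeCompare(b))
}

// Empty selection means "All"; otherwise keep rows whose region matches.
function filterByRegion(jobs: JobLike[], selected: string): JobLike[] {
  return selected ? jobs.filter((j) => regionOf(j) === selected) : jobs
}
```

Note that selecting a region hides primary-region rows as well, because their region key is the empty string; that matches the `filters rows to the selected region` test above.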
|
||||
|
||||
/**
|
||||
* Search predicate — matches across jobName / appId / dependsOn /
|
||||
* status / parentId. Case-insensitive substring match. Exported so
|
||||
@@ -166,6 +203,10 @@ export function JobsTable({ jobs, appIdFilter, initialParentFilter }: JobsTableP
|
||||
const [statusFilter, setStatusFilter] = useState<'' | JobStatus>('')
|
||||
const [appFilter, setAppFilter] = useState<string>('')
|
||||
const [parentFilter, setParentFilter] = useState<string>('')
|
||||
// D20 (2026-05-17 t143): region filter dropdown so operators on a
|
||||
// multi-region Sovereign can scope the table to one region without
|
||||
// typing the region key into the search box. Empty string = "All".
|
||||
const [regionFilter, setRegionFilter] = useState<string>('')
|
||||
|
||||
// Resolve parent display labels — used in the Parent column + filter.
|
||||
const parentLabelById = useMemo<Map<string, string>>(() => {
|
||||
@@ -197,6 +238,19 @@ export function JobsTable({ jobs, appIdFilter, initialParentFilter }: JobsTableP
|
||||
.sort((a, b) => a.label.localeCompare(b.label))
|
||||
}, [jobs, parentLabelById])
|
||||
|
||||
// D20 (2026-05-17 t143): unique non-empty region keys present in the
|
||||
// current job set. Sorted lexically so operators see a stable order
|
||||
// (fsn1, hel1-2, nbg1-1, sin-2). Hidden when only one region (or
|
||||
// zero) appears — the filter would be a one-option no-op.
|
||||
const regionOptions = useMemo<string[]>(() => {
|
||||
const set = new Set<string>()
|
||||
for (const j of jobs) {
|
||||
const r = regionFromJob(j)
|
||||
if (r) set.add(r)
|
||||
}
|
||||
return [...set].sort((a, b) => a.localeCompare(b))
|
||||
}, [jobs])
|
||||
|
||||
const visibleJobs = useMemo<Job[]>(() => {
|
||||
const filtered = jobs.filter((j) => {
|
||||
// Hide group rows by default — they appear in the canvas as
|
||||
@@ -209,11 +263,12 @@ export function JobsTable({ jobs, appIdFilter, initialParentFilter }: JobsTableP
|
||||
if (statusFilter && j.status !== statusFilter) return false
|
||||
if (appFilter && j.appId !== appFilter) return false
|
||||
if (parentFilter && j.parentId !== parentFilter) return false
|
||||
if (regionFilter && regionFromJob(j) !== regionFilter) return false
|
||||
if (!matchJob(j, search)) return false
|
||||
return true
|
||||
})
|
||||
return [...filtered].sort(compareJobs)
|
||||
-}, [jobs, search, statusFilter, appFilter, parentFilter, appIdFilter, initialParentFilter])
|
||||
}, [jobs, search, statusFilter, appFilter, parentFilter, regionFilter, appIdFilter, initialParentFilter])
|
||||
|
||||
return (
|
||||
<div className="jobs-table-wrap" data-testid="jobs-table-wrap">
|
||||
@@ -295,6 +350,34 @@ export function JobsTable({ jobs, appIdFilter, initialParentFilter }: JobsTableP
|
||||
</label>
|
||||
)}
|
||||
|
||||
{/*
|
||||
D20 region filter — visible only when 2+ regions appear in
|
||||
the current job set. A single-region Sovereign sees no
|
||||
dropdown (would be a one-option no-op + visual noise).
|
||||
Operators on a multi-region cluster get a quick way to
|
||||
scope the table to fsn1 / hel1-2 / nbg1-1 / sin-2 without
|
||||
typing the region key into the free-text search.
|
||||
*/}
|
||||
{regionOptions.length > 1 ? (
|
||||
<label className="jobs-filter-label">
|
||||
<span className="jobs-filter-caption">Region</span>
|
||||
<select
|
||||
value={regionFilter}
|
||||
onChange={(e) => setRegionFilter(e.target.value)}
|
||||
className="jobs-filter-select"
|
||||
data-testid="jobs-filter-region"
|
||||
aria-label="Filter by region"
|
||||
>
|
||||
<option value="">All</option>
|
||||
{regionOptions.map((r) => (
|
||||
<option key={r} value={r}>
|
||||
{r}
|
||||
</option>
|
||||
))}
|
||||
</select>
|
||||
</label>
|
||||
) : null}
|
||||
|
||||
<span
|
||||
className="jobs-result-count"
|
||||
data-testid="jobs-result-count"
|
||||
|
||||
@@ -131,14 +131,27 @@ export function SettingsPage({ disableStream = false }: SettingsPageProps = {})
|
||||
const startedAt = snapshot?.startedAt ?? null
|
||||
const status = snapshot?.status ?? null
|
||||
|
||||
-// Pool domain / subdomain are wizard-store fields; they survive the
-// wizard submit because the store is zustand+persist (localStorage).
-const poolDomain = store.sovereignPoolDomain || null
-const poolSubdomain = store.sovereignSubdomain || null
-const domainMode = store.sovereignDomainMode || null
-const byoDomain = store.sovereignByoDomain || null
-const orgName = store.orgName || null
-const orgEmail = store.orgEmail || null
|
||||
// C8-001 (2026-05-17 t143): prefer the live snapshot for the
|
||||
// Sovereign + DNS fields, fall back to the wizard store. The chroot
|
||||
// Sovereign console has a fresh localStorage (the wizard runs on
|
||||
// mothership, the chroot session never persists the store), so
|
||||
// wizard-store-only fields rendered four em-dashes for Capacity /
|
||||
// Pool subdomain / BYO domain / CP size. catalyst-api's
|
||||
// Deployment.State() now surfaces these from the persisted
|
||||
// RedactedRequest projection — they're the authoritative source on
|
||||
// every Sovereign post-handover. The wizard-store fallback covers
|
||||
// the mothership wizard-in-flight case where the snapshot may not
|
||||
// yet carry the request fields (pre-CreateDeployment).
|
||||
const poolDomain = snapshot?.sovereignPoolDomain ?? store.sovereignPoolDomain ?? null
|
||||
const poolSubdomain = snapshot?.sovereignSubdomain ?? store.sovereignSubdomain ?? null
|
||||
const domainMode = snapshot?.sovereignDomainMode ?? store.sovereignDomainMode ?? null
|
||||
const byoDomain = snapshot?.sovereignByoDomain ?? store.sovereignByoDomain ?? null
|
||||
const orgName = snapshot?.orgName ?? store.orgName ?? null
|
||||
const orgEmail = snapshot?.orgEmail ?? store.orgEmail ?? null
|
||||
// OrgIndustry / OrgHeadquarters are wizard-store-only fields today —
|
||||
// not persisted on the deployment record. They render the em-dash
|
||||
// placeholder on the chroot until a future PR plumbs them through
|
||||
// the provisioner.Request payload.
|
||||
const orgIndustry = store.orgIndustry || null
|
||||
const orgHeadquarters = store.orgHeadquarters || null
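One point worth checking in review: the snapshot-first lines use `??`, whereas the lines they replace (and the two store-only lines) use `||`. The two operators differ on the empty string. If the wizard store initialises its string fields to `''` (an assumption; the store defaults are not shown in this diff), `snapshot?.x ?? store.x ?? null` yields `''` rather than `null`. The sketch below contrasts the two; `viaNullish` and `firstNonEmpty` are hypothetical helpers, not part of the change.

```typescript
// Mirrors the `snapshot ?? store ?? null` shape: only null/undefined fall through.
function viaNullish(snap: string | undefined, store: string | undefined): string | null {
  return snap ?? store ?? null
}

// Alternative that also treats '' as "missing", matching the old `|| null` behaviour.
function firstNonEmpty(...candidates: Array<string | null | undefined>): string | null {
  for (const c of candidates) {
    if (c !== null && c !== undefined && c !== '') return c
  }
  return null
}
```

Whether this matters depends on how the page renders an empty string versus `null`; if both map to the em-dash placeholder, the `??` form is fine as written.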
|
||||
|
||||
@@ -146,11 +159,23 @@ export function SettingsPage({ disableStream = false }: SettingsPageProps = {})
|
||||
// since the founder spec is single-region happy path. The full per-
|
||||
// region table belongs on a future Compute settings sub-page.
|
||||
const controlPlaneSize = useMemo(() => {
|
||||
-const arr = store.regionControlPlaneSizes
-if (Array.isArray(arr) && arr.length > 0 && arr[0]) return arr[0]
|
||||
// Prefer snapshot (chroot Sovereign source-of-truth). Multi-region
|
||||
// arrays surface from snapshot.regionControlPlaneSizes; single
|
||||
// region from snapshot.controlPlaneSize. Falls back to wizard
|
||||
// store for the mothership wizard-in-flight case.
|
||||
const snapArr = snapshot?.regionControlPlaneSizes
|
||||
if (Array.isArray(snapArr) && snapArr.length > 0 && snapArr[0]) return snapArr[0]
|
||||
if (snapshot?.controlPlaneSize) return snapshot.controlPlaneSize
|
||||
const storeArr = store.regionControlPlaneSizes
|
||||
if (Array.isArray(storeArr) && storeArr.length > 0 && storeArr[0]) return storeArr[0]
|
||||
if (store.controlPlaneSize) return store.controlPlaneSize
|
||||
return null
|
||||
-}, [store.regionControlPlaneSizes, store.controlPlaneSize])
|
||||
}, [
|
||||
snapshot?.regionControlPlaneSizes,
|
||||
snapshot?.controlPlaneSize,
|
||||
store.regionControlPlaneSizes,
|
||||
store.controlPlaneSize,
|
||||
])
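The four-step precedence in the memo (snapshot array, snapshot scalar, store array, store scalar) can be pulled into a pure function, which makes it testable without React. A minimal sketch, assuming both sources expose the same two optional fields; `SizeSource`, `sizeFrom` and `resolveControlPlaneSize` are hypothetical names, and the size strings in the usage are placeholders.

```typescript
// Both the snapshot and the wizard store are assumed to expose these fields.
interface SizeSource {
  regionControlPlaneSizes?: string[]
  controlPlaneSize?: string
}

// First region's size if the array has a non-empty first entry,
// else the scalar field, else null.
function sizeFrom(src: SizeSource | null | undefined): string | null {
  const arr = src?.regionControlPlaneSizes
  if (Array.isArray(arr) && arr.length > 0) {
    const first = arr[0]
    if (first) return first
  }
  if (src?.controlPlaneSize) return src.controlPlaneSize
  return null
}

// Snapshot wins; the store is consulted only when the snapshot has nothing.
function resolveControlPlaneSize(
  snapshot: SizeSource | null | undefined,
  store: SizeSource | null | undefined,
): string | null {
  return sizeFrom(snapshot) ?? sizeFrom(store)
}
```

For example, `resolveControlPlaneSize({ regionControlPlaneSizes: ['size-a', 'size-b'] }, { controlPlaneSize: 'size-c' })` returns the snapshot's first entry, and a `null` snapshot falls back to the store value.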
|
||||
|
||||
return (
|
||||
<PortalShell deploymentId={deploymentId} sovereignFQDN={sovereignFQDN} pageTitle="Settings">
|
||||
|
||||
@@ -77,6 +77,25 @@ export interface DeploymentSnapshot {
|
||||
region?: string
|
||||
error?: string
|
||||
numEvents?: number
|
||||
/**
|
||||
* C8-001 (2026-05-17 t143) — Sovereign-provisioning request fields
|
||||
* lifted to the snapshot so the chroot's `/sovereign/settings` page
|
||||
* works without a populated wizard store (chroot localStorage is
|
||||
* fresh post-handover, so reading Capacity / Pool subdomain / BYO
|
||||
* domain from `useWizardStore()` rendered four em-dashes). The
|
||||
* catalyst-api's `Deployment.State()` surfaces these from the
|
||||
* persisted RedactedRequest projection; the SettingsPage reads
|
||||
* snapshot-first with the wizard store as fallback.
|
||||
*/
|
||||
controlPlaneSize?: string
|
||||
regionControlPlaneSizes?: string[]
|
||||
sovereignPoolDomain?: string
|
||||
sovereignSubdomain?: string
|
||||
sovereignDomainMode?: string
|
||||
/** Present only when sovereignDomainMode === 'byo'. */
|
||||
sovereignByoDomain?: string
|
||||
orgName?: string
|
||||
orgEmail?: string
|
||||
/**
|
||||
* Phase-1 helmwatch ground-truth — populated by the catalyst-api when
|
||||
* its HelmRelease informer terminated. Lifted to the top level by