Commit Graph

2204 Commits

Author SHA1 Message Date
hatiyildiz
4b97a50310 fix(store): PR P — preserve MarketplaceEnabled through Redact + ToProvisionerRequest
Founder caught on t144: /settings/marketplace toggle showed disabled
even though the prov body had marketplaceEnabled=true.

Root cause: store.RedactedRequest struct (the on-disk projection)
lacked a MarketplaceEnabled field. Every Save/Load cycle stripped
the bit:
- Mothership Save(rec) → MarketplaceEnabled dropped
- Mothership exportDeploymentToChild → chroot receives record without bit
- Chroot HandleGetMarketplace → reads dep.Request.MarketplaceEnabled
  → zero value (false) → UI toggle defaults to disabled

PR J #1590's GET endpoint was correctly wired but the data was already
gone before it ran.

Fix: add MarketplaceEnabled field to RedactedRequest + carry it
through Redact() + ToProvisionerRequest(). Backward-compat via
`omitempty` — records persisted before this PR deserialize with
false, same as the prior behavior.

Bumps chart 1.4.151 -> 1.4.152 + bootstrap-kit pin so next prov
exercises the full chain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 12:51:43 +02:00
github-actions[bot]
efd5d60130 deploy: update catalyst images to 0242be5 2026-05-17 09:21:12 +00:00
e3mrah
0242be5c49
fix(infra): PR O — cilium-gateway TLS references per-zone wildcard cert (#1595)
t143 hit LE PROD rate limit (50 certs/week on omani.works exhausted)
because TWO cert templates compete for the same parent-domain quota:
1. clusters/_template/sovereign-tls/cilium-gateway-cert.yaml — legacy
   SAN cert named `sovereign-wildcard-tls`
2. products/catalyst/chart/templates/sovereign-wildcard-certs.yaml —
   chart per-zone cert named `sovereign-wildcard-tls-<sanitised-zone>`

The Cilium Gateway listener hardcoded the legacy name, so when LE 429s
the legacy cert (as happened on t143), HTTPS to console.<fqdn> breaks
even though the per-zone cert is Ready.

Fix: gateway listener now references `sovereign-wildcard-tls-${SOVEREIGN_FQDN_DASHED}`.
Cloud-init substitutes SOVEREIGN_FQDN_DASHED = replace(fqdn, ".", "-")
in the sovereign-tls Kustomization postBuild.substitute. The per-zone
cert from the chart provides the Ready Secret with this exact name.

The legacy cilium-gateway-cert.yaml SAN cert still renders for
backward-compat (some consumers may still reference it), but the
gateway listener no longer depends on it for TLS termination.

Bumps no chart version — the change is at the Flux/Kustomize layer.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 13:19:10 +04:00
github-actions[bot]
be0874f5e2 deploy: update catalyst images to b27bdee 2026-05-17 09:04:11 +00:00
e3mrah
b27bdeee05
fix(handover): PR N — fallback to per-FQDN cert when wildcard 429s (#1594)
t143 caught the LE PROD rate limit (429: too many certificates (50)
already issued for omani.works in last 168h0m0s, retry after
2026-05-17 10:28:32 UTC). The chart renders TWO cert names:
- sovereign-wildcard-tls (canonical, hit 429)
- sovereign-wildcard-tls-<fqdn> (per-FQDN, was already issued before
  rate limit, Ready=True)

waitForWildcardCert only checked the canonical name. With the limit
hit, handover waited the full 10-min budget before firing degraded.

Fix: when the canonical cert is unavailable, list namespace certs
matching `sovereign-wildcard-tls-*` prefix and return Ready=True if
ANY sibling is Ready. The operator's console.<fqdn> TLS handshake
will succeed against either secret since both wildcard *.<fqdn>.

Bumps chart 1.4.150 -> 1.4.151 + bootstrap-kit pin so the fix lands
on next fresh prov.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 13:02:17 +04:00
github-actions[bot]
13c9684cc1 deploy: update catalyst images to 32c46b8 2026-05-17 08:39:46 +00:00
e3mrah
32c46b80e1
feat(ui): PR M — dashboard default Layer-1=cluster + Marketplace Admin link + chart 1.4.150 (#1593)
Founder follow-up to t142 cycle:
1. "the dashboard is still not showing the clusters properly" — the D16
   fan-out CODE works (3 clusters in k8sCache, dashboard handler fans
   out) but the OPERATOR-FACING default Layer-1 was 'family' not
   'cluster'. Operator opens /dashboard, sees family-grouped bubbles,
   thinks the multi-cluster fix is broken. Fix: when SovereignFQDN is
   present (Sovereign Console mode), default to ['cluster', 'application']
   so the 3-cluster grouping is the first thing the operator sees.

2. "I have no idea where the admin components for billing, order, revenue
   etc related BSS are" — exists at marketplace.<sov>/back-office/ but
   the Sovereign Console sidebar had no link. Fix: add "Marketplace Admin"
   nav link (external, opens in new tab) — uses resolvedFQDN to construct
   the URL. data-testid=sov-console-nav-marketplace-admin for matrix.

Also bumps chart 1.4.149 → 1.4.150 + bootstrap-kit pin so the changes
land on next fresh prov.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 12:37:53 +04:00
github-actions[bot]
68fe94b331 deploy: update catalyst images to 86f5331 2026-05-17 08:02:06 +00:00
e3mrah
86f5331962
fix(catalyst-api): PR L — AppDetail HelmRelease fallback + chart 1.4.149 (#1592)
Founder t140 bug #2: "in the catalog and jobs it shows as installed,
in the application page it shows as provisioning, there is a sync issue".

Root cause: AppDetail reads Application CR via GET /sovereigns/{id}/
applications/{name}. For bootstrap-kit installs (cilium, cert-manager,
gateway-api, alloy, etc.) NO Application CR exists — they ship as
HelmReleases directly with no wizard step to create the CR. The handler
returned 404 → UI showed "App not found" or perpetual "Provisioning",
while /apps (which reads HelmRelease) shows "installed".

Fix: HandleApplicationGet, on Application CR not-found, falls back to a
HelmRelease lookup in h.k8sCache (uses resolveChrootClusterID so it works
post-D16 multi-cluster fan-out). Synthesises an applicationDetailResponse
from HR fields:
- Name/Namespace from HR
- Blueprint from spec.chart.spec.chart
- Version from spec.chart.spec.version (or status.lastAttemptedRevision)
- Phase: Ready (HR Ready=True) / Failed (False) / Provisioning (Unknown)
- Conditions: pass-through HR conditions

Also bumps chart to 1.4.149 + bootstrap-kit pin so this fix + the
queued PRs #1590 (marketplace GET) + #1591 (publish toggle UI) all
land on the next fresh prov.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 11:59:59 +04:00
github-actions[bot]
b0c0f91604 deploy: update catalyst images to df150fd 2026-05-17 07:57:50 +00:00
e3mrah
df150fdbd8
feat(ui): PR K — per-app catalog publish/unpublish toggle on AppDetail header (#1591)
Founder caught on t140 bug #4: "I am supposed to mark which applications
are going to be available in the catalog … I am not able to see such
option from the application page".

Fix: PublishToggleChip rendered in the AppDetail hero meta row.
- Reads current state on mount from GET /api/catalog/apps/{slug}
- Click flips via PUT /api/catalog/admin/apps/{slug}/published
- Optimistic update; reverts + tooltip on backend error
- data-testid="app-detail-publish-toggle" for matrix coverage

Backend already shipped — SetAppPublished handler at the catalog
service /catalog/admin/apps/{slug}/published. Gateway routes
admin/* with auth-gating so only Sovereign Console operator can
flip. No backend change needed.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 11:55:45 +04:00
github-actions[bot]
e1f619aa77 deploy: update catalyst images to 114705c 2026-05-17 07:51:10 +00:00
e3mrah
114705c63c
fix(marketplace): PR J — GET endpoint + UI reflects actual enabled state (#1590)
Founder caught on t140 bug #5: /settings/marketplace shows "disabled"
while the marketplace is actually serving (prov body had
marketplaceEnabled=true). Root cause: MarketplaceSettings UI hardcoded
useState(false) on mount because no GET endpoint existed to read the
current value.

Fix:
- Backend: new GET /api/v1/sovereigns/{id}/marketplace returning
  {deploymentId, sovereignFQDN, enabled, brand}. Reads from the
  in-memory deployment record (Request.MarketplaceEnabled set at
  prov time + mutated by HandleSetMarketplace's commit path).
- UI: MarketplaceSettings useEffect fetches on mount, sets the
  toggle to the actual value, hydrates the brand fields. Best-effort
  fetch — falls back to defaults on failure.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 11:49:03 +04:00
github-actions[bot]
a63f3c13ab deploy: update catalyst images to f1ebf14 2026-05-17 07:06:33 +00:00
e3mrah
f1ebf14cf8
fix(catalyst-api): D30 PR I — mark imported deployment as Adopted on chroot (#1589)
Founder t140 bug #6: /parent-domains shows only primary, not the
sme-pool domains. Chroot's deployment record has parentDomains[]
populated but ListParentDomains uses h.activeDeployment() which
filters to AdoptedAt!=nil. The mothership ships the record before
the chroot's own handover-finalisation, so AdoptedAt is nil →
activeDeployment returns nil → only synth primary row renders.

Fix: HandleDeploymentImport stamps AdoptedAt at import time. The
FQDN-match guard above verifies "this record IS my Sovereign's
record" so the chroot is by definition the operator/owner — no
separate adoption-wizard needed on chroot side.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 11:04:38 +04:00
github-actions[bot]
473a2ba4b9 deploy: update catalyst images to 52be4d4 2026-05-17 07:02:25 +00:00
e3mrah
52be4d4d3a
fix(catalyst-api): D16 PR H — resolveChrootClusterID multi-cluster + dashboard alias (#1587)
* fix(handover): rename itoa→regionSlotIndex (collision with infrastructure.go)

PR #1581 introduced an `itoa` helper that collided with the existing
`itoa` in handler/infrastructure.go:1952. Go vet failed:

  internal/handler/infrastructure.go:1952:6: itoa redeclared in this block
  internal/handler/deployment_handover_export.go:199:6: other declaration of itoa

Rename my helper to `regionSlotIndex` — more descriptive of its actual
use (deriving the per-region slot suffix for the kubeconfig filename).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(catalyst-api): D16/D17 — 3 bugs caught on t138

Founder caught on t136 (now wiped) that /dashboard cluster grouping
still showed 1 region and /cloud nodes showed 1 node despite earlier
D16 PRs shipping. Root cause: 3 bugs in the D16 chain that surfaced
on t138 fresh prov.

1. exportSecondaryKubeconfigsToChild was guarded behind the early
   return of exportDeploymentToChild's failed POST. The child's
   ingress + cert + gateway are still racing to reach reachable
   state in the seconds after handover fires, so the first POST
   gets EOF and the goroutine never fires. Fix: kick off the
   D16 fan-out IMMEDIATELY at the top of exportDeploymentToChild
   in its own goroutine, BEFORE the deployment-record POST.

2. Both exports now retry with exponential backoff (5s → 60s) for
   up to 5 min total. Most handovers will succeed on attempt 2-4.
   Was: no retry, single shot, silent failure.

3. /api/v1/sovereign/secondary-kubeconfig route moved OUT of the
   auth group (rg) into the top-level router (r), alongside
   /api/v1/internal/deployments/import. The previous registration
   required an operator session that doesn't exist at handover —
   mothership POSTs were 401'd silently. Validation is now via
   safeIDPattern regex on depID + regionKey (same security model
   as the deployments/import companion endpoint).

4. HandleSovereignCloud now fans out across h.k8sCache.Clusters()
   instead of using only the in-cluster client. Adds Cluster
   field (omitempty) to sovereignNode/LB/SC/PVC so the UI can
   group/filter by region. Without this, /cloud?view=list&kind=nodes
   shows 1 node even when 3 secondary kubeconfigs are registered.

Together these fix:
- D16 /dashboard Layer-1=Cluster grouping (3 bubbles, not 1)
- /cloud?view=list&kind=nodes (3+ nodes, not 1)

Refs: feedback_test_theater_3rd_violation_2026_05_17.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(catalog): D27 — fresh-seed apps default Published+Deployable

Founder caught on t136: marketplace.t136/apps shows blank application
grid. Root cause: catalog seed.go calls migrateAppPublished +
migrateAppDeployable ONLY on the "already populated" path. On a fresh
Sovereign install (empty catalog) seedAllData inserts 27 rows with
zero-value bools — Published=false, Deployable=false. The marketplace
storefront filters with `?published=true`, gets [], renders blank.

Fix: after seedAllData also call migrateAppDeployable + migrateAppPublished
+ seedSystemApps. Both migrations are idempotent (skip rows already
true), so re-runs are safe.

Verified the bug live on t138 (eaaee1ea24184c2a):
  http://catalog.sme:8082/catalog/apps returns 27 apps
  http://catalog.sme:8082/catalog/apps?published=true returns 0

With this fix the latter returns 27.

Refs: feedback_test_theater_3rd_violation_2026_05_17.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ui): D17 — exclude mother-only /app/$deploymentId routes on Sovereign

Founder caught on t136: console.t136.../app/bp-alloy renders the
catalog grid (AppsPage) instead of AppDetail. Three earlier PRs
(#1572 + chart bumps) flipped the appRoute beforeLoad logic but
the actual route-matching collision was not fixed.

Root cause: appRoute.addChildren registers appDeploymentRoute at
`/$deploymentId` (effective `/app/$deploymentId`, mother-only)
BEFORE consoleLayoutRoute registers consoleAppDetailRoute at
`/app/$componentId`. TanStack Router resolves equally-specific
dynamic routes by declaration order — so on the Sovereign Console
URL `/app/bp-alloy` matches appDeploymentRoute first and renders
AppsPage with deploymentId="bp-alloy".

Fix: at routeTree build time, filter appRoute children to exclude
every mother-only `/$deploymentId/*` route when running on
Sovereign mode. DETECTED_MODE.mode is fixed per-page-load so this
is a one-time check, no runtime overhead. With those routes
absent, consoleAppDetailRoute is the only matcher for
`/app/<componentId>` on Sovereign Console — AppDetail renders.

Refs: feedback_test_theater_3rd_violation_2026_05_17.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(bootstrap-kit): pin bp-catalyst-platform 1.4.147→1.4.148

Founder-flagged bug fixes from session t136/t138/t139 verify cycle
shipped 3 PRs that bumped catalyst chart Chart.yaml to 1.4.148
(d985f27c) with new images:
- catalystApi/Ui: 2ab8a0e (PR #1583 D16 fan-out + retry + auth-bypass,
  PR #1585 D17 router collision)
- smeTag: 964dc15 (PR #1584 D27 catalog fresh-seed Published)

But bootstrap-kit/13-bp-catalyst-platform.yaml stayed pinned to
1.4.147 — every fresh provision installs the OLDER chart with the
OLDER images, so the founder-flagged bugs persist.

Caught on t139 (b4a7ee052d844da0) post-handover verify: chart
installed = bp-catalyst-platform@1.4.147, catalog returns 0
published apps, /app/bp-alloy renders catalog grid.

Bumping the pin makes fresh provs install 1.4.148 (which has all 3
PRs baked).

Refs: feedback_test_theater_3rd_violation_2026_05_17.md
      feedback_overlap_provs_dont_serialize_wait.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(catalyst-api): D16 PR H — resolveChrootClusterID multi-cluster + dashboard alias

Founder caught on t140 (29b7e14918178f7e) after D16 fan-out chain shipped:
- /dashboard is empty (no treemap rendered)
- "none of the k8s resources are streaming"

Root cause: after the D16 secondary-kubeconfig export (PR #1579/#1581)
landed, chroot's k8sCache went from 1 cluster (primary self-register)
to 3 clusters (primary + 2 secondaries). Two cascading bugs:

1. resolveChrootClusterID had a `len(clusters) != 1` guard — it only
   aliased when chroot had exactly one cluster. After D16 it returned
   the URL deployment_id unchanged → has-cluster check failed →
   every chroot handler (networking, k8s_search, k8s_resource_metrics,
   k8s_exec, dashboard) saw "not found" → returned empty.

2. dashboard.go::GetDashboardTreemap was the one chroot handler that
   didn't call resolveChrootClusterID before the has-cluster check —
   so even with #1 fixed, the dashboard would still 404.

Fix:
- resolveChrootClusterID: when N>1, prefer the cluster whose id is
  prefixed "sovereign-" (the FactoryFromEnv self-registered primary
  per buildChrootClusterRef). Falls back to clusters[0] if no match.
- GetDashboardTreemap: call resolveChrootClusterID before has-cluster
  check, matching the pattern in every other chroot handler.

Refs: feedback_test_theater_3rd_violation_2026_05_17.md (don't ship
D16 fan-out without verifying every handler that depends on
single-cluster k8sCache assumption).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 10:59:43 +04:00
e3mrah
8c1ccfae07
chore(bootstrap-kit): pin bp-catalyst-platform 1.4.147→1.4.148 (#1586)
* fix(handover): rename itoa→regionSlotIndex (collision with infrastructure.go)

PR #1581 introduced an `itoa` helper that collided with the existing
`itoa` in handler/infrastructure.go:1952. Go vet failed:

  internal/handler/infrastructure.go:1952:6: itoa redeclared in this block
  internal/handler/deployment_handover_export.go:199:6: other declaration of itoa

Rename my helper to `regionSlotIndex` — more descriptive of its actual
use (deriving the per-region slot suffix for the kubeconfig filename).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(catalyst-api): D16/D17 — 3 bugs caught on t138

Founder caught on t136 (now wiped) that /dashboard cluster grouping
still showed 1 region and /cloud nodes showed 1 node despite earlier
D16 PRs shipping. Root cause: 3 bugs in the D16 chain that surfaced
on t138 fresh prov.

1. exportSecondaryKubeconfigsToChild was guarded behind the early
   return of exportDeploymentToChild's failed POST. The child's
   ingress + cert + gateway are still racing to reach reachable
   state in the seconds after handover fires, so the first POST
   gets EOF and the goroutine never fires. Fix: kick off the
   D16 fan-out IMMEDIATELY at the top of exportDeploymentToChild
   in its own goroutine, BEFORE the deployment-record POST.

2. Both exports now retry with exponential backoff (5s → 60s) for
   up to 5 min total. Most handovers will succeed on attempt 2-4.
   Was: no retry, single shot, silent failure.

3. /api/v1/sovereign/secondary-kubeconfig route moved OUT of the
   auth group (rg) into the top-level router (r), alongside
   /api/v1/internal/deployments/import. The previous registration
   required an operator session that doesn't exist at handover —
   mothership POSTs were 401'd silently. Validation is now via
   safeIDPattern regex on depID + regionKey (same security model
   as the deployments/import companion endpoint).

4. HandleSovereignCloud now fans out across h.k8sCache.Clusters()
   instead of using only the in-cluster client. Adds Cluster
   field (omitempty) to sovereignNode/LB/SC/PVC so the UI can
   group/filter by region. Without this, /cloud?view=list&kind=nodes
   shows 1 node even when 3 secondary kubeconfigs are registered.

Together these fix:
- D16 /dashboard Layer-1=Cluster grouping (3 bubbles, not 1)
- /cloud?view=list&kind=nodes (3+ nodes, not 1)

Refs: feedback_test_theater_3rd_violation_2026_05_17.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(catalog): D27 — fresh-seed apps default Published+Deployable

Founder caught on t136: marketplace.t136/apps shows blank application
grid. Root cause: catalog seed.go calls migrateAppPublished +
migrateAppDeployable ONLY on the "already populated" path. On a fresh
Sovereign install (empty catalog) seedAllData inserts 27 rows with
zero-value bools — Published=false, Deployable=false. The marketplace
storefront filters with `?published=true`, gets [], renders blank.

Fix: after seedAllData also call migrateAppDeployable + migrateAppPublished
+ seedSystemApps. Both migrations are idempotent (skip rows already
true), so re-runs are safe.

Verified the bug live on t138 (eaaee1ea24184c2a):
  http://catalog.sme:8082/catalog/apps returns 27 apps
  http://catalog.sme:8082/catalog/apps?published=true returns 0

With this fix the latter returns 27.

Refs: feedback_test_theater_3rd_violation_2026_05_17.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ui): D17 — exclude mother-only /app/$deploymentId routes on Sovereign

Founder caught on t136: console.t136.../app/bp-alloy renders the
catalog grid (AppsPage) instead of AppDetail. Three earlier PRs
(#1572 + chart bumps) flipped the appRoute beforeLoad logic but
the actual route-matching collision was not fixed.

Root cause: appRoute.addChildren registers appDeploymentRoute at
`/$deploymentId` (effective `/app/$deploymentId`, mother-only)
BEFORE consoleLayoutRoute registers consoleAppDetailRoute at
`/app/$componentId`. TanStack Router resolves equally-specific
dynamic routes by declaration order — so on the Sovereign Console
URL `/app/bp-alloy` matches appDeploymentRoute first and renders
AppsPage with deploymentId="bp-alloy".

Fix: at routeTree build time, filter appRoute children to exclude
every mother-only `/$deploymentId/*` route when running on
Sovereign mode. DETECTED_MODE.mode is fixed per-page-load so this
is a one-time check, no runtime overhead. With those routes
absent, consoleAppDetailRoute is the only matcher for
`/app/<componentId>` on Sovereign Console — AppDetail renders.

Refs: feedback_test_theater_3rd_violation_2026_05_17.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(bootstrap-kit): pin bp-catalyst-platform 1.4.147→1.4.148

Founder-flagged bug fixes from session t136/t138/t139 verify cycle
shipped 3 PRs that bumped catalyst chart Chart.yaml to 1.4.148
(d985f27c) with new images:
- catalystApi/Ui: 2ab8a0e (PR #1583 D16 fan-out + retry + auth-bypass,
  PR #1585 D17 router collision)
- smeTag: 964dc15 (PR #1584 D27 catalog fresh-seed Published)

But bootstrap-kit/13-bp-catalyst-platform.yaml stayed pinned to
1.4.147 — every fresh provision installs the OLDER chart with the
OLDER images, so the founder-flagged bugs persist.

Caught on t139 (b4a7ee052d844da0) post-handover verify: chart
installed = bp-catalyst-platform@1.4.147, catalog returns 0
published apps, /app/bp-alloy renders catalog grid.

Bumping the pin makes fresh provs install 1.4.148 (which has all 3
PRs baked).

Refs: feedback_test_theater_3rd_violation_2026_05_17.md
      feedback_overlap_provs_dont_serialize_wait.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 10:15:16 +04:00
github-actions[bot]
b61e9afabf deploy: update catalyst images to 2ab8a0e 2026-05-17 05:37:01 +00:00
e3mrah
2ab8a0e653
fix(ui): D17 — exclude mother-only /app/$deploymentId routes on Sovereign (#1585)
* fix(handover): rename itoa→regionSlotIndex (collision with infrastructure.go)

PR #1581 introduced an `itoa` helper that collided with the existing
`itoa` in handler/infrastructure.go:1952. Go vet failed:

  internal/handler/infrastructure.go:1952:6: itoa redeclared in this block
  internal/handler/deployment_handover_export.go:199:6: other declaration of itoa

Rename my helper to `regionSlotIndex` — more descriptive of its actual
use (deriving the per-region slot suffix for the kubeconfig filename).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(catalyst-api): D16/D17 — 3 bugs caught on t138

Founder caught on t136 (now wiped) that /dashboard cluster grouping
still showed 1 region and /cloud nodes showed 1 node despite earlier
D16 PRs shipping. Root cause: 3 bugs in the D16 chain that surfaced
on t138 fresh prov.

1. exportSecondaryKubeconfigsToChild was guarded behind the early
   return of exportDeploymentToChild's failed POST. The child's
   ingress + cert + gateway are still racing to reach reachable
   state in the seconds after handover fires, so the first POST
   gets EOF and the goroutine never fires. Fix: kick off the
   D16 fan-out IMMEDIATELY at the top of exportDeploymentToChild
   in its own goroutine, BEFORE the deployment-record POST.

2. Both exports now retry with exponential backoff (5s → 60s) for
   up to 5 min total. Most handovers will succeed on attempt 2-4.
   Was: no retry, single shot, silent failure.

3. /api/v1/sovereign/secondary-kubeconfig route moved OUT of the
   auth group (rg) into the top-level router (r), alongside
   /api/v1/internal/deployments/import. The previous registration
   required an operator session that doesn't exist at handover —
   mothership POSTs were 401'd silently. Validation is now via
   safeIDPattern regex on depID + regionKey (same security model
   as the deployments/import companion endpoint).

4. HandleSovereignCloud now fans out across h.k8sCache.Clusters()
   instead of using only the in-cluster client. Adds Cluster
   field (omitempty) to sovereignNode/LB/SC/PVC so the UI can
   group/filter by region. Without this, /cloud?view=list&kind=nodes
   shows 1 node even when 3 secondary kubeconfigs are registered.

Together these fix:
- D16 /dashboard Layer-1=Cluster grouping (3 bubbles, not 1)
- /cloud?view=list&kind=nodes (3+ nodes, not 1)

Refs: feedback_test_theater_3rd_violation_2026_05_17.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(catalog): D27 — fresh-seed apps default Published+Deployable

Founder caught on t136: marketplace.t136/apps shows blank application
grid. Root cause: catalog seed.go calls migrateAppPublished +
migrateAppDeployable ONLY on the "already populated" path. On a fresh
Sovereign install (empty catalog) seedAllData inserts 27 rows with
zero-value bools — Published=false, Deployable=false. The marketplace
storefront filters with `?published=true`, gets [], renders blank.

Fix: after seedAllData also call migrateAppDeployable + migrateAppPublished
+ seedSystemApps. Both migrations are idempotent (skip rows already
true), so re-runs are safe.

Verified the bug live on t138 (eaaee1ea24184c2a):
  http://catalog.sme:8082/catalog/apps returns 27 apps
  http://catalog.sme:8082/catalog/apps?published=true returns 0

With this fix the latter returns 27.

Refs: feedback_test_theater_3rd_violation_2026_05_17.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ui): D17 — exclude mother-only /app/$deploymentId routes on Sovereign

Founder caught on t136: console.t136.../app/bp-alloy renders the
catalog grid (AppsPage) instead of AppDetail. Three earlier PRs
(#1572 + chart bumps) flipped the appRoute beforeLoad logic but
the actual route-matching collision was not fixed.

Root cause: appRoute.addChildren registers appDeploymentRoute at
`/$deploymentId` (effective `/app/$deploymentId`, mother-only)
BEFORE consoleLayoutRoute registers consoleAppDetailRoute at
`/app/$componentId`. TanStack Router resolves equally-specific
dynamic routes by declaration order — so on the Sovereign Console
URL `/app/bp-alloy` matches appDeploymentRoute first and renders
AppsPage with deploymentId="bp-alloy".

Fix: at routeTree build time, filter appRoute children to exclude
every mother-only `/$deploymentId/*` route when running on
Sovereign mode. DETECTED_MODE.mode is fixed per-page-load so this
is a one-time check, no runtime overhead. With those routes
absent, consoleAppDetailRoute is the only matcher for
`/app/<componentId>` on Sovereign Console — AppDetail renders.

Refs: feedback_test_theater_3rd_violation_2026_05_17.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 09:34:01 +04:00
github-actions[bot]
d985f27c8b deploy: update sme service images to 964dc15 + bump chart to 1.4.148 2026-05-17 05:29:35 +00:00
e3mrah
964dc15570
fix(catalog): D27 — fresh-seed apps default Published+Deployable (#1584)
* fix(handover): rename itoa→regionSlotIndex (collision with infrastructure.go)

PR #1581 introduced an `itoa` helper that collided with the existing
`itoa` in handler/infrastructure.go:1952. Go vet failed:

  internal/handler/infrastructure.go:1952:6: itoa redeclared in this block
  internal/handler/deployment_handover_export.go:199:6: other declaration of itoa

Rename my helper to `regionSlotIndex` — more descriptive of its actual
use (deriving the per-region slot suffix for the kubeconfig filename).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(catalyst-api): D16/D17 — 3 bugs caught on t138

Founder caught on t136 (now wiped) that /dashboard cluster grouping
still showed 1 region and /cloud nodes showed 1 node despite earlier
D16 PRs shipping. Root cause: 3 bugs in the D16 chain that surfaced
on t138 fresh prov.

1. exportSecondaryKubeconfigsToChild was guarded behind the early
   return of exportDeploymentToChild's failed POST. The child's
   ingress + cert + gateway are still racing to reach reachable
   state in the seconds after handover fires, so the first POST
   gets EOF and the goroutine never fires. Fix: kick off the
   D16 fan-out IMMEDIATELY at the top of exportDeploymentToChild
   in its own goroutine, BEFORE the deployment-record POST.

2. Both exports now retry with exponential backoff (5s → 60s) for
   up to 5 min total. Most handovers will succeed on attempt 2-4.
   Was: no retry, single shot, silent failure.

3. /api/v1/sovereign/secondary-kubeconfig route moved OUT of the
   auth group (rg) into the top-level router (r), alongside
   /api/v1/internal/deployments/import. The previous registration
   required an operator session that doesn't exist at handover —
   mothership POSTs were 401'd silently. Validation is now via
   safeIDPattern regex on depID + regionKey (same security model
   as the deployments/import companion endpoint).

4. HandleSovereignCloud now fans out across h.k8sCache.Clusters()
   instead of using only the in-cluster client. Adds Cluster
   field (omitempty) to sovereignNode/LB/SC/PVC so the UI can
   group/filter by region. Without this, /cloud?view=list&kind=nodes
   shows 1 node even when 3 secondary kubeconfigs are registered.

Together these fix:
- D16 /dashboard Layer-1=Cluster grouping (3 bubbles, not 1)
- /cloud?view=list&kind=nodes (3+ nodes, not 1)

Refs: feedback_test_theater_3rd_violation_2026_05_17.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(catalog): D27 — fresh-seed apps default Published+Deployable

Founder caught on t136: marketplace.t136/apps shows blank application
grid. Root cause: catalog seed.go calls migrateAppPublished +
migrateAppDeployable ONLY on the "already populated" path. On a fresh
Sovereign install (empty catalog) seedAllData inserts 27 rows with
zero-value bools — Published=false, Deployable=false. The marketplace
storefront filters with `?published=true`, gets [], renders blank.

Fix: after seedAllData also call migrateAppDeployable + migrateAppPublished
+ seedSystemApps. Both migrations are idempotent (skip rows already
true), so re-runs are safe.

Verified the bug live on t138 (eaaee1ea24184c2a):
  http://catalog.sme:8082/catalog/apps returns 27 apps
  http://catalog.sme:8082/catalog/apps?published=true returns 0

With this fix the latter returns 27.

Refs: feedback_test_theater_3rd_violation_2026_05_17.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 09:28:35 +04:00
github-actions[bot]
f7ea19000e deploy: update catalyst images to 9fc2850 2026-05-17 05:28:28 +00:00
e3mrah
9fc2850504
fix(catalyst-api): D16/D17 — 3 bugs caught on t138 fresh prov (#1583)
* fix(handover): rename itoa→regionSlotIndex (collision with infrastructure.go)

PR #1581 introduced an `itoa` helper that collided with the existing
`itoa` in handler/infrastructure.go:1952. Go vet failed:

  internal/handler/infrastructure.go:1952:6: itoa redeclared in this block
  internal/handler/deployment_handover_export.go:199:6: other declaration of itoa

Rename my helper to `regionSlotIndex` — more descriptive of its actual
use (deriving the per-region slot suffix for the kubeconfig filename).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(catalyst-api): D16/D17 — 3 bugs caught on t138

Founder caught on t136 (now wiped) that /dashboard cluster grouping
still showed 1 region and /cloud nodes showed 1 node despite earlier
D16 PRs shipping. Root cause: 3 bugs in the D16 chain that surfaced
on t138 fresh prov.

1. exportSecondaryKubeconfigsToChild was guarded behind the early
   return of exportDeploymentToChild's failed POST. The child's
   ingress + cert + gateway are still racing to reach reachable
   state in the seconds after handover fires, so the first POST
   gets EOF and the goroutine never fires. Fix: kick off the
   D16 fan-out IMMEDIATELY at the top of exportDeploymentToChild
   in its own goroutine, BEFORE the deployment-record POST.

2. Both exports now retry with exponential backoff (5s → 60s) for
   up to 5 min total. Most handovers will succeed on attempt 2-4.
   Was: no retry, single shot, silent failure.

3. /api/v1/sovereign/secondary-kubeconfig route moved OUT of the
   auth group (rg) into the top-level router (r), alongside
   /api/v1/internal/deployments/import. The previous registration
   required an operator session that doesn't exist at handover —
   mothership POSTs were 401'd silently. Validation is now via
   safeIDPattern regex on depID + regionKey (same security model
   as the deployments/import companion endpoint).

4. HandleSovereignCloud now fans out across h.k8sCache.Clusters()
   instead of using only the in-cluster client. Adds Cluster
   field (omitempty) to sovereignNode/LB/SC/PVC so the UI can
   group/filter by region. Without this, /cloud?view=list&kind=nodes
   shows 1 node even when 3 secondary kubeconfigs are registered.

Together these fix:
- D16 /dashboard Layer-1=Cluster grouping (3 bubbles, not 1)
- /cloud?view=list&kind=nodes (3+ nodes, not 1)

Refs: feedback_test_theater_3rd_violation_2026_05_17.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 09:26:16 +04:00
github-actions[bot]
ccbe51e3e4 deploy: update catalyst images to 9237c1e 2026-05-17 04:48:41 +00:00
e3mrah
9237c1e6ee
fix(handover): rename itoa→regionSlotIndex (collision with infrastructure.go) (#1582)
PR #1581 introduced an `itoa` helper that collided with the existing
`itoa` in handler/infrastructure.go:1952. Go vet failed:

  internal/handler/infrastructure.go:1952:6: itoa redeclared in this block
  internal/handler/deployment_handover_export.go:199:6: other declaration of itoa

Rename my helper to `regionSlotIndex` — more descriptive of its actual
use (deriving the per-region slot suffix for the kubeconfig filename).

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 08:45:49 +04:00
e3mrah
ce4ef6ba98
feat(handover): export secondary kubeconfigs to chroot at handover (D16 PR B) (#1581)
* fix(cloudinit): escape $$\{ORG_EMAIL:-\}/$$\{ORG_NAME:-\} in comment (D22)

PR #1571 added a comment mentioning the $${ORG_EMAIL:-}/$${ORG_NAME:-}
slot-file placeholders WITHOUT the $$ escape. tofu's templatefile()
parses comments and tried to interpolate \${ORG_EMAIL:-} as a tofu
expression — failing with "Extra characters after interpolation
expression; Template interpolation doesn't expect a colon".

Caught live on t133 fad01d84f5655004 — tofu plan failed in 30s.

The escape pattern is documented at main.tf:1029 (the same warning
that caught t127 last week). $$ prefix tells tofu's templatefile to
emit literal \${...} to cloud-init for Flux envsubst.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(parent-domains): short-circuit pdmFlipNS when NS already matches (D30)

When an sme-pool domain's current NS records already match the expected
[ns1.<primary>, ns2.<primary>] pair (because the operator already
delegated the domain to OpenOva's PowerDNS), the PDM registrar-flip
step is a no-op. Skipping avoids:

  1. Burning a Dynadot API credit on a flip that would be idempotent.
  2. The D30 blocker — current Dynadot creds return pdm-status-401
     even when the desired NS state already exists. Caught on t132
     2026-05-16 day-2 add + t134 2026-05-17 fresh-prov body
     parentDomains attempt.

Adds nsAlreadyMatches() helper using net.DefaultResolver.LookupNS with
a 5s timeout. False on lookup error or partial match → fall through to
the original PDM pipeline so a misconfigured/partial domain still goes
through the registrar API.

This unblocks sme-pool entries for omani.homes (already pointing at
ns1/2/3.openova.io). omani.rest / omani.trades still go through the
full flip path because their NS records don't yet match expected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(handover): D21 owner seed uses catalyst-system namespace

PR #1564 created the owner UserAccess CR with .Namespace("") — the
apiserver returned "could not find the requested resource" because
useraccesses.access.openova.io is NAMESPACED (Crossplane Claim per
the XRD's claimNames block at platform/crossplane-claims/chart/
templates/xrds/useraccess.yaml).

Pin to catalyst-system (where catalyst-api + every Catalyst-authored
CR lives) and stamp the namespace on the object too. The existing
ListUserAccess handler uses Namespace("") so the entry surfaces on
/users without per-namespace filtering.

Verified the CRD shape on t134 2026-05-17:
  $ kubectl api-resources --api-group=access.openova.io
  useraccesses   access.openova.io/v1alpha1   true   UserAccess
                                                ^^^^
                                                NAMESPACED

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(handover): D21 owner seed uses tierRoleRef not wildcard app

PR #1564 + #1577 created the CR shape with applications=[{app:"*",...}]
but the useraccess XRD schema rejects `app: "*"` (pattern
^[a-z0-9][a-z0-9-]{0,62}$). The seed handler logged
"spec.applications[0].app: Invalid value: \"*\"" on every handover.

The XRD has a `tierRoleRef` field (pattern
^openova:tier-(viewer|developer|operator|admin|owner)$) that's the
canonical owner-tier semantic — when set, useraccess-controller binds
the named ClusterRole on the target via RoleBinding/ClusterRoleBinding.
`openova:tier-owner` is shipped by EPIC-3 (#1098) slice T1's
tier-clusterroles.yaml.

Drop the applications[] block + use tierRoleRef = openova:tier-owner.
Verified live on t135 2026-05-17 — error log showed exact pattern
mismatch before this fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chroot): POST /api/v1/sovereign/secondary-kubeconfig (D16 PR A)

D16 multi-cluster fan-out requires the chroot's k8sCache.Factory to
have all 3 regions' kubeconfigs registered so dashboard handler's
per-cluster h.k8sCache.List(clusterID, ...) enumerates pods from each.

Today the chroot only auto-registers its own in-cluster apiserver via
FactoryFromEnv's chroot self-registration branch. Secondary
kubeconfigs live on the mothership PVC + aren't replicated.

This handler bridges the gap:
- Accepts JSON {deploymentId, regionKey, kubeconfigYaml}
- Validates ids via ^[a-z0-9][a-z0-9-]{0,62}$ pattern (defense in
  depth — filename composed from these)
- Writes kubeconfig 0o600 to /var/lib/catalyst/kubeconfigs/<depID>-<region>.yaml
  (canonical FactoryFromEnv path so restart re-registers)
- Calls k8sCache.AddCluster — idempotent per Factory contract

PR B (next): mothership-side handover hook iterates secondary regions
and POSTs each kubeconfig to the chroot.

PR C (next): dashboard.go fan-out across all registered cluster IDs
when group_by includes cluster/region.

Per docs/INVIOLABLE-PRINCIPLES.md #10 kubeconfig bytes never enter a
logged struct + are written 0o600.

Memo: feedback_d16_dashboard_multi_cluster_fan_out.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(dashboard): multi-cluster fan-out when group_by=cluster|region (D16 PR C)

When group_by includes "cluster" or "region", enumerate ALL registered
k8sCache clusters (primary + secondaries synced via PR #1579's POST
/api/v1/sovereign/secondary-kubeconfig endpoint) and concatenate
podRows from each before aggregation.

Layer-1=Cluster on /dashboard now renders 3 bubbles on a 3-region
Sovereign (was 1 bubble before).

For group_by that ONLY contains {namespace,family,application,vcluster,
sovereign} the primary clusterID's pods are sufficient and faster — no
fan-out cost.

PR B (mothership-side handover hook to POST each secondary kubeconfig)
will complete the chain. Until then, secondaries don't appear in
k8sCache.Clusters() so this fan-out is a no-op on existing provs — but
the code is in place for when PR B lands.

Memo: feedback_d16_dashboard_multi_cluster_fan_out.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(handover): export secondary kubeconfigs to chroot at handover (D16 PR B)

Closes the D16 multi-cluster fan-out chain:
- PR #1579 (PR A): chroot endpoint accepts kubeconfigs
- PR #1580 (PR C): dashboard handler fans out across registered clusters
- This PR (PR B): mothership-side hook iterates secondary regions at
  handover, reads each region's kubeconfig from the mothership PVC,
  and POSTs to the chroot's endpoint

After handover-fire, exportSecondaryKubeconfigsToChild fires as a
goroutine (alongside exportDeploymentToChild). Best-effort per region:
a failure on region N doesn't abort N+1.

The chroot's k8sCache.Factory.AddCluster runs on every POST so
dashboard /api/v1/dashboard/treemap?group_by=cluster|region now
enumerates pods from all N regions and Layer-1=Cluster renders N
bubbles on an N-region Sovereign.

regionKeysForExport derives the filename convention `<region>-<slot>`
from dep.Request.Regions[1:] (primary is auto-registered by the
chroot's FactoryFromEnv self-registration so we skip index 0).

Per docs/INVIOLABLE-PRINCIPLES.md #10 kubeconfig bytes never enter a
logged struct + are read with stdlib os.ReadFile.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 08:22:01 +04:00
github-actions[bot]
b07e5206a1 deploy: update catalyst images to d92f734 2026-05-17 04:09:34 +00:00
e3mrah
d92f734374
feat(dashboard): multi-cluster fan-out when group_by=cluster|region (D16 PR C) (#1580)
* fix(cloudinit): escape $$\{ORG_EMAIL:-\}/$$\{ORG_NAME:-\} in comment (D22)

PR #1571 added a comment mentioning the $${ORG_EMAIL:-}/$${ORG_NAME:-}
slot-file placeholders WITHOUT the $$ escape. tofu's templatefile()
parses comments and tried to interpolate \${ORG_EMAIL:-} as a tofu
expression — failing with "Extra characters after interpolation
expression; Template interpolation doesn't expect a colon".

Caught live on t133 fad01d84f5655004 — tofu plan failed in 30s.

The escape pattern is documented at main.tf:1029 (the same warning
that caught t127 last week). $$ prefix tells tofu's templatefile to
emit literal \${...} to cloud-init for Flux envsubst.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(parent-domains): short-circuit pdmFlipNS when NS already matches (D30)

When an sme-pool domain's current NS records already match the expected
[ns1.<primary>, ns2.<primary>] pair (because the operator already
delegated the domain to OpenOva's PowerDNS), the PDM registrar-flip
step is a no-op. Skipping avoids:

  1. Burning a Dynadot API credit on a flip that would be idempotent.
  2. The D30 blocker — current Dynadot creds return pdm-status-401
     even when the desired NS state already exists. Caught on t132
     2026-05-16 day-2 add + t134 2026-05-17 fresh-prov body
     parentDomains attempt.

Adds nsAlreadyMatches() helper using net.DefaultResolver.LookupNS with
a 5s timeout. False on lookup error or partial match → fall through to
the original PDM pipeline so a misconfigured/partial domain still goes
through the registrar API.

This unblocks sme-pool entries for omani.homes (already pointing at
ns1/2/3.openova.io). omani.rest / omani.trades still go through the
full flip path because their NS records don't yet match expected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(handover): D21 owner seed uses catalyst-system namespace

PR #1564 created the owner UserAccess CR with .Namespace("") — the
apiserver returned "could not find the requested resource" because
useraccesses.access.openova.io is NAMESPACED (Crossplane Claim per
the XRD's claimNames block at platform/crossplane-claims/chart/
templates/xrds/useraccess.yaml).

Pin to catalyst-system (where catalyst-api + every Catalyst-authored
CR lives) and stamp the namespace on the object too. The existing
ListUserAccess handler uses Namespace("") so the entry surfaces on
/users without per-namespace filtering.

Verified the CRD shape on t134 2026-05-17:
  $ kubectl api-resources --api-group=access.openova.io
  useraccesses   access.openova.io/v1alpha1   true   UserAccess
                                                ^^^^
                                                NAMESPACED

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(handover): D21 owner seed uses tierRoleRef not wildcard app

PR #1564 + #1577 created the CR shape with applications=[{app:"*",...}]
but the useraccess XRD schema rejects `app: "*"` (pattern
^[a-z0-9][a-z0-9-]{0,62}$). The seed handler logged
"spec.applications[0].app: Invalid value: \"*\"" on every handover.

The XRD has a `tierRoleRef` field (pattern
^openova:tier-(viewer|developer|operator|admin|owner)$) that's the
canonical owner-tier semantic — when set, useraccess-controller binds
the named ClusterRole on the target via RoleBinding/ClusterRoleBinding.
`openova:tier-owner` is shipped by EPIC-3 (#1098) slice T1's
tier-clusterroles.yaml.

Drop the applications[] block + use tierRoleRef = openova:tier-owner.
Verified live on t135 2026-05-17 — error log showed exact pattern
mismatch before this fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chroot): POST /api/v1/sovereign/secondary-kubeconfig (D16 PR A)

D16 multi-cluster fan-out requires the chroot's k8sCache.Factory to
have all 3 regions' kubeconfigs registered so dashboard handler's
per-cluster h.k8sCache.List(clusterID, ...) enumerates pods from each.

Today the chroot only auto-registers its own in-cluster apiserver via
FactoryFromEnv's chroot self-registration branch. Secondary
kubeconfigs live on the mothership PVC + aren't replicated.

This handler bridges the gap:
- Accepts JSON {deploymentId, regionKey, kubeconfigYaml}
- Validates ids via ^[a-z0-9][a-z0-9-]{0,62}$ pattern (defense in
  depth — filename composed from these)
- Writes kubeconfig 0o600 to /var/lib/catalyst/kubeconfigs/<depID>-<region>.yaml
  (canonical FactoryFromEnv path so restart re-registers)
- Calls k8sCache.AddCluster — idempotent per Factory contract

PR B (next): mothership-side handover hook iterates secondary regions
and POSTs each kubeconfig to the chroot.

PR C (next): dashboard.go fan-out across all registered cluster IDs
when group_by includes cluster/region.

Per docs/INVIOLABLE-PRINCIPLES.md #10 kubeconfig bytes never enter a
logged struct + are written 0o600.

Memo: feedback_d16_dashboard_multi_cluster_fan_out.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(dashboard): multi-cluster fan-out when group_by=cluster|region (D16 PR C)

When group_by includes "cluster" or "region", enumerate ALL registered
k8sCache clusters (primary + secondaries synced via PR #1579's POST
/api/v1/sovereign/secondary-kubeconfig endpoint) and concatenate
podRows from each before aggregation.

Layer-1=Cluster on /dashboard now renders 3 bubbles on a 3-region
Sovereign (was 1 bubble before).

For group_by that ONLY contains {namespace,family,application,vcluster,
sovereign} the primary clusterID's pods are sufficient and faster — no
fan-out cost.

PR B (mothership-side handover hook to POST each secondary kubeconfig)
will complete the chain. Until then, secondaries don't appear in
k8sCache.Clusters() so this fan-out is a no-op on existing provs — but
the code is in place for when PR B lands.

Memo: feedback_d16_dashboard_multi_cluster_fan_out.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 08:07:26 +04:00
e3mrah
bcab6430cb
feat(chroot): POST /api/v1/sovereign/secondary-kubeconfig (D16 PR A) (#1579)
* fix(cloudinit): escape $$\{ORG_EMAIL:-\}/$$\{ORG_NAME:-\} in comment (D22)

PR #1571 added a comment mentioning the $${ORG_EMAIL:-}/$${ORG_NAME:-}
slot-file placeholders WITHOUT the $$ escape. tofu's templatefile()
parses comments and tried to interpolate \${ORG_EMAIL:-} as a tofu
expression — failing with "Extra characters after interpolation
expression; Template interpolation doesn't expect a colon".

Caught live on t133 fad01d84f5655004 — tofu plan failed in 30s.

The escape pattern is documented at main.tf:1029 (the same warning
that caught t127 last week). $$ prefix tells tofu's templatefile to
emit literal \${...} to cloud-init for Flux envsubst.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(parent-domains): short-circuit pdmFlipNS when NS already matches (D30)

When an sme-pool domain's current NS records already match the expected
[ns1.<primary>, ns2.<primary>] pair (because the operator already
delegated the domain to OpenOva's PowerDNS), the PDM registrar-flip
step is a no-op. Skipping avoids:

  1. Burning a Dynadot API credit on a flip that would be idempotent.
  2. The D30 blocker — current Dynadot creds return pdm-status-401
     even when the desired NS state already exists. Caught on t132
     2026-05-16 day-2 add + t134 2026-05-17 fresh-prov body
     parentDomains attempt.

Adds nsAlreadyMatches() helper using net.DefaultResolver.LookupNS with
a 5s timeout. False on lookup error or partial match → fall through to
the original PDM pipeline so a misconfigured/partial domain still goes
through the registrar API.

This unblocks sme-pool entries for omani.homes (already pointing at
ns1/2/3.openova.io). omani.rest / omani.trades still go through the
full flip path because their NS records don't yet match expected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(handover): D21 owner seed uses catalyst-system namespace

PR #1564 created the owner UserAccess CR with .Namespace("") — the
apiserver returned "could not find the requested resource" because
useraccesses.access.openova.io is NAMESPACED (Crossplane Claim per
the XRD's claimNames block at platform/crossplane-claims/chart/
templates/xrds/useraccess.yaml).

Pin to catalyst-system (where catalyst-api + every Catalyst-authored
CR lives) and stamp the namespace on the object too. The existing
ListUserAccess handler uses Namespace("") so the entry surfaces on
/users without per-namespace filtering.

Verified the CRD shape on t134 2026-05-17:
  $ kubectl api-resources --api-group=access.openova.io
  useraccesses   access.openova.io/v1alpha1   true   UserAccess
                                                ^^^^
                                                NAMESPACED

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(handover): D21 owner seed uses tierRoleRef not wildcard app

PR #1564 + #1577 created the CR shape with applications=[{app:"*",...}]
but the useraccess XRD schema rejects `app: "*"` (pattern
^[a-z0-9][a-z0-9-]{0,62}$). The seed handler logged
"spec.applications[0].app: Invalid value: \"*\"" on every handover.

The XRD has a `tierRoleRef` field (pattern
^openova:tier-(viewer|developer|operator|admin|owner)$) that's the
canonical owner-tier semantic — when set, useraccess-controller binds
the named ClusterRole on the target via RoleBinding/ClusterRoleBinding.
`openova:tier-owner` is shipped by EPIC-3 (#1098) slice T1's
tier-clusterroles.yaml.

Drop the applications[] block + use tierRoleRef = openova:tier-owner.
Verified live on t135 2026-05-17 — error log showed exact pattern
mismatch before this fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chroot): POST /api/v1/sovereign/secondary-kubeconfig (D16 PR A)

D16 multi-cluster fan-out requires the chroot's k8sCache.Factory to
have all 3 regions' kubeconfigs registered so dashboard handler's
per-cluster h.k8sCache.List(clusterID, ...) enumerates pods from each.

Today the chroot only auto-registers its own in-cluster apiserver via
FactoryFromEnv's chroot self-registration branch. Secondary
kubeconfigs live on the mothership PVC + aren't replicated.

This handler bridges the gap:
- Accepts JSON {deploymentId, regionKey, kubeconfigYaml}
- Validates ids via ^[a-z0-9][a-z0-9-]{0,62}$ pattern (defense in
  depth — filename composed from these)
- Writes kubeconfig 0o600 to /var/lib/catalyst/kubeconfigs/<depID>-<region>.yaml
  (canonical FactoryFromEnv path so restart re-registers)
- Calls k8sCache.AddCluster — idempotent per Factory contract

PR B (next): mothership-side handover hook iterates secondary regions
and POSTs each kubeconfig to the chroot.

PR C (next): dashboard.go fan-out across all registered cluster IDs
when group_by includes cluster/region.

Per docs/INVIOLABLE-PRINCIPLES.md #10 kubeconfig bytes never enter a
logged struct + are written 0o600.

Memo: feedback_d16_dashboard_multi_cluster_fan_out.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 08:06:08 +04:00
github-actions[bot]
6e329e27ae deploy: update catalyst images to 4f62dd2 2026-05-17 00:10:50 +00:00
e3mrah
4f62dd21b3
fix(handover): D21 owner seed uses tierRoleRef not wildcard app (#1578)
* fix(cloudinit): escape $$\{ORG_EMAIL:-\}/$$\{ORG_NAME:-\} in comment (D22)

PR #1571 added a comment mentioning the $${ORG_EMAIL:-}/$${ORG_NAME:-}
slot-file placeholders WITHOUT the $$ escape. tofu's templatefile()
parses comments and tried to interpolate \${ORG_EMAIL:-} as a tofu
expression — failing with "Extra characters after interpolation
expression; Template interpolation doesn't expect a colon".

Caught live on t133 fad01d84f5655004 — tofu plan failed in 30s.

The escape pattern is documented at main.tf:1029 (the same warning
that caught t127 last week). $$ prefix tells tofu's templatefile to
emit literal \${...} to cloud-init for Flux envsubst.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(parent-domains): short-circuit pdmFlipNS when NS already matches (D30)

When an sme-pool domain's current NS records already match the expected
[ns1.<primary>, ns2.<primary>] pair (because the operator already
delegated the domain to OpenOva's PowerDNS), the PDM registrar-flip
step is a no-op. Skipping avoids:

  1. Burning a Dynadot API credit on a flip that would be idempotent.
  2. The D30 blocker — current Dynadot creds return pdm-status-401
     even when the desired NS state already exists. Caught on t132
     2026-05-16 day-2 add + t134 2026-05-17 fresh-prov body
     parentDomains attempt.

Adds nsAlreadyMatches() helper using net.DefaultResolver.LookupNS with
a 5s timeout. False on lookup error or partial match → fall through to
the original PDM pipeline so a misconfigured/partial domain still goes
through the registrar API.

This unblocks sme-pool entries for omani.homes (already pointing at
ns1/2/3.openova.io). omani.rest / omani.trades still go through the
full flip path because their NS records don't yet match expected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(handover): D21 owner seed uses catalyst-system namespace

PR #1564 created the owner UserAccess CR with .Namespace("") — the
apiserver returned "could not find the requested resource" because
useraccesses.access.openova.io is NAMESPACED (Crossplane Claim per
the XRD's claimNames block at platform/crossplane-claims/chart/
templates/xrds/useraccess.yaml).

Pin to catalyst-system (where catalyst-api + every Catalyst-authored
CR lives) and stamp the namespace on the object too. The existing
ListUserAccess handler uses Namespace("") so the entry surfaces on
/users without per-namespace filtering.

Verified the CRD shape on t134 2026-05-17:
  $ kubectl api-resources --api-group=access.openova.io
  useraccesses   access.openova.io/v1alpha1   true   UserAccess
                                                ^^^^
                                                NAMESPACED

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(handover): D21 owner seed uses tierRoleRef not wildcard app

PR #1564 + #1577 created the CR shape with applications=[{app:"*",...}]
but the useraccess XRD schema rejects `app: "*"` (pattern
^[a-z0-9][a-z0-9-]{0,62}$). The seed handler logged
"spec.applications[0].app: Invalid value: \"*\"" on every handover.

The XRD has a `tierRoleRef` field (pattern
^openova:tier-(viewer|developer|operator|admin|owner)$) that's the
canonical owner-tier semantic — when set, useraccess-controller binds
the named ClusterRole on the target via RoleBinding/ClusterRoleBinding.
`openova:tier-owner` is shipped by EPIC-3 (#1098) slice T1's
tier-clusterroles.yaml.

Drop the applications[] block + use tierRoleRef = openova:tier-owner.
Verified live on t135 2026-05-17 — error log showed exact pattern
mismatch before this fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 04:08:45 +04:00
github-actions[bot]
6466f97f6c deploy: update catalyst images to ea30ded 2026-05-16 23:28:04 +00:00
e3mrah
ea30ded120
fix(handover): D21 owner seed uses catalyst-system namespace (#1577)
* fix(cloudinit): escape $$\{ORG_EMAIL:-\}/$$\{ORG_NAME:-\} in comment (D22)

PR #1571 added a comment mentioning the $${ORG_EMAIL:-}/$${ORG_NAME:-}
slot-file placeholders WITHOUT the $$ escape. tofu's templatefile()
parses comments and tried to interpolate \${ORG_EMAIL:-} as a tofu
expression — failing with "Extra characters after interpolation
expression; Template interpolation doesn't expect a colon".

Caught live on t133 fad01d84f5655004 — tofu plan failed in 30s.

The escape pattern is documented at main.tf:1029 (the same warning
that caught t127 last week). $$ prefix tells tofu's templatefile to
emit literal \${...} to cloud-init for Flux envsubst.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(parent-domains): short-circuit pdmFlipNS when NS already matches (D30)

When an sme-pool domain's current NS records already match the expected
[ns1.<primary>, ns2.<primary>] pair (because the operator already
delegated the domain to OpenOva's PowerDNS), the PDM registrar-flip
step is a no-op. Skipping avoids:

  1. Burning a Dynadot API credit on a flip that would be idempotent.
  2. The D30 blocker — current Dynadot creds return pdm-status-401
     even when the desired NS state already exists. Caught on t132
     2026-05-16 day-2 add + t134 2026-05-17 fresh-prov body
     parentDomains attempt.

Adds nsAlreadyMatches() helper using net.DefaultResolver.LookupNS with
a 5s timeout. False on lookup error or partial match → fall through to
the original PDM pipeline so a misconfigured/partial domain still goes
through the registrar API.

This unblocks sme-pool entries for omani.homes (already pointing at
ns1/2/3.openova.io). omani.rest / omani.trades still go through the
full flip path because their NS records don't yet match expected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(handover): D21 owner seed uses catalyst-system namespace

PR #1564 created the owner UserAccess CR with .Namespace("") — the
apiserver returned "could not find the requested resource" because
useraccesses.access.openova.io is NAMESPACED (Crossplane Claim per
the XRD's claimNames block at platform/crossplane-claims/chart/
templates/xrds/useraccess.yaml).

Pin to catalyst-system (where catalyst-api + every Catalyst-authored
CR lives) and stamp the namespace on the object too. The existing
ListUserAccess handler uses Namespace("") so the entry surfaces on
/users without per-namespace filtering.

Verified the CRD shape on t134 2026-05-17:
  $ kubectl api-resources --api-group=access.openova.io
  useraccesses   access.openova.io/v1alpha1   true   UserAccess
                                                ^^^^
                                                NAMESPACED

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 03:26:06 +04:00
github-actions[bot]
18b5fa1466 deploy: update catalyst images to 33ed484 2026-05-16 23:24:34 +00:00
e3mrah
33ed484e04
fix(parent-domains): short-circuit pdmFlipNS when NS already matches (D30) (#1576)
* fix(cloudinit): escape $$\{ORG_EMAIL:-\}/$$\{ORG_NAME:-\} in comment (D22)

PR #1571 added a comment mentioning the $${ORG_EMAIL:-}/$${ORG_NAME:-}
slot-file placeholders WITHOUT the $$ escape. tofu's templatefile()
parses comments and tried to interpolate \${ORG_EMAIL:-} as a tofu
expression — failing with "Extra characters after interpolation
expression; Template interpolation doesn't expect a colon".

Caught live on t133 fad01d84f5655004 — tofu plan failed in 30s.

The escape pattern is documented at main.tf:1029 (the same warning
that caught t127 last week). $$ prefix tells tofu's templatefile to
emit literal \${...} to cloud-init for Flux envsubst.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(parent-domains): short-circuit pdmFlipNS when NS already matches (D30)

When an sme-pool domain's current NS records already match the expected
[ns1.<primary>, ns2.<primary>] pair (because the operator already
delegated the domain to OpenOva's PowerDNS), the PDM registrar-flip
step is a no-op. Skipping avoids:

  1. Burning a Dynadot API credit on a flip that would be idempotent.
  2. The D30 blocker — current Dynadot creds return pdm-status-401
     even when the desired NS state already exists. Caught on t132
     2026-05-16 day-2 add + t134 2026-05-17 fresh-prov body
     parentDomains attempt.

Adds nsAlreadyMatches() helper using net.DefaultResolver.LookupNS with
a 5s timeout. False on lookup error or partial match → fall through to
the original PDM pipeline so a misconfigured/partial domain still goes
through the registrar API.

This unblocks sme-pool entries for omani.homes (already pointing at
ns1/2/3.openova.io). omani.rest / omani.trades still go through the
full flip path because their NS records don't yet match expected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 03:21:42 +04:00
github-actions[bot]
a65a024114 deploy: update catalyst images to c148ec6 2026-05-16 22:33:19 +00:00
e3mrah
c148ec6a34
fix(cloudinit): escape $$\{ORG_EMAIL:-\}/$$\{ORG_NAME:-\} in comment (D22) (#1575)
PR #1571 added a comment mentioning the $${ORG_EMAIL:-}/$${ORG_NAME:-}
slot-file placeholders WITHOUT the $$ escape. tofu's templatefile()
parses comments and tried to interpolate \${ORG_EMAIL:-} as a tofu
expression — failing with "Extra characters after interpolation
expression; Template interpolation doesn't expect a colon".

Caught live on t133 fad01d84f5655004 — tofu plan failed in 30s.

The escape pattern is documented at main.tf:1029 (the same warning
that caught t127 last week). $$ prefix tells tofu's templatefile to
emit literal \${...} to cloud-init for Flux envsubst.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 02:31:26 +04:00
github-actions[bot]
c5f777056f deploy: update catalyst images to 3568b72 2026-05-16 22:20:19 +00:00
e3mrah
3568b72b5e
fix(cloud): hide non-active 0/0 chips (D15) (#1574)
* feat(chart): wire OPERATOR_EMAIL/CONTROL_PLANE_IP/GITOPS_REPO_URL/ORG_NAME (D22)

Companion to PR #1567 + #1568 — wire the env vars chrootEnsureDeployment
reads to populate the deployment record so Sovereign Console Settings
page renders real values for ownerEmail, controlPlaneIP, gitopsRepoURL,
orgName (instead of `—` placeholders).

Adds 4 new keys to the sovereign-fqdn ConfigMap (orgEmail, orgName,
controlPlaneIP, gitopsRepoURL) sourced from .Values.sovereign.* with
empty defaults. Per-Sovereign overlays wire actual values from cloud-
init substitute placeholders (mirrors regionsJson pattern).

Catalyst-api Pod now reads them via valueFrom configMapKeyRef +
optional=true (Catalyst-Zero/contabo emits no sovereign-fqdn ConfigMap
so env stays empty there — correct, mothership is signer not validator).

Validated: t132 already serves region=hel1, consoleURL, loadBalancerIP
post-#1568. This PR fills the remaining 3 D22 fields when operator wires
the values.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(slot-13): add D22 sovereign-side identity placeholders

Add ${ORG_EMAIL:-} + ${ORG_NAME:-} + ${SOVEREIGN_CONTROL_PLANE_IP:-} +
${GITOPS_REPO_URL:-} envsubst placeholders so when cloud-init wires
them, the chart picks them up via sovereign-fqdn ConfigMap (PR #1569)
→ catalyst-api env → chrootEnsureDeployment populates the deployment
record → Settings page renders real values instead of `—`.

This PR alone is a no-op (placeholders default to empty, same as today).
The cloud-init substitute lines + provisioner.go tfvars need to land in
a companion PR to actually populate the values on next-prov.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(cloudinit): wire ORG_EMAIL/ORG_NAME/GITOPS_REPO_URL substitutes (D22)

Companion to #1567+#1568+#1569+#1570 — the cloud-init substitute block
now emits ORG_EMAIL/ORG_NAME/GITOPS_REPO_URL into the bootstrap-kit
Kustomization's postBuild.substitute env, which the slot-13 placeholders
(#1570) consume via ${ORG_EMAIL:-}/${ORG_NAME:-}/${GITOPS_REPO_URL:-}.

Chain: provisioner.go writeTfvars → tofu vars → cloudinit templatefile
substitute → Flux Kustomization postBuild → sovereign-fqdn ConfigMap
keys (#1569) → catalyst-api env (#1569) → chrootEnsureDeployment
populates the deployment record (#1567 + #1568 fallback).

SOVEREIGN_CONTROL_PLANE_IP omitted intentionally — main.tf:691 notes
the dependency cycle (hcloud_server.cp doesn't exist at cloudinit
render time). Separate PR will source it via metadata-service or
post-create ConfigMap patch.

Next-prov (t133+) Sovereign Console Settings page now renders real
ownerEmail/orgName/gitopsRepoURL instead of `—` placeholders.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(router): chroot /app/<name> only-redirect mothership-only sub-paths (D17/D17b)

PR #1552 stripped the `/app` prefix on Sovereign mode to make
`/app/bp-cnpg` → `/bp-cnpg`, hoping consoleAppDetailRoute would match.
But consoleAppDetailRoute is registered at `/app/$componentId` under
consoleLayoutRoute — no chroot route matches `/<componentId>` directly,
so stripping leaves an empty render path. Playwright walkthrough on
t132 2026-05-17 confirmed: /app/bp-cnpg + /app/bp-coraza both render
body_len=9 (empty).

Invert the logic: only redirect mothership-only sub-paths (/dashboard
Fleet view, /install wizard, /sre, /sec, /blueprints) which have no
Sovereign Console equivalent. For everything else (component names like
`/app/bp-cnpg`, bare `/app`), let TanStack's natural most-specific-match
pick consoleAppDetailRoute / consoleAppsRoute.

Caught live on t132 via Playwright walker3.js — agent a4825c5a.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(handover): re-mint handover JWT on every GetDeployment (D0)

D0 Playwright walkthrough on t132 2026-05-17 caught: handoverURL
persisted at handover-fire time carries a JWT that expires per
DefaultTTL (5min). Operators who click /jobs hours later get the stale
token → Sovereign-side /auth/handover rejects with raw JSON
{"error":"invalid token"} — no UI fallback, no /auth/handover-error,
auto-redirect to /dashboard never fires.

Re-mint the JWT on every GetDeployment when deployment is ready +
handover-fired so the URL returned to the wizard is always
freshly-signed.

Best-effort: on mint failure, leave the existing URL in place so a
transient signer error doesn't break polling. Helper is idempotent +
locked.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cloud): hide non-active 0/0 chips (D15)

Playwright walkthrough on t132 2026-05-17 caught D15 PARTIAL: 15 chips
are correct but Bucket+Volume show 0/0. Founder rule (DoD D15):
"No kind chip shows 0/0 for a resource that actually exists in the
cluster". Bucket+Volume genuinely don't exist on this Sovereign so
showing 0/0 is noise.

Hide chips with count exactly 0 unless they're the active selection
(operator who navigated to an empty kind keeps context).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 02:18:24 +04:00
github-actions[bot]
44e612f39d deploy: update catalyst images to 58dbb92 2026-05-16 22:18:16 +00:00
e3mrah
58dbb92f4f
fix(handover): re-mint handover JWT on every GetDeployment (D0) (#1573)
* feat(chart): wire OPERATOR_EMAIL/CONTROL_PLANE_IP/GITOPS_REPO_URL/ORG_NAME (D22)

Companion to PR #1567 + #1568 — wire the env vars chrootEnsureDeployment
reads to populate the deployment record so Sovereign Console Settings
page renders real values for ownerEmail, controlPlaneIP, gitopsRepoURL,
orgName (instead of `—` placeholders).

Adds 4 new keys to the sovereign-fqdn ConfigMap (orgEmail, orgName,
controlPlaneIP, gitopsRepoURL) sourced from .Values.sovereign.* with
empty defaults. Per-Sovereign overlays wire actual values from cloud-
init substitute placeholders (mirrors regionsJson pattern).

Catalyst-api Pod now reads them via valueFrom configMapKeyRef +
optional=true (Catalyst-Zero/contabo emits no sovereign-fqdn ConfigMap
so env stays empty there — correct, mothership is signer not validator).

Validated: t132 already serves region=hel1, consoleURL, loadBalancerIP
post-#1568. This PR fills the remaining 3 D22 fields when operator wires
the values.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(slot-13): add D22 sovereign-side identity placeholders

Add ${ORG_EMAIL:-} + ${ORG_NAME:-} + ${SOVEREIGN_CONTROL_PLANE_IP:-} +
${GITOPS_REPO_URL:-} envsubst placeholders so when cloud-init wires
them, the chart picks them up via sovereign-fqdn ConfigMap (PR #1569)
→ catalyst-api env → chrootEnsureDeployment populates the deployment
record → Settings page renders real values instead of `—`.

This PR alone is a no-op (placeholders default to empty, same as today).
The cloud-init substitute lines + provisioner.go tfvars need to land in
a companion PR to actually populate the values on next-prov.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(cloudinit): wire ORG_EMAIL/ORG_NAME/GITOPS_REPO_URL substitutes (D22)

Companion to #1567+#1568+#1569+#1570 — the cloud-init substitute block
now emits ORG_EMAIL/ORG_NAME/GITOPS_REPO_URL into the bootstrap-kit
Kustomization's postBuild.substitute env, which the slot-13 placeholders
(#1570) consume via ${ORG_EMAIL:-}/${ORG_NAME:-}/${GITOPS_REPO_URL:-}.

Chain: provisioner.go writeTfvars → tofu vars → cloudinit templatefile
substitute → Flux Kustomization postBuild → sovereign-fqdn ConfigMap
keys (#1569) → catalyst-api env (#1569) → chrootEnsureDeployment
populates the deployment record (#1567 + #1568 fallback).

SOVEREIGN_CONTROL_PLANE_IP omitted intentionally — main.tf:691 notes
the dependency cycle (hcloud_server.cp doesn't exist at cloudinit
render time). Separate PR will source it via metadata-service or
post-create ConfigMap patch.

Next-prov (t133+) Sovereign Console Settings page now renders real
ownerEmail/orgName/gitopsRepoURL instead of `—` placeholders.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(router): chroot /app/<name> only-redirect mothership-only sub-paths (D17/D17b)

PR #1552 stripped the `/app` prefix on Sovereign mode to make
`/app/bp-cnpg` → `/bp-cnpg`, hoping consoleAppDetailRoute would match.
But consoleAppDetailRoute is registered at `/app/$componentId` under
consoleLayoutRoute — no chroot route matches `/<componentId>` directly,
so stripping leaves an empty render path. Playwright walkthrough on
t132 2026-05-17 confirmed: /app/bp-cnpg + /app/bp-coraza both render
body_len=9 (empty).

Invert the logic: only redirect mothership-only sub-paths (/dashboard
Fleet view, /install wizard, /sre, /sec, /blueprints) which have no
Sovereign Console equivalent. For everything else (component names like
`/app/bp-cnpg`, bare `/app`), let TanStack's natural most-specific-match
pick consoleAppDetailRoute / consoleAppsRoute.

Caught live on t132 via Playwright walker3.js — agent a4825c5a.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(handover): re-mint handover JWT on every GetDeployment (D0)

D0 Playwright walkthrough on t132 2026-05-17 caught: handoverURL
persisted at handover-fire time carries a JWT that expires per
DefaultTTL (5min). Operators who click /jobs hours later get the stale
token → Sovereign-side /auth/handover rejects with raw JSON
{"error":"invalid token"} — no UI fallback, no /auth/handover-error,
auto-redirect to /dashboard never fires.

Re-mint the JWT on every GetDeployment when deployment is ready +
handover-fired so the URL returned to the wizard is always
freshly-signed.

Best-effort: on mint failure, leave the existing URL in place so a
transient signer error doesn't break polling. Helper is idempotent +
locked.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 02:16:26 +04:00
github-actions[bot]
dea683f5e4 deploy: update catalyst images to 9e1e422 2026-05-16 22:08:01 +00:00
e3mrah
9e1e4224d8
fix(router): chroot /app/<name> only-redirect mothership-only sub-paths (D17/D17b) (#1572)
* feat(chart): wire OPERATOR_EMAIL/CONTROL_PLANE_IP/GITOPS_REPO_URL/ORG_NAME (D22)

Companion to PR #1567 + #1568 — wire the env vars chrootEnsureDeployment
reads to populate the deployment record so Sovereign Console Settings
page renders real values for ownerEmail, controlPlaneIP, gitopsRepoURL,
orgName (instead of `—` placeholders).

Adds 4 new keys to the sovereign-fqdn ConfigMap (orgEmail, orgName,
controlPlaneIP, gitopsRepoURL) sourced from .Values.sovereign.* with
empty defaults. Per-Sovereign overlays wire actual values from cloud-
init substitute placeholders (mirrors regionsJson pattern).

Catalyst-api Pod now reads them via valueFrom configMapKeyRef +
optional=true (Catalyst-Zero/contabo emits no sovereign-fqdn ConfigMap
so env stays empty there — correct, mothership is signer not validator).

Validated: t132 already serves region=hel1, consoleURL, loadBalancerIP
post-#1568. This PR fills the remaining 3 D22 fields when operator wires
the values.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(slot-13): add D22 sovereign-side identity placeholders

Add ${ORG_EMAIL:-} + ${ORG_NAME:-} + ${SOVEREIGN_CONTROL_PLANE_IP:-} +
${GITOPS_REPO_URL:-} envsubst placeholders so when cloud-init wires
them, the chart picks them up via sovereign-fqdn ConfigMap (PR #1569)
→ catalyst-api env → chrootEnsureDeployment populates the deployment
record → Settings page renders real values instead of `—`.

This PR alone is a no-op (placeholders default to empty, same as today).
The cloud-init substitute lines + provisioner.go tfvars need to land in
a companion PR to actually populate the values on next-prov.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(cloudinit): wire ORG_EMAIL/ORG_NAME/GITOPS_REPO_URL substitutes (D22)

Companion to #1567+#1568+#1569+#1570 — the cloud-init substitute block
now emits ORG_EMAIL/ORG_NAME/GITOPS_REPO_URL into the bootstrap-kit
Kustomization's postBuild.substitute env, which the slot-13 placeholders
(#1570) consume via ${ORG_EMAIL:-}/${ORG_NAME:-}/${GITOPS_REPO_URL:-}.

Chain: provisioner.go writeTfvars → tofu vars → cloudinit templatefile
substitute → Flux Kustomization postBuild → sovereign-fqdn ConfigMap
keys (#1569) → catalyst-api env (#1569) → chrootEnsureDeployment
populates the deployment record (#1567 + #1568 fallback).

SOVEREIGN_CONTROL_PLANE_IP omitted intentionally — main.tf:691 notes
the dependency cycle (hcloud_server.cp doesn't exist at cloudinit
render time). Separate PR will source it via metadata-service or
post-create ConfigMap patch.

Next-prov (t133+) Sovereign Console Settings page now renders real
ownerEmail/orgName/gitopsRepoURL instead of `—` placeholders.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(router): chroot /app/<name> only-redirect mothership-only sub-paths (D17/D17b)

PR #1552 stripped the `/app` prefix on Sovereign mode to make
`/app/bp-cnpg` → `/bp-cnpg`, hoping consoleAppDetailRoute would match.
But consoleAppDetailRoute is registered at `/app/$componentId` under
consoleLayoutRoute — no chroot route matches `/<componentId>` directly,
so stripping leaves an empty render path. Playwright walkthrough on
t132 2026-05-17 confirmed: /app/bp-cnpg + /app/bp-coraza both render
body_len=9 (empty).

Invert the logic: only redirect mothership-only sub-paths (/dashboard
Fleet view, /install wizard, /sre, /sec, /blueprints) which have no
Sovereign Console equivalent. For everything else (component names like
`/app/bp-cnpg`, bare `/app`), let TanStack's natural most-specific-match
pick consoleAppDetailRoute / consoleAppsRoute.

Caught live on t132 via Playwright walker3.js — agent a4825c5a.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 02:05:54 +04:00
github-actions[bot]
4cc880cafd deploy: update catalyst images to 5793958 2026-05-16 21:48:54 +00:00
e3mrah
57939585c0
feat(cloudinit): wire ORG_EMAIL/ORG_NAME/GITOPS_REPO_URL substitutes (D22) (#1571)
* feat(chart): wire OPERATOR_EMAIL/CONTROL_PLANE_IP/GITOPS_REPO_URL/ORG_NAME (D22)

Companion to PR #1567 + #1568 — wire the env vars chrootEnsureDeployment
reads to populate the deployment record so Sovereign Console Settings
page renders real values for ownerEmail, controlPlaneIP, gitopsRepoURL,
orgName (instead of `—` placeholders).

Adds 4 new keys to the sovereign-fqdn ConfigMap (orgEmail, orgName,
controlPlaneIP, gitopsRepoURL) sourced from .Values.sovereign.* with
empty defaults. Per-Sovereign overlays wire actual values from cloud-
init substitute placeholders (mirrors regionsJson pattern).

Catalyst-api Pod now reads them via valueFrom configMapKeyRef +
optional=true (Catalyst-Zero/contabo emits no sovereign-fqdn ConfigMap
so env stays empty there — correct, mothership is signer not validator).

Validated: t132 already serves region=hel1, consoleURL, loadBalancerIP
post-#1568. This PR fills the remaining 3 D22 fields when operator wires
the values.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(slot-13): add D22 sovereign-side identity placeholders

Add ${ORG_EMAIL:-} + ${ORG_NAME:-} + ${SOVEREIGN_CONTROL_PLANE_IP:-} +
${GITOPS_REPO_URL:-} envsubst placeholders so when cloud-init wires
them, the chart picks them up via sovereign-fqdn ConfigMap (PR #1569)
→ catalyst-api env → chrootEnsureDeployment populates the deployment
record → Settings page renders real values instead of `—`.

This PR alone is a no-op (placeholders default to empty, same as today).
The cloud-init substitute lines + provisioner.go tfvars need to land in
a companion PR to actually populate the values on next-prov.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(cloudinit): wire ORG_EMAIL/ORG_NAME/GITOPS_REPO_URL substitutes (D22)

Companion to #1567+#1568+#1569+#1570 — the cloud-init substitute block
now emits ORG_EMAIL/ORG_NAME/GITOPS_REPO_URL into the bootstrap-kit
Kustomization's postBuild.substitute env, which the slot-13 placeholders
(#1570) consume via ${ORG_EMAIL:-}/${ORG_NAME:-}/${GITOPS_REPO_URL:-}.

Chain: provisioner.go writeTfvars → tofu vars → cloudinit templatefile
substitute → Flux Kustomization postBuild → sovereign-fqdn ConfigMap
keys (#1569) → catalyst-api env (#1569) → chrootEnsureDeployment
populates the deployment record (#1567 + #1568 fallback).

SOVEREIGN_CONTROL_PLANE_IP omitted intentionally — main.tf:691 notes
the dependency cycle (hcloud_server.cp doesn't exist at cloudinit
render time). Separate PR will source it via metadata-service or
post-create ConfigMap patch.

Next-prov (t133+) Sovereign Console Settings page now renders real
ownerEmail/orgName/gitopsRepoURL instead of `—` placeholders.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 01:47:04 +04:00
e3mrah
700d28967f
chore(slot-13): add D22 sovereign-side identity placeholders (#1570)
* feat(chart): wire OPERATOR_EMAIL/CONTROL_PLANE_IP/GITOPS_REPO_URL/ORG_NAME (D22)

Companion to PR #1567 + #1568 — wire the env vars chrootEnsureDeployment
reads to populate the deployment record so Sovereign Console Settings
page renders real values for ownerEmail, controlPlaneIP, gitopsRepoURL,
orgName (instead of `—` placeholders).

Adds 4 new keys to the sovereign-fqdn ConfigMap (orgEmail, orgName,
controlPlaneIP, gitopsRepoURL) sourced from .Values.sovereign.* with
empty defaults. Per-Sovereign overlays wire actual values from cloud-
init substitute placeholders (mirrors regionsJson pattern).

Catalyst-api Pod now reads them via valueFrom configMapKeyRef +
optional=true (Catalyst-Zero/contabo emits no sovereign-fqdn ConfigMap
so env stays empty there — correct, mothership is signer not validator).

Validated: t132 already serves region=hel1, consoleURL, loadBalancerIP
post-#1568. This PR fills the remaining 3 D22 fields when operator wires
the values.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(slot-13): add D22 sovereign-side identity placeholders

Add ${ORG_EMAIL:-} + ${ORG_NAME:-} + ${SOVEREIGN_CONTROL_PLANE_IP:-} +
${GITOPS_REPO_URL:-} envsubst placeholders so when cloud-init wires
them, the chart picks them up via sovereign-fqdn ConfigMap (PR #1569)
→ catalyst-api env → chrootEnsureDeployment populates the deployment
record → Settings page renders real values instead of `—`.

This PR alone is a no-op (placeholders default to empty, same as today).
The cloud-init substitute lines + provisioner.go tfvars need to land in
a companion PR to actually populate the values on next-prov.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 01:29:59 +04:00
github-actions[bot]
df193d340e deploy: update catalyst images to 9cbcd23 2026-05-16 21:03:01 +00:00
e3mrah
9cbcd230da
feat(chart): wire OPERATOR_EMAIL/CONTROL_PLANE_IP/GITOPS_REPO_URL/ORG_NAME (D22) (#1569)
Companion to PR #1567 + #1568 — wire the env vars chrootEnsureDeployment
reads to populate the deployment record so Sovereign Console Settings
page renders real values for ownerEmail, controlPlaneIP, gitopsRepoURL,
orgName (instead of `—` placeholders).

Adds 4 new keys to the sovereign-fqdn ConfigMap (orgEmail, orgName,
controlPlaneIP, gitopsRepoURL) sourced from .Values.sovereign.* with
empty defaults. Per-Sovereign overlays wire actual values from cloud-
init substitute placeholders (mirrors regionsJson pattern).

Catalyst-api Pod now reads them via valueFrom configMapKeyRef +
optional=true (Catalyst-Zero/contabo emits no sovereign-fqdn ConfigMap
so env stays empty there — correct, mothership is signer not validator).

Validated: t132 already serves region=hel1, consoleURL, loadBalancerIP
post-#1568. This PR fills the remaining 3 D22 fields when operator wires
the values.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 01:01:00 +04:00
github-actions[bot]
0e0280bbe0 deploy: update catalyst images to 6618392 2026-05-16 20:56:10 +00:00