docs: map staging and production backend behavior
148
docs/environment-matrix.md
Normal file
@@ -0,0 +1,148 @@
# Environment Matrix

PRM-42 staging vs production separation inventory for `growqr-backend`.

No refactor was performed in this pass.

## Current Environment Model

The backend currently relies on `config.nodeEnv` plus many individual env vars. There is no explicit, first-class `environment` setting with values such as `development | staging | production | demo`.

Important consequence: local/dev defaults can leak into staging or production unless the deployment overrides every sensitive env var.

## Current Config Inventory

| Area | Config/env | Current default | Production concern |
| --- | --- | --- | --- |
| Runtime | `PORT`, `LOG_LEVEL`, `NODE_ENV` | `4000`, `info`, `development` | `NODE_ENV` is too broad for staging/demo behavior. |
| Database | `DATABASE_URL` | hardcoded fallback DSN in `config.ts` | Production should fail fast instead of falling back. |
| Auth | `CLERK_SECRET_KEY`, `CLERK_PUBLISHABLE_KEY` | empty | Secret key absence changes auth behavior; publishable key appears underused. |
| Service auth | `SERVICE_TOKEN`, `A2A_ALLOWED_KEY` | empty / `dev-a2a-key` | Dev token fallback must not be accepted in production. |
| Redis events | `GROW_EVENTS_REDIS_URL`, `REDIS_URL`, stream/group/consumer names | disabled unless set | Staging/prod need explicit stream, group, and replay policy. |
| Legacy Redis | `INTERVIEW_REDIS_URL`, `ROLEPLAY_REDIS_URL`, `RESUME_REDIS_URL` | fallback to event Redis | Legacy observation should be explicitly enabled per environment. |
| LLM | `LLM_PROVIDER`, `LLM_API_KEY`, `OPENCODE_API_KEY`, `LLM_BASE_URL`, `GROW_AGENT_MODEL`, `LLM_MODEL` | `opencode`, `https://opencode.ai/zen/v1`, `kimi-k2.6` | Staging/prod should pin provider/model and require API key where features are enabled. |
| Rivet | `RIVET_ENDPOINT`, `RIVET_CLIENT_ENDPOINT` | localhost/127.0.0.1 | Docker compose overrides endpoint; production needs internal and public separation. |
| Product services | `INTERVIEW_SERVICE_URL`, `ROLEPLAY_SERVICE_URL`, `QSCORE_SERVICE_URL`, `RESUME_SERVICE_URL`, `USER_SERVICE_URL`, `MATCHMAKING_SERVICE_URL`, `SOCIAL_BRANDING_SERVICE_URL` | localhost ports | Production should require service URLs or feature-disable explicitly. |
| Public URLs | `INTERVIEW_PUBLIC_URL`, `ROLEPLAY_PUBLIC_URL`, `RESUME_PUBLIC_URL`, `WORKFLOWS_DASHBOARD_URL`, `FRONTEND_ORIGIN` | localhost/frontend fallback | Public and internal service URLs need separate semantics. |
| Gitea | `GITEA_PUBLIC_URL`, `GITEA_INTERNAL_URL`, `GITEA_ADMIN_USER`, `GITEA_ADMIN_PASSWORD`, `GITEA_ADMIN_TOKEN`, `GITEA_ORG_NAME` | localhost, `growqr-admin`, `growqr-admin-dev`, empty token | Admin password fallback is dev-only. Production should require token/secret. |
| OpenCode | `OPENCODE_IMAGE`, `OPENCODE_IMAGE_VERSION`, `MIGRATION_VERSION`, `PROMPT_VERSION`, `USER_CONTAINER_HOST`, `USER_DATA_ROOT`, `USER_PORT_RANGE_*` | dev image/version, local paths/ports | Needs staging/prod image tags and storage policy. |
| CORS/admin | `FRONTEND_ORIGIN`, `ADMIN_USER_IDS` | localhost / empty | Empty admin list currently allows `/workflows/admin/ops` to all authenticated users. |
| Agent limits | `MAX_AGENT_TOKENS`, `PROJECTION_AGENT_MODEL`, `CONVERSATION_ACTOR_MODEL` | 4096 / agent model | Model overrides should be pinned by environment. |

## Environment-Dependent Code Paths

| File | Behavior |
| --- | --- |
| `src/config.ts` | Central env parsing with dev defaults for database, tokens, local service URLs, Gitea, OpenCode, Rivet, frontend, and ports. |
| `src/auth/clerk.ts` | In non-production, `A2A_ALLOWED_KEY` is accepted as an auth fallback. Clerk client is only created when `CLERK_SECRET_KEY` exists. |
| `src/index.ts` | Proxies `/api/rivet` only when `process.env.RIVET_ENDPOINT` is set. Starts Redis consumer opportunistically. CORS uses `FRONTEND_ORIGIN`. |
| `src/events/redis-consumer.ts` | Canonical consumer disabled if no Redis URL. Legacy observers enabled by legacy Redis URLs. |
| `src/events/projectors/projection-agent.ts` | Falls back if no LLM API key; model can be overridden by `PROJECTION_AGENT_MODEL`. |
| `src/actors/conversation/agent.ts` | Requires LLM key for streaming; model can be overridden by `CONVERSATION_ACTOR_MODEL`. |
| `src/routes/events.ts` | Service ingest auth allows no service token in non-production. |
| `src/routes/home.ts` | Exposes demo seeding route. |
| `src/home/seed-demo-home.ts` | Demo notifications and executable direct script behavior. |
| `src/services/service-agents.ts` | Synthetic/demo fallbacks for some unavailable services and Q Score estimate behavior. |
| `src/docker/manager.ts` | Uses Gitea/OpenCode image/version/host/path/port config and mutates Docker runtime. |
| `scripts/rivet-actors.ts` | Uses dev Rivet namespace/token defaults. |
| `docker-compose.yml` | Dev compose defaults for Postgres, Gitea, Rivet, backend, services, frontend origins, and OpenCode image. |
| `docker/opencode/*` | Dev-oriented OpenCode image/template behavior. |

## Hardcoded URL and Default Hotspots

- `http://localhost:*` defaults in `src/config.ts`, `.env.example`, `README.md`, and `docker-compose.yml`.
- `http://127.0.0.1:*` defaults for Rivet client, Gitea, and user container host.
- `http://host.docker.internal:*` compose service defaults.
- OpenCode base image `ghcr.io/anomalyco/opencode:latest` in `docker/opencode/Dockerfile`.
- Dev image tag `growqr/opencode:dev`.
- Gitea admin defaults `growqr-admin` / `growqr-admin-dev`.
- A2A fallback `dev-a2a-key`.

## Clerk / JWKS Assumptions

The code uses the Clerk SDK with `CLERK_SECRET_KEY`; there is no explicit JWKS URL configuration in the reviewed backend source. Service-to-service auth is token-based, with dev fallback behavior. The production target should document which of the following auth mechanisms apply:

- Clerk session token verification for user requests.
- `SERVICE_TOKEN` for service-to-backend event ingestion.
- Separate internal A2A key for legacy product service calls.
- Optional JWKS validation if services send JWTs instead of opaque service tokens.
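
To make the `SERVICE_TOKEN` item concrete, here is a minimal sketch of an ingest check that tolerates a missing token only in local environments. It is not the code in `src/routes/events.ts`: the function name, the result type, and the choice to treat `test` like `development` are assumptions for illustration.

```ts
// Hypothetical sketch: names and shapes are illustrative, not taken from the backend.
type RuntimeEnvironment = "development" | "test" | "staging" | "demo" | "production";

type IngestAuthResult =
  | { ok: true; via: "service-token" | "local-fallback" }
  | { ok: false; reason: string };

function checkServiceIngestAuth(
  presentedToken: string | undefined,
  configuredToken: string, // value of SERVICE_TOKEN, "" when unset
  environment: RuntimeEnvironment,
): IngestAuthResult {
  // Assumption: only development and test may run without a configured token.
  const isLocal = environment === "development" || environment === "test";
  if (configuredToken === "") {
    return isLocal
      ? { ok: true, via: "local-fallback" }
      : { ok: false, reason: "SERVICE_TOKEN is not configured" };
  }
  // Plain equality keeps the sketch short; a real check would likely
  // prefer a constant-time comparison.
  return presentedToken === configuredToken
    ? { ok: true, via: "service-token" }
    : { ok: false, reason: "invalid service token" };
}
```

Returning a tagged result rather than a boolean lets the caller log which path admitted the request, which helps when auditing whether the fallback is ever hit outside local runs.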

## Target Config Model

Introduce:

```ts
type RuntimeEnvironment = "development" | "test" | "staging" | "demo" | "production";
```
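
One possible way to derive that type from a raw env var is sketched below. The function name is hypothetical, and rejecting an unset value (instead of defaulting to `development`) is an assumption chosen so that a missing variable cannot silently select dev behavior.

```ts
// Hypothetical sketch of parsing an environment name into the union type.
const RUNTIME_ENVIRONMENTS = ["development", "test", "staging", "demo", "production"] as const;
type RuntimeEnvironment = (typeof RUNTIME_ENVIRONMENTS)[number];

function parseRuntimeEnvironment(raw: string | undefined): RuntimeEnvironment {
  // Normalize so " Staging " and "staging" are treated the same.
  const value = (raw ?? "").trim().toLowerCase();
  const match = RUNTIME_ENVIRONMENTS.find((env) => env === value);
  if (match === undefined) {
    // Covers both unset and unrecognized values such as "prod".
    throw new Error(`Unknown or missing runtime environment: "${raw ?? ""}"`);
  }
  return match;
}
```

A caller would pass whichever variable is eventually chosen, for example the value of `APP_ENV`, once at startup.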

Recommended top-level config shape:

```ts
config.environment
config.isProduction
config.isStaging
config.isDemo
config.features.demoDataEnabled
config.features.legacyRedisObserversEnabled
config.features.opencodeProvisioningEnabled
config.features.serviceProxyEnabled
config.urls.internal.*
config.urls.public.*
config.auth.*
config.retry.*
config.events.*
```

Rules:

- Production must fail fast for missing `DATABASE_URL`, `CLERK_SECRET_KEY`, `SERVICE_TOKEN`, `FRONTEND_ORIGIN`, Gitea credentials/token, and any enabled service URL.
- Staging may use staging service URLs and demo data only when `DEMO_DATA_ENABLED=true`.
- Development may keep local defaults.
- Demo behavior should be impossible in production unless an explicit, audited flag is set and the route remains auth/admin-gated.

## What Should Move to `src/staging`

Proposed `src/staging` candidates:

- `home/seed-demo-home.ts`
- `/home/seed-demo` route handler
- demo notification factories
- demo Q Score formulas/fallback constants in service-agent behavior, if not product-approved
- local-only service session scaffolding helpers
- any future seeders/backfills used only for demos

Suggested layout:

```txt
src/staging/
  demo-home.ts
  demo-qscore.ts
  seed-routes.ts
  guards.ts
```

`src/staging/guards.ts` should expose `requireStagingOrDemo(config)` and fail closed in production.
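
A minimal sketch of such a guard follows, assuming a config object that carries the `environment` and `features.demoDataEnabled` fields proposed above. The `GuardConfig` interface and the error messages are illustrative only.

```ts
// Hypothetical sketch of src/staging/guards.ts.
type RuntimeEnvironment = "development" | "test" | "staging" | "demo" | "production";

interface GuardConfig {
  environment: RuntimeEnvironment;
  features: { demoDataEnabled: boolean };
}

// Throws unless the environment is on the allowlist AND demo data is enabled.
function requireStagingOrDemo(config: GuardConfig): void {
  // Allowlist check: anything not named here is rejected, production included.
  const allowed = config.environment === "staging" || config.environment === "demo";
  if (!allowed) {
    throw new Error(`Staging-only code is not available in "${config.environment}"`);
  }
  if (!config.features.demoDataEnabled) {
    throw new Error("Demo data is disabled (DEMO_DATA_ENABLED is not true)");
  }
}
```

The guard uses an allowlist instead of testing `environment !== "production"`, so a newly added environment is rejected by default. If local seeding should pass through the same guard, `development` would have to be added to the allowlist explicitly.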

## Target Environment Matrix

| Behavior | Development | Staging | Demo | Production |
| --- | --- | --- | --- | --- |
| Localhost defaults | Allowed | Not allowed | Not allowed unless local demo | Not allowed |
| Demo seed endpoints | Allowed | Explicit flag + admin | Enabled by flag + admin | Disabled |
| Service token fallback | Allowed | Not allowed | Not allowed | Not allowed |
| Legacy Redis observers | Optional | Explicit flag | Explicit flag | Disabled unless a migration requires them |
| Redis canonical events | Optional | Required for event demos | Required | Required |
| OpenCode image | `:dev` OK | pinned staging tag | pinned demo tag | pinned release tag |
| Admin ops route | Any authenticated user may be acceptable | `ADMIN_USER_IDS` required | `ADMIN_USER_IDS` required | `ADMIN_USER_IDS` required |
| Missing Clerk secret | Allowed only for local mock if implemented | Fail | Fail | Fail |
| Gitea admin password default | Allowed | Fail | Fail | Fail |
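
Some rows of the matrix are plain allow/deny decisions and could be encoded as data, so that checks read from one table instead of scattered conditionals. The sketch below covers four rows only; the field names and `canAccessAdminOps` are hypothetical, and the "unless local demo" exception is deliberately not modeled.

```ts
// Hypothetical sketch: a subset of the matrix as a typed lookup table.
type MatrixEnvironment = "development" | "staging" | "demo" | "production";

interface EnvironmentPolicy {
  localhostDefaultsAllowed: boolean;
  serviceTokenFallbackAllowed: boolean;
  giteaDefaultPasswordAllowed: boolean;
  adminUserIdsRequired: boolean;
}

// Staging, demo, and production share the same values for these four rows.
const STRICT: EnvironmentPolicy = {
  localhostDefaultsAllowed: false,
  serviceTokenFallbackAllowed: false,
  giteaDefaultPasswordAllowed: false,
  adminUserIdsRequired: true,
};

const POLICY: Record<MatrixEnvironment, EnvironmentPolicy> = {
  development: {
    localhostDefaultsAllowed: true,
    serviceTokenFallbackAllowed: true,
    giteaDefaultPasswordAllowed: true,
    adminUserIdsRequired: false,
  },
  staging: STRICT,
  demo: STRICT, // the "unless local demo" exception is not modeled here
  production: STRICT,
};

// An empty admin list grants access only where the policy does not require one.
function canAccessAdminOps(
  environment: MatrixEnvironment,
  adminUserIds: readonly string[],
  userId: string,
): boolean {
  if (adminUserIds.length === 0) {
    return !POLICY[environment].adminUserIdsRequired;
  }
  return adminUserIds.indexOf(userId) !== -1;
}
```

With this shape, an empty `ADMIN_USER_IDS` denies access everywhere except development, which is the reverse of the current behavior noted in the config inventory.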

## Priority Recommendations

1. Add `APP_ENV` or `GROWQR_ENV` and derive `config.environment`; stop relying on `NODE_ENV` for product behavior.
2. Fail fast in staging/production for missing secrets and localhost/default service URLs.
3. Move demo seed code into `src/staging` and guard routes with `DEMO_DATA_ENABLED` plus admin check.
4. Require `ADMIN_USER_IDS` before enabling `/workflows/admin/ops` outside development.
5. Split public URLs and internal URLs in config names consistently across frontend, services, Gitea, Rivet, and OpenCode.
6. Add a deployment checklist that records every required env var per environment.
7. Make legacy Redis observers an explicit feature flag and set a removal date.
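
For recommendation 2, detecting leftover local defaults might be sketched as below. The pattern covers the three hosts listed under the hardcoded hotspots (`localhost`, `127.0.0.1`, `host.docker.internal`) for `http`/`https` URLs only; the function names are hypothetical, and non-HTTP values such as database DSNs would need their own check.

```ts
// Hypothetical sketch: flag env vars that still point at local-only hosts.
const LOCAL_HOST_PATTERN =
  /^https?:\/\/(localhost|127\.0\.0\.1|host\.docker\.internal)(:\d+)?(\/|$)/i;

function isLocalDefaultUrl(url: string): boolean {
  return LOCAL_HOST_PATTERN.test(url.trim());
}

// Returns the names of the given variables whose values are local URLs.
function findLocalUrls(
  env: Record<string, string | undefined>,
  names: readonly string[],
): string[] {
  return names.filter((name) => {
    const value = env[name];
    return value !== undefined && isLocalDefaultUrl(value);
  });
}
```

A startup check in staging or production could call `findLocalUrls` with the service and public URL variable names and refuse to boot if the result is non-empty.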