Audience: engineering, QA, and release/audit reviewers. This documents the formal test harness in tests/ that supersedes the ad-hoc test-api.sh smoke script. It validates typical product operations end to end.

How the harness is organised

The harness separates what an operation is from how it is verified, in three layers:

Procedures

One API operation per file (e.g. createPolicy, registerDevice). Reusable, typed building blocks under src/procedures/.

Cases

Ordered, asserted scenarios composing procedures (e.g. Policy CRUD) under src/cases/*.case.test.ts. Each step is individually reported.

Plans

Grouping of cases by capability with an objective. Catalogued in src/catalogue.ts for traceability.
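
The three layers can be sketched roughly as follows. This is a minimal, in-memory illustration, not the harness's actual code: the Policy fields, the status codes 201/204/404, the check helper and the caseState object are all assumptions, and the real procedures would be asynchronous calls to the API.

```typescript
// Hypothetical shapes: the real types under src/procedures/ and src/cases/ may differ.
// Synchronous and in-memory so the sketch runs without a backend.

interface ProcedureResult<T> {
  status: number; // HTTP status of the single API operation
  body: T;
}

interface Policy {
  id: string;
  name: string;
  priority: number;
}

// Stand-in for the backend.
const policyStore = new Map<string, Policy>();

// Procedures: one API operation each. Status codes here are assumed.
function createPolicy(input: { name: string; priority: number }): ProcedureResult<Policy> {
  const policy: Policy = { id: `pol-${policyStore.size + 1}`, ...input };
  policyStore.set(policy.id, policy);
  return { status: 201, body: policy };
}

function deletePolicy(id: string): ProcedureResult<null> {
  return { status: policyStore.delete(id) ? 204 : 404, body: null };
}

// Case: ordered, asserted steps that compose procedures.
interface Step {
  name: string;
  run: () => void; // throws when an expectation fails
}

function check(condition: boolean, message: string): void {
  if (!condition) throw new Error(message);
}

const caseState: { policyId?: string } = {};

const policyCrudCase: Step[] = [
  {
    name: "create policy",
    run: () => {
      const res = createPolicy({ name: "Night mode", priority: 1 });
      check(res.status === 201, "create should return 201");
      caseState.policyId = res.body.id;
    },
  },
  {
    name: "delete policy",
    run: () => {
      check(caseState.policyId !== undefined, "create must run first");
      const res = deletePolicy(caseState.policyId as string);
      check(res.status === 204, "delete should return 204");
    },
  },
];

// Plan: groups cases by capability and states the objective.
const policyPlan = {
  id: "TP-POLICY",
  objective: "Verify the full policy lifecycle including the schedule variant.",
  cases: { "TC-POLICY-CRUD": policyCrudCase },
};

// Run the case's steps in order, recording which ones executed.
const executedSteps: string[] = [];
for (const step of policyPlan.cases["TC-POLICY-CRUD"]) {
  step.run();
  executedSteps.push(step.name);
}
```

Keeping the operation (createPolicy) apart from its verification (the check calls inside a step) is what lets the same procedure be reused by several cases.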

Pass / fail determination

The harness runs on Vitest, so every run ends in a machine-gradeable verdict. The old script printed messages but never aggregated one.
Mechanism | Behaviour
Assertions | Each step asserts HTTP status and response/DB shape with expect.
Verdict | A case fails if any assertion throws; the run exits non-zero on any failure (CI-gateable).
Isolation | Every case provisions its own organisation and tears it down, so no state is shared between cases.
Reports | npm run test:ci emits reports/junit.xml and reports/results.json for dashboards.
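
The verdict and isolation rules in the table can be illustrated with a small stand-alone runner. This is only a sketch of the rules, not of Vitest's internals: Org, provisionOrg, teardownOrg, runCase and exitCode are hypothetical names, and the steps are synchronous for brevity.

```typescript
// Hypothetical names throughout; this models the rules, not the real harness API.

interface Org { id: string }
interface StepResult { step: string; passed: boolean; error?: string }
interface CaseResult { name: string; passed: boolean; steps: StepResult[] }

const liveOrgs = new Set<string>();
let orgCounter = 0;

// Isolation: each case gets a fresh organisation of its own.
function provisionOrg(): Org {
  orgCounter += 1;
  const org: Org = { id: `e2e-test-${orgCounter}` };
  liveOrgs.add(org.id);
  return org;
}

function teardownOrg(org: Org): void {
  liveOrgs.delete(org.id);
}

function runCase(name: string, steps: Array<[string, (org: Org) => void]>): CaseResult {
  const org = provisionOrg();
  const results: StepResult[] = [];
  try {
    for (const [step, run] of steps) {
      try {
        run(org);
        results.push({ step, passed: true }); // each step is reported individually
      } catch (err) {
        results.push({ step, passed: false, error: String(err) });
      }
    }
  } finally {
    teardownOrg(org); // teardown runs whether or not a step failed
  }
  // Verdict: a case fails if any step threw.
  return { name, passed: results.every((r) => r.passed), steps: results };
}

// The run exits non-zero if any case failed, which is what makes it CI-gateable.
function exitCode(cases: CaseResult[]): number {
  return cases.every((c) => c.passed) ? 0 : 1;
}

const caseResults = [
  runCase("passing case", [
    ["sees its own org", (org) => { if (!liveOrgs.has(org.id)) throw new Error("org missing"); }],
  ]),
  runCase("failing case", [
    ["always fails", () => { throw new Error("expected 200, got 500"); }],
  ]),
];
```

With one failing case, exitCode(caseResults) is 1, and liveOrgs is empty again because both organisations were torn down.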

Type awareness

Tests are validated against the same generated database types the application uses. tests/src/types/database.ts re-exports Database, Tables, TablesInsert, TablesUpdate from the frontend’s database.types.ts, and the Supabase clients are typed with Database. REST/RPC procedures (.from('policies'), .update(...)) are therefore checked column-by-column at compile time.
Newly added tables not yet in the generated types (currently organisation_deletion_log) use a clearly-marked untyped escape hatch until supabase gen types is re-run.
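
To show what column-by-column checking means in practice, here is a heavily trimmed sketch. The Database shape below is an assumption that only mimics the layout of generated Supabase types; the policies columns (in particular enabled), the organisation_id field and the applyUpdate helper are hypothetical.

```typescript
// Hypothetical, trimmed schema. The real Database type is generated and far larger.
type Database = {
  public: {
    Tables: {
      policies: {
        Row: { id: string; name: string; priority: number; enabled: boolean };
        Insert: { id?: string; name: string; priority?: number; enabled?: boolean };
        Update: { name?: string; priority?: number; enabled?: boolean };
      };
    };
  };
};

type TableName = keyof Database["public"]["Tables"];
type Tables<T extends TableName> = Database["public"]["Tables"][T]["Row"];
type TablesInsert<T extends TableName> = Database["public"]["Tables"][T]["Insert"];
type TablesUpdate<T extends TableName> = Database["public"]["Tables"][T]["Update"];

// A procedure-like helper whose input and output are tied to the schema.
function applyUpdate(row: Tables<"policies">, patch: TablesUpdate<"policies">): Tables<"policies"> {
  return { ...row, ...patch };
}

const draft: TablesInsert<"policies"> = { name: "Night mode" }; // only required columns
const before: Tables<"policies"> = { id: "pol-1", name: draft.name, priority: 1, enabled: true };
const after = applyUpdate(before, { name: "Night mode v2", priority: 5 });

// A misspelled column is rejected by the compiler rather than at runtime.
// @ts-expect-error "priorty" is not a column of policies
const typo: TablesUpdate<"policies"> = { priorty: 2 };
void typo;

// Escape hatch for a table missing from the generated types: deliberately untyped.
type UntypedRow = Record<string, unknown>;
const deletionLogRow: UntypedRow = { organisation_id: "org-1" };
```

The escape hatch trades compile-time checking for flexibility, which is why it is worth keeping it visibly marked and temporary.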

API fidelity (no direct SQL)

A core principle of this harness is that tests exercise the product the way real clients do: through edge functions and PostgREST/RPC with a user JWT (RLS enforced), never via privileged SQL. The web and mobile apps talk to PostgREST with the user’s token; the harness does the same via ctx.asAdmin. Every feature procedure uses a real API. The service-role client (which bypasses RLS, a path no client can take) is confined to two non-product roles, test scaffolding and post-deletion verification, which together cover three usages:
Usage | Where | Why it’s not an API call
Confirm the bootstrap admin’s email | test fixtures | The first admin of a brand-new org can’t be invited in-app. It registers via the public auth.signUp API; only the email-confirmation step uses admin access, standing in for the user clicking the confirmation link.
Cleanup leftovers | test fixtures | Safety-net teardown runs only after a mid-run failure; teardownOrg calls the offboard API first.
Verify deletion | lifecycle case | After offboard the user, session and rows are gone (and RLS-hidden); proving their absence and the audit record requires the privileged client.
With email confirmation enabled, the bootstrap admin is created in USER_SEED_MODE=signup_confirm: real public signup + an admin email-confirm that mimics clicking the link. Members are added through the real invite-user function. So the only irreducible admin access is the single confirmation step, cleanup, and post-deletion verification.
All operations under test are driven through edge functions or user-JWT PostgREST/RPC.
Service-role access appears only in scaffolding and post-deletion verification, each clearly commented in code.
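
As an illustration of the client-equivalent path, the sketch below builds the two kinds of request a user-JWT client sends to a Supabase project: a PostgREST read under /rest/v1/ and an edge-function call under /functions/v1/. The helper names, the placeholder keys, the local URL and the invite payload fields are assumptions; only invite-user and the policies table come from this document.

```typescript
interface UserSession {
  supabaseUrl: string;
  anonKey: string;     // public key; RLS still applies
  accessToken: string; // the signed-in user's JWT
}

interface RequestSpec {
  method: "GET" | "POST";
  url: string;
  headers: Record<string, string>;
  body?: string;
}

// Every request carries the user's JWT, so RLS evaluates it as that user.
function userHeaders(session: UserSession): Record<string, string> {
  return {
    apikey: session.anonKey,
    Authorization: `Bearer ${session.accessToken}`,
    "Content-Type": "application/json",
  };
}

// PostgREST read, e.g. the equivalent of .from('policies').select('*').
function restSelect(session: UserSession, table: string, query = "select=*"): RequestSpec {
  return {
    method: "GET",
    url: `${session.supabaseUrl}/rest/v1/${table}?${query}`,
    headers: userHeaders(session),
  };
}

// Edge-function invocation with a JSON payload.
function invokeFunction(session: UserSession, name: string, payload: unknown): RequestSpec {
  return {
    method: "POST",
    url: `${session.supabaseUrl}/functions/v1/${name}`,
    headers: userHeaders(session),
    body: JSON.stringify(payload),
  };
}

// Placeholder values; a real run would take these from .env.test and a sign-in.
const adminSession: UserSession = {
  supabaseUrl: "http://127.0.0.1:54321",
  anonKey: "anon-key-placeholder",
  accessToken: "user-jwt-placeholder",
};

const listPolicies = restSelect(adminSession, "policies");
const inviteMember = invokeFunction(adminSession, "invite-user", {
  email: "qa+member@example.com", // hypothetical payload fields
  role: "member",
});
```

Nothing here uses a service-role key, which is the point: a request built this way can only see and change what the signed-in user is allowed to.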

Environment & safety

These tests perform destructive operations (they create and then delete organisations and users).
Running against any non-local target requires ALLOW_DESTRUCTIVE_TESTS=true. Test data is namespaced (e2e-test-*, qa+*) and the safety-net teardown can only remove resources the run itself created — never a pre-existing tenant.
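When user accounts are seeded directly via the Admin API, they get known passwords and no invitation email, which is how the harness avoids the email round-trip that made the production invite flow untestable. With USER_SEED_MODE=signup_confirm, the bootstrap admin goes through public signup instead, as described under API fidelity.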
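
The safety rules above could be enforced with guards along these lines. This is a sketch under stated assumptions: what counts as a local target (localhost and 127.0.0.1 here), and the helper names isLocalTarget, assertSafeTarget, isNamespaced and makeRunRegistry, are illustrative and not taken from the harness.

```typescript
// Assumption: "local" means localhost or 127.0.0.1.
function isLocalTarget(url: string): boolean {
  const host = new URL(url).hostname;
  return host === "localhost" || host === "127.0.0.1";
}

// Refuse to run destructive tests against a non-local target without explicit opt-in.
function assertSafeTarget(url: string, env: Record<string, string | undefined>): void {
  if (isLocalTarget(url)) return;
  if (env["ALLOW_DESTRUCTIVE_TESTS"] !== "true") {
    throw new Error(`Refusing destructive tests against ${url}: set ALLOW_DESTRUCTIVE_TESTS=true`);
  }
}

// Test data is namespaced with the prefixes named in this document.
function isNamespaced(name: string): boolean {
  return name.startsWith("e2e-test-") || name.startsWith("qa+");
}

// Tracks what this run created so teardown never touches a pre-existing tenant.
function makeRunRegistry() {
  const createdIds = new Set<string>();
  return {
    track(id: string): void {
      createdIds.add(id);
    },
    mayDelete(id: string, name: string): boolean {
      return createdIds.has(id) && isNamespaced(name);
    },
  };
}

const registry = makeRunRegistry();
registry.track("org-123");
const canRemoveOwn = registry.mayDelete("org-123", "e2e-test-acme");
const canRemoveForeign = registry.mayDelete("org-999", "e2e-test-other");
```

Requiring both conditions in mayDelete (created by this run and correctly namespaced) means a naming collision alone is never enough to delete a tenant.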

Running

cd tests
npm install
cp .env.test.example .env.test   # fill in keys + ALLOW_DESTRUCTIVE_TESTS
npm test            # all plans, non-zero exit on failure
npm run test:ci     # + JUnit/JSON reports
npm run test:cases  # just the case suites

Test Plan Catalogue

Ten Test Plans cover the typical product operations ported from test-api.sh, plus the onboard/offboard lifecycle.
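
For traceability, a catalogue entry might look like the sketch below. The actual shape of src/catalogue.ts is not shown in this document, so the TestPlan interface and the planForCase lookup are assumptions; the IDs and objectives are copied from three of the plans listed in this section.

```typescript
// Hypothetical entry shape; the real src/catalogue.ts may differ.
interface TestPlan {
  id: string;
  title: string;
  objective: string;
  cases: string[];
}

// Three of the ten plans, for illustration.
const catalogue: TestPlan[] = [
  {
    id: "TP-ORG",
    title: "Organisation Lifecycle",
    objective: "Verify an organisation can be onboarded and fully erased (GDPR), with cascade deletion and an audit record.",
    cases: ["TC-LIFECYCLE"],
  },
  {
    id: "TP-TAG",
    title: "NFC Tag Management",
    objective: "Verify NFC tag creation, listing, and duplicate rejection.",
    cases: ["TC-TAG"],
  },
  {
    id: "TP-LIMITS",
    title: "Subscription Limits",
    objective: "Verify plan-limit enforcement on resource creation.",
    cases: ["TC-LIMITS"],
  },
];

// Trace a case back to the plan, and therefore the objective, it serves.
function planForCase(caseId: string): TestPlan | undefined {
  return catalogue.find((plan) => plan.cases.includes(caseId));
}

const owningPlan = planForCase("TC-TAG");
```

A lookup like planForCase lets a failing case in a report be tied back to the capability and objective it was meant to demonstrate.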

TP-ORG — Organisation Lifecycle

Verify an organisation can be onboarded and fully erased (GDPR), with cascade deletion and an audit record.
Case | Steps
TC-LIFECYCLE: onboard → offboard | Seed admin → onboard → default profile via trigger → offboard rejected without confirmation → cascade delete → auth PII erased → audit record persists

TP-DEVICE — Device Management

Verify device registration, idempotent re-registration, capability sync, and validation.
Case | Steps
TC-DEVICE | Register new (isNew=true) → re-register (isNew=false) → sync capabilities → reject unknown device → list device
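
The idempotency expectation in TC-DEVICE can be sketched against an in-memory stand-in. Only the isNew flag comes from this document; the function signatures, the status codes and the capability names are assumptions made so the example runs on its own.

```typescript
interface RegisterResult {
  status: number;
  isNew: boolean;
}

// In-memory stand-in for the device registry behind the API.
const devices = new Map<string, { capabilities: string[] }>();

// Registering the same device twice must not create a second record.
function registerDevice(deviceId: string): RegisterResult {
  const isNew = !devices.has(deviceId);
  if (isNew) devices.set(deviceId, { capabilities: [] });
  return { status: 200, isNew };
}

// Assumed behaviour: syncing an unregistered device is rejected with 404.
function syncCapabilities(deviceId: string, capabilities: string[]): { status: number } {
  const device = devices.get(deviceId);
  if (device === undefined) return { status: 404 };
  device.capabilities = [...capabilities];
  return { status: 200 };
}

const firstRegistration = registerDevice("device-A");
const secondRegistration = registerDevice("device-A");
const syncKnown = syncCapabilities("device-A", ["nfc", "screen-time"]);
const syncUnknown = syncCapabilities("device-B", ["nfc"]);
```

The case would assert that the first call reports isNew=true, the second isNew=false, and that exactly one device exists afterwards.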

TP-TAG — NFC Tag Management

Verify NFC tag creation, listing, and duplicate rejection.
Case | Steps
TC-TAG | Create tag → list tag → reject duplicate tag_uid (409)

TP-SCAN — Scan Resolution

Verify NFC scan profile resolution, event logging, and error paths.
Case | Steps
TC-SCAN | Resolve LOCK → resolve UNLOCK → events logged → reject unregistered device → reject unregistered tag

TP-RPROFILE — Restriction Profiles

Verify CRUD for restriction profiles.
Case | Steps
TC-RPROFILE-CRUD | Create → list (incl. default) → update → delete

TP-POLICY — Policies

Verify the full policy lifecycle including the schedule variant.
Case | Steps
TC-POLICY-CRUD | Create tag_scan → read (joined) → list → update name+priority → toggle off/on → create schedule policy → delete both

TP-USER — User Management

Verify user invitation, listing, validation, deletion, and the self-deletion guard.
Case | Steps
TC-USER | Invite auto-confirmed member → find in profiles → reject invalid role → delete member → block self-deletion

TP-APPCAT — App Catalogue

Verify app search and category listing (external iTunes proxy) contracts.
Case | Steps
TC-APPCAT | Search returns results array → reject missing query → return categories

TP-LIMITS — Subscription Limits

Verify plan-limit enforcement on resource creation.
Case | Steps
TC-LIMITS | Fill restriction profiles to max → 403 over limit → fill policies to max → 403 over limit
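
The fill-to-max pattern in TC-LIMITS can be sketched as follows. The 403 response comes from the case description; the makeOrg stand-in, the 201 status, the limit of 3 and the assumption that the trigger-created default profile counts toward the limit are all illustrative.

```typescript
interface CreateResult {
  status: number;
}

// In-memory stand-in for a fresh org with a plan limit on restriction profiles.
function makeOrg(maxProfiles: number) {
  // Assumption: the default profile created at onboarding counts toward the limit.
  const profiles: string[] = ["default"];
  return {
    count: (): number => profiles.length,
    createProfile(name: string): CreateResult {
      if (profiles.length >= maxProfiles) return { status: 403 }; // over plan limit
      profiles.push(name);
      return { status: 201 };
    },
  };
}

// Create profiles until the org sits exactly at its maximum; returns how many were added.
function fillToMax(org: ReturnType<typeof makeOrg>, max: number): number {
  let added = 0;
  while (org.count() < max) {
    const res = org.createProfile(`profile-${added + 1}`);
    if (res.status !== 201) throw new Error(`unexpected status ${res.status} while filling`);
    added += 1;
  }
  return added;
}

const planMax = 3; // hypothetical plan limit
const freshOrg = makeOrg(planMax);
const addedProfiles = fillToMax(freshOrg, planMax);
const overLimit = freshOrg.createProfile("one-too-many");
```

Running this against a fresh org matters because the count of pre-existing resources is then known, so the boundary between the last allowed creation and the first 403 is exact.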

TP-ORGINFO — Organisation Info

Verify organisation read operations.
Case | Steps
TC-ORGINFO | Return stats + limits → read org details → list Stripe products

Coverage vs test-api.sh

Every operation in the original script is represented as a procedure and exercised by a case. The key improvements:
Each operation is a reusable, typed procedure (not inline curl).
Cases run procedures in order with real pass/fail assertions.
Per-run organisation isolation replaces shared-org mutation.
Onboarding and offboarding are first-class tested lifecycles.
Limit enforcement runs against a fresh org filled to its plan maximum.

Roadmap

The directory layout is structured so the next layers slot in without rework:

Load testing (k6)

Reuse the seeded-user + edge-function patterns to script throughput/latency runs under load/.

E2E (Playwright)

Drive the web UI (e.g. the Settings → Danger Zone offboard flow) under src/e2e/, sharing .env.test.