# OPS MCP Protocol

This file is published to <a href="https://mcp.g3nretailstack.com/ops/PROTOCOL.md" target="_blank" rel="noopener noreferrer">https://mcp.g3nretailstack.com/ops/PROTOCOL.md</a>.


## Usage patterns (headless)
- Stack-wide SOPs & operations catalog: <a href="https://doc.g3nretailstack.com/story/operations.html" target="_blank" rel="noopener noreferrer">/story/operations.html</a>.
- Super-usecase scenarios + QA status: <a href="https://doc.g3nretailstack.com/story/super-usecases.html" target="_blank" rel="noopener noreferrer">/story/super-usecases.html</a>.
- This protocol stays contract-only; use the catalogs for workflow expectations.

## Base URL
- API Gateway: `https://api.g3nretailstack.com/ops`
- Health check: `GET /ops/ping` (full URL `https://api.g3nretailstack.com/ops/ping`); requires `secret_code` (scrypt-validated).

## Auth + tenancy
- `ping` and `stat` require `secret_code` in the request body (scrypt-validated against DynamoDB hash).
- `maintenance/list` and `maintenance/get` require a valid session (`x-session-guid` header or `session_guid` body field).
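- All state-changing operations (maintenance schedule/start/end/cancel/update; vacuum all/org/cancel) are available only via Direct Lambda and require `secret_code`. They have no API Gateway routes. The read-only `ops_vacuum_status` is also Direct Lambda only, but it does not require `secret_code`.
- While maintenance mode is active, all other services reject requests with `503`. OPS itself remains available.
- The maintenance check in other services fails open. If the check itself cannot complete because of an infrastructure issue, the request proceeds, so those issues do not cause a global outage.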
- `session_guid` is never emitted in responses; use `stats.session_fingerprint` for correlation.
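The rules above determine which credential each API Gateway call needs. The following is a minimal Python sketch of how a headless client might attach it. `prepare_call` and `PreparedRequest` are hypothetical helpers. The sketch only builds the request and does not send it. It follows the text literally by placing `secret_code` in the body of the `GET` calls; check `/ops/openapi.yaml` for how that value is actually carried.

```python
import json
from dataclasses import dataclass, field
from typing import Optional

BASE_URL = "https://api.g3nretailstack.com/ops"

# Which credential (and HTTP method) each API Gateway call uses, per the rules above.
SECRET_CODE_CALLS = {"ping": "GET", "stat": "GET"}
SESSION_CALLS = {"maintenance/list": "POST", "maintenance/get": "POST"}


@dataclass
class PreparedRequest:
    """Transport-neutral description of an HTTP call (hypothetical helper type)."""
    method: str
    url: str
    headers: dict = field(default_factory=dict)
    body: Optional[bytes] = None


def prepare_call(call: str, payload: dict, *, secret_code: Optional[str] = None,
                 session_guid: Optional[str] = None) -> PreparedRequest:
    body = dict(payload)
    headers = {"Content-Type": "application/json"}
    if call in SECRET_CODE_CALLS:
        if not secret_code:
            raise ValueError(f"{call} requires secret_code")
        body["secret_code"] = secret_code  # validated server-side against a scrypt hash
        method = SECRET_CODE_CALLS[call]
    elif call in SESSION_CALLS:
        if not session_guid:
            raise ValueError(f"{call} requires a session")
        headers["x-session-guid"] = session_guid  # alternative: session_guid body field
        method = SESSION_CALLS[call]
    else:
        # Maintenance writes and vacuum operations are Direct Lambda only.
        raise ValueError(f"{call} is not an API Gateway route")
    return PreparedRequest(method, f"{BASE_URL}/{call}", headers, json.dumps(body).encode())
```

Keeping preparation separate from transport makes the credential rules easy to test. Any HTTP client can then send the prepared request.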

## Roles
- Maintenance read (list/get): any authenticated user (session required).
- Maintenance write (schedule/start/end/cancel/update): operator-only (Direct Lambda + secret code).
- Vacuum: operator-only (Direct Lambda + secret code + confirmation phrase).

## Surfaces
- Contract-only: this document describes the surface contract, not the implementation. The surface is implemented and deployed; see `/ops/openapi.yaml` for the current definition.

## Endpoint inventory (OpenAPI parity)
The endpoints below are implemented and defined in `/ops/openapi.yaml`. Request/response schema names reference OpenAPI component schemas.

| Method | Path | Request schema | Response schema |
| --- | --- | --- | --- |
| GET | /ping | — | PingSuccess |
| GET | /stat | — | StatSuccess |
| GET | /maintenance/list | (query params: limit, next_token) | MaintenanceListSuccess |
| POST | /maintenance/list | MaintenanceListRequest | MaintenanceListSuccess |
| GET | /maintenance/get | (query param: maintenance_id) | MaintenanceGetSuccess |
| POST | /maintenance/get | MaintenanceGetRequest | MaintenanceGetSuccess |
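Each read has a GET form (query parameters) and a POST form (JSON body). The sketch below builds both shapes for `maintenance/list`. It assumes `MaintenanceListRequest` carries the same `limit` and `next_token` fields as the query string. The helper names are hypothetical.

```python
import json
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://api.g3nretailstack.com/ops"


def _list_params(limit: Optional[int], next_token: Optional[str]) -> dict:
    # Omit unset parameters instead of sending nulls.
    return {k: v for k, v in (("limit", limit), ("next_token", next_token)) if v is not None}


def list_url_get(limit: Optional[int] = None, next_token: Optional[str] = None) -> str:
    """GET form: parameters travel URL-encoded in the query string."""
    query = urlencode(_list_params(limit, next_token))
    url = f"{BASE_URL}/maintenance/list"
    return f"{url}?{query}" if query else url


def list_body_post(limit: Optional[int] = None, next_token: Optional[str] = None) -> str:
    """POST form: the same parameters as a JSON body (assumed MaintenanceListRequest shape)."""
    return json.dumps(_list_params(limit, next_token))
```

The POST form avoids URL-encoding the opaque `next_token`, which can be convenient because the cursor is JSON.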

## Direct Lambda endpoints (operator-only, secret code required)
All endpoints below are invoked via Direct Lambda (IAM-gated, not API Gateway). Every endpoint except the read-only `ops_vacuum_status` requires `secret_code` in the request body. The secret code is validated against a scrypt hash stored in DynamoDB; the actual code value is never stored in environment variables, CDK code, or version control.
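The paragraph above says the secret code is checked against a stored scrypt hash. The following is a generic sketch of that technique using Python's `hashlib.scrypt` and a constant-time comparison. The record layout (`salt`, `n`, `r`, `p`, `hash` as hex fields) and the cost parameters are assumptions for illustration. They are not the service's actual DynamoDB item shape.

```python
import hashlib
import hmac
import os
from typing import Optional


def hash_secret(secret: str, *, salt: Optional[bytes] = None,
                n: int = 2**14, r: int = 8, p: int = 1, dklen: int = 32) -> dict:
    """Produce a storable record (hypothetical layout); only the hash is kept, never the secret."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.scrypt(secret.encode(), salt=salt, n=n, r=r, p=p, dklen=dklen)
    return {"salt": salt.hex(), "n": n, "r": r, "p": p, "hash": digest.hex()}


def verify_secret(candidate: str, record: dict) -> bool:
    """Re-derive with the stored parameters and compare in constant time."""
    expected = bytes.fromhex(record["hash"])
    actual = hashlib.scrypt(candidate.encode(), salt=bytes.fromhex(record["salt"]),
                            n=record["n"], r=record["r"], p=record["p"], dklen=len(expected))
    return hmac.compare_digest(actual, expected)
```

Storing the parameters alongside the hash lets the cost factors be raised later without invalidating existing records.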

| Operation | Handler | Description |
| --- | --- | --- |
| ops_maintenance_schedule | maintenanceSchedule | Schedule a future maintenance window |
| ops_maintenance_cancel | maintenanceCancel | Cancel a scheduled (not yet started) maintenance |
| ops_maintenance_start | maintenanceStart | Manually start a scheduled maintenance |
| ops_maintenance_end | maintenanceEnd | End the active maintenance window |
| ops_maintenance_update | maintenanceUpdate | Update description, duration, or add progress updates |
| ops_vacuum_all | vacuumAll | Full data vacuum. Requires `confirmation_phrase: "VACUUM ALL DATA PERMANENTLY"`. 5-minute pending window. Dry-run available |
| ops_vacuum_org | vacuumOrg | Org-scoped data vacuum. 5-minute pending window. Dry-run available |
| ops_vacuum_cancel | vacuumCancel | Cancel a pending vacuum (within the 5-minute window) |
| ops_vacuum_status | vacuumStatus | Read-only vacuum status. No secret code required |
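The sketch below builds a Direct Lambda payload that follows the table. It adds `secret_code` for every operation except `ops_vacuum_status`, and it refuses to build `ops_vacuum_all` without the exact confirmation phrase. `build_direct_payload` is hypothetical. This document does not specify whether the operation name travels in the payload or only in the target function, so the sketch leaves it out. The dry-run flag name is also undocumented here and is not modeled.

```python
import json
from typing import Optional

VACUUM_ALL_PHRASE = "VACUUM ALL DATA PERMANENTLY"

# Operation names from the table above; only ops_vacuum_status skips secret_code.
DIRECT_OPS = {
    "ops_maintenance_schedule", "ops_maintenance_cancel", "ops_maintenance_start",
    "ops_maintenance_end", "ops_maintenance_update",
    "ops_vacuum_all", "ops_vacuum_org", "ops_vacuum_cancel", "ops_vacuum_status",
}
NO_SECRET_OPS = {"ops_vacuum_status"}


def build_direct_payload(operation: str, params: dict, secret_code: Optional[str] = None) -> bytes:
    if operation not in DIRECT_OPS:
        raise ValueError(f"unknown Direct Lambda operation: {operation}")
    payload = dict(params)
    if operation not in NO_SECRET_OPS:
        if not secret_code:
            raise ValueError(f"{operation} requires secret_code")
        payload["secret_code"] = secret_code
    # Client-side guard: fail fast rather than relying only on the server-side check.
    if operation == "ops_vacuum_all" and payload.get("confirmation_phrase") != VACUUM_ALL_PHRASE:
        raise ValueError("ops_vacuum_all requires the exact confirmation phrase")
    return json.dumps(payload).encode()
```

The resulting bytes could be passed as `Payload` to boto3's `lambda_client.invoke(FunctionName=..., Payload=...)`. The function name for each handler is deployment-specific and is not given in this document.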

## Internal jobs (operator-only)
These jobs run via EventBridge-scheduled Lambda invocations (not API Gateway) or as async workers.

### `maintenance_sweep`
Scheduled sweep. Checks for maintenance windows whose `scheduled_start_utc` has elapsed and automatically starts them.
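The following is a sketch of only the selection step of such a sweep: finding windows whose `scheduled_start_utc` has passed. The `status` field and its `"scheduled"` value are assumptions about the stored record. How the sweep then starts each window is not covered here.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_utc(ts: str) -> datetime:
    # datetime.fromisoformat() before Python 3.11 does not accept a trailing "Z".
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def windows_due(windows: list, now: Optional[datetime] = None) -> list:
    """Return IDs of scheduled (not started/cancelled) windows whose start time has elapsed."""
    now = now or datetime.now(timezone.utc)
    return [
        w["maintenance_id"]
        for w in windows
        if w.get("status") == "scheduled" and parse_utc(w["scheduled_start_utc"]) <= now
    ]
```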

### `vacuum_sweep`
Scheduled sweep. Checks for vacuum requests whose 5-minute pending window has elapsed and triggers execution.
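The 5-minute pending window decides two things: whether `ops_vacuum_cancel` can still stop a request, and whether `vacuum_sweep` may execute it. The sketch below classifies a request accordingly. The phase names are hypothetical. The choice that the exact 5-minute mark counts as "due" is also an assumption, since the boundary is not documented.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

PENDING_WINDOW = timedelta(minutes=5)


def vacuum_phase(requested_at: datetime, now: Optional[datetime] = None,
                 cancelled: bool = False) -> str:
    now = now or datetime.now(timezone.utc)
    if cancelled:
        return "cancelled"
    # Inside the window the request can still be cancelled;
    # once the window elapses, the sweep may pick it up for execution.
    return "pending" if now < requested_at + PENDING_WINDOW else "due"
```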

### `vacuum_all_worker`
Async worker. Executes a full data vacuum across all services. Emits NO events, only a permanent DynamoDB audit record, because the vacuum deletes the event infrastructure's own data.

### `vacuum_org_worker`
Async worker. Executes org-scoped data vacuum across all services. Emits per-service progress events.

## Events

| Action | Trigger |
| --- | --- |
| ops.maintenance.scheduled | maintenanceSchedule |
| ops.maintenance.cancelled | maintenanceCancel |
| ops.maintenance.started | maintenanceStart |
| ops.maintenance.ended | maintenanceEnd |
| ops.maintenance.updated | maintenanceUpdate |
| ops.vacuum.org.started | vacuumOrgWorker begins |
| ops.vacuum.org.service.progress | Per-service progress |
| ops.vacuum.org.completed | vacuumOrgWorker completes |

Note: vacuumAll emits NO events (only a permanent DynamoDB audit record) because vacuum deletes event infrastructure data.
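The following is a minimal consumer sketch that folds the actions in the table into a small status record, for example to track an org vacuum. The event shape (`action`, plus a `service` field on progress events) is an assumption, since this document names only the actions. `apply_event` is hypothetical.

```python
def apply_event(state: dict, event: dict) -> dict:
    action = event["action"]
    if action == "ops.vacuum.org.started":
        state.update(vacuum_status="running", services_done=[])
    elif action == "ops.vacuum.org.service.progress":
        state.setdefault("services_done", []).append(event.get("service"))
    elif action == "ops.vacuum.org.completed":
        state["vacuum_status"] = "completed"
    elif action.startswith("ops.maintenance."):
        # e.g. "ops.maintenance.started" -> "started"
        state["maintenance_last_action"] = action.rsplit(".", 1)[1]
    # Unknown actions are ignored so that new event types do not break the consumer.
    return state
```

No events arrive for `vacuumAll`, so a consumer like this cannot observe a full vacuum. Use `ops_vacuum_status` for that case instead.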

## Error tags
Common tags (see [/common/error-tags.html](https://doc.g3nretailstack.com/common/error-tags.html) for definitions): `validation-error`, `unauthorized`, `forbidden`, `not-found`, `expected-revision-required`, `conflict`, `invalid-state`, `throttled`, `internal-error`.

## Example envelopes
Success envelope (shape-only):
```json
{
  "success": true,
  "data": { "ok": true, "maintenance_active": false },
  "stats": { "service": "ops", "call": "ping", "timestamp_utc": "2026-01-01T00:00:00Z", "build": { "build_major": "MONDAY", "build_minor": "0000000000", "build_id": "MONDAY-0000000000" } }
}
```

Error envelope (shape-only):
```json
{
  "success": false,
  "error": {
    "error_code": "ops.validation_failed",
    "http_status": 400,
    "retryable": false,
    "major": { "tag": "validation-error", "message": { "en_US": "Invalid request." } }
  },
  "stats": { "service": "ops", "call": "ops_example", "timestamp_utc": "2026-01-01T00:00:00Z", "build": { "build_major": "MONDAY", "build_minor": "0000000000", "build_id": "MONDAY-0000000000" } }
}
```
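Both envelopes share the `success` flag, so a client can unwrap them in one place. The sketch below returns `data` on success and otherwise raises an exception that carries `error_code`, `http_status`, `retryable`, and the `major.tag` from the error envelope above. `OpsError` and `unwrap` are hypothetical names.

```python
import json


class OpsError(Exception):
    """Raised for a success=false envelope; fields mirror the error envelope above."""

    def __init__(self, error: dict):
        major = error.get("major", {})
        self.error_code = error.get("error_code")
        self.http_status = error.get("http_status")
        self.retryable = bool(error.get("retryable", False))
        self.tag = major.get("tag")
        message = major.get("message", {}).get("en_US", "")
        super().__init__(f"{self.error_code} [{self.tag}]: {message}")


def unwrap(raw: str) -> dict:
    envelope = json.loads(raw)
    if envelope.get("success") is True:
        return envelope["data"]
    raise OpsError(envelope.get("error", {}))
```

Callers can branch on `tag` (see the error tags list) and retry only when `retryable` is true.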

## Idempotency & retries
- All **read** calls (`ping`, `stat`, `maintenance/list`, `maintenance/get`, and the Direct Lambda `ops_vacuum_status`) are safe to retry with identical inputs (read-only, no side effects).
- **POST mutations** that accept `expected_revision` use optimistic concurrency: on `409 conflict` or `428 expected-revision-required`, re-read the record, obtain the current `revision`, and retry with the updated value.
- Creates are generally **not** idempotent. Prefer caller-provided `code` (where supported) and verify existence before retrying a failed create.
- Bulk or scheduled jobs that accept an `idempotency_key` will de-duplicate within the documented time window.
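The optimistic-concurrency rule can be written as a bounded re-read-and-retry loop. In the sketch below, `read` and `write` are caller-supplied functions (hypothetical). The loop retries only on `409` or `428`, and it reads the latest `revision` before every attempt. The document does not list which OPS operations accept `expected_revision`, so treat the target operation as an assumption.

```python
from typing import Callable


class ApiError(Exception):
    def __init__(self, http_status: int, tag: str = ""):
        super().__init__(f"HTTP {http_status} {tag}".strip())
        self.http_status = http_status
        self.tag = tag


RETRY_ON = {409, 428}  # conflict, expected-revision-required


def update_with_revision(read: Callable[[], dict], write: Callable[[dict], object],
                         changes: dict, max_attempts: int = 3):
    for attempt in range(1, max_attempts + 1):
        current = read()  # re-read every time so the revision is the latest one
        try:
            return write({**changes, "expected_revision": current["revision"]})
        except ApiError as exc:
            # Other errors, or the last attempt, propagate to the caller.
            if exc.http_status not in RETRY_ON or attempt == max_attempts:
                raise
```

Bounding the attempts keeps a hot record from looping forever. Persistent conflicts surface to the caller instead.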

## Known pitfalls
- **Missing `expected_revision`**: most state-changing operations require it; omitting it returns `428` with the current revision in `error.details`.
- **Stale revision**: reading a record, waiting, then writing with an outdated `revision` triggers `409`. Always use the latest revision from the most recent read.
- **Pagination cursors**: `next_token` is an opaque JSON value; pass it back exactly as received. Do not modify or decode it, and do not persist cursors across sessions, because their format may change between deploys.
- **Anti-enumeration 404**: some org-scoped reads return `404` even when the record exists, if the caller is not associated with the org. Treat `404` as ambiguous; verify caller association before assuming "not found".
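The following is a pagination loop that respects the cursor rules above. It passes `next_token` back verbatim, never inspects it, and finishes within a single run. `fetch_page` is any caller-supplied function, for example one wrapping the GET or POST `maintenance/list` call. The `items` field name is an assumption about `MaintenanceListSuccess`.

```python
from typing import Callable, Optional


def list_all(fetch_page: Callable[..., dict], limit: int = 50) -> list:
    items: list = []
    token: Optional[str] = None
    while True:
        page = fetch_page(limit=limit, next_token=token)
        items.extend(page.get("items", []))
        token = page.get("next_token")  # opaque: stored and returned, never parsed
        if not token:  # missing or empty token means this was the last page
            return items
```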

## OpenAPI
- Contract schema: <a href="https://doc.g3nretailstack.com/ops/openapi.yaml" target="_blank" rel="noopener noreferrer">https://doc.g3nretailstack.com/ops/openapi.yaml</a>
- API Gateway: ping, stat, maintenance list/get.
- Direct Lambda (operator): maintenance schedule/start/end/cancel/update, vacuum all/org/cancel/status.


_Build MONDAY-1776194870 • 2026-04-14T19:27:50.000Z • [© 1999 Microhouse Systems Inc. All rights reserved.](https://doc.g3nretailstack.com/common/copyright-license.html)_
