Operations - Website Offline Controls, Heartbeat, And API Observability
Overview
This document specifies legacy behavior for storefront offline controls, heartbeat endpoints, and API observability that is not yet covered by dedicated migration specs.
Checklist coverage:
- OP-001 Offline middleware / website online-offline controls
- OP-004 Request/response traffic recording middleware in APIs
- OP-005 Sentry exception capture in APIs
- OP-012 Heartbeat endpoint/admin heartbeat module
Primary legacy evidence:
- _core/config/router.php
- _www/index.php
- _www/api.php
- _www/appconfig/constants.php
- src/functions.php
- src/Atraxion/Front/Middleware/OfflineMiddleware.php
- src/Atraxion/Common/Command/WebsiteOfflineCommand.php
- src/Atraxion/Common/Command/WebsiteOnlineCommand.php
- _api/index.php
- _autodoc/index.php
- src/Atraxion/Api/Middleware/TrafficRecordingMiddleware.php
- src/Atraxion/Api/Repository/TrafficRecordingRepository.php
- src/Atraxion/Api/Resources/doctrine/TrafficRecording.orm.xml
- src/Atraxion/Request/RequestResponseRecorderMiddleware.php
- src/Atraxion/Request/DbTableRequestResponseStorage.php
- src/Atraxion/Api/Service/SentryService.php
- _www/appconfig/bootstrap.php
- _core/config/routing.php
- _admin/config/routing.php
- _core/controller/classes/atx/system/HeartbeatController.php
- _admin/controller/classes/atx/system/AdminHeartbeatController.php
Legacy Capabilities (As-Is)
1. Website Offline Controls (OP-001)
State model:
1. Offline state is represented by file presence at OFFLINE_FILE (APP_PUBLIC_DIR/appconfig/offline).
2. File contents store the expected back-online timestamp.
3. is_offline(bool $checkIp) supports privileged-IP bypass when $checkIp=true.
Control entry points:
1. CLI command website:offline [back-online] writes/extends offline timestamp.
2. CLI command website:online removes offline file.
Runtime enforcement:
1. _www/index.php blocks public storefront early when is_offline(true) and renders offline page.
2. _www/api.php returns HTTP 503 JSON when is_offline(true).
3. Slim front app uses OfflineMiddleware in _core/config/router.php; middleware checks is_offline(false) and returns 503 HTML or JSON payload.
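The state model and the two is_offline call styles above can be sketched as follows. This is a minimal illustration in Python, not the legacy PHP code: the function signatures, the privileged-IP set, and the assumption that the file stores Unix epoch seconds are all hypothetical, since the source does not state the on-disk timestamp format.

```python
from pathlib import Path
from typing import Collection, Optional


def go_offline(offline_file: Path, back_online_at: int) -> None:
    """Create or overwrite the marker file; its body is the expected back-online time."""
    # Assumption: the timestamp is stored as Unix epoch seconds.
    offline_file.write_text(str(back_online_at))


def go_online(offline_file: Path) -> None:
    """Remove the marker file; a missing file means the site is online."""
    offline_file.unlink(missing_ok=True)


def back_online_at(offline_file: Path) -> Optional[int]:
    """Return the stored timestamp, or None when the site is online."""
    if not offline_file.exists():
        return None
    return int(offline_file.read_text().strip())


def is_offline(
    offline_file: Path,
    check_ip: bool,
    client_ip: str = "",
    privileged_ips: Collection[str] = (),
) -> bool:
    """File presence is the only offline signal; privileged IPs bypass it when check_ip is set."""
    if not offline_file.exists():
        return False
    if check_ip and client_ip in privileged_ips:
        return False
    return True
```

With the same marker file present, a privileged client is let through by is_offline(..., check_ip=True) but blocked by is_offline(..., check_ip=False). That is the difference between the _www entrypoints and OfflineMiddleware described above.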
2. API Traffic Recording (OP-004)
_api application:
1. Uses TrafficRecordingMiddleware.
2. Records request/response in api_traffic_logs Doctrine entity.
3. Captures duration and runs probabilistic retention cleanup (~5%) for logs older than two months.
4. On exceptions, writes a debug-level synthetic response into logs and returns safe error response.
_autodoc application:
1. Uses RequestResponseRecorderMiddleware with autodoc_traffic_logger.
2. Persists request, response, and optional exception fields in DB table autodoc_traffic_logs.
3. Retention is probabilistic (~5%) with a two-week window.
4. Middleware rethrows exceptions after recording; default error handler then builds JSON error response.
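The write-triggered retention cleanup used by both recorders can be illustrated with a small sketch. It is written in Python for brevity and only models the pattern: the function name, the in-memory list standing in for the log table, and the injectable random source are assumptions, not the legacy implementation.

```python
import random
from datetime import datetime, timedelta
from typing import Callable, List


def record_with_probabilistic_cleanup(
    log: List[datetime],
    now: datetime,
    retention: timedelta,
    cleanup_probability: float = 0.05,
    rand: Callable[[], float] = random.random,
) -> bool:
    """Append one entry and, on roughly 5% of writes, purge expired entries.

    Returns True when the cleanup branch ran.
    """
    log.append(now)
    if rand() >= cleanup_probability:
        return False  # most writes skip cleanup entirely
    cutoff = now - retention
    # Keep only entries that are still inside the retention window.
    log[:] = [ts for ts in log if ts >= cutoff]
    return True
```

At a 5% trigger rate the number of writes between cleanups follows a geometric distribution with a mean of 20, and there is still a chance of roughly 36% (0.95^20) that 20 consecutive writes trigger no cleanup at all. On a low-traffic endpoint, expired rows can therefore outlive the nominal window by an unbounded amount.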
3. API Sentry Exception Capture (OP-005)
Global bootstrap:
1. _www/appconfig/bootstrap.php initializes Sentry SDK.
2. Shutdown handler captures fatal errors as Sentry messages.
Per-app capture points:
1. _api default error handler captures exception with request context via SentryService.
2. _api TrafficRecordingMiddleware also captures exception before building fallback response.
3. _autodoc default error handler captures exception with request context via SentryService.
Context enrichment:
1. SentryService attaches request metadata, locale tag, optional authenticated user info, endpoint, and IP.
2. Selected sensitive fields are redacted for Sentry context (authorization, cookie, selected auth server params).
4. Heartbeat Endpoint And Admin Heartbeat Module (OP-012)
Public heartbeat route:
1. Legacy route key hb maps to HeartbeatController.
2. init action returns lightweight runtime data (cart context and session-online updates for eligible authenticated users).
3. Used as a periodic front signal path rather than a deep health diagnostics endpoint.
Admin heartbeat module:
1. Admin route key hbadmin maps to AdminHeartbeatController.
2. check aggregates dashboard metrics (orders, platform orders, manual orders, searches, active users, queue/general-running flags, unvalidated articles, offline status).
3. Includes UI/session toggles (hideAdminInfo, hideHeartbeatMenu, showDebug) and version switch helper (switchVersion for eligible users).
Current Gaps And Risks To Resolve In Migration
- Offline checks are duplicated across entrypoints and middleware with different bypass semantics (is_offline(true) vs is_offline(false)).
- API offline behavior is not unified between legacy _www/api.php, _api, and _autodoc apps.
- _api may double-report exceptions to Sentry (middleware + error handler for same failure path).
- Traffic recording stores full request/response payloads and headers; redaction policy is inconsistent across apps.
- Retention windows differ (api_traffic_logs two months vs autodoc_traffic_logs two weeks) without a formal policy.
- Cleanup is probabilistic and piggybacked on writes, so deletion cadence is non-deterministic.
- OfflineMiddleware JSON detection uses request Content-Type only; this can mismatch client expectations.
- Heartbeat behavior and payloads are legacy-controller specific and undocumented as a stable contract.
Target Migration Specification (Symfony)
Scope
In-scope:
- Unified maintenance-mode domain behavior for storefront and APIs.
- Explicit contract for storefront/admin heartbeat endpoints.
- Unified API request/response recording policy with redaction and retention controls.
- Deterministic API exception telemetry pipeline to Sentry.
Out-of-scope:
- SIEM downstream integrations beyond Sentry.
- Product-level alert routing ownership matrix.
Required Operating Rules
- Maintenance mode must be represented by one typed source of truth (not raw file checks scattered across apps).
- All HTTP surfaces (webshop, internal API, Autodoc API) must follow one maintenance policy contract and response format per channel.
- Exception-to-Sentry pipeline must capture each exception once per request path.
- Traffic recording must apply configurable redaction rules for authorization headers, cookies, tokens, and sensitive body fields.
- Retention and cleanup must be deterministic (scheduled cleanup command/job), not probabilistic on write.
- Recorder behavior must define body truncation/size limits to control storage and PII exposure.
- Heartbeat endpoints must expose a documented, versioned payload contract with clear auth boundaries.
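The redaction and truncation rules could look roughly like the sketch below. It is a Python illustration under stated assumptions rather than the target Symfony code: the header and body key lists, the placeholder string, and the 2048-character limit are hypothetical defaults that the final policy would have to define.

```python
import json
from typing import Any, Dict, Mapping

REDACTED = "[REDACTED]"
# Hypothetical defaults; the real lists belong in configuration.
SENSITIVE_HEADERS = frozenset({"authorization", "cookie", "set-cookie"})
SENSITIVE_BODY_KEYS = frozenset({"password", "token", "access_token", "refresh_token"})
MAX_BODY_CHARS = 2048


def redact_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Replace values of sensitive headers, matching names case-insensitively."""
    return {
        name: (REDACTED if name.lower() in SENSITIVE_HEADERS else value)
        for name, value in headers.items()
    }


def redact_body(value: Any) -> Any:
    """Walk decoded JSON and mask values stored under sensitive keys."""
    if isinstance(value, dict):
        return {
            key: (REDACTED if str(key).lower() in SENSITIVE_BODY_KEYS else redact_body(item))
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact_body(item) for item in value]
    return value


def prepare_body(raw: str, limit: int = MAX_BODY_CHARS) -> str:
    """Redact JSON bodies, then cap the stored length."""
    try:
        text = json.dumps(redact_body(json.loads(raw)))
    except ValueError:
        # Non-JSON bodies are only truncated here; a real policy may need
        # to drop or hash them instead, since they cannot be inspected.
        text = raw
    if len(text) > limit:
        return text[:limit] + "...[truncated]"
    return text
```

Redaction runs before truncation so that a secret cannot survive merely because the cut happened to fall after it.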
Proposed Target Components
Application services:
- MaintenanceWindowService (state and back-online metadata)
- MaintenanceModeResponder (HTML/JSON response contracts)
- HeartbeatStatusService (front/admin heartbeat payload builder)
- ApiTrafficRecorder (redaction, truncation, persistence)
- ExceptionTelemetryService (Sentry envelope and dedup guard)
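One possible shape for the dedup guard in ExceptionTelemetryService is a per-request object that remembers which exception instances it has already forwarded. The Python sketch below is illustrative only: the class name, the capture method, and the send callback are assumptions, and the real service would wrap the Sentry SDK instead of a plain callable.

```python
from typing import Callable, List


class ExceptionTelemetry:
    """Per-request guard that forwards each exception object at most once."""

    def __init__(self, send: Callable[[BaseException], None]) -> None:
        self._send = send
        # Holding references keeps identity checks valid for the request lifetime.
        self._seen: List[BaseException] = []

    def capture(self, exc: BaseException) -> bool:
        """Forward exc unless this exact object was already reported.

        Returns True when the exception was sent.
        """
        if any(seen is exc for seen in self._seen):
            return False
        self._seen.append(exc)
        self._send(exc)
        return True
```

If both a middleware and the default error handler call capture for the same failure, as the legacy _api app effectively does, only the first call reaches the telemetry backend.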
Middleware:
- MaintenanceModeMiddleware (shared across apps, policy-driven)
- ApiObservabilityMiddleware (records duration + traffic with consistent schema)
- ApiExceptionMiddleware (single capture point to telemetry and response mapping)
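A policy-driven maintenance response might negotiate its format from the Accept header as well as Content-Type, instead of Content-Type alone. The following Python sketch assumes a simple substring match and a hypothetical JSON shape (status, back_online_at); the agreed maintenance contract would replace both.

```python
import json
from typing import Dict, Mapping, Optional, Tuple


def wants_json(headers: Mapping[str, str]) -> bool:
    """Treat the client as a JSON consumer if Accept or Content-Type says so."""
    normalized = {name.lower(): value.lower() for name, value in headers.items()}
    return (
        "application/json" in normalized.get("accept", "")
        or "application/json" in normalized.get("content-type", "")
    )


def maintenance_response(
    headers: Mapping[str, str], back_online_at: Optional[str]
) -> Tuple[int, Dict[str, str], str]:
    """Build a 503 response as (status, headers, body) in the negotiated format."""
    if wants_json(headers):
        # Field names are placeholders for the agreed JSON contract.
        body = json.dumps({"status": "maintenance", "back_online_at": back_online_at})
        return 503, {"Content-Type": "application/json"}, body
    eta = back_online_at or "soon"
    body = f"<h1>Site under maintenance</h1><p>Expected back online: {eta}</p>"
    return 503, {"Content-Type": "text/html; charset=utf-8"}, body
```

A GET request with no body usually carries no Content-Type, so checking Accept as well covers API clients that would otherwise receive the HTML page.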
Data stores:
- api_traffic_logs replacement table(s) with explicit retention index strategy.
- Optional separate table for exception metadata if query needs differ from traffic logs.
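A retention index combined with a scheduled purge could be sketched as below. The example uses Python with an in-memory SQLite database purely for illustration. The table name api_traffic_logs_v2, its columns, and the purge_expired function are hypothetical, and timestamps are assumed to be stored as UTC ISO-8601 strings so that string comparison matches chronological order.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical replacement schema with an index on the retention column.
SCHEMA = """
CREATE TABLE api_traffic_logs_v2 (
    id INTEGER PRIMARY KEY,
    app TEXT NOT NULL,
    created_at TEXT NOT NULL,
    payload TEXT NOT NULL
);
CREATE INDEX idx_traffic_created_at ON api_traffic_logs_v2 (created_at);
"""


def purge_expired(conn: sqlite3.Connection, now: datetime, retention: timedelta) -> int:
    """Delete every row older than the retention window and return the count.

    Intended to be called from a scheduled job, not from the request path.
    """
    cutoff = (now - retention).isoformat(timespec="seconds")
    cursor = conn.execute(
        "DELETE FROM api_traffic_logs_v2 WHERE created_at < ?", (cutoff,)
    )
    conn.commit()
    return cursor.rowcount
```

Because the purge is driven by a schedule and a fixed cutoff, running it twice in a row deletes nothing the second time, which makes the cleanup scenario straightforward to assert in tests.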
Data And Migration Notes
- Preserve historical api_traffic_logs and autodoc_traffic_logs data for the agreed retention horizon before archive/drop.
- Define and run one migration for canonical observability schema (or explicitly keep separate schemas with policy parity).
- Migrate OFFLINE_FILE behavior to a typed configuration/state store with equivalent back-online timestamp semantics.
- Document allowed maintenance bypass policy (if any) as explicit role-based authorization, not implicit IP checks.
- Define compatibility strategy for legacy heartbeat response fields used by current admin/front polling code.
Acceptance Scenarios (Gherkin)
Feature: Maintenance mode and API observability

  Scenario: Storefront maintenance mode returns expected page
    Given maintenance mode is enabled until 15:30
    When an anonymous customer requests the webshop
    Then response status should be 503
    And response should contain back-online metadata

  Scenario: API maintenance mode returns structured JSON
    Given maintenance mode is enabled
    When a client calls an API endpoint
    Then response status should be 503
    And response body should follow the agreed maintenance JSON contract

  Scenario: API exceptions are captured once
    Given an API handler throws an exception
    When the request fails
    Then exactly one Sentry event should be emitted for that exception

  Scenario: Traffic recorder redacts sensitive fields
    Given a request includes Authorization and Cookie headers
    When the request and response are recorded
    Then stored traffic logs should not contain raw secret values

  Scenario: Traffic retention cleanup is deterministic
    Given logs older than configured retention exist
    When cleanup job runs
    Then stale logs should be deleted according to policy

  Scenario: Admin heartbeat returns operational snapshot
    Given an authenticated admin opens the heartbeat dashboard
    When heartbeat check is requested
    Then response should include orders/users/runtime-state metrics
    And the payload should follow the documented heartbeat contract
Open Decisions
- Whether maintenance-mode bypass should exist at all in target production and, if so, whether it should be role-based or network-based.
- Whether to keep separate observability stores for _api and _autodoc or merge into one canonical API telemetry schema.
- Which fields are mandatory to redact in recorded bodies (tenant/customer-specific legal constraints).
- Whether to sample successful request recording in production while always recording failures.
- Whether storefront heartbeat remains a dedicated endpoint or is replaced by a broader health/status API.