Performance Implementation Backlog - 2026-02-08
Purpose
Turn the performance baseline into an execution backlog for Day 1 migration delivery.
This plan is constrained by existing architecture decisions:
- Max 5 deployables (webshop, stock-import, platform-export, vehicle-data, erp-middleware).
- Max 3 internal shared packages (atraxion/contracts, atraxion/integration-sdk, atraxion/foundation).
- Projection-first reads for stock, fitment, and wheel sets.
- Async-first write propagation with outbox/inbox and idempotency.
References:
- docs/migration/02-island-boundaries-and-performance.md
- docs/migration/webshop/02-products.md
- docs/migration/vehicle-data/02-vehicle-product-mapping.md
- docs/migration/stock-import/00-stock-import-overview.md
- docs/migration/platform-export/00-platform-export-overview.md
Scope And Targets
Data scale assumptions:
- ~285,000 products
- ~20,000 vehicle variants
- ~50 suppliers
- ~60 platforms
- composite sets without native SKU
Day 1 SLI targets:
- Product/search read p95: < 200 ms
- Vehicle-to-product lookup p95: < 250 ms
- Set list/detail + wheel-set API p95: < 300 ms
- Projection freshness lag (stock/fitment/sets): < 5 min
- Projection rebuild failure rate: < 0.5% daily
Tooling Decisions (Day 1)
- Queue and async compute: Symfony Messenger + RabbitMQ transport.
- Cache and locks: Redis (cache + lock store).
- Persistence: MySQL/MariaDB (single family across islands).
- Profiling and performance regression checks: Blackfire + Symfony profiler.
- Metrics/alerting: Prometheus metrics + Grafana dashboards + Sentry for exception telemetry.
Defer until metrics prove need:
- OpenSearch/Elasticsearch for read/search offload.
- Additional deployables for compute-only services.
Work Packages
| ID | Priority | Island(s) | Deliverable | Dependencies |
|---|---|---|---|---|
| PERF-001 | H | All | Performance instrumentation baseline and dashboards | none |
| PERF-002 | H | Webshop, Vehicle Data | Projection schema and index hardening | PERF-001 |
| PERF-003 | H | Stock Import, Webshop | Stock projection pipeline (incremental + reconcile) | PERF-002 |
| PERF-004 | H | Vehicle Data | Fitment projection pipeline (incremental) | PERF-002 |
| PERF-005 | H | Webshop, Vehicle Data | Wheel-set projection pipeline + component reverse index | PERF-002, PERF-004 |
| PERF-006 | H | All integration islands | Queue topology, retry/backoff, DLQ, replay commands | PERF-001 |
| PERF-007 | M | All | Worker sizing baseline + autoscaling policy | PERF-006 |
| PERF-008 | H | Webshop, Vehicle Data | Dual-read/dual-write validation and drift detection | PERF-003, PERF-004, PERF-005 |
| PERF-009 | M | Webshop | API/query optimization and cache policy hardening | PERF-008 |
| PERF-010 | M | Shared Ops | Failure runbooks, replay tooling, reconciliation SOPs | PERF-006, PERF-008 |
| PERF-011 | M | All | Load tests and failure-injection tests | PERF-007, PERF-008 |
| PERF-012 | H | All | Cutover gates and go/no-go check | PERF-008, PERF-011 |
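The dependency column forms a directed graph, so the ordering can be checked mechanically. The following is a minimal sketch, not part of the plan itself: it copies the dependencies from the table, assumes the phase assignment given under Delivery Phases And Gates (with PERF-006 counted in P2, where it completes), and uses Python's standard graphlib to derive one valid execution order and to flag any package scheduled before one of its dependencies.

```python
from graphlib import TopologicalSorter

# Package -> packages it depends on, copied from the Work Packages table.
DEPENDENCIES = {
    "PERF-001": [],
    "PERF-002": ["PERF-001"],
    "PERF-003": ["PERF-002"],
    "PERF-004": ["PERF-002"],
    "PERF-005": ["PERF-002", "PERF-004"],
    "PERF-006": ["PERF-001"],
    "PERF-007": ["PERF-006"],
    "PERF-008": ["PERF-003", "PERF-004", "PERF-005"],
    "PERF-009": ["PERF-008"],
    "PERF-010": ["PERF-006", "PERF-008"],
    "PERF-011": ["PERF-007", "PERF-008"],
    "PERF-012": ["PERF-008", "PERF-011"],
}

# Phase number per package (assumption: PERF-006 is counted in P2,
# although part of it already starts in P0).
PHASE = {
    "PERF-001": 0, "PERF-002": 1, "PERF-003": 2, "PERF-004": 2,
    "PERF-005": 2, "PERF-006": 2, "PERF-007": 4, "PERF-008": 3,
    "PERF-009": 4, "PERF-010": 5, "PERF-011": 4, "PERF-012": 5,
}

# static_order() yields every package after all of its dependencies
# and raises CycleError if the table contains a cycle.
order = list(TopologicalSorter(DEPENDENCIES).static_order())

# A violation is a package placed in an earlier phase than a dependency.
violations = [
    (pkg, dep)
    for pkg, deps in DEPENDENCIES.items()
    for dep in deps
    if PHASE[dep] > PHASE[pkg]
]

print(order)
print(violations)
```

With the table as written, the violation list comes out empty, which suggests the phase plan is consistent with the declared dependencies.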
Schema And Index Specs
Use narrow projection tables with explicit compound indexes.
1) article_availability_projection
CREATE TABLE article_availability_projection (
article_number VARCHAR(50) PRIMARY KEY,
sellable_qty INT NOT NULL DEFAULT 0,
best_price DECIMAL(10,2) NOT NULL,
next_delivery_date DATE NULL,
source_batch_id VARCHAR(64) NULL,
updated_at DATETIME(3) NOT NULL,
INDEX idx_aap_qty_updated (sellable_qty, updated_at),
INDEX idx_aap_updated (updated_at)
);
2) product_vehicle_mappings
Use composite primary key to avoid surrogate-id lookups on hot paths.
CREATE TABLE product_vehicle_mappings (
vehicle_variant_id INT NOT NULL,
article_number VARCHAR(50) NOT NULL,
product_type VARCHAR(10) NOT NULL,
position VARCHAR(10) NOT NULL DEFAULT 'both',
is_original_equipment BOOLEAN NOT NULL DEFAULT FALSE,
match_score SMALLINT NOT NULL,
computed_at DATETIME(3) NOT NULL,
PRIMARY KEY (vehicle_variant_id, article_number),
INDEX idx_pvm_article_type_score (article_number, product_type, match_score),
INDEX idx_pvm_vehicle_type_score (vehicle_variant_id, product_type, match_score)
);
3) wheel_set_projection
CREATE TABLE wheel_set_projection (
set_key CHAR(64) PRIMARY KEY,
vehicle_variant_id INT NOT NULL,
season VARCHAR(20) NOT NULL,
mounting_type VARCHAR(20) NOT NULL,
wheel_front_article_number VARCHAR(50) NOT NULL,
wheel_rear_article_number VARCHAR(50) NULL,
tyre_front_article_number VARCHAR(50) NOT NULL,
tyre_rear_article_number VARCHAR(50) NULL,
tpms_article_number VARCHAR(50) NULL,
sellable_quantity INT NOT NULL,
set_price DECIMAL(10,2) NOT NULL,
computed_at DATETIME(3) NOT NULL,
INDEX idx_wsp_vehicle_season_qty_price (vehicle_variant_id, season, sellable_quantity, set_price),
INDEX idx_wsp_computed (computed_at)
);
4) wheel_set_component_index
Reverse index to update impacted sets when one component changes.
CREATE TABLE wheel_set_component_index (
component_article_number VARCHAR(50) NOT NULL,
component_role VARCHAR(20) NOT NULL,
set_key CHAR(64) NOT NULL,
PRIMARY KEY (component_article_number, component_role, set_key),
INDEX idx_wsci_set_key (set_key)
);
5) projection_job_checkpoint
CREATE TABLE projection_job_checkpoint (
job_name VARCHAR(100) NOT NULL,
scope_key VARCHAR(100) NOT NULL,
last_success_cursor VARCHAR(255) NULL,
last_success_batch_id VARCHAR(64) NULL,
last_success_at DATETIME(3) NULL,
updated_at DATETIME(3) NOT NULL,
-- One checkpoint row per (job, scope). A primary key on job_name alone
-- would limit each job to a single scope and make scope_key redundant.
PRIMARY KEY (job_name, scope_key)
);
Compute Strategy For Vehicle Mapping And Sets
Vehicle Mapping (Tyre/Wheel -> Vehicle)
Do not do full Cartesian recomputes (all vehicles x all products).
Incremental approach:
- Normalize vehicle rules into fitment buckets:
  - tyre_bucket = width:aspect:diameter:position
  - wheel_bucket = pcd:diameter:center_bore_floor:offset_range
- Normalize products into the same bucket keys.
- Maintain bucket membership tables for vehicles and products.
- Recompute only affected key intersections when either side changes.
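The bucket approach above can be sketched as follows. This is an illustration under assumptions, not the planned implementation: the TyreSpec type, the affected_pairs helper and the in-memory membership dictionaries are hypothetical stand-ins for the bucket membership tables, and only the tyre bucket layout (width:aspect:diameter:position) is taken from the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TyreSpec:
    """Hypothetical normalized tyre spec; field names are assumptions."""
    width: int      # section width in mm, e.g. 225
    aspect: int     # aspect ratio, e.g. 45
    diameter: int   # rim diameter in inches, e.g. 17
    position: str   # "front", "rear" or "both"


def tyre_bucket(spec: TyreSpec) -> str:
    """Build the bucket key in the width:aspect:diameter:position layout."""
    return f"{spec.width}:{spec.aspect}:{spec.diameter}:{spec.position}"


def affected_pairs(changed_buckets, vehicles_by_bucket, products_by_bucket):
    """Return only the (vehicle, article) pairs that share a changed bucket."""
    pairs = set()
    for bucket in changed_buckets:
        for vehicle_id in vehicles_by_bucket.get(bucket, ()):
            for article in products_by_bucket.get(bucket, ()):
                pairs.add((vehicle_id, article))
    return pairs


# In-memory stand-ins for the bucket membership tables.
vehicles_by_bucket = {"225:45:17:both": {101, 102}, "205:55:16:both": {103}}
products_by_bucket = {
    "225:45:17:both": {"TYRE-A"},
    "205:55:16:both": {"TYRE-B", "TYRE-C"},
}

# A rule change touches a single bucket, so only its members are recomputed.
changed = {tyre_bucket(TyreSpec(225, 45, 17, "both"))}
pairs = affected_pairs(changed, vehicles_by_bucket, products_by_bucket)
print(sorted(pairs))  # [(101, 'TYRE-A'), (102, 'TYRE-A')]
```

Vehicle 103 and its two products are never touched, which is the point of avoiding the all-vehicles by all-products recompute.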
Trigger sources:
- Vehicle source sync updates.
- Manual fitment changes.
- Product spec changes.
Wheel Set Composition
- Build candidate sets from fitment-compatible wheel/tyre buckets only.
- Generate deterministic set_key.
- Compute quantity via projection stock snapshot:
  - standard: floor(min(wheel_qty, tyre_qty) / 4)
  - staggered: min(floor(min(front)/2), floor(min(rear)/2))
- Store set in wheel_set_projection and component links in wheel_set_component_index.
- On component stock/price change, fetch affected set_key values via reverse index and recompute only those sets.
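A short sketch of the composition steps, under stated assumptions: the plan does not say how set_key is derived, so the SHA-256 hash over a canonical field string is only one option that happens to fit the CHAR(64) column; min(front) and min(rear) are read as the minimum over the wheel and tyre quantities on that axle; and the list of tuples stands in for wheel_set_component_index. All function names are hypothetical.

```python
import hashlib


def make_set_key(vehicle_variant_id, season, mounting_type, component_articles):
    """Deterministic 64-character key (assumed scheme: SHA-256 hex digest)."""
    parts = [str(vehicle_variant_id), season, mounting_type]
    parts += [article or "" for article in component_articles]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


def standard_set_qty(wheel_qty, tyre_qty):
    """floor(min(wheel_qty, tyre_qty) / 4): four identical corners per set."""
    return min(wheel_qty, tyre_qty) // 4


def staggered_set_qty(front_qtys, rear_qtys):
    """min(floor(min(front)/2), floor(min(rear)/2)): two items per axle."""
    return min(min(front_qtys) // 2, min(rear_qtys) // 2)


def sets_to_recompute(component_index, changed_article):
    """Look up impacted set keys through the reverse component index."""
    return sorted(
        {set_key for article, _role, set_key in component_index
         if article == changed_article}
    )


key = make_set_key(101, "winter", "standard",
                   ["WHEEL-123", None, "TYRE-9", None, None])
print(len(key))                           # 64
print(standard_set_qty(10, 7))            # 1
print(staggered_set_qty((5, 9), (3, 6)))  # 1

# Rows mirror (component_article_number, component_role, set_key).
index = [
    ("WHEEL-123", "wheel_front", "set-a"),
    ("TYRE-9", "tyre_front", "set-a"),
    ("WHEEL-123", "wheel_front", "set-b"),
    ("TYRE-7", "tyre_front", "set-c"),
]
print(sets_to_recompute(index, "WHEEL-123"))  # ['set-a', 'set-b']
```

Because the key depends only on the set's defining fields, recomputing a set after a stock change overwrites the same row instead of creating a new one.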
Queue Topology And Policies
Queue classes remain high-priority and bulk, with explicit per-flow queues.
Suggested queue names
| Queue | Class | Purpose |
|---|---|---|
| webshop.high-priority | high-priority | order/payment/customer critical updates |
| webshop.bulk | bulk | set/stock projection recompute jobs |
| vehicle-data.high-priority | high-priority | lookup-critical fitment refresh notifications |
| vehicle-data.bulk | bulk | mapping recompute/backfill |
| stock-import.high-priority | high-priority | incremental supplier corrections |
| stock-import.bulk | bulk | full supplier feed loads |
| platform-export.high-priority | high-priority | order import/tracking updates |
| platform-export.bulk | bulk | stock export generation |
| *.dlq | dlq | exhausted retries for operator replay |
Retry defaults
- Temporary errors: exponential backoff (15s, 60s, 300s, 900s, 1800s).
- Permanent validation errors: no blind retry, route to DLQ/review.
- Max attempts: 5 (override by connector if required).
- Idempotency key required on all cross-island write events/messages.
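The retry defaults can be expressed as a small decision function. This sketch rests on an assumption the plan leaves open: "max attempts: 5" is read as five retries after the first delivery, so every listed delay is used once. If the first delivery counts as an attempt, the 1800s step would never be reached. PermanentError and next_action are hypothetical names, and no jitter is applied here.

```python
RETRY_DELAYS_S = (15, 60, 300, 900, 1800)  # delays from the retry defaults
MAX_RETRIES = 5  # assumption: retries after the initial delivery


class PermanentError(Exception):
    """Validation-style failure that a retry cannot fix (hypothetical)."""


def next_action(failure_count, error):
    """Decide what happens after the `failure_count`-th failure (1-based).

    Returns ("retry", delay_seconds) or ("dlq", None).
    """
    if isinstance(error, PermanentError):
        return ("dlq", None)   # no blind retry for permanent errors
    if failure_count > MAX_RETRIES:
        return ("dlq", None)   # retries exhausted
    return ("retry", RETRY_DELAYS_S[failure_count - 1])


print(next_action(1, TimeoutError()))               # ('retry', 15)
print(next_action(5, TimeoutError()))               # ('retry', 1800)
print(next_action(6, TimeoutError()))               # ('dlq', None)
print(next_action(1, PermanentError("bad payload")))  # ('dlq', None)
```

A connector that overrides the default would supply its own delay tuple and retry cap rather than change the decision logic.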
Replay and reconciliation commands
- bin/console queue:replay --queue=<dlq> --from=<ts> --to=<ts> [--scope=<id>]
- bin/console projection:reconcile --domain=stock|fitment|sets --since=<ts>
- bin/console projection:rebuild --domain=stock|fitment|sets --scope=<id|all>
Worker Sizing Baseline (Initial)
Initial production baseline (before autoscaling):
| Island | Queue | Workers | Concurrency/worker | Notes |
|---|---|---|---|---|
| Webshop | webshop.high-priority | 4 | 1 | protect latency-sensitive domain writes |
| Webshop | webshop.bulk | 8 | 1 | projection recompute throughput |
| Vehicle Data | vehicle-data.high-priority | 2 | 1 | fast fitment signal propagation |
| Vehicle Data | vehicle-data.bulk | 8 | 1 | fitment mapping recompute |
| Stock Import | stock-import.high-priority | 3 | 1 | urgent correction batches |
| Stock Import | stock-import.bulk | 6 | 1 | supplier full imports |
| Platform Export | platform-export.high-priority | 3 | 1 | order/tracking pushes |
| Platform Export | platform-export.bulk | 6 | 1 | export feed generation |
Autoscaling triggers (per queue):
- p95 queue wait time > 60s for 10 min: scale out +2 workers.
- queue depth > 10,000 for 15 min: scale out +4 workers.
- error rate > 2% for 10 min: freeze scale-out, page on-call, inspect DLQ.
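The triggers above could be evaluated along these lines. The sketch assumes each input has already been confirmed over its required window (10 or 15 minutes) by the metrics layer. It also assumes that when both the wait-time and depth triggers fire, the larger step applies rather than the sum, since the plan does not specify how they combine. QueueHealth and scale_decision are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class QueueHealth:
    """Per-queue readings, each already sustained over its trigger window."""
    wait_p95_s: float   # p95 queue wait time in seconds (10 min window)
    depth: int          # queue depth in messages (15 min window)
    error_rate: float   # failed / processed, 0.0 to 1.0 (10 min window)


def scale_decision(health: QueueHealth) -> dict:
    """Map queue health to a scaling action using the per-queue triggers."""
    if health.error_rate > 0.02:
        # Errors take precedence: adding workers would only amplify failures.
        return {"add_workers": 0, "freeze": True, "page_on_call": True}
    add = 0
    if health.depth > 10_000:
        add = 4
    elif health.wait_p95_s > 60:
        add = 2
    return {"add_workers": add, "freeze": False, "page_on_call": False}


print(scale_decision(QueueHealth(wait_p95_s=75, depth=2_000, error_rate=0.0)))
print(scale_decision(QueueHealth(wait_p95_s=75, depth=12_000, error_rate=0.05)))
```

Checking the error rate first keeps a retry storm from triggering scale-out, which matches the "freeze scale-out" rule.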
Observability And SLI Dashboard
Required metrics:
- API latency: p50/p95/p99 by endpoint and tenant.
- Projection freshness lag: event time to projection commit time.
- Queue lag: enqueue-to-start and enqueue-to-ack.
- Recompute throughput: rows/s by projection domain.
- Recompute failure rate and DLQ depth.
- Duplicate suppression rate by idempotency key.
- Stock-set consistency drift count.
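As an illustration of two of these metrics, the sketch below computes projection freshness lag as commit time minus event time and a p95 over a sample window. The nearest-rank percentile method is an assumption, since monitoring backends typically estimate percentiles from histograms instead, and both function names are hypothetical.

```python
from datetime import datetime


def freshness_lag_s(event_time: datetime, commit_time: datetime) -> float:
    """Projection freshness lag: event time to projection commit time."""
    return (commit_time - event_time).total_seconds()


def p95(values):
    """Nearest-rank 95th percentile using integer arithmetic for the rank."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


lag = freshness_lag_s(
    datetime(2026, 2, 8, 12, 0, 0),
    datetime(2026, 2, 8, 12, 3, 30),
)
print(lag)        # 210.0
print(lag > 300)  # False: within the 5 min freshness target

latencies_ms = list(range(1, 101))  # 1..100 ms, synthetic sample
print(p95(latencies_ms))            # 95
```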
Minimum dashboards:
- perf-storefront-latency
- perf-projection-freshness
- perf-queue-health
- perf-fitment-and-sets-compute
- perf-connector-runtime
Alert thresholds:
- set API p95 > 300 ms for 15 min.
- fitment lookup p95 > 250 ms for 15 min.
- projection lag > 5 min for 10 min.
- any DLQ depth > 500 for 10 min.
- daily rebuild failure rate > 0.5%.
Delivery Phases And Gates
Phase P0 - Baseline And Instrumentation
Includes: PERF-001, partial PERF-006
Gate G0:
- Dashboards live in staging.
- All message producers emit idempotency keys.
- Queue lag and API latency metrics verified with test traffic.
Phase P1 - Projection Storage Hardening
Includes: PERF-002
Gate G1:
- Projection tables/indexes migrated.
- Explain plans for top 10 read queries are within budget.
- No full table scan on critical read endpoints.
Phase P2 - Incremental Compute Pipelines
Includes: PERF-003, PERF-004, PERF-005, PERF-006
Gate G2:
- Incremental triggers implemented for stock/fitment/sets.
- Reverse component index for set recompute active.
- Replay tooling validated against synthetic failure cases.
Phase P3 - Dual-Read Validation
Includes: PERF-008
Gate G3:
- Dual-read enabled for stock/fitment/set APIs.
- Drift reports under thresholds:
  - stock mismatch < 0.1%
  - fitment mismatch < 0.5%
  - set quantity mismatch < 0.5%
- Drift reconciliation job and runbook validated.
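One way to turn dual-read results into a G3 verdict is sketched below. The plan does not define the mismatch denominator, so this example assumes it is the union of keys seen on either read path, and it treats a domain with no reported rate as failing. DRIFT_LIMITS mirrors the thresholds above; mismatch_rate and gate_g3 are hypothetical helpers.

```python
# Gate limits as fractions: 0.1% stock, 0.5% fitment, 0.5% set quantity.
DRIFT_LIMITS = {"stock": 0.001, "fitment": 0.005, "sets": 0.005}


def mismatch_rate(legacy: dict, projected: dict) -> float:
    """Share of keys whose values differ, counting keys missing on one side."""
    keys = set(legacy) | set(projected)
    if not keys:
        return 0.0
    mismatched = sum(1 for k in keys if legacy.get(k) != projected.get(k))
    return mismatched / len(keys)


def gate_g3(rates: dict):
    """Return (passed, failing_domains); missing domains count as failing."""
    failing = sorted(
        domain for domain, limit in DRIFT_LIMITS.items()
        if rates.get(domain, 1.0) >= limit
    )
    return (not failing, failing)


# 1,000 sets compared, 6 with a different quantity on the projection path.
legacy_sets = {i: 4 for i in range(1000)}
projected_sets = {i: (3 if i < 6 else 4) for i in range(1000)}
set_rate = mismatch_rate(legacy_sets, projected_sets)

result = gate_g3({"stock": 0.0005, "fitment": 0.002, "sets": set_rate})
print(result)  # (False, ['sets'])
```

A set drift of 0.6% exceeds the 0.5% limit, so the gate blocks and names the failing domain for reconciliation.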
Phase P4 - Tuning And Load Validation
Includes: PERF-007, PERF-009, PERF-011
Gate G4:
- Load tests at >= 1.5x expected peak pass SLI targets.
- Queue backlog drains within 30 min after peak bursts.
- No unbounded memory growth in long-running workers.
Phase P5 - Cutover
Includes: PERF-010, PERF-012
Gate G5 (Go/No-Go):
- All H-priority work packages complete.
- SLIs stable for 7 consecutive days in pre-prod/prod canary.
- DLQ replay and rollback drills completed.
- Architecture gates still pass (deployable/package caps unchanged).
Backlog Details (Definition Of Done)
PERF-001 Instrumentation Baseline
DoD:
- Metrics and traces emitted from all islands.
- Correlation id propagated across sync+async flows.
- SLI dashboards and alerts configured.
PERF-002 Projection Schema Hardening
DoD:
- Projection DDL applied with indexes.
- Query plans captured and documented.
- Migration rollback script tested.
PERF-003 Stock Projection Pipeline
DoD:
- Stock events update article_availability_projection incrementally.
- Reconciliation command catches and fixes drift.
- p95 product search meets < 200 ms in load tests.
PERF-004 Fitment Projection Pipeline
DoD:
- Bucket-based incremental recompute implemented.
- Vehicle and product change triggers wired.
- p95 fitment lookup meets < 250 ms.
PERF-005 Wheel Set Projection Pipeline
DoD:
- Set projection uses deterministic key and reverse component index.
- Standard and staggered quantity logic covered by tests.
- p95 set endpoints meet < 300 ms.
PERF-006 Queue Reliability Contract
DoD:
- Retry/backoff and DLQ behavior implemented for all connector classes.
- Replay command operational with scope filters.
- Duplicate-delivery tests pass.
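A duplicate-delivery test could take roughly the following shape. This is a minimal in-memory sketch: InboxHandler and the message fields are hypothetical, and a real handler would record the idempotency key and apply the write in the same database transaction rather than in Python sets and dicts.

```python
class InboxHandler:
    """In-memory stand-in for an inbox table keyed by idempotency key."""

    def __init__(self):
        self.seen_keys = set()   # processed idempotency keys
        self.stock = {}          # article_number -> sellable_qty
        self.duplicates = 0      # feeds the duplicate suppression metric

    def handle(self, message: dict) -> bool:
        """Apply the message once; return False for a suppressed duplicate."""
        key = message["idempotency_key"]
        if key in self.seen_keys:
            self.duplicates += 1
            return False
        self.stock[message["article_number"]] = message["sellable_qty"]
        self.seen_keys.add(key)
        return True


handler = InboxHandler()
message = {
    "idempotency_key": "stock-import:batch-42:WHEEL-123",
    "article_number": "WHEEL-123",
    "sellable_qty": 12,
}

# The broker redelivers the same message; the second delivery is a no-op.
first = handler.handle(message)
second = handler.handle(message)
print(first, second, handler.duplicates)  # True False 1
```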
PERF-007 Worker Sizing And Scaling
DoD:
- Baseline worker counts configured per queue.
- Scaling rules implemented and tested under load.
- Worker saturation runbook written.
PERF-008 Dual-Read And Drift Validation
DoD:
- API dual-read comparison runs continuously.
- Drift reports emitted daily.
- Measured drift rates are below the G3 gate thresholds.
PERF-009 Read Path Optimization
DoD:
- API and SQL hot paths profiled and optimized.
- Cache strategy documented per endpoint.
- No critical endpoint exceeds p95 target in soak test.
PERF-010 Runbooks
DoD:
- DLQ replay runbook per island exists.
- Projection reconciliation runbook exists.
- On-call escalation matrix is documented.
PERF-011 Load/Failure Tests
DoD:
- Import burst tests, compute burst tests, and outage tests pass.
- Failure scenarios validated (broker outage, DB failover, upstream 429/5xx).
- Results attached to release evidence.
PERF-012 Cutover Governance
DoD:
- Go/no-go checklist signed by domain and ops owners.
- Rollback steps tested and time-bounded.
- First 48-hour hypercare staffing confirmed.
Risks And Mitigations
| Risk | Impact | Mitigation |
|---|---|---|
| Projection drift during high write bursts | wrong stock/fitment/set results | dual-read drift monitor + reconcile command + replay |
| Queue saturation in bulk windows | stale data | autoscaling + separate high-priority queues + backpressure |
| Expensive set recompute fan-out | high CPU and lag | reverse component index + bucketed candidate generation |
| Connector retry storms | cascading failures | capped retries, jittered backoff, circuit-break policy |
| Package/service sprawl pressure | architecture erosion | enforce ADR gates from 03-architecture-review-checklist-template.md |
Acceptance Scenarios (Gherkin)
Feature: Performance migration backlog execution
Scenario: Component stock update recomputes only impacted sets
Given component article "WHEEL-123" is linked in wheel_set_component_index
When stock for "WHEEL-123" changes
Then only linked set_key projections should be recomputed
And unrelated sets should not be recalculated
Scenario: Fitment recompute runs incrementally by bucket
Given a fitment rule change only affects tyre bucket "225:45:17:both"
When fitment recompute is triggered
Then only product and vehicle records in that bucket are recomputed
Scenario: Queue backlog scales workers
Given bulk queue wait time exceeds 60 seconds for 10 minutes
When autoscaling policy evaluates queue health
Then worker count should scale out according to policy
Scenario: Dual-read gate blocks cutover on excessive drift
Given set quantity drift is above 0.5 percent
When cutover gate G3 is evaluated
Then cutover should be blocked
And reconciliation actions should be required