Processor Service¶
The Processor service consumes replayed events and produces derived aggregates.
It is the execution layer of the Pipeline Investigation Kit.
Processor answers the question:
“Given these events, what metrics should exist?”
Role in the Pipeline¶
Replay API → SQS → Processor → AggregatesTable
Processor never talks to the API directly. It only reacts to messages already validated and scoped by Replay.
Input Source¶
Processor is triggered by an SQS queue populated by the Replay service.
Each message represents a single immutable event reference:
{
"event_id": "e123",
"entity_id": "user_123",
"event_time": "2025-12-28T23:59:59Z",
"s3_bucket": "pipeline-raw-bucket",
"s3_key": "raw/source=demo/..."
}
Processing Flow¶
For each message:
- Load raw event from S3
- Validate payload structure
- Apply aggregation logic
- Write aggregate results to DynamoDB
- Emit metrics
Failures are isolated per message.
Aggregation Model¶
Processor writes results into AggregatesTable.
Each aggregate represents a windowed metric.
Example:
{
"PK": "ENTITY#user_123",
"SK": "DAY#2025-12-28#VER#1767043865",
"metric_name": "daily_event_count",
"value": 1,
"window_start": "2025-12-28T00:00:00Z",
"window_end": "2025-12-28T23:59:59Z"
}
Idempotency¶
Processor is idempotent by design.
- Aggregates are keyed by deterministic window identifiers
- Reprocessing the same event produces the same result
- Replay can safely be executed multiple times
DRY_RUN Mode¶
When DRY_RUN=true:
- messages are read
- events are parsed
- metrics are computed
- no DynamoDB writes occur
Processor returns counts only.
This allows safe testing of aggregation logic.
Metrics¶
Processor emits the following CloudWatch metrics:
ProcessorMessageCountProcessorErrorCount
These metrics are critical for validating replay outcomes.
Error Handling¶
- malformed events are skipped
- missing S3 objects are logged
- aggregation failures increment error metrics
- the Lambda does not crash the batch
This ensures partial failures do not block investigations.
Scaling Characteristics¶
- SQS controls concurrency
- Processor scales horizontally
- Backpressure is automatic
Replay volume directly controls processing pressure.
What Processor Is NOT¶
- Not a stream processor
- Not real-time analytics
- Not a scheduler
Processor is intentionally reactive and scoped.
Design Philosophy¶
Processor favors:
- determinism over speed
- observability over automation
- correctness over convenience
If results surprise you, the system is doing its job.
Typical Workflow¶
- Ingest raw events
- Replay a scoped window
- Observe processor metrics
- Inspect aggregates
- Iterate safely
Processor is an optional but powerful extension to the Pipeline Investigation Kit.