Replay Service¶

The Replay service enables controlled re-emission of previously ingested events.

It is the core investigation tool of the Pipeline Investigation Kit.

Replay answers the question:

“What would happen if these exact events were processed again?”

Endpoint¶

POST /replay

Request Payload¶

{
  "entity_id": "user_123",
  "start_time": "2025-12-28T00:00:00Z",
  "end_time": "2025-12-29T00:00:00Z",
  "limit": 50,
  "include_duplicates": false
}

Parameters¶

Field	Description
`entity_id`	Entity whose events will be replayed
`start_time`	Start of replay window (ISO-8601)
`end_time`	End of replay window (ISO-8601)
`limit`	Maximum number of events to replay
`include_duplicates`	Whether to include duplicate events

All parameters are required except limit (defaults to 500).

What Replay Does¶

Replay performs read-only investigation followed by optional re-emission.

Step by step:

Query EventsTable by entity_id
Filter events by time window
Apply deduplication rules
Build replay messages
Send messages to SQS
Emit metrics

Replay never modifies stored data.

Replay Filtering Rules¶

Replay messages are built using the following rules:

events without event_id are skipped
events without s3_bucket or s3_key are skipped
when include_duplicates=false, each event_id is emitted only once
events are emitted in query order
replay stops when limit is reached

These rules are conservative by design.

Replay Output¶

{
  "entity_id": "user_123",
  "start_time": "2025-12-28T00:00:00Z",
  "end_time": "2025-12-29T00:00:00Z",
  "scanned": 12,
  "sent": 7
}

Fields¶

Field	Meaning
`scanned`	Number of DynamoDB items read
`sent`	Number of SQS messages emitted

A response with sent = 0 is valid and meaningful.

DRY_RUN Mode¶

When DRY_RUN=true:

DynamoDB is queried
filtering rules are applied
no SQS messages are sent
metrics are still emitted

This allows safe estimation of replay impact.

Observability¶

Metrics¶

ReplayRequestedCount
ReplayMessageCount

Logs¶

Replay logs include:

query parameters
filtering decisions
final counts

All logs are structured JSON.

Common Use Cases¶

Missing Aggregates¶

Replay a specific day or entity to verify:

whether events exist
how many would be processed

Late Data¶

Replay a delayed window to:

surface late arrivals
observe aggregate changes

Duplicate Storms¶

Replay with include_duplicates=false to:

verify deduplication
limit downstream impact

Safety Guarantees¶

Replay is explicit
Replay is scoped
Replay is reversible
Replay is observable

Nothing happens automatically.

What Replay Is NOT¶

Not a backfill engine
Not a migration tool
Not a repair mechanism by itself

Replay provides visibility, not magic.

Design Philosophy¶

Replay is intentionally:

conservative
predictable
debuggable

If replay feels “slow” or “manual”, that is by design.