
Bulk Event Ingestion

This guide covers how to reliably send large volumes of usage events to Monk from a database, warehouse, or scheduled export.
For real-time event streaming, see Send your first Event. This guide is for batch/scheduled workflows — e.g., syncing 100K+ events daily from a centralized database.

Architecture Overview

Your Database → Scheduled Job (cron) → POST /v1/events/batch → Monk
                                        (up to 10,000 per request)
The batch endpoint accepts up to 10,000 events per request. For larger volumes, split into multiple requests.
| Daily Volume | Requests Needed | Approach |
| --- | --- | --- |
| < 10,000 | 1 | Single batch call |
| 10K – 100K | 2–10 | Sequential or parallel batches |
| 100K – 1M | 10–100 | Parallel batches with concurrency control |
| > 1M | 100+ | Parallel batches; contact us for dedicated ingestion |
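For the higher-volume tiers, a minimal sketch of parallel batches with bounded concurrency. `send_fn` is a placeholder for your HTTP call (it is not part of the Monk API); `max_workers` caps the number of in-flight requests so you stay within rate limits.

```python
import concurrent.futures as cf


def send_batches_parallel(batches, send_fn, max_workers=4):
    """Send batches concurrently with at most max_workers in flight.

    send_fn(batch) -> bool is assumed to POST one batch and return
    True on success; swap in your real request logic.
    """
    failed = []
    with cf.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(send_fn, b): b for b in batches}
        for fut in cf.as_completed(futures):
            if not fut.result():
                failed.append(futures[fut])
    return failed
```

Collect the returned failed batches and feed them into the same retry path you use for sequential sends.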

Step-by-Step

1. Query your events

Pull new events from your database since the last sync. Track your cursor (e.g., last processed ID or timestamp) to avoid re-processing.
SELECT
  id,
  external_customer_id,
  event_type AS event_name,
  occurred_at AS timestamp,
  jsonb_build_object('region', region, 'model', model) AS properties
FROM billing_events
WHERE id > :last_synced_id
ORDER BY id ASC
LIMIT 10000;
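A minimal cursor-tracking sketch, using `sqlite3` as a stand-in for your actual database driver. The table and column names mirror the query above; persisting the returned cursor (e.g., to a state table or file) is what prevents re-processing on the next run.

```python
import sqlite3


def fetch_unsynced(conn, last_synced_id, limit=10000):
    """Pull rows after the stored cursor, oldest first, and
    return them with the advanced cursor position."""
    rows = conn.execute(
        "SELECT id, external_customer_id, event_type, occurred_at "
        "FROM billing_events WHERE id > ? ORDER BY id ASC LIMIT ?",
        (last_synced_id, limit),
    ).fetchall()
    # Advance the cursor to the highest id we just read; if there
    # were no new rows, keep the old cursor.
    new_cursor = rows[-1][0] if rows else last_synced_id
    return rows, new_cursor
```

Persist `new_cursor` only after the batch is accepted by Monk, so a crash mid-sync re-sends rather than skips.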

2. Chunk into batches

Split your result set into chunks of up to 10,000 events.
import os
import time

import requests

API_URL = "https://events-api.monk.com/v1/events/batch"
MONK_API_KEY = os.environ["MONK_API_KEY"]  # set in your environment; never hard-code
HEADERS = {
    "Authorization": f"Bearer {MONK_API_KEY}",
    "Content-Type": "application/json",
}
BATCH_SIZE = 5000  # Stay under the 10K limit with margin


def chunk(lst, size):
    for i in range(0, len(lst), size):
        yield lst[i : i + size]


def sync_events(events):
    failed_batches = []

    for i, batch in enumerate(chunk(events, BATCH_SIZE)):
        payload = {
            "events": [
                {
                    "externalCustomerId": e["external_customer_id"],
                    "eventName": e["event_name"],
                    "timestamp": e["timestamp"],
                    "idempotencyKey": f"sync_{e['id']}",
                    "properties": e.get("properties", {}),
                }
                for e in batch
            ]
        }

        resp = requests.post(API_URL, json=payload, headers=HEADERS)

        if resp.status_code == 202:
            print(f"Batch {i+1}: {len(batch)} events accepted")
        elif resp.status_code == 429:
            # Rate limited: honor Retry-After if present, else back off 5s
            time.sleep(int(resp.headers.get("Retry-After", 5)))
            resp = requests.post(API_URL, json=payload, headers=HEADERS)
            if resp.status_code != 202:
                failed_batches.append(batch)
        else:
            print(f"Batch {i+1} failed: {resp.status_code} {resp.text}")
            failed_batches.append(batch)

    return failed_batches

3. Schedule it

Run your sync on a schedule — hourly or daily depending on how fresh you need usage data.
| Frequency | Use Case |
| --- | --- |
| Every hour | Near-real-time usage visibility on invoices |
| Daily | Sufficient for monthly billing cycles |
| On-demand | Backfills, migrations |

4. Handle failures

Derive idempotency keys from your source data (e.g., sync_{row_id}) so retrying the same batch doesn’t create duplicates.
"idempotencyKey": f"sync_{event['id']}"
On 5xx or 429 responses, retry with exponential backoff. Don’t retry 4xx validation errors — those need fixing in your source data.
delays = [1, 2, 4, 8, 16]  # seconds, exponential backoff
for delay in delays:
    resp = requests.post(API_URL, json=payload, headers=HEADERS)
    if resp.status_code == 202:
        break
    if 400 <= resp.status_code < 500 and resp.status_code != 429:
        break  # validation error: retrying won't help, fix the source data
    time.sleep(delay)
If a batch fails after retries, log the failed events and alert your team. Don’t silently drop events — they’ll show up as missing usage on invoices.
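One way to avoid silently dropping events is a dead-letter file you can replay later. This is a sketch; `write_dead_letter` and `load_dead_letter` are illustrative helpers, not part of any Monk SDK.

```python
import json


def write_dead_letter(batch, path="failed_events.jsonl"):
    """Append each failed event as one JSON line for later replay."""
    with open(path, "a") as f:
        for event in batch:
            f.write(json.dumps(event) + "\n")


def load_dead_letter(path="failed_events.jsonl"):
    """Read the dead-letter file back into a list of events."""
    with open(path) as f:
        return [json.loads(line) for line in f]
```

Because the events carry their original idempotency keys, replaying the dead-letter file is safe even if some events did land.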

Customer Mapping

Events require either customerId (Monk UUID) or externalCustomerId (your identifier). For bulk sync, externalCustomerId is usually easier since it matches your internal customer IDs.
{
  "externalCustomerId": "cus_bretton_123",
  "eventName": "api_call",
  "timestamp": "2026-03-15T10:30:00Z"
}
Set externalCustomerId when creating customers to enable this mapping.
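For illustration, a customer-creation payload carrying externalCustomerId. The exact endpoint (we assume something like POST /v1/customers) and the other required fields are defined in the Customers API reference; check it before relying on this shape.

```json
{
  "externalCustomerId": "cus_bretton_123",
  "name": "Bretton Inc."
}
```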

Monitoring Your Sync

After syncing, verify events landed correctly:
  1. List events: GET /v1/events?customerId=...&start=2026-03-15&end=2026-03-16
  2. Check usage cost: POST /v1/usage/cost to verify aggregated amounts
  3. Dashboard: Navigate to Usage-based → Events to browse visually
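The checks above can be automated with a small reconciliation sketch. `fetch_count` is a placeholder for a call to GET /v1/events that returns the event count for the synced window; the function itself is illustrative, not a Monk SDK method.

```python
def verify_sync(local_count, fetch_count):
    """Compare the number of rows you sent with what Monk reports.

    fetch_count() stands in for an API call returning the remote
    event count for the same customer and time window.
    """
    remote = fetch_count()
    if remote != local_count:
        return f"mismatch: sent {local_count}, Monk has {remote}"
    return "ok"
```

Wire the mismatch branch into your alerting so a partial sync is caught before invoices are generated.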

FAQ

Can I send 100K+ events in a single request?
Not in a single request. Split into multiple batch calls. At 5,000 events per batch, you can sync 100K events in 20 sequential API calls — typically under a minute.

What happens if I send the same event twice?
Use consistent idempotencyKey values derived from your source data. Monk deduplicates on this key — only one copy is counted for billing.

How quickly do synced events appear on invoices?
Events are processed asynchronously. Usage-based invoice line items are refreshed within minutes of ingestion.

Can I backfill historical events?
Yes. Set the timestamp field to the original event time. Events are attributed to the billing period matching their timestamp, not their ingestion time.

What happens if one event in a batch is invalid?
The entire batch is rejected with a validation error. We recommend validating your data before sending — check that all customer IDs exist and timestamps are valid ISO 8601.
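A pre-send validation sketch along those lines. The required-field list assumes you identify customers by externalCustomerId; it only checks structure, not whether the customer actually exists in Monk.

```python
from datetime import datetime

REQUIRED = ("externalCustomerId", "eventName", "timestamp")


def validate_event(event):
    """Return a list of problems; an empty list means the event
    looks structurally sendable."""
    problems = [f"missing {k}" for k in REQUIRED if not event.get(k)]
    ts = event.get("timestamp")
    if ts:
        try:
            # fromisoformat doesn't accept a trailing "Z" before 3.11,
            # so normalize it to an explicit UTC offset first
            datetime.fromisoformat(ts.replace("Z", "+00:00"))
        except ValueError:
            problems.append(f"bad timestamp: {ts}")
    return problems
```

Run this over each batch before POSTing and route failures to your logs, so one malformed row doesn't reject 5,000 good events.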

Next Steps

Events Concepts: Idempotency, schema, and best practices

Batch Events API: Full API specification

Usage Cost API: Verify aggregated usage amounts

Meters: How events are aggregated into billable usage