APR 09, 2026

Audit logs at ClearFeed

Designing ClearFeed's audit logging system end-to-end — BullMQ pipeline, batched writes, streamed exports, and the operational details that mattered most.

I designed and built ClearFeed’s audit logging system end-to-end to meet enterprise compliance requirements.

The easy version would have been writing audit rows directly from request handlers into Postgres. That approach works initially, until bulk imports, workflow automations, and large account actions suddenly generate thousands of writes inside the request path.

So from the start, I avoided coupling audit persistence to the request lifecycle.

Every service dispatches structured audit events into BullMQ. A NestJS orchestrator consumes the events, normalizes actor metadata, resource identifiers, operation types, timestamps, request source, and workspace context, then asynchronously flushes batched writes into Postgres.
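
A minimal sketch of that shape using BullMQ directly (the queue name, event fields, and the bufferForFlush hand-off are illustrative, not ClearFeed's actual schema):

```typescript
import { Queue, Worker } from 'bullmq';

const connection = { host: 'localhost', port: 6379 };
const auditQueue = new Queue('audit-events', { connection });

export interface AuditEvent {
  actorEmail: string;
  operation: string;            // e.g. 'collection.update'
  resourceType: string;
  resourceIds: string[];
  source: 'web' | 'slack' | 'api';
  workspaceId: string;
  details?: Record<string, unknown>;
}

// Producer side: request handlers enqueue and return immediately.
export async function emitAuditEvent(event: AuditEvent): Promise<void> {
  await auditQueue.add('audit', { ...event, occurredAt: new Date().toISOString() });
}

// Consumer side: normalize fields before handing rows to the batched writer.
export const auditWorker = new Worker<AuditEvent & { occurredAt: string }>(
  'audit-events',
  async (job) => {
    const d = job.data;
    bufferForFlush({
      actor_email: d.actorEmail.trim().toLowerCase(),
      operation: d.operation,
      resource_type: d.resourceType,
      resource_ids: d.resourceIds,
      source: d.source,
      workspace_id: d.workspaceId,
      details: d.details ?? {},
      occurred_at: d.occurredAt,
    });
  },
  { connection },
);

// Stand-in for the batching layer (sketched in the next snippet) so this compiles on its own.
const buffered: unknown[] = [];
function bufferForFlush(row: unknown) {
  buffered.push(row);
}
```

Request handlers only pay the cost of a Redis enqueue; everything heavier happens off the hot path.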

The batching layer mattered more than expected.
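
Roughly, the flush side looks like this, assuming node-postgres and a size-or-interval trigger; the table name, thresholds, and column list are placeholders:

```typescript
import { Pool } from 'pg';

const pool = new Pool(); // connection details come from PG* environment variables

interface AuditRow {
  actor_email: string;
  operation: string;
  resource_type: string;
  resource_ids: string[];
  source: string;
  workspace_id: string;
  details: Record<string, unknown>;
  occurred_at: string;
}

const MAX_BATCH = 500;          // 8 columns x 500 rows stays well under Postgres's parameter limit
const FLUSH_INTERVAL_MS = 2000;
let pending: AuditRow[] = [];

export function bufferForFlush(row: AuditRow) {
  pending.push(row);
  if (pending.length >= MAX_BATCH) void flush();
}

setInterval(() => void flush(), FLUSH_INTERVAL_MS);

async function flush() {
  if (pending.length === 0) return;
  const rows = pending;
  pending = [];

  // One multi-row INSERT per batch instead of one round trip per event.
  const cols = ['actor_email', 'operation', 'resource_type', 'resource_ids',
                'source', 'workspace_id', 'details', 'occurred_at'];
  const values: unknown[] = [];
  const placeholders = rows.map((r, i) => {
    values.push(r.actor_email, r.operation, r.resource_type, r.resource_ids,
                r.source, r.workspace_id, JSON.stringify(r.details), r.occurred_at);
    const base = i * cols.length;
    return `(${cols.map((_, j) => `$${base + j + 1}`).join(', ')})`;
  });

  await pool.query(
    `INSERT INTO audit_logs (${cols.join(', ')}) VALUES ${placeholders.join(', ')}`,
    values,
  );
}
```

Grouping events into a single multi-row INSERT is what keeps per-event overhead low; a production version also needs retry and dead-letter handling for failed flushes.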

Audit logs are append-heavy and queried mostly through filters like actor email, operation type, resource type, source, and timestamps. I indexed only the fields used by API filters, not the entire schema. The write path stayed fast, query performance stayed predictable, and storage growth remained manageable as retention increased.
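
For illustration, the same idea expressed as a TypeORM entity (TypeORM itself is an assumption here, as are the column names and the choice to scope every index by workspace):

```typescript
import { Entity, PrimaryGeneratedColumn, Column, Index } from 'typeorm';

@Entity('audit_logs')
@Index(['workspace_id', 'occurred_at'])   // default listing: newest events in a workspace
@Index(['workspace_id', 'actor_email'])
@Index(['workspace_id', 'operation'])
@Index(['workspace_id', 'resource_type'])
@Index(['workspace_id', 'source'])
export class AuditLog {
  @PrimaryGeneratedColumn()
  id!: number;

  @Column() actor_email!: string;
  @Column() operation!: string;
  @Column() resource_type!: string;

  // Exported and displayed, but not an API filter, so deliberately unindexed.
  @Column('text', { array: true }) resource_ids!: string[];

  @Column() source!: string;
  @Column() workspace_id!: string;

  // Free-form payload; never filtered on, so no GIN index on it either.
  @Column('jsonb') details!: Record<string, unknown>;

  @Column('timestamptz') occurred_at!: Date;
}
```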

The export pipeline had its own problems.

Enterprise customers wanted large CSV exports across wide date ranges. Loading entire result sets into memory would have failed quickly for larger accounts, so I built the exporter around streamed batched reads directly into the CSV writer. Memory usage stays flat regardless of export size. Small details mattered too: deterministic JSON serialization for details, stable flattening for resource_ids, and consistent ISO timestamps so downstream tooling behaved predictably.
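
A simplified version of that read loop, assuming node-postgres and keyset pagination on the primary key; the helpers and column names are illustrative:

```typescript
import { Pool } from 'pg';
import { Writable } from 'node:stream';

const pool = new Pool();
const BATCH_SIZE = 5000;

// Quote a CSV field only when it contains a delimiter, quote, or newline.
function csvField(value: unknown): string {
  const s = String(value ?? '');
  return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Deterministic JSON: recursively sort object keys so the same details
// object always serializes to the same string.
function stableJson(value: unknown): string {
  if (Array.isArray(value)) return '[' + value.map(stableJson).join(',') + ']';
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>;
    return '{' + Object.keys(obj).sort()
      .map((k) => JSON.stringify(k) + ':' + stableJson(obj[k])).join(',') + '}';
  }
  return JSON.stringify(value);
}

export async function exportCsv(workspaceId: string, from: Date, to: Date, out: Writable) {
  out.write('occurred_at,actor_email,operation,resource_type,resource_ids,source,details\n');

  let lastId = 0;
  for (;;) {
    // Keyset pagination: each batch resumes where the previous one ended,
    // so memory stays flat regardless of the date range.
    const { rows } = await pool.query(
      `SELECT id, occurred_at, actor_email, operation, resource_type, resource_ids, source, details
         FROM audit_logs
        WHERE workspace_id = $1 AND occurred_at BETWEEN $2 AND $3 AND id > $4
        ORDER BY id
        LIMIT $5`,
      [workspaceId, from.toISOString(), to.toISOString(), lastId, BATCH_SIZE],
    );
    if (rows.length === 0) break;

    for (const r of rows) {
      out.write([
        new Date(r.occurred_at).toISOString(),   // consistent ISO timestamps
        csvField(r.actor_email),
        csvField(r.operation),
        csvField(r.resource_type),
        csvField(r.resource_ids.join(';')),      // stable flattening for resource_ids
        csvField(r.source),
        csvField(stableJson(r.details)),
      ].join(',') + '\n');
    }
    lastId = rows[rows.length - 1].id;
  }
}
```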

Some of the harder problems were operational, not architectural.

Behind an AWS ALB, the real client IP comes from the X-Forwarded-For header, not the socket address. Slack-originated actions intentionally store null IPs, because inventing values there pollutes the audit trail. Async retries propagate actor context so delayed jobs map back to the original user action instead of appearing as anonymous system activity.
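
Sketched as a couple of Express-style helpers (the function names, field names, and job shape are mine, not ClearFeed's):

```typescript
import type { Request } from 'express';

// Behind an ALB, the socket address is the load balancer's, not the client's.
// The forwarded chain is client-first; in practice trust-proxy configuration
// decides how far down the chain to believe.
export function clientIp(req: Request): string | null {
  const forwarded = req.headers['x-forwarded-for'];
  const first = Array.isArray(forwarded) ? forwarded[0] : forwarded;
  return first ? first.split(',')[0].trim() : req.socket.remoteAddress ?? null;
}

// Slack-originated actions have no meaningful client IP: store null rather than
// the address of Slack's servers or our own infrastructure.
export function auditIpForSource(req: Request | null, source: string): string | null {
  return source === 'slack' || !req ? null : clientIp(req);
}

// Retried jobs carry the original actor in their payload, so a delayed write is
// still attributed to the user who triggered it, not to an anonymous system actor.
export interface RetryableAuditJob {
  actor: { id: string; email: string };
  event: Record<string, unknown>;
  attempt: number;
}
```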

The system stopped behaving like logging infrastructure pretty quickly. Once customers started depending on it for security reviews and compliance workflows, it became a production data system with the reliability expectations to match.