# Architecture
AtlasLens is built as a layered stack with two cooperating ingestion pipelines feeding one unified event store.
| Layer | Name | Responsibility |
|---|---|---|
| 6 | Presentation | React + TypeScript dashboard (filters, charts, tables, export) and login |
| 5 | API | FastAPI: auth, event search, aggregations, exports, health/sync control |
| 4 | Storage | MongoDB: `events`, `identities`, groups, `sync_state`, `users` |
| 3 | Normalise | `RawEvent` → unified event; resolve identity; map groups; classify; convert to UTC |
| 2 | Adapters | One connector per product (Cloud) |
| 1 | Sources | Jira / Confluence / Bitbucket / JSM |
Everything above the adapter layer is deployment-agnostic.
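To make that boundary concrete, here is a minimal sketch of an adapter contract, assuming each connector can return raw events newer than a cursor. `RawEvent`, `Adapter`, `fetch_since`, `FakeJiraAuditAdapter` and `pull` are hypothetical names chosen for illustration, not AtlasLens's actual classes; the point is only that layers 3 to 6 would depend on this interface rather than on a particular deployment.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Iterable, Protocol


@dataclass(frozen=True)
class RawEvent:
    """Hypothetical record an adapter hands to the normaliser."""
    deployment: str          # e.g. "cloud"
    product: str             # jira | confluence | bitbucket | jsm
    pipeline: str            # audit | activity
    source_id: str           # id assigned by the source product
    occurred_at: datetime    # any offset; converted to UTC downstream
    payload: dict[str, Any] = field(default_factory=dict)  # original payload


class Adapter(Protocol):
    """Hypothetical contract: the only layer that knows about a deployment."""
    deployment: str
    product: str
    pipeline: str

    def fetch_since(self, cursor: datetime | None) -> Iterable[RawEvent]:
        """Return raw events that occurred strictly after `cursor`."""
        ...


class FakeJiraAuditAdapter:
    """In-memory stand-in for a real Jira Cloud audit-log client."""
    deployment = "cloud"
    product = "jira"
    pipeline = "audit"

    def __init__(self, events: list[RawEvent]) -> None:
        self._events = events

    def fetch_since(self, cursor: datetime | None) -> Iterable[RawEvent]:
        return [e for e in self._events if cursor is None or e.occurred_at > cursor]


def pull(adapter: Adapter, cursor: datetime | None) -> list[RawEvent]:
    # Callers only see the Protocol, never the concrete connector class.
    return list(adapter.fetch_since(cursor))
```

Swapping in a connector for another deployment would then mean writing another class with the same three attributes and `fetch_since`, leaving everything above it unchanged.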
## Two pipelines, one store
The key insight that shapes the design: audit data and activity data come from different APIs.
- The audit pipeline (`pipeline: "audit"`, `category: "security"`) pulls from each product's audit-log API: logins, permission changes, group membership changes, admin actions.
- The activity pipeline (`pipeline: "activity"`, `category: "content"`) pulls from content/activity APIs: issues, pages, commits, pull requests, JSM requests.
Both normalise into the same events collection, so a single query can span security and
productivity across all products.
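Because both pipelines write to `events`, one filter can cut across them. As a sketch, a Python helper could assemble the MongoDB filter document; `build_event_filter` is a hypothetical name, and the sketch assumes `occurred_at` is stored as a BSON date so the range operators compare chronologically. The resulting dict is what would be passed to a driver call such as PyMongo's `events.find(...)`.

```python
from __future__ import annotations

from datetime import datetime
from typing import Any


def build_event_filter(
    since: datetime,
    until: datetime,
    products: list[str] | None = None,
    pipelines: list[str] | None = None,
    actor_id: str | None = None,
) -> dict[str, Any]:
    """Build one MongoDB filter over `events` that may span both pipelines."""
    query: dict[str, Any] = {"occurred_at": {"$gte": since, "$lt": until}}
    if products:
        query["product"] = {"$in": products}
    if pipelines:
        # Omitting this key is what lets audit and activity rows mix.
        query["pipeline"] = {"$in": pipelines}
    if actor_id:
        # actor_id is the resolved canonical identity, so one value
        # matches the same person in every product.
        query["actor_id"] = actor_id
    return query
```

With no `pipelines` argument, a filter on one `actor_id` returns that person's permission changes alongside their commits and page edits, which is the cross-pipeline view the shared collection is meant to enable.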
## Incremental ingestion
Incrementality comes from per-source watermarks and idempotent upserts, not from the database:

- Each `(deployment, product, pipeline)` has a document in `sync_state` holding a cursor (the last `occurred_at` ingested).
- The runner pulls everything newer than the cursor, normalises it, and upserts by `_id`, so re-pulling an overlapping window never creates duplicates.
- On success it advances the cursor and records `last_success_at`; on failure it records `last_error` and leaves the cursor where it was. Connectors run independently, so a failing connector never blocks the others.
- All timestamps are normalised to UTC on the way in.
See Operations for the scheduler and on-demand sync.
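The cursor rules described above can be sketched in Python with plain dicts standing in for MongoDB: `store` plays the `events` collection and `sync_state` is keyed by `(deployment, product, pipeline)`. `run_connector`, `run_all`, `fetch` and `normalise` are hypothetical names; a real runner would issue upserts (for example PyMongo's `replace_one(filter, doc, upsert=True)`) instead of dict assignment.

```python
from __future__ import annotations

from datetime import datetime, timezone
from typing import Any, Callable, Dict, Iterable, Optional, Tuple

Key = Tuple[str, str, str]                             # (deployment, product, pipeline)
Doc = Dict[str, Any]                                   # one document
Fetch = Callable[[Optional[datetime]], Iterable[Doc]]  # raw events newer than cursor
Normalise = Callable[[Doc], Doc]                       # raw event -> unified event


def run_connector(key: Key, fetch: Fetch, normalise: Normalise,
                  store: Dict[str, Doc], sync_state: Dict[Key, Doc]) -> None:
    state = sync_state.setdefault(
        key, {"cursor": None, "last_success_at": None, "last_error": None})
    try:
        docs = [normalise(raw) for raw in fetch(state["cursor"])]
        for doc in docs:
            # Upsert by natural _id: re-pulling an overlapping window
            # overwrites existing documents instead of duplicating them.
            store[doc["_id"]] = doc
    except Exception as exc:
        # Record the error and keep the old cursor so the window is retried;
        # any documents already written are simply overwritten next time.
        state["last_error"] = repr(exc)
        return
    if docs:
        state["cursor"] = max(doc["occurred_at"] for doc in docs)
    state["last_success_at"] = datetime.now(timezone.utc)


def run_all(connectors: Iterable[Tuple[Key, Fetch, Normalise]],
            store: Dict[str, Doc], sync_state: Dict[Key, Doc]) -> None:
    for key, fetch, normalise in connectors:
        # Failures are contained inside run_connector, so the loop continues.
        run_connector(key, fetch, normalise, store, sync_state)
```

Catching the exception per connector is what keeps one failing source from stopping the loop, and leaving `cursor` untouched on failure means the next run retries the same window without risk of duplicates.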
## Data model
The heart of the system is the `events` collection. Each document carries a natural `_id`
(`<deployment>:<product>:<source_id>`) that makes ingestion idempotent, along with the normalised
fields and the full original payload under `raw`.
```jsonc
{
  "_id": "cloud:jira:evt-90211",
  "occurred_at": "2026-04-12T09:14:33Z",  // UTC
  "product": "jira",                      // jira|confluence|bitbucket|jsm
  "pipeline": "audit",                    // audit|activity
  "actor_id": "person:0042",              // resolved canonical identity
  "actor_raw": "5f8a…",                   // accountId
  "operation": "permission_changed",
  "category": "security",                 // security|content
  "severity": "high",                     // low|medium|high
  "object_type": "space",
  "object_ref": { "id": "ENG", "name": "Engineering", "container": null },
  "source_ip": "203.0.113.7",
  "raw": { /* original payload */ }
}
```
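A sketch of how one document in that shape might be produced, assuming adapters supply timezone-aware timestamps. `make_event_id`, `normalise` and the `SEVERITY` table are hypothetical: the audit to security and activity to content mapping follows the pipeline description, but which operations count as high severity is invented for illustration. `actor_id` is left out because it comes from identity resolution.

```python
from __future__ import annotations

from datetime import datetime, timezone
from typing import Any

# Illustrative only: the real severity rules are not described here.
SEVERITY = {"permission_changed": "high", "group_membership_changed": "medium"}


def make_event_id(deployment: str, product: str, source_id: str) -> str:
    """Natural key <deployment>:<product>:<source_id>; same input, same _id."""
    return f"{deployment}:{product}:{source_id}"


def normalise(deployment: str, product: str, pipeline: str, source_id: str,
              occurred_at: datetime, actor_raw: str, operation: str,
              raw: dict[str, Any]) -> dict[str, Any]:
    if occurred_at.tzinfo is None:
        # A naive timestamp cannot be converted to UTC reliably.
        raise ValueError("naive timestamp: source offset unknown")
    return {
        "_id": make_event_id(deployment, product, source_id),
        "occurred_at": occurred_at.astimezone(timezone.utc),
        "product": product,
        "pipeline": pipeline,
        "actor_raw": actor_raw,  # actor_id is added by identity resolution
        "operation": operation,
        "category": "security" if pipeline == "audit" else "content",
        "severity": SEVERITY.get(operation, "low"),
        "raw": raw,  # original payload kept verbatim
    }
```

Because `_id` is derived only from the source's own identifiers, normalising the same raw event twice yields the same key, which is what lets the runner's upserts stay idempotent.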
Supporting collections:
| Collection | Purpose |
|---|---|
| `sync_state` | Per-connector cursor, last success, last error |
| `identities` | Canonical person ↔ product accounts; encrypted emails |
| `canonical_groups` / `source_groups` / `group_map` | Group resolution |
| `group_membership` | Person ↔ team membership |
| `users` | AtlasLens admin accounts (salted password hashes) |
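The `users` row mentions salted password hashes. A standard-library sketch of that idea uses PBKDF2-HMAC-SHA256 with a random per-user salt; the algorithm, the iteration count and the `algo$iterations$salt$hash` storage format are assumptions, since the scheme AtlasLens actually uses is not stated.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # assumption: tune so one hash takes a noticeable fraction of a second


def hash_password(password: str, *, iterations: int = ITERATIONS) -> str:
    salt = secrets.token_bytes(16)  # fresh random salt for every account
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"pbkdf2_sha256${iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    algo, iterations, salt_hex, digest_hex = stored.split("$")
    if algo != "pbkdf2_sha256":
        return False
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                 bytes.fromhex(salt_hex), int(iterations))
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(digest, bytes.fromhex(digest_hex))
```

Storing the iteration count alongside the salt lets the cost be raised later without invalidating existing hashes.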
## Identity & group resolution
Events arrive keyed by an opaque account id (actor_raw). The normaliser resolves each to a
canonical identity (actor_id), so one person's activity can be followed across all
products. Groups are anchored on Atlassian org groups and mapped to canonical teams, which
powers the Group/team filter and team-level rankings.
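Resolution amounts to two lookups. The sketch below uses in-memory dicts in place of the `identities`, `group_map` and membership collections; the data layout, account ids and function names are hypothetical, and since the handling of unresolved accounts is not described here, the sketch simply skips them.

```python
from __future__ import annotations

from collections import Counter
from typing import Any

# Hypothetical stand-ins for the collections listed in the table above.
IDENTITIES: dict[tuple[str, str], str] = {   # (product, account id) -> person
    ("jira", "acct-1"): "person:0042",
    ("bitbucket", "{b7c2}"): "person:0042",
}
GROUP_MAP: dict[str, str] = {                 # org group -> canonical team
    "org:eng-backend": "team:platform",
    "org:eng-frontend": "team:platform",
}
SOURCE_GROUPS: dict[str, list[str]] = {       # person -> org groups
    "person:0042": ["org:eng-backend"],
}


def resolve_actor(product: str, actor_raw: str) -> str | None:
    """Map an opaque per-product account id to a canonical person, if known."""
    return IDENTITIES.get((product, actor_raw))


def teams_for(person_id: str) -> set[str]:
    """Canonical teams reached through the person's mapped org groups."""
    return {GROUP_MAP[g] for g in SOURCE_GROUPS.get(person_id, []) if g in GROUP_MAP}


def events_by_team(events: list[dict[str, Any]]) -> Counter[str]:
    """Count events per canonical team: the raw material for team rankings."""
    counts: Counter[str] = Counter()
    for event in events:
        person = resolve_actor(event["product"], event["actor_raw"])
        if person is None:
            continue  # unresolved accounts are skipped in this sketch
        for team in teams_for(person):
            counts[team] += 1
    return counts
```

Two different product accounts resolving to `person:0042` is what lets the Jira event and the Bitbucket event land on the same team count.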
## Storage & retention
A single small MongoDB instance handles the expected scale (~200 audited users → a few million
events per year). A TTL index on occurred_at enforces a one-year retention window, which
doubles as data minimisation.
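As a sketch of the retention setup, assuming "one year" means 365 days: the hypothetical helper below builds the index keys and the `expireAfterSeconds` option that PyMongo's `create_index` accepts, plus a pure-Python check mirroring the expiry rule. Two MongoDB behaviours matter here: TTL deletion only applies when `occurred_at` is stored as a BSON date, not as an ISO-8601 string (documents with a non-date value never expire), and a background task removes expired documents periodically (every 60 seconds by default), so deletion lags slightly.

```python
from __future__ import annotations

from datetime import datetime, timedelta

RETENTION = timedelta(days=365)  # assumption: "one year" read as 365 days


def ttl_index_spec(
    retention: timedelta = RETENTION,
) -> tuple[list[tuple[str, int]], dict[str, int]]:
    """Index keys and options for a TTL index on events.occurred_at."""
    keys = [("occurred_at", 1)]  # 1 is ascending, the value of pymongo.ASCENDING
    options = {"expireAfterSeconds": int(retention.total_seconds())}
    return keys, options


def is_expired(occurred_at: datetime, now: datetime,
               retention: timedelta = RETENTION) -> bool:
    """Mirror the TTL rule: eligible for deletion once occurred_at + retention has passed."""
    return occurred_at + retention < now
```

With PyMongo this would be `keys, opts = ttl_index_spec()` followed by `events.create_index(keys, **opts)`.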