Architecture & Security

Built for organizations that can't ship data to OpenAI.

Sovereign deployment. Audit-grade lineage. Faithfulness on every answer.

Citorum runs inside your environment. Your documents stay there. Your inference happens there. Your audit trail lives there. Every answer is scored for faithfulness against its sources before it reaches the user. This page describes the architecture that makes those guarantees real — what runs where, what data moves and what doesn't, and how isolation, identity, audit, and adjudication work end to end.


What this page commits to

The teams who buy Citorum operate under regimes — Sarbanes-Oxley (SOX), Health Insurance Portability and Accountability Act (HIPAA), Federal Risk and Authorization Management Program (FedRAMP), Payment Card Industry Data Security Standard (PCI DSS), the European Union's General Data Protection Regulation (GDPR), state data-protection laws — that don't permit shipping their customers' material non-public information (MNPI) to a third-party Application Programming Interface (API) and hoping for the best. Most generative-AI products were not designed for those constraints. Citorum was.

Four commitments anchor the architecture:

Your data never leaves your perimeter. Documents you ingest are stored, embedded, retrieved, and reasoned over inside the deployment you control. The Citorum-hosted control plane never sees the corpus contents; it sees only configuration, license state, and aggregate telemetry you opt into.
Your inference happens on hardware you control. The Large Language Model (LLM) and the embedding model run inside your deployment. Default configurations make zero outbound calls to third-party model APIs. If you choose to wire an external model later (some customers do for specific use cases), that is an explicit, configurable, audit-logged decision — not a default behavior.
Every answer carries its provenance. Every retrieval, every prompt, every token of the model's response, and every user action is recorded with chain-of-custody metadata. Your auditors can reconstruct exactly what was retrieved, what was sent to the model, what came back, and who asked — for any answer, on any day, within your retention window.
Every answer carries its faithfulness score. Before a response returns to the user, an adjudication pipeline scores it against the sources it was drawn from — combining multiple grounding signals with an independent model judge — and assigns one of three labels: Verified — Cite Source, Review Recommended, or Do Not Rely — Consult Expert. The label rides with the answer. Reviewers, auditors, and downstream workflows see the system's confidence at the moment the answer was given, not weeks later in a post-hoc evaluation.

The rest of this page describes the architecture that delivers those commitments. It is technically precise. If your security team has a deeper question after reading it, the Security & Architecture Whitepaper (linked at the bottom) goes further, and our engineering team will answer the rest on a call.

Where Citorum runs

Citorum supports three deployment topologies. All three keep the corpus and the inference path inside the customer's control boundary.

On-premises

Citorum runs on customer-owned hardware in the customer's data center. Both the data plane (the Postgres cluster with the document corpus and vector indexes, the GPU nodes running inference, the application servers, the audit log store) and the local control plane (Kubernetes orchestration, identity integration, observability) live entirely inside the customer's network. A licensing-and-update channel reaches out to the Citorum control plane on a configurable schedule — that channel carries license state, version-update notifications, and opt-in aggregate telemetry. It does not carry corpus content, query content, response content, or user identifying information.

Customer cloud (single-tenant Virtual Private Cloud)

Citorum runs inside a Virtual Private Cloud (VPC) the customer owns, on hyperscaler infrastructure (Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform (GCP), or Oracle Cloud Infrastructure (OCI)). The deployment is single-tenant — one Citorum installation per customer, with no multi-tenancy across the cloud boundary. Networking, Identity and Access Management (IAM), and storage are all under the customer's account.

Citorum-managed dedicated tenant

For customers without an in-house infrastructure team, Citorum operates the same single-tenant deployment in a Citorum-managed cloud account on the customer's behalf, with documented access controls, just-in-time elevation, and an audit trail for any privileged operation. This is not multi-tenant Software as a Service (SaaS); it is a dedicated deployment with a managed-services wrapper. Customers in this mode retain the option to migrate to one of the other two topologies without data export or schema rewrite.

In all three modes, the architectural shape is identical: same Postgres schema, same inference pipeline, same audit format, same API surface. Topology only changes who operates the runtime; it does not change how the system works.

Figure 1. Three deployment topologies. The customer boundary is the same shape in all three; what changes is who operates inside it.

Where your data goes (and doesn't)

A request through Citorum touches several components. Each one has a defined responsibility and a defined data scope. None of them ships content to anywhere outside your deployment.

Ingestion

Documents enter Citorum through one of the supported connectors — direct upload, Server Message Block (SMB) and Network File System (NFS) mounts, object storage (Amazon Simple Storage Service-compatible endpoints including MinIO, Pure Storage FlashBlade, NetApp StorageGRID, Dell Elastic Cloud Storage), ownCloud and OpenCloud, Nextcloud, or the ingestion API. Documents land in the document store inside your deployment. Parsed text and metadata land in your Postgres database. Embeddings land in your pgvector indexes. Nothing transits to a Citorum-operated service.

Retrieval

When a user asks a question, the retrieval pipeline runs three lookups in parallel: dense vector similarity over your pgvector indexes, full-text search using Postgres's built-in tsvector support, and metadata filtering against tags, dates, document types, and access-control attributes. Results are scored, deduplicated, and rank-fused into a context window. All of this happens inside your deployment, against your indexes, using your access-control policies, so a user never sees a document they don't have permission to see.
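The fusion step can be illustrated with reciprocal rank fusion (RRF), a common way to merge ranked lists. This page does not say which fusion method Citorum uses, so RRF, the constant k = 60, the function name rrf_fuse, and the sample document IDs are all assumptions made for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs with reciprocal rank fusion.

    Each document earns 1 / (k + rank) from every list it appears in.
    Summing per document ID also deduplicates results across lists.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from the three lookups (IDs are made up).
dense = ["doc-7", "doc-2", "doc-9"]    # vector similarity
lexical = ["doc-2", "doc-7", "doc-4"]  # full-text search
filtered = ["doc-2", "doc-9"]          # metadata-filter matches, order assumed
fused = rrf_fuse([dense, lexical, filtered])
print(fused)
```

A document that ranks well in several lookups rises to the top, while one that appears in a single list still survives into the fused list with a lower score.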

Context assembly and inference

The retrieved context, the user's question, the system prompt, and any conversation history are assembled into the prompt sent to the LLM. The LLM runs on Graphics Processing Units (GPUs) inside your deployment. In default configurations, no outbound network call is made; the prompt and the model's response live entirely on hardware you operate. If you have explicitly configured an external model (for a specific use case where that tradeoff makes sense in your context), the call is made over Transport Layer Security (TLS), is logged in full to your audit store, and is subject to your egress firewall policies and whatever data-loss-prevention controls you have in place.

Adjudication

This is where Citorum diverges sharply from a generic Retrieval-Augmented Generation product. Before any answer reaches the user, it passes through the adjudication pipeline — a multi-signal faithfulness check that scores the answer against the sources it was generated from and assigns one of three labels:

Verified — Cite Source. The ensemble score is high; sources support the claims; no sentence in the answer is contradicted by the corpus. Safe to act on, with citations.
Review Recommended. The signals are mixed. The answer is plausible but a human should sanity-check before relying on it. Most regulated workflows route this label to a reviewer queue.
Do Not Rely — Consult Expert. Either the retrieval score is too low (the corpus didn't have what the question asked), or at least one sentence in the answer is contradicted by the sources, or the ensemble is below the floor. The user sees the label clearly; the answer is not silently presented as authoritative.

The ensemble combines five faithfulness signals — checking source relevance, claim grounding, contradiction, citation overlap, and an independent model judge — and weights them into a single score. A contradiction veto is decisive: if any sentence in the answer is actively contradicted by a cited source, the label drops to Do Not Rely regardless of how the other signals scored. Every per-signal score is recorded.

A fast synchronous gate runs in line with the response so the user sees a label without waiting; the full ensemble runs asynchronously and can refine the label over Server-Sent Events. None of this leaves your deployment. Specific weights, thresholds, and the technique behind each signal are documented in the Security & Architecture Whitepaper.
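The labeling logic described above (five weighted signals, a floor, and a decisive contradiction veto) can be sketched as follows. The real weights, thresholds, and signal techniques are documented in the whitepaper, not here, so the equal weights, the three threshold constants, and every name in this block are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    VERIFIED_CITE_SOURCE = "verified_cite_source"
    REVIEW_RECOMMENDED = "review_recommended"
    DO_NOT_RELY_CONSULT_EXPERT = "do_not_rely_consult_expert"


@dataclass(frozen=True)
class Signals:
    # All scores are assumed to lie in [0, 1]; higher means more faithful.
    source_relevance: float
    claim_grounding: float
    contradiction_free: float
    citation_overlap: float
    model_judge: float
    contradicted_sentences: int = 0  # sentences contradicted by a cited source


# Hypothetical equal weights and thresholds, chosen only for illustration.
WEIGHTS = {
    "source_relevance": 0.2,
    "claim_grounding": 0.2,
    "contradiction_free": 0.2,
    "citation_overlap": 0.2,
    "model_judge": 0.2,
}
RETRIEVAL_FLOOR = 0.3
ENSEMBLE_FLOOR = 0.5
VERIFIED_THRESHOLD = 0.8


def adjudicate(s: Signals) -> tuple[Label, float]:
    """Return the label and the weighted ensemble score for one answer."""
    score = sum(w * getattr(s, name) for name, w in WEIGHTS.items())
    # Contradiction veto: decisive regardless of the other signals.
    if s.contradicted_sentences > 0:
        return Label.DO_NOT_RELY_CONSULT_EXPERT, score
    # Retrieval too weak, or ensemble below the floor.
    if s.source_relevance < RETRIEVAL_FLOOR or score < ENSEMBLE_FLOOR:
        return Label.DO_NOT_RELY_CONSULT_EXPERT, score
    if score >= VERIFIED_THRESHOLD:
        return Label.VERIFIED_CITE_SOURCE, score
    return Label.REVIEW_RECOMMENDED, score
```

The score is computed before the veto is checked so that it can still be recorded alongside the label, in line with the statement that every per-signal score is kept.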

Response and audit

The model's response is returned to the user with two things attached: citations to the specific document spans that were retrieved and used, and the confidence label from adjudication. At the same moment, a complete audit record is written to the audit store: who asked, what was retrieved (with document IDs and span offsets), what prompt was constructed, what the model returned, what citations were rendered, which model and which version, how long it took, what the request identifier was — and every per-signal adjudication score, so an auditor can reconstruct not just the answer but the system's confidence in the answer at the moment it was given.
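To make the audit record concrete, here is one way the fields listed above could be represented and serialized. The actual audit format is not specified on this page; the class names, field names, and JSON encoding below are assumptions.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RetrievedSpan:
    document_id: str
    start_offset: int
    end_offset: int


@dataclass(frozen=True)
class AuditRecord:
    request_id: str
    user_id: str                         # who asked
    retrieved: tuple[RetrievedSpan, ...]  # what was retrieved, with offsets
    prompt: str                          # what prompt was constructed
    response: str                        # what the model returned
    citations: tuple[str, ...]           # what citations were rendered
    model_name: str
    model_version: str
    latency_ms: int
    label: str                           # adjudication label at answer time
    signal_scores: dict[str, float]      # every per-signal adjudication score

    def to_json(self) -> str:
        # sort_keys gives a stable encoding, which helps when records are hashed.
        return json.dumps(asdict(self), sort_keys=True)


record = AuditRecord(
    request_id="req-001",
    user_id="alice@example.com",
    retrieved=(RetrievedSpan("doc-2", 120, 480),),
    prompt="(assembled prompt)",
    response="(model response)",
    citations=("doc-2:120-480",),
    model_name="example-model",
    model_version="1.0",
    latency_ms=850,
    label="review_recommended",
    signal_scores={"claim_grounding": 0.6, "model_judge": 0.9},
)
print(record.to_json())
```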

The shape of this flow does not change between topologies. The customer boundary is the trust boundary, regardless of where the boundary sits.

Figure 2. Data flow: sovereignty and adjudication. One user question moves through Citorum's sovereignty boundary, the adjudication cluster, and the faithfulness label. The adjudication pipeline scores every answer's faithfulness before it returns; every signal score is captured in the audit log alongside the response itself.

How isolation works inside your deployment

Citorum's deployment model is single-tenant — one installation per customer, no shared infrastructure with anyone else. The cross-customer leakage threat that dominates multi-tenant Software as a Service (SaaS) products simply does not exist here: there is no other customer's data on your hardware to leak.

The isolation questions a Chief Information Security Officer (CISO) actually needs answered are about what happens inside your deployment — among your own users, departments, and workflows. Four answers anchor that:

Identity-gated retrieval

Every retrieval is gated by the asking user's identity, not by the application layer's good intentions. Citorum integrates with your Identity Provider (IdP) via Security Assertion Markup Language (SAML) 2.0 or OpenID Connect (OIDC), inherits the user's group and role claims, and evaluates document-level access-control policies at retrieval time. A user querying the corpus cannot retrieve a document they would not be allowed to open directly — the same access rules govern both.
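The key property is that retrieval and direct access share one access check. A minimal sketch of that idea follows, assuming a simple group-based policy; the types, the can_open predicate, and the group names are hypothetical and stand in for whatever policy model a deployment actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    subject: str
    groups: frozenset[str]  # taken from the IdP's SAML or OIDC group claims


@dataclass(frozen=True)
class Document:
    doc_id: str
    allowed_groups: frozenset[str]


def can_open(user: User, doc: Document) -> bool:
    """Single access predicate used for both direct opens and retrieval."""
    return bool(user.groups & doc.allowed_groups)


def gated_retrieve(user: User, candidates: list[Document]) -> list[Document]:
    """Drop every candidate the asking user could not open directly."""
    return [doc for doc in candidates if can_open(user, doc)]


analyst = User("alice", frozenset({"finance"}))
corpus = [
    Document("q3-forecast", frozenset({"finance", "exec"})),
    Document("hr-review", frozenset({"hr"})),
]
visible = gated_retrieve(analyst, corpus)
print([doc.doc_id for doc in visible])
```

Because gated_retrieve calls the same can_open predicate that guards direct access, the two paths cannot drift apart.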

Ephemeral request state

Each inference request is processed by a worker that holds, for the duration of that single request, the prompt, the key-value (KV) cache, intermediate scratch buffers, and the streaming response. The moment the request completes, that state is released — not persisted to disk, not held in memory across requests, not visible to the next user's query. Model weights are read-only and shared across requests; everything variable is request-scoped.

Audit captures every answer

Every retrieval, every prompt, every response, every citation, every per-signal adjudication score, and every user action is written to your audit store with a chain of custody linking the answer back to the asking user, the source documents, and the model version. Auditors reconstruct not just what was said but who asked, what was retrieved, and how confident the system was when it said it.

Model lifecycle under your control

Models are loaded from a model store inside your deployment. New model versions are introduced through a controlled rollout that your engineering team approves — never pushed over-the-air by Citorum. The audit log records which model version answered which request, so an answer given in March remains attributable to the model that produced it even if the model has since been replaced.

Adjudication & faithfulness

Every answer is scored against its sources before it returns to the user. An ensemble of grounding signals plus an independent model judge — with a contradiction veto that overrides apparent agreement when any source actively contradicts a claim — assigns one of three labels: Verified — Cite Source, Review Recommended, or Do Not Rely — Consult Expert. Every per-signal score is recorded in the audit log.

Identity & access

Identity stays in your Identity Provider (IdP). Citorum integrates with Security Assertion Markup Language (SAML) 2.0, OpenID Connect (OIDC), and System for Cross-domain Identity Management (SCIM) provisioning. Role-based access control (RBAC) gates collection access, document access, and administrative operations. There is no Citorum-side user database for production deployments.

Encryption

Transport Layer Security (TLS) 1.3 for every network hop, including between internal services. Disk encryption at rest using AES-256 (Advanced Encryption Standard with 256-bit keys), with keys held in your Key Management Service (KMS): AWS Key Management Service, Azure Key Vault, GCP Cloud KMS, OCI Vault, or HashiCorp Vault. Application-layer encryption available for sensitive document fields where defense-in-depth requires it.

Audit & lineage

Every retrieval, prompt, response, citation, user action, and adjudication signal score is logged with a tamper-evident chain. Default retention is seven years, configurable to match your regulatory requirement. Logs export to your Security Information and Event Management (SIEM) system: Splunk, Microsoft Sentinel, Sumo Logic, Elastic Security, or any system that accepts syslog.
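A tamper-evident chain is commonly built by having each log entry include a hash of the previous entry, so that altering any earlier record invalidates everything after it. The sketch below shows that general technique with SHA-256; it is not a description of Citorum's actual log format, and the entry layout and function names are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with a canonical payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log: list[dict], payload: dict) -> None:
    """Append a payload, linking it to the hash of the last entry."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev, "payload": payload, "hash": entry_hash(prev, payload)})


def verify(log: list[dict]) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


log: list[dict] = []
append(log, {"event": "retrieval", "user": "alice", "doc": "doc-2"})
append(log, {"event": "response", "user": "alice", "label": "review_recommended"})
print(verify(log))
```

A production design would typically also anchor the latest hash somewhere outside the log store, since an attacker who can rewrite the whole chain can recompute every hash.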

Deployment isolation

Each Citorum deployment is single-tenant by construction. Your Postgres, your indexes, your audit log, and your inference workers are yours alone — no other customer shares them. Inside your deployment, role-based access control (RBAC) and document-level access policies enforce isolation between your users, departments, and workflows.

Where we are with certifications

Honest answer: Citorum is at an early stage as a company. We are not yet certified under System and Organization Controls (SOC) 2 Type II, ISO/IEC 27001, FedRAMP, or HIPAA. We say so plainly because the alternative, implying credentials we don't yet hold, would not meet the bar of the people we sell to.

What is in place today:

The architecture described above. Sovereignty, isolation, audit, and identity are not features we plan to add — they are the system as built.
Cyber liability insurance.
A signed Security & Architecture Whitepaper covering threat model, data flow, key management, incident response, and the controls inventory.
Self-attested responses to the Cloud Security Alliance's Consensus Assessments Initiative Questionnaire Lite (CAIQ Lite) and the Shared Assessments Standardized Information Gathering Lite (SIG Lite), available under non-disclosure agreement (NDA).
An on-demand penetration test arranged through a recognized firm, available to enterprise customers under NDA.

What is in flight:

SOC 2 Type II readiness work begins immediately upon seed close (target: Q4 2026). Audit window opens once the first three paying customers are live. Type II report expected mid-2027.
HIPAA Business Associate Agreement (BAA) capability prioritized for the first healthcare customer engagement.
Region-specific data residency commitments for European Union (EU) customers under GDPR are documented per deployment.

The whitepaper covers the full controls inventory, threat model, and incident-response plan. Most security teams find that, paired with this page, sufficient to move into a paid pilot before the Type II report arrives.

No third-party LLM API calls in default config

Default deployments never call OpenAI, Anthropic, Google, or any other external model API. The model runs on your hardware. If you choose to wire an external model later, that is an explicit, configurable, audit-logged decision.

No model training on your data

Citorum does not train models on customer documents, customer queries, or customer responses. The models you run are the models you chose; their weights do not change because of what your users do with them.

No data egress to Citorum

The Citorum control plane does not receive your corpus, your queries, your responses, or your users' identities. The licensing channel carries license state and (only with your explicit opt-in) aggregate operational telemetry — query counts, latency percentiles, error rates — never content.
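One way to enforce a content-free licensing channel is an allow-list applied at the point where the outbound payload is built. The sketch below assumes that approach; the field names, the build_license_payload function, and the payload shape are hypothetical and are not taken from Citorum's implementation.

```python
# Hypothetical allow-list of aggregate metrics; nothing else may leave.
ALLOWED_TELEMETRY_FIELDS = frozenset(
    {"query_count", "latency_p50_ms", "latency_p95_ms", "error_rate"}
)


def build_license_payload(
    license_state: str, metrics: dict[str, float], telemetry_opt_in: bool
) -> dict:
    """Build the outbound payload: license state, plus telemetry only on opt-in."""
    payload: dict = {"license_state": license_state}
    if telemetry_opt_in:
        # Keep only allow-listed aggregate fields; drop anything unexpected.
        payload["telemetry"] = {
            name: value
            for name, value in metrics.items()
            if name in ALLOWED_TELEMETRY_FIELDS
        }
    return payload


metrics = {"query_count": 1204, "latency_p95_ms": 910.0, "last_query_text": 0.0}
print(build_license_payload("active", metrics, telemetry_opt_in=False))
print(build_license_payload("active", metrics, telemetry_opt_in=True))
```

An allow-list fails closed: a newly added metric stays inside the deployment until someone deliberately adds it to the list.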

No over-the-air model swaps

New model versions arrive through a controlled rollout that your engineering team approves. We do not push model changes to your deployment. Your audit log records which model version answered which request.

0
Outbound LLM API calls in default config
Inference runs on your hardware; external model calls are explicit-only

5
Faithfulness signals scored per answer
Source relevance, claim grounding, contradiction, citation overlap, and independent-judge checks combined per answer

100%
Query and signal lineage retention
Every retrieval, prompt, response, and per-signal adjudication score is logged

7+ years
Default audit retention
Configurable up to your regulatory requirement

3
Supported deployment topologies
On-premises, customer Virtual Private Cloud, Citorum-managed dedicated tenant

Talk to your security team. We've already drafted the answers.

The Security & Architecture Whitepaper covers the threat model, controls inventory, key management, and incident response. Our engineering team takes the rest.

Download the whitepaper Talk to us