Architecture
This document provides a high-level overview of MFP’s internal design. It’s intended for contributors and users who want to understand how the protocol works under the hood.
Design Philosophy
Library-first, runtime-mediated, symmetric by default.
MFP is not a client/server system. It’s a protocol for peer-to-peer communication where every agent is treated equally. The runtime acts as a neutral mediator, enforcing security constraints without imposing workflow hierarchy.
Key principles:
- No trust in LLMs — Cryptographic validation happens before message delivery, not after
- Symmetric peers — All agents use the same API and have identical capabilities
- Deterministic enforcement — Security constraints are mathematical, not policy-based
- Library-first — The runtime embeds in any Python process; the standalone server is a thin wrapper
Layered Architecture
MFP is organized into six layers, from lowest to highest abstraction:
    ┌──────────────────────────────────────────────────────────┐
    │                       Application                        │  ← User code
    │            (Agent callables, business logic)             │
    └──────────────────────────────────────────────────────────┘
                                 │
                                 ▼
    ┌──────────────────────────────────────────────────────────┐
    │                     Agent Lifecycle                      │  ← bind/unbind, tools
    │          (identity.py, lifecycle.py, tools.py)           │
    └──────────────────────────────────────────────────────────┘
                                 │
                                 ▼
    ┌──────────────────────────────────────────────────────────┐
    │                         Runtime                          │  ← Pipeline, quarantine
    │  (runtime.py, pipeline.py, channels.py, quarantine.py)   │
    └──────────────────────────────────────────────────────────┘
                                 │
                   ┌─────────────┼─────────────┐
                   ▼             ▼             ▼
         ┌──────────────┐ ┌─────────────┐ ┌──────────────┐
         │ Core         │ │ Storage     │ │ Observability│  ← Production
         │ (crypto,     │ │ (SQLite,    │ │ (logging,    │    hardening
         │  frames,     │ │  schema,    │ │  metrics,    │
         │  ratchet,    │ │  crypto)    │ │  health,     │
         │  types)      │ │             │ │  circuit     │
         │              │ │             │ │  breakers)   │
         └──────────────┘ └─────────────┘ └──────────────┘
                   │             │             │
                   └─────────────┴─────────────┘
                                 ▼
                  ┌─────────────────────────────┐
                  │ Federation                  │  ← Cross-runtime
                  │ (bilateral, transport,      │    communication
                  │  recovery, wire protocol)   │
                  └─────────────────────────────┘
Layer 1: Core
Modules: mfp/core/
The foundation layer providing cryptographic primitives and frame construction.
primitives.py
Low-level cryptographic operations:
- ChaCha20-Poly1305 — AEAD encryption/decryption
- HMAC-SHA256 — Message authentication
- X25519 — Diffie-Hellman key exchange
- SHA-256 — Hashing and key derivation
All primitives use the cryptography library.
frame.py
Frame construction and validation:
- build_frame() — Constructs symmetric mirror frame (2k blocks)
- validate_frame() — Verifies frame integrity and temporal validity
- strip_frame() — Removes validated frame, returns payload
Frames are deterministic functions of (step, local_state, global_state).
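To make "deterministic function of (step, local_state, global_state)" concrete, here is a minimal sketch. It assumes the 2k blocks are k opening blocks followed by the same k blocks in reverse order, and that blocks are derived with HMAC-SHA256. The block size, the derivation, and the names `_blocks`, `build_frame_sketch`, and `validate_frame_sketch` are illustrative assumptions, not MFP's actual frame format.

```python
import hashlib
import hmac
import struct

BLOCK_SIZE = 16  # assumed block size in bytes, for illustration only


def _blocks(step: int, local_state: bytes, global_state: bytes, k: int) -> list:
    """Derive k pseudo-random blocks from the inputs (hypothetical derivation)."""
    seed = hmac.new(
        local_state, struct.pack(">Q", step) + global_state, hashlib.sha256
    ).digest()
    return [
        hmac.new(seed, struct.pack(">I", i), hashlib.sha256).digest()[:BLOCK_SIZE]
        for i in range(k)
    ]


def build_frame_sketch(step, local_state, global_state, k=4):
    """Return (open, close): k opening blocks and the same blocks mirrored."""
    blocks = _blocks(step, local_state, global_state, k)
    return b"".join(blocks), b"".join(reversed(blocks))


def validate_frame_sketch(open_part, close_part, step, local_state, global_state, k=4):
    """Recompute the expected frame and compare in constant time."""
    expected_open, expected_close = build_frame_sketch(step, local_state, global_state, k)
    return hmac.compare_digest(open_part, expected_open) and hmac.compare_digest(
        close_part, expected_close
    )
```

Because the validator recomputes the frame from its own state, a frame built for a different step or state fails the comparison regardless of what the payload says.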
ratchet.py
State evolution:
- advance_ratchet() — One-way state chain: S_n = f(S_{n-1}, frame_n)
- derive_frame_seed() — Generates cryptographic seed from ratchet state
Uses HMAC-based ratcheting for forward secrecy.
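One plausible reading of S_n = f(S_{n-1}, frame_n) is to take f as HMAC-SHA256 keyed by the previous state. The sketch below assumes exactly that; the function names and the `b"frame-seed"` label are hypothetical and may differ from what ratchet.py does.

```python
import hashlib
import hmac


def advance_ratchet_sketch(state: bytes, frame: bytes) -> bytes:
    """S_n = HMAC-SHA256(key=S_{n-1}, msg=frame_n); the output is always 32 bytes."""
    return hmac.new(state, frame, hashlib.sha256).digest()


def derive_frame_seed_sketch(state: bytes) -> bytes:
    """Derive a seed from the current state under a fixed label (label is an assumption)."""
    return hmac.new(state, b"frame-seed", hashlib.sha256).digest()


s0 = bytes(32)  # placeholder initial state
s1 = advance_ratchet_sketch(s0, b"frame-1")
s2 = advance_ratchet_sketch(s1, b"frame-2")
```

Since HMAC is one-way, holding `s2` does not let anyone recompute `s1` or `s0`, which is the property the forward-secrecy claim relies on.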
encoding.py
Payload transformation:
- encode_payload() — Encrypts plaintext with ChaCha20-Poly1305
- decode_payload() — Decrypts ciphertext, verifies authentication tag
types.py
Core type definitions:
- AgentId, ChannelId, MessageId (32-byte identifiers)
- AgentStatus (BOUND, ACTIVE, QUARANTINED, UNBOUND)
- StateValue (ratchet state, 32 bytes)
- Error types and enums
Layer 2: Storage
Modules: mfp/storage/
Persistent state management via SQLite.
engine.py
Storage interface:
- create_agent() / delete_agent() — Agent lifecycle
- create_channel() / get_channel() — Channel records
- enqueue_message() / dequeue_message() — Message queuing
- update_ratchet() — Ratchet state persistence
All operations are transactional.
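The point of transactional storage is that a queued message and the ratchet advance it caused are persisted together or not at all. A minimal sketch with the standard `sqlite3` module follows; the two-table schema and the helper `enqueue_and_advance` are simplified assumptions, not the schema MFP ships.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, channel_id BLOB, body BLOB)")
conn.execute("CREATE TABLE ratchets (channel_id BLOB PRIMARY KEY, state BLOB NOT NULL)")
conn.execute("INSERT INTO ratchets VALUES (?, ?)", (b"ch1", bytes(32)))
conn.commit()


def enqueue_and_advance(conn, channel_id, body, new_state):
    """Queue a message and persist the new ratchet state atomically."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "INSERT INTO messages (channel_id, body) VALUES (?, ?)", (channel_id, body)
        )
        conn.execute(
            "UPDATE ratchets SET state = ? WHERE channel_id = ?", (new_state, channel_id)
        )
```

If the ratchet update fails (for example, a NULL state violating the NOT NULL constraint), the message insert is rolled back as well, so the queue and the ratchet cannot drift apart.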
schema.py
SQLite schema definitions:
- agents — Agent identity, status, quarantine flags
- channels — Channel metadata, peer mappings, ratchet state
- messages — Pending message queue (FIFO)
- ratchets — Local and global ratchet states
crypto.py
At-rest encryption (optional):
- encrypt_value() / decrypt_value() — Database field encryption
- Uses ChaCha20-Poly1305 with a master key from storage.master_key_file
Layer 3: Runtime
Modules: mfp/runtime/
Execution orchestration and security enforcement.
runtime.py
Central coordinator:
- Runtime.bind() — Register agent, assign AgentId
- Runtime.unbind() — Deregister agent, close channels
- Runtime.deliver() — Route message to agent callable
- Runtime.shutdown() — Graceful teardown
Holds the StorageEngine and global ratchet state.
pipeline.py
Agent callable wrapper:
- Validates agent state before message delivery
- Invokes agent callable with (channel_id, plaintext_message)
- Captures return value for hooks/logging
- Enforces retry logic and error handling
channels.py
Channel management:
- establish_channel() — Create new channel, initialize ratchet
- close_channel() — Tear down channel, flush pending messages
- derive_channel_id() — Deterministic ID from (agent_a, agent_b, nonce)
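A deterministic ID over (agent_a, agent_b, nonce) could be computed as below. This sketch assumes SHA-256, length-prefixed inputs, and sorting the two agent IDs so both peers derive the same value; whether derive_channel_id() is order-independent is not stated here, so treat that choice as an assumption.

```python
import hashlib


def derive_channel_id_sketch(agent_a: bytes, agent_b: bytes, nonce: bytes) -> bytes:
    """Hash the sorted agent IDs plus the nonce into a 32-byte channel ID."""
    first, second = sorted((agent_a, agent_b))  # order-independent: (a, b) == (b, a)
    h = hashlib.sha256()
    for part in (first, second, nonce):
        # A length prefix keeps different splits of the same bytes from colliding.
        h.update(len(part).to_bytes(4, "big"))
        h.update(part)
    return h.digest()
```

Sorting fits the symmetric-peer principle: neither side is the "initiator" as far as the identifier is concerned.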
quarantine.py
Security isolation:
- check_rate_limit() — Enforce max_message_rate
- check_payload_size() — Enforce max_payload_size
- trigger_quarantine() — Transition agent to QUARANTINED state
Quarantined agents cannot send/receive messages until administratively reset.
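The rate-limit-then-quarantine behaviour can be sketched as a sliding-window counter with a sticky flag. The class below is a hypothetical illustration: `RateLimiterSketch`, its parameters, and the window semantics are assumptions, and the real check_rate_limit() may count differently.

```python
import time
from collections import deque


class RateLimiterSketch:
    """Sliding-window counter: at most max_messages per window_seconds."""

    def __init__(self, max_messages: int, window_seconds: float = 1.0):
        self.max_messages = max_messages
        self.window_seconds = window_seconds
        self._timestamps = deque()
        self.quarantined = False

    def check(self, now=None) -> bool:
        """Record one message; return False and quarantine once the limit is exceeded."""
        if self.quarantined:
            return False
        now = time.monotonic() if now is None else now
        # Drop timestamps that have left the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_messages:
            self.quarantined = True  # stays set until an administrative reset
            return False
        self._timestamps.append(now)
        return True

    def reset(self) -> None:
        """Administrative reset: clear the flag and the recorded history."""
        self.quarantined = False
        self._timestamps.clear()
```

Note that the flag does not clear itself when traffic slows down; only `reset()` lifts it, matching the "administratively reset" wording above.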
Layer 4: Agent Lifecycle
Modules: mfp/agent/
User-facing API for agent operations.
lifecycle.py
Agent binding and handle management:
- bind() — Register agent callable with runtime, return AgentHandle
- unbind() — Deregister agent
- AgentHandle.establish_channel() — Create channel to peer
identity.py
Agent identity derivation:
- derive_agent_id() — Deterministic ID from (runtime_id, agent_callable)
- Uses HMAC-SHA256 for stable, unique identifiers
tools.py
Protocol tools (agent-facing API):
- mfp_send() — Send encrypted message on channel
- mfp_channels() — List all channels for agent
- mfp_status() — Query agent status (state, channel count, pending messages)
These are the primary functions LLM agents invoke.
Layer 5: Federation
Modules: mfp/federation/
Cross-runtime communication and recovery.
bilateral.py
Bilateral channel establishment:
- bootstrap_deterministic() — Derive shared key from runtime IDs
- bootstrap_ceremonial() — X25519 Diffie-Hellman key exchange
- BilateralChannel — Cross-runtime channel abstraction
transport.py
TCP message transport:
- TransportServer — Listen for incoming envelopes
- TransportClient — Send envelopes to remote runtime
- Asynchronous I/O with configurable timeouts
wire.py
Wire protocol:
- build_envelope_header() — Frame + routing metadata
- validate_envelope() — Verify integrity, decrypt payload
- Uses the same frame/ratchet mechanism as local channels
recovery.py
State synchronization:
- detect_divergence() — Identify ratchet state mismatch
- negotiate_recovery() — Exchange state proofs, find common ancestor
- resync() — Fast-forward ratchet to converge
Prevents federation from stalling due to dropped messages.
Message Flow
Local Send (Alice → Bob, same runtime)
1. Application calls mfp_send(alice_handle, channel_id, plaintext)
2. Agent tools validate handle state and look up the channel
3. Runtime pipeline retrieves ratchet state from storage
4. Core frame builds mirror frame from (step, local_state, global_state)
5. Core encoding encrypts plaintext with channel key
6. Core frame wraps payload in frame: [open | ciphertext | close]
7. Storage persists message to Bob’s queue, advances ratchet
8. Runtime delivers to Bob’s callable: bob(channel_id, plaintext)
9. Return value captured for logging/hooks
Federated Send (Alice → Charlie, remote runtime)
1. Application calls mfp_send(alice_handle, bilateral_channel_id, plaintext)
2. Agent tools identify channel as bilateral (cross-runtime)
3. Federation bilateral retrieves shared federation key
4. Core frame builds frame using bilateral ratchet state
5. Core encoding encrypts plaintext
6. Federation wire constructs envelope with routing header
7. Federation transport sends envelope over TCP to remote runtime
8. Remote runtime receives envelope, validates frame
9. Remote runtime delivers to Charlie’s callable
10. Recovery monitors for sequence gaps, triggers resync if needed
Security Model
Threat Mitigation
Frame validation prevents:
- Prompt injection (invalid frames rejected before payload delivery)
- Replay attacks (temporal ratchet invalidates old frames)
- Forgery (frames cryptographically bound to ratchet state)
Encryption provides:
- Confidentiality (ChaCha20-Poly1305 AEAD)
- Authenticity (HMAC verification)
- Forward secrecy (ratchet never reverses)
Quarantine protects against:
- Malicious agents (rate/size limits)
- Corrupted state (validation failure thresholds)
Non-Goals
MFP does not protect against:
- Compromised runtime (trusted computing base)
- Side-channel attacks (timing, memory access patterns)
- Physical access to storage (at-rest encryption is optional)
See Security Model for full threat analysis.
Design Decisions
Why SQLite?
- Transactional: ACID guarantees for ratchet state
- Embedded: No external database server
- Simple: Single file, easy backup/restore
- WAL mode: Concurrent reads during writes
Why Symmetric Frames?
- Structural validation: LLMs can’t bypass mathematical constraints
- Deterministic: No policy decisions, no edge cases
- Efficient: O(k) validation, constant-size state
Why Ratchet Instead of Logs?
- Bounded state: Fixed 32 bytes per channel, regardless of history
- Forward secrecy: Old states can’t be recovered
- Federation-friendly: Only current state needs synchronization
Why Library-First?
- Composable: Embed in any application
- Testable: Pure functions, no global state
- Portable: No daemon dependencies
Testing Strategy
604 tests across 6 test modules:
- Unit tests — Core primitives, frame logic, ratchet evolution
- Integration tests — Runtime + storage + agents
- E2E tests — Full message flows, quarantine triggers, federation
- Property tests — Frame symmetry, ratchet monotonicity
Coverage: ~95% (excluding server.py boilerplate).
Performance Characteristics
Benchmarks (single runtime, 2 agents, 10k messages):
- Frame construction: ~50 µs/message
- Encryption/decryption: ~20 µs/message
- SQLite write: ~100 µs/message
- End-to-end latency: ~200 µs/message
Federation overhead:
- TCP transport: +5-50ms (network latency)
- Recovery protocol: ~100ms (worst case, 3 round-trips)
Memory:
- Runtime: ~5 MB baseline
- Per-agent overhead: ~1 KB (ratchet state + metadata)
- Per-message queue: ~1 KB/message
Extension Points
Custom Agent Types
Implement the agent callable signature:
    def agent(channel_id: bytes, message: bytes) -> dict:
        # Your logic here
        return {"status": "ok"}
Lifecycle Hooks
Enable hooks in RuntimeConfig(enable_hooks=True):
- before_send — Pre-validation hook
- after_send — Post-delivery hook
- on_quarantine — Isolation event
Storage Backends
Subclass StorageEngine to use alternative backends (Postgres, DynamoDB, etc.).
Transport Protocols
Subclass TransportServer for non-TCP transports (WebSocket, QUIC, etc.).
See Also
- API Reference — public interface
- Server Guide — standalone server setup
- Security Model — threat analysis
- Contributing — development guidelines
- Protocol design specs in design/
Layer 6: Observability & Performance (v0.2.0+)
Modules: mfp/observability/, mfp/core/merkle.py, mfp/federation/rotation.py, mfp/runtime/deduplication.py
Production hardening features for monitoring, resilience, and performance.
Observability (mfp/observability/)
logging.py — Structured logging:
- LogContext with correlation IDs
- JSON/text format support
- Audit events for security operations
health.py — Health check endpoints:
- /health/live — Liveness probe (K8s)
- /health/ready — Readiness probe (K8s)
- /health/startup — Startup probe (K8s)
metrics.py — Prometheus metrics:
- Counters: messages, failures, quarantine events
- Gauges: active channels, agents, queue depth
- Histograms: pipeline duration, Sg computation time
circuit_breaker.py — Resilience pattern:
- 3-state FSM (CLOSED/OPEN/HALF_OPEN)
- Prevents cascading failures
- Used for storage and bilateral channels
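For readers unfamiliar with the pattern, a 3-state breaker can be sketched in a few lines. The thresholds, the `clock` parameter, and the class name are illustrative assumptions and not the interface of circuit_breaker.py.

```python
import time


class CircuitBreakerSketch:
    CLOSED, OPEN, HALF_OPEN = "CLOSED", "OPEN", "HALF_OPEN"

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self.state = self.CLOSED
        self._failures = 0
        self._opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state == self.OPEN:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: call rejected")
            self.state = self.HALF_OPEN  # timeout elapsed: allow one trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.state == self.HALF_OPEN or self._failures >= self.failure_threshold:
                self.state = self.OPEN
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self.state = self.CLOSED
        self._failures = 0
        return result
```

While the breaker is OPEN, callers fail fast instead of queuing behind a storage engine or remote runtime that is already struggling, which is how the pattern limits cascading failures.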
timeout.py — Time limits:
- Thread-based timeout enforcement
- Agent, pipeline, storage timeouts
- Automatic quarantine on timeout
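Thread-based enforcement usually means running the callable in a worker thread and bounding how long the caller waits. The sketch below uses `concurrent.futures` from the standard library; `call_with_timeout` and its return shape are hypothetical, and the quarantine step is left to the caller.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

_executor = ThreadPoolExecutor(max_workers=4)


def call_with_timeout(func, timeout_seconds, *args):
    """Run func in a worker thread; return (ok, result)."""
    future = _executor.submit(func, *args)
    try:
        return True, future.result(timeout=timeout_seconds)
    except FutureTimeout:
        # Python threads cannot be killed; the worker keeps running in the
        # background, so the caller should treat the agent as unresponsive.
        return False, None


ok, value = call_with_timeout(lambda: "pong", 1.0)
```

A caller that receives `ok == False` would then move the agent to the quarantined state. The timeout bounds how long the pipeline waits, not how long the agent's code actually runs.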
Performance Optimizations
merkle.py — Incremental Sg (⚠️ Breaking):
- Merkle tree for O(log N) Sg updates
- 10x faster than v0.1.x for 1000+ channels
- <1ms Sg computation
frame.py (enhanced) — Frame caching:
- LRU cache for deterministic frames
- 16-17x speedup for cache hits
- 90% hit rate in recovery scenarios
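Caching works here because a frame depends only on its inputs, so identical inputs can reuse a previous result. A minimal sketch with `functools.lru_cache` follows; `cached_frame` is a stand-in for the real frame computation, and the cache size is an arbitrary assumption.

```python
import hashlib
import hmac
from functools import lru_cache


@lru_cache(maxsize=1024)
def cached_frame(step: int, local_state: bytes, global_state: bytes) -> bytes:
    """Stand-in for an expensive deterministic frame computation."""
    return hmac.new(
        local_state, step.to_bytes(8, "big") + global_state, hashlib.sha256
    ).digest()


first = cached_frame(1, b"k" * 32, b"g" * 32)   # computed
second = cached_frame(1, b"k" * 32, b"g" * 32)  # served from the cache
```

One design consideration: the cache keys contain ratchet state, so a long-lived cache keeps old state in memory. Any real implementation has to weigh that against the forward-secrecy goal.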
transport.py (enhanced) — Connection pooling:
- TCP connection reuse
- Idle eviction (5 min), lifetime limits (1 hour)
- Background cleanup task
Security Hardening
rotation.py — Key rotation:
- X25519 DH-based rekey protocol
- Triggers: message count (1M), time (24h), manual
- Forward secrecy with ephemeral keypairs
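The three triggers listed above reduce to a simple predicate over per-key bookkeeping. In this sketch the `KeyEpoch` dataclass and `should_rotate` are hypothetical names, and treating the thresholds as inclusive is an assumption.

```python
from dataclasses import dataclass

MAX_MESSAGES = 1_000_000        # message-count trigger (1M)
MAX_AGE_SECONDS = 24 * 60 * 60  # time trigger (24h)


@dataclass
class KeyEpoch:
    started_at: float          # when the current key came into use
    messages_sent: int = 0     # messages protected by the current key
    manual_request: bool = False


def should_rotate(epoch: KeyEpoch, now: float) -> bool:
    """True if any trigger fires: manual request, message count, or key age."""
    return (
        epoch.manual_request
        or epoch.messages_sent >= MAX_MESSAGES
        or now - epoch.started_at >= MAX_AGE_SECONDS
    )
```

When the predicate fires, the rekey itself would run the X25519 exchange with fresh ephemeral keypairs and start a new epoch with the counters reset.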
deduplication.py — Replay protection:
- Per-channel message ID tracking
- O(1) duplicate detection
- Sliding window + TTL eviction
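A sliding window with TTL eviction can be built on an insertion-ordered dictionary, which gives O(1) average-case lookups and cheap removal of the oldest entry. The class below is a sketch under assumed limits; `DedupWindowSketch` and its parameters are not the interface of deduplication.py, and one instance would be kept per channel.

```python
import time
from collections import OrderedDict


class DedupWindowSketch:
    def __init__(self, max_entries=10_000, ttl_seconds=300.0, clock=time.monotonic):
        self.max_entries = max_entries
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._seen = OrderedDict()  # message_id -> first-seen time, oldest first

    def is_duplicate(self, message_id: bytes) -> bool:
        now = self._clock()
        # Evict expired entries from the old end of the window.
        while self._seen:
            oldest_id, seen_at = next(iter(self._seen.items()))
            if now - seen_at < self.ttl_seconds:
                break
            del self._seen[oldest_id]
        if message_id in self._seen:  # O(1) average-case lookup
            return True
        self._seen[message_id] = now
        if len(self._seen) > self.max_entries:
            self._seen.popitem(last=False)  # drop the oldest entry
        return False
```

The window only needs to cover the period in which a replayed frame could still validate; older replays are already rejected by the ratchet.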
Production Features Summary
| Feature | Module | Benefit |
|---|---|---|
| Structured Logging | observability/logging.py | Request tracing, audit trail |
| Health Checks | observability/health.py | K8s integration, uptime monitoring |
| Metrics | observability/metrics.py | Performance monitoring, alerting |
| Circuit Breakers | observability/circuit_breaker.py | Resilience, failure isolation |
| Timeouts | observability/timeout.py | Prevents blocking, auto-quarantine |
| Merkle Tree | core/merkle.py | 10x faster Sg (O(log N)) |
| Frame Caching | core/frame.py | 16x speedup for recovery |
| Connection Pool | federation/transport.py | Reduced handshake overhead |
| Key Rotation | federation/rotation.py | Forward secrecy, key fatigue mitigation |
| Deduplication | runtime/deduplication.py | Replay attack prevention |
Test Coverage: 813 tests (591 unit, 191 integration, 27 E2E, 4 benchmark)
See production-guide.md for deployment details.