
mada-modelkit

mada-modelkit is a composable AI client library that abstracts over cloud, local server, and native AI backends through a single async interface, layered with optional middleware.

  • Zero core dependencies — stdlib only for types, errors, base, and middleware
  • 9 providers across 3 categories (cloud, local server, native)
  • 5 middleware layers — retry, circuit breaker, caching, tracking, fallback
  • Type-safe — full annotations, strict mypy, PEP 561 compliant
Install from PyPI:
pip install mada-modelkit

With provider extras:

pip install "mada-modelkit[openai]"     # OpenAI (via httpx)
pip install "mada-modelkit[anthropic]"  # Anthropic (via httpx)
pip install "mada-modelkit[cloud]"      # All cloud providers
pip install "mada-modelkit[ollama]"     # Ollama local server
pip install "mada-modelkit[llamacpp]"   # llama-cpp-python native
pip install "mada-modelkit[all]"        # Everything
Application Code
        │
        ▼
┌────────────────────────────────┐
│        Middleware Stack        │
│  Tracking → Cache → Circuit    │
│  Breaker → Retry → Provider    │
└────────────────────────────────┘
        │
        ▼
Providers
├── Cloud:  OpenAI, Anthropic, Gemini, DeepSeek
├── Local:  Ollama, vLLM, LocalAI
└── Native: llama-cpp-python, Transformers

All providers implement BaseAgentClient — the single async interface:

from abc import ABC
from collections.abc import AsyncIterator

from mada_modelkit import AgentRequest, AgentResponse, StreamChunk

class BaseAgentClient(ABC):
    async def send_request(self, request: AgentRequest) -> AgentResponse: ...
    async def send_request_stream(self, request: AgentRequest) -> AsyncIterator[StreamChunk]: ...
    async def health_check(self) -> bool: ...
    async def close(self) -> None: ...
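To show the shape of the interface in action, here is a minimal, self-contained sketch: it defines stand-in `AgentRequest`/`AgentResponse` dataclasses (their fields here, `prompt` and `text`, are assumptions, not the library's real schema) and a hypothetical `EchoClient` provider that implements the abstract interface.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass

# Stand-ins for the real mada_modelkit types; field names are assumptions.
@dataclass
class AgentRequest:
    prompt: str

@dataclass
class AgentResponse:
    text: str

class BaseAgentClient(ABC):
    @abstractmethod
    async def send_request(self, request: AgentRequest) -> AgentResponse: ...

    async def health_check(self) -> bool:
        return True

    async def close(self) -> None:
        pass

# Hypothetical provider: echoes the prompt back, standing in for a real backend.
class EchoClient(BaseAgentClient):
    async def send_request(self, request: AgentRequest) -> AgentResponse:
        return AgentResponse(text=f"echo: {request.prompt}")

async def main() -> str:
    client = EchoClient()
    try:
        response = await client.send_request(AgentRequest(prompt="hello"))
        return response.text
    finally:
        await client.close()

result = asyncio.run(main())
print(result)
```

Because every provider satisfies the same four methods, application code written against `BaseAgentClient` can swap backends without changes.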
| Category     | Provider           | Backend                               | Auth           |
| ------------ | ------------------ | ------------------------------------- | -------------- |
| Cloud        | OpenAIClient       | OpenAI API                            | Bearer token   |
|              | AnthropicClient    | Anthropic Messages API                | x-api-key      |
|              | GeminiClient       | Google Gemini API                     | x-goog-api-key |
|              | DeepSeekClient     | DeepSeek API                          | Bearer token   |
| Local Server | OllamaClient       | Ollama (localhost:11434)              | None           |
|              | VllmClient         | vLLM (localhost:8000)                 | None           |
|              | LocalAIClient      | LocalAI (localhost:8080)              | None           |
| Native       | LlamaCppClient     | llama-cpp-python (in-process)         | None           |
|              | TransformersClient | HuggingFace Transformers (in-process) | None           |

Every middleware is a BaseAgentClient that wraps another client — they compose freely:

| Middleware               | Purpose                                                                  |
| ------------------------ | ------------------------------------------------------------------------ |
| RetryMiddleware          | Exponential backoff on transient failures (429, 5xx)                     |
| CircuitBreakerMiddleware | Opens circuit after consecutive failures to prevent cascading errors     |
| CachingMiddleware        | SHA-256 keyed response cache with TTL, LRU eviction, request coalescing  |
| TrackingMiddleware       | Wall-clock timing, TTFT, token accumulation, optional cost estimation    |
| FallbackMiddleware       | Sequential or hedged fallback across providers                           |
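The wrap-another-client pattern can be sketched with stdlib code alone. The classes below are simplified stand-ins, not the library's real middleware (the real constructors' parameters are undocumented here, so `max_attempts`, `base_delay`, and the string-based request are assumptions): a retry layer with exponential backoff composed inside a SHA-256-keyed cache layer, both exposing the same `send_request` so they nest in any order.

```python
import asyncio
import hashlib

class FlakyClient:
    """Stand-in provider: fails a fixed number of times, then succeeds."""
    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    async def send_request(self, request: str) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("transient failure")
        return f"ok: {request}"

class RetrySketch:
    """Retries the wrapped client with exponential backoff on failure."""
    def __init__(self, inner, max_attempts: int = 3, base_delay: float = 0.01) -> None:
        self.inner = inner
        self.max_attempts = max_attempts
        self.base_delay = base_delay

    async def send_request(self, request: str) -> str:
        for attempt in range(self.max_attempts):
            try:
                return await self.inner.send_request(request)
            except ConnectionError:
                if attempt == self.max_attempts - 1:
                    raise
                await asyncio.sleep(self.base_delay * 2 ** attempt)
        raise RuntimeError("unreachable")

class CacheSketch:
    """Caches responses under a SHA-256 digest of the request."""
    def __init__(self, inner) -> None:
        self.inner = inner
        self.store: dict[str, str] = {}

    async def send_request(self, request: str) -> str:
        key = hashlib.sha256(request.encode()).hexdigest()
        if key not in self.store:
            self.store[key] = await self.inner.send_request(request)
        return self.store[key]

# Compose: cache → retry → provider, mirroring the stack diagram above.
flaky = FlakyClient(failures=2)
client = CacheSketch(RetrySketch(flaky))
first = asyncio.run(client.send_request("ping"))   # two retries, then success
second = asyncio.run(client.send_request("ping"))  # served from cache, no provider call
print(first, second, flaky.calls)
```

The first request hits the provider three times (two failures absorbed by the retry layer); the second is answered from the cache, so the provider's call count stays at 3. The real middleware composes the same way, one `BaseAgentClient` wrapping another.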

MIT License — GitHub