# mada-modelkit
mada-modelkit is a composable AI client library that abstracts over cloud, local server, and native AI backends through a single async interface, layered with optional middleware.
- Zero core dependencies — stdlib only for types, errors, base, and middleware
- 9 providers across 3 categories (cloud, local server, native)
- 5 middleware layers — retry, circuit breaker, caching, tracking, fallback
- Type-safe — full annotations, strict mypy, PEP 561 compliant
## Installation

```sh
pip install mada-modelkit
```

With provider extras:

```sh
pip install mada-modelkit[openai]     # OpenAI (via httpx)
pip install mada-modelkit[anthropic]  # Anthropic (via httpx)
pip install mada-modelkit[cloud]      # All cloud providers
pip install mada-modelkit[ollama]     # Ollama local server
pip install mada-modelkit[llamacpp]   # llama-cpp-python native
pip install mada-modelkit[all]        # Everything
```
## Architecture

```
 Application Code
        │
        ▼
┌────────────────────────────────┐
│        Middleware Stack        │
│   Tracking → Cache → Circuit   │
│   Breaker → Retry → Provider   │
└────────────────────────────────┘
        │
        ├── Cloud:  OpenAI, Anthropic, Gemini, DeepSeek
        ├── Local:  Ollama, vLLM, LocalAI
        └── Native: llama-cpp-python, Transformers
```

All providers implement `BaseAgentClient` — the single async interface:
```python
from mada_modelkit import BaseAgentClient, AgentRequest, AgentResponse

class BaseAgentClient(ABC):
    async def send_request(self, request: AgentRequest) -> AgentResponse: ...
    async def send_request_stream(self, request: AgentRequest) -> AsyncIterator[StreamChunk]: ...
    async def health_check(self) -> bool: ...
    async def close(self) -> None: ...
```
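Because application code depends only on this interface, the backend can be swapped without touching call sites. A minimal, self-contained sketch of the pattern — the `EchoClient` and the dataclass fields here are illustrative stand-ins, not mada-modelkit's actual classes:

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass

# Illustrative stand-ins for mada-modelkit's request/response types.
@dataclass
class AgentRequest:
    prompt: str

@dataclass
class AgentResponse:
    text: str

class BaseAgentClient(ABC):
    """The single async interface every provider implements."""
    @abstractmethod
    async def send_request(self, request: AgentRequest) -> AgentResponse: ...
    async def close(self) -> None: ...

class EchoClient(BaseAgentClient):
    """A toy provider; any real provider plugs in the same way."""
    async def send_request(self, request: AgentRequest) -> AgentResponse:
        return AgentResponse(text=f"echo: {request.prompt}")

async def ask(client: BaseAgentClient, prompt: str) -> str:
    # Application code sees only BaseAgentClient, never the backend.
    response = await client.send_request(AgentRequest(prompt=prompt))
    return response.text

print(asyncio.run(ask(EchoClient(), "hi")))  # echo: hi
```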
## Providers

| Category | Provider | Backend | Auth |
|---|---|---|---|
| Cloud | OpenAIClient | OpenAI API | Bearer token |
| Cloud | AnthropicClient | Anthropic Messages API | x-api-key |
| Cloud | GeminiClient | Google Gemini API | x-goog-api-key |
| Cloud | DeepSeekClient | DeepSeek API | Bearer token |
| Local Server | OllamaClient | Ollama (localhost:11434) | None |
| Local Server | VllmClient | vLLM (localhost:8000) | None |
| Local Server | LocalAIClient | LocalAI (localhost:8080) | None |
| Native | LlamaCppClient | llama-cpp-python (in-process) | None |
| Native | TransformersClient | HuggingFace Transformers (in-process) | None |
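The Auth column corresponds to different HTTP headers on the wire. A hedged sketch of those conventions — the header names follow the table above, but this helper function is illustrative and not part of mada-modelkit's API:

```python
def auth_headers(provider: str, key: str) -> dict[str, str]:
    """Build the auth header each provider category expects (per the table)."""
    if provider in ("openai", "deepseek"):
        return {"Authorization": f"Bearer {key}"}   # Bearer token
    if provider == "anthropic":
        return {"x-api-key": key}                   # Anthropic Messages API
    if provider == "gemini":
        return {"x-goog-api-key": key}              # Google Gemini API
    return {}  # Local servers and native backends need no auth

print(auth_headers("anthropic", "sk-test"))  # {'x-api-key': 'sk-test'}
```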
## Middleware

Every middleware is a `BaseAgentClient` that wraps another client — they compose freely:

| Middleware | Purpose |
|---|---|
| RetryMiddleware | Exponential backoff on transient failures (429, 5xx) |
| CircuitBreakerMiddleware | Opens circuit after consecutive failures to prevent cascading errors |
| CachingMiddleware | SHA-256 keyed response cache with TTL, LRU eviction, request coalescing |
| TrackingMiddleware | Wall-clock timing, TTFT, token accumulation, optional cost estimation |
| FallbackMiddleware | Sequential or hedged fallback across providers |
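Because each middleware wraps another client, a stack is built by nesting constructors from the inside out. A minimal sketch of that composition pattern with simplified toy classes — the constructor parameters and internals here are illustrative, not mada-modelkit's actual signatures:

```python
import asyncio

class Provider:
    """Toy innermost client standing in for a real provider."""
    def __init__(self) -> None:
        self.calls = 0
    async def send_request(self, request: str) -> str:
        self.calls += 1
        return f"response to {request!r}"

class CachingMiddleware:
    """Wraps any client and serves repeated requests from a cache."""
    def __init__(self, inner) -> None:
        self.inner = inner
        self.cache: dict[str, str] = {}
    async def send_request(self, request: str) -> str:
        if request not in self.cache:
            self.cache[request] = await self.inner.send_request(request)
        return self.cache[request]

class RetryMiddleware:
    """Wraps any client and retries transient failures with backoff (simplified)."""
    def __init__(self, inner, max_attempts: int = 3) -> None:
        self.inner = inner
        self.max_attempts = max_attempts
    async def send_request(self, request: str) -> str:
        for attempt in range(self.max_attempts):
            try:
                return await self.inner.send_request(request)
            except ConnectionError:
                if attempt == self.max_attempts - 1:
                    raise
                await asyncio.sleep(2 ** attempt * 0.01)  # exponential backoff
        raise AssertionError("unreachable")

async def main() -> int:
    provider = Provider()
    # Compose inside-out: requests flow cache -> retry -> provider.
    client = CachingMiddleware(RetryMiddleware(provider))
    await client.send_request("hello")
    await client.send_request("hello")  # second call served from cache
    return provider.calls

print(asyncio.run(main()))  # 1
```

Since every layer exposes the same `send_request` interface as the provider it wraps, the order and number of layers is entirely the caller's choice.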
## License

MIT License — GitHub