
mada-modelkit

mada-modelkit is a composable AI client library that abstracts over cloud, local server, and native AI backends through a single async interface, layered with optional middleware.

  • Zero core dependencies — stdlib only for types, errors, base, and middleware
  • 9 providers across 3 categories (cloud, local server, native)
  • 5 middleware layers — retry, circuit breaker, caching, tracking, fallback
  • Type-safe — full annotations, strict mypy, PEP 561 compliant
Install from PyPI:
pip install mada-modelkit

With provider extras:

pip install "mada-modelkit[openai]"     # OpenAI (via httpx)
pip install "mada-modelkit[anthropic]"  # Anthropic (via httpx)
pip install "mada-modelkit[cloud]"      # All cloud providers
pip install "mada-modelkit[ollama]"     # Ollama local server
pip install "mada-modelkit[llamacpp]"   # llama-cpp-python native
pip install "mada-modelkit[all]"        # Everything
Application Code
        │
        ▼
┌────────────────────────────────┐
│        Middleware Stack        │
│  Tracking → Cache → Circuit    │
│  Breaker → Retry → Provider    │
└────────────────────────────────┘
        │
        ▼
Providers
├── Cloud:  OpenAI, Anthropic, Gemini, DeepSeek
├── Local:  Ollama, vLLM, LocalAI
└── Native: llama-cpp-python, Transformers

All providers implement BaseAgentClient — the single async interface:

from abc import ABC
from collections.abc import AsyncIterator

from mada_modelkit import AgentRequest, AgentResponse, StreamChunk

class BaseAgentClient(ABC):
    async def send_request(self, request: AgentRequest) -> AgentResponse: ...
    async def send_request_stream(self, request: AgentRequest) -> AsyncIterator[StreamChunk]: ...
    async def health_check(self) -> bool: ...
    async def close(self) -> None: ...
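To show the shape of the interface in action, here is a minimal, self-contained sketch: it defines stand-in `AgentRequest`/`AgentResponse` dataclasses (their fields here, `prompt` and `text`, are assumptions, not the library's real schema) and a hypothetical `EchoClient` provider that implements the abstract interface.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass

# Stand-ins for the real mada_modelkit types; field names are assumptions.
@dataclass
class AgentRequest:
    prompt: str

@dataclass
class AgentResponse:
    text: str

class BaseAgentClient(ABC):
    @abstractmethod
    async def send_request(self, request: AgentRequest) -> AgentResponse: ...

    async def health_check(self) -> bool:
        return True

    async def close(self) -> None:
        pass

# Hypothetical provider: echoes the prompt back, standing in for a real backend.
class EchoClient(BaseAgentClient):
    async def send_request(self, request: AgentRequest) -> AgentResponse:
        return AgentResponse(text=f"echo: {request.prompt}")

async def main() -> str:
    client = EchoClient()
    try:
        response = await client.send_request(AgentRequest(prompt="hello"))
        return response.text
    finally:
        await client.close()

result = asyncio.run(main())
print(result)
```

Because every provider satisfies the same four methods, application code written against `BaseAgentClient` can swap backends without changes.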
| Category     | Provider           | Backend                               | Auth           |
| ------------ | ------------------ | ------------------------------------- | -------------- |
| Cloud        | OpenAIClient       | OpenAI API                            | Bearer token   |
|              | AnthropicClient    | Anthropic Messages API                | x-api-key      |
|              | GeminiClient       | Google Gemini API                     | x-goog-api-key |
|              | DeepSeekClient     | DeepSeek API                          | Bearer token   |
| Local Server | OllamaClient       | Ollama (localhost:11434)              | None           |
|              | VllmClient         | vLLM (localhost:8000)                 | None           |
|              | LocalAIClient      | LocalAI (localhost:8080)              | None           |
| Native       | LlamaCppClient     | llama-cpp-python (in-process)         | None           |
|              | TransformersClient | HuggingFace Transformers (in-process) | None           |

Every middleware is a BaseAgentClient that wraps another client — they compose freely:

| Middleware               | Purpose                                                                  |
| ------------------------ | ------------------------------------------------------------------------ |
| RetryMiddleware          | Exponential backoff on transient failures (429, 5xx)                     |
| CircuitBreakerMiddleware | Opens circuit after consecutive failures to prevent cascading errors     |
| CachingMiddleware        | SHA-256 keyed response cache with TTL, LRU eviction, request coalescing  |
| TrackingMiddleware       | Wall-clock timing, TTFT, token accumulation, optional cost estimation    |
| FallbackMiddleware       | Sequential or hedged fallback across providers                           |
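The wrap-another-client pattern can be sketched with stdlib code alone. The classes below are simplified stand-ins, not the library's real middleware (the real constructors' parameters are undocumented here, so `max_attempts`, `base_delay`, and the string-based request are assumptions): a retry layer with exponential backoff composed inside a SHA-256-keyed cache layer, both exposing the same `send_request` so they nest in any order.

```python
import asyncio
import hashlib

class FlakyClient:
    """Stand-in provider: fails a fixed number of times, then succeeds."""
    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    async def send_request(self, request: str) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("transient failure")
        return f"ok: {request}"

class RetrySketch:
    """Retries the wrapped client with exponential backoff on failure."""
    def __init__(self, inner, max_attempts: int = 3, base_delay: float = 0.01) -> None:
        self.inner = inner
        self.max_attempts = max_attempts
        self.base_delay = base_delay

    async def send_request(self, request: str) -> str:
        for attempt in range(self.max_attempts):
            try:
                return await self.inner.send_request(request)
            except ConnectionError:
                if attempt == self.max_attempts - 1:
                    raise
                await asyncio.sleep(self.base_delay * 2 ** attempt)
        raise RuntimeError("unreachable")

class CacheSketch:
    """Caches responses under a SHA-256 digest of the request."""
    def __init__(self, inner) -> None:
        self.inner = inner
        self.store: dict[str, str] = {}

    async def send_request(self, request: str) -> str:
        key = hashlib.sha256(request.encode()).hexdigest()
        if key not in self.store:
            self.store[key] = await self.inner.send_request(request)
        return self.store[key]

# Compose: cache → retry → provider, mirroring the stack diagram above.
flaky = FlakyClient(failures=2)
client = CacheSketch(RetrySketch(flaky))
first = asyncio.run(client.send_request("ping"))   # two retries, then success
second = asyncio.run(client.send_request("ping"))  # served from cache, no provider call
print(first, second, flaky.calls)
```

The first request hits the provider three times (two failures absorbed by the retry layer); the second is answered from the cache, so the provider's call count stays at 3. The real middleware composes the same way, one `BaseAgentClient` wrapping another.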

MIT License — GitHub