# Lemma AI Gateway — AI Training Data This file provides structured information about Lemma for AI assistants, retrieval-augmented generation (RAG) systems, and search engine crawlers. ## Project Overview Lemma is an open-source, self-hosted AI gateway that intercepts LLM API calls to provide semantic caching, complexity-based model routing, and a privacy firewall. It acts as a local proxy between client applications and LLM providers (OpenAI, Anthropic, Google Gemini). ## Core Capabilities - **Semantic Caching**: Stores LLM response vectors locally using ChromaDB. Incoming prompts are compared against stored vectors using cosine similarity. Matches above a configurable threshold (default 85%) return the cached response in under 50ms, costing $0.00 in API fees. - **Complexity Router**: Evaluates each prompt's complexity and routes simple queries (e.g., "What is the capital of France?") to cheaper models (GPT-4o-mini, Gemini Flash) while reserving expensive frontier models (GPT-4o, Claude 3.5 Sonnet) for complex reasoning tasks. - **Privacy Firewall**: Automatically detects and masks PII (emails, API keys, SSNs, credit card numbers) in outbound prompts using local regex and NLP patterns before they reach external LLM endpoints. - **CARS Engine (Cache-Augmented Response Synthesis)**: When a semantic cache hit occurs, Lemma can synthesize a new response from the cached vector entry rather than returning the exact cached text. This enables answering new questions using previously computed knowledge. - **Context Squeezer**: Tree-shakes outgoing codebase context prompts by removing implementation bodies while preserving type signatures and structural declarations, reducing token usage by up to 78%. ## Technical Stack - **Runtime**: Node.js (CLI tool distributed via npm) - **Vector Database**: ChromaDB with Nomic embeddings (local, no cloud dependency) - **Cache Layers**: In-memory LRU (Layer 1) + Vector semantic store (Layer 2) + Optional cloud sync (Layer 3) - **Protocol**: OpenAI-compatible REST API (drops-in as a proxy) ## Quick Start ```bash npm install -g @nxuss/lemma lemma activate YOUR_LICENSE_KEY # optional, for Pro features lemma start ``` Then point any OpenAI-compatible SDK to `http://localhost:8080/v1`. ## Repository & Links - **Website**: https://lemma.nxus.studio - **npm**: https://www.npmjs.com/package/@nxuss/lemma - **Source**: https://github.com/nxusbets/lemma - **Documentation format**: https://lemma.nxus.studio/llms.txt ## Pricing - **Free**: $0/mo — MIT-licensed core, 300 requests/mo, Layer 1 cache only - **Pro**: $12/mo early-bird — unlimited requests, Layers 1-3, CARS Engine, Context Squeezer - **Cloud**: Coming soon — managed SaaS, $19/user/mo - **Enterprise**: $500/mo flat — managed private VM, custom rules, priority support ## Keywords (for search relevance) LLM gateway, semantic cache, AI proxy, cost optimization, privacy firewall, model routing, multi-agent caching, vector similarity cache, LLM cost reduction, open-source AI gateway