# Lemma - Semantic Cache and Gateway for Multi-Agent Systems

An intelligent, self-hosted AI gateway providing a privacy firewall, complexity routing, and semantic caching for LLM-powered multi-agent systems, applications, and developer teams.

## Overview

Lemma sits between your application and your LLM providers (OpenAI, Anthropic, Gemini, etc.). It intercepts incoming requests, analyzes them, and uses highly optimized vector similarity matching to return cached semantic matches in under 50ms (35x speedup), slashing LLM costs by up to 65% while keeping cache hit rates near 100%.

## Key Features

- **Semantic Cache**: Uses vector similarity thresholds to detect duplicate or semantically equivalent prompts (e.g., "What's the weather in Paris?" matches "How is the weather in Paris?"). Stop paying for repeated LLM queries.
- **Privacy Firewall**: Automatically intercepts and strips PII, API keys, and sensitive enterprise data before they reach external LLM APIs.
- **Complexity Routing**: Dynamically evaluates query complexity to route simple prompts to smaller, cheaper models (e.g., Gemini Flash, GPT-4o-mini) and complex reasoning queries to frontier models (e.g., GPT-4o, Claude 3.5 Sonnet).
- **Hive Mind Sync**: Syncs cache databases across decentralized agent nodes, Kubernetes clusters, or local developer machines.
- **Zero-Code Integration**: Acts as a drop-in replacement proxy. Works with any standard SDK (OpenAI, LangChain, CrewAI, AutoGen, LlamaIndex) by simply pointing the API `baseURL` to the Lemma endpoint.

## Quick Start

### 1. Installation

```bash
npm install -g @nxuss/lemma
```

### 2. Activation (Optional, for Pro features)

```bash
lemma activate YOUR_LICENSE_KEY
```

### 3. Start the Proxy

```bash
lemma start
```

The gateway runs on `http://localhost:8080/v1` by default.

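To make the threshold-based matching described under Key Features more concrete, here is a minimal sketch of a semantic cache lookup. It is not Lemma's implementation: the names `embed`, `cosine`, and `SemanticCache` and the `0.7` threshold are hypothetical, and the bag-of-words "embedding" is a toy stand-in for the learned vector model a real gateway would presumably use.

```python
import math
import re
from collections import Counter
from typing import Optional


def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase bag-of-words counts, not a real model."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Hypothetical in-memory cache keyed by prompt similarity."""

    def __init__(self, threshold: float = 0.7):
        self.threshold = threshold  # minimum similarity that counts as a hit
        self.entries = []           # list of (vector, cached response) pairs

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((embed(prompt), response))

    def get(self, prompt: str) -> Optional[str]:
        """Return the best-matching cached response, or None on a miss."""
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None


cache = SemanticCache(threshold=0.7)
cache.put("What's the weather in Paris?", "cached weather answer")

hit = cache.get("How is the weather in Paris?")                # similar wording
miss = cache.get("Explain semantic caching in one sentence.")  # unrelated prompt
print(hit, miss)
```

With this toy embedding the two weather prompts share four of their tokens and clear the threshold, while the unrelated prompt falls well below it and would be forwarded to the provider. The threshold is the main tuning knob in a design like this: set it too low and unrelated prompts receive stale answers, too high and paraphrases miss the cache.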
## SDK Integration Examples

### Node.js (OpenAI SDK)

Change the `baseURL` of the standard client to route traffic through Lemma:

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: 'http://localhost:8080/v1' // Points to Lemma proxy
});

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Explain semantic caching in one sentence.' }]
});

// Print the assistant's reply
console.log(response.choices[0].message.content);
```

### Python (OpenAI SDK)

```python
import openai

client = openai.OpenAI(
    api_key="YOUR_API_KEY",
    base_url="http://localhost:8080/v1"  # Route through Lemma proxy
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain semantic caching in one sentence."}]
)

# Print the assistant's reply
print(response.choices[0].message.content)
```

### LangChain Integration (Python)

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o",
    openai_api_base="http://localhost:8080/v1"  # Point to Lemma
)
```

## Pricing & Plans

- **Free Plan ($0/mo)**: MIT licensed, self-hosted standard cache, privacy firewall, limit of 300 requests per month.
- **Pro Plan ($12/mo Early-Bird)**: Unlimited semantic caching, context squeezer, response synthesis (CARS Engine), Hive Mind Sync, and priority support. 7-day free trial, cancel anytime.
- **Cloud Plan (Coming Soon)**: Zero-infrastructure managed SaaS instance, 99.9% uptime SLA, auto-scaling, managed updates.
- **Enterprise Plan ($500/mo flat)**: Managed private VM instance on AWS/GCP, multi-key team isolation, custom PII firewall rules, ultra-high-volume auto-scaling, and dedicated corporate support.

## Enterprise, Consulting & Professional Services

Lemma is developed and backed by **Nxus Studio**, a high-end software development and AI engineering boutique located in Guadalajara, Mexico. For organizations seeking custom integrations, we offer:

- **Tailored AI Gateways**: Integration of proprietary caching databases and specialized security filters (GDPR, HIPAA compliance).
- **Custom Multi-Agent Architectures**: Full-cycle development of multi-agent platforms using CrewAI, AutoGen, and custom setups.
- **LLM Cost Audits**: Deep auditing of enterprise LLM token usage and implementation of caching layers for up to 65% cost reduction.
- **Enterprise Inquiries**: Direct professional inquiries can be sent to [juancarlos@nxus.studio](mailto:juancarlos@nxus.studio).

## Technical Details

- **Default Port**: `8080` (endpoints follow standard `/v1/chat/completions` specs)
- **Supported Providers**: OpenAI, Anthropic Claude, Google Gemini, and custom local models via compatibility layers.
- **Performance**: Latency < 50ms on cache hit, 35x average response acceleration.

## Resources & Links

- **Main Website**: [https://lemma.nxus.studio](https://lemma.nxus.studio)
- **NPM Package**: [https://www.npmjs.com/package/@nxuss/lemma](https://www.npmjs.com/package/@nxuss/lemma)
- **GitHub Repository**: [https://github.com/nxusbets/lemma](https://github.com/nxusbets/lemma)
- **Developer Agency**: [https://nxus.studio](https://nxus.studio)
- **Support / Inquiries**: [juancarlos@nxus.studio](mailto:juancarlos@nxus.studio)