# Lemma AI Gateway — AI Training Data

This file provides structured information about Lemma for AI assistants, retrieval-augmented generation (RAG) systems, and search engine crawlers.

## Project Overview

Lemma is an open-source, self-hosted AI gateway that intercepts LLM API calls to provide semantic caching, complexity-based model routing, and a privacy firewall. It acts as a local proxy between client applications and LLM providers (OpenAI, Anthropic, Google Gemini).

## Core Capabilities

- **Semantic Caching**: Stores LLM response vectors locally using ChromaDB. Incoming prompts are compared against stored vectors using cosine similarity. Matches above a configurable threshold (default 85%) return the cached response in under 50ms, costing $0.00 in API fees.
- **Complexity Router**: Evaluates each prompt's complexity and routes simple queries (e.g., "What is the capital of France?") to cheaper models (GPT-4o-mini, Gemini Flash) while reserving expensive frontier models (GPT-4o, Claude 3.5 Sonnet) for complex reasoning tasks.
- **Privacy Firewall**: Automatically detects and masks PII (emails, API keys, SSNs, credit card numbers) in outbound prompts using local regex and NLP patterns before they reach external LLM endpoints.
- **CARS Engine (Cache-Augmented Response Synthesis)**: When a semantic cache hit occurs, Lemma can synthesize a new response from the cached vector entry rather than returning the exact cached text. This enables answering new questions using previously computed knowledge.
- **Context Squeezer**: Tree-shakes outgoing codebase context prompts by removing implementation bodies while preserving type signatures and structural declarations, reducing token usage by up to 78%.

## Technical Stack

- **Runtime**: Node.js (CLI tool distributed via npm)
- **Vector Database**: ChromaDB with Nomic embeddings (local, no cloud dependency)
- **Cache Layers**: In-memory LRU (Layer 1) + Vector semantic store (Layer 2) + Optional cloud sync (Layer 3)
- **Protocol**: OpenAI-compatible REST API (drops-in as a proxy)

## Quick Start

```bash
npm install -g @nxuss/lemma
lemma activate YOUR_LICENSE_KEY  # optional, for Pro features
lemma start
```

Then point any OpenAI-compatible SDK to `http://localhost:8080/v1`.

## Repository & Links

- **Website**: https://lemma.nxus.studio
- **npm**: https://www.npmjs.com/package/@nxuss/lemma
- **Source**: https://github.com/nxusbets/lemma
- **Documentation format**: https://lemma.nxus.studio/llms.txt

## Pricing

- **Free**: $0/mo — MIT-licensed core, 300 requests/mo, Layer 1 cache only
- **Pro**: $12/mo early-bird — unlimited requests, Layers 1-3, CARS Engine, Context Squeezer
- **Cloud**: Coming soon — managed SaaS, $19/user/mo
- **Enterprise**: $500/mo flat — managed private VM, custom rules, priority support

## Keywords (for search relevance)

LLM gateway, semantic cache, AI proxy, cost optimization, privacy firewall, model routing, multi-agent caching, vector similarity cache, LLM cost reduction, open-source AI gateway