Self-Hosted • Pro Proxy • Cloud Waitlist

Stop paying for the
same LLM call twice

Zero-code semantic cache for multi-agent systems. Share LLM responses across agents, reduce costs by 65%, and get 35x faster responses.

terminal
$ lemma-proxy start
Running multi-agent demo...
Agent 1: "Capital of France?"
→ LLM call: 1,247ms | Cost: $0.002
→ Response cached ✓
Agent 2: "Capital city?"
→ Cache hit: 35ms | Cost: $0.000
✓ 35x faster • 100% cost savings

The Problem

Multi-agent systems make redundant LLM calls. When you have 10 agents asking similar questions, you're paying for 10 separate API calls.

Multiplied API costs
Slow response times
Wasted compute resources
Rate limit headaches
Without Lemma: Agent 1 → $$$ • Agent 2 → $$$ • Agent 3 → $$$ • Agent 4 → $$$ (every agent pays for its own LLM call)

The Solution

Lemma caches the first LLM response and serves it instantly to all agents asking similar questions. Semantic matching means queries don't have to match word-for-word.

First call → LLM (cached)
Similar calls → Cache (instant)
65% cost reduction
35x faster responses
With Lemma: Agent 1 → $ (one paid LLM call) • Agents 2, 3, 4 → served from the shared cache at no cost

Real Performance Metrics

Measured in production multi-agent systems

  • 35x faster (1,247ms → 35ms average response time)
  • 65% cost reduction (save on redundant LLM API calls)
  • 100% cache hit rate (on semantically similar queries)
  • <50ms latency (average cache response time)

* Benchmarked with GPT-4 on a system with 10 concurrent agents

How It Works

Three simple steps to save time and money

1

Agent 1 asks a question

The first agent makes an LLM call. The response is generated and cached with semantic embeddings.

2

Response is cached

Lemma stores the response with vector embeddings in ChromaDB for semantic matching.

3

Similar queries hit cache

When Agent 2 asks a similar question, Lemma returns the cached response instantly. No LLM call needed.
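
The sketch below illustrates the core idea; it is not Lemma's internal code (Lemma stores its embeddings in ChromaDB). It embeds the incoming query, compares it against cached embeddings by cosine similarity, and only calls the LLM on a miss. The embed and callLLM parameters and the similarity threshold are hypothetical placeholders.

// Illustrative semantic-cache sketch (TypeScript), not Lemma's actual engine.
type CacheEntry = { embedding: number[]; response: string };

const cache: CacheEntry[] = [];
const SIMILARITY_THRESHOLD = 0.85; // hypothetical cutoff for "similar enough"

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB) || 1);
}

async function cachedCompletion(
  query: string,
  embed: (text: string) => Promise<number[]>, // e.g. an embeddings API
  callLLM: (text: string) => Promise<string>, // the expensive call
): Promise<string> {
  const embedding = await embed(query);

  // Cache hit: a semantically similar query was already answered.
  for (const entry of cache) {
    if (cosineSimilarity(embedding, entry.embedding) >= SIMILARITY_THRESHOLD) {
      return entry.response; // no LLM call, no cost
    }
  }

  // Cache miss: pay for one LLM call, then share it with every other agent.
  const response = await callLLM(query);
  cache.push({ embedding, response });
  return response;
}

In practice this logic sits behind Lemma's OpenAI-compatible proxy, so agents never call it directly.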

Real-time Flow

Watch how queries are processed

  • Agent 1: "What is AI?" → LLM (1,247ms) → Cache ✓
  • Agent 2: "Explain AI" → Cache Hit (35ms) ⚡
  • Agent 3: "What's AI?" → Cache Hit (32ms) ⚡
Result: 2 out of 3 queries served from cache • 65% cost savings
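
As a rough back-of-envelope (using the per-call cost from the demo above and ignoring the comparatively small cost of computing embeddings), the savings scale directly with the share of queries answered from cache:

// Back-of-envelope savings estimate; illustrative numbers, not a benchmark.
const costPerLLMCall = 0.002; // per-call cost from the demo above
const totalQueries = 100;
const cacheHitRate = 0.65;    // share of queries served from cache

const costWithoutLemma = totalQueries * costPerLLMCall;                   // $0.200
const costWithLemma = totalQueries * (1 - cacheHitRate) * costPerLLMCall; // $0.070

const savings = 1 - costWithLemma / costWithoutLemma;
console.log(`Savings: ${(savings * 100).toFixed(0)}%`); // "Savings: 65%"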

Built for Production

Everything you need for enterprise-grade caching

Semantic Matching (PRO)

No exact match required. Lemma understands query intent using vector embeddings.

Zero-Code Proxy (PRO)

Connect any agent system by simply changing the API base URL. No SDK required.

Multi-Agent Orchestration

Built for systems with multiple agents. Share the cache across all of them seamlessly.

Real-time WebSocket

Low-latency communication between agents and the cache server. Sub-50ms responses.

Self-Hosted Core

Publicly auditable engine. Inspect the code on npm, contribute, or run on your own hardware.

LLM Agnostic

Works with any LLM provider: OpenAI, Anthropic, local models, or custom APIs.

Perfect For

Multi-Agent Systems

CrewAI, AutoGPT, LangChain agents working together? Lemma ensures they share knowledge efficiently.

CrewAI • AutoGPT • LangChain

Production Chatbots

Multiple chatbot instances handling similar queries? Cache common responses across all instances.

Customer Support • FAQ Bots

Development & Testing

Testing agent behaviors? Stop wasting API credits on repetitive test queries.

Unit Tests • Integration Tests

Repetitive Workflows

Applications with predictable query patterns benefit from near-instant cache hits.

Data Analysis • Report Generation

Zero-Code Integration

Get up and running in under 2 minutes

1

Install Lemma Proxy

npm install @nxuss/lemma
2

Activate Pro (Optional)

lemma-proxy activate YOUR_LICENSE_KEY

Required for the Semantic Matching and Zero-Code Proxy features.

3

Start the Proxy

lemma-proxy start

Proxy starts on localhost:8080

Zero-Code Usage

// No SDK needed! Just point your OpenAI base URL to Lemma
const client = new OpenAI({
  apiKey: 'YOUR_API_KEY',
  baseURL: 'http://localhost:8080/v1' 
});

// Lemma intercepts, caches semantically, and saves you $$
const response = await client.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'What is the capital of France?' }]
});

* Works with any library (OpenAI, LangChain, CrewAI) by just changing the baseURL.
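
For example, a LangChain setup might look like the sketch below. It assumes the JavaScript @langchain/openai package, whose ChatOpenAI client accepts a configuration object that is forwarded to the underlying OpenAI client (including baseURL); check your library's docs for the equivalent option.

// Hypothetical LangChain example: only the base URL changes.
import { ChatOpenAI } from '@langchain/openai';

const model = new ChatOpenAI({
  model: 'gpt-4',
  apiKey: 'YOUR_API_KEY',
  // Point the underlying OpenAI client at the Lemma proxy.
  configuration: { baseURL: 'http://localhost:8080/v1' },
});

const answer = await model.invoke('What is the capital of France?');
console.log(answer.content);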

Choose Your Path

Start with Community, power up with Pro Proxy, or scale with Cloud

Self-Hosted

Community

Self-hosted core for developers

Free
Install via npm
  • Exact-match caching
  • Local data privacy
  • Manual SDK integration
  • Community support
  • No Semantic Matching
  • No Zero-Code Proxy
  • No Web Dashboard
Recommended

Pro Proxy

The universal zero-code proxy

$12/month

$120 / year (Save 17% - 2 months free)

  • Semantic matching enabled
  • Zero-code CLI integration
  • Local Web Dashboard
  • Infinite local history
  • Multi-agent sharing
  • Email support
Coming Soon

Cloud Starter

Managed infra, zero maintenance

$19/month
Join Waitlist
  • Everything in Pro Proxy
  • Managed infrastructure
  • Remote analytics
  • Multi-region caching
  • 99.9% Uptime SLA
Enterprise

Cloud Enterprise

For high-volume production

Custom
Contact Sales
  • Everything in Cloud
  • Dedicated resources
  • Volume discounts
  • Custom integrations
  • 24/7 Priority support

Why Choose Cloud Over Proxy?

Pro Proxy

  • Self-Managed
    Runs on your infra (Docker/Binary)
  • Privacy First
    Data never leaves your network
  • Flat Price
    $12/mo regardless of volume
Coming Soon

Cloud

  • Zero Infrastructure
    No Docker or DB to manage
  • Managed Scaling
    We handle the traffic spikes
  • Centralized Analytics
    Monitor all your agents in one place

Licenses are per-instance and managed via lemma.nxus.studio. Pro features like Semantic Matching require an active license.

Cloud Early Access

Join the Cloud Early Access

Be among the first to use Lemma Cloud. Zero setup, managed infrastructure, and pay only for what you use.

We'll only send you updates about Lemma Cloud. Unsubscribe anytime.

  • $19 starting price (Cloud Starter plan)
  • Zero setup required (plug & play in minutes)
  • 99.9% uptime SLA (managed infrastructure)

✓ Managed Infrastructure • ✓ Auto-Scaling • ✓ Pay As You Go

Ready to save on
LLM costs?

Join developers building smarter multi-agent systems with Lemma. Open source, self-hosted, and production-ready.

  • MIT open source
  • 100% self-hosted
  • Zero vendor lock-in