Self-Hosted • Pro Proxy • Cloud Waitlist

Stop paying for the same LLM call twice

Zero-code semantic cache for multi-agent systems. Share LLM responses across agents, cut costs by 65%, and get 35x faster responses on cache hits.

Get Started → • Documentation

terminal
$ lemma-proxy start
Running multi-agent demo...
Agent 1: "What is the capital of France?"
→ LLM call: 1,247ms | Cost: $0.002
→ Response cached ✓
Agent 2: "What's the capital city of France?"
→ Cache hit: 35ms | Cost: $0.000
✓ 35x faster • 100% cost savings

The Problem

Multi-agent systems make redundant LLM calls. When you have 10 agents asking similar questions, you're paying for 10 separate API calls.

Multiplied API costs

Slow response times

Wasted compute resources

Rate limit headaches

Without Lemma: Agent 1 $$$ • Agent 2 $$$ • Agent 3 $$$ • Agent 4 $$$

The Solution

Lemma caches the first LLM response and serves it from the cache to every agent that asks a similar question. Because matching is semantic, queries don't have to be identical.
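Semantic matching is commonly done by comparing embedding vectors with cosine similarity and treating scores above a threshold as "the same question". The sketch below illustrates that idea only; it is not Lemma's implementation. The 3-dimensional vectors are made-up toy embeddings, and the 0.9 threshold and the function names `cosineSimilarity` and `isSemanticMatch` are hypothetical.

```javascript
// Cosine similarity between two equal-length numeric vectors (range -1..1).
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error('Vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero for empty vectors
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical rule: a query "matches" a cached one above a similarity threshold.
function isSemanticMatch(queryVec, cachedVec, threshold = 0.9) {
  return cosineSimilarity(queryVec, cachedVec) >= threshold;
}

// Toy "embeddings" for illustration only; a real system would use an embedding model.
const capitalOfFrance = [0.9, 0.1, 0.2];
const capitalCityOfFrance = [0.85, 0.15, 0.25];
const weatherInTokyo = [0.1, 0.9, 0.3];

console.log(isSemanticMatch(capitalCityOfFrance, capitalOfFrance)); // true
console.log(isSemanticMatch(weatherInTokyo, capitalOfFrance)); // false
```

The threshold is the main tuning knob in this kind of design: set it too low and unrelated questions share answers, too high and paraphrases miss the cache.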

First call → LLM (cached)

Similar calls → Cache (instant)

65% cost reduction

35x faster responses

With Lemma: Agent 1 • Agent 2 ✓ • Agent 3 ✓ • Agent 4 ✓

Real Performance Metrics

Measured in production multi-agent systems

35x faster: 1,247ms → 35ms average response time

65% cost reduction: savings on redundant LLM API calls

100% cache hit rate: on semantically similar queries

<50ms latency: average cache response time

* Benchmarked with GPT-4 on a system with 10 concurrent agents
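The 35x figure is a per-request ratio for cache hits. This back-of-the-envelope sketch uses only the 1,247ms and 35ms figures quoted above to compute that raw speedup, and shows how average latency depends on how many requests actually hit the cache. The `averageLatencyMs` helper is hypothetical and assumes a miss costs a full LLM call with no cache-lookup overhead.

```javascript
const llmLatencyMs = 1247; // fresh LLM call, from the benchmark above
const cacheLatencyMs = 35; // cache hit, from the benchmark above

// Raw speedup of a cache hit over a fresh LLM call.
const speedup = llmLatencyMs / cacheLatencyMs;
console.log(speedup.toFixed(1)); // 35.6

// Expected average latency when only a fraction of requests hit the cache.
// Assumes misses cost a full LLM call and ignores cache-lookup overhead.
function averageLatencyMs(hitRate, hitMs, missMs) {
  if (hitRate < 0 || hitRate > 1) throw new RangeError('hitRate must be between 0 and 1');
  return hitRate * hitMs + (1 - hitRate) * missMs;
}

console.log(averageLatencyMs(2 / 3, cacheLatencyMs, llmLatencyMs).toFixed(0)); // 439
console.log(averageLatencyMs(1, cacheLatencyMs, llmLatencyMs)); // 35
```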

How It Works

Three simple steps to save time and money

Agent 1 asks a question

The first agent makes an LLM call. The response is generated and cached with semantic embeddings.

Response is cached

Lemma stores the response with vector embeddings in ChromaDB for semantic matching.

Similar queries hit cache

When Agent 2 asks a similar question, Lemma returns the cached response instantly. No LLM call needed.
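The three steps map onto a get-or-call pattern: embed the query, return the closest cached response if it is similar enough, and otherwise call the LLM and store the result. The following is a minimal in-memory sketch under stated assumptions: `embed` and `callLLM` are hypothetical functions supplied by the caller, the `SemanticCache` class, the keyword-based toy embedding, and the 0.9 threshold are illustrative only, and Lemma itself stores entries in ChromaDB rather than in an array.

```javascript
// Hypothetical in-memory semantic cache illustrating the three steps above.
class SemanticCache {
  constructor(embed, threshold = 0.9) {
    this.embed = embed; // async (text) => number[]  (assumed embedding function)
    this.threshold = threshold; // minimum cosine similarity that counts as a hit
    this.entries = []; // [{ vector, response }]
  }

  static cosine(a, b) {
    let dot = 0;
    let normA = 0;
    let normB = 0;
    for (let i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    return normA && normB ? dot / Math.sqrt(normA * normB) : 0;
  }

  async getOrCall(query, callLLM) {
    const vector = await this.embed(query);

    // Step 3: find the most similar cached query and reuse its response if close enough.
    let best = null;
    let bestScore = -Infinity;
    for (const entry of this.entries) {
      const score = SemanticCache.cosine(vector, entry.vector);
      if (score > bestScore) {
        bestScore = score;
        best = entry;
      }
    }
    if (best !== null && bestScore >= this.threshold) {
      return { response: best.response, cacheHit: true }; // no LLM call needed
    }

    // Steps 1 and 2: cache miss, so call the LLM and store the response with its vector.
    const response = await callLLM(query);
    this.entries.push({ vector, response });
    return { response, cacheHit: false };
  }
}

// Toy keyword "embedding" for demonstration only; a real system would use an embedding model.
const vocab = ['capital', 'france', 'weather', 'tokyo'];
const toyEmbed = async (text) => vocab.map((word) => (text.toLowerCase().includes(word) ? 1 : 0));

async function demo() {
  let llmCalls = 0;
  const fakeLLM = async (query) => {
    llmCalls += 1; // count how often the (fake) provider is actually called
    return `answer to: ${query}`;
  };

  const cache = new SemanticCache(toyEmbed);
  const queries = [
    'What is the capital of France?',
    "What's the capital city of France?",
    "What's the weather in Tokyo?",
  ];
  const hits = [];
  for (const query of queries) {
    const { cacheHit } = await cache.getOrCall(query, fakeLLM);
    hits.push(cacheHit);
  }
  return { hits, llmCalls };
}

demo().then(({ hits, llmCalls }) => console.log(hits, llmCalls)); // [ false, true, false ] 2
```

Because every agent would call the same `cache` instance, the second phrasing of the France question is answered from the first agent's stored response, while the unrelated Tokyo question still reaches the LLM.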

Real-time Flow

Watch how queries are processed

Agent 1: Query "What is AI?" → LLM (1,247ms) → Cache ✓

Agent 2: Query "Explain AI" → Cache Hit (35ms) ⚡

Agent 3: Query "What's AI?" → Cache Hit (32ms) ⚡

Result: 2 out of 3 queries served from cache • about 67% cost savings
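The savings figure follows from simple arithmetic: if every LLM call costs the same and a cache hit is free, the fraction of cost saved equals the fraction of queries served from cache. A minimal sketch, where the `cacheSavings` helper is hypothetical, $0.002 is the per-call cost from the terminal demo, and real per-call costs vary with model and token count:

```javascript
// Assumes every LLM call costs the same and a cache hit costs nothing.
function cacheSavings(totalQueries, cacheHits, costPerCall) {
  if (totalQueries <= 0) throw new RangeError('totalQueries must be positive');
  if (cacheHits < 0 || cacheHits > totalQueries) {
    throw new RangeError('cacheHits must be between 0 and totalQueries');
  }
  const llmCalls = totalQueries - cacheHits;
  const costWithoutCache = totalQueries * costPerCall;
  const costWithCache = llmCalls * costPerCall;
  return {
    llmCalls,
    costWithoutCache,
    costWithCache,
    savingsFraction: 1 - costWithCache / costWithoutCache,
  };
}

// The flow above: 3 queries, 2 served from cache, $0.002 per LLM call.
const flow = cacheSavings(3, 2, 0.002);
console.log(`${(flow.savingsFraction * 100).toFixed(1)}% saved`); // 66.7% saved

// Best case for the 10-agent scenario: one real call, nine cache hits.
const tenAgents = cacheSavings(10, 9, 0.002);
console.log(`${(tenAgents.savingsFraction * 100).toFixed(0)}% saved`); // 90% saved
```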

Built for Production

Everything you need for enterprise-grade caching

PRO

Semantic Matching

No exact match required. Lemma understands query intent using vector embeddings.

PRO

Zero-Code Proxy

Connect any agent system by simply changing the API base URL. No SDK required.

Multi-Agent Orchestration

Built for systems with multiple agents. Share cache across all agents seamlessly.

Real-time WebSocket

Low-latency communication between agents and the cache server, with sub-50ms responses.

Self-Hosted Core

Publicly auditable engine. Inspect the code on npm, contribute, or run on your own hardware.

LLM Agnostic

Works with any LLM provider: OpenAI, Anthropic, local models, or custom APIs.

Perfect For

Multi-Agent Systems

CrewAI, AutoGPT, LangChain agents working together? Lemma ensures they share knowledge efficiently.

CrewAI • AutoGPT • LangChain

Production Chatbots

Multiple chatbot instances handling similar queries? Cache common responses across all instances.

Customer Support • FAQ Bots

Development & Testing

Testing agent behaviors? Stop wasting API credits on repetitive test queries.

Unit Tests • Integration Tests

Repetitive Workflows

Applications with predictable query patterns benefit from near-instant cache hits.

Data Analysis • Report Generation

Zero-Code Integration

Get up and running in under 2 minutes

Install Lemma Proxy

npm install -g @nxuss/lemma

Activate Pro (Optional)

lemma-proxy activate YOUR_LICENSE_KEY

Required for the Semantic Matching and Zero-Code Proxy features.

Start the Proxy

lemma-proxy start

The proxy starts on localhost:8080.

Zero-Code Usage

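import OpenAI from 'openai';

// No Lemma SDK needed! Just point your OpenAI client's base URL at Lemma.
// (The top-level await below requires running this file as an ES module.)
const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY, // your provider API key
  baseURL: 'http://localhost:8080/v1',
});

// Lemma intercepts the request, caches it semantically, and saves you $$
const response = await client.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'What is the capital of France?' }],
});

console.log(response.choices[0].message.content);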

* Works with any library (OpenAI, LangChain, CrewAI) by just changing the baseURL.
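Since integration depends only on the base URL, a plain HTTP request should work too. A minimal sketch, assuming the proxy exposes the standard OpenAI-compatible `POST /v1/chat/completions` route under the `http://localhost:8080/v1` base URL (inferred from that URL, not confirmed here); `buildChatRequest` is a hypothetical helper:

```javascript
// Builds an OpenAI-style chat completion request aimed at the local proxy.
function buildChatRequest(baseURL, apiKey, messages, model = 'gpt-4') {
  // Ensure a trailing slash so the relative path is appended, not substituted for "v1".
  const base = baseURL.endsWith('/') ? baseURL : `${baseURL}/`;
  const url = new URL('chat/completions', base);
  return {
    url: url.toString(),
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}

const { url, init } = buildChatRequest(
  'http://localhost:8080/v1',
  process.env.OPENAI_API_KEY ?? 'YOUR_API_KEY',
  [{ role: 'user', content: 'What is the capital of France?' }],
);
console.log(url); // http://localhost:8080/v1/chat/completions

// With the proxy running (Node 18+ provides a global fetch), inside an async function:
// const res = await fetch(url, init);
// const data = await res.json();
// console.log(data.choices[0].message.content);
```

Building the request separately from sending it keeps the URL handling testable without a running proxy.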

Choose Your Path

Start with Community, power up with Pro Proxy, or scale with Cloud

Self-Hosted

Community

Self-hosted core for developers

Free

Install via npm →

Exact match caching
Local data privacy
Manual SDK integration
Community support
No semantic matching
No zero-code proxy
No web dashboard

Recommended

Pro Proxy

The universal zero-code proxy

$12/month

$120/year (save 17%, 2 months free)

Semantic matching enabled
Zero-code CLI integration
Local Web Dashboard
Unlimited local history
Multi-agent sharing
Email support

Coming Soon

Cloud Starter

Managed infra, zero maintenance

$19/month

Join Waitlist →

Everything in Pro Proxy
Managed infrastructure
Remote analytics
Multi-region caching
99.9% Uptime SLA

Enterprise

Cloud Enterprise

For high-volume production

Custom

Contact Sales →

Everything in Cloud Starter
Dedicated resources
Volume discounts
Custom integrations
24/7 Priority support

Why Choose Cloud Over Proxy?

Pro Proxy

⚙ Self-Managed: runs on your infra (Docker/binary)
🔒 Privacy First: data never leaves your network
$ Flat Price: $12/mo regardless of volume

Coming Soon

Cloud

⚡ Zero Infrastructure: no Docker or DB to manage
✓ Managed Scaling: we handle the traffic spikes
📊 Centralized Analytics: monitor all your agents in one place

Licenses are per-instance and managed via lemma.nxus.studio. Pro features like Semantic Matching require an active license.

Cloud Early Access

Join Cloud Early Access

Be among the first to use Lemma Cloud. Zero setup, managed infrastructure, and pay only for what you use.

$19 starting price: Cloud Starter plan

Zero setup required: plug & play in minutes

99.9% uptime SLA: managed infrastructure

✓ Managed Infrastructure ✓ Auto-Scaling ✓ Pay As You Go

Ready to save on LLM costs?

Join developers building smarter multi-agent systems with Lemma. Open source, self-hosted, and production-ready.

Get Started → • Read Docs

MIT open source • 100% self-hosted • No vendor lock-in
