BENCHMARK / PUBLIC-SAFE LAB REPORT

LoCoMo-200 retrieval-only results, with the caveats up front.

Lore Context v0.6 was rerun against a LoCoMo-derived 200-question sample. The result measures top-5 retrieved context and local API latency; it is not generated-answer accuracy and not a universal leaderboard claim.

Retrieval hit@5 (higher is better; axis 0-50%)
Lore: 47.5%
Mem0 OSS: 31.5%

P95 latency (lower is better; axis 0-710 ms)
Lore: 29.1 ms
Mem0 OSS: 709.8 ms

P99 latency (lower is better; axis 0-2088 ms)
Lore: 59.0 ms
Mem0 OSS: 2087.8 ms

Query errors (lower is better; axis 0-6)
Lore: 6
Mem0 OSS: 0

Bars share one axis inside each metric. Hit@5: longer is better. Latency and errors: shorter is better. Mem0 is a non-optimized self-hosted OSS run, not an official Mem0 platform benchmark.
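
For readers less familiar with tail-latency metrics, the P95 and P99 figures are percentiles over per-query latency samples. The report does not state which percentile definition the harness uses, so the following is only a minimal sketch that assumes the nearest-rank method; the function name and the sample data are hypothetical.

```python
import math


def percentile_nearest_rank(samples_ms, pct):
    """Return the pct-th percentile using the nearest-rank definition.

    Assumption: nearest-rank is used here for illustration only; the
    benchmark harness may interpolate or use a different definition.
    """
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples_ms)
    # Smallest rank r such that at least pct% of samples are <= ordered[r - 1].
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical latencies: 1 ms, 2 ms, ..., 100 ms (not measured data).
example_latencies = [float(i) for i in range(1, 101)]
p95 = percentile_nearest_rank(example_latencies, 95)
p99 = percentile_nearest_rank(example_latencies, 99)
```

With only 200 queries, nearest-rank P99 is determined by the third-slowest sample, so a handful of slow requests can move it substantially.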

External published references + Lore measured result
Lore retrieval-only result plus public LoCoMo answer-accuracy reports · mixed metrics · not directly comparable

Lore Context v0.6: 47.5% (our retrieval-only run)
Mem0 new algo: 91.6% (Mem0 Apr 2026)
Zep 30/30: 80.32% (Zep retrieval tradeoff)
Memobase v0.0.37: 75.78% (Hindsight benchmark)
Letta Filesystem: 74.0% (Letta blog)
Mem0 Graph: 68.44% (Mem0 paper)
LangMem: 58.10% (Hindsight benchmark)

The Lore row is our LoCoMo-200 retrieval-only hit@5 result. The other rows are external LoCoMo answer-accuracy reports. Mixed metrics are shown for market context, not as a direct ranking.

Dataset: LoCoMo-derived 200-question sample from 10 conversations. Raw LoCoMo conversations and questions are not redistributed because the dataset is CC BY-NC 4.0.
Method: Query top 5, then mark a retrieval hit when the gold answer or enough gold-answer tokens appear in the retrieved context. This is retrieval-only hit@5, not answer accuracy.
Same-harness result: Lore returned 47.5% hit@5 at 29.1 ms P95. The local Mem0 OSS comparison returned 31.5% hit@5 at 709.8 ms P95, but its optional NLP/BM25 paths were unavailable and long-chunk ingestion emitted warnings.
External context: Public Mem0, Zep, Letta, Memobase, and LangMem numbers mostly use generated answers plus LLM-as-Judge, so they are listed in the source report as reference points rather than ranked against Lore.
Source report: The public-safe report on GitHub.
Next: Publish a clean reproduction harness, fix the HTTP 429 errors in Lore's benchmark run, run official Mem0/Zep/Letta paths, and add an LLM-as-Judge pass before making stronger market claims.
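
The retrieval-only scoring rule described under Method can be sketched roughly as follows. This is an illustration, not the harness itself: the report does not say how many gold-answer tokens count as "enough", so the 0.6 token-recall threshold, the regex tokenizer, and all function names here are assumptions.

```python
import re


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (hypothetical normalization)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def is_hit(gold_answer, retrieved_chunks, k=5, min_token_recall=0.6):
    """Return True if the top-k retrieved context covers the gold answer.

    A hit is either the whole normalized gold answer appearing in the
    context, or at least min_token_recall of its tokens appearing there.
    The threshold value is an assumption, not taken from the report.
    """
    gold_tokens = tokenize(gold_answer)
    if not gold_tokens:
        return False
    context_tokens = tokenize(" ".join(retrieved_chunks[:k]))
    # Pad with spaces so the phrase match respects token boundaries.
    padded_context = " " + " ".join(context_tokens) + " "
    if " " + " ".join(gold_tokens) + " " in padded_context:
        return True
    vocabulary = set(context_tokens)
    matched = sum(1 for token in gold_tokens if token in vocabulary)
    return matched / len(gold_tokens) >= min_token_recall


def hit_at_k(examples, k=5, min_token_recall=0.6):
    """Fraction of (gold_answer, ranked_chunks) pairs scored as hits."""
    if not examples:
        return 0.0
    hits = sum(is_hit(gold, chunks, k, min_token_recall) for gold, chunks in examples)
    return hits / len(examples)


# Hypothetical examples, not LoCoMo data.
examples = [
    ("Paris", ["She moved to Paris in 2019."]),
    ("a blue mountain bike", ["He bought a bike, blue, for the mountain trail"]),
    ("Tokyo", ["No relevant memory."]),
    ("Berlin", ["a", "b", "c", "d", "e", "Berlin trip"]),  # gold only at rank 6
]
score = hit_at_k(examples)
```

A rule like this rewards surfacing the right text, not producing the right answer, which is why it cannot be compared with LLM-as-Judge answer accuracy.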

Safe public claim: Lore v0.6 has measured local retrieval and latency behavior on LoCoMo-200. Unsafe claim: Lore is SOTA or beats Mem0, Zep, Letta, or Memobase.