BENCHMARK / PUBLIC-SAFE LAB REPORT

LoCoMo-200 retrieval-only results, with the caveats up front.

Lore Context v0.6 was rerun against a LoCoMo-derived 200-question sample. The results measure retrieval hit@5 over the top-5 retrieved contexts plus local API latency; they are not generated-answer accuracy and not a universal leaderboard claim.

Same-harness comparison · LoCoMo-200 · 200 QA pairs · retrieval-only

Metric                             Lore Context   Mem0 OSS (local)
Retrieval hit@5 (higher better)         47.5%            31.5%
P95 latency (lower better)            29.1 ms         709.8 ms
P99 latency (lower better)            59.0 ms        2087.8 ms
Query errors (lower better)                 6                0

Hit@5: higher is better. Latency and errors: lower is better. The Mem0 figures come from a non-optimized self-hosted OSS run, not an official Mem0 platform benchmark.
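The P95 and P99 figures above are percentiles over per-query latency samples. A minimal sketch of how such figures can be computed, using the nearest-rank method; the harness's actual percentile method is not specified here, and the sample latencies are illustrative, not measured values:

```python
import math

def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile for pct in (0, 100]."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]

# Hypothetical per-query latencies in milliseconds (not real data).
latencies_ms = [12.0, 15.5, 29.1, 18.2, 40.0, 22.3, 59.0, 17.1, 25.4, 30.9]

p95 = percentile(latencies_ms, 95)  # worst latency within the fastest 95%
p99 = percentile(latencies_ms, 99)
```

With only 10 samples, P95 and P99 both land on the slowest sample; a real run over 200 queries separates them, which is why the report lists both.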

External published references + Lore measured result · Lore retrieval-only result plus public LoCoMo answer-accuracy reports · mixed metrics · not directly comparable

System              Score    Source
Lore Context v0.6   47.5%    our retrieval-only run
Mem0 (new algo)     91.6%    Mem0, Apr 2026
Zep 30/30           80.32%   Zep retrieval tradeoff
Memobase v0.0.37    75.78%   Hindsight benchmark
Letta Filesystem    74.0%    Letta blog
Mem0 Graph          68.44%   Mem0 paper
LangMem             58.10%   Hindsight benchmark

The Lore row is our LoCoMo-200 retrieval-only hit@5 result. The other rows are external LoCoMo answer-accuracy reports. Mixed metrics are shown for market context, not as a direct ranking.

Dataset
LoCoMo-derived 200-question sample from 10 conversations. Raw LoCoMo conversations and questions are not redistributed because the dataset is CC BY-NC 4.0.
Method
Query top 5, then mark a retrieval hit when the gold answer or enough gold-answer tokens appear in the retrieved context. This is retrieval-only hit@5, not answer accuracy.
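The scoring rule above can be sketched as follows. This is an illustrative reconstruction, not the harness's actual code: the 0.6 token-overlap threshold and every helper name here are assumptions standing in for "enough gold-answer tokens":

```python
def tokenize(text: str) -> set[str]:
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    return {t.strip(".,;:!?\"'") for t in text.lower().split()} - {""}

def is_hit(gold_answer: str, retrieved_chunks: list[str],
           min_overlap: float = 0.6) -> bool:
    """A query counts as a hit@5 when the gold answer, or enough of its
    tokens, appears in any of the top-5 retrieved chunks."""
    gold_tokens = tokenize(gold_answer)
    for chunk in retrieved_chunks[:5]:
        if gold_answer.lower() in chunk.lower():
            return True  # exact containment of the gold answer
        if gold_tokens:
            overlap = len(gold_tokens & tokenize(chunk)) / len(gold_tokens)
            if overlap >= min_overlap:
                return True  # enough gold-answer tokens present
    return False

def hit_at_5(pairs: list[tuple[str, list[str]]]) -> float:
    """Fraction of (gold answer, retrieved chunks) pairs that score as hits."""
    return sum(is_hit(gold, chunks) for gold, chunks in pairs) / len(pairs)
```

Because this scores retrieved context rather than a generated answer, it can over-credit chunks that merely mention the answer tokens, which is one reason the report keeps hit@5 separate from answer accuracy.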
Same-harness result
Lore returned 47.5% hit@5 at 29.1 ms P95. The local Mem0 OSS comparison returned 31.5% hit@5 at 709.8 ms P95, but its optional NLP/BM25 paths were unavailable and long-chunk ingestion emitted warnings.
External context
Public Mem0, Zep, Letta, Memobase, and LangMem numbers mostly use generated answers plus LLM-as-Judge, so they are listed in the source report as reference points rather than ranked against Lore.
Source report
Read the public-safe report on GitHub.
Next
Publish a clean reproduction harness, fix Lore's benchmark 429s, run official Mem0/Zep/Letta paths, and add an LLM-as-Judge pass before making stronger market claims.

Safe public claim: Lore v0.6 has measured local retrieval and latency behavior on LoCoMo-200. Unsafe claim: Lore is SOTA or beats Mem0, Zep, Letta, or Memobase.