LoCoMo-200 retrieval-only results, with the caveats up front.
Lore Context v0.6 was rerun against a LoCoMo-derived 200-question sample. The result measures top-5 retrieval hit rate (hit@5) over retrieved context and local API latency; it is not generated-answer accuracy and not a universal leaderboard claim.
Bars share one axis within each metric. Hit@5: longer bars are better (higher hit rate). Latency and errors: shorter bars are better (lower values). Mem0 is a non-optimized self-hosted OSS run, not an official Mem0 platform benchmark.
The Lore row is our LoCoMo-200 retrieval-only hit@5 result. The other rows are external LoCoMo answer-accuracy reports. Mixed metrics are shown for market context, not as a direct ranking.
- Dataset
- LoCoMo-derived 200-question sample from 10 conversations. Raw LoCoMo conversations and questions are not redistributed because the dataset is CC BY-NC 4.0.
- Method
- Query the top 5 contexts per question, then mark a retrieval hit when the gold answer string, or a sufficient share of its tokens, appears in the retrieved context. This is retrieval-only hit@5, not answer accuracy.
- Same-harness result
- Lore returned 47.5% hit@5 at 29.1 ms P95. The local Mem0 OSS comparison returned 31.5% hit@5 at 709.8 ms P95, but its optional NLP/BM25 paths were unavailable and long-chunk ingestion emitted warnings.
- External context
- Public Mem0, Zep, Letta, Memobase, and LangMem numbers mostly use generated answers plus LLM-as-Judge, so they are listed in the source report as reference points rather than ranked against Lore.
- Source report
- Read the public-safe report on GitHub ↗
- Next
- Publish a clean reproduction harness, fix the HTTP 429 rate-limit errors in Lore's benchmark runs, run official Mem0/Zep/Letta paths, and add an LLM-as-Judge pass before making stronger market claims.
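The hit@5 criterion under Method can be sketched as below. The function names and the 0.8 token-overlap threshold are illustrative assumptions, not the harness's exact values:

```python
def is_hit(gold_answer: str, retrieved_contexts: list[str],
           token_threshold: float = 0.8) -> bool:
    """Retrieval-only hit check: exact substring match of the gold answer,
    or enough gold-answer tokens present in the concatenated top-5 contexts.
    The 0.8 threshold is an assumption, not the harness's exact value."""
    context = " ".join(retrieved_contexts).lower()
    gold = gold_answer.lower()
    if gold in context:
        return True
    tokens = gold.split()
    if not tokens:
        return False
    matched = sum(1 for t in tokens if t in context)
    return matched / len(tokens) >= token_threshold


def hit_at_5(samples: list[tuple[str, list[str]]]) -> float:
    """samples: (gold_answer, top-5 retrieved contexts) pairs.
    Returns the fraction of questions counted as retrieval hits."""
    hits = sum(is_hit(gold, contexts) for gold, contexts in samples)
    return hits / len(samples)
```

The substring check catches verbatim answers; the token-overlap fallback allows partial matches when the gold answer is phrased slightly differently in the retrieved chunk.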
Safe public claim: Lore v0.6 has measured local retrieval and latency behavior on LoCoMo-200. Unsafe claim: Lore is SOTA or beats Mem0, Zep, Letta, or Memobase.
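The P95 figures quoted above are 95th-percentile values over per-request latencies. A minimal sketch of how such a number is computed, using the nearest-rank convention (the harness may use a different interpolation method):

```python
import math


def p95(latencies_ms: list[float]) -> float:
    """95th-percentile latency via the nearest-rank method: sort the
    samples and take the value at the ceiling of 0.95 * n (1-based).
    One common convention; interpolating estimators give slightly
    different values on small samples."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    rank = math.ceil(0.95 * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]
```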