The Memory Tool Gold Rush (And Why You Should Be Skeptical)
Over the past three weeks, the Claude plugin ecosystem has exploded with memory tools. Claude-Mem hit 44,000 stars. A new entrant launched with 23,000 stars in two days. Mem0, Hindsight, Basic Memory -- the category is crowded, and every new tool announces itself with a benchmark: "96.6% recall," "SOTA performance," "100% accuracy" (later revised).
This is the software industry's oldest pattern: benchmark theater.
The problem is that none of those benchmarks answer the question you actually need answered: will this memory tool work where I work?
The Real Test: Both Platforms or Bust
Here is something the competitive marketing will not tell you: Claude exists in two execution environments, and they are fundamentally different.
Claude Code (CLI on your Mac) runs commands, installs packages, reads files from your filesystem. It uses settings.json to load MCP servers. Claude Cowork (cloud-based AI agent) does not read your local settings.json. It loads MCP servers exclusively through the plugin system (.plugin files).
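For Claude Code, MCP server registration lives in a JSON config. A minimal sketch of what that registration looks like -- the `mcpServers` key follows the common MCP client convention, and the server name and package here are illustrative, not a real tool:

```json
{
  "mcpServers": {
    "memory": {
      "command": "npx",
      "args": ["-y", "example-memory-mcp-server"]
    }
  }
}
```

Claude Cowork never reads this file, which is exactly the fragmentation problem: the same tool needs a second, separate packaging as a plugin to exist there at all.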
This means a memory tool must work in both environments, or it does not actually work for you. A tool that exists in only one place remembers things only when you happen to be in the right context, so using it becomes something you must consciously opt into -- which defeats the purpose of a memory tool.
Most of the new tools in the space only work in Claude Code. Some only work as a Claude.ai web feature. One or two claim multi-platform support but have not shipped Cowork support yet.
The hard truth: if your memory tool does not work in both Claude Code and Claude Cowork, you are choosing to abandon it whenever you switch contexts. That is not memory. That is a toy.
Cowork Compatibility Checklist
Before you choose a memory tool, ask these questions:
- Does it have a Claude plugin (.plugin file)? Not just an MCP server, but an actual plugin package?
- Does it work in Claude Cowork? Can you actually call the memory tools from a Cowork session?
- Does it work in Claude Code? Via settings.json MCP server registration?
- What happens when you switch between them? If you save something in Code, can you access it from Cowork without re-configuring anything?
- Is the GitHub marketplace listing current? Or is the documentation out of date by a month?
Most tools fail question 2, question 3, or both. They work in one place or the other -- not both.
What Benchmarks Actually Measure (And What They Do Not)
A high benchmark number on LongMemEval or similar tests tells you the tool can recall information accurately under controlled conditions. It does not tell you:
- Will it work with my workflow? It was benchmarked on synthetic data, not your real Claude sessions.
- Will it work where I actually use Claude? Both Code and Cowork, or just one?
- Will it stay compatible? If the tool is new, has it survived platform updates?
- Is the team trustworthy? Does the developer stand behind the claims, or do they revise benchmarks down from "100% perfect" after launch feedback?
- What are the failure modes? How does it behave when things go wrong?
- Can I actually switch tools later? Is there a way to export my data if I change my mind?
These are the questions that matter. A 96.6% recall score tells you almost nothing about them.
The Obsidian Parallel: Why You Trust One Ecosystem More Than Another
Think about Obsidian. It is not the most feature-rich note-taking tool. It is not backed by a major company. But it has earned trust in the knowledge-worker community because of predictability: your files are markdown. You own them. If Obsidian disappears tomorrow, your notes are still there. There is no vendor lock-in.
Memory tools have the same trust problem, but most of them solve it poorly:
- Vendor lock-in: your memory lives in someone else's database.
- Format opacity: you cannot easily export or inspect what they are storing.
- Platform fragmentation: you need different tools for Code vs Cowork vs other AI interfaces.
- Trust in the dev team: if they benchmark-hype at launch, what else are they hiding?
LoreConvo was built on the Obsidian principle: all memory lives in a SQLite database on your machine. You own it. You can query it directly. It works in both Claude Code (via settings.json) and Claude Cowork (via plugin). And because it is built on open-source SQLite with no proprietary search backend, there is no surprise OOM failure waiting in production.
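Because the store is plain SQLite, inspecting or exporting it needs nothing beyond the standard library. A minimal sketch -- the `memories` table and its columns are a hypothetical schema for illustration, not LoreConvo's documented layout, and a real session would open the `.db` file on disk rather than an in-memory database:

```python
import sqlite3

# Hypothetical schema for illustration; the real table names may differ.
conn = sqlite3.connect(":memory:")  # in practice: sqlite3.connect("lore.db")
conn.execute(
    "CREATE TABLE memories (id INTEGER PRIMARY KEY, session TEXT, "
    "created_at TEXT, content TEXT)"
)
conn.execute(
    "INSERT INTO memories (session, created_at, content) VALUES (?, ?, ?)",
    ("2024-06-01-code", "2024-06-01T10:00:00", "Prefers pytest over unittest"),
)

# Owning the data means inspection and export are one query away.
rows = conn.execute(
    "SELECT session, content FROM memories ORDER BY created_at"
).fetchall()
for session, content in rows:
    print(f"{session}: {content}")
```

That one-query export path is the whole point: if you ever switch tools, your memory is a `SELECT` statement away, not a support ticket.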
Is LoreConvo the fastest-growing memory tool? No. Does it have the most stars? No. But it will still work on your machine in six months when the viral tools are pivoting to v2.
The Better Question
Stop asking: "What is the benchmark score?"
Ask instead:
- Does it work in both Claude Code and Claude Cowork? If no, keep looking.
- Who owns my data, and in what format? If it is proprietary, be skeptical.
- How long has this team been shipping? Day-old projects with viral marketing are risky.
- What is the failure mode? Not the happy path -- what breaks when it breaks?
Choose the tool that works for your workflow, not the one with the highest benchmark.
Quick Comparison: Cowork Compatibility Edition
| Tool | Claude Code | Claude Cowork | Format | Notes |
|---|---|---|---|---|
| LoreConvo | Yes | Yes | SQLite (local) | Both platforms. Self-hosted. Free tier: 50 sessions. |
| Claude-Mem | Yes | Not yet | SQLite (local) | Code only. High growth but Cowork support TBD. |
| MemPalace | Yes | Not yet | ChromaDB + SQLite KG (local) | Code only. Known OOM issues at scale. |
| Basic Memory | Yes | Not yet | Markdown files (local) | Code only. New FastMCP 3.0 with semantic search. |
| Hindsight | Yes | Yes (partial) | Proprietary | Recent Claude Code integration. Cloud-dependent. |
The Cowork gap is real, and it matters.
Explore LoreConvo on the tools page
Labyrinth Analytics Consulting builds tools for AI-native knowledge work. Learn more at labyrinthanalyticsconsulting.com.