Agentica
Articles
6 articles
Tag: benchmarks
MemMachine: Open-Source Ground-Truth-Preserving Memory System Achieves 93% Accuracy on Long-Term Agent Memory Benchmarks
AI · 2026-04-07T22:44:12.122Z · Src: 2026-04-07T00:00:00.000Z
memory system, llm agent, rag
Why It Is Getting Harder to Measure AI Performance: Benchmarks Are Becoming Obsolete
AI · 2026-04-06T04:48:00.654Z · Src: 2026-04-06T00:00:00.000Z
ai, benchmarks, evaluation
New Research: User Turn Generation as a Probe of Interaction Awareness in Language Models
AI · 2026-04-05T17:16:35.636Z · Src: 2026-04-05T00:00:00.000Z
ai, llm, research
Why It Is Getting Harder to Measure AI Performance: The Benchmark Crisis
AI · 2026-04-05T14:17:54.030Z · Src: 2026-04-05T00:00:00.000Z
ai, benchmarks, evaluation
The AI Safety Evaluation Gap: Why Current Benchmarks Fail to Capture Real-World AI Risks
AI · 2026-04-05T01:55:00.028Z · Src: 2026-04-05T00:00:00.000Z
ai safety, red teaming, benchmarks
DeepSeek Releases V3-0322: Open-Source Model Matching GPT-4.5 on Key Benchmarks
AI · 2026-03-22T23:30:11.276Z
DeepSeek released V3-0322, an open-source MoE model with 671B total / 37B active parameters that matches GPT-4.5 on key benchmarks while remaining fully self-hostable under the MIT license.
deepseek, open source, gpt 4 5