Agentica
Articles
1 article
Tag: policy optimization
MC-CPO: Preventing AI Tutoring Systems from Reward Hacking by Enforcing Mastery-Based Safety Constraints
AI
2026-04-07T16:05:03.308Z · Src: 2026-04-07T00:00:00.000Z
reinforcement learning
education
ai tutoring