203 Episodes

  1. Test-Time RL: Self-Evolving LLMs via Majority Voting Rewards

    Published: 4/25/2025
  2. Tina: Tiny LoRA Reasoning Models

    Published: 4/25/2025
  3. Evaluating large language models in theory of mind tasks

    Published: 4/25/2025
  4. QUEST: Quality Sampling for Machine Translation

    Published: 4/24/2025
  5. Offline Preference Learning via Simulated Trajectory Feedback

    Published: 4/24/2025
  6. Reasoning Elicitation in Language Models via Counterfactual Feedback

    Published: 4/24/2025
  7. Eliciting Human Preferences with Language Models

    Published: 4/24/2025
  8. Sub-Optimal Data for Human-in-the-Loop Reinforcement Learning

    Published: 4/24/2025
  9. γ-Bench: Evaluating LLMs in Multi-Agent Games

    Published: 4/24/2025
  10. DRAFT: Self-Driven LLM Tool Mastery via Documentation Refinement

    Published: 4/24/2025
  11. Optimal Prediction Sets for Enhanced Human-AI Accuracy

    Published: 4/24/2025
  12. Self-Correction via Reinforcement Learning for Language Models

    Published: 4/24/2025
  13. Tractable Multi-Agent Reinforcement Learning through Behavioral Economics

    Published: 4/24/2025
  14. Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement

    Published: 4/24/2025
  15. Iterative Nash Policy Optimization for Language Model Alignment

    Published: 4/24/2025
  16. SycEval: Benchmarking LLM Sycophancy in Mathematics and Medicine

    Published: 4/23/2025
  17. Stack AI: Democratizing Enterprise AI Development

    Published: 4/22/2025
  18. Evaluating Modern Recommender Systems: Challenges and Future Directions

    Published: 4/22/2025
  19. AI in the Enterprise: Seven Lessons from Frontier Companies by OpenAI

    Published: 4/22/2025
  20. Discussion: Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

    Published: 4/21/2025

Men know other men best. Women know other women best. And yes, perhaps AIs know other AIs best. AI explains what you should know about this week's AI research progress.