GRPO vs PPO: How Modern LLMs Learn from Feedback

GRPO replaced PPO in post-training for DeepSeek-R1 and Gemini. This article covers the math, the training pipeline, the design trade-offs, and when each method wins.

Published May 16, 2026
16 min read