Posted 2026-02-27 · Updated 2026-03-11 · Artificial Intelligence · 8 minutes read (About 1130 words)
Business Trip: 2601-2602 verl + DanceGRPO
Introduction: ZJ internal business trip. Built t2v RL from zero to one with verl + MindSpeed MM + the DanceGRPO algorithm, and achieved a fast, sustained rise in reward.
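Posted 2026-01-27 · Updated 2026-03-11 · Artificial Intelligence · 38 minutes read (About 5667 words)
AI Post-Training: DanceGRPO
Introduction: DanceGRPO is a paper published in May 2025 that brings the GRPO method to the generation domain (flowGRPO is a similar work). The ByteDance customer has built a heavily modified version on top of it, hence this study.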
2026-02-05 · The Mechanics of RL: How Inference Sampling Shapes the Probability Landscape · Artificial Intelligence