SHAOJIE'S BOOK

Posted 2026-02-27 | Updated 2026-03-11 | Artificial Intelligence | 8 minutes read (About 1130 words)

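Introduction

During an internal business trip to ZJ, I built text-to-video (t2v) RL from scratch using verl + MindSpeed MM + the DanceGRPO algorithm, and got the reward to rise quickly and steadily.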

Posted 2026-01-27 | Updated 2026-03-11 | Artificial Intelligence | 38 minutes read (About 5667 words)

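Introduction

DanceGRPO is a paper published in May 2025 that brings the GRPO method to the generation domain (flowGRPO is a similar work). Our ByteDance customer has built a heavily modified version on top of it, so I studied it.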