Posted 2026-02-27 | Updated 2026-03-11 | Artificial Intelligence | 8 minutes read (About 1130 words)
Business Trip: 2601-2602 verl + DanceGRPO
Introduction: An internal business trip to ZJ, taking t2v RL with verl + MindSpeed MM + the DanceGRPO algorithm from zero to one, with the reward rising quickly and steadily.
2026-02-05 | The Mechanics of RL: How Inference Sampling Shapes the Probability Landscape | Artificial Intelligence