The Secret to AI's Growth: Striking the Right Balance Between “Reward” and “Punishment”

[LG] Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards
[FAIR at Meta]
arxiv.org

Info