AI可可AI生活

The Secret to AI Growth: How to Strike the Right Balance Between "Reward" and "Punishment"

[LG] Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards  
[FAIR at Meta]  
arxiv.org