It helps to be familiar with the TRPO algorithm before reading this post. I have also written a blog post about it: https://blog.csdn.net/tianjuewudi/article/details/120191097
Reference: Professor Hung-yi Lee's (李宏毅) video lectures: https://www.bilibi
2021-09-08