PPO Snake AI Report & weights after training
Intro
This experiment trains an artificial intelligence agent to play Snake using deep reinforcement learning algorithms (DQN and PPO). The agent (the snake) acts within the game world; its state includes the coordinates of the snake's head, the list of coordinates of its body, the head's direction, and the coordinates of the food. The reward is based on the score for eating food, winning, or losing. The environment is simulated with the PyGame framework, and the reward parameters are varied across rounds (the reward for eating food is held constant while the death penalty is gradually increased) to observe the effect on training. The results show that a larger death penalty raises the average score, whereas the policy trained with a lower death penalty scores worse during training yet performs well in live demonstrations. Future work will add a penalty for excessive twisting of the snake's body to optimize its movement path, and will integrate the saved model into a C++ framework.
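As an illustration of the state described above, the following is a minimal sketch of how those four components could be grouped in Python. The class name `SnakeState`, the grid-coordinate convention, and the `as_vector` flattening are assumptions made for this example; the report does not say how the project actually encodes observations.

```python
from dataclasses import dataclass
from typing import List, Tuple

Cell = Tuple[int, int]  # (x, y) grid coordinate; the convention is an assumption


@dataclass
class SnakeState:
    """Hypothetical container for the state components listed in the intro."""
    head: Cell          # coordinates of the snake's head
    body: List[Cell]    # coordinates of the body segments
    direction: Cell     # heading as a unit step, e.g. (1, 0) for "right"
    food: Cell          # coordinates of the food

    def as_vector(self) -> List[int]:
        # Illustrative flattening only: head, direction, food, body length.
        return [*self.head, *self.direction, *self.food, len(self.body)]


state = SnakeState(head=(5, 5), body=[(4, 5), (3, 5)], direction=(1, 0), food=(8, 2))
features = state.as_vector()
```

A fixed-length vector like this is one common way to feed a variable-length snake into a small policy network; an image-like grid encoding would be an equally plausible choice.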
Usage
Model download
from modelscope import snapshot_download
model_dir = snapshot_download('Genius-Society/SnakeAI')
Maintenance
git clone [email protected]:Genius-Society/SnakeAI
cd SnakeAI
Training curve
Round | 1 | 2 | 3 |
---|---|---|---|
Training curve | | | |
Evaluation | | | |
Reward_eat | +2.0 | +2.0 | +2.0 |
Reward_hit | -0.5 | -1.0 | -1.5 |
Reward_bit | -0.8 | -1.5 | -2.0 |
Avg record | ≈19 | ≈23 | ≈28 |
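The reward settings in the table can be written down as a small lookup, sketched below. The names `REWARDS` and `step_reward` are hypothetical. Reading `Reward_hit` as hitting a wall and `Reward_bit` as the snake biting itself is an interpretation of the labels, not something the report states, and so is the zero reward for any other step.

```python
# Reward values copied from the table, keyed by training round.
# Assumed meaning: "hit" = hitting a wall, "bit" = the snake biting itself.
REWARDS = {
    1: {"eat": 2.0, "hit": -0.5, "bit": -0.8},
    2: {"eat": 2.0, "hit": -1.0, "bit": -1.5},
    3: {"eat": 2.0, "hit": -1.5, "bit": -2.0},
}


def step_reward(round_id: int, event: str) -> float:
    """Return the reward for an event in the given round.

    Events that are not in the table (e.g. an ordinary move) return 0.0,
    which is an assumption made for this sketch.
    """
    return REWARDS[round_id].get(event, 0.0)


# The eating reward is constant while both death penalties grow per round.
eat_rewards = [step_reward(r, "eat") for r in (1, 2, 3)]
hit_penalties = [step_reward(r, "hit") for r in (1, 2, 3)]
```

Keeping the values in one table per round makes it easy to rerun training with a different penalty schedule without touching the environment code.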
Mirror
https://www.modelscope.cn/models/Genius-Society/SnakeAI