TaherFattahi/tetris-neural-network-Q-learning

Reinforcement Learning

TaherFattahi committed 16 days ago

Commit abe6547 · 1 Parent(s): 793e710

update: readme.md

Files changed (2)

README.md +1 -5
images/standard-deviation.png +0 -0

README.md CHANGED

@@ -56,11 +56,7 @@ The reward function for each action is based on two parts:
    - A higher standard deviation means one column may be much taller or shorter than others, which is undesirable in Tetris.
    - By *subtracting* this standard deviation from the occupancy-based reward, the agent is penalized for building unevenly and is encouraged to keep the board as level as possible.
-In other words:
-\[
-\text{Reward} = \text{OccupiedSquares} - \alpha \times \text{StdDev}(\text{ColumnDepths})
-\]
+<img src="images/standard-deviation.png" />
 Where \( \alpha \) is a weighting factor (in this case effectively 1, or any scalar you choose) that determines the penalty's intensity. This keeps the board balanced and helps the agent learn a more efficient Tetris strategy.

images/standard-deviation.png ADDED
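
Since this commit replaces the inline formula with an image, the reward described in the README hunk (occupied squares minus `alpha` times the standard deviation of the column depths) can also be sketched in code. This is a minimal illustration, not the repository's implementation. It assumes the board is a 2D grid of 0/1 cells with row 0 at the top, that a column's "depth" is the number of empty cells above its highest block, and that the population standard deviation is intended. The names `column_depths` and `reward` are hypothetical.

```python
import numpy as np


def column_depths(board):
    """Depth of each column: empty cells above its topmost block.

    Assumption: row 0 is the top of the board. A completely empty
    column gets a depth equal to the board height.
    """
    board = np.asarray(board, dtype=bool)
    rows = board.shape[0]
    has_block = board.any(axis=0)
    # argmax returns the first True along the axis (0 if none exists),
    # so empty columns are patched with the full height below.
    first_filled = board.argmax(axis=0)
    return np.where(has_block, first_filled, rows)


def reward(board, alpha=1.0):
    """OccupiedSquares - alpha * StdDev(ColumnDepths).

    Assumption: population standard deviation (NumPy's default, ddof=0).
    """
    board = np.asarray(board, dtype=bool)
    occupied = int(board.sum())
    return occupied - alpha * float(np.std(column_depths(board)))


# Two boards with the same number of occupied squares.
flat = [[0, 0, 0, 0],
        [0, 0, 0, 0],
        [1, 1, 1, 1]]
uneven = [[1, 0, 0, 0],
          [1, 0, 0, 0],
          [1, 0, 0, 1]]

print(reward(flat), reward(uneven))
```

Both boards hold four blocks, but the level one scores higher because its column depths are identical and the penalty term is zero. Raising `alpha` increases how strongly uneven stacking is punished relative to occupancy.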