TaherFattahi committed on
Commit abe6547 · 1 Parent(s): 793e710

update: readme.md

Files changed (2)
  1. README.md +1 -5
  2. images/standard-deviation.png +0 -0
README.md CHANGED
@@ -56,11 +56,7 @@ The reward function for each action is based on two parts:
  - A higher standard deviation means one column may be much taller or shorter than others, which is undesirable in Tetris.
  - By *subtracting* this standard deviation from the occupancy-based reward, the agent is penalized for building unevenly and is encouraged to keep the board as level as possible.
 
- In other words:
-
- \[
- \text{Reward} = \text{OccupiedSquares} - \alpha \times \text{StdDev}(\text{ColumnDepths})
- \]
+ <img src="images/standard-deviation.png" />
 
  Where \( \alpha \) is a weighting factor (in this case effectively 1, or any scalar you choose) that determines the penalty's intensity. This keeps the board balanced and helps the agent learn a more efficient Tetris strategy.
 
images/standard-deviation.png ADDED
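
For readers of this diff, the formula that the commit moves into an image (Reward = OccupiedSquares - α × StdDev(ColumnDepths)) can be sketched in Python as follows. This is only an illustration, not the repository's implementation: the board representation (a list of rows holding 0 for empty and 1 for occupied, top row first), the helper names `column_depths` and `reward`, the definition of depth as the number of empty cells above a column's topmost block, and the use of the population standard deviation are all assumptions.

```python
from statistics import pstdev


def column_depths(board):
    """Depth of each column: empty cells above its topmost occupied cell.

    Assumption: board[0] is the top row; an empty column has depth len(board).
    """
    rows = len(board)
    depths = []
    for col in zip(*board):  # iterate over columns
        depth = next((r for r, cell in enumerate(col) if cell), rows)
        depths.append(depth)
    return depths


def reward(board, alpha=1.0):
    """Occupied squares minus alpha times the std. dev. of column depths."""
    occupied = sum(cell for row in board for cell in row)
    # Assumption: population standard deviation (pstdev), not sample stdev.
    return occupied - alpha * pstdev(column_depths(board))


flat = [[0, 0, 0],
        [1, 1, 1]]
uneven = [[1, 0, 0],
          [1, 0, 0],
          [1, 1, 1]]

print(reward(flat))    # level surface, so no penalty is subtracted
print(reward(uneven))  # the tall left column incurs a penalty
```

With `alpha=1.0` the penalty is applied at full strength, matching the "effectively 1" wording in the README; a larger value would punish uneven surfaces more heavily.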