TaherFattahi committed
Commit abe6547 · Parent(s): 793e710
update: readme.md
- README.md +1 -5
- images/standard-deviation.png +0 -0
README.md CHANGED
@@ -56,11 +56,7 @@ The reward function for each action is based on two parts:
 - A higher standard deviation means one column may be much taller or shorter than others, which is undesirable in Tetris.
 - By *subtracting* this standard deviation from the occupancy-based reward, the agent is penalized for building unevenly and is encouraged to keep the board as level as possible.
 
-
-
-\[
-\text{Reward} = \text{OccupiedSquares} - \alpha \times \text{StdDev}(\text{ColumnDepths})
-\]
+<img src="images/standard-deviation.png" />
 
 Where \( \alpha \) is a weighting factor (in this case effectively 1, or any scalar you choose) that determines the penalty's intensity. This keeps the board balanced and helps the agent learn a more efficient Tetris strategy.
 
images/standard-deviation.png ADDED
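This commit moves the reward formula, Reward = OccupiedSquares - α × StdDev(ColumnDepths), out of inline LaTeX and into `images/standard-deviation.png`. The following is a minimal sketch of that formula. It is not the repository's code. The board layout is a hypothetical choice: a list of rows from top to bottom, with 1 for a filled cell and 0 for an empty one. The helpers `column_depths` and `reward` are hypothetical names. A few more details are also assumptions: a column's depth counts the empty cells above its topmost filled cell, an empty column's depth equals the board height, and the standard deviation is the population form (`statistics.pstdev`). Height equals rows minus depth, so the standard deviation is the same whether you measure depths or heights.

```python
from statistics import pstdev
from typing import List

# Hypothetical board layout (assumption): rows listed top-to-bottom, 1 = filled, 0 = empty.
Board = List[List[int]]


def column_depths(board: Board) -> List[int]:
    """Per-column depth: number of empty cells above the topmost filled cell.

    Assumption: a completely empty column has depth equal to the board height.
    """
    rows, cols = len(board), len(board[0])
    depths = []
    for c in range(cols):
        depth = rows  # used if no filled cell is found in this column
        for r in range(rows):
            if board[r][c]:
                depth = r  # r empty cells sit above this first filled cell
                break
        depths.append(depth)
    return depths


def reward(board: Board, alpha: float = 1.0) -> float:
    """Reward = OccupiedSquares - alpha * StdDev(ColumnDepths), population std dev."""
    occupied = sum(sum(row) for row in board)  # counts cells equal to 1
    return occupied - alpha * pstdev(column_depths(board))


# Two example boards with the same number of occupied cells (4 each).
FLAT = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
TOWER = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]

if __name__ == "__main__":
    print(column_depths(FLAT), reward(FLAT))
    print(column_depths(TOWER), reward(TOWER))
```

With the default α = 1, both boards have four occupied cells. The flat row has depths [3, 3, 3, 3] and keeps the full reward of 4.0. The single tower has depths [0, 4, 4, 4], a standard deviation of √3 ≈ 1.73, and a reward of about 2.27. Raising `alpha` increases that gap.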