Commit c347cec by TaherFattahi
Parent(s): abe6547
update: readme.md
README.md CHANGED
@@ -58,7 +58,7 @@ The reward function for each action is based on two parts:
 
 <img src="images/standard-deviation.png" />
 
-Where
+Where alpha is a weighting factor (in this case effectively 1, or any scalar you choose) that determines the penalty's intensity. This keeps the board balanced and helps the agent learn a more efficient Tetris strategy.
 
 ## Installation & Usage
 1. Clone this repo or download the source code.
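The line added in this commit describes alpha as a weight on a penalty shown in `images/standard-deviation.png`. The image itself is not reproduced here, so the following is only a minimal sketch under an assumption: that the penalty is alpha times the standard deviation of the board's column heights. The function name `height_penalty` and the `column_heights` argument are hypothetical and do not come from the repository.

```python
import statistics


def height_penalty(column_heights, alpha=1.0):
    """Assumed penalty: alpha times the population standard deviation
    of the column heights. A flat board gives 0; an uneven board gives
    a larger value."""
    return alpha * statistics.pstdev(column_heights)


# A perfectly level board incurs no penalty.
flat_board = [3, 3, 3, 3]
# Alternating empty and tall columns are penalised.
bumpy_board = [0, 6, 0, 6]

print(height_penalty(flat_board))
print(height_penalty(bumpy_board))
print(height_penalty(bumpy_board, alpha=0.5))
```

With alpha set to 1 the penalty equals the raw standard deviation; a smaller alpha softens the penalty and a larger one makes uneven boards more costly, which matches the README's description of alpha as controlling the penalty's intensity.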