Commit · 04fbc23
1 Parent(s): 01e0e56
Upload README.md with huggingface_hub
README.md CHANGED
@@ -1,86 +0,0 @@
----
-library_name: stable-baselines3
-tags:
-- PandaReach-v1
-- deep-reinforcement-learning
-- reinforcement-learning
-- stable-baselines3
-model-index:
-- name: TQC
-  results:
-  - task:
-      type: reinforcement-learning
-      name: reinforcement-learning
-    dataset:
-      name: PandaReach-v1
-      type: PandaReach-v1
-    metrics:
-    - type: mean_reward
-      value: -2.10 +/- 0.70
-      name: mean_reward
-      verified: false
----
-
-# **TQC** Agent playing **PandaReach-v1**
-This is a trained model of a **TQC** agent playing **PandaReach-v1**
-using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3)
-and the [RL Zoo](https://github.com/DLR-RM/rl-baselines3-zoo).
-
-The RL Zoo is a training framework for Stable Baselines3
-reinforcement learning agents,
-with hyperparameter optimization and pre-trained agents included.
-
-## Usage (with SB3 RL Zoo)
-
-RL Zoo: https://github.com/DLR-RM/rl-baselines3-zoo<br/>
-SB3: https://github.com/DLR-RM/stable-baselines3<br/>
-SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
-
-Install the RL Zoo (with SB3 and SB3-Contrib):
-```bash
-pip install rl_zoo3
-```
-
-```
-# Download model and save it into the logs/ folder
-python -m rl_zoo3.load_from_hub --algo tqc --env PandaReach-v1 -orga qgallouedec -f logs/
-python -m rl_zoo3.enjoy --algo tqc --env PandaReach-v1 -f logs/
-```
-
-If you installed the RL Zoo3 via pip (`pip install rl_zoo3`), from anywhere you can do:
-```
-python -m rl_zoo3.load_from_hub --algo tqc --env PandaReach-v1 -orga qgallouedec -f logs/
-python -m rl_zoo3.enjoy --algo tqc --env PandaReach-v1 -f logs/
-```
-
-## Training (with the RL Zoo)
-```
-python -m rl_zoo3.train --algo tqc --env PandaReach-v1 -f logs/
-# Upload the model and generate video (when possible)
-python -m rl_zoo3.push_to_hub --algo tqc --env PandaReach-v1 -f logs/ -orga qgallouedec
-```
-
-## Hyperparameters
-```python
-OrderedDict([('batch_size', 256),
-             ('buffer_size', 1000000),
-             ('ent_coef', 'auto'),
-             ('env_wrapper', 'sb3_contrib.common.wrappers.TimeFeatureWrapper'),
-             ('gamma', 0.95),
-             ('learning_rate', 0.001),
-             ('learning_starts', 1000),
-             ('n_timesteps', 20000.0),
-             ('normalize', True),
-             ('policy', 'MultiInputPolicy'),
-             ('policy_kwargs', 'dict(net_arch=[64, 64], n_critics=1)'),
-             ('replay_buffer_class', 'HerReplayBuffer'),
-             ('replay_buffer_kwargs',
-              "dict( online_sampling=True, goal_selection_strategy='future', "
-              'n_sampled_goal=4 )'),
-             ('normalize_kwargs', {'norm_obs': True, 'norm_reward': False})])
-```
-
-# Environment Arguments
-```python
-{'render': True}
-```
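
In the removed README's hyperparameter dump, `policy_kwargs` and `replay_buffer_kwargs` are stored as strings of the form `dict(...)` rather than as real dictionaries. The sketch below shows one way such a string could be turned into a plain Python dict for inspection without calling `eval()`. The helper name `parse_dict_call` is hypothetical and is not part of the RL Zoo or Stable Baselines3; it assumes the string is a single `dict(...)` call whose keyword values are all literals.

```python
import ast


def parse_dict_call(text: str) -> dict:
    """Parse a string such as "dict(a=1, b=[2, 3])" into a dict.

    Hypothetical helper for illustration only: it accepts a single
    dict(...) call with keyword arguments whose values are literals.
    """
    node = ast.parse(text.strip(), mode="eval").body
    is_dict_call = (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "dict"
        and not node.args
        and all(kw.arg is not None for kw in node.keywords)
    )
    if not is_dict_call:
        raise ValueError(f"expected dict(...) with keyword arguments, got: {text!r}")
    # ast.literal_eval only accepts literals (numbers, strings, lists, ...),
    # so function calls or attribute access inside the values are rejected.
    return {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}


# The two string-valued entries, copied from the hyperparameter dump.
policy_kwargs = parse_dict_call("dict(net_arch=[64, 64], n_critics=1)")
replay_buffer_kwargs = parse_dict_call(
    "dict( online_sampling=True, goal_selection_strategy='future', "
    "n_sampled_goal=4 )"
)

print(policy_kwargs)
print(replay_buffer_kwargs)
```

Restricting the values to literals is a deliberate choice for this sketch: it is enough for the two entries shown here, but it would reject strings that reference classes or functions.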