[2024-09-11 16:15:08,895][00309] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-09-11 16:15:08,898][00309] Rollout worker 0 uses device cpu
[2024-09-11 16:15:08,899][00309] Rollout worker 1 uses device cpu
[2024-09-11 16:15:08,901][00309] Rollout worker 2 uses device cpu
[2024-09-11 16:15:08,902][00309] Rollout worker 3 uses device cpu
[2024-09-11 16:15:08,903][00309] Rollout worker 4 uses device cpu
[2024-09-11 16:15:08,905][00309] Rollout worker 5 uses device cpu
[2024-09-11 16:15:08,906][00309] Rollout worker 6 uses device cpu
[2024-09-11 16:15:08,907][00309] Rollout worker 7 uses device cpu
[2024-09-11 16:15:09,049][00309] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-11 16:15:09,051][00309] InferenceWorker_p0-w0: min num requests: 2
[2024-09-11 16:15:09,084][00309] Starting all processes...
[2024-09-11 16:15:09,085][00309] Starting process learner_proc0
[2024-09-11 16:15:09,134][00309] Starting all processes...
[2024-09-11 16:15:09,142][00309] Starting process inference_proc0-0
[2024-09-11 16:15:09,143][00309] Starting process rollout_proc0
[2024-09-11 16:15:09,143][00309] Starting process rollout_proc1
[2024-09-11 16:15:09,143][00309] Starting process rollout_proc2
[2024-09-11 16:15:09,143][00309] Starting process rollout_proc3
[2024-09-11 16:15:09,143][00309] Starting process rollout_proc4
[2024-09-11 16:15:09,143][00309] Starting process rollout_proc5
[2024-09-11 16:15:09,143][00309] Starting process rollout_proc6
[2024-09-11 16:15:09,143][00309] Starting process rollout_proc7
[2024-09-11 16:15:20,175][02996] Worker 3 uses CPU cores [1]
[2024-09-11 16:15:20,251][02980] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-11 16:15:20,251][02980] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-09-11 16:15:20,307][02980] Num visible devices: 1
[2024-09-11 16:15:20,341][02980] Starting seed is not provided
[2024-09-11 16:15:20,342][02980] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-11 16:15:20,342][02980] Initializing actor-critic model on device cuda:0
[2024-09-11 16:15:20,343][02980] RunningMeanStd input shape: (3, 72, 128)
[2024-09-11 16:15:20,344][02980] RunningMeanStd input shape: (1,)
[2024-09-11 16:15:20,385][02994] Worker 1 uses CPU cores [1]
[2024-09-11 16:15:20,430][02980] ConvEncoder: input_channels=3
[2024-09-11 16:15:20,581][02999] Worker 5 uses CPU cores [1]
[2024-09-11 16:15:20,599][02997] Worker 2 uses CPU cores [0]
[2024-09-11 16:15:20,613][03000] Worker 6 uses CPU cores [0]
[2024-09-11 16:15:20,639][03001] Worker 7 uses CPU cores [1]
[2024-09-11 16:15:20,646][02995] Worker 0 uses CPU cores [0]
[2024-09-11 16:15:20,641][02998] Worker 4 uses CPU cores [0]
[2024-09-11 16:15:20,673][02993] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-11 16:15:20,673][02993] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-09-11 16:15:20,691][02993] Num visible devices: 1
[2024-09-11 16:15:20,776][02980] Conv encoder output size: 512
[2024-09-11 16:15:20,776][02980] Policy head output size: 512
[2024-09-11 16:15:20,793][02980] Created Actor Critic model with architecture:
[2024-09-11 16:15:20,793][02980] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-09-11 16:15:25,073][02980] Using optimizer <class 'torch.optim.adam.Adam'>
[2024-09-11 16:15:25,074][02980] No checkpoints found
[2024-09-11 16:15:25,075][02980] Did not load from checkpoint, starting from scratch!
[2024-09-11 16:15:25,075][02980] Initialized policy 0 weights for model version 0
[2024-09-11 16:15:25,078][02980] LearnerWorker_p0 finished initialization!
[2024-09-11 16:15:25,080][02980] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-11 16:15:25,278][02993] RunningMeanStd input shape: (3, 72, 128)
[2024-09-11 16:15:25,280][02993] RunningMeanStd input shape: (1,)
[2024-09-11 16:15:25,297][02993] ConvEncoder: input_channels=3
[2024-09-11 16:15:25,403][02993] Conv encoder output size: 512
[2024-09-11 16:15:25,403][02993] Policy head output size: 512
[2024-09-11 16:15:25,518][00309] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-11 16:15:26,939][00309] Inference worker 0-0 is ready!
[2024-09-11 16:15:26,941][00309] All inference workers are ready! Signal rollout workers to start!
[2024-09-11 16:15:27,026][03000] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-11 16:15:27,044][02995] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-11 16:15:27,050][02997] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-11 16:15:27,056][02998] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-11 16:15:27,057][02996] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-11 16:15:27,064][03001] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-11 16:15:27,067][02999] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-11 16:15:27,075][02994] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-11 16:15:28,441][02994] Decorrelating experience for 0 frames...
[2024-09-11 16:15:28,441][02995] Decorrelating experience for 0 frames...
[2024-09-11 16:15:28,521][02996] Decorrelating experience for 0 frames...
[2024-09-11 16:15:29,044][00309] Heartbeat connected on Batcher_0
[2024-09-11 16:15:29,047][00309] Heartbeat connected on LearnerWorker_p0
[2024-09-11 16:15:29,086][00309] Heartbeat connected on InferenceWorker_p0-w0
[2024-09-11 16:15:29,470][02994] Decorrelating experience for 32 frames...
[2024-09-11 16:15:29,629][02996] Decorrelating experience for 32 frames...
[2024-09-11 16:15:29,639][02995] Decorrelating experience for 32 frames...
[2024-09-11 16:15:29,718][02998] Decorrelating experience for 0 frames...
[2024-09-11 16:15:30,303][02998] Decorrelating experience for 32 frames...
[2024-09-11 16:15:30,523][00309] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-11 16:15:30,702][02994] Decorrelating experience for 64 frames...
[2024-09-11 16:15:30,721][02999] Decorrelating experience for 0 frames...
[2024-09-11 16:15:31,187][02998] Decorrelating experience for 64 frames...
[2024-09-11 16:15:31,920][02999] Decorrelating experience for 32 frames...
[2024-09-11 16:15:31,934][02996] Decorrelating experience for 64 frames...
[2024-09-11 16:15:32,008][02994] Decorrelating experience for 96 frames...
[2024-09-11 16:15:32,191][00309] Heartbeat connected on RolloutWorker_w1
[2024-09-11 16:15:32,879][02996] Decorrelating experience for 96 frames...
[2024-09-11 16:15:33,064][02995] Decorrelating experience for 64 frames...
[2024-09-11 16:15:33,180][02998] Decorrelating experience for 96 frames...
[2024-09-11 16:15:33,247][00309] Heartbeat connected on RolloutWorker_w3
[2024-09-11 16:15:33,312][00309] Heartbeat connected on RolloutWorker_w4
[2024-09-11 16:15:33,648][02995] Decorrelating experience for 96 frames...
[2024-09-11 16:15:33,779][00309] Heartbeat connected on RolloutWorker_w0
[2024-09-11 16:15:34,189][02999] Decorrelating experience for 64 frames...
[2024-09-11 16:15:34,517][02999] Decorrelating experience for 96 frames...
[2024-09-11 16:15:34,586][00309] Heartbeat connected on RolloutWorker_w5
[2024-09-11 16:15:35,518][00309] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 2.4. Samples: 24. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-11 16:15:38,389][02980] Signal inference workers to stop experience collection...
[2024-09-11 16:15:38,404][02993] InferenceWorker_p0-w0: stopping experience collection
[2024-09-11 16:15:40,193][02980] Signal inference workers to resume experience collection...
[2024-09-11 16:15:40,196][02993] InferenceWorker_p0-w0: resuming experience collection
[2024-09-11 16:15:40,518][00309] Fps is (10 sec: 409.8, 60 sec: 273.1, 300 sec: 273.1). Total num frames: 4096. Throughput: 0: 162.5. Samples: 2438. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-09-11 16:15:40,522][00309] Avg episode reward: [(0, '2.712')]
[2024-09-11 16:15:45,518][00309] Fps is (10 sec: 2457.6, 60 sec: 1228.8, 300 sec: 1228.8). Total num frames: 24576. Throughput: 0: 323.8. Samples: 6476. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:15:45,522][00309] Avg episode reward: [(0, '3.622')]
[2024-09-11 16:15:50,518][00309] Fps is (10 sec: 3276.8, 60 sec: 1474.6, 300 sec: 1474.6). Total num frames: 36864. Throughput: 0: 331.1. Samples: 8278. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:15:50,520][00309] Avg episode reward: [(0, '3.964')]
[2024-09-11 16:15:51,203][02993] Updated weights for policy 0, policy_version 10 (0.0362)
[2024-09-11 16:15:55,518][00309] Fps is (10 sec: 3276.9, 60 sec: 1911.5, 300 sec: 1911.5). Total num frames: 57344. Throughput: 0: 455.5. Samples: 13664. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:15:55,525][00309] Avg episode reward: [(0, '4.436')]
[2024-09-11 16:16:00,458][02993] Updated weights for policy 0, policy_version 20 (0.0018)
[2024-09-11 16:16:00,518][00309] Fps is (10 sec: 4505.6, 60 sec: 2340.6, 300 sec: 2340.6). Total num frames: 81920. Throughput: 0: 579.0. Samples: 20266. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:16:00,525][00309] Avg episode reward: [(0, '4.537')]
[2024-09-11 16:16:05,521][00309] Fps is (10 sec: 3685.3, 60 sec: 2355.0, 300 sec: 2355.0). Total num frames: 94208. Throughput: 0: 565.8. Samples: 22632. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:16:05,523][00309] Avg episode reward: [(0, '4.641')]
[2024-09-11 16:16:10,518][00309] Fps is (10 sec: 3276.8, 60 sec: 2548.6, 300 sec: 2548.6). Total num frames: 114688. Throughput: 0: 616.7. Samples: 27752. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:16:10,520][00309] Avg episode reward: [(0, '4.569')]
[2024-09-11 16:16:10,525][02980] Saving new best policy, reward=4.569!
[2024-09-11 16:16:12,242][02993] Updated weights for policy 0, policy_version 30 (0.0014)
[2024-09-11 16:16:15,518][00309] Fps is (10 sec: 4097.2, 60 sec: 2703.4, 300 sec: 2703.4). Total num frames: 135168. Throughput: 0: 759.6. Samples: 34180. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:16:15,520][00309] Avg episode reward: [(0, '4.417')]
[2024-09-11 16:16:20,518][00309] Fps is (10 sec: 3686.4, 60 sec: 2755.5, 300 sec: 2755.5). Total num frames: 151552. Throughput: 0: 817.4. Samples: 36808. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-09-11 16:16:20,522][00309] Avg episode reward: [(0, '4.334')]
[2024-09-11 16:16:23,620][02993] Updated weights for policy 0, policy_version 40 (0.0026)
[2024-09-11 16:16:25,518][00309] Fps is (10 sec: 3276.8, 60 sec: 2798.9, 300 sec: 2798.9). Total num frames: 167936. Throughput: 0: 869.4. Samples: 41562. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:16:25,524][00309] Avg episode reward: [(0, '4.280')]
[2024-09-11 16:16:30,518][00309] Fps is (10 sec: 4096.0, 60 sec: 3208.8, 300 sec: 2961.7). Total num frames: 192512. Throughput: 0: 923.6. Samples: 48038. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:16:30,522][00309] Avg episode reward: [(0, '4.446')]
[2024-09-11 16:16:33,369][02993] Updated weights for policy 0, policy_version 50 (0.0014)
[2024-09-11 16:16:35,518][00309] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 2984.2). Total num frames: 208896. Throughput: 0: 951.1. Samples: 51078. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:16:35,520][00309] Avg episode reward: [(0, '4.474')]
[2024-09-11 16:16:40,518][00309] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3003.7). Total num frames: 225280. Throughput: 0: 929.9. Samples: 55510. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:16:40,520][00309] Avg episode reward: [(0, '4.725')]
[2024-09-11 16:16:40,525][02980] Saving new best policy, reward=4.725!
[2024-09-11 16:16:44,759][02993] Updated weights for policy 0, policy_version 60 (0.0015)
[2024-09-11 16:16:45,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3072.0). Total num frames: 245760. Throughput: 0: 925.1. Samples: 61896. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:16:45,520][00309] Avg episode reward: [(0, '4.551')]
[2024-09-11 16:16:50,518][00309] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3132.2). Total num frames: 266240. Throughput: 0: 943.0. Samples: 65064. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:16:50,525][00309] Avg episode reward: [(0, '4.390')]
[2024-09-11 16:16:55,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3140.3). Total num frames: 282624. Throughput: 0: 922.3. Samples: 69256. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:16:55,521][00309] Avg episode reward: [(0, '4.516')]
[2024-09-11 16:16:56,456][02993] Updated weights for policy 0, policy_version 70 (0.0020)
[2024-09-11 16:17:00,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3190.6). Total num frames: 303104. Throughput: 0: 924.5. Samples: 75784. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:17:00,520][00309] Avg episode reward: [(0, '4.634')]
[2024-09-11 16:17:05,521][00309] Fps is (10 sec: 4094.8, 60 sec: 3822.9, 300 sec: 3235.7). Total num frames: 323584. Throughput: 0: 937.5. Samples: 78998. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:17:05,526][00309] Avg episode reward: [(0, '4.464')]
[2024-09-11 16:17:05,541][02980] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000079_323584.pth...
[2024-09-11 16:17:06,998][02993] Updated weights for policy 0, policy_version 80 (0.0012)
[2024-09-11 16:17:10,518][00309] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3198.8). Total num frames: 335872. Throughput: 0: 932.8. Samples: 83538. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:17:10,523][00309] Avg episode reward: [(0, '4.453')]
[2024-09-11 16:17:15,518][00309] Fps is (10 sec: 3277.8, 60 sec: 3686.4, 300 sec: 3239.6). Total num frames: 356352. Throughput: 0: 926.4. Samples: 89728. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:17:15,526][00309] Avg episode reward: [(0, '4.498')]
[2024-09-11 16:17:17,398][02993] Updated weights for policy 0, policy_version 90 (0.0020)
[2024-09-11 16:17:20,518][00309] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3276.8). Total num frames: 376832. Throughput: 0: 931.2. Samples: 92982. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:17:20,523][00309] Avg episode reward: [(0, '4.421')]
[2024-09-11 16:17:25,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3276.8). Total num frames: 393216. Throughput: 0: 935.7. Samples: 97618. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:17:25,520][00309] Avg episode reward: [(0, '4.532')]
[2024-09-11 16:17:29,155][02993] Updated weights for policy 0, policy_version 100 (0.0014)
[2024-09-11 16:17:30,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3309.6). Total num frames: 413696. Throughput: 0: 928.4. Samples: 103672. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:17:30,524][00309] Avg episode reward: [(0, '4.466')]
[2024-09-11 16:17:35,518][00309] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3339.8). Total num frames: 434176. Throughput: 0: 928.9. Samples: 106866. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:17:35,528][00309] Avg episode reward: [(0, '4.522')]
[2024-09-11 16:17:40,518][00309] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3307.1). Total num frames: 446464. Throughput: 0: 948.0. Samples: 111916. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:17:40,520][00309] Avg episode reward: [(0, '4.448')]
[2024-09-11 16:17:40,629][02993] Updated weights for policy 0, policy_version 110 (0.0019)
[2024-09-11 16:17:45,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3364.6). Total num frames: 471040. Throughput: 0: 930.6. Samples: 117662. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:17:45,525][00309] Avg episode reward: [(0, '4.633')]
[2024-09-11 16:17:50,203][02993] Updated weights for policy 0, policy_version 120 (0.0024)
[2024-09-11 16:17:50,518][00309] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3389.8). Total num frames: 491520. Throughput: 0: 931.1. Samples: 120896. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:17:50,523][00309] Avg episode reward: [(0, '4.754')]
[2024-09-11 16:17:50,525][02980] Saving new best policy, reward=4.754!
[2024-09-11 16:17:55,523][00309] Fps is (10 sec: 3275.2, 60 sec: 3686.1, 300 sec: 3358.6). Total num frames: 503808. Throughput: 0: 943.0. Samples: 125976. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:17:55,525][00309] Avg episode reward: [(0, '4.926')]
[2024-09-11 16:17:55,533][02980] Saving new best policy, reward=4.926!
[2024-09-11 16:18:00,518][00309] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3382.5). Total num frames: 524288. Throughput: 0: 926.8. Samples: 131436. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:18:00,524][00309] Avg episode reward: [(0, '4.990')]
[2024-09-11 16:18:00,526][02980] Saving new best policy, reward=4.990!
[2024-09-11 16:18:01,880][02993] Updated weights for policy 0, policy_version 130 (0.0012)
[2024-09-11 16:18:05,518][00309] Fps is (10 sec: 4098.0, 60 sec: 3686.6, 300 sec: 3404.8). Total num frames: 544768. Throughput: 0: 920.8. Samples: 134418. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:18:05,521][00309] Avg episode reward: [(0, '4.883')]
[2024-09-11 16:18:10,518][00309] Fps is (10 sec: 3686.2, 60 sec: 3754.6, 300 sec: 3400.9). Total num frames: 561152. Throughput: 0: 942.5. Samples: 140030. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:18:10,524][00309] Avg episode reward: [(0, '5.164')]
[2024-09-11 16:18:10,530][02980] Saving new best policy, reward=5.164!
[2024-09-11 16:18:13,455][02993] Updated weights for policy 0, policy_version 140 (0.0023)
[2024-09-11 16:18:15,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3421.4). Total num frames: 581632. Throughput: 0: 924.0. Samples: 145254. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:18:15,520][00309] Avg episode reward: [(0, '5.115')]
[2024-09-11 16:18:20,518][00309] Fps is (10 sec: 4096.2, 60 sec: 3754.7, 300 sec: 3440.6). Total num frames: 602112. Throughput: 0: 924.3. Samples: 148460. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:18:20,525][00309] Avg episode reward: [(0, '4.737')]
[2024-09-11 16:18:23,714][02993] Updated weights for policy 0, policy_version 150 (0.0020)
[2024-09-11 16:18:25,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3436.1). Total num frames: 618496. Throughput: 0: 938.0. Samples: 154128. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:18:25,520][00309] Avg episode reward: [(0, '4.775')]
[2024-09-11 16:18:30,518][00309] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3431.8). Total num frames: 634880. Throughput: 0: 922.1. Samples: 159158. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:18:30,520][00309] Avg episode reward: [(0, '4.643')]
[2024-09-11 16:18:34,587][02993] Updated weights for policy 0, policy_version 160 (0.0013)
[2024-09-11 16:18:35,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3449.3). Total num frames: 655360. Throughput: 0: 922.4. Samples: 162406. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:18:35,524][00309] Avg episode reward: [(0, '5.005')]
[2024-09-11 16:18:40,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3444.8). Total num frames: 671744. Throughput: 0: 938.0. Samples: 168182. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:18:40,529][00309] Avg episode reward: [(0, '5.026')]
[2024-09-11 16:18:45,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3461.1). Total num frames: 692224. Throughput: 0: 922.9. Samples: 172966. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:18:45,523][00309] Avg episode reward: [(0, '5.000')]
[2024-09-11 16:18:46,348][02993] Updated weights for policy 0, policy_version 170 (0.0012)
[2024-09-11 16:18:50,518][00309] Fps is (10 sec: 4095.9, 60 sec: 3686.4, 300 sec: 3476.6). Total num frames: 712704. Throughput: 0: 926.4. Samples: 176108. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:18:50,523][00309] Avg episode reward: [(0, '4.941')]
[2024-09-11 16:18:55,518][00309] Fps is (10 sec: 3686.3, 60 sec: 3755.0, 300 sec: 3471.8). Total num frames: 729088. Throughput: 0: 939.8. Samples: 182322. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:18:55,522][00309] Avg episode reward: [(0, '4.932')]
[2024-09-11 16:18:57,327][02993] Updated weights for policy 0, policy_version 180 (0.0012)
[2024-09-11 16:19:00,518][00309] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3467.3). Total num frames: 745472. Throughput: 0: 924.0. Samples: 186834. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:19:00,519][00309] Avg episode reward: [(0, '5.241')]
[2024-09-11 16:19:00,522][02980] Saving new best policy, reward=5.241!
[2024-09-11 16:19:05,518][00309] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3481.6). Total num frames: 765952. Throughput: 0: 924.5. Samples: 190062. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:19:05,522][00309] Avg episode reward: [(0, '5.459')]
[2024-09-11 16:19:05,584][02980] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000188_770048.pth...
[2024-09-11 16:19:05,689][02980] Saving new best policy, reward=5.459!
[2024-09-11 16:19:07,516][02993] Updated weights for policy 0, policy_version 190 (0.0018)
[2024-09-11 16:19:10,518][00309] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3495.3). Total num frames: 786432. Throughput: 0: 938.2. Samples: 196346. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:19:10,525][00309] Avg episode reward: [(0, '5.452')]
[2024-09-11 16:19:15,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3490.5). Total num frames: 802816. Throughput: 0: 923.0. Samples: 200692. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:19:15,524][00309] Avg episode reward: [(0, '5.518')]
[2024-09-11 16:19:15,536][02980] Saving new best policy, reward=5.518!
[2024-09-11 16:19:19,163][02993] Updated weights for policy 0, policy_version 200 (0.0012)
[2024-09-11 16:19:20,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3503.4). Total num frames: 823296. Throughput: 0: 921.2. Samples: 203860. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:19:20,520][00309] Avg episode reward: [(0, '5.355')]
[2024-09-11 16:19:25,518][00309] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3515.7). Total num frames: 843776. Throughput: 0: 934.3. Samples: 210226. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:19:25,524][00309] Avg episode reward: [(0, '5.258')]
[2024-09-11 16:19:30,519][00309] Fps is (10 sec: 3276.5, 60 sec: 3686.3, 300 sec: 3494.1). Total num frames: 856064. Throughput: 0: 925.9. Samples: 214634. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:19:30,526][00309] Avg episode reward: [(0, '5.708')]
[2024-09-11 16:19:30,528][02980] Saving new best policy, reward=5.708!
[2024-09-11 16:19:30,750][02993] Updated weights for policy 0, policy_version 210 (0.0019)
[2024-09-11 16:19:35,518][00309] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3506.2). Total num frames: 876544. Throughput: 0: 924.0. Samples: 217686. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:19:35,525][00309] Avg episode reward: [(0, '5.621')]
[2024-09-11 16:19:40,264][02993] Updated weights for policy 0, policy_version 220 (0.0013)
[2024-09-11 16:19:40,518][00309] Fps is (10 sec: 4506.1, 60 sec: 3822.9, 300 sec: 3533.8). Total num frames: 901120. Throughput: 0: 930.6. Samples: 224198. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:19:40,520][00309] Avg episode reward: [(0, '5.775')]
[2024-09-11 16:19:40,523][02980] Saving new best policy, reward=5.775!
[2024-09-11 16:19:45,518][00309] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3513.1). Total num frames: 913408. Throughput: 0: 931.7. Samples: 228762. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:19:45,525][00309] Avg episode reward: [(0, '5.692')]
[2024-09-11 16:19:50,518][00309] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3524.1). Total num frames: 933888. Throughput: 0: 924.1. Samples: 231646. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:19:50,520][00309] Avg episode reward: [(0, '5.991')]
[2024-09-11 16:19:50,525][02980] Saving new best policy, reward=5.991!
[2024-09-11 16:19:52,198][02993] Updated weights for policy 0, policy_version 230 (0.0012)
[2024-09-11 16:19:55,518][00309] Fps is (10 sec: 4096.1, 60 sec: 3754.7, 300 sec: 3534.7). Total num frames: 954368. Throughput: 0: 923.1. Samples: 237884. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:19:55,520][00309] Avg episode reward: [(0, '6.163')]
[2024-09-11 16:19:55,534][02980] Saving new best policy, reward=6.163!
[2024-09-11 16:20:00,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3530.0). Total num frames: 970752. Throughput: 0: 934.4. Samples: 242738. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:20:00,522][00309] Avg episode reward: [(0, '6.136')]
[2024-09-11 16:20:03,584][02993] Updated weights for policy 0, policy_version 240 (0.0017)
[2024-09-11 16:20:05,518][00309] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3525.5). Total num frames: 987136. Throughput: 0: 922.1. Samples: 245356. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-09-11 16:20:05,526][00309] Avg episode reward: [(0, '5.979')]
[2024-09-11 16:20:10,518][00309] Fps is (10 sec: 4095.9, 60 sec: 3754.7, 300 sec: 3549.9). Total num frames: 1011712. Throughput: 0: 925.4. Samples: 251870. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:20:10,521][00309] Avg episode reward: [(0, '6.165')]
[2024-09-11 16:20:10,523][02980] Saving new best policy, reward=6.165!
[2024-09-11 16:20:14,003][02993] Updated weights for policy 0, policy_version 250 (0.0012)
[2024-09-11 16:20:15,518][00309] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3531.0). Total num frames: 1024000. Throughput: 0: 939.2. Samples: 256898. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-11 16:20:15,526][00309] Avg episode reward: [(0, '6.128')]
[2024-09-11 16:20:20,518][00309] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3540.6). Total num frames: 1044480. Throughput: 0: 924.6. Samples: 259294. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:20:20,520][00309] Avg episode reward: [(0, '6.270')]
[2024-09-11 16:20:20,525][02980] Saving new best policy, reward=6.270!
[2024-09-11 16:20:24,801][02993] Updated weights for policy 0, policy_version 260 (0.0023)
[2024-09-11 16:20:25,518][00309] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3610.1). Total num frames: 1064960. Throughput: 0: 921.9. Samples: 265684. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:20:25,520][00309] Avg episode reward: [(0, '5.977')]
[2024-09-11 16:20:30,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 1081344. Throughput: 0: 940.4. Samples: 271082. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:20:30,524][00309] Avg episode reward: [(0, '5.945')]
[2024-09-11 16:20:35,518][00309] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 1097728. Throughput: 0: 921.6. Samples: 273120. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:20:35,528][00309] Avg episode reward: [(0, '6.555')]
[2024-09-11 16:20:35,544][02980] Saving new best policy, reward=6.555!
[2024-09-11 16:20:36,474][02993] Updated weights for policy 0, policy_version 270 (0.0014)
[2024-09-11 16:20:40,518][00309] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 1122304. Throughput: 0: 924.0. Samples: 279462. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:20:40,522][00309] Avg episode reward: [(0, '6.532')]
[2024-09-11 16:20:45,521][00309] Fps is (10 sec: 4094.9, 60 sec: 3754.5, 300 sec: 3735.0). Total num frames: 1138688. Throughput: 0: 944.9. Samples: 285260. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:20:45,526][00309] Avg episode reward: [(0, '6.276')]
[2024-09-11 16:20:47,599][02993] Updated weights for policy 0, policy_version 280 (0.0012)
[2024-09-11 16:20:50,518][00309] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 1155072. Throughput: 0: 931.9. Samples: 287290. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:20:50,520][00309] Avg episode reward: [(0, '6.437')]
[2024-09-11 16:20:55,518][00309] Fps is (10 sec: 3687.5, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 1175552. Throughput: 0: 924.4. Samples: 293466. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:20:55,520][00309] Avg episode reward: [(0, '6.835')]
[2024-09-11 16:20:55,535][02980] Saving new best policy, reward=6.835!
[2024-09-11 16:20:57,531][02993] Updated weights for policy 0, policy_version 290 (0.0012)
[2024-09-11 16:21:00,518][00309] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 1196032. Throughput: 0: 946.7. Samples: 299500. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:21:00,537][00309] Avg episode reward: [(0, '7.366')]
[2024-09-11 16:21:00,547][02980] Saving new best policy, reward=7.366!
[2024-09-11 16:21:05,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 1212416. Throughput: 0: 936.8. Samples: 301448. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:21:05,523][00309] Avg episode reward: [(0, '7.587')]
[2024-09-11 16:21:05,531][02980] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000296_1212416.pth...
[2024-09-11 16:21:05,637][02980] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000079_323584.pth
[2024-09-11 16:21:05,651][02980] Saving new best policy, reward=7.587!
[2024-09-11 16:21:09,379][02993] Updated weights for policy 0, policy_version 300 (0.0022)
[2024-09-11 16:21:10,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 1232896. Throughput: 0: 922.0. Samples: 307174. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:21:10,520][00309] Avg episode reward: [(0, '6.621')]
[2024-09-11 16:21:15,518][00309] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 1253376. Throughput: 0: 944.0. Samples: 313562. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:21:15,525][00309] Avg episode reward: [(0, '7.050')]
[2024-09-11 16:21:20,518][00309] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 1265664. Throughput: 0: 944.1. Samples: 315606. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:21:20,520][00309] Avg episode reward: [(0, '7.627')]
[2024-09-11 16:21:20,522][02980] Saving new best policy, reward=7.627!
[2024-09-11 16:21:20,846][02993] Updated weights for policy 0, policy_version 310 (0.0014)
[2024-09-11 16:21:25,518][00309] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 1286144. Throughput: 0: 925.9. Samples: 321126. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:21:25,520][00309] Avg episode reward: [(0, '8.096')]
[2024-09-11 16:21:25,532][02980] Saving new best policy, reward=8.096!
[2024-09-11 16:21:30,518][00309] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 1306624. Throughput: 0: 937.5. Samples: 327444. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:21:30,523][00309] Avg episode reward: [(0, '7.549')]
[2024-09-11 16:21:30,697][02993] Updated weights for policy 0, policy_version 320 (0.0012)
[2024-09-11 16:21:35,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 1323008. Throughput: 0: 939.1. Samples: 329548. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:21:35,520][00309] Avg episode reward: [(0, '7.628')]
[2024-09-11 16:21:40,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 1343488. Throughput: 0: 920.5. Samples: 334888. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:21:40,524][00309] Avg episode reward: [(0, '7.426')]
[2024-09-11 16:21:42,052][02993] Updated weights for policy 0, policy_version 330 (0.0017)
[2024-09-11 16:21:45,518][00309] Fps is (10 sec: 4096.0, 60 sec: 3754.8, 300 sec: 3721.1). Total num frames: 1363968. Throughput: 0: 930.9. Samples: 341390. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:21:45,520][00309] Avg episode reward: [(0, '6.767')]
[2024-09-11 16:21:50,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 1380352. Throughput: 0: 941.5. Samples: 343816. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:21:50,522][00309] Avg episode reward: [(0, '7.092')]
[2024-09-11 16:21:53,654][02993] Updated weights for policy 0, policy_version 340 (0.0019)
[2024-09-11 16:21:55,518][00309] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 1396736. Throughput: 0: 924.3. Samples: 348768. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:21:55,524][00309] Avg episode reward: [(0, '7.407')]
[2024-09-11 16:22:00,518][00309] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 1421312. Throughput: 0: 928.2. Samples: 355330. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:22:00,523][00309] Avg episode reward: [(0, '7.953')]
[2024-09-11 16:22:03,943][02993] Updated weights for policy 0, policy_version 350 (0.0014)
[2024-09-11 16:22:05,518][00309] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 1437696. Throughput: 0: 942.1. Samples: 358000. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:22:05,524][00309] Avg episode reward: [(0, '7.880')]
[2024-09-11 16:22:10,518][00309] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 1454080. Throughput: 0: 927.2. Samples: 362850. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:22:10,526][00309] Avg episode reward: [(0, '8.001')]
[2024-09-11 16:22:14,649][02993] Updated weights for policy 0, policy_version 360 (0.0013)
[2024-09-11 16:22:15,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 1474560. Throughput: 0: 930.7. Samples: 369324. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:22:15,524][00309] Avg episode reward: [(0, '8.100')]
[2024-09-11 16:22:15,589][02980] Saving new best policy, reward=8.100!
[2024-09-11 16:22:20,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 1490944. Throughput: 0: 946.8. Samples: 372156. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:22:20,521][00309] Avg episode reward: [(0, '8.573')]
[2024-09-11 16:22:20,525][02980] Saving new best policy, reward=8.573!
[2024-09-11 16:22:25,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 1511424. Throughput: 0: 925.2. Samples: 376524. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:22:25,522][00309] Avg episode reward: [(0, '8.814')]
[2024-09-11 16:22:25,533][02980] Saving new best policy, reward=8.814!
[2024-09-11 16:22:26,563][02993] Updated weights for policy 0, policy_version 370 (0.0023)
[2024-09-11 16:22:30,518][00309] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 1531904. Throughput: 0: 924.4. Samples: 382990. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:22:30,522][00309] Avg episode reward: [(0, '9.316')]
[2024-09-11 16:22:30,525][02980] Saving new best policy, reward=9.316!
[2024-09-11 16:22:35,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 1548288. Throughput: 0: 942.0. Samples: 386206. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:22:35,520][00309] Avg episode reward: [(0, '9.485')]
[2024-09-11 16:22:35,537][02980] Saving new best policy, reward=9.485!
[2024-09-11 16:22:37,849][02993] Updated weights for policy 0, policy_version 380 (0.0013)
[2024-09-11 16:22:40,518][00309] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 1564672. Throughput: 0: 923.3. Samples: 390316. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:22:40,520][00309] Avg episode reward: [(0, '9.571')]
[2024-09-11 16:22:40,524][02980] Saving new best policy, reward=9.571!
[2024-09-11 16:22:45,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 1585152. Throughput: 0: 920.2. Samples: 396740. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:22:45,523][00309] Avg episode reward: [(0, '10.427')]
[2024-09-11 16:22:45,533][02980] Saving new best policy, reward=10.427!
[2024-09-11 16:22:47,794][02993] Updated weights for policy 0, policy_version 390 (0.0013)
[2024-09-11 16:22:50,518][00309] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3735.1). Total num frames: 1605632. Throughput: 0: 931.7. Samples: 399926. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:22:50,526][00309] Avg episode reward: [(0, '10.769')]
[2024-09-11 16:22:50,535][02980] Saving new best policy, reward=10.769!
[2024-09-11 16:22:55,518][00309] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 1617920. Throughput: 0: 921.7. Samples: 404328. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:22:55,520][00309] Avg episode reward: [(0, '10.867')]
[2024-09-11 16:22:55,633][02980] Saving new best policy, reward=10.867!
[2024-09-11 16:22:59,465][02993] Updated weights for policy 0, policy_version 400 (0.0012)
[2024-09-11 16:23:00,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 1642496. Throughput: 0: 914.1. Samples: 410458. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:23:00,520][00309] Avg episode reward: [(0, '10.709')]
[2024-09-11 16:23:05,518][00309] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 1662976. Throughput: 0: 922.6. Samples: 413674. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:23:05,521][00309] Avg episode reward: [(0, '10.966')]
[2024-09-11 16:23:05,527][02980] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000406_1662976.pth...
[2024-09-11 16:23:05,666][02980] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000188_770048.pth
[2024-09-11 16:23:05,680][02980] Saving new best policy, reward=10.966!
[2024-09-11 16:23:10,518][00309] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 1675264. Throughput: 0: 930.3. Samples: 418388. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:23:10,520][00309] Avg episode reward: [(0, '10.790')]
[2024-09-11 16:23:11,141][02993] Updated weights for policy 0, policy_version 410 (0.0012)
[2024-09-11 16:23:15,518][00309] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 1695744. Throughput: 0: 916.6. Samples: 424238. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:23:15,520][00309] Avg episode reward: [(0, '11.260')]
[2024-09-11 16:23:15,530][02980] Saving new best policy, reward=11.260!
[2024-09-11 16:23:20,518][00309] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 1716224. Throughput: 0: 915.8. Samples: 427418. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:23:20,520][00309] Avg episode reward: [(0, '11.632')]
[2024-09-11 16:23:20,526][02980] Saving new best policy, reward=11.632!
[2024-09-11 16:23:20,846][02993] Updated weights for policy 0, policy_version 420 (0.0012)
[2024-09-11 16:23:25,518][00309] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 1732608. Throughput: 0: 935.9. Samples: 432432. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:23:25,520][00309] Avg episode reward: [(0, '11.562')]
[2024-09-11 16:23:30,520][00309] Fps is (10 sec: 3276.1, 60 sec: 3618.0, 300 sec: 3707.2). Total num frames: 1748992. Throughput: 0: 914.4. Samples: 437890. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:23:30,522][00309] Avg episode reward: [(0, '11.775')]
[2024-09-11 16:23:30,526][02980] Saving new best policy, reward=11.775!
[2024-09-11 16:23:32,583][02993] Updated weights for policy 0, policy_version 430 (0.0012)
[2024-09-11 16:23:35,518][00309] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 1769472. Throughput: 0: 913.6. Samples: 441036. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:23:35,520][00309] Avg episode reward: [(0, '11.251')]
[2024-09-11 16:23:40,518][00309] Fps is (10 sec: 3687.2, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 1785856. Throughput: 0: 936.5. Samples: 446472. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:23:40,522][00309] Avg episode reward: [(0, '11.251')]
[2024-09-11 16:23:44,296][02993] Updated weights for policy 0, policy_version 440 (0.0017)
[2024-09-11 16:23:45,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 1806336. Throughput: 0: 916.0. Samples: 451680. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:23:45,520][00309] Avg episode reward: [(0, '13.213')]
[2024-09-11 16:23:45,532][02980] Saving new best policy, reward=13.213!
[2024-09-11 16:23:50,518][00309] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 1826816. Throughput: 0: 914.4. Samples: 454824. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:23:50,522][00309] Avg episode reward: [(0, '13.809')]
[2024-09-11 16:23:50,526][02980] Saving new best policy, reward=13.809!
[2024-09-11 16:23:54,907][02993] Updated weights for policy 0, policy_version 450 (0.0014)
[2024-09-11 16:23:55,519][00309] Fps is (10 sec: 3686.0, 60 sec: 3754.6, 300 sec: 3721.1). Total num frames: 1843200. Throughput: 0: 936.8. Samples: 460546. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:23:55,528][00309] Avg episode reward: [(0, '15.031')]
[2024-09-11 16:23:55,537][02980] Saving new best policy, reward=15.031!
[2024-09-11 16:24:00,518][00309] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3707.2). Total num frames: 1859584. Throughput: 0: 913.7. Samples: 465356. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:24:00,520][00309] Avg episode reward: [(0, '15.273')]
[2024-09-11 16:24:00,526][02980] Saving new best policy, reward=15.273!
[2024-09-11 16:24:05,518][00309] Fps is (10 sec: 3686.8, 60 sec: 3618.1, 300 sec: 3707.2). Total num frames: 1880064. Throughput: 0: 912.2. Samples: 468468. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:24:05,520][00309] Avg episode reward: [(0, '13.878')]
[2024-09-11 16:24:05,762][02993] Updated weights for policy 0, policy_version 460 (0.0012)
[2024-09-11 16:24:10,520][00309] Fps is (10 sec: 3685.7, 60 sec: 3686.3, 300 sec: 3707.2). Total num frames: 1896448. Throughput: 0: 935.1. Samples: 474512. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:24:10,526][00309] Avg episode reward: [(0, '13.544')]
[2024-09-11 16:24:15,518][00309] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3693.3). Total num frames: 1912832. Throughput: 0: 913.6. Samples: 479000. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:24:15,524][00309] Avg episode reward: [(0, '13.444')]
[2024-09-11 16:24:17,283][02993] Updated weights for policy 0, policy_version 470 (0.0021)
[2024-09-11 16:24:20,518][00309] Fps is (10 sec: 4096.8, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 1937408. Throughput: 0: 915.2. Samples: 482218. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:24:20,520][00309] Avg episode reward: [(0, '14.500')]
[2024-09-11 16:24:25,518][00309] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 1953792. Throughput: 0: 935.7. Samples: 488578. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:24:25,522][00309] Avg episode reward: [(0, '14.643')]
[2024-09-11 16:24:28,717][02993] Updated weights for policy 0, policy_version 480 (0.0023)
[2024-09-11 16:24:30,518][00309] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3707.2). Total num frames: 1970176. Throughput: 0: 916.2. Samples: 492908. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:24:30,526][00309] Avg episode reward: [(0, '16.089')]
[2024-09-11 16:24:30,531][02980] Saving new best policy, reward=16.089!
[2024-09-11 16:24:35,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 1990656. Throughput: 0: 917.8. Samples: 496124. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:24:35,521][00309] Avg episode reward: [(0, '17.184')]
[2024-09-11 16:24:35,610][02980] Saving new best policy, reward=17.184!
[2024-09-11 16:24:38,582][02993] Updated weights for policy 0, policy_version 490 (0.0020)
[2024-09-11 16:24:40,518][00309] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2011136. Throughput: 0: 932.6. Samples: 502514. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:24:40,526][00309] Avg episode reward: [(0, '16.703')]
[2024-09-11 16:24:45,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2027520. Throughput: 0: 920.9. Samples: 506796. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:24:45,522][00309] Avg episode reward: [(0, '17.165')]
[2024-09-11 16:24:50,040][02993] Updated weights for policy 0, policy_version 500 (0.0012)
[2024-09-11 16:24:50,518][00309] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2048000. Throughput: 0: 924.3. Samples: 510062. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:24:50,522][00309] Avg episode reward: [(0, '17.776')]
[2024-09-11 16:24:50,528][02980] Saving new best policy, reward=17.776!
[2024-09-11 16:24:55,518][00309] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2068480. Throughput: 0: 930.8. Samples: 516394. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:24:55,523][00309] Avg episode reward: [(0, '16.781')]
[2024-09-11 16:25:00,518][00309] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2080768. Throughput: 0: 930.1. Samples: 520854. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:25:00,526][00309] Avg episode reward: [(0, '17.221')]
[2024-09-11 16:25:01,717][02993] Updated weights for policy 0, policy_version 510 (0.0022)
[2024-09-11 16:25:05,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 2105344. Throughput: 0: 926.4. Samples: 523904. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:25:05,524][00309] Avg episode reward: [(0, '17.268')]
[2024-09-11 16:25:05,539][02980] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000514_2105344.pth...
[2024-09-11 16:25:05,547][00309] Components not started: RolloutWorker_w2, RolloutWorker_w6, RolloutWorker_w7, wait_time=600.0 seconds
[2024-09-11 16:25:05,640][02980] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000296_1212416.pth
[2024-09-11 16:25:10,518][00309] Fps is (10 sec: 4505.6, 60 sec: 3823.1, 300 sec: 3735.0). Total num frames: 2125824. Throughput: 0: 928.1. Samples: 530344. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:25:10,523][00309] Avg episode reward: [(0, '16.734')]
[2024-09-11 16:25:11,727][02993] Updated weights for policy 0, policy_version 520 (0.0016)
[2024-09-11 16:25:15,525][00309] Fps is (10 sec: 3274.5, 60 sec: 3754.2, 300 sec: 3707.1). Total num frames: 2138112. Throughput: 0: 935.8. Samples: 535024. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:25:15,527][00309] Avg episode reward: [(0, '16.248')]
[2024-09-11 16:25:20,518][00309] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2158592. Throughput: 0: 926.0. Samples: 537796. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:25:20,521][00309] Avg episode reward: [(0, '16.876')]
[2024-09-11 16:25:22,961][02993] Updated weights for policy 0, policy_version 530 (0.0023)
[2024-09-11 16:25:25,518][00309] Fps is (10 sec: 4098.9, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2179072. Throughput: 0: 924.2. Samples: 544104. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:25:25,520][00309] Avg episode reward: [(0, '17.672')]
[2024-09-11 16:25:30,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2195456. Throughput: 0: 939.8. Samples: 549088. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:25:30,524][00309] Avg episode reward: [(0, '19.163')]
[2024-09-11 16:25:30,526][02980] Saving new best policy, reward=19.163!
[2024-09-11 16:25:34,655][02993] Updated weights for policy 0, policy_version 540 (0.0012)
[2024-09-11 16:25:35,518][00309] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 2211840. Throughput: 0: 921.4. Samples: 551524. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:25:35,522][00309] Avg episode reward: [(0, '19.799')]
[2024-09-11 16:25:35,588][02980] Saving new best policy, reward=19.799!
[2024-09-11 16:25:40,518][00309] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2236416. Throughput: 0: 924.0. Samples: 557972. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:25:40,524][00309] Avg episode reward: [(0, '21.926')]
[2024-09-11 16:25:40,527][02980] Saving new best policy, reward=21.926!
[2024-09-11 16:25:45,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2248704. Throughput: 0: 939.2. Samples: 563120. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:25:45,520][00309] Avg episode reward: [(0, '22.394')]
[2024-09-11 16:25:45,540][02980] Saving new best policy, reward=22.394!
[2024-09-11 16:25:45,788][02993] Updated weights for policy 0, policy_version 550 (0.0017)
[2024-09-11 16:25:50,518][00309] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2269184. Throughput: 0: 919.4. Samples: 565276. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:25:50,525][00309] Avg episode reward: [(0, '22.711')]
[2024-09-11 16:25:50,528][02980] Saving new best policy, reward=22.711!
[2024-09-11 16:25:55,518][00309] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2289664. Throughput: 0: 917.8. Samples: 571644. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:25:55,524][00309] Avg episode reward: [(0, '22.362')]
[2024-09-11 16:25:55,861][02993] Updated weights for policy 0, policy_version 560 (0.0016)
[2024-09-11 16:26:00,522][00309] Fps is (10 sec: 3685.0, 60 sec: 3754.4, 300 sec: 3707.2). Total num frames: 2306048. Throughput: 0: 934.2. Samples: 577062. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:26:00,523][00309] Avg episode reward: [(0, '22.033')]
[2024-09-11 16:26:05,518][00309] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3693.3). Total num frames: 2322432. Throughput: 0: 918.2. Samples: 579114. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:26:05,523][00309] Avg episode reward: [(0, '21.711')]
[2024-09-11 16:26:07,582][02993] Updated weights for policy 0, policy_version 570 (0.0022)
[2024-09-11 16:26:10,518][00309] Fps is (10 sec: 3687.7, 60 sec: 3618.1, 300 sec: 3693.3). Total num frames: 2342912. Throughput: 0: 918.4. Samples: 585434. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:26:10,526][00309] Avg episode reward: [(0, '20.975')]
[2024-09-11 16:26:15,518][00309] Fps is (10 sec: 4096.0, 60 sec: 3755.1, 300 sec: 3721.1). Total num frames: 2363392. Throughput: 0: 935.3. Samples: 591176. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:26:15,529][00309] Avg episode reward: [(0, '20.427')]
[2024-09-11 16:26:19,375][02993] Updated weights for policy 0, policy_version 580 (0.0017)
[2024-09-11 16:26:20,518][00309] Fps is (10 sec: 3686.6, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2379776. Throughput: 0: 923.8. Samples: 593094. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:26:20,520][00309] Avg episode reward: [(0, '19.963')]
[2024-09-11 16:26:25,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2400256. Throughput: 0: 914.6. Samples: 599130. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:26:25,522][00309] Avg episode reward: [(0, '20.521')]
[2024-09-11 16:26:29,014][02993] Updated weights for policy 0, policy_version 590 (0.0013)
[2024-09-11 16:26:30,518][00309] Fps is (10 sec: 4095.9, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2420736. Throughput: 0: 935.3. Samples: 605210. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:26:30,530][00309] Avg episode reward: [(0, '20.105')]
[2024-09-11 16:26:35,518][00309] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 2433024. Throughput: 0: 933.7. Samples: 607292. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:26:35,520][00309] Avg episode reward: [(0, '20.360')]
[2024-09-11 16:26:40,378][02993] Updated weights for policy 0, policy_version 600 (0.0012)
[2024-09-11 16:26:40,518][00309] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2457600. Throughput: 0: 924.6. Samples: 613252. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:26:40,520][00309] Avg episode reward: [(0, '20.344')]
[2024-09-11 16:26:45,518][00309] Fps is (10 sec: 4505.5, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 2478080. Throughput: 0: 946.3. Samples: 619640. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:26:45,527][00309] Avg episode reward: [(0, '20.811')]
[2024-09-11 16:26:50,518][00309] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2490368. Throughput: 0: 946.1. Samples: 621688. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:26:50,528][00309] Avg episode reward: [(0, '20.833')]
[2024-09-11 16:26:51,618][02993] Updated weights for policy 0, policy_version 610 (0.0020)
[2024-09-11 16:26:55,518][00309] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 2510848. Throughput: 0: 929.6. Samples: 627266. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:26:55,520][00309] Avg episode reward: [(0, '19.542')]
[2024-09-11 16:27:00,522][00309] Fps is (10 sec: 4094.4, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 2531328. Throughput: 0: 946.9. Samples: 633788. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:27:00,524][00309] Avg episode reward: [(0, '19.367')]
[2024-09-11 16:27:02,045][02993] Updated weights for policy 0, policy_version 620 (0.0020)
[2024-09-11 16:27:05,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 2547712. Throughput: 0: 950.5. Samples: 635866. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:27:05,521][00309] Avg episode reward: [(0, '19.717')]
[2024-09-11 16:27:05,532][02980] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000622_2547712.pth...
[2024-09-11 16:27:05,645][02980] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000406_1662976.pth
[2024-09-11 16:27:10,518][00309] Fps is (10 sec: 3687.9, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 2568192. Throughput: 0: 936.8. Samples: 641284. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:27:10,526][00309] Avg episode reward: [(0, '20.019')]
[2024-09-11 16:27:12,887][02993] Updated weights for policy 0, policy_version 630 (0.0012)
[2024-09-11 16:27:15,518][00309] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2588672. Throughput: 0: 947.2. Samples: 647836. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:27:15,528][00309] Avg episode reward: [(0, '19.443')]
[2024-09-11 16:27:20,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 2605056. Throughput: 0: 950.5. Samples: 650066. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:27:20,522][00309] Avg episode reward: [(0, '19.681')]
[2024-09-11 16:27:24,606][02993] Updated weights for policy 0, policy_version 640 (0.0014)
[2024-09-11 16:27:25,518][00309] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 2621440. Throughput: 0: 925.9. Samples: 654916. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:27:25,524][00309] Avg episode reward: [(0, '18.819')]
[2024-09-11 16:27:30,520][00309] Fps is (10 sec: 4095.2, 60 sec: 3754.6, 300 sec: 3721.1). Total num frames: 2646016. Throughput: 0: 928.5. Samples: 661424. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-11 16:27:30,524][00309] Avg episode reward: [(0, '17.802')]
[2024-09-11 16:27:35,479][02993] Updated weights for policy 0, policy_version 650 (0.0013)
[2024-09-11 16:27:35,518][00309] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 2662400. Throughput: 0: 942.0. Samples: 664078. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:27:35,524][00309] Avg episode reward: [(0, '18.057')]
[2024-09-11 16:27:40,518][00309] Fps is (10 sec: 3277.4, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2678784. Throughput: 0: 926.3. Samples: 668948. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:27:40,520][00309] Avg episode reward: [(0, '18.671')]
[2024-09-11 16:27:45,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2699264. Throughput: 0: 927.1. Samples: 675504. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:27:45,528][00309] Avg episode reward: [(0, '20.872')]
[2024-09-11 16:27:45,602][02993] Updated weights for policy 0, policy_version 660 (0.0014)
[2024-09-11 16:27:50,518][00309] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2715648. Throughput: 0: 945.3. Samples: 678406. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:27:50,521][00309] Avg episode reward: [(0, '21.731')]
[2024-09-11 16:27:55,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 2736128. Throughput: 0: 924.6. Samples: 682890. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:27:55,520][00309] Avg episode reward: [(0, '22.435')]
[2024-09-11 16:27:57,153][02993] Updated weights for policy 0, policy_version 670 (0.0012)
[2024-09-11 16:28:00,518][00309] Fps is (10 sec: 4096.1, 60 sec: 3754.9, 300 sec: 3707.2). Total num frames: 2756608. Throughput: 0: 924.8. Samples: 689452. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:28:00,521][00309] Avg episode reward: [(0, '22.539')]
[2024-09-11 16:28:05,520][00309] Fps is (10 sec: 3685.6, 60 sec: 3754.5, 300 sec: 3721.1). Total num frames: 2772992. Throughput: 0: 944.7. Samples: 692580. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2024-09-11 16:28:05,529][00309] Avg episode reward: [(0, '23.436')]
[2024-09-11 16:28:05,544][02980] Saving new best policy, reward=23.436!
[2024-09-11 16:28:08,790][02993] Updated weights for policy 0, policy_version 680 (0.0014)
[2024-09-11 16:28:10,518][00309] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2789376. Throughput: 0: 932.8. Samples: 696890. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:28:10,522][00309] Avg episode reward: [(0, '23.408')]
[2024-09-11 16:28:15,518][00309] Fps is (10 sec: 4096.9, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2813952. Throughput: 0: 932.7. Samples: 703392. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:28:15,520][00309] Avg episode reward: [(0, '22.331')]
[2024-09-11 16:28:18,110][02993] Updated weights for policy 0, policy_version 690 (0.0012)
[2024-09-11 16:28:20,518][00309] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2830336. Throughput: 0: 946.4. Samples: 706664. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:28:20,523][00309] Avg episode reward: [(0, '22.060')]
[2024-09-11 16:28:25,518][00309] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2846720. Throughput: 0: 934.4. Samples: 710994. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:28:25,523][00309] Avg episode reward: [(0, '21.786')]
[2024-09-11 16:28:29,901][02993] Updated weights for policy 0, policy_version 700 (0.0012)
[2024-09-11 16:28:30,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3721.1). Total num frames: 2867200. Throughput: 0: 927.2. Samples: 717230. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:28:30,521][00309] Avg episode reward: [(0, '21.219')]
[2024-09-11 16:28:35,518][00309] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2887680. Throughput: 0: 935.0. Samples: 720482. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-11 16:28:35,520][00309] Avg episode reward: [(0, '22.595')]
[2024-09-11 16:28:40,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2904064. Throughput: 0: 941.3. Samples: 725248. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:28:40,522][00309] Avg episode reward: [(0, '22.416')]
[2024-09-11 16:28:41,345][02993] Updated weights for policy 0, policy_version 710 (0.0018)
[2024-09-11 16:28:45,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2924544. Throughput: 0: 930.5. Samples: 731324. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:28:45,521][00309] Avg episode reward: [(0, '22.171')]
[2024-09-11 16:28:50,518][00309] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 2945024. Throughput: 0: 932.6. Samples: 734544. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:28:50,525][00309] Avg episode reward: [(0, '23.361')]
[2024-09-11 16:28:50,954][02993] Updated weights for policy 0, policy_version 720 (0.0016)
[2024-09-11 16:28:55,518][00309] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 2957312. Throughput: 0: 947.6. Samples: 739534. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:28:55,522][00309] Avg episode reward: [(0, '22.206')]
[2024-09-11 16:29:00,518][00309] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 2977792. Throughput: 0: 929.3. Samples: 745210. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:29:00,525][00309] Avg episode reward: [(0, '20.916')]
[2024-09-11 16:29:02,482][02993] Updated weights for policy 0, policy_version 730 (0.0016)
[2024-09-11 16:29:05,518][00309] Fps is (10 sec: 4505.6, 60 sec: 3823.1, 300 sec: 3748.9). Total num frames: 3002368. Throughput: 0: 928.5. Samples: 748448. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:29:05,525][00309] Avg episode reward: [(0, '21.615')]
[2024-09-11 16:29:05,539][02980] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000733_3002368.pth...
[2024-09-11 16:29:05,658][02980] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000514_2105344.pth
[2024-09-11 16:29:10,520][00309] Fps is (10 sec: 3685.6, 60 sec: 3754.5, 300 sec: 3735.0). Total num frames: 3014656. Throughput: 0: 947.4. Samples: 753630. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:29:10,527][00309] Avg episode reward: [(0, '21.655')]
[2024-09-11 16:29:14,147][02993] Updated weights for policy 0, policy_version 740 (0.0014)
[2024-09-11 16:29:15,518][00309] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 3035136. Throughput: 0: 930.8. Samples: 759118. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:29:15,521][00309] Avg episode reward: [(0, '20.989')]
[2024-09-11 16:29:20,518][00309] Fps is (10 sec: 4096.9, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 3055616. Throughput: 0: 930.9. Samples: 762374. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:29:20,520][00309] Avg episode reward: [(0, '22.821')]
[2024-09-11 16:29:24,625][02993] Updated weights for policy 0, policy_version 750 (0.0023)
[2024-09-11 16:29:25,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 3072000. Throughput: 0: 948.3. Samples: 767922. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:29:25,521][00309] Avg episode reward: [(0, '22.586')]
[2024-09-11 16:29:30,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 3092480. Throughput: 0: 927.8. Samples: 773076. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:29:30,524][00309] Avg episode reward: [(0, '21.637')]
[2024-09-11 16:29:35,084][02993] Updated weights for policy 0, policy_version 760 (0.0015)
[2024-09-11 16:29:35,518][00309] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 3112960. Throughput: 0: 929.6. Samples: 776374. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:29:35,525][00309] Avg episode reward: [(0, '21.820')]
[2024-09-11 16:29:40,519][00309] Fps is (10 sec: 3685.9, 60 sec: 3754.6, 300 sec: 3735.0). Total num frames: 3129344. Throughput: 0: 950.2. Samples: 782292. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:29:40,527][00309] Avg episode reward: [(0, '21.214')]
[2024-09-11 16:29:45,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 3149824. Throughput: 0: 936.2. Samples: 787340. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:29:45,520][00309] Avg episode reward: [(0, '20.283')]
[2024-09-11 16:29:46,326][02993] Updated weights for policy 0, policy_version 770 (0.0015)
[2024-09-11 16:29:50,518][00309] Fps is (10 sec: 4096.5, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 3170304. Throughput: 0: 937.4. Samples: 790630. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:29:50,519][00309] Avg episode reward: [(0, '21.435')]
[2024-09-11 16:29:55,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3748.9). Total num frames: 3186688. Throughput: 0: 958.7. Samples: 796768. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:29:55,523][00309] Avg episode reward: [(0, '21.455')]
[2024-09-11 16:29:57,361][02993] Updated weights for policy 0, policy_version 780 (0.0021)
[2024-09-11 16:30:00,518][00309] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 3203072. Throughput: 0: 940.7. Samples: 801450. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:30:00,520][00309] Avg episode reward: [(0, '22.283')]
[2024-09-11 16:30:05,518][00309] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 3227648. Throughput: 0: 940.8. Samples: 804710. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:30:05,524][00309] Avg episode reward: [(0, '22.110')]
[2024-09-11 16:30:07,098][02993] Updated weights for policy 0, policy_version 790 (0.0012)
[2024-09-11 16:30:10,518][00309] Fps is (10 sec: 4096.0, 60 sec: 3823.1, 300 sec: 3749.0). Total num frames: 3244032. Throughput: 0: 960.8. Samples: 811158. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:30:10,524][00309] Avg episode reward: [(0, '22.371')]
[2024-09-11 16:30:15,518][00309] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 3260416. Throughput: 0: 944.0. Samples: 815558. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:30:15,522][00309] Avg episode reward: [(0, '22.117')]
[2024-09-11 16:30:18,708][02993] Updated weights for policy 0, policy_version 800 (0.0019)
[2024-09-11 16:30:20,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 3280896. Throughput: 0: 941.8. Samples: 818754. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:30:20,521][00309] Avg episode reward: [(0, '23.395')]
[2024-09-11 16:30:25,518][00309] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 3301376. Throughput: 0: 950.2. Samples: 825048. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:30:25,520][00309] Avg episode reward: [(0, '24.123')]
[2024-09-11 16:30:25,530][02980] Saving new best policy, reward=24.123!
[2024-09-11 16:30:30,518][00309] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 3313664. Throughput: 0: 928.3. Samples: 829114. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:30:30,521][00309] Avg episode reward: [(0, '23.419')]
[2024-09-11 16:30:30,799][02993] Updated weights for policy 0, policy_version 810 (0.0017)
[2024-09-11 16:30:35,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 3338240. Throughput: 0: 921.9. Samples: 832116. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:30:35,520][00309] Avg episode reward: [(0, '24.359')]
[2024-09-11 16:30:35,527][02980] Saving new best policy, reward=24.359!
[2024-09-11 16:30:40,374][02993] Updated weights for policy 0, policy_version 820 (0.0012)
[2024-09-11 16:30:40,518][00309] Fps is (10 sec: 4505.5, 60 sec: 3823.0, 300 sec: 3762.8). Total num frames: 3358720. Throughput: 0: 928.7. Samples: 838560. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:30:40,520][00309] Avg episode reward: [(0, '23.830')]
[2024-09-11 16:30:45,518][00309] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 3371008. Throughput: 0: 928.9. Samples: 843250. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:30:45,520][00309] Avg episode reward: [(0, '23.890')]
[2024-09-11 16:30:50,518][00309] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 3391488. Throughput: 0: 917.8. Samples: 846012. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-11 16:30:50,526][00309] Avg episode reward: [(0, '23.675')]
[2024-09-11 16:30:51,910][02993] Updated weights for policy 0, policy_version 830 (0.0024)
[2024-09-11 16:30:55,518][00309] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3411968. Throughput: 0: 920.7. Samples: 852590. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:30:55,520][00309] Avg episode reward: [(0, '23.123')]
[2024-09-11 16:31:00,518][00309] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3428352. Throughput: 0: 930.7. Samples: 857440. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:31:00,526][00309] Avg episode reward: [(0, '24.282')]
[2024-09-11 16:31:03,416][02993] Updated weights for policy 0, policy_version 840 (0.0013)
[2024-09-11 16:31:05,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 3448832. Throughput: 0: 915.9. Samples: 859970. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:31:05,524][00309] Avg episode reward: [(0, '23.686')]
[2024-09-11 16:31:05,537][02980] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000842_3448832.pth...
[2024-09-11 16:31:05,678][02980] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000622_2547712.pth
[2024-09-11 16:31:10,518][00309] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3469312. Throughput: 0: 920.4. Samples: 866466. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:31:10,523][00309] Avg episode reward: [(0, '22.266')]
[2024-09-11 16:31:13,559][02993] Updated weights for policy 0, policy_version 850 (0.0012)
[2024-09-11 16:31:15,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3485696. Throughput: 0: 947.5. Samples: 871750. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:31:15,523][00309] Avg episode reward: [(0, '22.831')]
[2024-09-11 16:31:20,518][00309] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 3502080. Throughput: 0: 929.7. Samples: 873954. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:31:20,526][00309] Avg episode reward: [(0, '22.217')]
[2024-09-11 16:31:24,295][02993] Updated weights for policy 0, policy_version 860 (0.0015)
[2024-09-11 16:31:25,521][00309] Fps is (10 sec: 4094.8, 60 sec: 3754.5, 300 sec: 3748.8). Total num frames: 3526656. Throughput: 0: 931.8. Samples: 880494. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:31:25,523][00309] Avg episode reward: [(0, '21.411')]
[2024-09-11 16:31:30,518][00309] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 3543040. Throughput: 0: 945.2. Samples: 885784. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:31:30,524][00309] Avg episode reward: [(0, '22.512')]
[2024-09-11 16:31:35,518][00309] Fps is (10 sec: 3277.8, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 3559424. Throughput: 0: 930.1. Samples: 887866. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:31:35,522][00309] Avg episode reward: [(0, '23.963')]
[2024-09-11 16:31:36,213][02993] Updated weights for policy 0, policy_version 870 (0.0015)
[2024-09-11 16:31:40,518][00309] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 3579904. Throughput: 0: 926.4. Samples: 894280. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:31:40,522][00309] Avg episode reward: [(0, '24.856')]
[2024-09-11 16:31:40,528][02980] Saving new best policy, reward=24.856!
[2024-09-11 16:31:45,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3596288. Throughput: 0: 946.7. Samples: 900040. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:31:45,524][00309] Avg episode reward: [(0, '25.334')]
[2024-09-11 16:31:45,537][02980] Saving new best policy, reward=25.334!
[2024-09-11 16:31:46,916][02993] Updated weights for policy 0, policy_version 880 (0.0012)
[2024-09-11 16:31:50,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3616768. Throughput: 0: 935.3. Samples: 902058. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:31:50,521][00309] Avg episode reward: [(0, '26.345')]
[2024-09-11 16:31:50,525][02980] Saving new best policy, reward=26.345!
[2024-09-11 16:31:55,518][00309] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3637248. Throughput: 0: 924.6. Samples: 908074. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:31:55,523][00309] Avg episode reward: [(0, '25.689')]
[2024-09-11 16:31:57,406][02993] Updated weights for policy 0, policy_version 890 (0.0026)
[2024-09-11 16:32:00,518][00309] Fps is (10 sec: 3686.3, 60 sec: 3754.6, 300 sec: 3748.9). Total num frames: 3653632. Throughput: 0: 942.2. Samples: 914150. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:32:00,520][00309] Avg episode reward: [(0, '24.505')]
[2024-09-11 16:32:05,518][00309] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 3670016. Throughput: 0: 939.6. Samples: 916234. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:32:05,520][00309] Avg episode reward: [(0, '23.610')]
[2024-09-11 16:32:08,847][02993] Updated weights for policy 0, policy_version 900 (0.0013)
[2024-09-11 16:32:10,518][00309] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 3690496. Throughput: 0: 923.2. Samples: 922034. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:32:10,524][00309] Avg episode reward: [(0, '23.730')]
[2024-09-11 16:32:15,518][00309] Fps is (10 sec: 4096.1, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3710976. Throughput: 0: 949.1. Samples: 928492. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:32:15,525][00309] Avg episode reward: [(0, '23.419')]
[2024-09-11 16:32:20,329][02993] Updated weights for policy 0, policy_version 910 (0.0017)
[2024-09-11 16:32:20,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3727360. Throughput: 0: 947.3. Samples: 930496. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:32:20,520][00309] Avg episode reward: [(0, '22.600')]
[2024-09-11 16:32:25,518][00309] Fps is (10 sec: 3686.3, 60 sec: 3686.6, 300 sec: 3735.0). Total num frames: 3747840. Throughput: 0: 927.3. Samples: 936010. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:32:25,526][00309] Avg episode reward: [(0, '24.178')]
[2024-09-11 16:32:30,158][02993] Updated weights for policy 0, policy_version 920 (0.0012)
[2024-09-11 16:32:30,519][00309] Fps is (10 sec: 4095.6, 60 sec: 3754.6, 300 sec: 3748.9). Total num frames: 3768320. Throughput: 0: 939.7. Samples: 942326. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:32:30,527][00309] Avg episode reward: [(0, '24.599')]
[2024-09-11 16:32:35,518][00309] Fps is (10 sec: 3276.6, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 3780608. Throughput: 0: 947.3. Samples: 944688. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:32:35,521][00309] Avg episode reward: [(0, '24.675')]
[2024-09-11 16:32:40,518][00309] Fps is (10 sec: 3686.8, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3805184. Throughput: 0: 930.5. Samples: 949946. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:32:40,520][00309] Avg episode reward: [(0, '24.856')]
[2024-09-11 16:32:41,508][02993] Updated weights for policy 0, policy_version 930 (0.0016)
[2024-09-11 16:32:45,518][00309] Fps is (10 sec: 4505.9, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 3825664. Throughput: 0: 940.9. Samples: 956490. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:32:45,526][00309] Avg episode reward: [(0, '24.282')]
[2024-09-11 16:32:50,518][00309] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 3837952. Throughput: 0: 950.5. Samples: 959006. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:32:50,522][00309] Avg episode reward: [(0, '25.885')]
[2024-09-11 16:32:53,031][02993] Updated weights for policy 0, policy_version 940 (0.0017)
[2024-09-11 16:32:55,518][00309] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 3858432. Throughput: 0: 931.7. Samples: 963960. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:32:55,524][00309] Avg episode reward: [(0, '24.799')]
[2024-09-11 16:33:00,518][00309] Fps is (10 sec: 4095.8, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3878912. Throughput: 0: 931.4. Samples: 970406. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:33:00,523][00309] Avg episode reward: [(0, '24.883')]
[2024-09-11 16:33:02,970][02993] Updated weights for policy 0, policy_version 950 (0.0013)
[2024-09-11 16:33:05,518][00309] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3895296. Throughput: 0: 948.8. Samples: 973192. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-11 16:33:05,520][00309] Avg episode reward: [(0, '24.331')]
[2024-09-11 16:33:05,529][02980] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000951_3895296.pth...
[2024-09-11 16:33:05,699][02980] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000733_3002368.pth
[2024-09-11 16:33:10,518][00309] Fps is (10 sec: 3686.6, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 3915776. Throughput: 0: 926.4. Samples: 977698. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:33:10,524][00309] Avg episode reward: [(0, '24.430')]
[2024-09-11 16:33:14,421][02993] Updated weights for policy 0, policy_version 960 (0.0021)
[2024-09-11 16:33:15,518][00309] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3936256. Throughput: 0: 924.8. Samples: 983940. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:33:15,525][00309] Avg episode reward: [(0, '24.533')]
[2024-09-11 16:33:20,519][00309] Fps is (10 sec: 3686.0, 60 sec: 3754.6, 300 sec: 3748.9). Total num frames: 3952640. Throughput: 0: 941.9. Samples: 987072. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-11 16:33:20,527][00309] Avg episode reward: [(0, '24.362')]
[2024-09-11 16:33:25,518][00309] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 3969024. Throughput: 0: 922.7. Samples: 991468. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:33:25,522][00309] Avg episode reward: [(0, '24.807')]
[2024-09-11 16:33:25,957][02993] Updated weights for policy 0, policy_version 970 (0.0014)
[2024-09-11 16:33:30,518][00309] Fps is (10 sec: 3686.8, 60 sec: 3686.5, 300 sec: 3735.0). Total num frames: 3989504. Throughput: 0: 918.6. Samples: 997828. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-11 16:33:30,520][00309] Avg episode reward: [(0, '24.280')]
[2024-09-11 16:33:33,592][02980] Stopping Batcher_0...
[2024-09-11 16:33:33,597][02980] Loop batcher_evt_loop terminating...
[2024-09-11 16:33:33,597][00309] Component Batcher_0 stopped!
[2024-09-11 16:33:33,600][00309] Component RolloutWorker_w2 process died already! Don't wait for it.
[2024-09-11 16:33:33,603][00309] Component RolloutWorker_w6 process died already! Don't wait for it.
[2024-09-11 16:33:33,605][00309] Component RolloutWorker_w7 process died already! Don't wait for it.
[2024-09-11 16:33:33,614][02980] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-09-11 16:33:33,651][02993] Weights refcount: 2 0
[2024-09-11 16:33:33,654][02993] Stopping InferenceWorker_p0-w0...
[2024-09-11 16:33:33,655][02993] Loop inference_proc0-0_evt_loop terminating...
[2024-09-11 16:33:33,654][00309] Component InferenceWorker_p0-w0 stopped!
[2024-09-11 16:33:33,769][02980] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000842_3448832.pth
[2024-09-11 16:33:33,780][02980] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-09-11 16:33:34,016][00309] Component LearnerWorker_p0 stopped!
[2024-09-11 16:33:34,023][02980] Stopping LearnerWorker_p0...
[2024-09-11 16:33:34,024][02980] Loop learner_proc0_evt_loop terminating...
[2024-09-11 16:33:34,101][00309] Component RolloutWorker_w1 stopped!
[2024-09-11 16:33:34,101][02994] Stopping RolloutWorker_w1...
[2024-09-11 16:33:34,107][02994] Loop rollout_proc1_evt_loop terminating...
[2024-09-11 16:33:34,145][02999] Stopping RolloutWorker_w5...
[2024-09-11 16:33:34,145][00309] Component RolloutWorker_w5 stopped!
[2024-09-11 16:33:34,146][02999] Loop rollout_proc5_evt_loop terminating...
[2024-09-11 16:33:34,189][02996] Stopping RolloutWorker_w3...
[2024-09-11 16:33:34,189][00309] Component RolloutWorker_w3 stopped!
[2024-09-11 16:33:34,193][02996] Loop rollout_proc3_evt_loop terminating...
[2024-09-11 16:33:34,252][00309] Component RolloutWorker_w4 stopped!
[2024-09-11 16:33:34,263][02998] Stopping RolloutWorker_w4...
[2024-09-11 16:33:34,263][02998] Loop rollout_proc4_evt_loop terminating...
[2024-09-11 16:33:34,306][00309] Component RolloutWorker_w0 stopped!
[2024-09-11 16:33:34,315][00309] Waiting for process learner_proc0 to stop...
[2024-09-11 16:33:34,329][02995] Stopping RolloutWorker_w0...
[2024-09-11 16:33:34,349][02995] Loop rollout_proc0_evt_loop terminating...
[2024-09-11 16:33:36,203][00309] Waiting for process inference_proc0-0 to join...
[2024-09-11 16:33:36,430][00309] Waiting for process rollout_proc0 to join...
[2024-09-11 16:33:37,700][00309] Waiting for process rollout_proc1 to join...
[2024-09-11 16:33:37,707][00309] Waiting for process rollout_proc2 to join...
[2024-09-11 16:33:37,709][00309] Waiting for process rollout_proc3 to join...
[2024-09-11 16:33:37,714][00309] Waiting for process rollout_proc4 to join...
[2024-09-11 16:33:37,718][00309] Waiting for process rollout_proc5 to join...
[2024-09-11 16:33:37,720][00309] Waiting for process rollout_proc6 to join...
[2024-09-11 16:33:37,722][00309] Waiting for process rollout_proc7 to join...
[2024-09-11 16:33:37,724][00309] Batcher 0 profile tree view:
batching: 22.9736, releasing_batches: 0.0204
[2024-09-11 16:33:37,725][00309] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
wait_policy_total: 460.5630
update_model: 8.5711
weight_update: 0.0012
one_step: 0.0026
handle_policy_step: 572.4623
deserialize: 15.7031, stack: 3.4890, obs_to_device_normalize: 129.4760, forward: 284.8172, send_messages: 23.2723
prepare_outputs: 85.5444
to_cpu: 53.3818
[2024-09-11 16:33:37,726][00309] Learner 0 profile tree view:
misc: 0.0063, prepare_batch: 14.6935
train: 69.9821
epoch_init: 0.0062, minibatch_init: 0.0074, losses_postprocess: 0.5184, kl_divergence: 0.4991, after_optimizer: 32.8391
calculate_losses: 22.5222
losses_init: 0.0034, forward_head: 1.6125, bptt_initial: 14.9126, tail: 0.8834, advantages_returns: 0.2257, losses: 2.6885
bptt: 1.9089
bptt_forward_core: 1.8352
update: 13.1073
clip: 1.3652
[2024-09-11 16:33:37,728][00309] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3901, enqueue_policy_requests: 216.1536, env_step: 749.1586, overhead: 16.5858, complete_rollouts: 4.4170
save_policy_outputs: 28.6281
split_output_tensors: 9.7136
[2024-09-11 16:33:37,730][00309] Loop Runner_EvtLoop terminating...
[2024-09-11 16:33:37,732][00309] Runner profile tree view:
main_loop: 1108.6477
[2024-09-11 16:33:37,733][00309] Collected {0: 4005888}, FPS: 3613.3
[2024-09-11 16:53:49,890][00309] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-09-11 16:53:49,892][00309] Overriding arg 'num_workers' with value 1 passed from command line
[2024-09-11 16:53:49,895][00309] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-09-11 16:53:49,897][00309] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-09-11 16:53:49,899][00309] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-09-11 16:53:49,900][00309] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-09-11 16:53:49,901][00309] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-09-11 16:53:49,902][00309] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-09-11 16:53:49,903][00309] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-09-11 16:53:49,904][00309] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-09-11 16:53:49,905][00309] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-09-11 16:53:49,907][00309] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-09-11 16:53:49,908][00309] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-09-11 16:53:49,909][00309] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-09-11 16:53:49,910][00309] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-09-11 16:53:49,929][00309] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-11 16:53:49,933][00309] RunningMeanStd input shape: (3, 72, 128)
[2024-09-11 16:53:49,935][00309] RunningMeanStd input shape: (1,)
[2024-09-11 16:53:49,950][00309] ConvEncoder: input_channels=3
[2024-09-11 16:53:50,073][00309] Conv encoder output size: 512
[2024-09-11 16:53:50,075][00309] Policy head output size: 512
[2024-09-11 16:53:51,736][00309] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-09-11 16:53:52,604][00309] Num frames 100...
[2024-09-11 16:53:52,724][00309] Num frames 200...
[2024-09-11 16:53:52,843][00309] Num frames 300...
[2024-09-11 16:53:52,963][00309] Num frames 400...
[2024-09-11 16:53:53,078][00309] Num frames 500...
[2024-09-11 16:53:53,194][00309] Num frames 600...
[2024-09-11 16:53:53,310][00309] Num frames 700...
[2024-09-11 16:53:53,428][00309] Num frames 800...
[2024-09-11 16:53:53,554][00309] Num frames 900...
[2024-09-11 16:53:53,674][00309] Num frames 1000...
[2024-09-11 16:53:53,800][00309] Num frames 1100...
[2024-09-11 16:53:53,895][00309] Avg episode rewards: #0: 23.250, true rewards: #0: 11.250
[2024-09-11 16:53:53,897][00309] Avg episode reward: 23.250, avg true_objective: 11.250
[2024-09-11 16:53:53,986][00309] Num frames 1200...
[2024-09-11 16:53:54,104][00309] Num frames 1300...
[2024-09-11 16:53:54,220][00309] Num frames 1400...
[2024-09-11 16:53:54,343][00309] Num frames 1500...
[2024-09-11 16:53:54,460][00309] Num frames 1600...
[2024-09-11 16:53:54,587][00309] Num frames 1700...
[2024-09-11 16:53:54,706][00309] Num frames 1800...
[2024-09-11 16:53:54,828][00309] Num frames 1900...
[2024-09-11 16:53:54,946][00309] Num frames 2000...
[2024-09-11 16:53:55,066][00309] Num frames 2100...
[2024-09-11 16:53:55,189][00309] Num frames 2200...
[2024-09-11 16:53:55,354][00309] Num frames 2300...
[2024-09-11 16:53:55,530][00309] Avg episode rewards: #0: 25.365, true rewards: #0: 11.865
[2024-09-11 16:53:55,532][00309] Avg episode reward: 25.365, avg true_objective: 11.865
[2024-09-11 16:53:55,581][00309] Num frames 2400...
[2024-09-11 16:53:55,742][00309] Num frames 2500...
[2024-09-11 16:53:55,901][00309] Num frames 2600...
[2024-09-11 16:53:56,062][00309] Num frames 2700...
[2024-09-11 16:53:56,219][00309] Num frames 2800...
[2024-09-11 16:53:56,381][00309] Num frames 2900...
[2024-09-11 16:53:56,548][00309] Num frames 3000...
[2024-09-11 16:53:56,673][00309] Avg episode rewards: #0: 21.457, true rewards: #0: 10.123
[2024-09-11 16:53:56,674][00309] Avg episode reward: 21.457, avg true_objective: 10.123
[2024-09-11 16:53:56,777][00309] Num frames 3100...
[2024-09-11 16:53:56,953][00309] Num frames 3200...
[2024-09-11 16:53:57,124][00309] Num frames 3300...
[2024-09-11 16:53:57,292][00309] Num frames 3400...
[2024-09-11 16:53:57,465][00309] Num frames 3500...
[2024-09-11 16:53:57,616][00309] Num frames 3600...
[2024-09-11 16:53:57,727][00309] Avg episode rewards: #0: 19.363, true rewards: #0: 9.112
[2024-09-11 16:53:57,729][00309] Avg episode reward: 19.363, avg true_objective: 9.112
[2024-09-11 16:53:57,798][00309] Num frames 3700...
[2024-09-11 16:53:57,917][00309] Num frames 3800...
[2024-09-11 16:53:58,038][00309] Num frames 3900...
[2024-09-11 16:53:58,156][00309] Num frames 4000...
[2024-09-11 16:53:58,272][00309] Num frames 4100...
[2024-09-11 16:53:58,390][00309] Num frames 4200...
[2024-09-11 16:53:58,513][00309] Num frames 4300...
[2024-09-11 16:53:58,643][00309] Num frames 4400...
[2024-09-11 16:53:58,760][00309] Num frames 4500...
[2024-09-11 16:53:58,885][00309] Num frames 4600...
[2024-09-11 16:53:59,012][00309] Avg episode rewards: #0: 19.916, true rewards: #0: 9.316
[2024-09-11 16:53:59,013][00309] Avg episode reward: 19.916, avg true_objective: 9.316
[2024-09-11 16:53:59,065][00309] Num frames 4700...
[2024-09-11 16:53:59,179][00309] Num frames 4800...
[2024-09-11 16:53:59,296][00309] Num frames 4900...
[2024-09-11 16:53:59,415][00309] Num frames 5000...
[2024-09-11 16:53:59,530][00309] Num frames 5100...
[2024-09-11 16:53:59,658][00309] Num frames 5200...
[2024-09-11 16:53:59,778][00309] Num frames 5300...
[2024-09-11 16:53:59,877][00309] Avg episode rewards: #0: 18.717, true rewards: #0: 8.883
[2024-09-11 16:53:59,879][00309] Avg episode reward: 18.717, avg true_objective: 8.883
[2024-09-11 16:53:59,963][00309] Num frames 5400...
[2024-09-11 16:54:00,080][00309] Num frames 5500...
[2024-09-11 16:54:00,200][00309] Num frames 5600...
[2024-09-11 16:54:00,317][00309] Num frames 5700...
[2024-09-11 16:54:00,433][00309] Num frames 5800...
[2024-09-11 16:54:00,552][00309] Num frames 5900...
[2024-09-11 16:54:00,615][00309] Avg episode rewards: #0: 17.580, true rewards: #0: 8.437
[2024-09-11 16:54:00,617][00309] Avg episode reward: 17.580, avg true_objective: 8.437
[2024-09-11 16:54:00,740][00309] Num frames 6000...
[2024-09-11 16:54:00,866][00309] Num frames 6100...
[2024-09-11 16:54:00,989][00309] Num frames 6200...
[2024-09-11 16:54:01,107][00309] Num frames 6300...
[2024-09-11 16:54:01,224][00309] Num frames 6400...
[2024-09-11 16:54:01,341][00309] Num frames 6500...
[2024-09-11 16:54:01,461][00309] Num frames 6600...
[2024-09-11 16:54:01,582][00309] Num frames 6700...
[2024-09-11 16:54:01,709][00309] Num frames 6800...
[2024-09-11 16:54:01,887][00309] Avg episode rewards: #0: 17.748, true rewards: #0: 8.622
[2024-09-11 16:54:01,889][00309] Avg episode reward: 17.748, avg true_objective: 8.622
[2024-09-11 16:54:01,893][00309] Num frames 6900...
[2024-09-11 16:54:02,009][00309] Num frames 7000...
[2024-09-11 16:54:02,125][00309] Num frames 7100...
[2024-09-11 16:54:02,244][00309] Num frames 7200...
[2024-09-11 16:54:02,359][00309] Num frames 7300...
[2024-09-11 16:54:02,475][00309] Num frames 7400...
[2024-09-11 16:54:02,591][00309] Num frames 7500...
[2024-09-11 16:54:02,715][00309] Num frames 7600...
[2024-09-11 16:54:02,777][00309] Avg episode rewards: #0: 17.002, true rewards: #0: 8.447
[2024-09-11 16:54:02,779][00309] Avg episode reward: 17.002, avg true_objective: 8.447
[2024-09-11 16:54:02,904][00309] Num frames 7700...
[2024-09-11 16:54:03,022][00309] Num frames 7800...
[2024-09-11 16:54:03,138][00309] Num frames 7900...
[2024-09-11 16:54:03,255][00309] Num frames 8000...
[2024-09-11 16:54:03,373][00309] Num frames 8100...
[2024-09-11 16:54:03,490][00309] Num frames 8200...
[2024-09-11 16:54:03,609][00309] Num frames 8300...
[2024-09-11 16:54:03,740][00309] Num frames 8400...
[2024-09-11 16:54:03,868][00309] Num frames 8500...
[2024-09-11 16:54:03,987][00309] Num frames 8600...
[2024-09-11 16:54:04,102][00309] Num frames 8700...
[2024-09-11 16:54:04,222][00309] Num frames 8800...
[2024-09-11 16:54:04,300][00309] Avg episode rewards: #0: 18.018, true rewards: #0: 8.818
[2024-09-11 16:54:04,301][00309] Avg episode reward: 18.018, avg true_objective: 8.818
[2024-09-11 16:54:56,815][00309] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-09-11 16:58:41,409][00309] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-09-11 16:58:41,411][00309] Overriding arg 'num_workers' with value 1 passed from command line
[2024-09-11 16:58:41,413][00309] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-09-11 16:58:41,415][00309] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-09-11 16:58:41,416][00309] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-09-11 16:58:41,417][00309] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-09-11 16:58:41,419][00309] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-09-11 16:58:41,420][00309] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-09-11 16:58:41,421][00309] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-09-11 16:58:41,422][00309] Adding new argument 'hf_repository'='lorenzorod88/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-09-11 16:58:41,423][00309] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-09-11 16:58:41,424][00309] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-09-11 16:58:41,425][00309] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-09-11 16:58:41,426][00309] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-09-11 16:58:41,428][00309] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-09-11 16:58:41,437][00309] RunningMeanStd input shape: (3, 72, 128)
[2024-09-11 16:58:41,446][00309] RunningMeanStd input shape: (1,)
[2024-09-11 16:58:41,463][00309] ConvEncoder: input_channels=3
[2024-09-11 16:58:41,502][00309] Conv encoder output size: 512
[2024-09-11 16:58:41,503][00309] Policy head output size: 512
[2024-09-11 16:58:41,521][00309] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-09-11 16:58:42,007][00309] Num frames 100...
[2024-09-11 16:58:42,122][00309] Num frames 200...
[2024-09-11 16:58:42,243][00309] Num frames 300...
[2024-09-11 16:58:42,359][00309] Num frames 400...
[2024-09-11 16:58:42,474][00309] Num frames 500...
[2024-09-11 16:58:42,603][00309] Num frames 600...
[2024-09-11 16:58:42,724][00309] Num frames 700...
[2024-09-11 16:58:42,849][00309] Num frames 800...
[2024-09-11 16:58:42,967][00309] Num frames 900...
[2024-09-11 16:58:43,087][00309] Num frames 1000...
[2024-09-11 16:58:43,206][00309] Num frames 1100...
[2024-09-11 16:58:43,331][00309] Num frames 1200...
[2024-09-11 16:58:43,449][00309] Num frames 1300...
[2024-09-11 16:58:43,575][00309] Num frames 1400...
[2024-09-11 16:58:43,641][00309] Avg episode rewards: #0: 40.080, true rewards: #0: 14.080
[2024-09-11 16:58:43,643][00309] Avg episode reward: 40.080, avg true_objective: 14.080
[2024-09-11 16:58:43,754][00309] Num frames 1500...
[2024-09-11 16:58:43,879][00309] Num frames 1600...
[2024-09-11 16:58:43,998][00309] Num frames 1700...
[2024-09-11 16:58:44,116][00309] Num frames 1800...
[2024-09-11 16:58:44,234][00309] Num frames 1900...
[2024-09-11 16:58:44,356][00309] Num frames 2000...
[2024-09-11 16:58:44,472][00309] Num frames 2100...
[2024-09-11 16:58:44,597][00309] Num frames 2200...
[2024-09-11 16:58:44,717][00309] Num frames 2300...
[2024-09-11 16:58:44,842][00309] Num frames 2400...
[2024-09-11 16:58:44,964][00309] Num frames 2500...
[2024-09-11 16:58:45,079][00309] Num frames 2600...
[2024-09-11 16:58:45,198][00309] Num frames 2700...
[2024-09-11 16:58:45,317][00309] Num frames 2800...
[2024-09-11 16:58:45,445][00309] Num frames 2900...
[2024-09-11 16:58:45,569][00309] Num frames 3000...
[2024-09-11 16:58:45,696][00309] Num frames 3100...
[2024-09-11 16:58:45,820][00309] Num frames 3200...
[2024-09-11 16:58:45,942][00309] Num frames 3300...
[2024-09-11 16:58:46,059][00309] Num frames 3400...
[2024-09-11 16:58:46,175][00309] Num frames 3500...
[2024-09-11 16:58:46,242][00309] Avg episode rewards: #0: 51.539, true rewards: #0: 17.540
[2024-09-11 16:58:46,244][00309] Avg episode reward: 51.539, avg true_objective: 17.540
[2024-09-11 16:58:46,351][00309] Num frames 3600...
[2024-09-11 16:58:46,468][00309] Num frames 3700...
[2024-09-11 16:58:46,584][00309] Num frames 3800...
[2024-09-11 16:58:46,706][00309] Num frames 3900...
[2024-09-11 16:58:46,830][00309] Num frames 4000...
[2024-09-11 16:58:46,946][00309] Num frames 4100...
[2024-09-11 16:58:47,067][00309] Num frames 4200...
[2024-09-11 16:58:47,223][00309] Num frames 4300...
[2024-09-11 16:58:47,389][00309] Num frames 4400...
[2024-09-11 16:58:47,548][00309] Num frames 4500...
[2024-09-11 16:58:47,722][00309] Num frames 4600...
[2024-09-11 16:58:47,889][00309] Num frames 4700...
[2024-09-11 16:58:48,010][00309] Avg episode rewards: #0: 43.776, true rewards: #0: 15.777
[2024-09-11 16:58:48,012][00309] Avg episode reward: 43.776, avg true_objective: 15.777
[2024-09-11 16:58:48,118][00309] Num frames 4800...
[2024-09-11 16:58:48,272][00309] Num frames 4900...
[2024-09-11 16:58:48,438][00309] Num frames 5000...
[2024-09-11 16:58:48,609][00309] Num frames 5100...
[2024-09-11 16:58:48,784][00309] Num frames 5200...
[2024-09-11 16:58:48,959][00309] Num frames 5300...
[2024-09-11 16:58:49,126][00309] Num frames 5400...
[2024-09-11 16:58:49,290][00309] Num frames 5500...
[2024-09-11 16:58:49,464][00309] Num frames 5600...
[2024-09-11 16:58:49,631][00309] Avg episode rewards: #0: 37.652, true rewards: #0: 14.152
[2024-09-11 16:58:49,632][00309] Avg episode reward: 37.652, avg true_objective: 14.152
[2024-09-11 16:58:49,684][00309] Num frames 5700...
[2024-09-11 16:58:49,815][00309] Num frames 5800...
[2024-09-11 16:58:49,933][00309] Num frames 5900...
[2024-09-11 16:58:50,058][00309] Num frames 6000...
[2024-09-11 16:58:50,180][00309] Num frames 6100...
[2024-09-11 16:58:50,300][00309] Num frames 6200...
[2024-09-11 16:58:50,420][00309] Num frames 6300...
[2024-09-11 16:58:50,542][00309] Num frames 6400...
[2024-09-11 16:58:50,658][00309] Num frames 6500...
[2024-09-11 16:58:50,800][00309] Num frames 6600...
[2024-09-11 16:58:50,927][00309] Num frames 6700...
[2024-09-11 16:58:51,047][00309] Num frames 6800...
[2024-09-11 16:58:51,164][00309] Num frames 6900...
[2024-09-11 16:58:51,281][00309] Num frames 7000...
[2024-09-11 16:58:51,417][00309] Num frames 7100...
[2024-09-11 16:58:51,538][00309] Num frames 7200...
[2024-09-11 16:58:51,608][00309] Avg episode rewards: #0: 36.824, true rewards: #0: 14.424
[2024-09-11 16:58:51,609][00309] Avg episode reward: 36.824, avg true_objective: 14.424
[2024-09-11 16:58:51,714][00309] Num frames 7300...
[2024-09-11 16:58:51,854][00309] Num frames 7400...
[2024-09-11 16:58:51,973][00309] Num frames 7500...
[2024-09-11 16:58:52,092][00309] Num frames 7600...
[2024-09-11 16:58:52,208][00309] Num frames 7700...
[2024-09-11 16:58:52,328][00309] Num frames 7800...
[2024-09-11 16:58:52,446][00309] Num frames 7900...
[2024-09-11 16:58:52,520][00309] Avg episode rewards: #0: 32.526, true rewards: #0: 13.193
[2024-09-11 16:58:52,523][00309] Avg episode reward: 32.526, avg true_objective: 13.193
[2024-09-11 16:58:52,623][00309] Num frames 8000...
[2024-09-11 16:58:52,740][00309] Num frames 8100...
[2024-09-11 16:58:52,873][00309] Num frames 8200...
[2024-09-11 16:58:53,020][00309] Num frames 8300...
[2024-09-11 16:58:53,140][00309] Num frames 8400...
[2024-09-11 16:58:53,258][00309] Num frames 8500...
[2024-09-11 16:58:53,380][00309] Num frames 8600...
[2024-09-11 16:58:53,536][00309] Avg episode rewards: #0: 29.834, true rewards: #0: 12.406
[2024-09-11 16:58:53,537][00309] Avg episode reward: 29.834, avg true_objective: 12.406
[2024-09-11 16:58:53,563][00309] Num frames 8700...
[2024-09-11 16:58:53,681][00309] Num frames 8800...
[2024-09-11 16:58:53,812][00309] Num frames 8900...
[2024-09-11 16:58:53,939][00309] Num frames 9000...
[2024-09-11 16:58:54,061][00309] Num frames 9100...
[2024-09-11 16:58:54,183][00309] Num frames 9200...
[2024-09-11 16:58:54,305][00309] Num frames 9300...
[2024-09-11 16:58:54,427][00309] Num frames 9400...
[2024-09-11 16:58:54,548][00309] Num frames 9500...
[2024-09-11 16:58:54,701][00309] Avg episode rewards: #0: 28.725, true rewards: #0: 11.975
[2024-09-11 16:58:54,703][00309] Avg episode reward: 28.725, avg true_objective: 11.975
[2024-09-11 16:58:54,730][00309] Num frames 9600...
[2024-09-11 16:58:54,863][00309] Num frames 9700...
[2024-09-11 16:58:54,983][00309] Num frames 9800...
[2024-09-11 16:58:55,103][00309] Num frames 9900...
[2024-09-11 16:58:55,155][00309] Avg episode rewards: #0: 26.000, true rewards: #0: 11.000
[2024-09-11 16:58:55,156][00309] Avg episode reward: 26.000, avg true_objective: 11.000
[2024-09-11 16:58:55,281][00309] Num frames 10000...
[2024-09-11 16:58:55,398][00309] Num frames 10100...
[2024-09-11 16:58:55,522][00309] Num frames 10200...
[2024-09-11 16:58:55,641][00309] Num frames 10300...
[2024-09-11 16:58:55,764][00309] Num frames 10400...
[2024-09-11 16:58:55,897][00309] Num frames 10500...
[2024-09-11 16:58:55,962][00309] Avg episode rewards: #0: 24.308, true rewards: #0: 10.508
[2024-09-11 16:58:55,964][00309] Avg episode reward: 24.308, avg true_objective: 10.508
[2024-09-11 16:59:55,146][00309] Replay video saved to /content/train_dir/default_experiment/replay.mp4!