[2024-11-09 11:01:48,563][01612] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-11-09 11:01:48,565][01612] Rollout worker 0 uses device cpu
[2024-11-09 11:01:48,566][01612] Rollout worker 1 uses device cpu
[2024-11-09 11:01:48,567][01612] Rollout worker 2 uses device cpu
[2024-11-09 11:01:48,570][01612] Rollout worker 3 uses device cpu
[2024-11-09 11:01:48,571][01612] Rollout worker 4 uses device cpu
[2024-11-09 11:01:48,572][01612] Rollout worker 5 uses device cpu
[2024-11-09 11:01:48,573][01612] Rollout worker 6 uses device cpu
[2024-11-09 11:01:48,574][01612] Rollout worker 7 uses device cpu
[2024-11-09 11:01:48,766][01612] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-09 11:01:48,771][01612] InferenceWorker_p0-w0: min num requests: 2
[2024-11-09 11:01:48,814][01612] Starting all processes...
[2024-11-09 11:01:48,818][01612] Starting process learner_proc0
[2024-11-09 11:01:48,884][01612] Starting all processes...
[2024-11-09 11:01:48,905][01612] Starting process inference_proc0-0
[2024-11-09 11:01:48,907][01612] Starting process rollout_proc0
[2024-11-09 11:01:48,907][01612] Starting process rollout_proc1
[2024-11-09 11:01:48,907][01612] Starting process rollout_proc2
[2024-11-09 11:01:48,907][01612] Starting process rollout_proc3
[2024-11-09 11:01:48,907][01612] Starting process rollout_proc4
[2024-11-09 11:01:48,907][01612] Starting process rollout_proc5
[2024-11-09 11:01:48,907][01612] Starting process rollout_proc6
[2024-11-09 11:01:48,907][01612] Starting process rollout_proc7
[2024-11-09 11:02:07,961][03311] Worker 3 uses CPU cores [1]
[2024-11-09 11:02:08,200][03294] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-09 11:02:08,205][03294] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-11-09 11:02:08,294][03307] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-09 11:02:08,298][03307] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-11-09 11:02:08,301][03294] Num visible devices: 1
[2024-11-09 11:02:08,346][03294] Starting seed is not provided
[2024-11-09 11:02:08,346][03294] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-09 11:02:08,346][03294] Initializing actor-critic model on device cuda:0
[2024-11-09 11:02:08,347][03294] RunningMeanStd input shape: (3, 72, 128)
[2024-11-09 11:02:08,350][03294] RunningMeanStd input shape: (1,)
[2024-11-09 11:02:08,357][03310] Worker 2 uses CPU cores [0]
[2024-11-09 11:02:08,362][03308] Worker 0 uses CPU cores [0]
[2024-11-09 11:02:08,414][03307] Num visible devices: 1
[2024-11-09 11:02:08,431][03294] ConvEncoder: input_channels=3
[2024-11-09 11:02:08,585][03315] Worker 7 uses CPU cores [1]
[2024-11-09 11:02:08,611][03309] Worker 1 uses CPU cores [1]
[2024-11-09 11:02:08,664][03312] Worker 4 uses CPU cores [0]
[2024-11-09 11:02:08,767][01612] Heartbeat connected on Batcher_0
[2024-11-09 11:02:08,773][01612] Heartbeat connected on InferenceWorker_p0-w0
[2024-11-09 11:02:08,780][01612] Heartbeat connected on RolloutWorker_w0
[2024-11-09 11:02:08,785][01612] Heartbeat connected on RolloutWorker_w1
[2024-11-09 11:02:08,790][01612] Heartbeat connected on RolloutWorker_w2
[2024-11-09 11:02:08,795][01612] Heartbeat connected on RolloutWorker_w3
[2024-11-09 11:02:08,800][01612] Heartbeat connected on RolloutWorker_w4
[2024-11-09 11:02:08,813][01612] Heartbeat connected on RolloutWorker_w7
[2024-11-09 11:02:08,834][03313] Worker 5 uses CPU cores [1]
[2024-11-09 11:02:08,923][01612] Heartbeat connected on RolloutWorker_w5
[2024-11-09 11:02:08,937][03314] Worker 6 uses CPU cores [0]
[2024-11-09 11:02:08,972][01612] Heartbeat connected on RolloutWorker_w6
[2024-11-09 11:02:09,012][03294] Conv encoder output size: 512
[2024-11-09 11:02:09,013][03294] Policy head output size: 512
[2024-11-09 11:02:09,083][03294] Created Actor Critic model with architecture:
[2024-11-09 11:02:09,084][03294] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-11-09 11:02:09,544][03294] Using optimizer
[2024-11-09 11:02:15,204][03294] No checkpoints found
[2024-11-09 11:02:15,204][03294] Did not load from checkpoint, starting from scratch!
[2024-11-09 11:02:15,205][03294] Initialized policy 0 weights for model version 0
[2024-11-09 11:02:15,208][03294] LearnerWorker_p0 finished initialization!
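The log reports the encoder input shape `(3, 72, 128)` and a `Conv encoder output size: 512`, but the `Conv2d` kernel sizes and strides are not printed. Assuming the conv head uses the common 8×8/stride-4, 4×4/stride-2, 3×3/stride-2 layout (an assumption, not stated in this log), the intermediate feature-map shapes can be sketched with plain shape arithmetic:

```python
def conv2d_out(hw, kernel, stride, padding=0):
    """Output (H, W) of a square-kernel Conv2d, per the standard floor formula."""
    h, w = hw
    return ((h + 2 * padding - kernel) // stride + 1,
            (w + 2 * padding - kernel) // stride + 1)

# Observation shape from the log: (3, 72, 128), after resizing 160x120 frames.
hw = (72, 128)
# Assumed conv head layout (channels, kernel, stride) -- not printed in the log:
for out_ch, k, s in [(32, 8, 4), (64, 4, 2), (128, 3, 2)]:
    hw = conv2d_out(hw, k, s)

flat = 128 * hw[0] * hw[1]  # flattened conv features fed to the final Linear+ELU
print(hw, flat)             # the mlp_layers Linear then maps this to 512 dims
```

Under these assumed kernels the spatial map shrinks 72×128 → 17×31 → 7×14 → 3×6, giving 128·3·6 = 2304 flattened features before the 512-dim projection.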
[2024-11-09 11:02:15,213][03294] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-09 11:02:15,209][01612] Heartbeat connected on LearnerWorker_p0
[2024-11-09 11:02:15,327][03307] RunningMeanStd input shape: (3, 72, 128)
[2024-11-09 11:02:15,328][03307] RunningMeanStd input shape: (1,)
[2024-11-09 11:02:15,349][03307] ConvEncoder: input_channels=3
[2024-11-09 11:02:15,523][03307] Conv encoder output size: 512
[2024-11-09 11:02:15,524][03307] Policy head output size: 512
[2024-11-09 11:02:15,600][01612] Inference worker 0-0 is ready!
[2024-11-09 11:02:15,602][01612] All inference workers are ready! Signal rollout workers to start!
[2024-11-09 11:02:15,863][03315] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-09 11:02:15,860][03311] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-09 11:02:15,864][03309] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-09 11:02:15,861][03313] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-09 11:02:15,938][03310] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-09 11:02:15,946][03314] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-09 11:02:15,943][03312] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-09 11:02:15,949][03308] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-09 11:02:18,115][01612] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-09 11:02:18,155][03314] Decorrelating experience for 0 frames...
[2024-11-09 11:02:18,156][03310] Decorrelating experience for 0 frames...
[2024-11-09 11:02:18,157][03312] Decorrelating experience for 0 frames...
[2024-11-09 11:02:18,289][03309] Decorrelating experience for 0 frames...
[2024-11-09 11:02:18,288][03313] Decorrelating experience for 0 frames...
[2024-11-09 11:02:18,296][03311] Decorrelating experience for 0 frames...
[2024-11-09 11:02:18,292][03315] Decorrelating experience for 0 frames...
[2024-11-09 11:02:19,675][03309] Decorrelating experience for 32 frames...
[2024-11-09 11:02:19,678][03313] Decorrelating experience for 32 frames...
[2024-11-09 11:02:19,932][03308] Decorrelating experience for 0 frames...
[2024-11-09 11:02:20,010][03314] Decorrelating experience for 32 frames...
[2024-11-09 11:02:20,185][03310] Decorrelating experience for 32 frames...
[2024-11-09 11:02:20,183][03312] Decorrelating experience for 32 frames...
[2024-11-09 11:02:21,374][03314] Decorrelating experience for 64 frames...
[2024-11-09 11:02:21,474][03312] Decorrelating experience for 64 frames...
[2024-11-09 11:02:21,671][03311] Decorrelating experience for 32 frames...
[2024-11-09 11:02:21,818][03315] Decorrelating experience for 32 frames...
[2024-11-09 11:02:22,109][03313] Decorrelating experience for 64 frames...
[2024-11-09 11:02:22,113][03309] Decorrelating experience for 64 frames...
[2024-11-09 11:02:22,301][03314] Decorrelating experience for 96 frames...
[2024-11-09 11:02:22,888][03308] Decorrelating experience for 32 frames...
[2024-11-09 11:02:23,114][01612] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-09 11:02:23,177][03312] Decorrelating experience for 96 frames...
[2024-11-09 11:02:23,294][03311] Decorrelating experience for 64 frames...
[2024-11-09 11:02:23,392][03309] Decorrelating experience for 96 frames...
[2024-11-09 11:02:23,642][03310] Decorrelating experience for 64 frames...
[2024-11-09 11:02:23,860][03313] Decorrelating experience for 96 frames...
[2024-11-09 11:02:24,361][03311] Decorrelating experience for 96 frames...
[2024-11-09 11:02:24,666][03308] Decorrelating experience for 64 frames...
[2024-11-09 11:02:24,963][03310] Decorrelating experience for 96 frames...
[2024-11-09 11:02:27,611][03315] Decorrelating experience for 64 frames...
[2024-11-09 11:02:27,904][03308] Decorrelating experience for 96 frames...
[2024-11-09 11:02:28,114][01612] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 216.0. Samples: 2160. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-09 11:02:28,121][01612] Avg episode reward: [(0, '2.765')]
[2024-11-09 11:02:28,185][03294] Signal inference workers to stop experience collection...
[2024-11-09 11:02:28,222][03307] InferenceWorker_p0-w0: stopping experience collection
[2024-11-09 11:02:28,593][03315] Decorrelating experience for 96 frames...
[2024-11-09 11:02:31,074][03294] Signal inference workers to resume experience collection...
[2024-11-09 11:02:31,074][03307] InferenceWorker_p0-w0: resuming experience collection
[2024-11-09 11:02:33,115][01612] Fps is (10 sec: 819.1, 60 sec: 546.1, 300 sec: 546.1). Total num frames: 8192. Throughput: 0: 169.7. Samples: 2546. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-11-09 11:02:33,117][01612] Avg episode reward: [(0, '3.129')]
[2024-11-09 11:02:38,114][01612] Fps is (10 sec: 2457.6, 60 sec: 1228.9, 300 sec: 1228.9). Total num frames: 24576. Throughput: 0: 288.7. Samples: 5774. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-09 11:02:38,117][01612] Avg episode reward: [(0, '3.600')]
[2024-11-09 11:02:41,812][03307] Updated weights for policy 0, policy_version 10 (0.0027)
[2024-11-09 11:02:43,117][01612] Fps is (10 sec: 3685.5, 60 sec: 1802.1, 300 sec: 1802.1). Total num frames: 45056. Throughput: 0: 464.9. Samples: 11624. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-09 11:02:43,120][01612] Avg episode reward: [(0, '4.216')]
[2024-11-09 11:02:48,114][01612] Fps is (10 sec: 4505.6, 60 sec: 2321.2, 300 sec: 2321.2). Total num frames: 69632. Throughput: 0: 503.8. Samples: 15112. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-11-09 11:02:48,116][01612] Avg episode reward: [(0, '4.607')]
[2024-11-09 11:02:52,273][03307] Updated weights for policy 0, policy_version 20 (0.0018)
[2024-11-09 11:02:53,118][01612] Fps is (10 sec: 3686.1, 60 sec: 2340.4, 300 sec: 2340.4). Total num frames: 81920. Throughput: 0: 587.7. Samples: 20570. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-09 11:02:53,121][01612] Avg episode reward: [(0, '4.515')]
[2024-11-09 11:02:58,114][01612] Fps is (10 sec: 3276.8, 60 sec: 2560.1, 300 sec: 2560.1). Total num frames: 102400. Throughput: 0: 642.9. Samples: 25714. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-09 11:02:58,116][01612] Avg episode reward: [(0, '4.492')]
[2024-11-09 11:02:58,120][03294] Saving new best policy, reward=4.492!
[2024-11-09 11:03:02,686][03307] Updated weights for policy 0, policy_version 30 (0.0020)
[2024-11-09 11:03:03,114][01612] Fps is (10 sec: 4097.6, 60 sec: 2730.7, 300 sec: 2730.7). Total num frames: 122880. Throughput: 0: 647.7. Samples: 29148. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-09 11:03:03,119][01612] Avg episode reward: [(0, '4.486')]
[2024-11-09 11:03:08,114][01612] Fps is (10 sec: 3686.4, 60 sec: 2785.4, 300 sec: 2785.4). Total num frames: 139264. Throughput: 0: 780.8. Samples: 35136. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-09 11:03:08,116][01612] Avg episode reward: [(0, '4.365')]
[2024-11-09 11:03:13,114][01612] Fps is (10 sec: 3276.9, 60 sec: 2830.0, 300 sec: 2830.0). Total num frames: 155648. Throughput: 0: 825.7. Samples: 39318. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-09 11:03:13,121][01612] Avg episode reward: [(0, '4.374')]
[2024-11-09 11:03:14,775][03307] Updated weights for policy 0, policy_version 40 (0.0031)
[2024-11-09 11:03:18,115][01612] Fps is (10 sec: 3685.9, 60 sec: 2935.5, 300 sec: 2935.5). Total num frames: 176128. Throughput: 0: 888.4. Samples: 42526. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-09 11:03:18,120][01612] Avg episode reward: [(0, '4.342')]
[2024-11-09 11:03:23,116][01612] Fps is (10 sec: 4504.8, 60 sec: 3345.0, 300 sec: 3087.7). Total num frames: 200704. Throughput: 0: 972.4. Samples: 49532. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-09 11:03:23,122][01612] Avg episode reward: [(0, '4.256')]
[2024-11-09 11:03:23,739][03307] Updated weights for policy 0, policy_version 50 (0.0025)
[2024-11-09 11:03:28,114][01612] Fps is (10 sec: 3686.9, 60 sec: 3549.9, 300 sec: 3042.8). Total num frames: 212992. Throughput: 0: 948.7. Samples: 54314. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-09 11:03:28,122][01612] Avg episode reward: [(0, '4.246')]
[2024-11-09 11:03:33,114][01612] Fps is (10 sec: 3277.4, 60 sec: 3754.7, 300 sec: 3113.0). Total num frames: 233472. Throughput: 0: 924.7. Samples: 56722. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-09 11:03:33,120][01612] Avg episode reward: [(0, '4.276')]
[2024-11-09 11:03:35,090][03307] Updated weights for policy 0, policy_version 60 (0.0033)
[2024-11-09 11:03:38,114][01612] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3225.7). Total num frames: 258048. Throughput: 0: 959.5. Samples: 63744. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-09 11:03:38,121][01612] Avg episode reward: [(0, '4.344')]
[2024-11-09 11:03:43,114][01612] Fps is (10 sec: 4096.0, 60 sec: 3823.2, 300 sec: 3228.7). Total num frames: 274432. Throughput: 0: 965.6. Samples: 69164. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-09 11:03:43,116][01612] Avg episode reward: [(0, '4.419')]
[2024-11-09 11:03:43,128][03294] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000067_274432.pth...
[2024-11-09 11:03:47,090][03307] Updated weights for policy 0, policy_version 70 (0.0022)
[2024-11-09 11:03:48,114][01612] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3231.3). Total num frames: 290816. Throughput: 0: 935.4. Samples: 71242. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-09 11:03:48,118][01612] Avg episode reward: [(0, '4.631')]
[2024-11-09 11:03:48,124][03294] Saving new best policy, reward=4.631!
[2024-11-09 11:03:53,114][01612] Fps is (10 sec: 3686.4, 60 sec: 3823.2, 300 sec: 3276.8). Total num frames: 311296. Throughput: 0: 938.6. Samples: 77374. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-09 11:03:53,118][01612] Avg episode reward: [(0, '4.664')]
[2024-11-09 11:03:53,129][03294] Saving new best policy, reward=4.664!
[2024-11-09 11:03:56,237][03307] Updated weights for policy 0, policy_version 80 (0.0015)
[2024-11-09 11:03:58,114][01612] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3317.8). Total num frames: 331776. Throughput: 0: 991.6. Samples: 83942. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-09 11:03:58,117][01612] Avg episode reward: [(0, '4.247')]
[2024-11-09 11:04:03,114][01612] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3315.9). Total num frames: 348160. Throughput: 0: 965.4. Samples: 85968. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-09 11:04:03,119][01612] Avg episode reward: [(0, '4.307')]
[2024-11-09 11:04:08,114][01612] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3314.1). Total num frames: 364544. Throughput: 0: 920.6. Samples: 90958. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-09 11:04:08,120][01612] Avg episode reward: [(0, '4.335')]
[2024-11-09 11:04:08,169][03307] Updated weights for policy 0, policy_version 90 (0.0015)
[2024-11-09 11:04:13,114][01612] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3383.7). Total num frames: 389120. Throughput: 0: 964.6. Samples: 97722. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-09 11:04:13,116][01612] Avg episode reward: [(0, '4.409')]
[2024-11-09 11:04:18,114][01612] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3379.2). Total num frames: 405504. Throughput: 0: 974.9. Samples: 100592. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-09 11:04:18,118][01612] Avg episode reward: [(0, '4.463')]
[2024-11-09 11:04:18,845][03307] Updated weights for policy 0, policy_version 100 (0.0027)
[2024-11-09 11:04:23,114][01612] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3375.1). Total num frames: 421888. Throughput: 0: 911.7. Samples: 104770. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-09 11:04:23,122][01612] Avg episode reward: [(0, '4.561')]
[2024-11-09 11:04:28,114][01612] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3402.9). Total num frames: 442368. Throughput: 0: 943.3. Samples: 111612. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-09 11:04:28,116][01612] Avg episode reward: [(0, '4.656')]
[2024-11-09 11:04:29,198][03307] Updated weights for policy 0, policy_version 110 (0.0030)
[2024-11-09 11:04:33,116][01612] Fps is (10 sec: 4095.3, 60 sec: 3822.8, 300 sec: 3428.5). Total num frames: 462848. Throughput: 0: 970.8. Samples: 114928. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-09 11:04:33,125][01612] Avg episode reward: [(0, '4.648')]
[2024-11-09 11:04:38,114][01612] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3423.1). Total num frames: 479232. Throughput: 0: 936.4. Samples: 119514. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-09 11:04:38,117][01612] Avg episode reward: [(0, '4.613')]
[2024-11-09 11:04:41,109][03307] Updated weights for policy 0, policy_version 120 (0.0015)
[2024-11-09 11:04:43,114][01612] Fps is (10 sec: 3687.1, 60 sec: 3754.7, 300 sec: 3446.3). Total num frames: 499712. Throughput: 0: 914.3. Samples: 125086. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-09 11:04:43,121][01612] Avg episode reward: [(0, '4.636')]
[2024-11-09 11:04:48,114][01612] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3468.0). Total num frames: 520192. Throughput: 0: 946.5. Samples: 128562. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-09 11:04:48,120][01612] Avg episode reward: [(0, '4.550')]
[2024-11-09 11:04:50,929][03307] Updated weights for policy 0, policy_version 130 (0.0016)
[2024-11-09 11:04:53,114][01612] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3461.8). Total num frames: 536576. Throughput: 0: 963.2. Samples: 134302. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-09 11:04:53,120][01612] Avg episode reward: [(0, '4.433')]
[2024-11-09 11:04:58,114][01612] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3456.0). Total num frames: 552960. Throughput: 0: 918.4. Samples: 139052. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-09 11:04:58,116][01612] Avg episode reward: [(0, '4.369')]
[2024-11-09 11:05:01,940][03307] Updated weights for policy 0, policy_version 140 (0.0027)
[2024-11-09 11:05:03,114][01612] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3500.2). Total num frames: 577536. Throughput: 0: 932.6. Samples: 142560. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-09 11:05:03,116][01612] Avg episode reward: [(0, '4.361')]
[2024-11-09 11:05:08,114][01612] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3517.8). Total num frames: 598016. Throughput: 0: 987.7. Samples: 149216. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-09 11:05:08,116][01612] Avg episode reward: [(0, '4.409')]
[2024-11-09 11:05:13,115][01612] Fps is (10 sec: 2866.9, 60 sec: 3618.1, 300 sec: 3464.1). Total num frames: 606208. Throughput: 0: 911.9. Samples: 152648. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-09 11:05:13,123][01612] Avg episode reward: [(0, '4.239')]
[2024-11-09 11:05:15,202][03307] Updated weights for policy 0, policy_version 150 (0.0023)
[2024-11-09 11:05:18,114][01612] Fps is (10 sec: 2048.0, 60 sec: 3549.9, 300 sec: 3436.1). Total num frames: 618496. Throughput: 0: 875.4. Samples: 154318. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-09 11:05:18,116][01612] Avg episode reward: [(0, '4.422')]
[2024-11-09 11:05:23,114][01612] Fps is (10 sec: 3686.8, 60 sec: 3686.4, 300 sec: 3476.1). Total num frames: 643072. Throughput: 0: 898.9. Samples: 159966. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-09 11:05:23,120][01612] Avg episode reward: [(0, '4.414')]
[2024-11-09 11:05:25,406][03307] Updated weights for policy 0, policy_version 160 (0.0018)
[2024-11-09 11:05:28,116][01612] Fps is (10 sec: 4504.6, 60 sec: 3686.3, 300 sec: 3492.4). Total num frames: 663552. Throughput: 0: 928.0. Samples: 166848. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-09 11:05:28,123][01612] Avg episode reward: [(0, '4.453')]
[2024-11-09 11:05:33,114][01612] Fps is (10 sec: 3276.8, 60 sec: 3550.0, 300 sec: 3465.9). Total num frames: 675840. Throughput: 0: 894.7. Samples: 168824. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-09 11:05:33,122][01612] Avg episode reward: [(0, '4.509')]
[2024-11-09 11:05:37,234][03307] Updated weights for policy 0, policy_version 170 (0.0025)
[2024-11-09 11:05:38,114][01612] Fps is (10 sec: 3277.6, 60 sec: 3618.1, 300 sec: 3481.6). Total num frames: 696320. Throughput: 0: 879.8. Samples: 173892. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-09 11:05:38,116][01612] Avg episode reward: [(0, '4.558')]
[2024-11-09 11:05:43,114][01612] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3516.6). Total num frames: 720896. Throughput: 0: 923.0. Samples: 180588. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-09 11:05:43,116][01612] Avg episode reward: [(0, '4.401')]
[2024-11-09 11:05:43,125][03294] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000176_720896.pth...
[2024-11-09 11:05:47,621][03307] Updated weights for policy 0, policy_version 180 (0.0043)
[2024-11-09 11:05:48,114][01612] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3510.9). Total num frames: 737280. Throughput: 0: 909.8. Samples: 183500. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-09 11:05:48,117][01612] Avg episode reward: [(0, '4.437')]
[2024-11-09 11:05:53,114][01612] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3505.4). Total num frames: 753664. Throughput: 0: 854.3. Samples: 187660. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-09 11:05:53,122][01612] Avg episode reward: [(0, '4.681')]
[2024-11-09 11:05:53,132][03294] Saving new best policy, reward=4.681!
[2024-11-09 11:05:58,114][01612] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3518.9). Total num frames: 774144. Throughput: 0: 924.5. Samples: 194248. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-09 11:05:58,122][01612] Avg episode reward: [(0, '4.940')]
[2024-11-09 11:05:58,128][03294] Saving new best policy, reward=4.940!
[2024-11-09 11:05:58,482][03307] Updated weights for policy 0, policy_version 190 (0.0023)
[2024-11-09 11:06:03,114][01612] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3531.7). Total num frames: 794624. Throughput: 0: 963.5. Samples: 197676. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-09 11:06:03,116][01612] Avg episode reward: [(0, '4.754')]
[2024-11-09 11:06:08,116][01612] Fps is (10 sec: 3276.7, 60 sec: 3481.6, 300 sec: 3508.3). Total num frames: 806912. Throughput: 0: 940.6. Samples: 202294. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-09 11:06:08,122][01612] Avg episode reward: [(0, '4.650')]
[2024-11-09 11:06:10,520][03307] Updated weights for policy 0, policy_version 200 (0.0020)
[2024-11-09 11:06:13,114][01612] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3538.3). Total num frames: 831488. Throughput: 0: 909.8. Samples: 207786. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-09 11:06:13,117][01612] Avg episode reward: [(0, '4.684')]
[2024-11-09 11:06:18,114][01612] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3549.9). Total num frames: 851968. Throughput: 0: 942.6. Samples: 211242. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-09 11:06:18,117][01612] Avg episode reward: [(0, '4.653')]
[2024-11-09 11:06:19,690][03307] Updated weights for policy 0, policy_version 210 (0.0018)
[2024-11-09 11:06:23,114][01612] Fps is (10 sec: 3686.3, 60 sec: 3754.6, 300 sec: 3544.3). Total num frames: 868352. Throughput: 0: 957.6. Samples: 216984. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-09 11:06:23,118][01612] Avg episode reward: [(0, '4.533')]
[2024-11-09 11:06:28,114][01612] Fps is (10 sec: 3276.9, 60 sec: 3686.5, 300 sec: 3539.0). Total num frames: 884736. Throughput: 0: 912.4. Samples: 221648. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-09 11:06:28,116][01612] Avg episode reward: [(0, '4.530')]
[2024-11-09 11:06:31,523][03307] Updated weights for policy 0, policy_version 220 (0.0039)
[2024-11-09 11:06:33,114][01612] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3549.9). Total num frames: 905216. Throughput: 0: 921.5. Samples: 224966. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-09 11:06:33,117][01612] Avg episode reward: [(0, '4.561')]
[2024-11-09 11:06:38,114][01612] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3560.4). Total num frames: 925696. Throughput: 0: 980.4. Samples: 231780. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-09 11:06:38,116][01612] Avg episode reward: [(0, '4.563')]
[2024-11-09 11:06:42,999][03307] Updated weights for policy 0, policy_version 230 (0.0021)
[2024-11-09 11:06:43,114][01612] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3555.0). Total num frames: 942080. Throughput: 0: 922.8. Samples: 235776. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-09 11:06:43,117][01612] Avg episode reward: [(0, '4.600')]
[2024-11-09 11:06:48,114][01612] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3565.1). Total num frames: 962560. Throughput: 0: 910.1. Samples: 238630. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-09 11:06:48,120][01612] Avg episode reward: [(0, '4.517')]
[2024-11-09 11:06:52,276][03307] Updated weights for policy 0, policy_version 240 (0.0021)
[2024-11-09 11:06:53,114][01612] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3574.7). Total num frames: 983040. Throughput: 0: 958.9. Samples: 245444. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-09 11:06:53,116][01612] Avg episode reward: [(0, '4.633')]
[2024-11-09 11:06:58,114][01612] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3569.4). Total num frames: 999424. Throughput: 0: 951.2. Samples: 250590. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-09 11:06:58,120][01612] Avg episode reward: [(0, '4.581')]
[2024-11-09 11:07:03,114][01612] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3564.3). Total num frames: 1015808. Throughput: 0: 922.7. Samples: 252762. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-11-09 11:07:03,116][01612] Avg episode reward: [(0, '4.404')]
[2024-11-09 11:07:04,320][03307] Updated weights for policy 0, policy_version 250 (0.0013)
[2024-11-09 11:07:08,114][01612] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3587.5). Total num frames: 1040384. Throughput: 0: 943.3. Samples: 259434. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-09 11:07:08,117][01612] Avg episode reward: [(0, '4.735')]
[2024-11-09 11:07:13,114][01612] Fps is (10 sec: 4505.7, 60 sec: 3822.9, 300 sec: 3596.2). Total num frames: 1060864. Throughput: 0: 975.1. Samples: 265526. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-09 11:07:13,120][01612] Avg episode reward: [(0, '4.778')]
[2024-11-09 11:07:14,449][03307] Updated weights for policy 0, policy_version 260 (0.0013)
[2024-11-09 11:07:18,114][01612] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 1073152. Throughput: 0: 946.5. Samples: 267560. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-09 11:07:18,122][01612] Avg episode reward: [(0, '4.618')]
[2024-11-09 11:07:23,114][01612] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 1093632. Throughput: 0: 914.2. Samples: 272918. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-09 11:07:23,119][01612] Avg episode reward: [(0, '4.632')]
[2024-11-09 11:07:25,124][03307] Updated weights for policy 0, policy_version 270 (0.0027)
[2024-11-09 11:07:28,114][01612] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 1118208. Throughput: 0: 979.2. Samples: 279842. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-09 11:07:28,121][01612] Avg episode reward: [(0, '4.788')]
[2024-11-09 11:07:33,114][01612] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1130496. Throughput: 0: 972.4. Samples: 282386. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-09 11:07:33,117][01612] Avg episode reward: [(0, '5.106')]
[2024-11-09 11:07:33,130][03294] Saving new best policy, reward=5.106!
[2024-11-09 11:07:37,049][03307] Updated weights for policy 0, policy_version 280 (0.0028)
[2024-11-09 11:07:38,114][01612] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1150976. Throughput: 0: 919.9. Samples: 286838. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-09 11:07:38,118][01612] Avg episode reward: [(0, '5.261')]
[2024-11-09 11:07:38,124][03294] Saving new best policy, reward=5.261!
[2024-11-09 11:07:43,114][01612] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 1171456. Throughput: 0: 952.2. Samples: 293440. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-09 11:07:43,117][01612] Avg episode reward: [(0, '4.951')]
[2024-11-09 11:07:43,129][03294] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000286_1171456.pth...
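The learner saves a new checkpoint here (`checkpoint_000000286_1171456.pth`) and, in the next entry, removes the oldest one (`checkpoint_000000067_274432.pth`) — a keep-latest-N rotation keyed on the policy version embedded in the filename. A minimal sketch of that rotation (`prune_checkpoints` is a hypothetical helper, not Sample Factory's actual implementation):

```python
import os
import re

def prune_checkpoints(ckpt_dir, keep=2):
    """Keep only the `keep` newest checkpoints in `ckpt_dir`, ordered by the
    policy version embedded in names like checkpoint_000000286_1171456.pth.
    Hypothetical helper mimicking the keep-latest rotation seen in the log."""
    pat = re.compile(r"checkpoint_(\d+)_(\d+)\.pth$")
    ckpts = sorted(
        (f for f in os.listdir(ckpt_dir) if pat.search(f)),
        key=lambda f: int(pat.search(f).group(1)),  # sort by policy version
    )
    for stale in ckpts[:-keep]:  # everything except the newest `keep`
        os.remove(os.path.join(ckpt_dir, stale))
    return ckpts[-keep:]
```

With `keep=2`, saving the version-286 checkpoint while versions 67 and 176 exist would delete version 67, which matches the `Removing ...checkpoint_000000067_274432.pth` entry that follows.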
[2024-11-09 11:07:43,254][03294] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000067_274432.pth
[2024-11-09 11:07:46,506][03307] Updated weights for policy 0, policy_version 290 (0.0019)
[2024-11-09 11:07:48,114][01612] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 1191936. Throughput: 0: 976.2. Samples: 296690. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-09 11:07:48,119][01612] Avg episode reward: [(0, '4.861')]
[2024-11-09 11:07:53,114][01612] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 1204224. Throughput: 0: 922.2. Samples: 300932. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-09 11:07:53,119][01612] Avg episode reward: [(0, '4.827')]
[2024-11-09 11:07:58,016][03307] Updated weights for policy 0, policy_version 300 (0.0023)
[2024-11-09 11:07:58,114][01612] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 1228800. Throughput: 0: 926.2. Samples: 307206. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-09 11:07:58,116][01612] Avg episode reward: [(0, '5.037')]
[2024-11-09 11:08:03,114][01612] Fps is (10 sec: 4505.5, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 1249280. Throughput: 0: 957.4. Samples: 310644. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-09 11:08:03,117][01612] Avg episode reward: [(0, '5.292')]
[2024-11-09 11:08:03,132][03294] Saving new best policy, reward=5.292!
[2024-11-09 11:08:08,117][01612] Fps is (10 sec: 3275.7, 60 sec: 3686.2, 300 sec: 3748.8). Total num frames: 1261568. Throughput: 0: 951.7. Samples: 315748. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-09 11:08:08,119][01612] Avg episode reward: [(0, '5.314')]
[2024-11-09 11:08:08,123][03294] Saving new best policy, reward=5.314!
[2024-11-09 11:08:09,821][03307] Updated weights for policy 0, policy_version 310 (0.0050)
[2024-11-09 11:08:13,114][01612] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 1282048. Throughput: 0: 913.4. Samples: 320946. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-09 11:08:13,116][01612] Avg episode reward: [(0, '5.464')]
[2024-11-09 11:08:13,123][03294] Saving new best policy, reward=5.464!
[2024-11-09 11:08:18,116][01612] Fps is (10 sec: 4096.3, 60 sec: 3822.8, 300 sec: 3735.0). Total num frames: 1302528. Throughput: 0: 930.8. Samples: 324274. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-09 11:08:18,121][01612] Avg episode reward: [(0, '5.211')]
[2024-11-09 11:08:19,138][03307] Updated weights for policy 0, policy_version 320 (0.0017)
[2024-11-09 11:08:23,114][01612] Fps is (10 sec: 4095.8, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 1323008. Throughput: 0: 968.4. Samples: 330418. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-09 11:08:23,117][01612] Avg episode reward: [(0, '4.970')]
[2024-11-09 11:08:28,114][01612] Fps is (10 sec: 3277.7, 60 sec: 3618.1, 300 sec: 3735.0). Total num frames: 1335296. Throughput: 0: 917.7. Samples: 334736. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-09 11:08:28,116][01612] Avg episode reward: [(0, '4.954')]
[2024-11-09 11:08:30,940][03307] Updated weights for policy 0, policy_version 330 (0.0019)
[2024-11-09 11:08:33,114][01612] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 1359872. Throughput: 0: 922.0. Samples: 338178. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-09 11:08:33,117][01612] Avg episode reward: [(0, '4.983')]
[2024-11-09 11:08:38,114][01612] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 1380352. Throughput: 0: 981.6. Samples: 345104. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-09 11:08:38,119][01612] Avg episode reward: [(0, '5.013')]
[2024-11-09 11:08:41,530][03307] Updated weights for policy 0, policy_version 340 (0.0026)
[2024-11-09 11:08:43,114][01612] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 1392640. Throughput: 0: 937.1. Samples: 349374. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-09 11:08:43,116][01612] Avg episode reward: [(0, '5.035')]
[2024-11-09 11:08:48,114][01612] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 1413120. Throughput: 0: 912.0. Samples: 351682. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-09 11:08:48,120][01612] Avg episode reward: [(0, '4.996')]
[2024-11-09 11:08:51,968][03307] Updated weights for policy 0, policy_version 350 (0.0023)
[2024-11-09 11:08:53,114][01612] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 1437696. Throughput: 0: 949.5. Samples: 358474. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-09 11:08:53,116][01612] Avg episode reward: [(0, '5.288')]
[2024-11-09 11:08:58,114][01612] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1454080. Throughput: 0: 962.1. Samples: 364240. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-09 11:08:58,123][01612] Avg episode reward: [(0, '5.133')]
[2024-11-09 11:09:03,114][01612] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 1470464. Throughput: 0: 935.4. Samples: 366366. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-09 11:09:03,116][01612] Avg episode reward: [(0, '5.577')]
[2024-11-09 11:09:03,131][03294] Saving new best policy, reward=5.577!
[2024-11-09 11:09:04,035][03307] Updated weights for policy 0, policy_version 360 (0.0019)
[2024-11-09 11:09:08,114][01612] Fps is (10 sec: 3276.8, 60 sec: 3754.9, 300 sec: 3721.1). Total num frames: 1486848. Throughput: 0: 924.6. Samples: 372024. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-09 11:09:08,119][01612] Avg episode reward: [(0, '5.520')]
[2024-11-09 11:09:13,114][01612] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3707.2). Total num frames: 1499136. Throughput: 0: 919.2. Samples: 376098.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-09 11:09:13,116][01612] Avg episode reward: [(0, '5.431')] [2024-11-09 11:09:17,866][03307] Updated weights for policy 0, policy_version 370 (0.0030) [2024-11-09 11:09:18,114][01612] Fps is (10 sec: 2867.2, 60 sec: 3550.0, 300 sec: 3707.2). Total num frames: 1515520. Throughput: 0: 883.7. Samples: 377944. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-09 11:09:18,120][01612] Avg episode reward: [(0, '4.993')] [2024-11-09 11:09:23,114][01612] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3693.3). Total num frames: 1531904. Throughput: 0: 833.1. Samples: 382592. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-09 11:09:23,116][01612] Avg episode reward: [(0, '5.178')] [2024-11-09 11:09:27,908][03307] Updated weights for policy 0, policy_version 380 (0.0021) [2024-11-09 11:09:28,114][01612] Fps is (10 sec: 4095.8, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 1556480. Throughput: 0: 889.6. Samples: 389406. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-09 11:09:28,117][01612] Avg episode reward: [(0, '5.118')] [2024-11-09 11:09:33,114][01612] Fps is (10 sec: 4095.8, 60 sec: 3549.9, 300 sec: 3707.2). Total num frames: 1572864. Throughput: 0: 912.8. Samples: 392758. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-09 11:09:33,122][01612] Avg episode reward: [(0, '5.129')] [2024-11-09 11:09:38,114][01612] Fps is (10 sec: 3277.0, 60 sec: 3481.6, 300 sec: 3693.3). Total num frames: 1589248. Throughput: 0: 853.6. Samples: 396884. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-09 11:09:38,121][01612] Avg episode reward: [(0, '5.493')] [2024-11-09 11:09:39,731][03307] Updated weights for policy 0, policy_version 390 (0.0027) [2024-11-09 11:09:43,114][01612] Fps is (10 sec: 3686.6, 60 sec: 3618.1, 300 sec: 3693.3). Total num frames: 1609728. Throughput: 0: 862.7. Samples: 403060. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-09 11:09:43,121][01612] Avg episode reward: [(0, '5.809')] [2024-11-09 11:09:43,131][03294] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000393_1609728.pth... [2024-11-09 11:09:43,296][03294] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000176_720896.pth [2024-11-09 11:09:43,311][03294] Saving new best policy, reward=5.809! [2024-11-09 11:09:48,116][01612] Fps is (10 sec: 4095.0, 60 sec: 3618.0, 300 sec: 3707.2). Total num frames: 1630208. Throughput: 0: 885.4. Samples: 406210. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-09 11:09:48,118][01612] Avg episode reward: [(0, '5.510')] [2024-11-09 11:09:49,804][03307] Updated weights for policy 0, policy_version 400 (0.0024) [2024-11-09 11:09:53,114][01612] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3707.2). Total num frames: 1646592. Throughput: 0: 873.6. Samples: 411334. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-09 11:09:53,121][01612] Avg episode reward: [(0, '5.699')] [2024-11-09 11:09:58,114][01612] Fps is (10 sec: 3277.6, 60 sec: 3481.6, 300 sec: 3679.5). Total num frames: 1662976. Throughput: 0: 901.3. Samples: 416656. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-09 11:09:58,122][01612] Avg episode reward: [(0, '6.063')] [2024-11-09 11:09:58,124][03294] Saving new best policy, reward=6.063! [2024-11-09 11:10:00,988][03307] Updated weights for policy 0, policy_version 410 (0.0043) [2024-11-09 11:10:03,114][01612] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3693.3). Total num frames: 1687552. Throughput: 0: 935.8. Samples: 420054. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-09 11:10:03,121][01612] Avg episode reward: [(0, '6.304')] [2024-11-09 11:10:03,131][03294] Saving new best policy, reward=6.304! [2024-11-09 11:10:08,114][01612] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3721.1). Total num frames: 1703936. Throughput: 0: 961.9. 
Samples: 425876. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-09 11:10:08,120][01612] Avg episode reward: [(0, '6.181')] [2024-11-09 11:10:13,038][03307] Updated weights for policy 0, policy_version 420 (0.0019) [2024-11-09 11:10:13,114][01612] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 1720320. Throughput: 0: 907.8. Samples: 430258. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-09 11:10:13,120][01612] Avg episode reward: [(0, '5.990')] [2024-11-09 11:10:18,114][01612] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 1740800. Throughput: 0: 905.1. Samples: 433488. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-09 11:10:18,121][01612] Avg episode reward: [(0, '6.309')] [2024-11-09 11:10:18,125][03294] Saving new best policy, reward=6.309! [2024-11-09 11:10:22,288][03307] Updated weights for policy 0, policy_version 430 (0.0017) [2024-11-09 11:10:23,114][01612] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 1761280. Throughput: 0: 961.6. Samples: 440158. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-09 11:10:23,116][01612] Avg episode reward: [(0, '6.942')] [2024-11-09 11:10:23,136][03294] Saving new best policy, reward=6.942! [2024-11-09 11:10:28,116][01612] Fps is (10 sec: 3275.9, 60 sec: 3618.0, 300 sec: 3721.1). Total num frames: 1773568. Throughput: 0: 920.9. Samples: 444504. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-09 11:10:28,119][01612] Avg episode reward: [(0, '6.811')] [2024-11-09 11:10:33,114][01612] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 1794048. Throughput: 0: 907.4. Samples: 447042. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-09 11:10:33,116][01612] Avg episode reward: [(0, '6.462')] [2024-11-09 11:10:34,182][03307] Updated weights for policy 0, policy_version 440 (0.0026) [2024-11-09 11:10:38,114][01612] Fps is (10 sec: 4506.8, 60 sec: 3822.9, 300 sec: 3721.1). 
Total num frames: 1818624. Throughput: 0: 947.9. Samples: 453990. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-09 11:10:38,116][01612] Avg episode reward: [(0, '6.512')] [2024-11-09 11:10:43,114][01612] Fps is (10 sec: 4096.1, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 1835008. Throughput: 0: 948.6. Samples: 459342. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-09 11:10:43,116][01612] Avg episode reward: [(0, '6.777')] [2024-11-09 11:10:45,415][03307] Updated weights for policy 0, policy_version 450 (0.0029) [2024-11-09 11:10:48,115][01612] Fps is (10 sec: 3276.3, 60 sec: 3686.5, 300 sec: 3721.1). Total num frames: 1851392. Throughput: 0: 918.9. Samples: 461406. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-09 11:10:48,119][01612] Avg episode reward: [(0, '7.211')] [2024-11-09 11:10:48,123][03294] Saving new best policy, reward=7.211! [2024-11-09 11:10:53,114][01612] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 1871872. Throughput: 0: 932.9. Samples: 467858. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-09 11:10:53,119][01612] Avg episode reward: [(0, '8.034')] [2024-11-09 11:10:53,131][03294] Saving new best policy, reward=8.034! [2024-11-09 11:10:55,046][03307] Updated weights for policy 0, policy_version 460 (0.0020) [2024-11-09 11:10:58,115][01612] Fps is (10 sec: 4095.9, 60 sec: 3822.8, 300 sec: 3721.1). Total num frames: 1892352. Throughput: 0: 972.9. Samples: 474040. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-09 11:10:58,122][01612] Avg episode reward: [(0, '7.933')] [2024-11-09 11:11:03,114][01612] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 1908736. Throughput: 0: 947.2. Samples: 476114. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-09 11:11:03,120][01612] Avg episode reward: [(0, '7.845')] [2024-11-09 11:11:07,050][03307] Updated weights for policy 0, policy_version 470 (0.0032) [2024-11-09 11:11:08,114][01612] Fps is (10 sec: 3686.9, 60 sec: 3754.6, 300 sec: 3721.1). Total num frames: 1929216. Throughput: 0: 917.7. Samples: 481454. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-09 11:11:08,119][01612] Avg episode reward: [(0, '7.557')] [2024-11-09 11:11:13,114][01612] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 1949696. Throughput: 0: 975.1. Samples: 488380. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-09 11:11:13,120][01612] Avg episode reward: [(0, '8.667')] [2024-11-09 11:11:13,174][03294] Saving new best policy, reward=8.667! [2024-11-09 11:11:17,375][03307] Updated weights for policy 0, policy_version 480 (0.0021) [2024-11-09 11:11:18,115][01612] Fps is (10 sec: 3686.0, 60 sec: 3754.6, 300 sec: 3721.1). Total num frames: 1966080. Throughput: 0: 976.0. Samples: 490962. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-09 11:11:18,118][01612] Avg episode reward: [(0, '8.660')] [2024-11-09 11:11:23,114][01612] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 1982464. Throughput: 0: 914.2. Samples: 495128. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-11-09 11:11:23,118][01612] Avg episode reward: [(0, '8.664')] [2024-11-09 11:11:27,864][03307] Updated weights for policy 0, policy_version 490 (0.0022) [2024-11-09 11:11:28,114][01612] Fps is (10 sec: 4096.6, 60 sec: 3891.4, 300 sec: 3735.0). Total num frames: 2007040. Throughput: 0: 948.4. Samples: 502022. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-09 11:11:28,119][01612] Avg episode reward: [(0, '8.536')] [2024-11-09 11:11:33,114][01612] Fps is (10 sec: 4505.8, 60 sec: 3891.2, 300 sec: 3735.0). Total num frames: 2027520. Throughput: 0: 976.3. Samples: 505336. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-09 11:11:33,117][01612] Avg episode reward: [(0, '8.886')] [2024-11-09 11:11:33,127][03294] Saving new best policy, reward=8.886! [2024-11-09 11:11:38,114][01612] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 2039808. Throughput: 0: 931.1. Samples: 509756. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-09 11:11:38,118][01612] Avg episode reward: [(0, '9.602')] [2024-11-09 11:11:38,120][03294] Saving new best policy, reward=9.602! [2024-11-09 11:11:39,723][03307] Updated weights for policy 0, policy_version 500 (0.0030) [2024-11-09 11:11:43,114][01612] Fps is (10 sec: 3276.7, 60 sec: 3754.6, 300 sec: 3721.1). Total num frames: 2060288. Throughput: 0: 927.3. Samples: 515768. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-09 11:11:43,117][01612] Avg episode reward: [(0, '11.273')] [2024-11-09 11:11:43,129][03294] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000503_2060288.pth... [2024-11-09 11:11:43,264][03294] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000286_1171456.pth [2024-11-09 11:11:43,275][03294] Saving new best policy, reward=11.273! [2024-11-09 11:11:48,114][01612] Fps is (10 sec: 4505.6, 60 sec: 3891.3, 300 sec: 3735.0). Total num frames: 2084864. Throughput: 0: 953.6. Samples: 519024. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-09 11:11:48,117][01612] Avg episode reward: [(0, '11.492')] [2024-11-09 11:11:48,123][03294] Saving new best policy, reward=11.492! [2024-11-09 11:11:49,461][03307] Updated weights for policy 0, policy_version 510 (0.0026) [2024-11-09 11:11:53,114][01612] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2097152. Throughput: 0: 954.2. Samples: 524394. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-09 11:11:53,120][01612] Avg episode reward: [(0, '11.944')] [2024-11-09 11:11:53,132][03294] Saving new best policy, reward=11.944! 
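The checkpoint filenames in these save/remove entries encode two numbers: the policy version and the cumulative environment-frame count (`checkpoint_{version:09d}_{frames}.pth`). Dividing the two recovers a constant 4096 frames per policy version for this run, which is consistent across every checkpoint the log mentions. A small sketch of that arithmetic (the helper is illustrative, not a Sample Factory API):

```python
import re

CKPT_RE = re.compile(r"checkpoint_(\d+)_(\d+)\.pth")

def frames_per_version(name):
    """Extract (version, frames) from a checkpoint filename and return
    the implied frames collected per policy update."""
    version, frames = map(int, CKPT_RE.search(name).groups())
    return frames // version

# Checkpoint names taken verbatim from this log.
names = [
    "checkpoint_000000286_1171456.pth",
    "checkpoint_000000503_2060288.pth",
    "checkpoint_000000610_2498560.pth",
]
print([frames_per_version(n) for n in names])  # -> [4096, 4096, 4096]
```

The log also shows the retention policy in action: each save is paired with the removal of the oldest surviving checkpoint, so only a bounded number of `.pth` files accumulate in `checkpoint_p0/`.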
[2024-11-09 11:11:58,114][01612] Fps is (10 sec: 2867.2, 60 sec: 3686.5, 300 sec: 3721.1). Total num frames: 2113536. Throughput: 0: 906.5. Samples: 529172. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-09 11:11:58,116][01612] Avg episode reward: [(0, '11.992')] [2024-11-09 11:11:58,122][03294] Saving new best policy, reward=11.992! [2024-11-09 11:12:01,103][03307] Updated weights for policy 0, policy_version 520 (0.0027) [2024-11-09 11:12:03,114][01612] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 2138112. Throughput: 0: 925.4. Samples: 532604. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-09 11:12:03,116][01612] Avg episode reward: [(0, '11.190')] [2024-11-09 11:12:08,114][01612] Fps is (10 sec: 4095.9, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 2154496. Throughput: 0: 975.6. Samples: 539032. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-09 11:12:08,117][01612] Avg episode reward: [(0, '11.786')] [2024-11-09 11:12:12,907][03307] Updated weights for policy 0, policy_version 530 (0.0029) [2024-11-09 11:12:13,114][01612] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 2170880. Throughput: 0: 914.7. Samples: 543182. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-09 11:12:13,118][01612] Avg episode reward: [(0, '12.729')] [2024-11-09 11:12:13,127][03294] Saving new best policy, reward=12.729! [2024-11-09 11:12:18,114][01612] Fps is (10 sec: 3686.5, 60 sec: 3754.8, 300 sec: 3721.1). Total num frames: 2191360. Throughput: 0: 912.4. Samples: 546392. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-09 11:12:18,122][01612] Avg episode reward: [(0, '13.444')] [2024-11-09 11:12:18,124][03294] Saving new best policy, reward=13.444! [2024-11-09 11:12:22,197][03307] Updated weights for policy 0, policy_version 540 (0.0017) [2024-11-09 11:12:23,114][01612] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3721.1). Total num frames: 2215936. Throughput: 0: 961.3. 
Samples: 553014. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-09 11:12:23,121][01612] Avg episode reward: [(0, '13.135')] [2024-11-09 11:12:28,114][01612] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 2228224. Throughput: 0: 934.4. Samples: 557816. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-09 11:12:28,118][01612] Avg episode reward: [(0, '13.803')] [2024-11-09 11:12:28,124][03294] Saving new best policy, reward=13.803! [2024-11-09 11:12:33,114][01612] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 2248704. Throughput: 0: 911.9. Samples: 560060. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-09 11:12:33,120][01612] Avg episode reward: [(0, '13.902')] [2024-11-09 11:12:33,129][03294] Saving new best policy, reward=13.902! [2024-11-09 11:12:34,060][03307] Updated weights for policy 0, policy_version 550 (0.0021) [2024-11-09 11:12:38,114][01612] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 2269184. Throughput: 0: 944.1. Samples: 566878. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-09 11:12:38,120][01612] Avg episode reward: [(0, '13.733')] [2024-11-09 11:12:43,114][01612] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 2285568. Throughput: 0: 965.8. Samples: 572634. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-09 11:12:43,119][01612] Avg episode reward: [(0, '14.473')] [2024-11-09 11:12:43,127][03294] Saving new best policy, reward=14.473! [2024-11-09 11:12:44,828][03307] Updated weights for policy 0, policy_version 560 (0.0024) [2024-11-09 11:12:48,114][01612] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3721.1). Total num frames: 2301952. Throughput: 0: 933.4. Samples: 574608. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-09 11:12:48,116][01612] Avg episode reward: [(0, '13.425')] [2024-11-09 11:12:53,114][01612] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3721.1). 
Total num frames: 2326528. Throughput: 0: 924.0. Samples: 580614. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-09 11:12:53,116][01612] Avg episode reward: [(0, '12.209')] [2024-11-09 11:12:54,941][03307] Updated weights for policy 0, policy_version 570 (0.0015) [2024-11-09 11:12:58,114][01612] Fps is (10 sec: 4505.5, 60 sec: 3891.2, 300 sec: 3721.1). Total num frames: 2347008. Throughput: 0: 982.6. Samples: 587398. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-09 11:12:58,117][01612] Avg episode reward: [(0, '13.087')] [2024-11-09 11:13:03,114][01612] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3721.2). Total num frames: 2359296. Throughput: 0: 959.6. Samples: 589572. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-09 11:13:03,119][01612] Avg episode reward: [(0, '12.781')] [2024-11-09 11:13:07,214][03307] Updated weights for policy 0, policy_version 580 (0.0016) [2024-11-09 11:13:08,114][01612] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2375680. Throughput: 0: 915.8. Samples: 594224. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-09 11:13:08,116][01612] Avg episode reward: [(0, '13.489')] [2024-11-09 11:13:13,114][01612] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3693.4). Total num frames: 2392064. Throughput: 0: 901.6. Samples: 598388. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-09 11:13:13,119][01612] Avg episode reward: [(0, '14.198')] [2024-11-09 11:13:18,115][01612] Fps is (10 sec: 3276.3, 60 sec: 3618.0, 300 sec: 3679.4). Total num frames: 2408448. Throughput: 0: 917.9. Samples: 601366. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-09 11:13:18,120][01612] Avg episode reward: [(0, '16.245')] [2024-11-09 11:13:18,122][03294] Saving new best policy, reward=16.245! [2024-11-09 11:13:20,789][03307] Updated weights for policy 0, policy_version 590 (0.0029) [2024-11-09 11:13:23,114][01612] Fps is (10 sec: 3276.9, 60 sec: 3481.6, 300 sec: 3693.3). 
Total num frames: 2424832. Throughput: 0: 856.2. Samples: 605406. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-09 11:13:23,116][01612] Avg episode reward: [(0, '16.562')] [2024-11-09 11:13:23,126][03294] Saving new best policy, reward=16.562! [2024-11-09 11:13:28,114][01612] Fps is (10 sec: 3687.0, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 2445312. Throughput: 0: 874.8. Samples: 612000. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-09 11:13:28,116][01612] Avg episode reward: [(0, '17.682')] [2024-11-09 11:13:28,124][03294] Saving new best policy, reward=17.682! [2024-11-09 11:13:30,274][03307] Updated weights for policy 0, policy_version 600 (0.0023) [2024-11-09 11:13:33,114][01612] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 2465792. Throughput: 0: 903.9. Samples: 615284. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-09 11:13:33,116][01612] Avg episode reward: [(0, '18.636')] [2024-11-09 11:13:33,137][03294] Saving new best policy, reward=18.636! [2024-11-09 11:13:38,114][01612] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3679.5). Total num frames: 2478080. Throughput: 0: 872.4. Samples: 619874. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-09 11:13:38,120][01612] Avg episode reward: [(0, '17.948')] [2024-11-09 11:13:42,461][03307] Updated weights for policy 0, policy_version 610 (0.0017) [2024-11-09 11:13:43,114][01612] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3679.5). Total num frames: 2498560. Throughput: 0: 846.0. Samples: 625470. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-09 11:13:43,117][01612] Avg episode reward: [(0, '17.334')] [2024-11-09 11:13:43,128][03294] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000610_2498560.pth... 
[2024-11-09 11:13:43,257][03294] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000393_1609728.pth [2024-11-09 11:13:48,114][01612] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 2523136. Throughput: 0: 869.3. Samples: 628692. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-09 11:13:48,121][01612] Avg episode reward: [(0, '16.283')] [2024-11-09 11:13:52,778][03307] Updated weights for policy 0, policy_version 620 (0.0022) [2024-11-09 11:13:53,114][01612] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3679.5). Total num frames: 2539520. Throughput: 0: 895.7. Samples: 634530. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-09 11:13:53,118][01612] Avg episode reward: [(0, '15.864')] [2024-11-09 11:13:58,114][01612] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3679.5). Total num frames: 2555904. Throughput: 0: 902.6. Samples: 639004. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-09 11:13:58,121][01612] Avg episode reward: [(0, '17.377')] [2024-11-09 11:14:03,114][01612] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3693.3). Total num frames: 2576384. Throughput: 0: 913.4. Samples: 642468. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-11-09 11:14:03,121][01612] Avg episode reward: [(0, '17.317')] [2024-11-09 11:14:03,380][03307] Updated weights for policy 0, policy_version 630 (0.0016) [2024-11-09 11:14:08,114][01612] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 2596864. Throughput: 0: 977.2. Samples: 649380. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-09 11:14:08,118][01612] Avg episode reward: [(0, '18.487')] [2024-11-09 11:14:13,114][01612] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3707.2). Total num frames: 2609152. Throughput: 0: 924.0. Samples: 653578. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-09 11:14:13,118][01612] Avg episode reward: [(0, '18.701')] [2024-11-09 11:14:13,139][03294] Saving new best policy, reward=18.701! [2024-11-09 11:14:15,244][03307] Updated weights for policy 0, policy_version 640 (0.0027) [2024-11-09 11:14:18,114][01612] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3735.0). Total num frames: 2633728. Throughput: 0: 912.3. Samples: 656336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-09 11:14:18,119][01612] Avg episode reward: [(0, '19.811')] [2024-11-09 11:14:18,123][03294] Saving new best policy, reward=19.811! [2024-11-09 11:14:23,114][01612] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 2654208. Throughput: 0: 955.4. Samples: 662868. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-09 11:14:23,119][01612] Avg episode reward: [(0, '19.228')] [2024-11-09 11:14:24,937][03307] Updated weights for policy 0, policy_version 650 (0.0024) [2024-11-09 11:14:28,114][01612] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2670592. Throughput: 0: 943.8. Samples: 667942. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-09 11:14:28,122][01612] Avg episode reward: [(0, '20.091')] [2024-11-09 11:14:28,124][03294] Saving new best policy, reward=20.091! [2024-11-09 11:14:33,114][01612] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 2686976. Throughput: 0: 917.6. Samples: 669984. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-09 11:14:33,117][01612] Avg episode reward: [(0, '19.410')] [2024-11-09 11:14:36,529][03307] Updated weights for policy 0, policy_version 660 (0.0019) [2024-11-09 11:14:38,116][01612] Fps is (10 sec: 3685.6, 60 sec: 3822.8, 300 sec: 3721.1). Total num frames: 2707456. Throughput: 0: 930.6. Samples: 676410. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-09 11:14:38,120][01612] Avg episode reward: [(0, '19.645')] [2024-11-09 11:14:43,114][01612] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 2727936. Throughput: 0: 969.1. Samples: 682614. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-09 11:14:43,116][01612] Avg episode reward: [(0, '19.040')] [2024-11-09 11:14:48,115][01612] Fps is (10 sec: 3277.0, 60 sec: 3618.0, 300 sec: 3707.2). Total num frames: 2740224. Throughput: 0: 935.1. Samples: 684550. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-09 11:14:48,118][01612] Avg episode reward: [(0, '20.827')] [2024-11-09 11:14:48,123][03294] Saving new best policy, reward=20.827! [2024-11-09 11:14:48,736][03307] Updated weights for policy 0, policy_version 670 (0.0024) [2024-11-09 11:14:53,114][01612] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 2760704. Throughput: 0: 902.8. Samples: 690006. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-09 11:14:53,121][01612] Avg episode reward: [(0, '20.970')] [2024-11-09 11:14:53,186][03294] Saving new best policy, reward=20.970! [2024-11-09 11:14:57,734][03307] Updated weights for policy 0, policy_version 680 (0.0014) [2024-11-09 11:14:58,114][01612] Fps is (10 sec: 4506.2, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 2785280. Throughput: 0: 958.7. Samples: 696718. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-09 11:14:58,117][01612] Avg episode reward: [(0, '21.003')] [2024-11-09 11:14:58,122][03294] Saving new best policy, reward=21.003! [2024-11-09 11:15:03,114][01612] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2797568. Throughput: 0: 952.4. Samples: 699196. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-09 11:15:03,119][01612] Avg episode reward: [(0, '20.628')] [2024-11-09 11:15:08,114][01612] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 2818048. 
Throughput: 0: 907.2. Samples: 703694. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-09 11:15:08,123][01612] Avg episode reward: [(0, '21.222')] [2024-11-09 11:15:08,125][03294] Saving new best policy, reward=21.222! [2024-11-09 11:15:09,789][03307] Updated weights for policy 0, policy_version 690 (0.0015) [2024-11-09 11:15:13,114][01612] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 2838528. Throughput: 0: 941.4. Samples: 710306. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-09 11:15:13,119][01612] Avg episode reward: [(0, '20.288')] [2024-11-09 11:15:18,114][01612] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2859008. Throughput: 0: 973.7. Samples: 713800. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-09 11:15:18,120][01612] Avg episode reward: [(0, '20.752')] [2024-11-09 11:15:20,677][03307] Updated weights for policy 0, policy_version 700 (0.0030) [2024-11-09 11:15:23,114][01612] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3721.1). Total num frames: 2871296. Throughput: 0: 925.9. Samples: 718072. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-09 11:15:23,118][01612] Avg episode reward: [(0, '21.128')] [2024-11-09 11:15:28,114][01612] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2895872. Throughput: 0: 921.7. Samples: 724090. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-09 11:15:28,116][01612] Avg episode reward: [(0, '23.261')] [2024-11-09 11:15:28,119][03294] Saving new best policy, reward=23.261! [2024-11-09 11:15:30,888][03307] Updated weights for policy 0, policy_version 710 (0.0025) [2024-11-09 11:15:33,114][01612] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 2916352. Throughput: 0: 953.1. Samples: 727440. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-09 11:15:33,117][01612] Avg episode reward: [(0, '22.900')] [2024-11-09 11:15:38,114][01612] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3707.2). Total num frames: 2928640. Throughput: 0: 947.8. Samples: 732658. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-09 11:15:38,118][01612] Avg episode reward: [(0, '22.901')] [2024-11-09 11:15:42,672][03307] Updated weights for policy 0, policy_version 720 (0.0028) [2024-11-09 11:15:43,114][01612] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 2949120. Throughput: 0: 913.2. Samples: 737812. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-09 11:15:43,116][01612] Avg episode reward: [(0, '22.660')] [2024-11-09 11:15:43,123][03294] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000720_2949120.pth... [2024-11-09 11:15:43,267][03294] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000503_2060288.pth [2024-11-09 11:15:48,114][01612] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3721.1). Total num frames: 2969600. Throughput: 0: 931.5. Samples: 741114. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-09 11:15:48,119][01612] Avg episode reward: [(0, '20.826')] [2024-11-09 11:15:52,926][03307] Updated weights for policy 0, policy_version 730 (0.0023) [2024-11-09 11:15:53,114][01612] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 2990080. Throughput: 0: 967.9. Samples: 747248. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-09 11:15:53,119][01612] Avg episode reward: [(0, '20.967')] [2024-11-09 11:15:58,114][01612] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3707.2). Total num frames: 3002368. Throughput: 0: 912.9. Samples: 751388. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-09 11:15:58,116][01612] Avg episode reward: [(0, '20.927')]
[2024-11-09 11:16:03,114][01612] Fps is (10 sec: 3686.6, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 3026944. Throughput: 0: 907.6. Samples: 754640. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-09 11:16:03,117][01612] Avg episode reward: [(0, '20.897')]
[2024-11-09 11:16:03,979][03307] Updated weights for policy 0, policy_version 740 (0.0054)
[2024-11-09 11:16:08,114][01612] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 3047424. Throughput: 0: 961.8. Samples: 761354. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-09 11:16:08,124][01612] Avg episode reward: [(0, '20.740')]
[2024-11-09 11:16:13,114][01612] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 3059712. Throughput: 0: 931.5. Samples: 766008. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-09 11:16:13,121][01612] Avg episode reward: [(0, '20.749')]
[2024-11-09 11:16:15,668][03307] Updated weights for policy 0, policy_version 750 (0.0024)
[2024-11-09 11:16:18,114][01612] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 3080192. Throughput: 0: 912.2. Samples: 768490. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-09 11:16:18,116][01612] Avg episode reward: [(0, '20.137')]
[2024-11-09 11:16:23,114][01612] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3721.1). Total num frames: 3104768. Throughput: 0: 949.6. Samples: 775390. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-09 11:16:23,116][01612] Avg episode reward: [(0, '20.206')]
[2024-11-09 11:16:24,631][03307] Updated weights for policy 0, policy_version 760 (0.0018)
[2024-11-09 11:16:28,116][01612] Fps is (10 sec: 4095.1, 60 sec: 3754.5, 300 sec: 3707.2). Total num frames: 3121152. Throughput: 0: 958.2. Samples: 780934.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-09 11:16:28,120][01612] Avg episode reward: [(0, '20.516')]
[2024-11-09 11:16:33,114][01612] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 3137536. Throughput: 0: 930.7. Samples: 782996. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-09 11:16:33,117][01612] Avg episode reward: [(0, '21.907')]
[2024-11-09 11:16:36,509][03307] Updated weights for policy 0, policy_version 770 (0.0029)
[2024-11-09 11:16:38,114][01612] Fps is (10 sec: 3687.2, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 3158016. Throughput: 0: 932.4. Samples: 789206. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-09 11:16:38,117][01612] Avg episode reward: [(0, '22.084')]
[2024-11-09 11:16:43,115][01612] Fps is (10 sec: 4505.0, 60 sec: 3891.1, 300 sec: 3721.1). Total num frames: 3182592. Throughput: 0: 991.3. Samples: 796000. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-09 11:16:43,118][01612] Avg episode reward: [(0, '21.838')]
[2024-11-09 11:16:47,288][03307] Updated weights for policy 0, policy_version 780 (0.0026)
[2024-11-09 11:16:48,121][01612] Fps is (10 sec: 3683.7, 60 sec: 3754.2, 300 sec: 3721.0). Total num frames: 3194880. Throughput: 0: 965.6. Samples: 798098. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-09 11:16:48,125][01612] Avg episode reward: [(0, '21.834')]
[2024-11-09 11:16:53,114][01612] Fps is (10 sec: 3277.2, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 3215360. Throughput: 0: 931.3. Samples: 803262. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-09 11:16:53,116][01612] Avg episode reward: [(0, '20.918')]
[2024-11-09 11:16:56,935][03307] Updated weights for policy 0, policy_version 790 (0.0023)
[2024-11-09 11:16:58,114][01612] Fps is (10 sec: 4509.0, 60 sec: 3959.5, 300 sec: 3735.0). Total num frames: 3239936. Throughput: 0: 984.2. Samples: 810298.
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-09 11:16:58,121][01612] Avg episode reward: [(0, '22.463')]
[2024-11-09 11:17:03,117][01612] Fps is (10 sec: 4094.6, 60 sec: 3822.7, 300 sec: 3735.0). Total num frames: 3256320. Throughput: 0: 992.1. Samples: 813138. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-09 11:17:03,123][01612] Avg episode reward: [(0, '21.580')]
[2024-11-09 11:17:08,114][01612] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 3268608. Throughput: 0: 922.4. Samples: 816898. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-09 11:17:08,119][01612] Avg episode reward: [(0, '21.311')]
[2024-11-09 11:17:10,814][03307] Updated weights for policy 0, policy_version 800 (0.0029)
[2024-11-09 11:17:13,114][01612] Fps is (10 sec: 2458.5, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 3280896. Throughput: 0: 892.6. Samples: 821098. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-09 11:17:13,116][01612] Avg episode reward: [(0, '21.480')]
[2024-11-09 11:17:18,114][01612] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 3305472. Throughput: 0: 924.8. Samples: 824610. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-09 11:17:18,121][01612] Avg episode reward: [(0, '22.795')]
[2024-11-09 11:17:22,413][03307] Updated weights for policy 0, policy_version 810 (0.0033)
[2024-11-09 11:17:23,114][01612] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3693.3). Total num frames: 3317760. Throughput: 0: 897.3. Samples: 829584. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-09 11:17:23,118][01612] Avg episode reward: [(0, '21.447')]
[2024-11-09 11:17:28,114][01612] Fps is (10 sec: 3276.7, 60 sec: 3618.2, 300 sec: 3693.3). Total num frames: 3338240. Throughput: 0: 872.6. Samples: 835268.
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-09 11:17:28,117][01612] Avg episode reward: [(0, '20.968')]
[2024-11-09 11:17:31,815][03307] Updated weights for policy 0, policy_version 820 (0.0017)
[2024-11-09 11:17:33,114][01612] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 3362816. Throughput: 0: 905.6. Samples: 838844. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-09 11:17:33,121][01612] Avg episode reward: [(0, '21.969')]
[2024-11-09 11:17:38,114][01612] Fps is (10 sec: 4096.1, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 3379200. Throughput: 0: 919.8. Samples: 844654. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-09 11:17:38,117][01612] Avg episode reward: [(0, '22.787')]
[2024-11-09 11:17:43,114][01612] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3707.2). Total num frames: 3395584. Throughput: 0: 867.4. Samples: 849330. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-09 11:17:43,118][01612] Avg episode reward: [(0, '23.464')]
[2024-11-09 11:17:43,129][03294] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000829_3395584.pth...
[2024-11-09 11:17:43,254][03294] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000610_2498560.pth
[2024-11-09 11:17:43,270][03294] Saving new best policy, reward=23.464!
[2024-11-09 11:17:43,854][03307] Updated weights for policy 0, policy_version 830 (0.0017)
[2024-11-09 11:17:48,114][01612] Fps is (10 sec: 4096.0, 60 sec: 3755.1, 300 sec: 3707.2). Total num frames: 3420160. Throughput: 0: 878.6. Samples: 852672. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-09 11:17:48,121][01612] Avg episode reward: [(0, '23.670')]
[2024-11-09 11:17:48,124][03294] Saving new best policy, reward=23.670!
[2024-11-09 11:17:53,115][01612] Fps is (10 sec: 4095.7, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 3436544. Throughput: 0: 943.7. Samples: 859366.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-09 11:17:53,124][01612] Avg episode reward: [(0, '22.975')]
[2024-11-09 11:17:53,602][03307] Updated weights for policy 0, policy_version 840 (0.0029)
[2024-11-09 11:17:58,114][01612] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3707.2). Total num frames: 3452928. Throughput: 0: 944.6. Samples: 863604. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-09 11:17:58,119][01612] Avg episode reward: [(0, '23.264')]
[2024-11-09 11:18:03,114][01612] Fps is (10 sec: 3686.6, 60 sec: 3618.3, 300 sec: 3721.1). Total num frames: 3473408. Throughput: 0: 935.6. Samples: 866714. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-09 11:18:03,118][01612] Avg episode reward: [(0, '23.986')]
[2024-11-09 11:18:03,129][03294] Saving new best policy, reward=23.986!
[2024-11-09 11:18:04,465][03307] Updated weights for policy 0, policy_version 850 (0.0017)
[2024-11-09 11:18:08,114][01612] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 3497984. Throughput: 0: 975.5. Samples: 873482. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-09 11:18:08,118][01612] Avg episode reward: [(0, '23.305')]
[2024-11-09 11:18:13,115][01612] Fps is (10 sec: 3685.8, 60 sec: 3822.8, 300 sec: 3735.0). Total num frames: 3510272. Throughput: 0: 964.0. Samples: 878648. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-09 11:18:13,118][01612] Avg episode reward: [(0, '24.724')]
[2024-11-09 11:18:13,136][03294] Saving new best policy, reward=24.724!
[2024-11-09 11:18:16,243][03307] Updated weights for policy 0, policy_version 860 (0.0031)
[2024-11-09 11:18:18,114][01612] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3530752. Throughput: 0: 928.5. Samples: 880626. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-09 11:18:18,116][01612] Avg episode reward: [(0, '23.501')]
[2024-11-09 11:18:23,116][01612] Fps is (10 sec: 4095.7, 60 sec: 3891.0, 300 sec: 3748.9).
Total num frames: 3551232. Throughput: 0: 950.7. Samples: 887438. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-09 11:18:23,122][01612] Avg episode reward: [(0, '23.203')]
[2024-11-09 11:18:25,152][03307] Updated weights for policy 0, policy_version 870 (0.0014)
[2024-11-09 11:18:28,114][01612] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 3571712. Throughput: 0: 983.1. Samples: 893568. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-09 11:18:28,117][01612] Avg episode reward: [(0, '24.163')]
[2024-11-09 11:18:33,114][01612] Fps is (10 sec: 3277.5, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 3584000. Throughput: 0: 955.4. Samples: 895666. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-09 11:18:33,120][01612] Avg episode reward: [(0, '24.245')]
[2024-11-09 11:18:36,783][03307] Updated weights for policy 0, policy_version 880 (0.0023)
[2024-11-09 11:18:38,114][01612] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 3608576. Throughput: 0: 938.3. Samples: 901590. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-09 11:18:38,117][01612] Avg episode reward: [(0, '24.910')]
[2024-11-09 11:18:38,119][03294] Saving new best policy, reward=24.910!
[2024-11-09 11:18:43,114][01612] Fps is (10 sec: 4505.7, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 3629056. Throughput: 0: 993.3. Samples: 908304. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-09 11:18:43,116][01612] Avg episode reward: [(0, '24.301')]
[2024-11-09 11:18:47,584][03307] Updated weights for policy 0, policy_version 890 (0.0019)
[2024-11-09 11:18:48,117][01612] Fps is (10 sec: 3685.2, 60 sec: 3754.5, 300 sec: 3748.8). Total num frames: 3645440. Throughput: 0: 974.8. Samples: 910584. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-09 11:18:48,125][01612] Avg episode reward: [(0, '24.872')]
[2024-11-09 11:18:53,114][01612] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3762.8).
Total num frames: 3665920. Throughput: 0: 930.6. Samples: 915360. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-09 11:18:53,116][01612] Avg episode reward: [(0, '25.731')]
[2024-11-09 11:18:53,127][03294] Saving new best policy, reward=25.731!
[2024-11-09 11:18:57,599][03307] Updated weights for policy 0, policy_version 900 (0.0029)
[2024-11-09 11:18:58,114][01612] Fps is (10 sec: 4097.4, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 3686400. Throughput: 0: 969.0. Samples: 922250. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-09 11:18:58,116][01612] Avg episode reward: [(0, '26.167')]
[2024-11-09 11:18:58,125][03294] Saving new best policy, reward=26.167!
[2024-11-09 11:19:03,114][01612] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 3702784. Throughput: 0: 996.5. Samples: 925470. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-09 11:19:03,120][01612] Avg episode reward: [(0, '26.437')]
[2024-11-09 11:19:03,132][03294] Saving new best policy, reward=26.437!
[2024-11-09 11:19:08,114][01612] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 3719168. Throughput: 0: 936.7. Samples: 929588. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-09 11:19:08,121][01612] Avg episode reward: [(0, '25.559')]
[2024-11-09 11:19:09,140][03307] Updated weights for policy 0, policy_version 910 (0.0031)
[2024-11-09 11:19:13,114][01612] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3762.8). Total num frames: 3743744. Throughput: 0: 952.4. Samples: 936428. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-09 11:19:13,117][01612] Avg episode reward: [(0, '26.129')]
[2024-11-09 11:19:18,114][01612] Fps is (10 sec: 4505.7, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 3764224. Throughput: 0: 978.5. Samples: 939700.
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-11-09 11:19:18,120][01612] Avg episode reward: [(0, '25.286')]
[2024-11-09 11:19:18,897][03307] Updated weights for policy 0, policy_version 920 (0.0016)
[2024-11-09 11:19:23,114][01612] Fps is (10 sec: 3276.8, 60 sec: 3754.8, 300 sec: 3748.9). Total num frames: 3776512. Throughput: 0: 957.2. Samples: 944666. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-09 11:19:23,118][01612] Avg episode reward: [(0, '24.592')]
[2024-11-09 11:19:28,114][01612] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 3801088. Throughput: 0: 931.1. Samples: 950202. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-09 11:19:28,116][01612] Avg episode reward: [(0, '23.620')]
[2024-11-09 11:19:30,039][03307] Updated weights for policy 0, policy_version 930 (0.0021)
[2024-11-09 11:19:33,114][01612] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3776.7). Total num frames: 3821568. Throughput: 0: 958.0. Samples: 953692. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-09 11:19:33,118][01612] Avg episode reward: [(0, '23.471')]
[2024-11-09 11:19:38,114][01612] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 3837952. Throughput: 0: 982.4. Samples: 959566. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-09 11:19:38,116][01612] Avg episode reward: [(0, '24.289')]
[2024-11-09 11:19:41,719][03307] Updated weights for policy 0, policy_version 940 (0.0017)
[2024-11-09 11:19:43,114][01612] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 3854336. Throughput: 0: 930.9. Samples: 964140. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-09 11:19:43,121][01612] Avg episode reward: [(0, '23.743')]
[2024-11-09 11:19:43,139][03294] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000941_3854336.pth...
[2024-11-09 11:19:43,278][03294] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000720_2949120.pth
[2024-11-09 11:19:48,114][01612] Fps is (10 sec: 4096.0, 60 sec: 3891.4, 300 sec: 3790.5). Total num frames: 3878912. Throughput: 0: 935.7. Samples: 967578. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-09 11:19:48,116][01612] Avg episode reward: [(0, '23.543')]
[2024-11-09 11:19:50,779][03307] Updated weights for policy 0, policy_version 950 (0.0025)
[2024-11-09 11:19:53,114][01612] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 3895296. Throughput: 0: 997.3. Samples: 974466. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-09 11:19:53,120][01612] Avg episode reward: [(0, '23.627')]
[2024-11-09 11:19:58,114][01612] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 3911680. Throughput: 0: 939.0. Samples: 978684. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-09 11:19:58,116][01612] Avg episode reward: [(0, '22.985')]
[2024-11-09 11:20:02,349][03307] Updated weights for policy 0, policy_version 960 (0.0016)
[2024-11-09 11:20:03,114][01612] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3776.6). Total num frames: 3932160. Throughput: 0: 931.9. Samples: 981634. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-09 11:20:03,116][01612] Avg episode reward: [(0, '21.570')]
[2024-11-09 11:20:08,114][01612] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3790.5). Total num frames: 3956736. Throughput: 0: 971.6. Samples: 988390. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-09 11:20:08,120][01612] Avg episode reward: [(0, '21.628')]
[2024-11-09 11:20:13,016][03307] Updated weights for policy 0, policy_version 970 (0.0035)
[2024-11-09 11:20:13,114][01612] Fps is (10 sec: 4096.1, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 3973120. Throughput: 0: 964.6. Samples: 993610.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-09 11:20:13,122][01612] Avg episode reward: [(0, '22.835')]
[2024-11-09 11:20:18,114][01612] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 3989504. Throughput: 0: 933.2. Samples: 995684. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-09 11:20:18,116][01612] Avg episode reward: [(0, '22.370')]
[2024-11-09 11:20:21,238][03294] Stopping Batcher_0...
[2024-11-09 11:20:21,238][03294] Loop batcher_evt_loop terminating...
[2024-11-09 11:20:21,239][01612] Component Batcher_0 stopped!
[2024-11-09 11:20:21,247][03294] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-11-09 11:20:21,294][03307] Weights refcount: 2 0
[2024-11-09 11:20:21,300][01612] Component InferenceWorker_p0-w0 stopped!
[2024-11-09 11:20:21,302][03307] Stopping InferenceWorker_p0-w0...
[2024-11-09 11:20:21,306][03307] Loop inference_proc0-0_evt_loop terminating...
[2024-11-09 11:20:21,365][03294] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000829_3395584.pth
[2024-11-09 11:20:21,390][03294] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-11-09 11:20:21,500][03311] Stopping RolloutWorker_w3...
[2024-11-09 11:20:21,500][01612] Component RolloutWorker_w3 stopped!
[2024-11-09 11:20:21,508][03309] Stopping RolloutWorker_w1...
[2024-11-09 11:20:21,509][03309] Loop rollout_proc1_evt_loop terminating...
[2024-11-09 11:20:21,508][01612] Component RolloutWorker_w1 stopped!
[2024-11-09 11:20:21,501][03311] Loop rollout_proc3_evt_loop terminating...
[2024-11-09 11:20:21,542][03313] Stopping RolloutWorker_w5...
[2024-11-09 11:20:21,542][01612] Component RolloutWorker_w5 stopped!
[2024-11-09 11:20:21,548][03313] Loop rollout_proc5_evt_loop terminating...
[2024-11-09 11:20:21,557][03315] Stopping RolloutWorker_w7...
[2024-11-09 11:20:21,559][01612] Component RolloutWorker_w7 stopped!
[2024-11-09 11:20:21,561][03315] Loop rollout_proc7_evt_loop terminating...
[2024-11-09 11:20:21,615][01612] Component LearnerWorker_p0 stopped!
[2024-11-09 11:20:21,617][03294] Stopping LearnerWorker_p0...
[2024-11-09 11:20:21,618][03294] Loop learner_proc0_evt_loop terminating...
[2024-11-09 11:20:21,706][01612] Component RolloutWorker_w6 stopped!
[2024-11-09 11:20:21,708][03314] Stopping RolloutWorker_w6...
[2024-11-09 11:20:21,710][03314] Loop rollout_proc6_evt_loop terminating...
[2024-11-09 11:20:21,725][01612] Component RolloutWorker_w0 stopped!
[2024-11-09 11:20:21,727][03308] Stopping RolloutWorker_w0...
[2024-11-09 11:20:21,727][03308] Loop rollout_proc0_evt_loop terminating...
[2024-11-09 11:20:21,747][01612] Component RolloutWorker_w4 stopped!
[2024-11-09 11:20:21,750][03312] Stopping RolloutWorker_w4...
[2024-11-09 11:20:21,753][01612] Component RolloutWorker_w2 stopped!
[2024-11-09 11:20:21,756][01612] Waiting for process learner_proc0 to stop...
[2024-11-09 11:20:21,759][03310] Stopping RolloutWorker_w2...
[2024-11-09 11:20:21,762][03312] Loop rollout_proc4_evt_loop terminating...
[2024-11-09 11:20:21,761][03310] Loop rollout_proc2_evt_loop terminating...
[2024-11-09 11:20:23,373][01612] Waiting for process inference_proc0-0 to join...
[2024-11-09 11:20:23,378][01612] Waiting for process rollout_proc0 to join...
[2024-11-09 11:20:25,321][01612] Waiting for process rollout_proc1 to join...
[2024-11-09 11:20:25,326][01612] Waiting for process rollout_proc2 to join...
[2024-11-09 11:20:25,330][01612] Waiting for process rollout_proc3 to join...
[2024-11-09 11:20:25,333][01612] Waiting for process rollout_proc4 to join...
[2024-11-09 11:20:25,336][01612] Waiting for process rollout_proc5 to join...
[2024-11-09 11:20:25,340][01612] Waiting for process rollout_proc6 to join...
[2024-11-09 11:20:25,343][01612] Waiting for process rollout_proc7 to join...
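Note on the checkpoint filenames that appear throughout this log (e.g. checkpoint_000000720_2949120.pth, checkpoint_000000978_4005888.pth): the two numeric fields line up with the policy_version and "Total num frames" values logged at the time of each save. A minimal parsing sketch, assuming only that naming convention as observed in the log above:

```python
import re

def parse_checkpoint_name(name: str) -> tuple[int, int]:
    """Parse a checkpoint filename of the form seen in this log,
    checkpoint_<policy_version>_<env_frames>.pth, into integers."""
    m = re.fullmatch(r"checkpoint_(\d+)_(\d+)\.pth", name)
    if m is None:
        raise ValueError(f"not a checkpoint filename: {name!r}")
    return int(m.group(1)), int(m.group(2))

# The final checkpoint saved during shutdown above:
print(parse_checkpoint_name("checkpoint_000000978_4005888.pth"))  # (978, 4005888)
```

This matches, for example, the save at 11:15:43 (policy_version 720, total num frames 2949120 → checkpoint_000000720_2949120.pth).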
[2024-11-09 11:20:25,347][01612] Batcher 0 profile tree view:
batching: 27.2749, releasing_batches: 0.0311
[2024-11-09 11:20:25,349][01612] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 429.5295
update_model: 8.6972
  weight_update: 0.0025
one_step: 0.0058
  handle_policy_step: 601.3379
    deserialize: 15.2388, stack: 3.3431, obs_to_device_normalize: 125.9524, forward: 303.8929, send_messages: 29.5587
    prepare_outputs: 93.2490
      to_cpu: 56.7187
[2024-11-09 11:20:25,351][01612] Learner 0 profile tree view:
misc: 0.0076, prepare_batch: 13.9792
train: 75.0934
  epoch_init: 0.0057, minibatch_init: 0.0127, losses_postprocess: 0.6994, kl_divergence: 0.6159, after_optimizer: 34.3164
  calculate_losses: 26.3142
    losses_init: 0.0037, forward_head: 1.3265, bptt_initial: 17.2755, tail: 1.1770, advantages_returns: 0.2965, losses: 3.8579
    bptt: 2.0538
      bptt_forward_core: 1.9470
  update: 12.4609
    clip: 0.9337
[2024-11-09 11:20:25,353][01612] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3553, enqueue_policy_requests: 106.6183, env_step: 843.5808, overhead: 13.8938, complete_rollouts: 6.6612
save_policy_outputs: 22.1789
  split_output_tensors: 8.9567
[2024-11-09 11:20:25,354][01612] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3196, enqueue_policy_requests: 106.5967, env_step: 842.2183, overhead: 14.1622, complete_rollouts: 8.0041
save_policy_outputs: 21.7474
  split_output_tensors: 8.2426
[2024-11-09 11:20:25,356][01612] Loop Runner_EvtLoop terminating...
[2024-11-09 11:20:25,358][01612] Runner profile tree view:
main_loop: 1116.5439
[2024-11-09 11:20:25,359][01612] Collected {0: 4005888}, FPS: 3587.8
[2024-11-09 11:20:25,772][01612] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-11-09 11:20:25,774][01612] Overriding arg 'num_workers' with value 1 passed from command line
[2024-11-09 11:20:25,777][01612] Adding new argument 'no_render'=True that is not in the saved config file!
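The "Collected {0: 4005888}, FPS: 3587.8" summary above is consistent with the Runner's main_loop wall time: total environment frames divided by main_loop seconds. A quick sanity check using only the two numbers printed in this log:

```python
# Values taken directly from the run summary above.
total_frames = 4_005_888   # frames collected by policy 0 ("Collected {0: 4005888}")
main_loop_s = 1116.5439    # "Runner profile tree view: main_loop: 1116.5439" (seconds)

fps = total_frames / main_loop_s
print(round(fps, 1))  # 3587.8, matching the logged FPS
```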
[2024-11-09 11:20:25,779][01612] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-11-09 11:20:25,781][01612] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-11-09 11:20:25,783][01612] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-11-09 11:20:25,784][01612] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-11-09 11:20:25,786][01612] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-11-09 11:20:25,787][01612] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-11-09 11:20:25,788][01612] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-11-09 11:20:25,789][01612] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-11-09 11:20:25,791][01612] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-11-09 11:20:25,792][01612] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-11-09 11:20:25,793][01612] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-11-09 11:20:25,794][01612] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-11-09 11:20:25,834][01612] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-09 11:20:25,838][01612] RunningMeanStd input shape: (3, 72, 128)
[2024-11-09 11:20:25,841][01612] RunningMeanStd input shape: (1,)
[2024-11-09 11:20:25,858][01612] ConvEncoder: input_channels=3
[2024-11-09 11:20:25,999][01612] Conv encoder output size: 512
[2024-11-09 11:20:26,002][01612] Policy head output size: 512
[2024-11-09 11:20:26,230][01612] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-11-09 11:20:28,031][01612] Num frames 100...
[2024-11-09 11:20:28,410][01612] Num frames 200...
[2024-11-09 11:20:28,640][01612] Num frames 300...
[2024-11-09 11:20:28,819][01612] Num frames 400...
[2024-11-09 11:20:29,004][01612] Num frames 500...
[2024-11-09 11:20:29,198][01612] Num frames 600...
[2024-11-09 11:20:29,411][01612] Num frames 700...
[2024-11-09 11:20:29,606][01612] Num frames 800...
[2024-11-09 11:20:29,789][01612] Num frames 900...
[2024-11-09 11:20:30,020][01612] Num frames 1000...
[2024-11-09 11:20:30,227][01612] Num frames 1100...
[2024-11-09 11:20:30,524][01612] Num frames 1200...
[2024-11-09 11:20:30,714][01612] Num frames 1300...
[2024-11-09 11:20:30,891][01612] Num frames 1400...
[2024-11-09 11:20:31,070][01612] Num frames 1500...
[2024-11-09 11:20:31,280][01612] Num frames 1600...
[2024-11-09 11:20:31,569][01612] Num frames 1700...
[2024-11-09 11:20:31,784][01612] Num frames 1800...
[2024-11-09 11:20:31,973][01612] Num frames 1900...
[2024-11-09 11:20:32,195][01612] Num frames 2000...
[2024-11-09 11:20:32,409][01612] Num frames 2100...
[2024-11-09 11:20:32,466][01612] Avg episode rewards: #0: 55.999, true rewards: #0: 21.000
[2024-11-09 11:20:32,467][01612] Avg episode reward: 55.999, avg true_objective: 21.000
[2024-11-09 11:20:32,657][01612] Num frames 2200...
[2024-11-09 11:20:32,895][01612] Num frames 2300...
[2024-11-09 11:20:33,113][01612] Num frames 2400...
[2024-11-09 11:20:33,315][01612] Num frames 2500...
[2024-11-09 11:20:33,531][01612] Avg episode rewards: #0: 31.399, true rewards: #0: 12.900
[2024-11-09 11:20:33,536][01612] Avg episode reward: 31.399, avg true_objective: 12.900
[2024-11-09 11:20:33,583][01612] Num frames 2600...
[2024-11-09 11:20:33,915][01612] Num frames 2700...
[2024-11-09 11:20:34,277][01612] Num frames 2800...
[2024-11-09 11:20:34,551][01612] Num frames 2900...
[2024-11-09 11:20:34,747][01612] Num frames 3000...
[2024-11-09 11:20:34,931][01612] Num frames 3100...
[2024-11-09 11:20:35,253][01612] Num frames 3200...
[2024-11-09 11:20:35,565][01612] Num frames 3300...
[2024-11-09 11:20:35,684][01612] Num frames 3400...
[2024-11-09 11:20:35,812][01612] Num frames 3500...
[2024-11-09 11:20:35,955][01612] Num frames 3600...
[2024-11-09 11:20:36,082][01612] Num frames 3700...
[2024-11-09 11:20:36,205][01612] Num frames 3800...
[2024-11-09 11:20:36,329][01612] Num frames 3900...
[2024-11-09 11:20:36,468][01612] Num frames 4000...
[2024-11-09 11:20:36,594][01612] Num frames 4100...
[2024-11-09 11:20:36,713][01612] Num frames 4200...
[2024-11-09 11:20:36,834][01612] Num frames 4300...
[2024-11-09 11:20:36,939][01612] Avg episode rewards: #0: 35.466, true rewards: #0: 14.467
[2024-11-09 11:20:36,941][01612] Avg episode reward: 35.466, avg true_objective: 14.467
[2024-11-09 11:20:37,014][01612] Num frames 4400...
[2024-11-09 11:20:37,140][01612] Num frames 4500...
[2024-11-09 11:20:37,259][01612] Num frames 4600...
[2024-11-09 11:20:37,382][01612] Num frames 4700...
[2024-11-09 11:20:37,517][01612] Num frames 4800...
[2024-11-09 11:20:37,639][01612] Num frames 4900...
[2024-11-09 11:20:37,764][01612] Num frames 5000...
[2024-11-09 11:20:37,887][01612] Num frames 5100...
[2024-11-09 11:20:38,011][01612] Num frames 5200...
[2024-11-09 11:20:38,134][01612] Num frames 5300...
[2024-11-09 11:20:38,256][01612] Num frames 5400...
[2024-11-09 11:20:38,395][01612] Num frames 5500...
[2024-11-09 11:20:38,559][01612] Num frames 5600...
[2024-11-09 11:20:38,737][01612] Num frames 5700...
[2024-11-09 11:20:38,915][01612] Num frames 5800...
[2024-11-09 11:20:39,088][01612] Num frames 5900...
[2024-11-09 11:20:39,256][01612] Num frames 6000...
[2024-11-09 11:20:39,428][01612] Num frames 6100...
[2024-11-09 11:20:39,598][01612] Num frames 6200...
[2024-11-09 11:20:39,755][01612] Avg episode rewards: #0: 38.899, true rewards: #0: 15.650
[2024-11-09 11:20:39,758][01612] Avg episode reward: 38.899, avg true_objective: 15.650
[2024-11-09 11:20:39,829][01612] Num frames 6300...
[2024-11-09 11:20:40,001][01612] Num frames 6400...
[2024-11-09 11:20:40,164][01612] Num frames 6500...
[2024-11-09 11:20:40,338][01612] Num frames 6600...
[2024-11-09 11:20:40,515][01612] Num frames 6700...
[2024-11-09 11:20:40,703][01612] Num frames 6800...
[2024-11-09 11:20:40,885][01612] Num frames 6900...
[2024-11-09 11:20:41,053][01612] Num frames 7000...
[2024-11-09 11:20:41,178][01612] Num frames 7100...
[2024-11-09 11:20:41,298][01612] Num frames 7200...
[2024-11-09 11:20:41,418][01612] Num frames 7300...
[2024-11-09 11:20:41,549][01612] Num frames 7400...
[2024-11-09 11:20:41,677][01612] Num frames 7500...
[2024-11-09 11:20:41,806][01612] Num frames 7600...
[2024-11-09 11:20:41,938][01612] Num frames 7700...
[2024-11-09 11:20:42,061][01612] Num frames 7800...
[2024-11-09 11:20:42,187][01612] Num frames 7900...
[2024-11-09 11:20:42,313][01612] Num frames 8000...
[2024-11-09 11:20:42,438][01612] Num frames 8100...
[2024-11-09 11:20:42,568][01612] Num frames 8200...
[2024-11-09 11:20:42,689][01612] Num frames 8300...
[2024-11-09 11:20:42,824][01612] Avg episode rewards: #0: 43.119, true rewards: #0: 16.720
[2024-11-09 11:20:42,825][01612] Avg episode reward: 43.119, avg true_objective: 16.720
[2024-11-09 11:20:42,880][01612] Num frames 8400...
[2024-11-09 11:20:43,000][01612] Num frames 8500...
[2024-11-09 11:20:43,120][01612] Num frames 8600...
[2024-11-09 11:20:43,240][01612] Num frames 8700...
[2024-11-09 11:20:43,364][01612] Num frames 8800...
[2024-11-09 11:20:43,497][01612] Num frames 8900...
[2024-11-09 11:20:43,616][01612] Num frames 9000...
[2024-11-09 11:20:43,746][01612] Num frames 9100...
[2024-11-09 11:20:43,868][01612] Num frames 9200...
[2024-11-09 11:20:43,994][01612] Num frames 9300...
[2024-11-09 11:20:44,117][01612] Num frames 9400...
[2024-11-09 11:20:44,241][01612] Num frames 9500...
[2024-11-09 11:20:44,369][01612] Num frames 9600...
[2024-11-09 11:20:44,504][01612] Num frames 9700...
[2024-11-09 11:20:44,628][01612] Num frames 9800...
[2024-11-09 11:20:44,726][01612] Avg episode rewards: #0: 41.553, true rewards: #0: 16.387
[2024-11-09 11:20:44,729][01612] Avg episode reward: 41.553, avg true_objective: 16.387
[2024-11-09 11:20:44,820][01612] Num frames 9900...
[2024-11-09 11:20:44,938][01612] Num frames 10000...
[2024-11-09 11:20:45,058][01612] Num frames 10100...
[2024-11-09 11:20:45,180][01612] Num frames 10200...
[2024-11-09 11:20:45,299][01612] Num frames 10300...
[2024-11-09 11:20:45,429][01612] Num frames 10400...
[2024-11-09 11:20:45,559][01612] Num frames 10500...
[2024-11-09 11:20:45,679][01612] Num frames 10600...
[2024-11-09 11:20:45,806][01612] Num frames 10700...
[2024-11-09 11:20:45,926][01612] Num frames 10800...
[2024-11-09 11:20:46,044][01612] Num frames 10900...
[2024-11-09 11:20:46,167][01612] Num frames 11000...
[2024-11-09 11:20:46,290][01612] Num frames 11100...
[2024-11-09 11:20:46,456][01612] Avg episode rewards: #0: 40.699, true rewards: #0: 15.986
[2024-11-09 11:20:46,458][01612] Avg episode reward: 40.699, avg true_objective: 15.986
[2024-11-09 11:20:46,475][01612] Num frames 11200...
[2024-11-09 11:20:46,591][01612] Num frames 11300...
[2024-11-09 11:20:46,717][01612] Num frames 11400...
[2024-11-09 11:20:46,849][01612] Num frames 11500...
[2024-11-09 11:20:46,965][01612] Num frames 11600...
[2024-11-09 11:20:47,083][01612] Num frames 11700...
[2024-11-09 11:20:47,202][01612] Num frames 11800...
[2024-11-09 11:20:47,320][01612] Num frames 11900...
[2024-11-09 11:20:47,449][01612] Num frames 12000...
[2024-11-09 11:20:47,579][01612] Num frames 12100...
[2024-11-09 11:20:47,702][01612] Num frames 12200...
[2024-11-09 11:20:47,788][01612] Avg episode rewards: #0: 38.653, true rewards: #0: 15.279
[2024-11-09 11:20:47,791][01612] Avg episode reward: 38.653, avg true_objective: 15.279
[2024-11-09 11:20:47,898][01612] Num frames 12300...
[2024-11-09 11:20:48,016][01612] Num frames 12400...
[2024-11-09 11:20:48,136][01612] Num frames 12500...
[2024-11-09 11:20:48,256][01612] Num frames 12600...
[2024-11-09 11:20:48,375][01612] Num frames 12700...
[2024-11-09 11:20:48,506][01612] Num frames 12800...
[2024-11-09 11:20:48,623][01612] Num frames 12900...
[2024-11-09 11:20:48,741][01612] Num frames 13000...
[2024-11-09 11:20:48,882][01612] Num frames 13100...
[2024-11-09 11:20:48,984][01612] Avg episode rewards: #0: 36.243, true rewards: #0: 14.577
[2024-11-09 11:20:48,990][01612] Avg episode reward: 36.243, avg true_objective: 14.577
[2024-11-09 11:20:49,170][01612] Num frames 13200...
[2024-11-09 11:20:49,375][01612] Num frames 13300...
[2024-11-09 11:20:49,571][01612] Num frames 13400...
[2024-11-09 11:20:49,634][01612] Avg episode rewards: #0: 33.001, true rewards: #0: 13.401
[2024-11-09 11:20:49,638][01612] Avg episode reward: 33.001, avg true_objective: 13.401
[2024-11-09 11:22:17,012][01612] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-11-09 11:22:19,994][01612] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-11-09 11:22:19,999][01612] Overriding arg 'num_workers' with value 1 passed from command line
[2024-11-09 11:22:20,000][01612] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-11-09 11:22:20,003][01612] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-11-09 11:22:20,007][01612] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-11-09 11:22:20,009][01612] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-11-09 11:22:20,013][01612] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-11-09 11:22:20,015][01612] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-11-09 11:22:20,016][01612] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-11-09 11:22:20,017][01612] Adding new argument 'hf_repository'='bkuen/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-11-09 11:22:20,018][01612] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-11-09 11:22:20,019][01612] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-11-09 11:22:20,020][01612] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-11-09 11:22:20,024][01612] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-11-09 11:22:20,026][01612] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-11-09 11:22:20,070][01612] RunningMeanStd input shape: (3, 72, 128)
[2024-11-09 11:22:20,072][01612] RunningMeanStd input shape: (1,)
[2024-11-09 11:22:20,092][01612] ConvEncoder: input_channels=3
[2024-11-09 11:22:20,147][01612] Conv encoder output size: 512
[2024-11-09 11:22:20,149][01612] Policy head output size: 512
[2024-11-09 11:22:20,177][01612] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-11-09 11:22:20,837][01612] Num frames 100...
[2024-11-09 11:22:21,002][01612] Num frames 200...
[2024-11-09 11:22:21,170][01612] Num frames 300...
[2024-11-09 11:22:21,353][01612] Num frames 400...
[2024-11-09 11:22:21,528][01612] Num frames 500...
[2024-11-09 11:22:21,707][01612] Num frames 600...
[2024-11-09 11:22:21,884][01612] Num frames 700...
[2024-11-09 11:22:22,058][01612] Num frames 800...
[2024-11-09 11:22:22,283][01612] Avg episode rewards: #0: 20.960, true rewards: #0: 8.960
[2024-11-09 11:22:22,286][01612] Avg episode reward: 20.960, avg true_objective: 8.960
[2024-11-09 11:22:22,298][01612] Num frames 900...
[2024-11-09 11:22:22,500][01612] Num frames 1000...
[2024-11-09 11:22:22,750][01612] Num frames 1100...
[2024-11-09 11:22:22,954][01612] Num frames 1200...
[2024-11-09 11:22:23,129][01612] Num frames 1300...
[2024-11-09 11:22:23,303][01612] Num frames 1400...
[2024-11-09 11:22:23,481][01612] Num frames 1500...
[2024-11-09 11:22:23,663][01612] Num frames 1600...
[2024-11-09 11:22:23,852][01612] Num frames 1700...
[2024-11-09 11:22:24,044][01612] Num frames 1800...
[2024-11-09 11:22:24,232][01612] Num frames 1900...
[2024-11-09 11:22:24,437][01612] Num frames 2000...
[2024-11-09 11:22:24,627][01612] Num frames 2100...
[2024-11-09 11:22:24,827][01612] Avg episode rewards: #0: 27.910, true rewards: #0: 10.910
[2024-11-09 11:22:24,829][01612] Avg episode reward: 27.910, avg true_objective: 10.910
[2024-11-09 11:22:24,862][01612] Num frames 2200...
[2024-11-09 11:22:25,037][01612] Num frames 2300...
[2024-11-09 11:22:25,226][01612] Num frames 2400...
[2024-11-09 11:22:25,416][01612] Num frames 2500...
[2024-11-09 11:22:25,617][01612] Num frames 2600...
[2024-11-09 11:22:25,806][01612] Num frames 2700...
[2024-11-09 11:22:26,017][01612] Num frames 2800...
[2024-11-09 11:22:26,213][01612] Num frames 2900...
[2024-11-09 11:22:26,387][01612] Avg episode rewards: #0: 24.843, true rewards: #0: 9.843
[2024-11-09 11:22:26,389][01612] Avg episode reward: 24.843, avg true_objective: 9.843
[2024-11-09 11:22:26,465][01612] Num frames 3000...
[2024-11-09 11:22:26,626][01612] Num frames 3100...
[2024-11-09 11:22:26,793][01612] Num frames 3200...
[2024-11-09 11:22:26,970][01612] Num frames 3300...
[2024-11-09 11:22:27,145][01612] Avg episode rewards: #0: 20.673, true rewards: #0: 8.422
[2024-11-09 11:22:27,147][01612] Avg episode reward: 20.673, avg true_objective: 8.422
[2024-11-09 11:22:27,203][01612] Num frames 3400...
[2024-11-09 11:22:27,373][01612] Num frames 3500...
[2024-11-09 11:22:27,552][01612] Num frames 3600...
[2024-11-09 11:22:27,746][01612] Num frames 3700...
[2024-11-09 11:22:27,931][01612] Num frames 3800...
[2024-11-09 11:22:28,086][01612] Num frames 3900...
[2024-11-09 11:22:28,206][01612] Num frames 4000...
[2024-11-09 11:22:28,330][01612] Num frames 4100...
[2024-11-09 11:22:28,456][01612] Num frames 4200...
[2024-11-09 11:22:28,581][01612] Num frames 4300...
[2024-11-09 11:22:28,710][01612] Num frames 4400...
[2024-11-09 11:22:28,833][01612] Num frames 4500...
[2024-11-09 11:22:28,958][01612] Num frames 4600...
[2024-11-09 11:22:29,033][01612] Avg episode rewards: #0: 22.426, true rewards: #0: 9.226
[2024-11-09 11:22:29,035][01612] Avg episode reward: 22.426, avg true_objective: 9.226
[2024-11-09 11:22:29,140][01612] Num frames 4700...
[2024-11-09 11:22:29,257][01612] Num frames 4800...
[2024-11-09 11:22:29,382][01612] Num frames 4900...
[2024-11-09 11:22:29,479][01612] Avg episode rewards: #0: 19.388, true rewards: #0: 8.222
[2024-11-09 11:22:29,481][01612] Avg episode reward: 19.388, avg true_objective: 8.222
[2024-11-09 11:22:29,564][01612] Num frames 5000...
[2024-11-09 11:22:29,697][01612] Num frames 5100...
[2024-11-09 11:22:29,820][01612] Num frames 5200...
[2024-11-09 11:22:29,943][01612] Num frames 5300...
[2024-11-09 11:22:30,073][01612] Num frames 5400...
[2024-11-09 11:22:30,146][01612] Avg episode rewards: #0: 17.733, true rewards: #0: 7.733
[2024-11-09 11:22:30,148][01612] Avg episode reward: 17.733, avg true_objective: 7.733
[2024-11-09 11:22:30,252][01612] Num frames 5500...
[2024-11-09 11:22:30,377][01612] Num frames 5600...
[2024-11-09 11:22:30,504][01612] Num frames 5700...
[2024-11-09 11:22:30,636][01612] Num frames 5800...
[2024-11-09 11:22:30,763][01612] Num frames 5900...
[2024-11-09 11:22:30,885][01612] Num frames 6000...
[2024-11-09 11:22:31,003][01612] Num frames 6100...
[2024-11-09 11:22:31,130][01612] Num frames 6200...
[2024-11-09 11:22:31,262][01612] Num frames 6300...
[2024-11-09 11:22:31,446][01612] Num frames 6400...
[2024-11-09 11:22:31,515][01612] Avg episode rewards: #0: 18.381, true rewards: #0: 8.006
[2024-11-09 11:22:31,517][01612] Avg episode reward: 18.381, avg true_objective: 8.006
[2024-11-09 11:22:31,679][01612] Num frames 6500...
[2024-11-09 11:22:31,827][01612] Num frames 6600...
[2024-11-09 11:22:31,950][01612] Num frames 6700...
[2024-11-09 11:22:32,091][01612] Num frames 6800...
[2024-11-09 11:22:32,223][01612] Num frames 6900...
[2024-11-09 11:22:32,354][01612] Num frames 7000...
[2024-11-09 11:22:32,476][01612] Num frames 7100...
[2024-11-09 11:22:32,599][01612] Num frames 7200...
[2024-11-09 11:22:32,734][01612] Num frames 7300...
[2024-11-09 11:22:32,862][01612] Num frames 7400...
[2024-11-09 11:22:32,989][01612] Num frames 7500...
[2024-11-09 11:22:33,123][01612] Num frames 7600...
[2024-11-09 11:22:33,248][01612] Num frames 7700...
[2024-11-09 11:22:33,369][01612] Num frames 7800...
[2024-11-09 11:22:33,488][01612] Num frames 7900...
[2024-11-09 11:22:33,618][01612] Num frames 8000...
[2024-11-09 11:22:33,744][01612] Num frames 8100...
[2024-11-09 11:22:33,867][01612] Num frames 8200...
[2024-11-09 11:22:33,988][01612] Num frames 8300...
[2024-11-09 11:22:34,111][01612] Num frames 8400...
[2024-11-09 11:22:34,240][01612] Num frames 8500...
[2024-11-09 11:22:34,304][01612] Avg episode rewards: #0: 22.117, true rewards: #0: 9.450
[2024-11-09 11:22:34,305][01612] Avg episode reward: 22.117, avg true_objective: 9.450
[2024-11-09 11:22:34,418][01612] Num frames 8600...
[2024-11-09 11:22:34,556][01612] Num frames 8700...
[2024-11-09 11:22:34,690][01612] Num frames 8800...
[2024-11-09 11:22:34,819][01612] Num frames 8900...
[2024-11-09 11:22:34,950][01612] Num frames 9000...
[2024-11-09 11:22:35,079][01612] Num frames 9100...
[2024-11-09 11:22:35,218][01612] Num frames 9200...
[2024-11-09 11:22:35,339][01612] Num frames 9300...
[2024-11-09 11:22:35,460][01612] Num frames 9400...
[2024-11-09 11:22:35,583][01612] Num frames 9500...
[2024-11-09 11:22:35,711][01612] Num frames 9600...
[2024-11-09 11:22:35,836][01612] Num frames 9700...
[2024-11-09 11:22:36,001][01612] Avg episode rewards: #0: 22.879, true rewards: #0: 9.779
[2024-11-09 11:22:36,003][01612] Avg episode reward: 22.879, avg true_objective: 9.779
[2024-11-09 11:23:37,793][01612] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-11-09 11:25:23,020][01612] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-11-09 11:25:23,022][01612] Overriding arg 'num_workers' with value 1 passed from command line
[2024-11-09 11:25:23,024][01612] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-11-09 11:25:23,026][01612] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-11-09 11:25:23,028][01612] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-11-09 11:25:23,030][01612] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-11-09 11:25:23,032][01612] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-11-09 11:25:23,033][01612] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-11-09 11:25:23,034][01612] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-11-09 11:25:23,035][01612] Adding new argument 'hf_repository'='bkuen/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-11-09 11:25:23,036][01612] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-11-09 11:25:23,037][01612] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-11-09 11:25:23,038][01612] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-11-09 11:25:23,039][01612] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-11-09 11:25:23,040][01612] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-11-09 11:25:23,074][01612] RunningMeanStd input shape: (3, 72, 128)
[2024-11-09 11:25:23,076][01612] RunningMeanStd input shape: (1,)
[2024-11-09 11:25:23,089][01612] ConvEncoder: input_channels=3
[2024-11-09 11:25:23,127][01612] Conv encoder output size: 512
[2024-11-09 11:25:23,128][01612] Policy head output size: 512
[2024-11-09 11:25:23,148][01612] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-11-09 11:25:23,584][01612] Num frames 100...
[2024-11-09 11:25:23,716][01612] Num frames 200...
[2024-11-09 11:25:23,842][01612] Num frames 300...
[2024-11-09 11:25:23,959][01612] Num frames 400...
[2024-11-09 11:25:24,074][01612] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480
[2024-11-09 11:25:24,076][01612] Avg episode reward: 5.480, avg true_objective: 4.480
[2024-11-09 11:25:24,140][01612] Num frames 500...
[2024-11-09 11:25:24,260][01612] Num frames 600...
[2024-11-09 11:25:24,384][01612] Num frames 700...
[2024-11-09 11:25:24,506][01612] Num frames 800...
[2024-11-09 11:25:24,637][01612] Num frames 900...
[2024-11-09 11:25:24,764][01612] Num frames 1000...
[2024-11-09 11:25:24,888][01612] Num frames 1100...
[2024-11-09 11:25:25,009][01612] Num frames 1200...
[2024-11-09 11:25:25,128][01612] Num frames 1300...
[2024-11-09 11:25:25,252][01612] Num frames 1400...
[2024-11-09 11:25:25,382][01612] Num frames 1500...
[2024-11-09 11:25:25,484][01612] Avg episode rewards: #0: 16.680, true rewards: #0: 7.680
[2024-11-09 11:25:25,486][01612] Avg episode reward: 16.680, avg true_objective: 7.680
[2024-11-09 11:25:25,575][01612] Num frames 1600...
[2024-11-09 11:25:25,762][01612] Num frames 1700...
[2024-11-09 11:25:25,953][01612] Num frames 1800...
[2024-11-09 11:25:26,165][01612] Avg episode rewards: #0: 12.640, true rewards: #0: 6.307
[2024-11-09 11:25:26,167][01612] Avg episode reward: 12.640, avg true_objective: 6.307
[2024-11-09 11:25:26,184][01612] Num frames 1900...
[2024-11-09 11:25:26,304][01612] Num frames 2000...
[2024-11-09 11:25:26,430][01612] Num frames 2100...
[2024-11-09 11:25:26,553][01612] Num frames 2200...
[2024-11-09 11:25:26,690][01612] Num frames 2300...
[2024-11-09 11:25:26,814][01612] Num frames 2400...
[2024-11-09 11:25:26,937][01612] Num frames 2500...
[2024-11-09 11:25:27,003][01612] Avg episode rewards: #0: 12.268, true rewards: #0: 6.267
[2024-11-09 11:25:27,004][01612] Avg episode reward: 12.268, avg true_objective: 6.267
[2024-11-09 11:25:27,117][01612] Num frames 2600...
[2024-11-09 11:25:27,238][01612] Num frames 2700...
[2024-11-09 11:25:27,363][01612] Num frames 2800...
[2024-11-09 11:25:27,494][01612] Num frames 2900...
[2024-11-09 11:25:27,686][01612] Num frames 3000...
[2024-11-09 11:25:27,866][01612] Num frames 3100...
[2024-11-09 11:25:28,031][01612] Num frames 3200...
[2024-11-09 11:25:28,164][01612] Avg episode rewards: #0: 12.486, true rewards: #0: 6.486
[2024-11-09 11:25:28,169][01612] Avg episode reward: 12.486, avg true_objective: 6.486
[2024-11-09 11:25:28,271][01612] Num frames 3300...
[2024-11-09 11:25:28,441][01612] Num frames 3400...
[2024-11-09 11:25:28,616][01612] Num frames 3500...
[2024-11-09 11:25:28,826][01612] Num frames 3600...
[2024-11-09 11:25:29,000][01612] Num frames 3700...
[2024-11-09 11:25:29,166][01612] Num frames 3800...
[2024-11-09 11:25:29,319][01612] Avg episode rewards: #0: 12.752, true rewards: #0: 6.418
[2024-11-09 11:25:29,322][01612] Avg episode reward: 12.752, avg true_objective: 6.418
[2024-11-09 11:25:29,414][01612] Num frames 3900...
[2024-11-09 11:25:29,591][01612] Num frames 4000...
[2024-11-09 11:25:29,786][01612] Num frames 4100...
[2024-11-09 11:25:29,997][01612] Num frames 4200...
[2024-11-09 11:25:30,156][01612] Num frames 4300...
[2024-11-09 11:25:30,280][01612] Num frames 4400...
[2024-11-09 11:25:30,406][01612] Num frames 4500...
[2024-11-09 11:25:30,533][01612] Num frames 4600...
[2024-11-09 11:25:30,660][01612] Num frames 4700...
[2024-11-09 11:25:30,792][01612] Num frames 4800...
[2024-11-09 11:25:30,928][01612] Num frames 4900...
[2024-11-09 11:25:31,047][01612] Num frames 5000...
[2024-11-09 11:25:31,168][01612] Num frames 5100...
[2024-11-09 11:25:31,288][01612] Num frames 5200...
[2024-11-09 11:25:31,410][01612] Num frames 5300...
[2024-11-09 11:25:31,495][01612] Avg episode rewards: #0: 17.319, true rewards: #0: 7.604
[2024-11-09 11:25:31,497][01612] Avg episode reward: 17.319, avg true_objective: 7.604
[2024-11-09 11:25:31,593][01612] Num frames 5400...
[2024-11-09 11:25:31,720][01612] Num frames 5500...
[2024-11-09 11:25:31,849][01612] Num frames 5600...
[2024-11-09 11:25:31,977][01612] Num frames 5700...
[2024-11-09 11:25:32,103][01612] Num frames 5800...
[2024-11-09 11:25:32,226][01612] Num frames 5900...
[2024-11-09 11:25:32,352][01612] Num frames 6000...
[2024-11-09 11:25:32,479][01612] Num frames 6100...
[2024-11-09 11:25:32,601][01612] Num frames 6200...
[2024-11-09 11:25:32,729][01612] Num frames 6300...
[2024-11-09 11:25:32,859][01612] Num frames 6400...
[2024-11-09 11:25:32,993][01612] Num frames 6500...
[2024-11-09 11:25:33,117][01612] Num frames 6600...
[2024-11-09 11:25:33,244][01612] Num frames 6700...
[2024-11-09 11:25:33,372][01612] Num frames 6800...
[2024-11-09 11:25:33,494][01612] Num frames 6900...
[2024-11-09 11:25:33,629][01612] Num frames 7000...
[2024-11-09 11:25:33,763][01612] Num frames 7100...
[2024-11-09 11:25:33,889][01612] Num frames 7200...
[2024-11-09 11:25:34,023][01612] Num frames 7300...
[2024-11-09 11:25:34,146][01612] Num frames 7400...
[2024-11-09 11:25:34,230][01612] Avg episode rewards: #0: 22.404, true rewards: #0: 9.279
[2024-11-09 11:25:34,232][01612] Avg episode reward: 22.404, avg true_objective: 9.279
[2024-11-09 11:25:34,332][01612] Num frames 7500...
[2024-11-09 11:25:34,457][01612] Num frames 7600...
[2024-11-09 11:25:34,583][01612] Num frames 7700...
[2024-11-09 11:25:34,715][01612] Num frames 7800...
[2024-11-09 11:25:34,839][01612] Num frames 7900...
[2024-11-09 11:25:34,971][01612] Num frames 8000...
[2024-11-09 11:25:35,093][01612] Num frames 8100...
[2024-11-09 11:25:35,217][01612] Num frames 8200...
[2024-11-09 11:25:35,338][01612] Num frames 8300...
[2024-11-09 11:25:35,465][01612] Num frames 8400...
[2024-11-09 11:25:35,590][01612] Num frames 8500...
[2024-11-09 11:25:35,720][01612] Num frames 8600...
[2024-11-09 11:25:35,855][01612] Num frames 8700...
[2024-11-09 11:25:35,996][01612] Num frames 8800...
[2024-11-09 11:25:36,090][01612] Avg episode rewards: #0: 23.478, true rewards: #0: 9.811
[2024-11-09 11:25:36,091][01612] Avg episode reward: 23.478, avg true_objective: 9.811
[2024-11-09 11:25:36,179][01612] Num frames 8900...
[2024-11-09 11:25:36,300][01612] Num frames 9000...
[2024-11-09 11:25:36,430][01612] Num frames 9100...
[2024-11-09 11:25:36,557][01612] Num frames 9200...
[2024-11-09 11:25:36,692][01612] Num frames 9300...
[2024-11-09 11:25:36,816][01612] Num frames 9400...
[2024-11-09 11:25:36,943][01612] Num frames 9500...
[2024-11-09 11:25:37,078][01612] Num frames 9600...
[2024-11-09 11:25:37,202][01612] Num frames 9700...
[2024-11-09 11:25:37,325][01612] Num frames 9800...
[2024-11-09 11:25:37,453][01612] Num frames 9900...
[2024-11-09 11:25:37,585][01612] Num frames 10000...
[2024-11-09 11:25:37,721][01612] Num frames 10100...
[2024-11-09 11:25:37,846][01612] Num frames 10200...
[2024-11-09 11:25:37,971][01612] Num frames 10300...
[2024-11-09 11:25:38,101][01612] Num frames 10400...
[2024-11-09 11:25:38,226][01612] Num frames 10500...
[2024-11-09 11:25:38,356][01612] Num frames 10600...
[2024-11-09 11:25:38,483][01612] Num frames 10700...
[2024-11-09 11:25:38,618][01612] Num frames 10800...
[2024-11-09 11:25:38,751][01612] Num frames 10900...
[2024-11-09 11:25:38,844][01612] Avg episode rewards: #0: 26.130, true rewards: #0: 10.930
[2024-11-09 11:25:38,846][01612] Avg episode reward: 26.130, avg true_objective: 10.930
[2024-11-09 11:26:49,197][01612] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-11-09 11:27:19,230][01612] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-11-09 11:27:19,232][01612] Overriding arg 'num_workers' with value 1 passed from command line
[2024-11-09 11:27:19,234][01612] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-11-09 11:27:19,236][01612] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-11-09 11:27:19,237][01612] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-11-09 11:27:19,239][01612] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-11-09 11:27:19,240][01612] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-11-09 11:27:19,242][01612] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-11-09 11:27:19,243][01612] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-11-09 11:27:19,244][01612] Adding new argument 'hf_repository'='bkuen/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-11-09 11:27:19,245][01612] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-11-09 11:27:19,246][01612] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-11-09 11:27:19,249][01612] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-11-09 11:27:19,250][01612] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-11-09 11:27:19,251][01612] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-11-09 11:27:19,284][01612] RunningMeanStd input shape: (3, 72, 128)
[2024-11-09 11:27:19,287][01612] RunningMeanStd input shape: (1,)
[2024-11-09 11:27:19,300][01612] ConvEncoder: input_channels=3
[2024-11-09 11:27:19,340][01612] Conv encoder output size: 512
[2024-11-09 11:27:19,341][01612] Policy head output size: 512
[2024-11-09 11:27:19,362][01612] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-11-09 11:27:19,839][01612] Num frames 100...
[2024-11-09 11:27:19,961][01612] Num frames 200...
[2024-11-09 11:27:20,083][01612] Num frames 300...
[2024-11-09 11:27:20,206][01612] Num frames 400...
[2024-11-09 11:27:20,336][01612] Num frames 500...
[2024-11-09 11:27:20,475][01612] Num frames 600...
[2024-11-09 11:27:20,608][01612] Num frames 700...
[2024-11-09 11:27:20,740][01612] Num frames 800...
[2024-11-09 11:27:20,870][01612] Num frames 900...
[2024-11-09 11:27:20,996][01612] Num frames 1000...
[2024-11-09 11:27:21,128][01612] Num frames 1100...
[2024-11-09 11:27:21,313][01612] Avg episode rewards: #0: 28.960, true rewards: #0: 11.960
[2024-11-09 11:27:21,315][01612] Avg episode reward: 28.960, avg true_objective: 11.960
[2024-11-09 11:27:21,324][01612] Num frames 1200...
[2024-11-09 11:27:21,466][01612] Num frames 1300...
[2024-11-09 11:27:21,599][01612] Num frames 1400...
[2024-11-09 11:27:21,735][01612] Num frames 1500...
[2024-11-09 11:27:21,864][01612] Num frames 1600...
[2024-11-09 11:27:21,986][01612] Num frames 1700...
[2024-11-09 11:27:22,110][01612] Num frames 1800...
[2024-11-09 11:27:22,244][01612] Num frames 1900...
[2024-11-09 11:27:22,376][01612] Avg episode rewards: #0: 24.320, true rewards: #0: 9.820
[2024-11-09 11:27:22,377][01612] Avg episode reward: 24.320, avg true_objective: 9.820
[2024-11-09 11:27:22,429][01612] Num frames 2000...
[2024-11-09 11:27:22,596][01612] Num frames 2100...
[2024-11-09 11:27:22,788][01612] Num frames 2200...
[2024-11-09 11:27:22,970][01612] Num frames 2300...
[2024-11-09 11:27:23,136][01612] Num frames 2400...
[2024-11-09 11:27:23,296][01612] Num frames 2500...
[2024-11-09 11:27:23,467][01612] Num frames 2600...
[2024-11-09 11:27:23,641][01612] Num frames 2700...
[2024-11-09 11:27:23,813][01612] Num frames 2800...
[2024-11-09 11:27:23,981][01612] Num frames 2900...
[2024-11-09 11:27:24,155][01612] Num frames 3000...
[2024-11-09 11:27:24,332][01612] Num frames 3100...
[2024-11-09 11:27:24,514][01612] Num frames 3200...
[2024-11-09 11:27:24,710][01612] Num frames 3300...
[2024-11-09 11:27:24,900][01612] Num frames 3400...
[2024-11-09 11:27:25,076][01612] Num frames 3500...
[2024-11-09 11:27:25,209][01612] Num frames 3600...
[2024-11-09 11:27:25,333][01612] Num frames 3700...
[2024-11-09 11:27:25,474][01612] Num frames 3800...
[2024-11-09 11:27:25,580][01612] Avg episode rewards: #0: 31.790, true rewards: #0: 12.790
[2024-11-09 11:27:25,581][01612] Avg episode reward: 31.790, avg true_objective: 12.790
[2024-11-09 11:27:25,674][01612] Num frames 3900...
[2024-11-09 11:27:25,805][01612] Num frames 4000...
[2024-11-09 11:27:25,946][01612] Num frames 4100...
[2024-11-09 11:27:26,075][01612] Num frames 4200...
[2024-11-09 11:27:26,197][01612] Num frames 4300...
[2024-11-09 11:27:26,318][01612] Num frames 4400...
[2024-11-09 11:27:26,440][01612] Num frames 4500...
[2024-11-09 11:27:26,565][01612] Num frames 4600...
[2024-11-09 11:27:26,708][01612] Avg episode rewards: #0: 28.922, true rewards: #0: 11.672
[2024-11-09 11:27:26,710][01612] Avg episode reward: 28.922, avg true_objective: 11.672
[2024-11-09 11:27:26,749][01612] Num frames 4700...
[2024-11-09 11:27:26,875][01612] Num frames 4800...
[2024-11-09 11:27:27,021][01612] Num frames 4900...
[2024-11-09 11:27:27,154][01612] Num frames 5000...
[2024-11-09 11:27:27,284][01612] Num frames 5100...
[2024-11-09 11:27:27,409][01612] Num frames 5200...
[2024-11-09 11:27:27,534][01612] Num frames 5300...
[2024-11-09 11:27:27,660][01612] Num frames 5400...
[2024-11-09 11:27:27,796][01612] Num frames 5500...
[2024-11-09 11:27:27,919][01612] Num frames 5600...
[2024-11-09 11:27:27,984][01612] Avg episode rewards: #0: 27.614, true rewards: #0: 11.214
[2024-11-09 11:27:27,985][01612] Avg episode reward: 27.614, avg true_objective: 11.214
[2024-11-09 11:27:28,095][01612] Num frames 5700...
[2024-11-09 11:27:28,221][01612] Num frames 5800...
[2024-11-09 11:27:28,346][01612] Num frames 5900...
[2024-11-09 11:27:28,483][01612] Num frames 6000...
[2024-11-09 11:27:28,610][01612] Num frames 6100...
[2024-11-09 11:27:28,749][01612] Num frames 6200...
[2024-11-09 11:27:28,881][01612] Num frames 6300...
[2024-11-09 11:27:29,004][01612] Num frames 6400...
[2024-11-09 11:27:29,159][01612] Avg episode rewards: #0: 26.137, true rewards: #0: 10.803
[2024-11-09 11:27:29,161][01612] Avg episode reward: 26.137, avg true_objective: 10.803
[2024-11-09 11:27:29,188][01612] Num frames 6500...
[2024-11-09 11:27:29,317][01612] Num frames 6600...
[2024-11-09 11:27:29,452][01612] Num frames 6700...
[2024-11-09 11:27:29,574][01612] Num frames 6800...
[2024-11-09 11:27:29,709][01612] Num frames 6900...
[2024-11-09 11:27:29,845][01612] Num frames 7000...
[2024-11-09 11:27:29,970][01612] Num frames 7100...
[2024-11-09 11:27:30,091][01612] Num frames 7200...
[2024-11-09 11:27:30,215][01612] Num frames 7300...
[2024-11-09 11:27:30,345][01612] Num frames 7400...
[2024-11-09 11:27:30,416][01612] Avg episode rewards: #0: 25.871, true rewards: #0: 10.586
[2024-11-09 11:27:30,418][01612] Avg episode reward: 25.871, avg true_objective: 10.586
[2024-11-09 11:27:30,531][01612] Num frames 7500...
[2024-11-09 11:27:30,656][01612] Num frames 7600...
[2024-11-09 11:27:30,789][01612] Num frames 7700...
[2024-11-09 11:27:30,923][01612] Num frames 7800...
[2024-11-09 11:27:31,050][01612] Num frames 7900...
[2024-11-09 11:27:31,208][01612] Avg episode rewards: #0: 24.233, true rewards: #0: 9.982
[2024-11-09 11:27:31,210][01612] Avg episode reward: 24.233, avg true_objective: 9.982
[2024-11-09 11:27:31,229][01612] Num frames 8000...
[2024-11-09 11:27:31,355][01612] Num frames 8100...
[2024-11-09 11:27:31,488][01612] Num frames 8200...
[2024-11-09 11:27:31,615][01612] Num frames 8300...
[2024-11-09 11:27:31,762][01612] Avg episode rewards: #0: 21.967, true rewards: #0: 9.300
[2024-11-09 11:27:31,764][01612] Avg episode reward: 21.967, avg true_objective: 9.300
[2024-11-09 11:27:31,801][01612] Num frames 8400...
[2024-11-09 11:27:31,933][01612] Num frames 8500...
[2024-11-09 11:27:32,055][01612] Num frames 8600...
[2024-11-09 11:27:32,175][01612] Num frames 8700...
[2024-11-09 11:27:32,295][01612] Num frames 8800...
[2024-11-09 11:27:32,427][01612] Num frames 8900...
[2024-11-09 11:27:32,556][01612] Num frames 9000...
[2024-11-09 11:27:32,691][01612] Num frames 9100...
[2024-11-09 11:27:32,835][01612] Avg episode rewards: #0: 21.470, true rewards: #0: 9.170
[2024-11-09 11:27:32,837][01612] Avg episode reward: 21.470, avg true_objective: 9.170
[2024-11-09 11:28:32,167][01612] Replay video saved to /content/train_dir/default_experiment/replay.mp4!