diff --git "a/sf_log.txt" "b/sf_log.txt" --- "a/sf_log.txt" +++ "b/sf_log.txt" @@ -1,50 +1,50 @@ -[2023-08-30 09:42:07,326][00929] Saving configuration to /content/train_dir/default_experiment/config.json... -[2023-08-30 09:42:07,328][00929] Rollout worker 0 uses device cpu -[2023-08-30 09:42:07,331][00929] Rollout worker 1 uses device cpu -[2023-08-30 09:42:07,333][00929] Rollout worker 2 uses device cpu -[2023-08-30 09:42:07,334][00929] Rollout worker 3 uses device cpu -[2023-08-30 09:42:07,335][00929] Rollout worker 4 uses device cpu -[2023-08-30 09:42:07,336][00929] Rollout worker 5 uses device cpu -[2023-08-30 09:42:07,338][00929] Rollout worker 6 uses device cpu -[2023-08-30 09:42:07,342][00929] Rollout worker 7 uses device cpu -[2023-08-30 09:42:07,518][00929] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2023-08-30 09:42:07,523][00929] InferenceWorker_p0-w0: min num requests: 2 -[2023-08-30 09:42:07,572][00929] Starting all processes... -[2023-08-30 09:42:07,576][00929] Starting process learner_proc0 -[2023-08-30 09:42:07,649][00929] Starting all processes... 
-[2023-08-30 09:42:07,658][00929] Starting process inference_proc0-0 -[2023-08-30 09:42:07,659][00929] Starting process rollout_proc0 -[2023-08-30 09:42:07,659][00929] Starting process rollout_proc1 -[2023-08-30 09:42:07,659][00929] Starting process rollout_proc2 -[2023-08-30 09:42:07,659][00929] Starting process rollout_proc3 -[2023-08-30 09:42:07,659][00929] Starting process rollout_proc4 -[2023-08-30 09:42:07,659][00929] Starting process rollout_proc5 -[2023-08-30 09:42:07,659][00929] Starting process rollout_proc6 -[2023-08-30 09:42:07,659][00929] Starting process rollout_proc7 -[2023-08-30 09:42:25,337][08367] Worker 5 uses CPU cores [1] -[2023-08-30 09:42:25,537][08365] Worker 3 uses CPU cores [1] -[2023-08-30 09:42:25,623][08348] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2023-08-30 09:42:25,624][08348] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 -[2023-08-30 09:42:25,691][08369] Worker 7 uses CPU cores [1] -[2023-08-30 09:42:25,720][08348] Num visible devices: 1 -[2023-08-30 09:42:25,755][08348] Starting seed is not provided -[2023-08-30 09:42:25,755][08348] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2023-08-30 09:42:25,755][08348] Initializing actor-critic model on device cuda:0 -[2023-08-30 09:42:25,756][08348] RunningMeanStd input shape: (3, 72, 128) -[2023-08-30 09:42:25,758][08348] RunningMeanStd input shape: (1,) -[2023-08-30 09:42:25,772][08366] Worker 4 uses CPU cores [0] -[2023-08-30 09:42:25,869][08364] Worker 2 uses CPU cores [0] -[2023-08-30 09:42:25,888][08348] ConvEncoder: input_channels=3 -[2023-08-30 09:42:25,944][08363] Worker 1 uses CPU cores [1] -[2023-08-30 09:42:25,984][08361] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2023-08-30 09:42:25,984][08361] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 -[2023-08-30 09:42:26,039][08361] Num visible devices: 1 -[2023-08-30 09:42:26,213][08368] Worker 6 uses CPU 
cores [0] -[2023-08-30 09:42:26,248][08362] Worker 0 uses CPU cores [0] -[2023-08-30 09:42:26,497][08348] Conv encoder output size: 512 -[2023-08-30 09:42:26,498][08348] Policy head output size: 512 -[2023-08-30 09:42:26,564][08348] Created Actor Critic model with architecture: -[2023-08-30 09:42:26,564][08348] ActorCriticSharedWeights( +[2023-08-31 04:32:41,787][00354] Saving configuration to /content/train_dir/default_experiment/config.json... +[2023-08-31 04:32:41,792][00354] Rollout worker 0 uses device cpu +[2023-08-31 04:32:41,794][00354] Rollout worker 1 uses device cpu +[2023-08-31 04:32:41,795][00354] Rollout worker 2 uses device cpu +[2023-08-31 04:32:41,796][00354] Rollout worker 3 uses device cpu +[2023-08-31 04:32:41,799][00354] Rollout worker 4 uses device cpu +[2023-08-31 04:32:41,801][00354] Rollout worker 5 uses device cpu +[2023-08-31 04:32:41,802][00354] Rollout worker 6 uses device cpu +[2023-08-31 04:32:41,804][00354] Rollout worker 7 uses device cpu +[2023-08-31 04:32:41,988][00354] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2023-08-31 04:32:41,993][00354] InferenceWorker_p0-w0: min num requests: 2 +[2023-08-31 04:32:42,037][00354] Starting all processes... +[2023-08-31 04:32:42,042][00354] Starting process learner_proc0 +[2023-08-31 04:32:42,119][00354] Starting all processes... 
+[2023-08-31 04:32:42,133][00354] Starting process inference_proc0-0 +[2023-08-31 04:32:42,138][00354] Starting process rollout_proc0 +[2023-08-31 04:32:42,139][00354] Starting process rollout_proc1 +[2023-08-31 04:32:42,139][00354] Starting process rollout_proc2 +[2023-08-31 04:32:42,139][00354] Starting process rollout_proc3 +[2023-08-31 04:32:42,139][00354] Starting process rollout_proc4 +[2023-08-31 04:32:42,139][00354] Starting process rollout_proc5 +[2023-08-31 04:32:42,139][00354] Starting process rollout_proc6 +[2023-08-31 04:32:42,139][00354] Starting process rollout_proc7 +[2023-08-31 04:32:58,851][07796] Worker 4 uses CPU cores [0] +[2023-08-31 04:32:59,222][07777] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2023-08-31 04:32:59,223][07777] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2023-08-31 04:32:59,289][07794] Worker 3 uses CPU cores [1] +[2023-08-31 04:32:59,304][07777] Num visible devices: 1 +[2023-08-31 04:32:59,337][07777] Starting seed is not provided +[2023-08-31 04:32:59,337][07777] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2023-08-31 04:32:59,338][07777] Initializing actor-critic model on device cuda:0 +[2023-08-31 04:32:59,339][07777] RunningMeanStd input shape: (3, 72, 128) +[2023-08-31 04:32:59,341][07777] RunningMeanStd input shape: (1,) +[2023-08-31 04:32:59,430][07777] ConvEncoder: input_channels=3 +[2023-08-31 04:32:59,650][07795] Worker 5 uses CPU cores [1] +[2023-08-31 04:32:59,655][07792] Worker 1 uses CPU cores [1] +[2023-08-31 04:32:59,671][07797] Worker 6 uses CPU cores [0] +[2023-08-31 04:32:59,768][07790] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2023-08-31 04:32:59,770][07790] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2023-08-31 04:32:59,830][07790] Num visible devices: 1 +[2023-08-31 04:32:59,947][07791] Worker 0 uses CPU cores [0] +[2023-08-31 04:32:59,998][07798] Worker 7 uses CPU 
cores [1] +[2023-08-31 04:33:00,043][07793] Worker 2 uses CPU cores [0] +[2023-08-31 04:33:00,171][07777] Conv encoder output size: 512 +[2023-08-31 04:33:00,171][07777] Policy head output size: 512 +[2023-08-31 04:33:00,232][07777] Created Actor Critic model with architecture: +[2023-08-31 04:33:00,233][07777] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( @@ -85,1436 +85,1539 @@ (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) -[2023-08-30 09:42:27,508][00929] Heartbeat connected on Batcher_0 -[2023-08-30 09:42:27,519][00929] Heartbeat connected on InferenceWorker_p0-w0 -[2023-08-30 09:42:27,530][00929] Heartbeat connected on RolloutWorker_w0 -[2023-08-30 09:42:27,537][00929] Heartbeat connected on RolloutWorker_w1 -[2023-08-30 09:42:27,544][00929] Heartbeat connected on RolloutWorker_w2 -[2023-08-30 09:42:27,549][00929] Heartbeat connected on RolloutWorker_w3 -[2023-08-30 09:42:27,555][00929] Heartbeat connected on RolloutWorker_w4 -[2023-08-30 09:42:27,560][00929] Heartbeat connected on RolloutWorker_w5 -[2023-08-30 09:42:27,568][00929] Heartbeat connected on RolloutWorker_w6 -[2023-08-30 09:42:27,571][00929] Heartbeat connected on RolloutWorker_w7 -[2023-08-30 09:42:35,028][08348] Using optimizer -[2023-08-30 09:42:35,029][08348] No checkpoints found -[2023-08-30 09:42:35,029][08348] Did not load from checkpoint, starting from scratch! -[2023-08-30 09:42:35,029][08348] Initialized policy 0 weights for model version 0 -[2023-08-30 09:42:35,032][08348] LearnerWorker_p0 finished initialization! 
-[2023-08-30 09:42:35,033][08348] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2023-08-30 09:42:35,033][00929] Heartbeat connected on LearnerWorker_p0 -[2023-08-30 09:42:35,129][08361] RunningMeanStd input shape: (3, 72, 128) -[2023-08-30 09:42:35,130][08361] RunningMeanStd input shape: (1,) -[2023-08-30 09:42:35,142][08361] ConvEncoder: input_channels=3 -[2023-08-30 09:42:35,242][08361] Conv encoder output size: 512 -[2023-08-30 09:42:35,242][08361] Policy head output size: 512 -[2023-08-30 09:42:35,356][00929] Inference worker 0-0 is ready! -[2023-08-30 09:42:35,358][00929] All inference workers are ready! Signal rollout workers to start! -[2023-08-30 09:42:35,723][08363] Doom resolution: 160x120, resize resolution: (128, 72) -[2023-08-30 09:42:35,727][08367] Doom resolution: 160x120, resize resolution: (128, 72) -[2023-08-30 09:42:35,729][08365] Doom resolution: 160x120, resize resolution: (128, 72) -[2023-08-30 09:42:35,726][08369] Doom resolution: 160x120, resize resolution: (128, 72) -[2023-08-30 09:42:35,773][08366] Doom resolution: 160x120, resize resolution: (128, 72) -[2023-08-30 09:42:35,779][08362] Doom resolution: 160x120, resize resolution: (128, 72) -[2023-08-30 09:42:35,775][08364] Doom resolution: 160x120, resize resolution: (128, 72) -[2023-08-30 09:42:35,777][08368] Doom resolution: 160x120, resize resolution: (128, 72) -[2023-08-30 09:42:36,775][00929] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2023-08-30 09:42:36,781][08362] Decorrelating experience for 0 frames... -[2023-08-30 09:42:36,781][08366] Decorrelating experience for 0 frames... -[2023-08-30 09:42:37,120][08365] Decorrelating experience for 0 frames... -[2023-08-30 09:42:37,125][08363] Decorrelating experience for 0 frames... -[2023-08-30 09:42:37,128][08369] Decorrelating experience for 0 frames... 
-[2023-08-30 09:42:37,958][08362] Decorrelating experience for 32 frames... -[2023-08-30 09:42:37,956][08366] Decorrelating experience for 32 frames... -[2023-08-30 09:42:38,318][08365] Decorrelating experience for 32 frames... -[2023-08-30 09:42:38,321][08369] Decorrelating experience for 32 frames... -[2023-08-30 09:42:38,326][08363] Decorrelating experience for 32 frames... -[2023-08-30 09:42:38,855][08368] Decorrelating experience for 0 frames... -[2023-08-30 09:42:38,861][08364] Decorrelating experience for 0 frames... -[2023-08-30 09:42:39,951][08367] Decorrelating experience for 0 frames... -[2023-08-30 09:42:39,984][08369] Decorrelating experience for 64 frames... -[2023-08-30 09:42:40,501][08362] Decorrelating experience for 64 frames... -[2023-08-30 09:42:40,515][08366] Decorrelating experience for 64 frames... -[2023-08-30 09:42:40,865][08364] Decorrelating experience for 32 frames... -[2023-08-30 09:42:40,874][08368] Decorrelating experience for 32 frames... -[2023-08-30 09:42:41,746][08367] Decorrelating experience for 32 frames... -[2023-08-30 09:42:41,775][00929] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2023-08-30 09:42:41,980][08365] Decorrelating experience for 64 frames... -[2023-08-30 09:42:42,294][08362] Decorrelating experience for 96 frames... -[2023-08-30 09:42:42,408][08363] Decorrelating experience for 64 frames... -[2023-08-30 09:42:43,235][08368] Decorrelating experience for 64 frames... -[2023-08-30 09:42:44,299][08369] Decorrelating experience for 96 frames... -[2023-08-30 09:42:44,318][08364] Decorrelating experience for 64 frames... -[2023-08-30 09:42:44,462][08365] Decorrelating experience for 96 frames... -[2023-08-30 09:42:45,198][08363] Decorrelating experience for 96 frames... -[2023-08-30 09:42:46,471][08366] Decorrelating experience for 96 frames... 
-[2023-08-30 09:42:46,775][00929] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 5.8. Samples: 58. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2023-08-30 09:42:46,780][00929] Avg episode reward: [(0, '1.608')] -[2023-08-30 09:42:47,577][08367] Decorrelating experience for 64 frames... -[2023-08-30 09:42:48,552][08364] Decorrelating experience for 96 frames... -[2023-08-30 09:42:48,916][08368] Decorrelating experience for 96 frames... -[2023-08-30 09:42:51,087][08348] Signal inference workers to stop experience collection... -[2023-08-30 09:42:51,099][08361] InferenceWorker_p0-w0: stopping experience collection -[2023-08-30 09:42:51,284][08367] Decorrelating experience for 96 frames... -[2023-08-30 09:42:51,775][00929] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 148.3. Samples: 2224. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2023-08-30 09:42:51,777][00929] Avg episode reward: [(0, '2.857')] -[2023-08-30 09:42:54,842][08348] Signal inference workers to resume experience collection... -[2023-08-30 09:42:54,843][08361] InferenceWorker_p0-w0: resuming experience collection -[2023-08-30 09:42:56,775][00929] Fps is (10 sec: 409.6, 60 sec: 204.8, 300 sec: 204.8). Total num frames: 4096. Throughput: 0: 178.2. Samples: 3564. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) -[2023-08-30 09:42:56,781][00929] Avg episode reward: [(0, '2.972')] -[2023-08-30 09:43:01,775][00929] Fps is (10 sec: 2457.6, 60 sec: 983.0, 300 sec: 983.0). Total num frames: 24576. Throughput: 0: 213.4. Samples: 5334. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) -[2023-08-30 09:43:01,787][00929] Avg episode reward: [(0, '3.534')] -[2023-08-30 09:43:06,775][00929] Fps is (10 sec: 3276.8, 60 sec: 1228.8, 300 sec: 1228.8). Total num frames: 36864. Throughput: 0: 299.1. Samples: 8974. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-08-30 09:43:06,780][00929] Avg episode reward: [(0, '3.730')] -[2023-08-30 09:43:08,284][08361] Updated weights for policy 0, policy_version 10 (0.0020) -[2023-08-30 09:43:11,775][00929] Fps is (10 sec: 2457.6, 60 sec: 1404.3, 300 sec: 1404.3). Total num frames: 49152. Throughput: 0: 374.5. Samples: 13108. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-08-30 09:43:11,777][00929] Avg episode reward: [(0, '4.289')] -[2023-08-30 09:43:16,775][00929] Fps is (10 sec: 3276.8, 60 sec: 1740.8, 300 sec: 1740.8). Total num frames: 69632. Throughput: 0: 400.2. Samples: 16010. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) -[2023-08-30 09:43:16,777][00929] Avg episode reward: [(0, '4.534')] -[2023-08-30 09:43:19,524][08361] Updated weights for policy 0, policy_version 20 (0.0015) -[2023-08-30 09:43:21,775][00929] Fps is (10 sec: 3686.4, 60 sec: 1911.5, 300 sec: 1911.5). Total num frames: 86016. Throughput: 0: 486.0. Samples: 21870. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-08-30 09:43:21,777][00929] Avg episode reward: [(0, '4.473')] -[2023-08-30 09:43:26,783][00929] Fps is (10 sec: 2865.0, 60 sec: 1965.8, 300 sec: 1965.8). Total num frames: 98304. Throughput: 0: 566.1. Samples: 25480. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-08-30 09:43:26,791][00929] Avg episode reward: [(0, '4.590')] -[2023-08-30 09:43:31,777][00929] Fps is (10 sec: 2047.6, 60 sec: 1936.2, 300 sec: 1936.2). Total num frames: 106496. Throughput: 0: 583.4. Samples: 26314. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) -[2023-08-30 09:43:31,783][00929] Avg episode reward: [(0, '4.575')] -[2023-08-30 09:43:31,796][08348] Saving new best policy, reward=4.575! -[2023-08-30 09:43:35,778][08361] Updated weights for policy 0, policy_version 30 (0.0022) -[2023-08-30 09:43:36,775][00929] Fps is (10 sec: 2459.5, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 122880. Throughput: 0: 632.9. Samples: 30706. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-08-30 09:43:36,777][00929] Avg episode reward: [(0, '4.546')] -[2023-08-30 09:43:41,775][00929] Fps is (10 sec: 3687.1, 60 sec: 2389.4, 300 sec: 2205.5). Total num frames: 143360. Throughput: 0: 732.1. Samples: 36510. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) -[2023-08-30 09:43:41,779][00929] Avg episode reward: [(0, '4.370')] -[2023-08-30 09:43:46,775][00929] Fps is (10 sec: 3276.8, 60 sec: 2594.1, 300 sec: 2223.5). Total num frames: 155648. Throughput: 0: 734.7. Samples: 38396. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-08-30 09:43:46,780][00929] Avg episode reward: [(0, '4.461')] -[2023-08-30 09:43:49,380][08361] Updated weights for policy 0, policy_version 40 (0.0035) -[2023-08-30 09:43:51,775][00929] Fps is (10 sec: 2457.5, 60 sec: 2798.9, 300 sec: 2239.1). Total num frames: 167936. Throughput: 0: 736.3. Samples: 42108. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-08-30 09:43:51,778][00929] Avg episode reward: [(0, '4.437')] -[2023-08-30 09:43:56,775][00929] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 2355.2). Total num frames: 188416. Throughput: 0: 755.6. Samples: 47108. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-08-30 09:43:56,777][00929] Avg episode reward: [(0, '4.320')] -[2023-08-30 09:44:01,067][08361] Updated weights for policy 0, policy_version 50 (0.0021) -[2023-08-30 09:44:01,775][00929] Fps is (10 sec: 3686.5, 60 sec: 3003.7, 300 sec: 2409.4). Total num frames: 204800. Throughput: 0: 756.1. Samples: 50034. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-08-30 09:44:01,777][00929] Avg episode reward: [(0, '4.354')] -[2023-08-30 09:44:01,788][08348] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000050_204800.pth... -[2023-08-30 09:44:06,776][00929] Fps is (10 sec: 2867.0, 60 sec: 3003.7, 300 sec: 2412.1). Total num frames: 217088. Throughput: 0: 734.5. Samples: 54922. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-08-30 09:44:06,779][00929] Avg episode reward: [(0, '4.413')] -[2023-08-30 09:44:11,775][00929] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 2414.5). Total num frames: 229376. Throughput: 0: 737.9. Samples: 58680. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-08-30 09:44:11,782][00929] Avg episode reward: [(0, '4.545')] -[2023-08-30 09:44:16,118][08361] Updated weights for policy 0, policy_version 60 (0.0014) -[2023-08-30 09:44:16,775][00929] Fps is (10 sec: 2867.4, 60 sec: 2935.5, 300 sec: 2457.6). Total num frames: 245760. Throughput: 0: 760.4. Samples: 60530. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-08-30 09:44:16,778][00929] Avg episode reward: [(0, '4.568')] -[2023-08-30 09:44:21,775][00929] Fps is (10 sec: 3686.4, 60 sec: 3003.7, 300 sec: 2535.6). Total num frames: 266240. Throughput: 0: 788.8. Samples: 66200. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-08-30 09:44:21,785][00929] Avg episode reward: [(0, '4.490')] -[2023-08-30 09:44:26,775][00929] Fps is (10 sec: 3686.5, 60 sec: 3072.4, 300 sec: 2569.3). Total num frames: 282624. Throughput: 0: 774.2. Samples: 71350. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2023-08-30 09:44:26,781][00929] Avg episode reward: [(0, '4.379')] -[2023-08-30 09:44:27,559][08361] Updated weights for policy 0, policy_version 70 (0.0022) -[2023-08-30 09:44:31,775][00929] Fps is (10 sec: 2867.2, 60 sec: 3140.4, 300 sec: 2564.5). Total num frames: 294912. Throughput: 0: 772.9. Samples: 73176. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2023-08-30 09:44:31,779][00929] Avg episode reward: [(0, '4.289')] -[2023-08-30 09:44:36,777][00929] Fps is (10 sec: 2457.1, 60 sec: 3071.9, 300 sec: 2560.0). Total num frames: 307200. Throughput: 0: 771.7. Samples: 76836. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2023-08-30 09:44:36,779][00929] Avg episode reward: [(0, '4.273')] -[2023-08-30 09:44:41,454][08361] Updated weights for policy 0, policy_version 80 (0.0039) -[2023-08-30 09:44:41,775][00929] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 2621.4). Total num frames: 327680. Throughput: 0: 787.3. Samples: 82538. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-08-30 09:44:41,777][00929] Avg episode reward: [(0, '4.324')] -[2023-08-30 09:44:46,775][00929] Fps is (10 sec: 3687.1, 60 sec: 3140.3, 300 sec: 2646.6). Total num frames: 344064. Throughput: 0: 787.5. Samples: 85472. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-08-30 09:44:46,777][00929] Avg episode reward: [(0, '4.474')] -[2023-08-30 09:44:51,775][00929] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 2670.0). Total num frames: 360448. Throughput: 0: 772.6. Samples: 89690. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2023-08-30 09:44:51,782][00929] Avg episode reward: [(0, '4.308')] -[2023-08-30 09:44:55,002][08361] Updated weights for policy 0, policy_version 90 (0.0027) -[2023-08-30 09:44:56,775][00929] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 2662.4). Total num frames: 372736. Throughput: 0: 774.0. Samples: 93510. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2023-08-30 09:44:56,779][00929] Avg episode reward: [(0, '4.381')] -[2023-08-30 09:45:01,776][00929] Fps is (10 sec: 2867.0, 60 sec: 3072.0, 300 sec: 2683.6). Total num frames: 389120. Throughput: 0: 788.4. Samples: 96010. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2023-08-30 09:45:01,780][00929] Avg episode reward: [(0, '4.394')] -[2023-08-30 09:45:06,775][00929] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 2703.4). Total num frames: 405504. Throughput: 0: 785.4. Samples: 101542. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-08-30 09:45:06,780][00929] Avg episode reward: [(0, '4.537')] -[2023-08-30 09:45:06,926][08361] Updated weights for policy 0, policy_version 100 (0.0021) -[2023-08-30 09:45:11,775][00929] Fps is (10 sec: 3277.0, 60 sec: 3208.5, 300 sec: 2721.9). Total num frames: 421888. Throughput: 0: 769.7. Samples: 105986. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-08-30 09:45:11,779][00929] Avg episode reward: [(0, '4.537')] -[2023-08-30 09:45:16,775][00929] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 2713.6). Total num frames: 434176. Throughput: 0: 771.2. Samples: 107878. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2023-08-30 09:45:16,780][00929] Avg episode reward: [(0, '4.678')] -[2023-08-30 09:45:16,782][08348] Saving new best policy, reward=4.678! -[2023-08-30 09:45:21,471][08361] Updated weights for policy 0, policy_version 110 (0.0025) -[2023-08-30 09:45:21,775][00929] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 2730.7). Total num frames: 450560. Throughput: 0: 783.6. Samples: 112096. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2023-08-30 09:45:21,777][00929] Avg episode reward: [(0, '4.647')] -[2023-08-30 09:45:26,775][00929] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 2770.8). Total num frames: 471040. Throughput: 0: 787.7. Samples: 117986. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2023-08-30 09:45:26,778][00929] Avg episode reward: [(0, '4.503')] -[2023-08-30 09:45:31,776][00929] Fps is (10 sec: 3276.5, 60 sec: 3140.2, 300 sec: 2761.9). Total num frames: 483328. Throughput: 0: 783.4. Samples: 120726. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2023-08-30 09:45:31,778][00929] Avg episode reward: [(0, '4.610')] -[2023-08-30 09:45:33,648][08361] Updated weights for policy 0, policy_version 120 (0.0014) -[2023-08-30 09:45:36,775][00929] Fps is (10 sec: 2457.6, 60 sec: 3140.4, 300 sec: 2753.4). Total num frames: 495616. Throughput: 0: 773.8. Samples: 124510. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-08-30 09:45:36,778][00929] Avg episode reward: [(0, '4.637')] -[2023-08-30 09:45:41,775][00929] Fps is (10 sec: 2867.4, 60 sec: 3072.0, 300 sec: 2767.6). Total num frames: 512000. Throughput: 0: 778.0. Samples: 128518. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-08-30 09:45:41,784][00929] Avg episode reward: [(0, '4.771')] -[2023-08-30 09:45:41,794][08348] Saving new best policy, reward=4.771! -[2023-08-30 09:45:46,704][08361] Updated weights for policy 0, policy_version 130 (0.0020) -[2023-08-30 09:45:46,775][00929] Fps is (10 sec: 3686.3, 60 sec: 3140.3, 300 sec: 2802.5). Total num frames: 532480. Throughput: 0: 786.7. Samples: 131410. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2023-08-30 09:45:46,783][00929] Avg episode reward: [(0, '4.785')] -[2023-08-30 09:45:46,786][08348] Saving new best policy, reward=4.785! -[2023-08-30 09:45:51,775][00929] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 2814.7). Total num frames: 548864. Throughput: 0: 791.5. Samples: 137158. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-08-30 09:45:51,781][00929] Avg episode reward: [(0, '4.563')] -[2023-08-30 09:45:56,775][00929] Fps is (10 sec: 2867.3, 60 sec: 3140.3, 300 sec: 2805.8). Total num frames: 561152. Throughput: 0: 774.1. Samples: 140822. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-08-30 09:45:56,779][00929] Avg episode reward: [(0, '4.692')] -[2023-08-30 09:46:01,436][08361] Updated weights for policy 0, policy_version 140 (0.0019) -[2023-08-30 09:46:01,775][00929] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 2797.3). Total num frames: 573440. Throughput: 0: 771.9. Samples: 142612. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-08-30 09:46:01,781][00929] Avg episode reward: [(0, '4.649')] -[2023-08-30 09:46:01,796][08348] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000140_573440.pth... 
-[2023-08-30 09:46:06,776][00929] Fps is (10 sec: 2457.4, 60 sec: 3003.7, 300 sec: 2789.2). Total num frames: 585728. Throughput: 0: 755.8. Samples: 146106. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) -[2023-08-30 09:46:06,784][00929] Avg episode reward: [(0, '4.704')] -[2023-08-30 09:46:11,777][00929] Fps is (10 sec: 2047.6, 60 sec: 2867.1, 300 sec: 2762.4). Total num frames: 593920. Throughput: 0: 702.8. Samples: 149612. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) -[2023-08-30 09:46:11,784][00929] Avg episode reward: [(0, '4.433')] -[2023-08-30 09:46:16,775][00929] Fps is (10 sec: 2048.2, 60 sec: 2867.2, 300 sec: 2755.5). Total num frames: 606208. Throughput: 0: 677.5. Samples: 151214. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2023-08-30 09:46:16,782][00929] Avg episode reward: [(0, '4.522')] -[2023-08-30 09:46:18,991][08361] Updated weights for policy 0, policy_version 150 (0.0018) -[2023-08-30 09:46:21,775][00929] Fps is (10 sec: 2458.1, 60 sec: 2798.9, 300 sec: 2748.9). Total num frames: 618496. Throughput: 0: 677.0. Samples: 154974. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2023-08-30 09:46:21,782][00929] Avg episode reward: [(0, '4.605')] -[2023-08-30 09:46:26,775][00929] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2760.3). Total num frames: 634880. Throughput: 0: 686.7. Samples: 159420. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-08-30 09:46:26,777][00929] Avg episode reward: [(0, '4.669')] -[2023-08-30 09:46:31,628][08361] Updated weights for policy 0, policy_version 160 (0.0039) -[2023-08-30 09:46:31,775][00929] Fps is (10 sec: 3686.5, 60 sec: 2867.2, 300 sec: 2788.8). Total num frames: 655360. Throughput: 0: 684.7. Samples: 162222. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-08-30 09:46:31,781][00929] Avg episode reward: [(0, '4.560')] -[2023-08-30 09:46:36,775][00929] Fps is (10 sec: 3686.4, 60 sec: 2935.5, 300 sec: 2798.9). Total num frames: 671744. Throughput: 0: 675.2. Samples: 167542. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-08-30 09:46:36,779][00929] Avg episode reward: [(0, '4.498')] -[2023-08-30 09:46:41,776][00929] Fps is (10 sec: 2867.0, 60 sec: 2867.2, 300 sec: 2792.0). Total num frames: 684032. Throughput: 0: 675.0. Samples: 171198. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-08-30 09:46:41,782][00929] Avg episode reward: [(0, '4.593')] -[2023-08-30 09:46:46,477][08361] Updated weights for policy 0, policy_version 170 (0.0040) -[2023-08-30 09:46:46,775][00929] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2785.3). Total num frames: 696320. Throughput: 0: 677.1. Samples: 173080. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2023-08-30 09:46:46,782][00929] Avg episode reward: [(0, '4.531')] -[2023-08-30 09:46:51,775][00929] Fps is (10 sec: 2867.4, 60 sec: 2730.7, 300 sec: 2794.9). Total num frames: 712704. Throughput: 0: 718.2. Samples: 178424. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-08-30 09:46:51,778][00929] Avg episode reward: [(0, '4.492')] -[2023-08-30 09:46:56,775][00929] Fps is (10 sec: 3686.4, 60 sec: 2867.2, 300 sec: 2819.9). Total num frames: 733184. Throughput: 0: 765.2. Samples: 184046. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-08-30 09:46:56,778][00929] Avg episode reward: [(0, '4.489')] -[2023-08-30 09:46:57,634][08361] Updated weights for policy 0, policy_version 180 (0.0014) -[2023-08-30 09:47:01,779][00929] Fps is (10 sec: 3275.4, 60 sec: 2867.0, 300 sec: 2813.1). Total num frames: 745472. Throughput: 0: 769.7. Samples: 185852. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-08-30 09:47:01,784][00929] Avg episode reward: [(0, '4.443')] -[2023-08-30 09:47:06,775][00929] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 2806.5). Total num frames: 757760. Throughput: 0: 767.6. Samples: 189516. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2023-08-30 09:47:06,777][00929] Avg episode reward: [(0, '4.673')]
-[2023-08-30 09:47:11,775][00929] Fps is (10 sec: 2868.4, 60 sec: 3003.8, 300 sec: 2815.1). Total num frames: 774144. Throughput: 0: 784.1. Samples: 194704. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-08-30 09:47:11,781][00929] Avg episode reward: [(0, '4.841')]
-[2023-08-30 09:47:11,792][08348] Saving new best policy, reward=4.841!
-[2023-08-30 09:47:12,057][08361] Updated weights for policy 0, policy_version 190 (0.0026)
-[2023-08-30 09:47:16,775][00929] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 2837.9). Total num frames: 794624. Throughput: 0: 785.0. Samples: 197546. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2023-08-30 09:47:16,783][00929] Avg episode reward: [(0, '5.025')]
-[2023-08-30 09:47:16,785][08348] Saving new best policy, reward=5.025!
-[2023-08-30 09:47:21,775][00929] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 2831.3). Total num frames: 806912. Throughput: 0: 767.6. Samples: 202082. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-08-30 09:47:21,780][00929] Avg episode reward: [(0, '5.094')]
-[2023-08-30 09:47:21,792][08348] Saving new best policy, reward=5.094!
-[2023-08-30 09:47:25,599][08361] Updated weights for policy 0, policy_version 200 (0.0013)
-[2023-08-30 09:47:26,776][00929] Fps is (10 sec: 2457.2, 60 sec: 3071.9, 300 sec: 2824.8). Total num frames: 819200. Throughput: 0: 769.0. Samples: 205802. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2023-08-30 09:47:26,783][00929] Avg episode reward: [(0, '5.117')]
-[2023-08-30 09:47:26,791][08348] Saving new best policy, reward=5.117!
-[2023-08-30 09:47:31,775][00929] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2832.5). Total num frames: 835584. Throughput: 0: 772.8. Samples: 207858. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-08-30 09:47:31,778][00929] Avg episode reward: [(0, '5.087')]
-[2023-08-30 09:47:36,775][00929] Fps is (10 sec: 3687.0, 60 sec: 3072.0, 300 sec: 2901.9). Total num frames: 856064. Throughput: 0: 784.3. Samples: 213716. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2023-08-30 09:47:36,783][00929] Avg episode reward: [(0, '5.147')]
-[2023-08-30 09:47:36,786][08348] Saving new best policy, reward=5.147!
-[2023-08-30 09:47:37,267][08361] Updated weights for policy 0, policy_version 210 (0.0025)
-[2023-08-30 09:47:41,775][00929] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 2957.5). Total num frames: 872448. Throughput: 0: 767.8. Samples: 218596. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2023-08-30 09:47:41,783][00929] Avg episode reward: [(0, '5.142')]
-[2023-08-30 09:47:46,775][00929] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 2999.1). Total num frames: 884736. Throughput: 0: 769.7. Samples: 220486. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2023-08-30 09:47:46,779][00929] Avg episode reward: [(0, '5.189')]
-[2023-08-30 09:47:46,784][08348] Saving new best policy, reward=5.189!
-[2023-08-30 09:47:51,775][00929] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 3026.9). Total num frames: 897024. Throughput: 0: 773.4. Samples: 224320. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2023-08-30 09:47:51,782][00929] Avg episode reward: [(0, '5.157')]
-[2023-08-30 09:47:51,881][08361] Updated weights for policy 0, policy_version 220 (0.0019)
-[2023-08-30 09:47:56,775][00929] Fps is (10 sec: 3276.7, 60 sec: 3072.0, 300 sec: 3026.9). Total num frames: 917504. Throughput: 0: 787.9. Samples: 230158. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2023-08-30 09:47:56,783][00929] Avg episode reward: [(0, '5.515')]
-[2023-08-30 09:47:56,785][08348] Saving new best policy, reward=5.515!
-[2023-08-30 09:48:01,775][00929] Fps is (10 sec: 3686.4, 60 sec: 3140.5, 300 sec: 3040.8). Total num frames: 933888. Throughput: 0: 788.4. Samples: 233024. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2023-08-30 09:48:01,779][00929] Avg episode reward: [(0, '5.608')]
-[2023-08-30 09:48:01,796][08348] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000228_933888.pth...
-[2023-08-30 09:48:01,933][08348] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000050_204800.pth
-[2023-08-30 09:48:01,945][08348] Saving new best policy, reward=5.608!
-[2023-08-30 09:48:04,200][08361] Updated weights for policy 0, policy_version 230 (0.0030)
-[2023-08-30 09:48:06,775][00929] Fps is (10 sec: 2867.3, 60 sec: 3140.3, 300 sec: 3040.8). Total num frames: 946176. Throughput: 0: 770.7. Samples: 236762. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2023-08-30 09:48:06,778][00929] Avg episode reward: [(0, '5.456')]
-[2023-08-30 09:48:11,775][00929] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 3013.0). Total num frames: 958464. Throughput: 0: 770.2. Samples: 240458. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2023-08-30 09:48:11,781][00929] Avg episode reward: [(0, '5.251')]
-[2023-08-30 09:48:16,775][00929] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 3026.9). Total num frames: 978944. Throughput: 0: 787.4. Samples: 243290. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2023-08-30 09:48:16,781][00929] Avg episode reward: [(0, '5.006')]
-[2023-08-30 09:48:17,355][08361] Updated weights for policy 0, policy_version 240 (0.0030)
-[2023-08-30 09:48:21,775][00929] Fps is (10 sec: 4096.1, 60 sec: 3208.5, 300 sec: 3054.7). Total num frames: 999424. Throughput: 0: 789.6. Samples: 249248. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2023-08-30 09:48:21,779][00929] Avg episode reward: [(0, '5.384')]
-[2023-08-30 09:48:26,775][00929] Fps is (10 sec: 3276.8, 60 sec: 3208.6, 300 sec: 3068.5). Total num frames: 1011712. Throughput: 0: 773.3. Samples: 253394. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2023-08-30 09:48:26,781][00929] Avg episode reward: [(0, '5.643')]
-[2023-08-30 09:48:26,783][08348] Saving new best policy, reward=5.643!
-[2023-08-30 09:48:31,523][08361] Updated weights for policy 0, policy_version 250 (0.0024)
-[2023-08-30 09:48:31,775][00929] Fps is (10 sec: 2457.6, 60 sec: 3140.3, 300 sec: 3054.6). Total num frames: 1024000. Throughput: 0: 770.7. Samples: 255168. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2023-08-30 09:48:31,782][00929] Avg episode reward: [(0, '5.751')]
-[2023-08-30 09:48:31,799][08348] Saving new best policy, reward=5.751!
-[2023-08-30 09:48:36,775][00929] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3040.8). Total num frames: 1040384. Throughput: 0: 784.7. Samples: 259632. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2023-08-30 09:48:36,777][00929] Avg episode reward: [(0, '5.916')]
-[2023-08-30 09:48:36,781][08348] Saving new best policy, reward=5.916!
-[2023-08-30 09:48:41,775][00929] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 3054.6). Total num frames: 1056768. Throughput: 0: 781.2. Samples: 265312. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-08-30 09:48:41,782][00929] Avg episode reward: [(0, '5.853')]
-[2023-08-30 09:48:43,035][08361] Updated weights for policy 0, policy_version 260 (0.0013)
-[2023-08-30 09:48:46,780][00929] Fps is (10 sec: 3275.2, 60 sec: 3140.0, 300 sec: 3068.5). Total num frames: 1073152. Throughput: 0: 769.1. Samples: 267636. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2023-08-30 09:48:46,787][00929] Avg episode reward: [(0, '6.014')]
-[2023-08-30 09:48:46,792][08348] Saving new best policy, reward=6.014!
-[2023-08-30 09:48:51,776][00929] Fps is (10 sec: 2867.0, 60 sec: 3140.2, 300 sec: 3040.8). Total num frames: 1085440. Throughput: 0: 766.1. Samples: 271236. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2023-08-30 09:48:51,780][00929] Avg episode reward: [(0, '6.219')]
-[2023-08-30 09:48:51,793][08348] Saving new best policy, reward=6.219!
-[2023-08-30 09:48:56,778][00929] Fps is (10 sec: 2458.1, 60 sec: 3003.6, 300 sec: 3026.8). Total num frames: 1097728. Throughput: 0: 778.0. Samples: 275468. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2023-08-30 09:48:56,780][00929] Avg episode reward: [(0, '7.214')]
-[2023-08-30 09:48:56,782][08348] Saving new best policy, reward=7.214!
-[2023-08-30 09:48:58,067][08361] Updated weights for policy 0, policy_version 270 (0.0031)
-[2023-08-30 09:49:01,775][00929] Fps is (10 sec: 3277.1, 60 sec: 3072.0, 300 sec: 3054.7). Total num frames: 1118208. Throughput: 0: 778.8. Samples: 278336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2023-08-30 09:49:01,782][00929] Avg episode reward: [(0, '7.577')]
-[2023-08-30 09:49:01,797][08348] Saving new best policy, reward=7.577!
-[2023-08-30 09:49:06,775][00929] Fps is (10 sec: 3687.4, 60 sec: 3140.3, 300 sec: 3068.5). Total num frames: 1134592. Throughput: 0: 771.0. Samples: 283944. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2023-08-30 09:49:06,779][00929] Avg episode reward: [(0, '7.652')]
-[2023-08-30 09:49:06,782][08348] Saving new best policy, reward=7.652!
-[2023-08-30 09:49:10,947][08361] Updated weights for policy 0, policy_version 280 (0.0019)
-[2023-08-30 09:49:11,775][00929] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3054.6). Total num frames: 1146880. Throughput: 0: 759.9. Samples: 287590. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2023-08-30 09:49:11,780][00929] Avg episode reward: [(0, '7.267')]
-[2023-08-30 09:49:16,775][00929] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 3026.9). Total num frames: 1159168. Throughput: 0: 761.0. Samples: 289414. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-08-30 09:49:16,783][00929] Avg episode reward: [(0, '7.293')]
-[2023-08-30 09:49:21,775][00929] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 3040.8). Total num frames: 1179648. Throughput: 0: 778.5. Samples: 294666. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-08-30 09:49:21,777][00929] Avg episode reward: [(0, '7.186')]
-[2023-08-30 09:49:23,389][08361] Updated weights for policy 0, policy_version 290 (0.0020)
-[2023-08-30 09:49:26,775][00929] Fps is (10 sec: 4095.9, 60 sec: 3140.3, 300 sec: 3068.5). Total num frames: 1200128. Throughput: 0: 782.1. Samples: 300506. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-08-30 09:49:26,781][00929] Avg episode reward: [(0, '7.638')]
-[2023-08-30 09:49:31,776][00929] Fps is (10 sec: 3276.5, 60 sec: 3140.2, 300 sec: 3068.5). Total num frames: 1212416. Throughput: 0: 771.8. Samples: 302364. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-08-30 09:49:31,779][00929] Avg episode reward: [(0, '7.803')]
-[2023-08-30 09:49:31,794][08348] Saving new best policy, reward=7.803!
-[2023-08-30 09:49:36,775][00929] Fps is (10 sec: 2457.7, 60 sec: 3072.0, 300 sec: 3040.8). Total num frames: 1224704. Throughput: 0: 773.0. Samples: 306020. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-08-30 09:49:36,787][00929] Avg episode reward: [(0, '7.804')]
-[2023-08-30 09:49:38,135][08361] Updated weights for policy 0, policy_version 300 (0.0018)
-[2023-08-30 09:49:41,775][00929] Fps is (10 sec: 2867.5, 60 sec: 3072.0, 300 sec: 3040.8). Total num frames: 1241088. Throughput: 0: 788.1. Samples: 310930. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-08-30 09:49:41,777][00929] Avg episode reward: [(0, '7.631')]
-[2023-08-30 09:49:46,775][00929] Fps is (10 sec: 2867.2, 60 sec: 3004.0, 300 sec: 3026.9). Total num frames: 1253376. Throughput: 0: 772.0. Samples: 313078. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2023-08-30 09:49:46,777][00929] Avg episode reward: [(0, '7.464')]
-[2023-08-30 09:49:51,780][00929] Fps is (10 sec: 2456.4, 60 sec: 3003.5, 300 sec: 3026.8). Total num frames: 1265664. Throughput: 0: 722.5. Samples: 316458. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2023-08-30 09:49:51,782][00929] Avg episode reward: [(0, '7.729')]
-[2023-08-30 09:49:53,357][08361] Updated weights for policy 0, policy_version 310 (0.0033)
-[2023-08-30 09:49:56,775][00929] Fps is (10 sec: 2048.0, 60 sec: 2935.6, 300 sec: 2999.1). Total num frames: 1273856. Throughput: 0: 708.3. Samples: 319464. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2023-08-30 09:49:56,779][00929] Avg episode reward: [(0, '7.967')]
-[2023-08-30 09:49:56,781][08348] Saving new best policy, reward=7.967!
-[2023-08-30 09:50:01,775][00929] Fps is (10 sec: 2049.0, 60 sec: 2798.9, 300 sec: 2985.2). Total num frames: 1286144. Throughput: 0: 705.6. Samples: 321164. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2023-08-30 09:50:01,781][00929] Avg episode reward: [(0, '8.076')]
-[2023-08-30 09:50:01,800][08348] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000314_1286144.pth...
-[2023-08-30 09:50:01,932][08348] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000140_573440.pth
-[2023-08-30 09:50:01,943][08348] Saving new best policy, reward=8.076!
-[2023-08-30 09:50:06,775][00929] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2985.2). Total num frames: 1302528. Throughput: 0: 693.2. Samples: 325858. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2023-08-30 09:50:06,778][00929] Avg episode reward: [(0, '8.435')]
-[2023-08-30 09:50:06,842][08348] Saving new best policy, reward=8.435!
-[2023-08-30 09:50:08,086][08361] Updated weights for policy 0, policy_version 320 (0.0026)
-[2023-08-30 09:50:11,775][00929] Fps is (10 sec: 3686.4, 60 sec: 2935.5, 300 sec: 3013.0). Total num frames: 1323008. Throughput: 0: 692.0. Samples: 331644. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-08-30 09:50:11,778][00929] Avg episode reward: [(0, '8.677')]
-[2023-08-30 09:50:11,787][08348] Saving new best policy, reward=8.677!
-[2023-08-30 09:50:16,775][00929] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 2999.1). Total num frames: 1335296. Throughput: 0: 694.8. Samples: 333630. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2023-08-30 09:50:16,778][00929] Avg episode reward: [(0, '8.693')]
-[2023-08-30 09:50:16,786][08348] Saving new best policy, reward=8.693!
-[2023-08-30 09:50:21,775][00929] Fps is (10 sec: 2457.5, 60 sec: 2798.9, 300 sec: 2971.3). Total num frames: 1347584. Throughput: 0: 695.5. Samples: 337316. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-08-30 09:50:21,782][00929] Avg episode reward: [(0, '9.026')]
-[2023-08-30 09:50:21,793][08348] Saving new best policy, reward=9.026!
-[2023-08-30 09:50:22,323][08361] Updated weights for policy 0, policy_version 330 (0.0027)
-[2023-08-30 09:50:26,775][00929] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2985.2). Total num frames: 1363968. Throughput: 0: 690.8. Samples: 342014. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2023-08-30 09:50:26,777][00929] Avg episode reward: [(0, '9.266')]
-[2023-08-30 09:50:26,780][08348] Saving new best policy, reward=9.266!
-[2023-08-30 09:50:31,775][00929] Fps is (10 sec: 3686.6, 60 sec: 2867.3, 300 sec: 3013.0). Total num frames: 1384448. Throughput: 0: 705.8. Samples: 344838. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2023-08-30 09:50:31,778][00929] Avg episode reward: [(0, '8.932')]
-[2023-08-30 09:50:33,633][08361] Updated weights for policy 0, policy_version 340 (0.0029)
-[2023-08-30 09:50:36,779][00929] Fps is (10 sec: 3275.6, 60 sec: 2867.0, 300 sec: 2999.1). Total num frames: 1396736. Throughput: 0: 746.4. Samples: 350046. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-08-30 09:50:36,781][00929] Avg episode reward: [(0, '9.199')]
-[2023-08-30 09:50:41,775][00929] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2985.2). Total num frames: 1413120. Throughput: 0: 762.7. Samples: 353786. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-08-30 09:50:41,783][00929] Avg episode reward: [(0, '9.290')]
-[2023-08-30 09:50:41,802][08348] Saving new best policy, reward=9.290!
-[2023-08-30 09:50:46,775][00929] Fps is (10 sec: 2868.3, 60 sec: 2867.2, 300 sec: 2971.3). Total num frames: 1425408. Throughput: 0: 765.9. Samples: 355630. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2023-08-30 09:50:46,777][00929] Avg episode reward: [(0, '9.280')]
-[2023-08-30 09:50:48,242][08361] Updated weights for policy 0, policy_version 350 (0.0018)
-[2023-08-30 09:50:51,775][00929] Fps is (10 sec: 3276.8, 60 sec: 3004.0, 300 sec: 2999.1). Total num frames: 1445888. Throughput: 0: 785.2. Samples: 361190. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
-[2023-08-30 09:50:51,781][00929] Avg episode reward: [(0, '9.792')]
-[2023-08-30 09:50:51,792][08348] Saving new best policy, reward=9.792!
-[2023-08-30 09:50:56,775][00929] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 3013.0). Total num frames: 1462272. Throughput: 0: 776.8. Samples: 366602. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-08-30 09:50:56,777][00929] Avg episode reward: [(0, '9.829')]
-[2023-08-30 09:50:56,779][08348] Saving new best policy, reward=9.829!
-[2023-08-30 09:51:01,047][08361] Updated weights for policy 0, policy_version 360 (0.0032)
-[2023-08-30 09:51:01,775][00929] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3013.0). Total num frames: 1474560. Throughput: 0: 770.6. Samples: 368308. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-08-30 09:51:01,778][00929] Avg episode reward: [(0, '9.912')]
-[2023-08-30 09:51:01,794][08348] Saving new best policy, reward=9.912!
-[2023-08-30 09:51:06,775][00929] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 3026.9). Total num frames: 1486848. Throughput: 0: 768.9. Samples: 371916. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-08-30 09:51:06,777][00929] Avg episode reward: [(0, '10.419')]
-[2023-08-30 09:51:06,781][08348] Saving new best policy, reward=10.419!
-[2023-08-30 09:51:11,775][00929] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 3054.6). Total num frames: 1507328. Throughput: 0: 778.4. Samples: 377042. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-08-30 09:51:11,781][00929] Avg episode reward: [(0, '10.882')]
-[2023-08-30 09:51:11,792][08348] Saving new best policy, reward=10.882!
-[2023-08-30 09:51:14,228][08361] Updated weights for policy 0, policy_version 370 (0.0025)
-[2023-08-30 09:51:16,775][00929] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 3068.5). Total num frames: 1523712. Throughput: 0: 775.0. Samples: 379712. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2023-08-30 09:51:16,784][00929] Avg episode reward: [(0, '11.214')]
-[2023-08-30 09:51:16,787][08348] Saving new best policy, reward=11.214!
-[2023-08-30 09:51:21,775][00929] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3054.6). Total num frames: 1536000. Throughput: 0: 754.3. Samples: 383986. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2023-08-30 09:51:21,781][00929] Avg episode reward: [(0, '11.094')]
-[2023-08-30 09:51:26,775][00929] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 3026.9). Total num frames: 1548288. Throughput: 0: 754.6. Samples: 387742. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2023-08-30 09:51:26,778][00929] Avg episode reward: [(0, '10.685')]
-[2023-08-30 09:51:28,930][08361] Updated weights for policy 0, policy_version 380 (0.0033)
-[2023-08-30 09:51:31,775][00929] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 3026.9). Total num frames: 1564672. Throughput: 0: 765.0. Samples: 390054. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2023-08-30 09:51:31,780][00929] Avg episode reward: [(0, '11.228')]
-[2023-08-30 09:51:31,792][08348] Saving new best policy, reward=11.228!
-[2023-08-30 09:51:36,775][00929] Fps is (10 sec: 3686.4, 60 sec: 3140.5, 300 sec: 3054.7). Total num frames: 1585152. Throughput: 0: 770.9. Samples: 395880. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-08-30 09:51:36,778][00929] Avg episode reward: [(0, '11.064')]
-[2023-08-30 09:51:40,344][08361] Updated weights for policy 0, policy_version 390 (0.0019)
-[2023-08-30 09:51:41,778][00929] Fps is (10 sec: 3275.9, 60 sec: 3071.9, 300 sec: 3054.6). Total num frames: 1597440. Throughput: 0: 755.6. Samples: 400606. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2023-08-30 09:51:41,780][00929] Avg episode reward: [(0, '11.284')]
-[2023-08-30 09:51:41,798][08348] Saving new best policy, reward=11.284!
-[2023-08-30 09:51:46,775][00929] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 3040.8). Total num frames: 1609728. Throughput: 0: 757.2. Samples: 402380. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-08-30 09:51:46,782][00929] Avg episode reward: [(0, '12.141')]
-[2023-08-30 09:51:46,871][08348] Saving new best policy, reward=12.141!
-[2023-08-30 09:51:51,775][00929] Fps is (10 sec: 2868.0, 60 sec: 3003.7, 300 sec: 3026.9). Total num frames: 1626112. Throughput: 0: 767.5. Samples: 406454. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2023-08-30 09:51:51,778][00929] Avg episode reward: [(0, '10.944')]
-[2023-08-30 09:51:54,173][08361] Updated weights for policy 0, policy_version 400 (0.0019)
-[2023-08-30 09:51:56,775][00929] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 3054.7). Total num frames: 1646592. Throughput: 0: 784.1. Samples: 412326. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2023-08-30 09:51:56,778][00929] Avg episode reward: [(0, '10.044')]
-[2023-08-30 09:52:01,781][00929] Fps is (10 sec: 3684.1, 60 sec: 3139.9, 300 sec: 3068.5). Total num frames: 1662976. Throughput: 0: 788.7. Samples: 415210. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2023-08-30 09:52:01,786][00929] Avg episode reward: [(0, '10.547')]
-[2023-08-30 09:52:01,800][08348] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000406_1662976.pth...
-[2023-08-30 09:52:01,952][08348] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000228_933888.pth
-[2023-08-30 09:52:06,776][00929] Fps is (10 sec: 2867.0, 60 sec: 3140.2, 300 sec: 3054.6). Total num frames: 1675264. Throughput: 0: 773.5. Samples: 418792. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2023-08-30 09:52:06,777][00929] Avg episode reward: [(0, '9.722')]
-[2023-08-30 09:52:08,154][08361] Updated weights for policy 0, policy_version 410 (0.0022)
-[2023-08-30 09:52:11,776][00929] Fps is (10 sec: 2458.9, 60 sec: 3003.7, 300 sec: 3026.9). Total num frames: 1687552. Throughput: 0: 779.3. Samples: 422810. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-08-30 09:52:11,780][00929] Avg episode reward: [(0, '10.935')]
-[2023-08-30 09:52:16,775][00929] Fps is (10 sec: 3277.0, 60 sec: 3072.0, 300 sec: 3054.6). Total num frames: 1708032. Throughput: 0: 793.2. Samples: 425746. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-08-30 09:52:16,778][00929] Avg episode reward: [(0, '11.525')]
-[2023-08-30 09:52:19,448][08361] Updated weights for policy 0, policy_version 420 (0.0030)
-[2023-08-30 09:52:21,780][00929] Fps is (10 sec: 3684.9, 60 sec: 3140.0, 300 sec: 3068.5). Total num frames: 1724416. Throughput: 0: 793.6. Samples: 431594. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-08-30 09:52:21,783][00929] Avg episode reward: [(0, '11.301')]
-[2023-08-30 09:52:26,778][00929] Fps is (10 sec: 2866.4, 60 sec: 3140.1, 300 sec: 3054.6). Total num frames: 1736704. Throughput: 0: 773.3. Samples: 435404. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2023-08-30 09:52:26,780][00929] Avg episode reward: [(0, '11.711')]
-[2023-08-30 09:52:31,775][00929] Fps is (10 sec: 2868.6, 60 sec: 3140.3, 300 sec: 3040.8). Total num frames: 1753088. Throughput: 0: 774.1. Samples: 437216. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2023-08-30 09:52:31,779][00929] Avg episode reward: [(0, '11.721')]
-[2023-08-30 09:52:34,040][08361] Updated weights for policy 0, policy_version 430 (0.0025)
-[2023-08-30 09:52:36,775][00929] Fps is (10 sec: 3277.7, 60 sec: 3072.0, 300 sec: 3040.8). Total num frames: 1769472. Throughput: 0: 790.6. Samples: 442030. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2023-08-30 09:52:36,778][00929] Avg episode reward: [(0, '11.507')]
-[2023-08-30 09:52:41,776][00929] Fps is (10 sec: 3686.1, 60 sec: 3208.6, 300 sec: 3068.5). Total num frames: 1789952. Throughput: 0: 793.2. Samples: 448020. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-08-30 09:52:41,778][00929] Avg episode reward: [(0, '11.811')]
-[2023-08-30 09:52:45,677][08361] Updated weights for policy 0, policy_version 440 (0.0040)
-[2023-08-30 09:52:46,775][00929] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3068.5). Total num frames: 1802240. Throughput: 0: 777.8. Samples: 450206. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2023-08-30 09:52:46,778][00929] Avg episode reward: [(0, '11.621')]
-[2023-08-30 09:52:51,775][00929] Fps is (10 sec: 2457.8, 60 sec: 3140.3, 300 sec: 3040.8). Total num frames: 1814528. Throughput: 0: 780.9. Samples: 453934. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2023-08-30 09:52:51,779][00929] Avg episode reward: [(0, '11.287')]
-[2023-08-30 09:52:56,775][00929] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3040.8). Total num frames: 1830912. Throughput: 0: 797.0. Samples: 458674. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2023-08-30 09:52:56,782][00929] Avg episode reward: [(0, '11.080')]
-[2023-08-30 09:52:59,126][08361] Updated weights for policy 0, policy_version 450 (0.0016)
-[2023-08-30 09:53:01,775][00929] Fps is (10 sec: 3686.4, 60 sec: 3140.6, 300 sec: 3068.5). Total num frames: 1851392. Throughput: 0: 796.8. Samples: 461602. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2023-08-30 09:53:01,778][00929] Avg episode reward: [(0, '11.027')]
-[2023-08-30 09:53:06,779][00929] Fps is (10 sec: 3685.0, 60 sec: 3208.4, 300 sec: 3082.4). Total num frames: 1867776. Throughput: 0: 786.3. Samples: 466978. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2023-08-30 09:53:06,783][00929] Avg episode reward: [(0, '12.275')]
-[2023-08-30 09:53:06,797][08348] Saving new best policy, reward=12.275!
-[2023-08-30 09:53:11,775][00929] Fps is (10 sec: 2867.2, 60 sec: 3208.6, 300 sec: 3054.6). Total num frames: 1880064. Throughput: 0: 783.0. Samples: 470636. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2023-08-30 09:53:11,783][00929] Avg episode reward: [(0, '11.967')]
-[2023-08-30 09:53:12,760][08361] Updated weights for policy 0, policy_version 460 (0.0029)
-[2023-08-30 09:53:16,775][00929] Fps is (10 sec: 2868.2, 60 sec: 3140.3, 300 sec: 3040.8). Total num frames: 1896448. Throughput: 0: 784.5. Samples: 472520. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2023-08-30 09:53:16,778][00929] Avg episode reward: [(0, '13.123')]
-[2023-08-30 09:53:16,786][08348] Saving new best policy, reward=13.123!
-[2023-08-30 09:53:21,775][00929] Fps is (10 sec: 3276.9, 60 sec: 3140.5, 300 sec: 3054.6). Total num frames: 1912832. Throughput: 0: 801.1. Samples: 478078. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2023-08-30 09:53:21,783][00929] Avg episode reward: [(0, '14.360')]
-[2023-08-30 09:53:21,848][08348] Saving new best policy, reward=14.360!
-[2023-08-30 09:53:23,946][08361] Updated weights for policy 0, policy_version 470 (0.0032)
-[2023-08-30 09:53:26,778][00929] Fps is (10 sec: 3276.0, 60 sec: 3208.5, 300 sec: 3068.5). Total num frames: 1929216. Throughput: 0: 785.6. Samples: 483372. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2023-08-30 09:53:26,780][00929] Avg episode reward: [(0, '14.991')]
-[2023-08-30 09:53:26,797][08348] Saving new best policy, reward=14.991!
-[2023-08-30 09:53:31,775][00929] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3054.6). Total num frames: 1941504. Throughput: 0: 768.1. Samples: 484770. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2023-08-30 09:53:31,786][00929] Avg episode reward: [(0, '14.744')]
-[2023-08-30 09:53:36,775][00929] Fps is (10 sec: 2048.5, 60 sec: 3003.7, 300 sec: 3026.9). Total num frames: 1949696. Throughput: 0: 750.7. Samples: 487714. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2023-08-30 09:53:36,782][00929] Avg episode reward: [(0, '14.547')]
-[2023-08-30 09:53:41,775][00929] Fps is (10 sec: 2048.0, 60 sec: 2867.2, 300 sec: 3013.0). Total num frames: 1961984. Throughput: 0: 716.7. Samples: 490924. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-08-30 09:53:41,778][00929] Avg episode reward: [(0, '15.176')]
-[2023-08-30 09:53:41,789][08348] Saving new best policy, reward=15.176!
-[2023-08-30 09:53:43,400][08361] Updated weights for policy 0, policy_version 480 (0.0046)
-[2023-08-30 09:53:46,775][00929] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 3026.9). Total num frames: 1978368. Throughput: 0: 700.2. Samples: 493112. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-08-30 09:53:46,777][00929] Avg episode reward: [(0, '15.435')]
-[2023-08-30 09:53:46,782][08348] Saving new best policy, reward=15.435!
-[2023-08-30 09:53:51,775][00929] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 3054.7). Total num frames: 1998848. Throughput: 0: 712.0. Samples: 499014. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2023-08-30 09:53:51,777][00929] Avg episode reward: [(0, '16.284')]
-[2023-08-30 09:53:51,786][08348] Saving new best policy, reward=16.284!
-[2023-08-30 09:53:54,461][08361] Updated weights for policy 0, policy_version 490 (0.0014)
-[2023-08-30 09:53:56,780][00929] Fps is (10 sec: 3275.2, 60 sec: 3003.5, 300 sec: 3026.8). Total num frames: 2011136. Throughput: 0: 731.7. Samples: 503568. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-08-30 09:53:56,782][00929] Avg episode reward: [(0, '15.570')]
-[2023-08-30 09:54:01,775][00929] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 3013.0). Total num frames: 2023424. Throughput: 0: 731.5. Samples: 505436. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-08-30 09:54:01,779][00929] Avg episode reward: [(0, '16.641')]
-[2023-08-30 09:54:01,796][08348] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000494_2023424.pth...
-[2023-08-30 09:54:01,959][08348] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000314_1286144.pth
-[2023-08-30 09:54:01,968][08348] Saving new best policy, reward=16.641!
-[2023-08-30 09:54:06,775][00929] Fps is (10 sec: 2868.6, 60 sec: 2867.4, 300 sec: 3026.9). Total num frames: 2039808. Throughput: 0: 694.0. Samples: 509308. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2023-08-30 09:54:06,777][00929] Avg episode reward: [(0, '17.165')]
-[2023-08-30 09:54:06,786][08348] Saving new best policy, reward=17.165!
-[2023-08-30 09:54:08,826][08361] Updated weights for policy 0, policy_version 500 (0.0023)
-[2023-08-30 09:54:11,775][00929] Fps is (10 sec: 3686.4, 60 sec: 3003.7, 300 sec: 3054.6). Total num frames: 2060288. Throughput: 0: 704.8. Samples: 515088. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2023-08-30 09:54:11,777][00929] Avg episode reward: [(0, '17.118')]
-[2023-08-30 09:54:16,775][00929] Fps is (10 sec: 3686.3, 60 sec: 3003.7, 300 sec: 3040.8). Total num frames: 2076672. Throughput: 0: 738.4. Samples: 517996. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2023-08-30 09:54:16,783][00929] Avg episode reward: [(0, '16.878')]
-[2023-08-30 09:54:21,775][00929] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 2999.1). Total num frames: 2084864. Throughput: 0: 754.4. Samples: 521664. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2023-08-30 09:54:21,789][00929] Avg episode reward: [(0, '16.741')]
-[2023-08-30 09:54:22,374][08361] Updated weights for policy 0, policy_version 510 (0.0016)
-[2023-08-30 09:54:26,775][00929] Fps is (10 sec: 2457.6, 60 sec: 2867.3, 300 sec: 3013.0). Total num frames: 2101248. Throughput: 0: 769.2. Samples: 525540. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2023-08-30 09:54:26,783][00929] Avg episode reward: [(0, '16.425')]
-[2023-08-30 09:54:31,775][00929] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 3026.9). Total num frames: 2117632. Throughput: 0: 784.7. Samples: 528424. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-08-30 09:54:31,782][00929] Avg episode reward: [(0, '15.048')]
-[2023-08-30 09:54:34,304][08361] Updated weights for policy 0, policy_version 520 (0.0025)
-[2023-08-30 09:54:36,775][00929] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 3040.8). Total num frames: 2138112. Throughput: 0: 781.8. Samples: 534194. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-08-30 09:54:36,780][00929] Avg episode reward: [(0, '15.392')]
-[2023-08-30 09:54:41,775][00929] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3040.8). Total num frames: 2150400. Throughput: 0: 766.2. Samples: 538044. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
-[2023-08-30 09:54:41,779][00929] Avg episode reward: [(0, '15.161')]
-[2023-08-30 09:54:46,775][00929] Fps is (10 sec: 2048.0, 60 sec: 3003.7, 300 sec: 3026.9). Total num frames: 2158592. Throughput: 0: 764.6. Samples: 539844. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-08-30 09:54:46,777][00929] Avg episode reward: [(0, '15.621')]
-[2023-08-30 09:54:49,146][08361] Updated weights for policy 0, policy_version 530 (0.0022)
-[2023-08-30 09:54:51,775][00929] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 3068.5). Total num frames: 2179072. Throughput: 0: 782.2. Samples: 544506. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
-[2023-08-30 09:54:51,780][00929] Avg episode reward: [(0, '16.419')]
-[2023-08-30 09:54:56,775][00929] Fps is (10 sec: 4096.0, 60 sec: 3140.5, 300 sec: 3096.3). Total num frames: 2199552. Throughput: 0: 783.8. Samples: 550360. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-08-30 09:54:56,779][00929] Avg episode reward: [(0, '18.459')]
-[2023-08-30 09:54:56,787][08348] Saving new best policy, reward=18.459!
-[2023-08-30 09:55:01,265][08361] Updated weights for policy 0, policy_version 540 (0.0027)
-[2023-08-30 09:55:01,780][00929] Fps is (10 sec: 3275.1, 60 sec: 3140.0, 300 sec: 3082.4). Total num frames: 2211840. Throughput: 0: 767.2. Samples: 552522. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-08-30 09:55:01,783][00929] Avg episode reward: [(0, '18.131')]
-[2023-08-30 09:55:06,775][00929] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 3054.6). Total num frames: 2224128. Throughput: 0: 765.8. Samples: 556126. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-08-30 09:55:06,778][00929] Avg episode reward: [(0, '17.790')]
-[2023-08-30 09:55:11,775][00929] Fps is (10 sec: 2868.7, 60 sec: 3003.7, 300 sec: 3068.5). Total num frames: 2240512. Throughput: 0: 780.1. Samples: 560646. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2023-08-30 09:55:11,782][00929] Avg episode reward: [(0, '19.235')]
-[2023-08-30 09:55:11,797][08348] Saving new best policy, reward=19.235!
-[2023-08-30 09:55:14,565][08361] Updated weights for policy 0, policy_version 550 (0.0022)
-[2023-08-30 09:55:16,775][00929] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 3082.4). Total num frames: 2256896. Throughput: 0: 780.3. Samples: 563536. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2023-08-30 09:55:16,780][00929] Avg episode reward: [(0, '19.379')]
-[2023-08-30 09:55:16,787][08348] Saving new best policy, reward=19.379!
-[2023-08-30 09:55:21,777][00929] Fps is (10 sec: 3276.1, 60 sec: 3140.1, 300 sec: 3082.4). Total num frames: 2273280. Throughput: 0: 770.1. Samples: 568852. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-08-30 09:55:21,780][00929] Avg episode reward: [(0, '19.895')]
-[2023-08-30 09:55:21,797][08348] Saving new best policy, reward=19.895!
-[2023-08-30 09:55:26,775][00929] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3054.6). Total num frames: 2285568. Throughput: 0: 765.6. Samples: 572496. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2023-08-30 09:55:26,784][00929] Avg episode reward: [(0, '19.231')]
-[2023-08-30 09:55:28,716][08361] Updated weights for policy 0, policy_version 560 (0.0031)
-[2023-08-30 09:55:31,775][00929] Fps is (10 sec: 2867.9, 60 sec: 3072.0, 300 sec: 3068.6). Total num frames: 2301952. Throughput: 0: 768.3. Samples: 574416. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2023-08-30 09:55:31,783][00929] Avg episode reward: [(0, '19.187')]
-[2023-08-30 09:55:36,777][00929] Fps is (10 sec: 3276.2, 60 sec: 3003.6, 300 sec: 3068.5). Total num frames: 2318336. Throughput: 0: 781.2. Samples: 579662. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2023-08-30 09:55:36,779][00929] Avg episode reward: [(0, '20.292')]
-[2023-08-30 09:55:36,873][08348] Saving new best policy, reward=20.292!
-[2023-08-30 09:55:40,163][08361] Updated weights for policy 0, policy_version 570 (0.0030)
-[2023-08-30 09:55:41,775][00929] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 3096.3). Total num frames: 2338816. Throughput: 0: 775.3. Samples: 585250. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2023-08-30 09:55:41,782][00929] Avg episode reward: [(0, '21.940')]
-[2023-08-30 09:55:41,796][08348] Saving new best policy, reward=21.940!
-[2023-08-30 09:55:46,780][00929] Fps is (10 sec: 3275.8, 60 sec: 3208.3, 300 sec: 3068.5). Total num frames: 2351104. Throughput: 0: 767.9. Samples: 587078. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2023-08-30 09:55:46,791][00929] Avg episode reward: [(0, '22.504')]
-[2023-08-30 09:55:46,793][08348] Saving new best policy, reward=22.504!
-[2023-08-30 09:55:51,775][00929] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 3054.6). Total num frames: 2363392. Throughput: 0: 767.1. Samples: 590646. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-08-30 09:55:51,782][00929] Avg episode reward: [(0, '22.943')]
-[2023-08-30 09:55:51,795][08348] Saving new best policy, reward=22.943!
-[2023-08-30 09:55:55,122][08361] Updated weights for policy 0, policy_version 580 (0.0018)
-[2023-08-30 09:55:56,775][00929] Fps is (10 sec: 2868.6, 60 sec: 3003.7, 300 sec: 3068.5). Total num frames: 2379776. Throughput: 0: 778.0. Samples: 595654. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2023-08-30 09:55:56,781][00929] Avg episode reward: [(0, '22.491')]
-[2023-08-30 09:56:01,775][00929] Fps is (10 sec: 3686.3, 60 sec: 3140.5, 300 sec: 3096.3). Total num frames: 2400256. Throughput: 0: 777.8. Samples: 598536. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-08-30 09:56:01,777][00929] Avg episode reward: [(0, '22.919')]
-[2023-08-30 09:56:01,791][08348] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000586_2400256.pth...
-[2023-08-30 09:56:01,918][08348] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000406_1662976.pth
-[2023-08-30 09:56:06,775][00929] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3068.5). Total num frames: 2412544. Throughput: 0: 765.1. Samples: 603278. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2023-08-30 09:56:06,780][00929] Avg episode reward: [(0, '23.132')]
-[2023-08-30 09:56:06,783][08348] Saving new best policy, reward=23.132!
-[2023-08-30 09:56:07,737][08361] Updated weights for policy 0, policy_version 590 (0.0027)
-[2023-08-30 09:56:11,777][00929] Fps is (10 sec: 2457.2, 60 sec: 3071.9, 300 sec: 3054.6). Total num frames: 2424832. Throughput: 0: 767.0. Samples: 607012. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2023-08-30 09:56:11,788][00929] Avg episode reward: [(0, '21.716')]
-[2023-08-30 09:56:16,775][00929] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3068.5). Total num frames: 2441216. Throughput: 0: 768.8. Samples: 609010. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2023-08-30 09:56:16,778][00929] Avg episode reward: [(0, '21.120')]
-[2023-08-30 09:56:20,248][08361] Updated weights for policy 0, policy_version 600 (0.0018)
-[2023-08-30 09:56:21,775][00929] Fps is (10 sec: 3687.0, 60 sec: 3140.4, 300 sec: 3096.3). Total num frames: 2461696. Throughput: 0: 784.5. Samples: 614962. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2023-08-30 09:56:21,782][00929] Avg episode reward: [(0, '21.521')]
-[2023-08-30 09:56:26,775][00929] Fps is (10 sec: 3686.3, 60 sec: 3208.5, 300 sec: 3096.3). Total num frames: 2478080. Throughput: 0: 773.1. Samples: 620038. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-08-30 09:56:26,785][00929] Avg episode reward: [(0, '22.187')]
-[2023-08-30 09:56:31,775][00929] Fps is (10 sec: 2867.3, 60 sec: 3140.3, 300 sec: 3068.5). Total num frames: 2490368. Throughput: 0: 775.2. Samples: 621958. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2023-08-30 09:56:31,782][00929] Avg episode reward: [(0, '21.733')]
-[2023-08-30 09:56:34,597][08361] Updated weights for policy 0, policy_version 610 (0.0021)
-[2023-08-30 09:56:36,775][00929] Fps is (10 sec: 2457.7, 60 sec: 3072.1, 300 sec: 3068.6). Total num frames: 2502656. Throughput: 0: 779.4. Samples: 625720. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-08-30 09:56:36,783][00929] Avg episode reward: [(0, '21.048')] -[2023-08-30 09:56:41,775][00929] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 3096.3). Total num frames: 2523136. Throughput: 0: 795.1. Samples: 631432. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) -[2023-08-30 09:56:41,777][00929] Avg episode reward: [(0, '20.739')] -[2023-08-30 09:56:45,528][08361] Updated weights for policy 0, policy_version 620 (0.0026) -[2023-08-30 09:56:46,781][00929] Fps is (10 sec: 3684.2, 60 sec: 3140.2, 300 sec: 3096.2). Total num frames: 2539520. Throughput: 0: 796.5. Samples: 634382. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-08-30 09:56:46,784][00929] Avg episode reward: [(0, '20.753')] -[2023-08-30 09:56:51,776][00929] Fps is (10 sec: 2867.0, 60 sec: 3140.2, 300 sec: 3068.5). Total num frames: 2551808. Throughput: 0: 785.8. Samples: 638640. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-08-30 09:56:51,781][00929] Avg episode reward: [(0, '21.582')] -[2023-08-30 09:56:56,775][00929] Fps is (10 sec: 2459.0, 60 sec: 3072.0, 300 sec: 3054.7). Total num frames: 2564096. Throughput: 0: 787.2. Samples: 642434. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-08-30 09:56:56,780][00929] Avg episode reward: [(0, '20.788')] -[2023-08-30 09:57:00,319][08361] Updated weights for policy 0, policy_version 630 (0.0052) -[2023-08-30 09:57:01,775][00929] Fps is (10 sec: 3277.1, 60 sec: 3072.0, 300 sec: 3082.4). Total num frames: 2584576. Throughput: 0: 797.6. Samples: 644904. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-08-30 09:57:01,777][00929] Avg episode reward: [(0, '21.233')] -[2023-08-30 09:57:06,775][00929] Fps is (10 sec: 4096.1, 60 sec: 3208.5, 300 sec: 3110.2). Total num frames: 2605056. Throughput: 0: 792.3. Samples: 650616. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2023-08-30 09:57:06,784][00929] Avg episode reward: [(0, '22.984')] -[2023-08-30 09:57:11,781][00929] Fps is (10 sec: 2865.5, 60 sec: 3140.1, 300 sec: 3068.5). Total num frames: 2613248. Throughput: 0: 765.9. Samples: 654506. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-08-30 09:57:11,784][00929] Avg episode reward: [(0, '22.997')] -[2023-08-30 09:57:14,646][08361] Updated weights for policy 0, policy_version 640 (0.0023) -[2023-08-30 09:57:16,777][00929] Fps is (10 sec: 2047.6, 60 sec: 3071.9, 300 sec: 3054.7). Total num frames: 2625536. Throughput: 0: 754.6. Samples: 655918. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-08-30 09:57:16,780][00929] Avg episode reward: [(0, '23.528')] -[2023-08-30 09:57:16,791][08348] Saving new best policy, reward=23.528! -[2023-08-30 09:57:21,777][00929] Fps is (10 sec: 2048.8, 60 sec: 2867.1, 300 sec: 3040.8). Total num frames: 2633728. Throughput: 0: 734.1. Samples: 658758. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-08-30 09:57:21,779][00929] Avg episode reward: [(0, '23.801')] -[2023-08-30 09:57:21,798][08348] Saving new best policy, reward=23.801! -[2023-08-30 09:57:26,775][00929] Fps is (10 sec: 2048.4, 60 sec: 2798.9, 300 sec: 3026.9). Total num frames: 2646016. Throughput: 0: 686.9. Samples: 662342. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-08-30 09:57:26,777][00929] Avg episode reward: [(0, '22.149')] -[2023-08-30 09:57:30,223][08361] Updated weights for policy 0, policy_version 650 (0.0052) -[2023-08-30 09:57:31,775][00929] Fps is (10 sec: 3277.4, 60 sec: 2935.5, 300 sec: 3040.8). Total num frames: 2666496. Throughput: 0: 687.2. Samples: 665300. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) -[2023-08-30 09:57:31,777][00929] Avg episode reward: [(0, '20.917')] -[2023-08-30 09:57:36,775][00929] Fps is (10 sec: 3686.4, 60 sec: 3003.7, 300 sec: 3026.9). Total num frames: 2682880. Throughput: 0: 720.3. Samples: 671052. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-08-30 09:57:36,779][00929] Avg episode reward: [(0, '19.428')] -[2023-08-30 09:57:41,775][00929] Fps is (10 sec: 2867.1, 60 sec: 2867.2, 300 sec: 3026.9). Total num frames: 2695168. Throughput: 0: 719.6. Samples: 674816. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-08-30 09:57:41,781][00929] Avg episode reward: [(0, '19.113')] -[2023-08-30 09:57:43,767][08361] Updated weights for policy 0, policy_version 660 (0.0021) -[2023-08-30 09:57:46,779][00929] Fps is (10 sec: 2456.7, 60 sec: 2799.0, 300 sec: 3026.8). Total num frames: 2707456. Throughput: 0: 706.2. Samples: 676684. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2023-08-30 09:57:46,781][00929] Avg episode reward: [(0, '18.422')] -[2023-08-30 09:57:51,775][00929] Fps is (10 sec: 3276.9, 60 sec: 2935.5, 300 sec: 3040.8). Total num frames: 2727936. Throughput: 0: 688.8. Samples: 681614. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-08-30 09:57:51,778][00929] Avg episode reward: [(0, '17.458')] -[2023-08-30 09:57:55,425][08361] Updated weights for policy 0, policy_version 670 (0.0040) -[2023-08-30 09:57:56,775][00929] Fps is (10 sec: 4097.5, 60 sec: 3072.0, 300 sec: 3040.8). Total num frames: 2748416. Throughput: 0: 735.7. Samples: 687610. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-08-30 09:57:56,782][00929] Avg episode reward: [(0, '18.022')] -[2023-08-30 09:58:01,775][00929] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 3026.9). Total num frames: 2760704. Throughput: 0: 750.1. Samples: 689672. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2023-08-30 09:58:01,778][00929] Avg episode reward: [(0, '19.078')] -[2023-08-30 09:58:01,792][08348] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000674_2760704.pth... 
-[2023-08-30 09:58:01,940][08348] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000494_2023424.pth -[2023-08-30 09:58:06,775][00929] Fps is (10 sec: 2457.6, 60 sec: 2798.9, 300 sec: 3026.9). Total num frames: 2772992. Throughput: 0: 768.3. Samples: 693328. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-08-30 09:58:06,780][00929] Avg episode reward: [(0, '19.386')] -[2023-08-30 09:58:10,227][08361] Updated weights for policy 0, policy_version 680 (0.0028) -[2023-08-30 09:58:11,775][00929] Fps is (10 sec: 2867.2, 60 sec: 2935.8, 300 sec: 3026.9). Total num frames: 2789376. Throughput: 0: 789.2. Samples: 697856. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2023-08-30 09:58:11,782][00929] Avg episode reward: [(0, '20.637')] -[2023-08-30 09:58:16,775][00929] Fps is (10 sec: 3686.4, 60 sec: 3072.1, 300 sec: 3040.8). Total num frames: 2809856. Throughput: 0: 788.7. Samples: 700792. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-08-30 09:58:16,783][00929] Avg episode reward: [(0, '20.559')] -[2023-08-30 09:58:21,775][00929] Fps is (10 sec: 3686.4, 60 sec: 3208.6, 300 sec: 3040.8). Total num frames: 2826240. Throughput: 0: 779.2. Samples: 706114. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2023-08-30 09:58:21,782][00929] Avg episode reward: [(0, '20.081')] -[2023-08-30 09:58:21,798][08361] Updated weights for policy 0, policy_version 690 (0.0019) -[2023-08-30 09:58:26,776][00929] Fps is (10 sec: 2457.4, 60 sec: 3140.2, 300 sec: 3026.9). Total num frames: 2834432. Throughput: 0: 779.9. Samples: 709912. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2023-08-30 09:58:26,778][00929] Avg episode reward: [(0, '19.842')] -[2023-08-30 09:58:31,775][00929] Fps is (10 sec: 2457.5, 60 sec: 3072.0, 300 sec: 3054.6). Total num frames: 2850816. Throughput: 0: 778.6. Samples: 711720. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2023-08-30 09:58:31,781][00929] Avg episode reward: [(0, '20.079')] -[2023-08-30 09:58:35,544][08361] Updated weights for policy 0, policy_version 700 (0.0021) -[2023-08-30 09:58:36,775][00929] Fps is (10 sec: 3686.7, 60 sec: 3140.3, 300 sec: 3082.4). Total num frames: 2871296. Throughput: 0: 789.7. Samples: 717152. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-08-30 09:58:36,778][00929] Avg episode reward: [(0, '20.859')] -[2023-08-30 09:58:41,775][00929] Fps is (10 sec: 3686.5, 60 sec: 3208.5, 300 sec: 3082.4). Total num frames: 2887680. Throughput: 0: 781.0. Samples: 722754. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-08-30 09:58:41,779][00929] Avg episode reward: [(0, '20.212')] -[2023-08-30 09:58:46,775][00929] Fps is (10 sec: 2867.2, 60 sec: 3208.7, 300 sec: 3054.6). Total num frames: 2899968. Throughput: 0: 775.8. Samples: 724584. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-08-30 09:58:46,780][00929] Avg episode reward: [(0, '21.956')] -[2023-08-30 09:58:49,260][08361] Updated weights for policy 0, policy_version 710 (0.0018) -[2023-08-30 09:58:51,775][00929] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 3054.7). Total num frames: 2912256. Throughput: 0: 777.7. Samples: 728326. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-08-30 09:58:51,781][00929] Avg episode reward: [(0, '22.191')] -[2023-08-30 09:58:56,775][00929] Fps is (10 sec: 3276.7, 60 sec: 3072.0, 300 sec: 3082.4). Total num frames: 2932736. Throughput: 0: 793.8. Samples: 733576. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-08-30 09:58:56,784][00929] Avg episode reward: [(0, '23.650')] -[2023-08-30 09:59:00,861][08361] Updated weights for policy 0, policy_version 720 (0.0016) -[2023-08-30 09:59:01,775][00929] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 3082.4). Total num frames: 2949120. Throughput: 0: 792.7. Samples: 736462. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-08-30 09:59:01,785][00929] Avg episode reward: [(0, '24.201')] -[2023-08-30 09:59:01,859][08348] Saving new best policy, reward=24.201! -[2023-08-30 09:59:06,775][00929] Fps is (10 sec: 3276.9, 60 sec: 3208.5, 300 sec: 3068.5). Total num frames: 2965504. Throughput: 0: 779.4. Samples: 741186. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-08-30 09:59:06,780][00929] Avg episode reward: [(0, '22.902')] -[2023-08-30 09:59:11,775][00929] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3054.6). Total num frames: 2977792. Throughput: 0: 777.6. Samples: 744904. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-08-30 09:59:11,781][00929] Avg episode reward: [(0, '22.849')] -[2023-08-30 09:59:15,435][08361] Updated weights for policy 0, policy_version 730 (0.0039) -[2023-08-30 09:59:16,775][00929] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3082.4). Total num frames: 2994176. Throughput: 0: 783.0. Samples: 746956. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2023-08-30 09:59:16,779][00929] Avg episode reward: [(0, '22.449')] -[2023-08-30 09:59:21,775][00929] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 3082.4). Total num frames: 3010560. Throughput: 0: 792.0. Samples: 752794. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2023-08-30 09:59:21,783][00929] Avg episode reward: [(0, '20.782')] -[2023-08-30 09:59:26,775][00929] Fps is (10 sec: 3276.8, 60 sec: 3208.6, 300 sec: 3082.4). Total num frames: 3026944. Throughput: 0: 775.6. Samples: 757654. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-08-30 09:59:26,777][00929] Avg episode reward: [(0, '20.663')] -[2023-08-30 09:59:27,528][08361] Updated weights for policy 0, policy_version 740 (0.0018) -[2023-08-30 09:59:31,776][00929] Fps is (10 sec: 2867.0, 60 sec: 3140.2, 300 sec: 3054.6). Total num frames: 3039232. Throughput: 0: 775.0. Samples: 759460. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-08-30 09:59:31,783][00929] Avg episode reward: [(0, '20.744')] -[2023-08-30 09:59:36,775][00929] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 3054.6). Total num frames: 3051520. Throughput: 0: 773.7. Samples: 763144. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-08-30 09:59:36,777][00929] Avg episode reward: [(0, '20.729')] -[2023-08-30 09:59:41,022][08361] Updated weights for policy 0, policy_version 750 (0.0031) -[2023-08-30 09:59:41,775][00929] Fps is (10 sec: 3277.1, 60 sec: 3072.0, 300 sec: 3096.3). Total num frames: 3072000. Throughput: 0: 785.6. Samples: 768926. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2023-08-30 09:59:41,780][00929] Avg episode reward: [(0, '20.627')] -[2023-08-30 09:59:46,775][00929] Fps is (10 sec: 4096.0, 60 sec: 3208.5, 300 sec: 3096.3). Total num frames: 3092480. Throughput: 0: 785.6. Samples: 771814. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-08-30 09:59:46,781][00929] Avg episode reward: [(0, '20.361')] -[2023-08-30 09:59:51,781][00929] Fps is (10 sec: 3274.8, 60 sec: 3208.2, 300 sec: 3068.5). Total num frames: 3104768. Throughput: 0: 773.4. Samples: 775992. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-08-30 09:59:51,786][00929] Avg episode reward: [(0, '20.170')] -[2023-08-30 09:59:54,918][08361] Updated weights for policy 0, policy_version 760 (0.0018) -[2023-08-30 09:59:56,779][00929] Fps is (10 sec: 2456.6, 60 sec: 3071.8, 300 sec: 3068.5). Total num frames: 3117056. Throughput: 0: 775.4. Samples: 779798. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-08-30 09:59:56,781][00929] Avg episode reward: [(0, '19.749')] -[2023-08-30 10:00:01,775][00929] Fps is (10 sec: 2868.9, 60 sec: 3072.0, 300 sec: 3082.4). Total num frames: 3133440. Throughput: 0: 785.9. Samples: 782320. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-08-30 10:00:01,782][00929] Avg episode reward: [(0, '20.542')] -[2023-08-30 10:00:01,795][08348] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000765_3133440.pth... -[2023-08-30 10:00:01,962][08348] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000586_2400256.pth -[2023-08-30 10:00:06,494][08361] Updated weights for policy 0, policy_version 770 (0.0019) -[2023-08-30 10:00:06,775][00929] Fps is (10 sec: 3687.8, 60 sec: 3140.3, 300 sec: 3096.3). Total num frames: 3153920. Throughput: 0: 781.6. Samples: 787964. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-08-30 10:00:06,777][00929] Avg episode reward: [(0, '21.079')] -[2023-08-30 10:00:11,776][00929] Fps is (10 sec: 3276.6, 60 sec: 3140.2, 300 sec: 3082.4). Total num frames: 3166208. Throughput: 0: 769.6. Samples: 792288. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-08-30 10:00:11,778][00929] Avg episode reward: [(0, '21.720')] -[2023-08-30 10:00:16,775][00929] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 3068.5). Total num frames: 3178496. Throughput: 0: 769.4. Samples: 794082. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-08-30 10:00:16,783][00929] Avg episode reward: [(0, '22.941')] -[2023-08-30 10:00:21,682][08361] Updated weights for policy 0, policy_version 780 (0.0047) -[2023-08-30 10:00:21,775][00929] Fps is (10 sec: 2867.4, 60 sec: 3072.0, 300 sec: 3082.4). Total num frames: 3194880. Throughput: 0: 776.0. Samples: 798066. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-08-30 10:00:21,778][00929] Avg episode reward: [(0, '23.070')] -[2023-08-30 10:00:26,775][00929] Fps is (10 sec: 3276.9, 60 sec: 3072.0, 300 sec: 3082.4). Total num frames: 3211264. Throughput: 0: 767.9. Samples: 803480. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-08-30 10:00:26,782][00929] Avg episode reward: [(0, '23.684')] -[2023-08-30 10:00:31,776][00929] Fps is (10 sec: 3276.6, 60 sec: 3140.3, 300 sec: 3082.4). Total num frames: 3227648. Throughput: 0: 760.8. Samples: 806052. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-08-30 10:00:31,783][00929] Avg episode reward: [(0, '24.276')] -[2023-08-30 10:00:31,795][08348] Saving new best policy, reward=24.276! -[2023-08-30 10:00:35,237][08361] Updated weights for policy 0, policy_version 790 (0.0018) -[2023-08-30 10:00:36,775][00929] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 3040.8). Total num frames: 3235840. Throughput: 0: 746.6. Samples: 809586. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2023-08-30 10:00:36,778][00929] Avg episode reward: [(0, '23.867')] -[2023-08-30 10:00:41,775][00929] Fps is (10 sec: 2457.8, 60 sec: 3003.7, 300 sec: 3054.7). Total num frames: 3252224. Throughput: 0: 744.9. Samples: 813316. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-08-30 10:00:41,778][00929] Avg episode reward: [(0, '23.237')] -[2023-08-30 10:00:46,775][00929] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 3068.5). Total num frames: 3268608. Throughput: 0: 748.8. Samples: 816016. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-08-30 10:00:46,778][00929] Avg episode reward: [(0, '22.512')] -[2023-08-30 10:00:48,642][08361] Updated weights for policy 0, policy_version 800 (0.0021) -[2023-08-30 10:00:51,775][00929] Fps is (10 sec: 3276.8, 60 sec: 3004.0, 300 sec: 3068.5). Total num frames: 3284992. Throughput: 0: 738.5. Samples: 821198. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-08-30 10:00:51,782][00929] Avg episode reward: [(0, '21.082')] -[2023-08-30 10:00:56,779][00929] Fps is (10 sec: 2456.6, 60 sec: 2935.5, 300 sec: 3026.8). Total num frames: 3293184. Throughput: 0: 702.9. Samples: 823922. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-08-30 10:00:56,782][00929] Avg episode reward: [(0, '21.854')] -[2023-08-30 10:01:01,775][00929] Fps is (10 sec: 1638.4, 60 sec: 2798.9, 300 sec: 3013.0). Total num frames: 3301376. Throughput: 0: 691.7. Samples: 825210. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-08-30 10:01:01,777][00929] Avg episode reward: [(0, '21.598')] -[2023-08-30 10:01:06,775][00929] Fps is (10 sec: 1639.0, 60 sec: 2594.1, 300 sec: 2999.1). Total num frames: 3309568. Throughput: 0: 664.3. Samples: 827958. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-08-30 10:01:06,778][00929] Avg episode reward: [(0, '21.764')] -[2023-08-30 10:01:09,096][08361] Updated weights for policy 0, policy_version 810 (0.0038) -[2023-08-30 10:01:11,775][00929] Fps is (10 sec: 2457.6, 60 sec: 2662.4, 300 sec: 2999.1). Total num frames: 3325952. Throughput: 0: 634.9. Samples: 832052. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-08-30 10:01:11,782][00929] Avg episode reward: [(0, '23.210')] -[2023-08-30 10:01:16,775][00929] Fps is (10 sec: 3276.9, 60 sec: 2730.7, 300 sec: 2985.2). Total num frames: 3342336. Throughput: 0: 636.4. Samples: 834690. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-08-30 10:01:16,777][00929] Avg episode reward: [(0, '22.533')] -[2023-08-30 10:01:21,769][08361] Updated weights for policy 0, policy_version 820 (0.0032) -[2023-08-30 10:01:21,777][00929] Fps is (10 sec: 3276.2, 60 sec: 2730.6, 300 sec: 2985.2). Total num frames: 3358720. Throughput: 0: 664.6. Samples: 839494. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-08-30 10:01:21,781][00929] Avg episode reward: [(0, '22.746')] -[2023-08-30 10:01:26,781][00929] Fps is (10 sec: 2456.2, 60 sec: 2593.9, 300 sec: 2971.3). Total num frames: 3366912. Throughput: 0: 658.0. Samples: 842928. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-08-30 10:01:26,793][00929] Avg episode reward: [(0, '23.046')] -[2023-08-30 10:01:31,775][00929] Fps is (10 sec: 2458.1, 60 sec: 2594.2, 300 sec: 2985.2). Total num frames: 3383296. Throughput: 0: 638.4. Samples: 844746. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-08-30 10:01:31,777][00929] Avg episode reward: [(0, '22.930')] -[2023-08-30 10:01:35,800][08361] Updated weights for policy 0, policy_version 830 (0.0043) -[2023-08-30 10:01:36,775][00929] Fps is (10 sec: 3278.7, 60 sec: 2730.7, 300 sec: 2971.3). Total num frames: 3399680. Throughput: 0: 641.4. Samples: 850060. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-08-30 10:01:36,783][00929] Avg episode reward: [(0, '23.925')] -[2023-08-30 10:01:41,775][00929] Fps is (10 sec: 3276.8, 60 sec: 2730.7, 300 sec: 2971.4). Total num frames: 3416064. Throughput: 0: 701.6. Samples: 855490. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-08-30 10:01:41,777][00929] Avg episode reward: [(0, '23.810')] -[2023-08-30 10:01:46,775][00929] Fps is (10 sec: 3276.8, 60 sec: 2730.7, 300 sec: 2985.2). Total num frames: 3432448. Throughput: 0: 713.2. Samples: 857306. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-08-30 10:01:46,780][00929] Avg episode reward: [(0, '23.476')] -[2023-08-30 10:01:49,859][08361] Updated weights for policy 0, policy_version 840 (0.0026) -[2023-08-30 10:01:51,775][00929] Fps is (10 sec: 2457.6, 60 sec: 2594.1, 300 sec: 2971.3). Total num frames: 3440640. Throughput: 0: 732.3. Samples: 860910. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-08-30 10:01:51,780][00929] Avg episode reward: [(0, '23.864')] -[2023-08-30 10:01:56,775][00929] Fps is (10 sec: 2867.2, 60 sec: 2799.1, 300 sec: 2971.3). Total num frames: 3461120. Throughput: 0: 756.2. Samples: 866080. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-08-30 10:01:56,777][00929] Avg episode reward: [(0, '26.185')] -[2023-08-30 10:01:56,779][08348] Saving new best policy, reward=26.185! -[2023-08-30 10:02:01,572][08361] Updated weights for policy 0, policy_version 850 (0.0015) -[2023-08-30 10:02:01,775][00929] Fps is (10 sec: 4096.0, 60 sec: 3003.7, 300 sec: 2971.3). Total num frames: 3481600. Throughput: 0: 760.6. Samples: 868918. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-08-30 10:02:01,777][00929] Avg episode reward: [(0, '25.201')] -[2023-08-30 10:02:01,789][08348] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000850_3481600.pth... -[2023-08-30 10:02:01,983][08348] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000674_2760704.pth -[2023-08-30 10:02:06,778][00929] Fps is (10 sec: 2866.4, 60 sec: 3003.6, 300 sec: 2971.4). Total num frames: 3489792. Throughput: 0: 749.5. Samples: 873224. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-08-30 10:02:06,783][00929] Avg episode reward: [(0, '25.638')] -[2023-08-30 10:02:11,780][00929] Fps is (10 sec: 2456.3, 60 sec: 3003.5, 300 sec: 2985.2). Total num frames: 3506176. Throughput: 0: 755.2. Samples: 876912. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-08-30 10:02:11,783][00929] Avg episode reward: [(0, '24.721')] -[2023-08-30 10:02:16,536][08361] Updated weights for policy 0, policy_version 860 (0.0016) -[2023-08-30 10:02:16,775][00929] Fps is (10 sec: 3277.7, 60 sec: 3003.7, 300 sec: 3013.0). Total num frames: 3522560. Throughput: 0: 763.1. Samples: 879084. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-08-30 10:02:16,782][00929] Avg episode reward: [(0, '24.812')] -[2023-08-30 10:02:21,775][00929] Fps is (10 sec: 3278.5, 60 sec: 3003.8, 300 sec: 3026.9). Total num frames: 3538944. Throughput: 0: 771.6. Samples: 884782. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-08-30 10:02:21,777][00929] Avg episode reward: [(0, '24.940')] -[2023-08-30 10:02:26,775][00929] Fps is (10 sec: 3276.7, 60 sec: 3140.6, 300 sec: 3013.0). Total num frames: 3555328. Throughput: 0: 754.6. Samples: 889448. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-08-30 10:02:26,778][00929] Avg episode reward: [(0, '23.564')] -[2023-08-30 10:02:29,767][08361] Updated weights for policy 0, policy_version 870 (0.0023) -[2023-08-30 10:02:31,775][00929] Fps is (10 sec: 2867.3, 60 sec: 3072.0, 300 sec: 2999.1). Total num frames: 3567616. Throughput: 0: 755.5. Samples: 891304. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-08-30 10:02:31,777][00929] Avg episode reward: [(0, '22.765')] -[2023-08-30 10:02:36,775][00929] Fps is (10 sec: 2457.7, 60 sec: 3003.7, 300 sec: 2999.1). Total num frames: 3579904. Throughput: 0: 758.7. Samples: 895052. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-08-30 10:02:36,780][00929] Avg episode reward: [(0, '23.878')] -[2023-08-30 10:02:41,775][00929] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 3026.9). Total num frames: 3600384. Throughput: 0: 769.9. Samples: 900724. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-08-30 10:02:41,780][00929] Avg episode reward: [(0, '23.860')] -[2023-08-30 10:02:42,507][08361] Updated weights for policy 0, policy_version 880 (0.0025) -[2023-08-30 10:02:46,779][00929] Fps is (10 sec: 3685.0, 60 sec: 3071.8, 300 sec: 3013.0). Total num frames: 3616768. Throughput: 0: 770.1. Samples: 903574. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-08-30 10:02:46,781][00929] Avg episode reward: [(0, '25.106')] -[2023-08-30 10:02:51,777][00929] Fps is (10 sec: 2866.5, 60 sec: 3140.1, 300 sec: 2985.2). Total num frames: 3629056. Throughput: 0: 754.9. Samples: 907194. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-08-30 10:02:51,780][00929] Avg episode reward: [(0, '25.137')] -[2023-08-30 10:02:56,775][00929] Fps is (10 sec: 2458.6, 60 sec: 3003.7, 300 sec: 2985.2). Total num frames: 3641344. Throughput: 0: 754.0. Samples: 910840. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-08-30 10:02:56,783][00929] Avg episode reward: [(0, '25.695')] -[2023-08-30 10:02:57,422][08361] Updated weights for policy 0, policy_version 890 (0.0043) -[2023-08-30 10:03:01,775][00929] Fps is (10 sec: 2867.9, 60 sec: 2935.5, 300 sec: 2999.1). Total num frames: 3657728. Throughput: 0: 769.2. Samples: 913698. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2023-08-30 10:03:01,780][00929] Avg episode reward: [(0, '25.637')] -[2023-08-30 10:03:06,775][00929] Fps is (10 sec: 3686.4, 60 sec: 3140.4, 300 sec: 3013.0). Total num frames: 3678208. Throughput: 0: 770.8. Samples: 919468. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-08-30 10:03:06,780][00929] Avg episode reward: [(0, '26.118')] -[2023-08-30 10:03:09,440][08361] Updated weights for policy 0, policy_version 900 (0.0013) -[2023-08-30 10:03:11,775][00929] Fps is (10 sec: 3276.8, 60 sec: 3072.3, 300 sec: 2985.2). Total num frames: 3690496. Throughput: 0: 754.5. Samples: 923400. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-08-30 10:03:11,777][00929] Avg episode reward: [(0, '25.551')] -[2023-08-30 10:03:16,775][00929] Fps is (10 sec: 2457.5, 60 sec: 3003.7, 300 sec: 2971.3). Total num frames: 3702784. Throughput: 0: 754.3. Samples: 925250. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-08-30 10:03:16,783][00929] Avg episode reward: [(0, '24.363')] -[2023-08-30 10:03:21,775][00929] Fps is (10 sec: 2867.2, 60 sec: 3003.8, 300 sec: 2999.1). Total num frames: 3719168. Throughput: 0: 769.5. Samples: 929680. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2023-08-30 10:03:21,778][00929] Avg episode reward: [(0, '24.707')]
-[2023-08-30 10:03:23,137][08361] Updated weights for policy 0, policy_version 910 (0.0014)
-[2023-08-30 10:03:26,775][00929] Fps is (10 sec: 3686.5, 60 sec: 3072.0, 300 sec: 3013.0). Total num frames: 3739648. Throughput: 0: 769.9. Samples: 935368. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2023-08-30 10:03:26,780][00929] Avg episode reward: [(0, '25.228')]
-[2023-08-30 10:03:31,776][00929] Fps is (10 sec: 3276.6, 60 sec: 3072.0, 300 sec: 2985.2). Total num frames: 3751936. Throughput: 0: 758.5. Samples: 937704. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2023-08-30 10:03:31,778][00929] Avg episode reward: [(0, '24.362')]
-[2023-08-30 10:03:36,776][00929] Fps is (10 sec: 2457.4, 60 sec: 3072.0, 300 sec: 2971.3). Total num frames: 3764224. Throughput: 0: 758.9. Samples: 941344. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2023-08-30 10:03:36,780][00929] Avg episode reward: [(0, '25.041')]
-[2023-08-30 10:03:37,596][08361] Updated weights for policy 0, policy_version 920 (0.0020)
-[2023-08-30 10:03:41,775][00929] Fps is (10 sec: 2457.8, 60 sec: 2935.5, 300 sec: 2971.3). Total num frames: 3776512. Throughput: 0: 765.5. Samples: 945288. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2023-08-30 10:03:41,783][00929] Avg episode reward: [(0, '24.240')]
-[2023-08-30 10:03:46,775][00929] Fps is (10 sec: 3277.0, 60 sec: 3003.9, 300 sec: 2999.1). Total num frames: 3796992. Throughput: 0: 763.1. Samples: 948038. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2023-08-30 10:03:46,782][00929] Avg episode reward: [(0, '25.614')]
-[2023-08-30 10:03:49,535][08361] Updated weights for policy 0, policy_version 930 (0.0021)
-[2023-08-30 10:03:51,780][00929] Fps is (10 sec: 3684.6, 60 sec: 3071.9, 300 sec: 2985.2). Total num frames: 3813376. Throughput: 0: 754.5. Samples: 953422. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-08-30 10:03:51,785][00929] Avg episode reward: [(0, '25.186')]
-[2023-08-30 10:03:56,775][00929] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 2971.3). Total num frames: 3825664. Throughput: 0: 745.2. Samples: 956934. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2023-08-30 10:03:56,777][00929] Avg episode reward: [(0, '24.931')]
-[2023-08-30 10:04:01,777][00929] Fps is (10 sec: 2458.3, 60 sec: 3003.6, 300 sec: 2957.4). Total num frames: 3837952. Throughput: 0: 741.8. Samples: 958630. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2023-08-30 10:04:01,782][00929] Avg episode reward: [(0, '25.353')]
-[2023-08-30 10:04:01,794][08348] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000937_3837952.pth...
-[2023-08-30 10:04:01,944][08348] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000765_3133440.pth
-[2023-08-30 10:04:04,927][08361] Updated weights for policy 0, policy_version 940 (0.0015)
-[2023-08-30 10:04:06,775][00929] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 2971.3). Total num frames: 3854336. Throughput: 0: 753.4. Samples: 963582. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-08-30 10:04:06,778][00929] Avg episode reward: [(0, '24.709')]
-[2023-08-30 10:04:11,784][00929] Fps is (10 sec: 3683.8, 60 sec: 3071.5, 300 sec: 2985.1). Total num frames: 3874816. Throughput: 0: 749.9. Samples: 969120. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2023-08-30 10:04:11,787][00929] Avg episode reward: [(0, '23.791')]
-[2023-08-30 10:04:16,779][00929] Fps is (10 sec: 2866.1, 60 sec: 3003.6, 300 sec: 2957.4). Total num frames: 3883008. Throughput: 0: 738.5. Samples: 970940. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2023-08-30 10:04:16,789][00929] Avg episode reward: [(0, '23.441')]
-[2023-08-30 10:04:18,662][08361] Updated weights for policy 0, policy_version 950 (0.0025)
-[2023-08-30 10:04:21,775][00929] Fps is (10 sec: 2459.7, 60 sec: 3003.7, 300 sec: 2957.4). Total num frames: 3899392. Throughput: 0: 738.2. Samples: 974562. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
-[2023-08-30 10:04:21,786][00929] Avg episode reward: [(0, '23.148')]
-[2023-08-30 10:04:26,775][00929] Fps is (10 sec: 3278.1, 60 sec: 2935.5, 300 sec: 2971.3). Total num frames: 3915776. Throughput: 0: 756.7. Samples: 979338. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-08-30 10:04:26,784][00929] Avg episode reward: [(0, '23.885')]
-[2023-08-30 10:04:30,897][08361] Updated weights for policy 0, policy_version 960 (0.0019)
-[2023-08-30 10:04:31,775][00929] Fps is (10 sec: 3276.9, 60 sec: 3003.8, 300 sec: 2985.2). Total num frames: 3932160. Throughput: 0: 759.7. Samples: 982224. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-08-30 10:04:31,784][00929] Avg episode reward: [(0, '25.048')]
-[2023-08-30 10:04:36,775][00929] Fps is (10 sec: 2867.2, 60 sec: 3003.8, 300 sec: 2957.5). Total num frames: 3944448. Throughput: 0: 733.8. Samples: 986438. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-08-30 10:04:36,782][00929] Avg episode reward: [(0, '24.974')]
-[2023-08-30 10:04:41,778][00929] Fps is (10 sec: 2047.4, 60 sec: 2935.3, 300 sec: 2915.8). Total num frames: 3952640. Throughput: 0: 715.5. Samples: 989134. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2023-08-30 10:04:41,788][00929] Avg episode reward: [(0, '24.900')]
-[2023-08-30 10:04:46,775][00929] Fps is (10 sec: 1638.4, 60 sec: 2730.7, 300 sec: 2902.0). Total num frames: 3960832. Throughput: 0: 708.3. Samples: 990504. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2023-08-30 10:04:46,780][00929] Avg episode reward: [(0, '25.070')]
-[2023-08-30 10:04:50,775][08361] Updated weights for policy 0, policy_version 970 (0.0042)
-[2023-08-30 10:04:51,775][00929] Fps is (10 sec: 2048.5, 60 sec: 2662.6, 300 sec: 2901.9). Total num frames: 3973120. Throughput: 0: 667.0. Samples: 993598. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-08-30 10:04:51,784][00929] Avg episode reward: [(0, '24.310')]
-[2023-08-30 10:04:56,775][00929] Fps is (10 sec: 3276.8, 60 sec: 2798.9, 300 sec: 2915.8). Total num frames: 3993600. Throughput: 0: 669.7. Samples: 999250. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-08-30 10:04:56,777][00929] Avg episode reward: [(0, '25.254')]
-[2023-08-30 10:04:59,316][08348] Stopping Batcher_0...
-[2023-08-30 10:04:59,317][08348] Loop batcher_evt_loop terminating...
-[2023-08-30 10:04:59,317][00929] Component Batcher_0 stopped!
-[2023-08-30 10:04:59,326][08348] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2023-08-30 10:04:59,396][00929] Component RolloutWorker_w4 stopped!
-[2023-08-30 10:04:59,405][08366] Stopping RolloutWorker_w4...
-[2023-08-30 10:04:59,405][08366] Loop rollout_proc4_evt_loop terminating...
-[2023-08-30 10:04:59,397][08361] Weights refcount: 2 0
-[2023-08-30 10:04:59,414][08361] Stopping InferenceWorker_p0-w0...
-[2023-08-30 10:04:59,414][00929] Component InferenceWorker_p0-w0 stopped!
-[2023-08-30 10:04:59,433][00929] Component RolloutWorker_w6 stopped!
-[2023-08-30 10:04:59,435][00929] Component RolloutWorker_w3 stopped!
-[2023-08-30 10:04:59,439][08365] Stopping RolloutWorker_w3...
-[2023-08-30 10:04:59,440][08365] Loop rollout_proc3_evt_loop terminating...
-[2023-08-30 10:04:59,446][00929] Component RolloutWorker_w1 stopped!
-[2023-08-30 10:04:59,447][08363] Stopping RolloutWorker_w1...
-[2023-08-30 10:04:59,451][08361] Loop inference_proc0-0_evt_loop terminating...
-[2023-08-30 10:04:59,455][08363] Loop rollout_proc1_evt_loop terminating...
-[2023-08-30 10:04:59,433][08368] Stopping RolloutWorker_w6...
-[2023-08-30 10:04:59,457][08368] Loop rollout_proc6_evt_loop terminating...
-[2023-08-30 10:04:59,457][00929] Component RolloutWorker_w5 stopped!
-[2023-08-30 10:04:59,459][08367] Stopping RolloutWorker_w5...
-[2023-08-30 10:04:59,463][00929] Component RolloutWorker_w7 stopped!
-[2023-08-30 10:04:59,465][08369] Stopping RolloutWorker_w7...
-[2023-08-30 10:04:59,459][08367] Loop rollout_proc5_evt_loop terminating...
-[2023-08-30 10:04:59,467][08369] Loop rollout_proc7_evt_loop terminating...
-[2023-08-30 10:04:59,477][08364] Stopping RolloutWorker_w2...
-[2023-08-30 10:04:59,477][00929] Component RolloutWorker_w2 stopped!
-[2023-08-30 10:04:59,482][08364] Loop rollout_proc2_evt_loop terminating...
-[2023-08-30 10:04:59,522][08362] Stopping RolloutWorker_w0...
-[2023-08-30 10:04:59,523][08362] Loop rollout_proc0_evt_loop terminating...
-[2023-08-30 10:04:59,522][00929] Component RolloutWorker_w0 stopped!
-[2023-08-30 10:04:59,548][08348] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000850_3481600.pth
-[2023-08-30 10:04:59,562][08348] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2023-08-30 10:04:59,747][08348] Stopping LearnerWorker_p0...
-[2023-08-30 10:04:59,747][00929] Component LearnerWorker_p0 stopped!
-[2023-08-30 10:04:59,748][08348] Loop learner_proc0_evt_loop terminating...
-[2023-08-30 10:04:59,749][00929] Waiting for process learner_proc0 to stop...
-[2023-08-30 10:05:02,005][00929] Waiting for process inference_proc0-0 to join...
-[2023-08-30 10:05:02,008][00929] Waiting for process rollout_proc0 to join...
-[2023-08-30 10:05:04,717][00929] Waiting for process rollout_proc1 to join...
-[2023-08-30 10:05:04,720][00929] Waiting for process rollout_proc2 to join...
-[2023-08-30 10:05:04,721][00929] Waiting for process rollout_proc3 to join...
-[2023-08-30 10:05:04,723][00929] Waiting for process rollout_proc4 to join...
-[2023-08-30 10:05:04,725][00929] Waiting for process rollout_proc5 to join...
-[2023-08-30 10:05:04,727][00929] Waiting for process rollout_proc6 to join...
-[2023-08-30 10:05:04,728][00929] Waiting for process rollout_proc7 to join...
-[2023-08-30 10:05:04,729][00929] Batcher 0 profile tree view:
-batching: 30.0293, releasing_batches: 0.0235
-[2023-08-30 10:05:04,736][00929] InferenceWorker_p0-w0 profile tree view:
-wait_policy: 0.0000
-  wait_policy_total: 599.3278
-update_model: 8.9797
-  weight_update: 0.0040
-one_step: 0.0027
-  handle_policy_step: 680.8084
-    deserialize: 17.8320, stack: 3.3653, obs_to_device_normalize: 129.3780, forward: 377.1985, send_messages: 32.1604
-    prepare_outputs: 88.6094
-      to_cpu: 50.6414
-[2023-08-30 10:05:04,737][00929] Learner 0 profile tree view:
-misc: 0.0059, prepare_batch: 19.9335
-train: 77.3170
-  epoch_init: 0.0083, minibatch_init: 0.0135, losses_postprocess: 0.6532, kl_divergence: 0.6640, after_optimizer: 4.2534
-  calculate_losses: 26.5385
-    losses_init: 0.0041, forward_head: 1.4542, bptt_initial: 17.1643, tail: 1.2550, advantages_returns: 0.3518, losses: 3.6558
-    bptt: 2.3040
-      bptt_forward_core: 2.1865
-  update: 44.5742
-    clip: 32.7087
-[2023-08-30 10:05:04,738][00929] RolloutWorker_w0 profile tree view:
-wait_for_trajectories: 0.3928, enqueue_policy_requests: 176.7038, env_step: 1003.2468, overhead: 28.5822, complete_rollouts: 8.6126
-save_policy_outputs: 27.1381
-  split_output_tensors: 12.6970
-[2023-08-30 10:05:04,740][00929] RolloutWorker_w7 profile tree view:
-wait_for_trajectories: 0.3993, enqueue_policy_requests: 186.2463, env_step: 996.2913, overhead: 27.8348, complete_rollouts: 8.0003
-save_policy_outputs: 25.0977
-  split_output_tensors: 11.7235
-[2023-08-30 10:05:04,741][00929] Loop Runner_EvtLoop terminating...
-[2023-08-30 10:05:04,742][00929] Runner profile tree view:
-main_loop: 1377.1708
-[2023-08-30 10:05:04,744][00929] Collected {0: 4005888}, FPS: 2908.8
-[2023-08-30 10:05:15,846][00929] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
-[2023-08-30 10:05:15,847][00929] Overriding arg 'num_workers' with value 1 passed from command line
-[2023-08-30 10:05:15,851][00929] Adding new argument 'no_render'=True that is not in the saved config file!
-[2023-08-30 10:05:15,854][00929] Adding new argument 'save_video'=True that is not in the saved config file!
-[2023-08-30 10:05:15,855][00929] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2023-08-30 10:05:15,857][00929] Adding new argument 'video_name'=None that is not in the saved config file!
-[2023-08-30 10:05:15,858][00929] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
-[2023-08-30 10:05:15,860][00929] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2023-08-30 10:05:15,861][00929] Adding new argument 'push_to_hub'=False that is not in the saved config file!
-[2023-08-30 10:05:15,863][00929] Adding new argument 'hf_repository'=None that is not in the saved config file!
-[2023-08-30 10:05:15,873][00929] Adding new argument 'policy_index'=0 that is not in the saved config file!
-[2023-08-30 10:05:15,875][00929] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
-[2023-08-30 10:05:15,876][00929] Adding new argument 'train_script'=None that is not in the saved config file!
-[2023-08-30 10:05:15,877][00929] Adding new argument 'enjoy_script'=None that is not in the saved config file!
-[2023-08-30 10:05:15,878][00929] Using frameskip 1 and render_action_repeat=4 for evaluation
-[2023-08-30 10:05:15,910][00929] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-08-30 10:05:15,916][00929] RunningMeanStd input shape: (3, 72, 128)
-[2023-08-30 10:05:15,919][00929] RunningMeanStd input shape: (1,)
-[2023-08-30 10:05:15,935][00929] ConvEncoder: input_channels=3
-[2023-08-30 10:05:16,057][00929] Conv encoder output size: 512
-[2023-08-30 10:05:16,059][00929] Policy head output size: 512
-[2023-08-30 10:05:19,349][00929] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2023-08-30 10:05:20,857][00929] Num frames 100...
-[2023-08-30 10:05:20,985][00929] Num frames 200...
-[2023-08-30 10:05:21,119][00929] Num frames 300...
-[2023-08-30 10:05:21,254][00929] Num frames 400...
-[2023-08-30 10:05:21,386][00929] Num frames 500...
-[2023-08-30 10:05:21,511][00929] Num frames 600...
-[2023-08-30 10:05:21,639][00929] Num frames 700...
-[2023-08-30 10:05:21,770][00929] Num frames 800...
-[2023-08-30 10:05:21,906][00929] Num frames 900...
-[2023-08-30 10:05:22,040][00929] Num frames 1000...
-[2023-08-30 10:05:22,173][00929] Num frames 1100...
-[2023-08-30 10:05:22,303][00929] Num frames 1200...
-[2023-08-30 10:05:22,430][00929] Num frames 1300...
-[2023-08-30 10:05:22,559][00929] Num frames 1400...
-[2023-08-30 10:05:22,691][00929] Num frames 1500...
-[2023-08-30 10:05:22,753][00929] Avg episode rewards: #0: 33.040, true rewards: #0: 15.040
-[2023-08-30 10:05:22,755][00929] Avg episode reward: 33.040, avg true_objective: 15.040
-[2023-08-30 10:05:22,884][00929] Num frames 1600...
-[2023-08-30 10:05:23,010][00929] Num frames 1700...
-[2023-08-30 10:05:23,141][00929] Num frames 1800...
-[2023-08-30 10:05:23,273][00929] Num frames 1900...
-[2023-08-30 10:05:23,404][00929] Num frames 2000...
-[2023-08-30 10:05:23,542][00929] Num frames 2100...
-[2023-08-30 10:05:23,665][00929] Num frames 2200...
-[2023-08-30 10:05:23,792][00929] Num frames 2300...
-[2023-08-30 10:05:23,918][00929] Num frames 2400...
-[2023-08-30 10:05:24,057][00929] Avg episode rewards: #0: 27.820, true rewards: #0: 12.320
-[2023-08-30 10:05:24,059][00929] Avg episode reward: 27.820, avg true_objective: 12.320
-[2023-08-30 10:05:24,123][00929] Num frames 2500...
-[2023-08-30 10:05:24,266][00929] Num frames 2600...
-[2023-08-30 10:05:24,409][00929] Num frames 2700...
-[2023-08-30 10:05:24,546][00929] Num frames 2800...
-[2023-08-30 10:05:24,686][00929] Num frames 2900...
-[2023-08-30 10:05:24,823][00929] Num frames 3000...
-[2023-08-30 10:05:24,967][00929] Num frames 3100...
-[2023-08-30 10:05:25,101][00929] Num frames 3200...
-[2023-08-30 10:05:25,234][00929] Num frames 3300...
-[2023-08-30 10:05:25,368][00929] Num frames 3400...
-[2023-08-30 10:05:25,502][00929] Num frames 3500...
-[2023-08-30 10:05:25,631][00929] Num frames 3600...
-[2023-08-30 10:05:25,761][00929] Num frames 3700...
-[2023-08-30 10:05:25,888][00929] Num frames 3800...
-[2023-08-30 10:05:26,019][00929] Num frames 3900...
-[2023-08-30 10:05:26,156][00929] Num frames 4000...
-[2023-08-30 10:05:26,295][00929] Avg episode rewards: #0: 34.213, true rewards: #0: 13.547
-[2023-08-30 10:05:26,296][00929] Avg episode reward: 34.213, avg true_objective: 13.547
-[2023-08-30 10:05:26,351][00929] Num frames 4100...
-[2023-08-30 10:05:26,492][00929] Num frames 4200...
-[2023-08-30 10:05:26,624][00929] Num frames 4300...
-[2023-08-30 10:05:26,764][00929] Num frames 4400...
-[2023-08-30 10:05:26,895][00929] Num frames 4500...
-[2023-08-30 10:05:27,024][00929] Num frames 4600...
-[2023-08-30 10:05:27,158][00929] Num frames 4700...
-[2023-08-30 10:05:27,290][00929] Num frames 4800...
-[2023-08-30 10:05:27,433][00929] Num frames 4900...
-[2023-08-30 10:05:27,569][00929] Num frames 5000...
-[2023-08-30 10:05:27,708][00929] Num frames 5100...
-[2023-08-30 10:05:27,800][00929] Avg episode rewards: #0: 31.570, true rewards: #0: 12.820
-[2023-08-30 10:05:27,802][00929] Avg episode reward: 31.570, avg true_objective: 12.820
-[2023-08-30 10:05:27,901][00929] Num frames 5200...
-[2023-08-30 10:05:28,030][00929] Num frames 5300...
-[2023-08-30 10:05:28,166][00929] Num frames 5400...
-[2023-08-30 10:05:28,305][00929] Num frames 5500...
-[2023-08-30 10:05:28,436][00929] Num frames 5600...
-[2023-08-30 10:05:28,567][00929] Num frames 5700...
-[2023-08-30 10:05:28,697][00929] Num frames 5800...
-[2023-08-30 10:05:28,825][00929] Num frames 5900...
-[2023-08-30 10:05:28,958][00929] Num frames 6000...
-[2023-08-30 10:05:29,084][00929] Num frames 6100...
-[2023-08-30 10:05:29,226][00929] Num frames 6200...
-[2023-08-30 10:05:29,353][00929] Num frames 6300...
-[2023-08-30 10:05:29,488][00929] Num frames 6400...
-[2023-08-30 10:05:29,618][00929] Num frames 6500...
-[2023-08-30 10:05:29,700][00929] Avg episode rewards: #0: 32.038, true rewards: #0: 13.038
-[2023-08-30 10:05:29,702][00929] Avg episode reward: 32.038, avg true_objective: 13.038
-[2023-08-30 10:05:29,825][00929] Num frames 6600...
-[2023-08-30 10:05:29,958][00929] Num frames 6700...
-[2023-08-30 10:05:30,158][00929] Num frames 6800...
-[2023-08-30 10:05:30,352][00929] Num frames 6900...
-[2023-08-30 10:05:30,541][00929] Num frames 7000...
-[2023-08-30 10:05:30,726][00929] Num frames 7100...
-[2023-08-30 10:05:30,907][00929] Num frames 7200...
-[2023-08-30 10:05:31,090][00929] Num frames 7300...
-[2023-08-30 10:05:31,278][00929] Num frames 7400...
-[2023-08-30 10:05:31,470][00929] Num frames 7500...
-[2023-08-30 10:05:31,668][00929] Num frames 7600...
-[2023-08-30 10:05:31,742][00929] Avg episode rewards: #0: 30.175, true rewards: #0: 12.675
-[2023-08-30 10:05:31,744][00929] Avg episode reward: 30.175, avg true_objective: 12.675
-[2023-08-30 10:05:31,945][00929] Num frames 7700...
-[2023-08-30 10:05:32,133][00929] Num frames 7800...
-[2023-08-30 10:05:32,330][00929] Num frames 7900...
-[2023-08-30 10:05:32,524][00929] Num frames 8000...
-[2023-08-30 10:05:32,718][00929] Num frames 8100...
-[2023-08-30 10:05:32,905][00929] Num frames 8200...
-[2023-08-30 10:05:33,095][00929] Num frames 8300...
-[2023-08-30 10:05:33,288][00929] Num frames 8400...
-[2023-08-30 10:05:33,485][00929] Num frames 8500...
-[2023-08-30 10:05:33,672][00929] Num frames 8600...
-[2023-08-30 10:05:33,862][00929] Num frames 8700...
-[2023-08-30 10:05:33,971][00929] Avg episode rewards: #0: 29.321, true rewards: #0: 12.464
-[2023-08-30 10:05:33,973][00929] Avg episode reward: 29.321, avg true_objective: 12.464
-[2023-08-30 10:05:34,077][00929] Num frames 8800...
-[2023-08-30 10:05:34,217][00929] Num frames 8900...
-[2023-08-30 10:05:34,367][00929] Num frames 9000...
-[2023-08-30 10:05:34,506][00929] Num frames 9100...
-[2023-08-30 10:05:34,642][00929] Num frames 9200...
-[2023-08-30 10:05:34,769][00929] Num frames 9300...
-[2023-08-30 10:05:34,898][00929] Num frames 9400...
-[2023-08-30 10:05:35,028][00929] Num frames 9500...
-[2023-08-30 10:05:35,157][00929] Avg episode rewards: #0: 27.696, true rewards: #0: 11.946
-[2023-08-30 10:05:35,159][00929] Avg episode reward: 27.696, avg true_objective: 11.946
-[2023-08-30 10:05:35,218][00929] Num frames 9600...
-[2023-08-30 10:05:35,350][00929] Num frames 9700...
-[2023-08-30 10:05:35,495][00929] Num frames 9800...
-[2023-08-30 10:05:35,627][00929] Num frames 9900...
-[2023-08-30 10:05:35,757][00929] Num frames 10000...
-[2023-08-30 10:05:35,885][00929] Num frames 10100...
-[2023-08-30 10:05:36,018][00929] Num frames 10200...
-[2023-08-30 10:05:36,195][00929] Avg episode rewards: #0: 25.992, true rewards: #0: 11.437
-[2023-08-30 10:05:36,197][00929] Avg episode reward: 25.992, avg true_objective: 11.437
-[2023-08-30 10:05:36,210][00929] Num frames 10300...
-[2023-08-30 10:05:36,341][00929] Num frames 10400...
-[2023-08-30 10:05:36,476][00929] Num frames 10500...
-[2023-08-30 10:05:36,604][00929] Num frames 10600...
-[2023-08-30 10:05:36,734][00929] Num frames 10700...
-[2023-08-30 10:05:36,862][00929] Num frames 10800...
-[2023-08-30 10:05:36,991][00929] Num frames 10900...
-[2023-08-30 10:05:37,120][00929] Num frames 11000...
-[2023-08-30 10:05:37,252][00929] Num frames 11100...
-[2023-08-30 10:05:37,383][00929] Num frames 11200...
-[2023-08-30 10:05:37,519][00929] Num frames 11300...
-[2023-08-30 10:05:37,600][00929] Avg episode rewards: #0: 25.517, true rewards: #0: 11.317
-[2023-08-30 10:05:37,602][00929] Avg episode reward: 25.517, avg true_objective: 11.317
-[2023-08-30 10:06:55,782][00929] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
-[2023-08-30 10:11:32,476][00929] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
-[2023-08-30 10:11:32,477][00929] Overriding arg 'num_workers' with value 1 passed from command line
-[2023-08-30 10:11:32,479][00929] Adding new argument 'no_render'=True that is not in the saved config file!
-[2023-08-30 10:11:32,482][00929] Adding new argument 'save_video'=True that is not in the saved config file!
-[2023-08-30 10:11:32,484][00929] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2023-08-30 10:11:32,486][00929] Adding new argument 'video_name'=None that is not in the saved config file!
-[2023-08-30 10:11:32,492][00929] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
-[2023-08-30 10:11:32,493][00929] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2023-08-30 10:11:32,494][00929] Adding new argument 'push_to_hub'=True that is not in the saved config file!
-[2023-08-30 10:11:32,495][00929] Adding new argument 'hf_repository'='AdanLee/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
-[2023-08-30 10:11:32,496][00929] Adding new argument 'policy_index'=0 that is not in the saved config file!
-[2023-08-30 10:11:32,497][00929] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
-[2023-08-30 10:11:32,499][00929] Adding new argument 'train_script'=None that is not in the saved config file!
-[2023-08-30 10:11:32,500][00929] Adding new argument 'enjoy_script'=None that is not in the saved config file!
-[2023-08-30 10:11:32,501][00929] Using frameskip 1 and render_action_repeat=4 for evaluation
-[2023-08-30 10:11:32,535][00929] RunningMeanStd input shape: (3, 72, 128)
-[2023-08-30 10:11:32,536][00929] RunningMeanStd input shape: (1,)
-[2023-08-30 10:11:32,550][00929] ConvEncoder: input_channels=3
-[2023-08-30 10:11:32,586][00929] Conv encoder output size: 512
-[2023-08-30 10:11:32,588][00929] Policy head output size: 512
-[2023-08-30 10:11:32,607][00929] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2023-08-30 10:11:33,069][00929] Num frames 100...
-[2023-08-30 10:11:33,210][00929] Num frames 200...
-[2023-08-30 10:11:33,348][00929] Num frames 300...
-[2023-08-30 10:11:33,474][00929] Num frames 400...
-[2023-08-30 10:11:33,598][00929] Num frames 500...
-[2023-08-30 10:11:33,730][00929] Num frames 600...
-[2023-08-30 10:11:33,873][00929] Num frames 700...
-[2023-08-30 10:11:34,007][00929] Num frames 800...
-[2023-08-30 10:11:34,136][00929] Num frames 900...
-[2023-08-30 10:11:34,268][00929] Num frames 1000...
-[2023-08-30 10:11:34,405][00929] Num frames 1100...
-[2023-08-30 10:11:34,540][00929] Num frames 1200...
-[2023-08-30 10:11:34,676][00929] Num frames 1300...
-[2023-08-30 10:11:34,807][00929] Num frames 1400...
-[2023-08-30 10:11:34,938][00929] Num frames 1500...
-[2023-08-30 10:11:35,067][00929] Num frames 1600...
-[2023-08-30 10:11:35,205][00929] Num frames 1700...
-[2023-08-30 10:11:35,335][00929] Num frames 1800...
-[2023-08-30 10:11:35,470][00929] Avg episode rewards: #0: 46.559, true rewards: #0: 18.560
-[2023-08-30 10:11:35,472][00929] Avg episode reward: 46.559, avg true_objective: 18.560
-[2023-08-30 10:11:35,534][00929] Num frames 1900...
-[2023-08-30 10:11:35,667][00929] Num frames 2000...
-[2023-08-30 10:11:35,799][00929] Num frames 2100...
-[2023-08-30 10:11:35,936][00929] Num frames 2200...
-[2023-08-30 10:11:36,064][00929] Num frames 2300...
-[2023-08-30 10:11:36,258][00929] Num frames 2400...
-[2023-08-30 10:11:36,470][00929] Num frames 2500...
-[2023-08-30 10:11:36,659][00929] Num frames 2600...
-[2023-08-30 10:11:36,861][00929] Num frames 2700...
-[2023-08-30 10:11:37,058][00929] Num frames 2800...
-[2023-08-30 10:11:37,251][00929] Num frames 2900...
-[2023-08-30 10:11:37,454][00929] Num frames 3000...
-[2023-08-30 10:11:37,636][00929] Num frames 3100...
-[2023-08-30 10:11:37,852][00929] Num frames 3200...
-[2023-08-30 10:11:38,044][00929] Num frames 3300...
-[2023-08-30 10:11:38,235][00929] Num frames 3400...
-[2023-08-30 10:11:38,430][00929] Num frames 3500...
-[2023-08-30 10:11:38,605][00929] Avg episode rewards: #0: 43.260, true rewards: #0: 17.760
-[2023-08-30 10:11:38,608][00929] Avg episode reward: 43.260, avg true_objective: 17.760
-[2023-08-30 10:11:38,711][00929] Num frames 3600...
-[2023-08-30 10:11:38,903][00929] Num frames 3700...
-[2023-08-30 10:11:39,109][00929] Num frames 3800...
-[2023-08-30 10:11:39,298][00929] Num frames 3900...
-[2023-08-30 10:11:39,493][00929] Num frames 4000...
-[2023-08-30 10:11:39,692][00929] Num frames 4100...
-[2023-08-30 10:11:39,885][00929] Num frames 4200...
-[2023-08-30 10:11:40,115][00929] Avg episode rewards: #0: 33.626, true rewards: #0: 14.293
-[2023-08-30 10:11:40,117][00929] Avg episode reward: 33.626, avg true_objective: 14.293
-[2023-08-30 10:11:40,147][00929] Num frames 4300...
-[2023-08-30 10:11:40,330][00929] Num frames 4400...
-[2023-08-30 10:11:40,460][00929] Num frames 4500...
-[2023-08-30 10:11:40,595][00929] Num frames 4600...
-[2023-08-30 10:11:40,725][00929] Num frames 4700...
-[2023-08-30 10:11:40,862][00929] Num frames 4800...
-[2023-08-30 10:11:40,995][00929] Num frames 4900...
-[2023-08-30 10:11:41,123][00929] Num frames 5000...
-[2023-08-30 10:11:41,260][00929] Num frames 5100...
-[2023-08-30 10:11:41,393][00929] Num frames 5200...
-[2023-08-30 10:11:41,528][00929] Num frames 5300...
-[2023-08-30 10:11:41,696][00929] Num frames 5400...
-[2023-08-30 10:11:41,836][00929] Num frames 5500...
-[2023-08-30 10:11:41,968][00929] Num frames 5600...
-[2023-08-30 10:11:42,103][00929] Num frames 5700...
-[2023-08-30 10:11:42,155][00929] Avg episode rewards: #0: 34.250, true rewards: #0: 14.250
-[2023-08-30 10:11:42,156][00929] Avg episode reward: 34.250, avg true_objective: 14.250
-[2023-08-30 10:11:42,295][00929] Num frames 5800...
-[2023-08-30 10:11:42,428][00929] Num frames 5900...
-[2023-08-30 10:11:42,556][00929] Num frames 6000...
-[2023-08-30 10:11:42,693][00929] Num frames 6100...
-[2023-08-30 10:11:42,825][00929] Num frames 6200...
-[2023-08-30 10:11:42,958][00929] Num frames 6300...
-[2023-08-30 10:11:43,094][00929] Num frames 6400...
-[2023-08-30 10:11:43,225][00929] Num frames 6500...
-[2023-08-30 10:11:43,370][00929] Num frames 6600...
-[2023-08-30 10:11:43,501][00929] Num frames 6700...
-[2023-08-30 10:11:43,640][00929] Num frames 6800...
-[2023-08-30 10:11:43,774][00929] Num frames 6900...
-[2023-08-30 10:11:43,905][00929] Num frames 7000...
-[2023-08-30 10:11:43,987][00929] Avg episode rewards: #0: 34.038, true rewards: #0: 14.038
-[2023-08-30 10:11:43,989][00929] Avg episode reward: 34.038, avg true_objective: 14.038
-[2023-08-30 10:11:44,112][00929] Num frames 7100...
-[2023-08-30 10:11:44,272][00929] Num frames 7200...
-[2023-08-30 10:11:44,430][00929] Avg episode rewards: #0: 29.125, true rewards: #0: 12.125
-[2023-08-30 10:11:44,432][00929] Avg episode reward: 29.125, avg true_objective: 12.125
-[2023-08-30 10:11:44,470][00929] Num frames 7300...
-[2023-08-30 10:11:44,613][00929] Num frames 7400...
-[2023-08-30 10:11:44,764][00929] Num frames 7500...
-[2023-08-30 10:11:44,895][00929] Num frames 7600...
-[2023-08-30 10:11:45,027][00929] Num frames 7700...
-[2023-08-30 10:11:45,151][00929] Num frames 7800...
-[2023-08-30 10:11:45,233][00929] Avg episode rewards: #0: 26.170, true rewards: #0: 11.170
-[2023-08-30 10:11:45,234][00929] Avg episode reward: 26.170, avg true_objective: 11.170
-[2023-08-30 10:11:45,349][00929] Num frames 7900...
-[2023-08-30 10:11:45,486][00929] Num frames 8000...
-[2023-08-30 10:11:45,610][00929] Num frames 8100...
-[2023-08-30 10:11:45,743][00929] Num frames 8200...
-[2023-08-30 10:11:45,877][00929] Num frames 8300...
-[2023-08-30 10:11:46,009][00929] Num frames 8400...
-[2023-08-30 10:11:46,146][00929] Num frames 8500...
-[2023-08-30 10:11:46,279][00929] Num frames 8600...
-[2023-08-30 10:11:46,417][00929] Num frames 8700...
-[2023-08-30 10:11:46,549][00929] Num frames 8800...
-[2023-08-30 10:11:46,677][00929] Num frames 8900...
-[2023-08-30 10:11:46,815][00929] Num frames 9000...
-[2023-08-30 10:11:46,949][00929] Num frames 9100...
-[2023-08-30 10:11:47,080][00929] Num frames 9200...
-[2023-08-30 10:11:47,223][00929] Num frames 9300...
-[2023-08-30 10:11:47,356][00929] Num frames 9400...
-[2023-08-30 10:11:47,523][00929] Avg episode rewards: #0: 27.479, true rewards: #0: 11.854
-[2023-08-30 10:11:47,525][00929] Avg episode reward: 27.479, avg true_objective: 11.854
-[2023-08-30 10:11:47,553][00929] Num frames 9500...
-[2023-08-30 10:11:47,680][00929] Num frames 9600...
-[2023-08-30 10:11:47,823][00929] Num frames 9700...
-[2023-08-30 10:11:47,962][00929] Num frames 9800...
-[2023-08-30 10:11:48,111][00929] Num frames 9900...
-[2023-08-30 10:11:48,251][00929] Num frames 10000...
-[2023-08-30 10:11:48,401][00929] Num frames 10100...
-[2023-08-30 10:11:48,543][00929] Num frames 10200...
-[2023-08-30 10:11:48,681][00929] Num frames 10300...
-[2023-08-30 10:11:48,821][00929] Num frames 10400...
-[2023-08-30 10:11:48,952][00929] Num frames 10500...
-[2023-08-30 10:11:49,089][00929] Num frames 10600...
-[2023-08-30 10:11:49,226][00929] Num frames 10700...
-[2023-08-30 10:11:49,365][00929] Avg episode rewards: #0: 27.737, true rewards: #0: 11.959
-[2023-08-30 10:11:49,367][00929] Avg episode reward: 27.737, avg true_objective: 11.959
-[2023-08-30 10:11:49,430][00929] Num frames 10800...
-[2023-08-30 10:11:49,623][00929] Num frames 10900...
-[2023-08-30 10:11:49,840][00929] Num frames 11000...
-[2023-08-30 10:11:50,029][00929] Num frames 11100...
-[2023-08-30 10:11:50,230][00929] Num frames 11200...
-[2023-08-30 10:11:50,447][00929] Num frames 11300...
-[2023-08-30 10:11:50,651][00929] Num frames 11400...
-[2023-08-30 10:11:50,716][00929] Avg episode rewards: #0: 26.303, true rewards: #0: 11.403
-[2023-08-30 10:11:50,719][00929] Avg episode reward: 26.303, avg true_objective: 11.403
-[2023-08-30 10:13:12,913][00929] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
-[2023-08-30 10:13:48,687][00929] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
-[2023-08-30 10:13:48,689][00929] Overriding arg 'num_workers' with value 1 passed from command line
-[2023-08-30 10:13:48,691][00929] Adding new argument 'no_render'=True that is not in the saved config file!
-[2023-08-30 10:13:48,693][00929] Adding new argument 'save_video'=True that is not in the saved config file!
-[2023-08-30 10:13:48,694][00929] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2023-08-30 10:13:48,696][00929] Adding new argument 'video_name'=None that is not in the saved config file!
-[2023-08-30 10:13:48,697][00929] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
-[2023-08-30 10:13:48,700][00929] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2023-08-30 10:13:48,701][00929] Adding new argument 'push_to_hub'=True that is not in the saved config file!
-[2023-08-30 10:13:48,702][00929] Adding new argument 'hf_repository'='AdanLee/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
-[2023-08-30 10:13:48,703][00929] Adding new argument 'policy_index'=0 that is not in the saved config file!
-[2023-08-30 10:13:48,704][00929] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
-[2023-08-30 10:13:48,705][00929] Adding new argument 'train_script'=None that is not in the saved config file!
-[2023-08-30 10:13:48,706][00929] Adding new argument 'enjoy_script'=None that is not in the saved config file!
-[2023-08-30 10:13:48,707][00929] Using frameskip 1 and render_action_repeat=4 for evaluation
-[2023-08-30 10:13:48,754][00929] RunningMeanStd input shape: (3, 72, 128)
-[2023-08-30 10:13:48,757][00929] RunningMeanStd input shape: (1,)
-[2023-08-30 10:13:48,770][00929] ConvEncoder: input_channels=3
-[2023-08-30 10:13:48,807][00929] Conv encoder output size: 512
-[2023-08-30 10:13:48,809][00929] Policy head output size: 512
-[2023-08-30 10:13:48,827][00929] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2023-08-30 10:13:49,409][00929] Num frames 100...
-[2023-08-30 10:13:49,605][00929] Num frames 200...
-[2023-08-30 10:13:49,796][00929] Num frames 300...
-[2023-08-30 10:13:49,979][00929] Num frames 400...
-[2023-08-30 10:13:50,169][00929] Num frames 500...
-[2023-08-30 10:13:50,367][00929] Num frames 600...
-[2023-08-30 10:13:50,556][00929] Num frames 700...
-[2023-08-30 10:13:50,747][00929] Num frames 800...
-[2023-08-30 10:13:50,930][00929] Num frames 900...
-[2023-08-30 10:13:51,114][00929] Num frames 1000...
-[2023-08-30 10:13:51,304][00929] Num frames 1100...
-[2023-08-30 10:13:51,503][00929] Num frames 1200...
-[2023-08-30 10:13:51,605][00929] Avg episode rewards: #0: 28.200, true rewards: #0: 12.200
-[2023-08-30 10:13:51,607][00929] Avg episode reward: 28.200, avg true_objective: 12.200
-[2023-08-30 10:13:51,764][00929] Num frames 1300...
-[2023-08-30 10:13:51,957][00929] Num frames 1400...
-[2023-08-30 10:13:52,142][00929] Num frames 1500...
-[2023-08-30 10:13:52,343][00929] Num frames 1600...
-[2023-08-30 10:13:52,548][00929] Num frames 1700...
-[2023-08-30 10:13:52,742][00929] Num frames 1800...
-[2023-08-30 10:13:52,947][00929] Num frames 1900...
-[2023-08-30 10:13:53,141][00929] Num frames 2000...
-[2023-08-30 10:13:53,358][00929] Avg episode rewards: #0: 23.920, true rewards: #0: 10.420
-[2023-08-30 10:13:53,360][00929] Avg episode reward: 23.920, avg true_objective: 10.420
-[2023-08-30 10:13:53,387][00929] Num frames 2100...
-[2023-08-30 10:13:53,539][00929] Num frames 2200...
-[2023-08-30 10:13:53,677][00929] Num frames 2300...
-[2023-08-30 10:13:53,807][00929] Num frames 2400...
-[2023-08-30 10:13:53,937][00929] Num frames 2500...
-[2023-08-30 10:13:54,019][00929] Avg episode rewards: #0: 18.397, true rewards: #0: 8.397
-[2023-08-30 10:13:54,021][00929] Avg episode reward: 18.397, avg true_objective: 8.397
-[2023-08-30 10:13:54,137][00929] Num frames 2600...
-[2023-08-30 10:13:54,268][00929] Num frames 2700...
-[2023-08-30 10:13:54,393][00929] Num frames 2800...
-[2023-08-30 10:13:54,531][00929] Num frames 2900...
-[2023-08-30 10:13:54,661][00929] Num frames 3000...
-[2023-08-30 10:13:54,765][00929] Avg episode rewards: #0: 15.578, true rewards: #0: 7.577
-[2023-08-30 10:13:54,766][00929] Avg episode reward: 15.578, avg true_objective: 7.577
-[2023-08-30 10:13:54,857][00929] Num frames 3100...
-[2023-08-30 10:13:54,988][00929] Num frames 3200...
-[2023-08-30 10:13:55,121][00929] Num frames 3300... -[2023-08-30 10:13:55,251][00929] Num frames 3400... -[2023-08-30 10:13:55,391][00929] Num frames 3500... -[2023-08-30 10:13:55,540][00929] Num frames 3600... -[2023-08-30 10:13:55,673][00929] Num frames 3700... -[2023-08-30 10:13:55,805][00929] Num frames 3800... -[2023-08-30 10:13:55,935][00929] Num frames 3900... -[2023-08-30 10:13:56,115][00929] Avg episode rewards: #0: 17.582, true rewards: #0: 7.982 -[2023-08-30 10:13:56,117][00929] Avg episode reward: 17.582, avg true_objective: 7.982 -[2023-08-30 10:13:56,135][00929] Num frames 4000... -[2023-08-30 10:13:56,271][00929] Num frames 4100... -[2023-08-30 10:13:56,407][00929] Num frames 4200... -[2023-08-30 10:13:56,558][00929] Num frames 4300... -[2023-08-30 10:13:56,703][00929] Num frames 4400... -[2023-08-30 10:13:56,834][00929] Num frames 4500... -[2023-08-30 10:13:56,966][00929] Num frames 4600... -[2023-08-30 10:13:57,108][00929] Num frames 4700... -[2023-08-30 10:13:57,259][00929] Num frames 4800... -[2023-08-30 10:13:57,390][00929] Num frames 4900... -[2023-08-30 10:13:57,533][00929] Num frames 5000... -[2023-08-30 10:13:57,667][00929] Num frames 5100... -[2023-08-30 10:13:57,796][00929] Num frames 5200... -[2023-08-30 10:13:57,928][00929] Num frames 5300... -[2023-08-30 10:13:58,060][00929] Num frames 5400... -[2023-08-30 10:13:58,193][00929] Num frames 5500... -[2023-08-30 10:13:58,340][00929] Num frames 5600... -[2023-08-30 10:13:58,482][00929] Num frames 5700... -[2023-08-30 10:13:58,619][00929] Num frames 5800... -[2023-08-30 10:13:58,754][00929] Num frames 5900... -[2023-08-30 10:13:58,864][00929] Avg episode rewards: #0: 22.737, true rewards: #0: 9.903 -[2023-08-30 10:13:58,866][00929] Avg episode reward: 22.737, avg true_objective: 9.903 -[2023-08-30 10:13:58,948][00929] Num frames 6000... -[2023-08-30 10:13:59,087][00929] Num frames 6100... -[2023-08-30 10:13:59,214][00929] Num frames 6200... 
-[2023-08-30 10:13:59,364][00929] Num frames 6300... -[2023-08-30 10:13:59,492][00929] Num frames 6400... -[2023-08-30 10:13:59,633][00929] Num frames 6500... -[2023-08-30 10:13:59,761][00929] Num frames 6600... -[2023-08-30 10:13:59,892][00929] Num frames 6700... -[2023-08-30 10:14:00,023][00929] Num frames 6800... -[2023-08-30 10:14:00,157][00929] Num frames 6900... -[2023-08-30 10:14:00,290][00929] Num frames 7000... -[2023-08-30 10:14:00,386][00929] Avg episode rewards: #0: 23.186, true rewards: #0: 10.043 -[2023-08-30 10:14:00,388][00929] Avg episode reward: 23.186, avg true_objective: 10.043 -[2023-08-30 10:14:00,485][00929] Num frames 7100... -[2023-08-30 10:14:00,626][00929] Num frames 7200... -[2023-08-30 10:14:00,759][00929] Num frames 7300... -[2023-08-30 10:14:00,888][00929] Num frames 7400... -[2023-08-30 10:14:01,041][00929] Avg episode rewards: #0: 21.341, true rewards: #0: 9.341 -[2023-08-30 10:14:01,043][00929] Avg episode reward: 21.341, avg true_objective: 9.341 -[2023-08-30 10:14:01,086][00929] Num frames 7500... -[2023-08-30 10:14:01,230][00929] Num frames 7600... -[2023-08-30 10:14:01,374][00929] Num frames 7700... -[2023-08-30 10:14:01,520][00929] Num frames 7800... -[2023-08-30 10:14:01,671][00929] Num frames 7900... -[2023-08-30 10:14:01,807][00929] Num frames 8000... -[2023-08-30 10:14:01,888][00929] Avg episode rewards: #0: 20.020, true rewards: #0: 8.909 -[2023-08-30 10:14:01,889][00929] Avg episode reward: 20.020, avg true_objective: 8.909 -[2023-08-30 10:14:02,011][00929] Num frames 8100... -[2023-08-30 10:14:02,150][00929] Num frames 8200... -[2023-08-30 10:14:02,291][00929] Num frames 8300... -[2023-08-30 10:14:02,431][00929] Num frames 8400... -[2023-08-30 10:14:02,563][00929] Num frames 8500... -[2023-08-30 10:14:02,702][00929] Num frames 8600... -[2023-08-30 10:14:02,830][00929] Num frames 8700... -[2023-08-30 10:14:02,962][00929] Num frames 8800... -[2023-08-30 10:14:03,089][00929] Num frames 8900... 
-[2023-08-30 10:14:03,232][00929] Num frames 9000... -[2023-08-30 10:14:03,304][00929] Avg episode rewards: #0: 20.010, true rewards: #0: 9.010 -[2023-08-30 10:14:03,306][00929] Avg episode reward: 20.010, avg true_objective: 9.010 -[2023-08-30 10:15:06,700][00929] Replay video saved to /content/train_dir/default_experiment/replay.mp4! -[2023-08-30 10:16:44,707][00929] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json -[2023-08-30 10:16:44,708][00929] Overriding arg 'num_workers' with value 1 passed from command line -[2023-08-30 10:16:44,710][00929] Adding new argument 'no_render'=True that is not in the saved config file! -[2023-08-30 10:16:44,712][00929] Adding new argument 'save_video'=True that is not in the saved config file! -[2023-08-30 10:16:44,714][00929] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! -[2023-08-30 10:16:44,715][00929] Adding new argument 'video_name'=None that is not in the saved config file! -[2023-08-30 10:16:44,716][00929] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! -[2023-08-30 10:16:44,717][00929] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! -[2023-08-30 10:16:44,718][00929] Adding new argument 'push_to_hub'=True that is not in the saved config file! -[2023-08-30 10:16:44,720][00929] Adding new argument 'hf_repository'='AdanLee/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! -[2023-08-30 10:16:44,721][00929] Adding new argument 'policy_index'=0 that is not in the saved config file! -[2023-08-30 10:16:44,722][00929] Adding new argument 'eval_deterministic'=False that is not in the saved config file! -[2023-08-30 10:16:44,723][00929] Adding new argument 'train_script'=None that is not in the saved config file! -[2023-08-30 10:16:44,725][00929] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
-[2023-08-30 10:16:44,726][00929] Using frameskip 1 and render_action_repeat=4 for evaluation -[2023-08-30 10:16:44,766][00929] RunningMeanStd input shape: (3, 72, 128) -[2023-08-30 10:16:44,767][00929] RunningMeanStd input shape: (1,) -[2023-08-30 10:16:44,781][00929] ConvEncoder: input_channels=3 -[2023-08-30 10:16:44,819][00929] Conv encoder output size: 512 -[2023-08-30 10:16:44,821][00929] Policy head output size: 512 -[2023-08-30 10:16:44,841][00929] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... -[2023-08-30 10:16:45,337][00929] Num frames 100... -[2023-08-30 10:16:45,480][00929] Num frames 200... -[2023-08-30 10:16:45,611][00929] Num frames 300... -[2023-08-30 10:16:45,746][00929] Num frames 400... -[2023-08-30 10:16:45,881][00929] Num frames 500... -[2023-08-30 10:16:46,012][00929] Num frames 600... -[2023-08-30 10:16:46,148][00929] Num frames 700... -[2023-08-30 10:16:46,290][00929] Num frames 800... -[2023-08-30 10:16:46,441][00929] Num frames 900... -[2023-08-30 10:16:46,677][00929] Avg episode rewards: #0: 21.920, true rewards: #0: 9.920 -[2023-08-30 10:16:46,680][00929] Avg episode reward: 21.920, avg true_objective: 9.920 -[2023-08-30 10:16:46,701][00929] Num frames 1000... -[2023-08-30 10:16:46,895][00929] Num frames 1100... -[2023-08-30 10:16:47,098][00929] Num frames 1200... -[2023-08-30 10:16:47,295][00929] Num frames 1300... -[2023-08-30 10:16:47,513][00929] Num frames 1400... -[2023-08-30 10:16:47,722][00929] Num frames 1500... -[2023-08-30 10:16:47,921][00929] Num frames 1600... -[2023-08-30 10:16:48,128][00929] Num frames 1700... -[2023-08-30 10:16:48,335][00929] Num frames 1800... -[2023-08-30 10:16:48,541][00929] Num frames 1900... -[2023-08-30 10:16:48,637][00929] Avg episode rewards: #0: 20.600, true rewards: #0: 9.600 -[2023-08-30 10:16:48,639][00929] Avg episode reward: 20.600, avg true_objective: 9.600 -[2023-08-30 10:16:48,806][00929] Num frames 2000... 
-[2023-08-30 10:16:49,010][00929] Num frames 2100... -[2023-08-30 10:16:49,207][00929] Num frames 2200... -[2023-08-30 10:16:49,428][00929] Num frames 2300... -[2023-08-30 10:16:49,655][00929] Num frames 2400... -[2023-08-30 10:16:49,865][00929] Num frames 2500... -[2023-08-30 10:16:50,057][00929] Num frames 2600... -[2023-08-30 10:16:50,251][00929] Num frames 2700... -[2023-08-30 10:16:50,453][00929] Num frames 2800... -[2023-08-30 10:16:50,650][00929] Num frames 2900... -[2023-08-30 10:16:50,844][00929] Num frames 3000... -[2023-08-30 10:16:50,977][00929] Num frames 3100... -[2023-08-30 10:16:51,079][00929] Avg episode rewards: #0: 24.454, true rewards: #0: 10.453 -[2023-08-30 10:16:51,080][00929] Avg episode reward: 24.454, avg true_objective: 10.453 -[2023-08-30 10:16:51,174][00929] Num frames 3200... -[2023-08-30 10:16:51,317][00929] Num frames 3300... -[2023-08-30 10:16:51,448][00929] Num frames 3400... -[2023-08-30 10:16:51,590][00929] Num frames 3500... -[2023-08-30 10:16:51,717][00929] Num frames 3600... -[2023-08-30 10:16:51,851][00929] Num frames 3700... -[2023-08-30 10:16:51,987][00929] Num frames 3800... -[2023-08-30 10:16:52,096][00929] Avg episode rewards: #0: 21.850, true rewards: #0: 9.600 -[2023-08-30 10:16:52,098][00929] Avg episode reward: 21.850, avg true_objective: 9.600 -[2023-08-30 10:16:52,197][00929] Num frames 3900... -[2023-08-30 10:16:52,351][00929] Num frames 4000... -[2023-08-30 10:16:52,497][00929] Num frames 4100... -[2023-08-30 10:16:52,644][00929] Num frames 4200... -[2023-08-30 10:16:52,777][00929] Num frames 4300... -[2023-08-30 10:16:52,910][00929] Num frames 4400... -[2023-08-30 10:16:53,046][00929] Num frames 4500... -[2023-08-30 10:16:53,179][00929] Num frames 4600... -[2023-08-30 10:16:53,315][00929] Num frames 4700... -[2023-08-30 10:16:53,462][00929] Num frames 4800... -[2023-08-30 10:16:53,606][00929] Num frames 4900... -[2023-08-30 10:16:53,743][00929] Num frames 5000... 
-[2023-08-30 10:16:53,876][00929] Num frames 5100... -[2023-08-30 10:16:54,008][00929] Num frames 5200... -[2023-08-30 10:16:54,144][00929] Num frames 5300... -[2023-08-30 10:16:54,277][00929] Num frames 5400... -[2023-08-30 10:16:54,417][00929] Num frames 5500... -[2023-08-30 10:16:54,548][00929] Avg episode rewards: #0: 25.702, true rewards: #0: 11.102 -[2023-08-30 10:16:54,551][00929] Avg episode reward: 25.702, avg true_objective: 11.102 -[2023-08-30 10:16:54,630][00929] Num frames 5600... -[2023-08-30 10:16:54,759][00929] Num frames 5700... -[2023-08-30 10:16:54,891][00929] Num frames 5800... -[2023-08-30 10:16:55,028][00929] Num frames 5900... -[2023-08-30 10:16:55,159][00929] Num frames 6000... -[2023-08-30 10:16:55,322][00929] Avg episode rewards: #0: 23.288, true rewards: #0: 10.122 -[2023-08-30 10:16:55,324][00929] Avg episode reward: 23.288, avg true_objective: 10.122 -[2023-08-30 10:16:55,368][00929] Num frames 6100... -[2023-08-30 10:16:55,508][00929] Num frames 6200... -[2023-08-30 10:16:55,651][00929] Num frames 6300... -[2023-08-30 10:16:55,786][00929] Num frames 6400... -[2023-08-30 10:16:55,919][00929] Num frames 6500... -[2023-08-30 10:16:56,049][00929] Num frames 6600... -[2023-08-30 10:16:56,186][00929] Num frames 6700... -[2023-08-30 10:16:56,321][00929] Num frames 6800... -[2023-08-30 10:16:56,393][00929] Avg episode rewards: #0: 21.870, true rewards: #0: 9.727 -[2023-08-30 10:16:56,396][00929] Avg episode reward: 21.870, avg true_objective: 9.727 -[2023-08-30 10:16:56,533][00929] Num frames 6900... -[2023-08-30 10:16:56,677][00929] Num frames 7000... -[2023-08-30 10:16:56,811][00929] Num frames 7100... -[2023-08-30 10:16:56,943][00929] Num frames 7200... -[2023-08-30 10:16:57,080][00929] Num frames 7300... -[2023-08-30 10:16:57,221][00929] Num frames 7400... -[2023-08-30 10:16:57,358][00929] Num frames 7500... 
-[2023-08-30 10:16:57,434][00929] Avg episode rewards: #0: 20.641, true rewards: #0: 9.391 -[2023-08-30 10:16:57,437][00929] Avg episode reward: 20.641, avg true_objective: 9.391 -[2023-08-30 10:16:57,552][00929] Num frames 7600... -[2023-08-30 10:16:57,742][00929] Avg episode rewards: #0: 18.551, true rewards: #0: 8.551 -[2023-08-30 10:16:57,744][00929] Avg episode reward: 18.551, avg true_objective: 8.551 -[2023-08-30 10:16:57,756][00929] Num frames 7700... -[2023-08-30 10:16:57,888][00929] Num frames 7800... -[2023-08-30 10:16:58,021][00929] Num frames 7900... -[2023-08-30 10:16:58,162][00929] Num frames 8000... -[2023-08-30 10:16:58,307][00929] Num frames 8100... -[2023-08-30 10:16:58,443][00929] Num frames 8200... -[2023-08-30 10:16:58,579][00929] Num frames 8300... -[2023-08-30 10:16:58,721][00929] Num frames 8400... -[2023-08-30 10:16:58,789][00929] Avg episode rewards: #0: 18.008, true rewards: #0: 8.408 -[2023-08-30 10:16:58,791][00929] Avg episode reward: 18.008, avg true_objective: 8.408 -[2023-08-30 10:17:57,751][00929] Replay video saved to /content/train_dir/default_experiment/replay.mp4! 
+[2023-08-31 04:33:01,977][00354] Heartbeat connected on Batcher_0 +[2023-08-31 04:33:01,988][00354] Heartbeat connected on InferenceWorker_p0-w0 +[2023-08-31 04:33:02,003][00354] Heartbeat connected on RolloutWorker_w0 +[2023-08-31 04:33:02,008][00354] Heartbeat connected on RolloutWorker_w1 +[2023-08-31 04:33:02,013][00354] Heartbeat connected on RolloutWorker_w2 +[2023-08-31 04:33:02,018][00354] Heartbeat connected on RolloutWorker_w3 +[2023-08-31 04:33:02,022][00354] Heartbeat connected on RolloutWorker_w4 +[2023-08-31 04:33:02,027][00354] Heartbeat connected on RolloutWorker_w5 +[2023-08-31 04:33:02,031][00354] Heartbeat connected on RolloutWorker_w6 +[2023-08-31 04:33:02,042][00354] Heartbeat connected on RolloutWorker_w7 +[2023-08-31 04:33:08,529][07777] Using optimizer +[2023-08-31 04:33:08,531][07777] No checkpoints found +[2023-08-31 04:33:08,531][07777] Did not load from checkpoint, starting from scratch! +[2023-08-31 04:33:08,532][07777] Initialized policy 0 weights for model version 0 +[2023-08-31 04:33:08,537][07777] LearnerWorker_p0 finished initialization! +[2023-08-31 04:33:08,538][07777] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2023-08-31 04:33:08,547][00354] Heartbeat connected on LearnerWorker_p0 +[2023-08-31 04:33:08,731][07790] RunningMeanStd input shape: (3, 72, 128) +[2023-08-31 04:33:08,733][07790] RunningMeanStd input shape: (1,) +[2023-08-31 04:33:08,745][07790] ConvEncoder: input_channels=3 +[2023-08-31 04:33:08,856][07790] Conv encoder output size: 512 +[2023-08-31 04:33:08,856][07790] Policy head output size: 512 +[2023-08-31 04:33:08,976][00354] Inference worker 0-0 is ready! +[2023-08-31 04:33:08,978][00354] All inference workers are ready! Signal rollout workers to start! 
+[2023-08-31 04:33:09,214][07797] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-08-31 04:33:09,216][07793] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-08-31 04:33:09,224][07794] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-08-31 04:33:09,226][07792] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-08-31 04:33:09,221][07795] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-08-31 04:33:09,230][07798] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-08-31 04:33:09,221][07796] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-08-31 04:33:09,219][07791] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-08-31 04:33:10,361][00354] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2023-08-31 04:33:10,788][07792] Decorrelating experience for 0 frames... +[2023-08-31 04:33:10,787][07798] Decorrelating experience for 0 frames... +[2023-08-31 04:33:10,787][07795] Decorrelating experience for 0 frames... +[2023-08-31 04:33:10,792][07796] Decorrelating experience for 0 frames... +[2023-08-31 04:33:10,792][07797] Decorrelating experience for 0 frames... +[2023-08-31 04:33:10,794][07793] Decorrelating experience for 0 frames... +[2023-08-31 04:33:12,096][07797] Decorrelating experience for 32 frames... +[2023-08-31 04:33:12,102][07791] Decorrelating experience for 0 frames... +[2023-08-31 04:33:12,561][07798] Decorrelating experience for 32 frames... +[2023-08-31 04:33:12,566][07792] Decorrelating experience for 32 frames... +[2023-08-31 04:33:12,577][07795] Decorrelating experience for 32 frames... +[2023-08-31 04:33:15,014][07791] Decorrelating experience for 32 frames... +[2023-08-31 04:33:15,016][07793] Decorrelating experience for 32 frames... +[2023-08-31 04:33:15,313][07796] Decorrelating experience for 32 frames... 
+[2023-08-31 04:33:15,368][00354] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2023-08-31 04:33:15,654][07794] Decorrelating experience for 0 frames... +[2023-08-31 04:33:16,111][07792] Decorrelating experience for 64 frames... +[2023-08-31 04:33:16,965][07795] Decorrelating experience for 64 frames... +[2023-08-31 04:33:17,312][07794] Decorrelating experience for 32 frames... +[2023-08-31 04:33:17,926][07793] Decorrelating experience for 64 frames... +[2023-08-31 04:33:17,931][07791] Decorrelating experience for 64 frames... +[2023-08-31 04:33:18,015][07797] Decorrelating experience for 64 frames... +[2023-08-31 04:33:18,309][07796] Decorrelating experience for 64 frames... +[2023-08-31 04:33:19,378][07792] Decorrelating experience for 96 frames... +[2023-08-31 04:33:19,414][07793] Decorrelating experience for 96 frames... +[2023-08-31 04:33:19,747][07795] Decorrelating experience for 96 frames... +[2023-08-31 04:33:20,106][07798] Decorrelating experience for 64 frames... +[2023-08-31 04:33:20,114][07794] Decorrelating experience for 64 frames... +[2023-08-31 04:33:20,361][00354] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2023-08-31 04:33:20,464][07797] Decorrelating experience for 96 frames... +[2023-08-31 04:33:20,895][07796] Decorrelating experience for 96 frames... +[2023-08-31 04:33:22,256][07794] Decorrelating experience for 96 frames... +[2023-08-31 04:33:22,270][07798] Decorrelating experience for 96 frames... +[2023-08-31 04:33:24,660][07791] Decorrelating experience for 96 frames... +[2023-08-31 04:33:25,363][00354] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 124.8. Samples: 1872. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2023-08-31 04:33:25,372][00354] Avg episode reward: [(0, '2.404')] +[2023-08-31 04:33:25,706][07777] Signal inference workers to stop experience collection... +[2023-08-31 04:33:25,716][07790] InferenceWorker_p0-w0: stopping experience collection +[2023-08-31 04:33:30,361][00354] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 126.0. Samples: 2520. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2023-08-31 04:33:30,364][00354] Avg episode reward: [(0, '2.577')] +[2023-08-31 04:33:30,710][07777] Signal inference workers to resume experience collection... +[2023-08-31 04:33:30,711][07790] InferenceWorker_p0-w0: resuming experience collection +[2023-08-31 04:33:35,363][00354] Fps is (10 sec: 819.2, 60 sec: 327.7, 300 sec: 327.7). Total num frames: 8192. Throughput: 0: 126.3. Samples: 3158. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) +[2023-08-31 04:33:35,368][00354] Avg episode reward: [(0, '2.830')] +[2023-08-31 04:33:40,361][00354] Fps is (10 sec: 2048.0, 60 sec: 682.7, 300 sec: 682.7). Total num frames: 20480. Throughput: 0: 197.8. Samples: 5934. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2023-08-31 04:33:40,365][00354] Avg episode reward: [(0, '3.499')] +[2023-08-31 04:33:45,154][07790] Updated weights for policy 0, policy_version 10 (0.0030) +[2023-08-31 04:33:45,361][00354] Fps is (10 sec: 3277.3, 60 sec: 1170.3, 300 sec: 1170.3). Total num frames: 40960. Throughput: 0: 307.7. Samples: 10768. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 04:33:45,365][00354] Avg episode reward: [(0, '3.863')] +[2023-08-31 04:33:50,362][00354] Fps is (10 sec: 4095.9, 60 sec: 1536.0, 300 sec: 1536.0). Total num frames: 61440. Throughput: 0: 339.0. Samples: 13560. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-08-31 04:33:50,364][00354] Avg episode reward: [(0, '4.594')] +[2023-08-31 04:33:55,361][00354] Fps is (10 sec: 3276.8, 60 sec: 1638.4, 300 sec: 1638.4). 
Total num frames: 73728. Throughput: 0: 407.0. Samples: 18316. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-08-31 04:33:55,375][00354] Avg episode reward: [(0, '4.699')] +[2023-08-31 04:33:58,509][07790] Updated weights for policy 0, policy_version 20 (0.0033) +[2023-08-31 04:34:00,361][00354] Fps is (10 sec: 2457.7, 60 sec: 1720.3, 300 sec: 1720.3). Total num frames: 86016. Throughput: 0: 487.4. Samples: 21928. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-08-31 04:34:00,369][00354] Avg episode reward: [(0, '4.608')] +[2023-08-31 04:34:05,361][00354] Fps is (10 sec: 2457.6, 60 sec: 1787.3, 300 sec: 1787.3). Total num frames: 98304. Throughput: 0: 530.2. Samples: 23860. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-08-31 04:34:05,364][00354] Avg episode reward: [(0, '4.309')] +[2023-08-31 04:34:10,362][00354] Fps is (10 sec: 3276.7, 60 sec: 1979.7, 300 sec: 1979.7). Total num frames: 118784. Throughput: 0: 614.1. Samples: 29504. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 04:34:10,364][00354] Avg episode reward: [(0, '4.314')] +[2023-08-31 04:34:10,369][07777] Saving new best policy, reward=4.314! +[2023-08-31 04:34:10,911][07790] Updated weights for policy 0, policy_version 30 (0.0028) +[2023-08-31 04:34:15,361][00354] Fps is (10 sec: 3686.4, 60 sec: 2253.0, 300 sec: 2079.5). Total num frames: 135168. Throughput: 0: 707.7. Samples: 34368. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-08-31 04:34:15,365][00354] Avg episode reward: [(0, '4.322')] +[2023-08-31 04:34:15,379][07777] Saving new best policy, reward=4.322! +[2023-08-31 04:34:20,361][00354] Fps is (10 sec: 2867.3, 60 sec: 2457.6, 300 sec: 2106.5). Total num frames: 147456. Throughput: 0: 732.5. Samples: 36120. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-08-31 04:34:20,364][00354] Avg episode reward: [(0, '4.344')] +[2023-08-31 04:34:20,375][07777] Saving new best policy, reward=4.344! 
+[2023-08-31 04:34:25,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2662.5, 300 sec: 2129.9). Total num frames: 159744. Throughput: 0: 748.1. Samples: 39598. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 04:34:25,363][00354] Avg episode reward: [(0, '4.297')] +[2023-08-31 04:34:26,059][07790] Updated weights for policy 0, policy_version 40 (0.0021) +[2023-08-31 04:34:30,365][00354] Fps is (10 sec: 3275.5, 60 sec: 3003.5, 300 sec: 2252.7). Total num frames: 180224. Throughput: 0: 768.7. Samples: 45364. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 04:34:30,368][00354] Avg episode reward: [(0, '4.584')] +[2023-08-31 04:34:30,374][07777] Saving new best policy, reward=4.584! +[2023-08-31 04:34:35,367][00354] Fps is (10 sec: 3684.3, 60 sec: 3140.1, 300 sec: 2312.9). Total num frames: 196608. Throughput: 0: 768.2. Samples: 48132. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-08-31 04:34:35,371][00354] Avg episode reward: [(0, '4.545')] +[2023-08-31 04:34:35,380][07777] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000048_196608.pth... +[2023-08-31 04:34:38,674][07790] Updated weights for policy 0, policy_version 50 (0.0021) +[2023-08-31 04:34:40,361][00354] Fps is (10 sec: 2458.6, 60 sec: 3072.0, 300 sec: 2275.6). Total num frames: 204800. Throughput: 0: 746.8. Samples: 51924. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 04:34:40,370][00354] Avg episode reward: [(0, '4.490')] +[2023-08-31 04:34:45,362][00354] Fps is (10 sec: 2049.1, 60 sec: 2935.4, 300 sec: 2285.1). Total num frames: 217088. Throughput: 0: 746.7. Samples: 55532. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 04:34:45,364][00354] Avg episode reward: [(0, '4.381')] +[2023-08-31 04:34:50,361][00354] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 2375.7). Total num frames: 237568. Throughput: 0: 763.0. Samples: 58194. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 04:34:50,363][00354] Avg episode reward: [(0, '4.448')] +[2023-08-31 04:34:51,901][07790] Updated weights for policy 0, policy_version 60 (0.0016) +[2023-08-31 04:34:55,361][00354] Fps is (10 sec: 4096.1, 60 sec: 3072.0, 300 sec: 2457.6). Total num frames: 258048. Throughput: 0: 765.1. Samples: 63934. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 04:34:55,369][00354] Avg episode reward: [(0, '4.388')] +[2023-08-31 04:35:00,362][00354] Fps is (10 sec: 2867.0, 60 sec: 3003.7, 300 sec: 2420.4). Total num frames: 266240. Throughput: 0: 741.7. Samples: 67746. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 04:35:00,366][00354] Avg episode reward: [(0, '4.367')] +[2023-08-31 04:35:05,361][00354] Fps is (10 sec: 2048.0, 60 sec: 3003.7, 300 sec: 2422.0). Total num frames: 278528. Throughput: 0: 734.6. Samples: 69176. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 04:35:05,364][00354] Avg episode reward: [(0, '4.498')] +[2023-08-31 04:35:08,250][07790] Updated weights for policy 0, policy_version 70 (0.0030) +[2023-08-31 04:35:10,361][00354] Fps is (10 sec: 2457.7, 60 sec: 2867.2, 300 sec: 2423.5). Total num frames: 290816. Throughput: 0: 739.2. Samples: 72864. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 04:35:10,364][00354] Avg episode reward: [(0, '4.372')] +[2023-08-31 04:35:15,361][00354] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 2490.4). Total num frames: 311296. Throughput: 0: 736.8. Samples: 78518. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 04:35:15,368][00354] Avg episode reward: [(0, '4.452')] +[2023-08-31 04:35:19,172][07790] Updated weights for policy 0, policy_version 80 (0.0015) +[2023-08-31 04:35:20,361][00354] Fps is (10 sec: 3686.4, 60 sec: 3003.7, 300 sec: 2520.6). Total num frames: 327680. Throughput: 0: 739.2. Samples: 81392. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 04:35:20,368][00354] Avg episode reward: [(0, '4.404')] +[2023-08-31 04:35:25,361][00354] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2518.3). Total num frames: 339968. Throughput: 0: 737.1. Samples: 85092. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-08-31 04:35:25,366][00354] Avg episode reward: [(0, '4.364')] +[2023-08-31 04:35:30,361][00354] Fps is (10 sec: 2048.0, 60 sec: 2799.1, 300 sec: 2486.9). Total num frames: 348160. Throughput: 0: 719.2. Samples: 87896. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-08-31 04:35:30,368][00354] Avg episode reward: [(0, '4.148')] +[2023-08-31 04:35:35,361][00354] Fps is (10 sec: 2048.0, 60 sec: 2730.9, 300 sec: 2485.8). Total num frames: 360448. Throughput: 0: 692.7. Samples: 89366. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-08-31 04:35:35,370][00354] Avg episode reward: [(0, '4.218')] +[2023-08-31 04:35:38,595][07790] Updated weights for policy 0, policy_version 90 (0.0036) +[2023-08-31 04:35:40,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2798.9, 300 sec: 2484.9). Total num frames: 372736. Throughput: 0: 650.0. Samples: 93182. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 04:35:40,367][00354] Avg episode reward: [(0, '4.223')] +[2023-08-31 04:35:45,361][00354] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2510.5). Total num frames: 389120. Throughput: 0: 687.0. Samples: 98662. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 04:35:45,368][00354] Avg episode reward: [(0, '4.611')] +[2023-08-31 04:35:45,398][07777] Saving new best policy, reward=4.611! +[2023-08-31 04:35:50,361][00354] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2508.8). Total num frames: 401408. Throughput: 0: 693.0. Samples: 100362. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 04:35:50,366][00354] Avg episode reward: [(0, '4.528')] +[2023-08-31 04:35:52,218][07790] Updated weights for policy 0, policy_version 100 (0.0023) +[2023-08-31 04:35:55,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2594.1, 300 sec: 2507.2). Total num frames: 413696. Throughput: 0: 690.0. Samples: 103914. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 04:35:55,364][00354] Avg episode reward: [(0, '4.593')] +[2023-08-31 04:36:00,361][00354] Fps is (10 sec: 3276.8, 60 sec: 2799.0, 300 sec: 2554.0). Total num frames: 434176. Throughput: 0: 678.0. Samples: 109026. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 04:36:00,367][00354] Avg episode reward: [(0, '4.535')] +[2023-08-31 04:36:04,472][07790] Updated weights for policy 0, policy_version 110 (0.0018) +[2023-08-31 04:36:05,361][00354] Fps is (10 sec: 3686.4, 60 sec: 2867.2, 300 sec: 2574.6). Total num frames: 450560. Throughput: 0: 676.8. Samples: 111850. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 04:36:05,364][00354] Avg episode reward: [(0, '4.630')] +[2023-08-31 04:36:05,373][07777] Saving new best policy, reward=4.630! +[2023-08-31 04:36:10,363][00354] Fps is (10 sec: 2866.8, 60 sec: 2867.1, 300 sec: 2571.4). Total num frames: 462848. Throughput: 0: 692.6. Samples: 116260. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 04:36:10,366][00354] Avg episode reward: [(0, '4.609')] +[2023-08-31 04:36:15,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2568.3). Total num frames: 475136. Throughput: 0: 706.6. Samples: 119694. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 04:36:15,364][00354] Avg episode reward: [(0, '4.676')] +[2023-08-31 04:36:15,377][07777] Saving new best policy, reward=4.676! +[2023-08-31 04:36:19,575][07790] Updated weights for policy 0, policy_version 120 (0.0028) +[2023-08-31 04:36:20,361][00354] Fps is (10 sec: 2867.6, 60 sec: 2730.7, 300 sec: 2586.9). 
Total num frames: 491520. Throughput: 0: 721.6. Samples: 121838. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-08-31 04:36:20,367][00354] Avg episode reward: [(0, '4.779')]
[2023-08-31 04:36:20,373][07777] Saving new best policy, reward=4.779!
[2023-08-31 04:36:25,361][00354] Fps is (10 sec: 3686.4, 60 sec: 2867.2, 300 sec: 2625.6). Total num frames: 512000. Throughput: 0: 763.1. Samples: 127522. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:36:25,369][00354] Avg episode reward: [(0, '4.599')]
[2023-08-31 04:36:30,361][00354] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 2621.4). Total num frames: 524288. Throughput: 0: 743.3. Samples: 132112. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-08-31 04:36:30,367][00354] Avg episode reward: [(0, '4.499')]
[2023-08-31 04:36:32,724][07790] Updated weights for policy 0, policy_version 130 (0.0037)
[2023-08-31 04:36:35,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 2617.4). Total num frames: 536576. Throughput: 0: 744.9. Samples: 133884. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-08-31 04:36:35,364][00354] Avg episode reward: [(0, '4.495')]
[2023-08-31 04:36:35,377][07777] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000131_536576.pth...
[2023-08-31 04:36:40,361][00354] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2633.1). Total num frames: 552960. Throughput: 0: 749.7. Samples: 137652. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-08-31 04:36:40,363][00354] Avg episode reward: [(0, '4.514')]
[2023-08-31 04:36:45,361][00354] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 2648.1). Total num frames: 569344. Throughput: 0: 763.5. Samples: 143384. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:36:45,364][00354] Avg episode reward: [(0, '4.335')]
[2023-08-31 04:36:45,606][07790] Updated weights for policy 0, policy_version 140 (0.0032)
[2023-08-31 04:36:50,361][00354] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 2662.4). Total num frames: 585728. Throughput: 0: 762.9. Samples: 146182. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:36:50,363][00354] Avg episode reward: [(0, '4.347')]
[2023-08-31 04:36:55,369][00354] Fps is (10 sec: 2865.0, 60 sec: 3071.6, 300 sec: 2657.8). Total num frames: 598016. Throughput: 0: 747.6. Samples: 149908. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-08-31 04:36:55,378][00354] Avg episode reward: [(0, '4.378')]
[2023-08-31 04:37:00,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 2653.5). Total num frames: 610304. Throughput: 0: 754.7. Samples: 153654. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:37:00,369][00354] Avg episode reward: [(0, '4.377')]
[2023-08-31 04:37:00,565][07790] Updated weights for policy 0, policy_version 150 (0.0019)
[2023-08-31 04:37:05,361][00354] Fps is (10 sec: 3279.3, 60 sec: 3003.7, 300 sec: 2684.2). Total num frames: 630784. Throughput: 0: 770.6. Samples: 156514. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-08-31 04:37:05,368][00354] Avg episode reward: [(0, '4.510')]
[2023-08-31 04:37:10,362][00354] Fps is (10 sec: 3686.3, 60 sec: 3072.1, 300 sec: 2696.5). Total num frames: 647168. Throughput: 0: 770.9. Samples: 162212. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:37:10,364][00354] Avg episode reward: [(0, '4.527')]
[2023-08-31 04:37:12,335][07790] Updated weights for policy 0, policy_version 160 (0.0025)
[2023-08-31 04:37:15,361][00354] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 2691.7). Total num frames: 659456. Throughput: 0: 753.2. Samples: 166006. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:37:15,372][00354] Avg episode reward: [(0, '4.540')]
[2023-08-31 04:37:20,362][00354] Fps is (10 sec: 2457.5, 60 sec: 3003.7, 300 sec: 2687.0). Total num frames: 671744. Throughput: 0: 753.4. Samples: 167786. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:37:20,369][00354] Avg episode reward: [(0, '4.520')]
[2023-08-31 04:37:25,361][00354] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 2714.6). Total num frames: 692224. Throughput: 0: 774.3. Samples: 172496. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-08-31 04:37:25,367][00354] Avg episode reward: [(0, '4.584')]
[2023-08-31 04:37:26,319][07790] Updated weights for policy 0, policy_version 170 (0.0026)
[2023-08-31 04:37:30,361][00354] Fps is (10 sec: 3686.7, 60 sec: 3072.0, 300 sec: 2725.4). Total num frames: 708608. Throughput: 0: 774.1. Samples: 178220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-08-31 04:37:30,363][00354] Avg episode reward: [(0, '4.547')]
[2023-08-31 04:37:35,363][00354] Fps is (10 sec: 3276.3, 60 sec: 3140.2, 300 sec: 2735.8). Total num frames: 724992. Throughput: 0: 758.0. Samples: 180292. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:37:35,365][00354] Avg episode reward: [(0, '4.636')]
[2023-08-31 04:37:40,362][00354] Fps is (10 sec: 2457.5, 60 sec: 3003.7, 300 sec: 2715.5). Total num frames: 733184. Throughput: 0: 751.5. Samples: 183720. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-08-31 04:37:40,367][00354] Avg episode reward: [(0, '4.569')]
[2023-08-31 04:37:41,140][07790] Updated weights for policy 0, policy_version 180 (0.0018)
[2023-08-31 04:37:45,361][00354] Fps is (10 sec: 2458.0, 60 sec: 3003.7, 300 sec: 2725.7). Total num frames: 749568. Throughput: 0: 763.7. Samples: 188020. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-08-31 04:37:45,364][00354] Avg episode reward: [(0, '4.689')]
[2023-08-31 04:37:50,361][00354] Fps is (10 sec: 3686.5, 60 sec: 3072.0, 300 sec: 2750.2). Total num frames: 770048. Throughput: 0: 764.3. Samples: 190908. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-08-31 04:37:50,369][00354] Avg episode reward: [(0, '4.843')]
[2023-08-31 04:37:50,375][07777] Saving new best policy, reward=4.843!
[2023-08-31 04:37:52,514][07790] Updated weights for policy 0, policy_version 190 (0.0017)
[2023-08-31 04:37:55,361][00354] Fps is (10 sec: 3276.8, 60 sec: 3072.4, 300 sec: 2745.0). Total num frames: 782336. Throughput: 0: 750.8. Samples: 195998. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-08-31 04:37:55,366][00354] Avg episode reward: [(0, '4.896')]
[2023-08-31 04:37:55,374][07777] Saving new best policy, reward=4.896!
[2023-08-31 04:38:00,361][00354] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 2740.1). Total num frames: 794624. Throughput: 0: 748.5. Samples: 199688. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-08-31 04:38:00,365][00354] Avg episode reward: [(0, '4.871')]
[2023-08-31 04:38:05,361][00354] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2749.2). Total num frames: 811008. Throughput: 0: 748.2. Samples: 201454. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-08-31 04:38:05,363][00354] Avg episode reward: [(0, '4.853')]
[2023-08-31 04:38:07,294][07790] Updated weights for policy 0, policy_version 200 (0.0048)
[2023-08-31 04:38:10,361][00354] Fps is (10 sec: 3276.8, 60 sec: 3003.8, 300 sec: 2804.8). Total num frames: 827392. Throughput: 0: 767.8. Samples: 207046. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:38:10,364][00354] Avg episode reward: [(0, '4.814')]
[2023-08-31 04:38:15,361][00354] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 2874.1). Total num frames: 847872. Throughput: 0: 760.5. Samples: 212442. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-08-31 04:38:15,364][00354] Avg episode reward: [(0, '4.759')]
[2023-08-31 04:38:20,363][00354] Fps is (10 sec: 2866.8, 60 sec: 3072.0, 300 sec: 2901.9). Total num frames: 856064. Throughput: 0: 752.8. Samples: 214170. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-08-31 04:38:20,367][00354] Avg episode reward: [(0, '4.794')]
[2023-08-31 04:38:20,723][07790] Updated weights for policy 0, policy_version 210 (0.0020)
[2023-08-31 04:38:25,361][00354] Fps is (10 sec: 2048.0, 60 sec: 2935.5, 300 sec: 2943.6). Total num frames: 868352. Throughput: 0: 741.2. Samples: 217076. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-08-31 04:38:25,363][00354] Avg episode reward: [(0, '4.923')]
[2023-08-31 04:38:25,378][07777] Saving new best policy, reward=4.923!
[2023-08-31 04:38:30,361][00354] Fps is (10 sec: 2048.3, 60 sec: 2798.9, 300 sec: 2943.6). Total num frames: 876544. Throughput: 0: 711.5. Samples: 220038. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-08-31 04:38:30,368][00354] Avg episode reward: [(0, '4.839')]
[2023-08-31 04:38:35,361][00354] Fps is (10 sec: 2048.0, 60 sec: 2730.7, 300 sec: 2943.6). Total num frames: 888832. Throughput: 0: 686.8. Samples: 221812. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:38:35,365][00354] Avg episode reward: [(0, '4.824')]
[2023-08-31 04:38:35,378][07777] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000217_888832.pth...
[2023-08-31 04:38:35,581][07777] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000048_196608.pth
[2023-08-31 04:38:39,072][07790] Updated weights for policy 0, policy_version 220 (0.0016)
[2023-08-31 04:38:40,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2798.9, 300 sec: 2915.8). Total num frames: 901120. Throughput: 0: 662.3. Samples: 225802. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-08-31 04:38:40,364][00354] Avg episode reward: [(0, '4.797')]
[2023-08-31 04:38:45,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2888.0). Total num frames: 913408. Throughput: 0: 660.6. Samples: 229414. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:38:45,368][00354] Avg episode reward: [(0, '4.879')]
[2023-08-31 04:38:50,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2594.1, 300 sec: 2888.0). Total num frames: 925696. Throughput: 0: 661.7. Samples: 231230. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:38:50,369][00354] Avg episode reward: [(0, '4.936')]
[2023-08-31 04:38:50,381][07777] Saving new best policy, reward=4.936!
[2023-08-31 04:38:53,627][07790] Updated weights for policy 0, policy_version 230 (0.0041)
[2023-08-31 04:38:55,361][00354] Fps is (10 sec: 3276.8, 60 sec: 2730.7, 300 sec: 2915.8). Total num frames: 946176. Throughput: 0: 655.2. Samples: 236528. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-08-31 04:38:55,364][00354] Avg episode reward: [(0, '4.998')]
[2023-08-31 04:38:55,376][07777] Saving new best policy, reward=4.998!
[2023-08-31 04:39:00,369][00354] Fps is (10 sec: 3683.6, 60 sec: 2798.6, 300 sec: 2929.6). Total num frames: 962560. Throughput: 0: 656.8. Samples: 242002. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-08-31 04:39:00,377][00354] Avg episode reward: [(0, '4.861')]
[2023-08-31 04:39:05,361][00354] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2901.9). Total num frames: 974848. Throughput: 0: 657.9. Samples: 243774. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-08-31 04:39:05,364][00354] Avg episode reward: [(0, '4.893')]
[2023-08-31 04:39:07,218][07790] Updated weights for policy 0, policy_version 240 (0.0016)
[2023-08-31 04:39:10,362][00354] Fps is (10 sec: 2459.4, 60 sec: 2662.4, 300 sec: 2888.0). Total num frames: 987136. Throughput: 0: 670.7. Samples: 247258. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-08-31 04:39:10,367][00354] Avg episode reward: [(0, '4.841')]
[2023-08-31 04:39:15,361][00354] Fps is (10 sec: 3276.8, 60 sec: 2662.4, 300 sec: 2915.8). Total num frames: 1007616. Throughput: 0: 719.6. Samples: 252418. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:39:15,369][00354] Avg episode reward: [(0, '4.786')]
[2023-08-31 04:39:19,289][07790] Updated weights for policy 0, policy_version 250 (0.0020)
[2023-08-31 04:39:20,361][00354] Fps is (10 sec: 3686.5, 60 sec: 2799.0, 300 sec: 2929.7). Total num frames: 1024000. Throughput: 0: 743.6. Samples: 255272. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:39:20,370][00354] Avg episode reward: [(0, '5.158')]
[2023-08-31 04:39:20,408][07777] Saving new best policy, reward=5.158!
[2023-08-31 04:39:25,361][00354] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2902.0). Total num frames: 1036288. Throughput: 0: 756.0. Samples: 259824. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-08-31 04:39:25,373][00354] Avg episode reward: [(0, '5.509')]
[2023-08-31 04:39:25,472][07777] Saving new best policy, reward=5.509!
[2023-08-31 04:39:30,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 2888.1). Total num frames: 1048576. Throughput: 0: 753.7. Samples: 263332. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-08-31 04:39:30,371][00354] Avg episode reward: [(0, '5.839')]
[2023-08-31 04:39:30,380][07777] Saving new best policy, reward=5.839!
[2023-08-31 04:39:34,461][07790] Updated weights for policy 0, policy_version 260 (0.0025)
[2023-08-31 04:39:35,361][00354] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 2915.8). Total num frames: 1064960. Throughput: 0: 760.0. Samples: 265432. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-08-31 04:39:35,368][00354] Avg episode reward: [(0, '5.710')]
[2023-08-31 04:39:40,361][00354] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 2943.6). Total num frames: 1085440. Throughput: 0: 769.4. Samples: 271152. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-08-31 04:39:40,364][00354] Avg episode reward: [(0, '5.408')]
[2023-08-31 04:39:45,361][00354] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 2929.7). Total num frames: 1101824. Throughput: 0: 752.8. Samples: 275874. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-08-31 04:39:45,369][00354] Avg episode reward: [(0, '5.435')]
[2023-08-31 04:39:47,016][07790] Updated weights for policy 0, policy_version 270 (0.0033)
[2023-08-31 04:39:50,361][00354] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 2888.0). Total num frames: 1110016. Throughput: 0: 752.4. Samples: 277634. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:39:50,365][00354] Avg episode reward: [(0, '5.700')]
[2023-08-31 04:39:55,362][00354] Fps is (10 sec: 2457.5, 60 sec: 3003.7, 300 sec: 2915.8). Total num frames: 1126400. Throughput: 0: 758.6. Samples: 281396. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-08-31 04:39:55,364][00354] Avg episode reward: [(0, '6.100')]
[2023-08-31 04:39:55,379][07777] Saving new best policy, reward=6.100!
[2023-08-31 04:40:00,361][00354] Fps is (10 sec: 3276.8, 60 sec: 3004.1, 300 sec: 2929.7). Total num frames: 1142784. Throughput: 0: 761.2. Samples: 286672. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:40:00,364][00354] Avg episode reward: [(0, '6.067')]
[2023-08-31 04:40:01,026][07790] Updated weights for policy 0, policy_version 280 (0.0014)
[2023-08-31 04:40:05,361][00354] Fps is (10 sec: 2867.3, 60 sec: 3003.7, 300 sec: 2929.7). Total num frames: 1155072. Throughput: 0: 738.4. Samples: 288502. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-08-31 04:40:05,366][00354] Avg episode reward: [(0, '6.220')]
[2023-08-31 04:40:05,382][07777] Saving new best policy, reward=6.220!
[2023-08-31 04:40:10,363][00354] Fps is (10 sec: 2457.2, 60 sec: 3003.7, 300 sec: 2901.9). Total num frames: 1167360. Throughput: 0: 718.4. Samples: 292154. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-08-31 04:40:10,370][00354] Avg episode reward: [(0, '5.790')]
[2023-08-31 04:40:15,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 2888.0). Total num frames: 1179648. Throughput: 0: 722.9. Samples: 295862. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-08-31 04:40:15,368][00354] Avg episode reward: [(0, '5.811')]
[2023-08-31 04:40:16,798][07790] Updated weights for policy 0, policy_version 290 (0.0015)
[2023-08-31 04:40:20,361][00354] Fps is (10 sec: 3277.3, 60 sec: 2935.5, 300 sec: 2915.8). Total num frames: 1200128. Throughput: 0: 738.1. Samples: 298646. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:40:20,364][00354] Avg episode reward: [(0, '5.936')]
[2023-08-31 04:40:25,361][00354] Fps is (10 sec: 3686.4, 60 sec: 3003.7, 300 sec: 2943.6). Total num frames: 1216512. Throughput: 0: 738.2. Samples: 304372. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:40:25,368][00354] Avg episode reward: [(0, '5.906')]
[2023-08-31 04:40:29,108][07790] Updated weights for policy 0, policy_version 300 (0.0018)
[2023-08-31 04:40:30,361][00354] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2943.6). Total num frames: 1228800. Throughput: 0: 718.6. Samples: 308210. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:40:30,364][00354] Avg episode reward: [(0, '5.980')]
[2023-08-31 04:40:35,365][00354] Fps is (10 sec: 2456.8, 60 sec: 2935.3, 300 sec: 2943.5). Total num frames: 1241088. Throughput: 0: 719.3. Samples: 310004. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-08-31 04:40:35,367][00354] Avg episode reward: [(0, '5.919')]
[2023-08-31 04:40:35,377][07777] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000303_1241088.pth...
[2023-08-31 04:40:35,545][07777] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000131_536576.pth
[2023-08-31 04:40:40,361][00354] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2943.6). Total num frames: 1257472. Throughput: 0: 737.3. Samples: 314572. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-08-31 04:40:40,369][00354] Avg episode reward: [(0, '5.973')]
[2023-08-31 04:40:42,558][07790] Updated weights for policy 0, policy_version 310 (0.0024)
[2023-08-31 04:40:45,361][00354] Fps is (10 sec: 3687.6, 60 sec: 2935.5, 300 sec: 2971.3). Total num frames: 1277952. Throughput: 0: 746.3. Samples: 320256. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-08-31 04:40:45,363][00354] Avg episode reward: [(0, '6.291')]
[2023-08-31 04:40:45,370][07777] Saving new best policy, reward=6.291!
[2023-08-31 04:40:50,364][00354] Fps is (10 sec: 3276.0, 60 sec: 3003.6, 300 sec: 2971.3). Total num frames: 1290240. Throughput: 0: 752.9. Samples: 322386. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:40:50,367][00354] Avg episode reward: [(0, '6.568')]
[2023-08-31 04:40:50,371][07777] Saving new best policy, reward=6.568!
[2023-08-31 04:40:55,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 2943.6). Total num frames: 1302528. Throughput: 0: 751.3. Samples: 325962. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-08-31 04:40:55,368][00354] Avg episode reward: [(0, '6.552')]
[2023-08-31 04:40:57,694][07790] Updated weights for policy 0, policy_version 320 (0.0045)
[2023-08-31 04:41:00,361][00354] Fps is (10 sec: 2867.9, 60 sec: 2935.5, 300 sec: 2943.6). Total num frames: 1318912. Throughput: 0: 769.3. Samples: 330480. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:41:00,369][00354] Avg episode reward: [(0, '6.619')]
[2023-08-31 04:41:00,373][07777] Saving new best policy, reward=6.619!
[2023-08-31 04:41:05,361][00354] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 2971.4). Total num frames: 1339392. Throughput: 0: 769.6. Samples: 333276. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-08-31 04:41:05,364][00354] Avg episode reward: [(0, '6.591')]
[2023-08-31 04:41:09,089][07790] Updated weights for policy 0, policy_version 330 (0.0013)
[2023-08-31 04:41:10,361][00354] Fps is (10 sec: 3276.8, 60 sec: 3072.1, 300 sec: 2971.3). Total num frames: 1351680. Throughput: 0: 757.2. Samples: 338444. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-08-31 04:41:10,366][00354] Avg episode reward: [(0, '6.670')]
[2023-08-31 04:41:10,370][07777] Saving new best policy, reward=6.670!
[2023-08-31 04:41:15,363][00354] Fps is (10 sec: 2457.2, 60 sec: 3071.9, 300 sec: 2957.4). Total num frames: 1363968. Throughput: 0: 751.2. Samples: 342016. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-08-31 04:41:15,365][00354] Avg episode reward: [(0, '6.674')]
[2023-08-31 04:41:15,381][07777] Saving new best policy, reward=6.674!
[2023-08-31 04:41:20,361][00354] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2943.6). Total num frames: 1380352. Throughput: 0: 750.4. Samples: 343770. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:41:20,369][00354] Avg episode reward: [(0, '6.656')]
[2023-08-31 04:41:24,203][07790] Updated weights for policy 0, policy_version 340 (0.0038)
[2023-08-31 04:41:25,361][00354] Fps is (10 sec: 2867.6, 60 sec: 2935.5, 300 sec: 2943.6). Total num frames: 1392640. Throughput: 0: 756.3. Samples: 348604. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-08-31 04:41:25,367][00354] Avg episode reward: [(0, '6.564')]
[2023-08-31 04:41:30,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 2943.6). Total num frames: 1404928. Throughput: 0: 705.9. Samples: 352020. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-08-31 04:41:30,364][00354] Avg episode reward: [(0, '6.914')]
[2023-08-31 04:41:30,366][07777] Saving new best policy, reward=6.914!
[2023-08-31 04:41:35,361][00354] Fps is (10 sec: 2048.0, 60 sec: 2867.4, 300 sec: 2915.8). Total num frames: 1413120. Throughput: 0: 688.8. Samples: 353382. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-08-31 04:41:35,365][00354] Avg episode reward: [(0, '6.919')]
[2023-08-31 04:41:35,385][07777] Saving new best policy, reward=6.919!
[2023-08-31 04:41:40,362][00354] Fps is (10 sec: 1638.2, 60 sec: 2730.6, 300 sec: 2888.0). Total num frames: 1421312. Throughput: 0: 672.9. Samples: 356242. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-08-31 04:41:40,370][00354] Avg episode reward: [(0, '6.562')]
[2023-08-31 04:41:44,011][07790] Updated weights for policy 0, policy_version 350 (0.0033)
[2023-08-31 04:41:45,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2662.4, 300 sec: 2888.0). Total num frames: 1437696. Throughput: 0: 649.1. Samples: 359688. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-08-31 04:41:45,363][00354] Avg episode reward: [(0, '6.604')]
[2023-08-31 04:41:50,361][00354] Fps is (10 sec: 3277.1, 60 sec: 2730.8, 300 sec: 2902.0). Total num frames: 1454080. Throughput: 0: 649.6. Samples: 362508. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-08-31 04:41:50,364][00354] Avg episode reward: [(0, '6.657')]
[2023-08-31 04:41:55,361][00354] Fps is (10 sec: 3276.8, 60 sec: 2798.9, 300 sec: 2915.8). Total num frames: 1470464. Throughput: 0: 661.2. Samples: 368198. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-08-31 04:41:55,372][00354] Avg episode reward: [(0, '6.735')]
[2023-08-31 04:41:55,387][07790] Updated weights for policy 0, policy_version 360 (0.0021)
[2023-08-31 04:42:00,361][00354] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2888.0). Total num frames: 1482752. Throughput: 0: 667.7. Samples: 372060. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-08-31 04:42:00,367][00354] Avg episode reward: [(0, '6.835')]
[2023-08-31 04:42:05,362][00354] Fps is (10 sec: 2457.5, 60 sec: 2594.1, 300 sec: 2874.1). Total num frames: 1495040. Throughput: 0: 667.6. Samples: 373814. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:42:05,370][00354] Avg episode reward: [(0, '6.787')]
[2023-08-31 04:42:09,962][07790] Updated weights for policy 0, policy_version 370 (0.0039)
[2023-08-31 04:42:10,361][00354] Fps is (10 sec: 3276.8, 60 sec: 2730.7, 300 sec: 2901.9). Total num frames: 1515520. Throughput: 0: 664.4. Samples: 378502. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:42:10,364][00354] Avg episode reward: [(0, '6.746')]
[2023-08-31 04:42:15,362][00354] Fps is (10 sec: 3686.5, 60 sec: 2799.0, 300 sec: 2915.8). Total num frames: 1531904. Throughput: 0: 715.2. Samples: 384204. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:42:15,364][00354] Avg episode reward: [(0, '6.829')]
[2023-08-31 04:42:20,361][00354] Fps is (10 sec: 3276.8, 60 sec: 2798.9, 300 sec: 2901.9). Total num frames: 1548288. Throughput: 0: 729.5. Samples: 386210. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-08-31 04:42:20,364][00354] Avg episode reward: [(0, '7.095')]
[2023-08-31 04:42:20,370][07777] Saving new best policy, reward=7.095!
[2023-08-31 04:42:23,935][07790] Updated weights for policy 0, policy_version 380 (0.0025)
[2023-08-31 04:42:25,364][00354] Fps is (10 sec: 2457.0, 60 sec: 2730.5, 300 sec: 2874.1). Total num frames: 1556480. Throughput: 0: 740.8. Samples: 389578. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-08-31 04:42:25,381][00354] Avg episode reward: [(0, '7.049')]
[2023-08-31 04:42:30,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2798.9, 300 sec: 2874.2). Total num frames: 1572864. Throughput: 0: 767.2. Samples: 394214. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-08-31 04:42:30,368][00354] Avg episode reward: [(0, '7.478')]
[2023-08-31 04:42:30,371][07777] Saving new best policy, reward=7.478!
[2023-08-31 04:42:35,362][00354] Fps is (10 sec: 3687.1, 60 sec: 3003.7, 300 sec: 2915.8). Total num frames: 1593344. Throughput: 0: 767.6. Samples: 397050. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-08-31 04:42:35,367][00354] Avg episode reward: [(0, '7.861')]
[2023-08-31 04:42:35,385][07777] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000389_1593344.pth...
[2023-08-31 04:42:35,512][07777] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000217_888832.pth
[2023-08-31 04:42:35,533][07777] Saving new best policy, reward=7.861!
[2023-08-31 04:42:36,187][07790] Updated weights for policy 0, policy_version 390 (0.0034)
[2023-08-31 04:42:40,361][00354] Fps is (10 sec: 3276.8, 60 sec: 3072.1, 300 sec: 2901.9). Total num frames: 1605632. Throughput: 0: 750.5. Samples: 401970. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:42:40,364][00354] Avg episode reward: [(0, '7.974')]
[2023-08-31 04:42:40,366][07777] Saving new best policy, reward=7.974!
[2023-08-31 04:42:45,363][00354] Fps is (10 sec: 2457.4, 60 sec: 3003.7, 300 sec: 2874.1). Total num frames: 1617920. Throughput: 0: 746.6. Samples: 405660. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:42:45,373][00354] Avg episode reward: [(0, '8.156')]
[2023-08-31 04:42:45,385][07777] Saving new best policy, reward=8.156!
[2023-08-31 04:42:50,364][00354] Fps is (10 sec: 2866.5, 60 sec: 3003.6, 300 sec: 2888.0). Total num frames: 1634304. Throughput: 0: 746.1. Samples: 407392. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-08-31 04:42:50,367][00354] Avg episode reward: [(0, '8.154')]
[2023-08-31 04:42:51,128][07790] Updated weights for policy 0, policy_version 400 (0.0033)
[2023-08-31 04:42:55,361][00354] Fps is (10 sec: 3277.3, 60 sec: 3003.7, 300 sec: 2901.9). Total num frames: 1650688. Throughput: 0: 763.4. Samples: 412856. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-08-31 04:42:55,364][00354] Avg episode reward: [(0, '7.737')]
[2023-08-31 04:43:00,361][00354] Fps is (10 sec: 3277.6, 60 sec: 3072.0, 300 sec: 2901.9). Total num frames: 1667072. Throughput: 0: 751.8. Samples: 418036. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:43:00,364][00354] Avg episode reward: [(0, '8.284')]
[2023-08-31 04:43:00,370][07777] Saving new best policy, reward=8.284!
[2023-08-31 04:43:04,025][07790] Updated weights for policy 0, policy_version 410 (0.0021)
[2023-08-31 04:43:05,364][00354] Fps is (10 sec: 2866.4, 60 sec: 3071.9, 300 sec: 2888.0). Total num frames: 1679360. Throughput: 0: 745.7. Samples: 419770. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-08-31 04:43:05,368][00354] Avg episode reward: [(0, '8.276')]
[2023-08-31 04:43:10,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 2860.3). Total num frames: 1691648. Throughput: 0: 747.9. Samples: 423232. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:43:10,364][00354] Avg episode reward: [(0, '8.777')]
[2023-08-31 04:43:10,367][07777] Saving new best policy, reward=8.777!
[2023-08-31 04:43:15,361][00354] Fps is (10 sec: 3277.7, 60 sec: 3003.7, 300 sec: 2901.9). Total num frames: 1712128. Throughput: 0: 763.5. Samples: 428570. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-08-31 04:43:15,364][00354] Avg episode reward: [(0, '9.591')]
[2023-08-31 04:43:15,375][07777] Saving new best policy, reward=9.591!
[2023-08-31 04:43:17,315][07790] Updated weights for policy 0, policy_version 420 (0.0025)
[2023-08-31 04:43:20,361][00354] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 2901.9). Total num frames: 1724416. Throughput: 0: 756.6. Samples: 431096. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:43:20,364][00354] Avg episode reward: [(0, '10.149')]
[2023-08-31 04:43:20,369][07777] Saving new best policy, reward=10.149!
[2023-08-31 04:43:25,362][00354] Fps is (10 sec: 2457.5, 60 sec: 3003.8, 300 sec: 2915.8). Total num frames: 1736704. Throughput: 0: 720.8. Samples: 434408. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:43:25,365][00354] Avg episode reward: [(0, '10.121')]
[2023-08-31 04:43:30,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 2915.8). Total num frames: 1748992. Throughput: 0: 717.1. Samples: 437930. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:43:30,368][00354] Avg episode reward: [(0, '10.382')]
[2023-08-31 04:43:30,370][07777] Saving new best policy, reward=10.382!
[2023-08-31 04:43:33,644][07790] Updated weights for policy 0, policy_version 430 (0.0023)
[2023-08-31 04:43:35,361][00354] Fps is (10 sec: 2867.4, 60 sec: 2867.2, 300 sec: 2929.7). Total num frames: 1765376. Throughput: 0: 727.8. Samples: 440142. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-08-31 04:43:35,367][00354] Avg episode reward: [(0, '10.456')]
[2023-08-31 04:43:35,380][07777] Saving new best policy, reward=10.456!
[2023-08-31 04:43:40,361][00354] Fps is (10 sec: 3686.4, 60 sec: 3003.7, 300 sec: 2957.5). Total num frames: 1785856. Throughput: 0: 730.9. Samples: 445746. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:43:40,364][00354] Avg episode reward: [(0, '10.494')]
[2023-08-31 04:43:40,366][07777] Saving new best policy, reward=10.494!
[2023-08-31 04:43:45,361][00354] Fps is (10 sec: 3276.8, 60 sec: 3003.8, 300 sec: 2957.5). Total num frames: 1798144. Throughput: 0: 713.0. Samples: 450120. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-08-31 04:43:45,367][00354] Avg episode reward: [(0, '10.984')]
[2023-08-31 04:43:45,377][07777] Saving new best policy, reward=10.984!
[2023-08-31 04:43:46,710][07790] Updated weights for policy 0, policy_version 440 (0.0019)
[2023-08-31 04:43:50,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2935.6, 300 sec: 2929.7). Total num frames: 1810432. Throughput: 0: 711.8. Samples: 451798. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:43:50,364][00354] Avg episode reward: [(0, '10.851')]
[2023-08-31 04:43:55,362][00354] Fps is (10 sec: 2457.5, 60 sec: 2867.2, 300 sec: 2915.9). Total num frames: 1822720. Throughput: 0: 722.6. Samples: 455748. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-08-31 04:43:55,368][00354] Avg episode reward: [(0, '12.241')]
[2023-08-31 04:43:55,381][07777] Saving new best policy, reward=12.241!
[2023-08-31 04:43:59,967][07790] Updated weights for policy 0, policy_version 450 (0.0019)
[2023-08-31 04:44:00,361][00354] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 2943.6). Total num frames: 1843200. Throughput: 0: 727.7. Samples: 461318. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-08-31 04:44:00,364][00354] Avg episode reward: [(0, '12.878')]
[2023-08-31 04:44:00,368][07777] Saving new best policy, reward=12.878!
[2023-08-31 04:44:05,362][00354] Fps is (10 sec: 3686.1, 60 sec: 3003.8, 300 sec: 2957.4). Total num frames: 1859584. Throughput: 0: 729.5. Samples: 463922. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-08-31 04:44:05,367][00354] Avg episode reward: [(0, '13.439')]
[2023-08-31 04:44:05,383][07777] Saving new best policy, reward=13.439!
[2023-08-31 04:44:10,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 2915.8). Total num frames: 1867776. Throughput: 0: 729.2. Samples: 467222. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-08-31 04:44:10,366][00354] Avg episode reward: [(0, '14.106')]
[2023-08-31 04:44:10,372][07777] Saving new best policy, reward=14.106!
[2023-08-31 04:44:15,361][00354] Fps is (10 sec: 2048.2, 60 sec: 2798.9, 300 sec: 2901.9). Total num frames: 1880064. Throughput: 0: 729.7. Samples: 470766. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-08-31 04:44:15,369][00354] Avg episode reward: [(0, '14.897')]
[2023-08-31 04:44:15,382][07777] Saving new best policy, reward=14.897!
[2023-08-31 04:44:15,965][07790] Updated weights for policy 0, policy_version 460 (0.0015)
[2023-08-31 04:44:20,361][00354] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2915.8). Total num frames: 1896448. Throughput: 0: 740.0. Samples: 473442. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:44:20,365][00354] Avg episode reward: [(0, '15.167')]
[2023-08-31 04:44:20,439][07777] Saving new best policy, reward=15.167!
[2023-08-31 04:44:25,361][00354] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 2929.7). Total num frames: 1912832. Throughput: 0: 732.3. Samples: 478700. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-08-31 04:44:25,364][00354] Avg episode reward: [(0, '15.214')]
[2023-08-31 04:44:25,376][07777] Saving new best policy, reward=15.214!
[2023-08-31 04:44:29,751][07790] Updated weights for policy 0, policy_version 470 (0.0052)
[2023-08-31 04:44:30,367][00354] Fps is (10 sec: 2865.6, 60 sec: 2935.2, 300 sec: 2915.7). Total num frames: 1925120. Throughput: 0: 710.0. Samples: 482072. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-08-31 04:44:30,369][00354] Avg episode reward: [(0, '15.754')]
[2023-08-31 04:44:30,371][07777] Saving new best policy, reward=15.754!
[2023-08-31 04:44:35,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 2888.0). Total num frames: 1937408. Throughput: 0: 710.2. Samples: 483756. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-08-31 04:44:35,364][00354] Avg episode reward: [(0, '15.769')]
[2023-08-31 04:44:35,377][07777] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000473_1937408.pth...
[2023-08-31 04:44:35,501][07777] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000303_1241088.pth
[2023-08-31 04:44:35,509][07777] Saving new best policy, reward=15.769!
[2023-08-31 04:44:40,361][00354] Fps is (10 sec: 2868.8, 60 sec: 2798.9, 300 sec: 2888.0). Total num frames: 1953792. Throughput: 0: 719.8. Samples: 488138. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:44:40,364][00354] Avg episode reward: [(0, '14.934')]
[2023-08-31 04:44:43,469][07790] Updated weights for policy 0, policy_version 480 (0.0022)
[2023-08-31 04:44:45,364][00354] Fps is (10 sec: 3276.0, 60 sec: 2867.1, 300 sec: 2915.8). Total num frames: 1970176. Throughput: 0: 716.7. Samples: 493572. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-08-31 04:44:45,367][00354] Avg episode reward: [(0, '15.837')]
[2023-08-31 04:44:45,380][07777] Saving new best policy, reward=15.837!
[2023-08-31 04:44:50,368][00354] Fps is (10 sec: 2865.3, 60 sec: 2866.9, 300 sec: 2901.9). Total num frames: 1982464. Throughput: 0: 698.3. Samples: 495348. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:44:50,372][00354] Avg episode reward: [(0, '16.077')]
[2023-08-31 04:44:50,374][07777] Saving new best policy, reward=16.077!
[2023-08-31 04:44:55,365][00354] Fps is (10 sec: 2457.3, 60 sec: 2867.0, 300 sec: 2888.0). Total num frames: 1994752. Throughput: 0: 701.0. Samples: 498770. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:44:55,368][00354] Avg episode reward: [(0, '16.684')]
[2023-08-31 04:44:55,382][07777] Saving new best policy, reward=16.684!
[2023-08-31 04:45:00,361][00354] Fps is (10 sec: 2049.3, 60 sec: 2662.4, 300 sec: 2874.1). Total num frames: 2002944. Throughput: 0: 692.6. Samples: 501934. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:45:00,368][00354] Avg episode reward: [(0, '17.156')]
[2023-08-31 04:45:00,444][07790] Updated weights for policy 0, policy_version 490 (0.0017)
[2023-08-31 04:45:00,445][07777] Saving new best policy, reward=17.156!
[2023-08-31 04:45:05,361][00354] Fps is (10 sec: 2868.3, 60 sec: 2730.7, 300 sec: 2901.9). Total num frames: 2023424. Throughput: 0: 692.6. Samples: 504610. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-08-31 04:45:05,364][00354] Avg episode reward: [(0, '18.259')]
[2023-08-31 04:45:05,378][07777] Saving new best policy, reward=18.259!
[2023-08-31 04:45:10,361][00354] Fps is (10 sec: 3686.4, 60 sec: 2867.2, 300 sec: 2915.8). Total num frames: 2039808. Throughput: 0: 701.8. Samples: 510282. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-08-31 04:45:10,364][00354] Avg episode reward: [(0, '18.110')]
[2023-08-31 04:45:12,616][07790] Updated weights for policy 0, policy_version 500 (0.0021)
[2023-08-31 04:45:15,361][00354] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2888.0). Total num frames: 2052096. Throughput: 0: 709.0. Samples: 513972. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:45:15,369][00354] Avg episode reward: [(0, '18.111')]
[2023-08-31 04:45:20,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2798.9, 300 sec: 2874.1). Total num frames: 2064384. Throughput: 0: 711.8. Samples: 515786. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-08-31 04:45:20,369][00354] Avg episode reward: [(0, '18.537')]
[2023-08-31 04:45:20,373][07777] Saving new best policy, reward=18.537!
[2023-08-31 04:45:25,361][00354] Fps is (10 sec: 3276.8, 60 sec: 2867.2, 300 sec: 2901.9). Total num frames: 2084864. Throughput: 0: 728.3. Samples: 520912. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:45:25,364][00354] Avg episode reward: [(0, '19.161')]
[2023-08-31 04:45:25,373][07777] Saving new best policy, reward=19.161!
[2023-08-31 04:45:25,945][07790] Updated weights for policy 0, policy_version 510 (0.0021)
[2023-08-31 04:45:30,361][00354] Fps is (10 sec: 3686.4, 60 sec: 2935.7, 300 sec: 2915.8). Total num frames: 2101248. Throughput: 0: 736.4. Samples: 526706. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-08-31 04:45:30,364][00354] Avg episode reward: [(0, '17.728')]
[2023-08-31 04:45:35,361][00354] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 2901.9). Total num frames: 2113536. Throughput: 0: 736.3. Samples: 528476. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-08-31 04:45:35,364][00354] Avg episode reward: [(0, '17.419')]
[2023-08-31 04:45:40,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 2874.1). Total num frames: 2125824. Throughput: 0: 738.5. Samples: 531998. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-08-31 04:45:40,364][00354] Avg episode reward: [(0, '17.814')]
[2023-08-31 04:45:40,680][07790] Updated weights for policy 0, policy_version 520 (0.0024)
[2023-08-31 04:45:45,363][00354] Fps is (10 sec: 3276.4, 60 sec: 2935.5, 300 sec: 2901.9). Total num frames: 2146304. Throughput: 0: 779.1. Samples: 536996. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:45:45,365][00354] Avg episode reward: [(0, '17.815')]
[2023-08-31 04:45:50,361][00354] Fps is (10 sec: 3686.4, 60 sec: 3004.1, 300 sec: 2915.8). Total num frames: 2162688. Throughput: 0: 781.6. Samples: 539784. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:45:50,368][00354] Avg episode reward: [(0, '17.877')]
[2023-08-31 04:45:52,231][07790] Updated weights for policy 0, policy_version 530 (0.0025)
[2023-08-31 04:45:55,361][00354] Fps is (10 sec: 2867.6, 60 sec: 3003.9, 300 sec: 2901.9). Total num frames: 2174976. Throughput: 0: 759.2. Samples: 544444. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-08-31 04:45:55,369][00354] Avg episode reward: [(0, '17.787')]
[2023-08-31 04:46:00,362][00354] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 2874.1). Total num frames: 2187264. Throughput: 0: 756.6. Samples: 548020. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-08-31 04:46:00,363][00354] Avg episode reward: [(0, '18.787')]
[2023-08-31 04:46:05,361][00354] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2888.0). Total num frames: 2203648. Throughput: 0: 760.6. Samples: 550014.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 04:46:05,366][00354] Avg episode reward: [(0, '18.156')] +[2023-08-31 04:46:06,646][07790] Updated weights for policy 0, policy_version 540 (0.0038) +[2023-08-31 04:46:10,361][00354] Fps is (10 sec: 3686.5, 60 sec: 3072.0, 300 sec: 2915.8). Total num frames: 2224128. Throughput: 0: 772.3. Samples: 555666. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 04:46:10,366][00354] Avg episode reward: [(0, '19.028')] +[2023-08-31 04:46:15,361][00354] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 2901.9). Total num frames: 2236416. Throughput: 0: 750.4. Samples: 560476. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 04:46:15,365][00354] Avg episode reward: [(0, '19.899')] +[2023-08-31 04:46:15,381][07777] Saving new best policy, reward=19.899! +[2023-08-31 04:46:20,361][00354] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 2901.9). Total num frames: 2248704. Throughput: 0: 748.2. Samples: 562144. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 04:46:20,364][00354] Avg episode reward: [(0, '20.640')] +[2023-08-31 04:46:20,374][07777] Saving new best policy, reward=20.640! +[2023-08-31 04:46:20,992][07790] Updated weights for policy 0, policy_version 550 (0.0025) +[2023-08-31 04:46:25,361][00354] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2915.8). Total num frames: 2265088. Throughput: 0: 749.5. Samples: 565724. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 04:46:25,368][00354] Avg episode reward: [(0, '20.507')] +[2023-08-31 04:46:30,361][00354] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 2943.6). Total num frames: 2281472. Throughput: 0: 765.7. Samples: 571452. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 04:46:30,364][00354] Avg episode reward: [(0, '20.670')] +[2023-08-31 04:46:30,370][07777] Saving new best policy, reward=20.670! 
+[2023-08-31 04:46:32,790][07790] Updated weights for policy 0, policy_version 560 (0.0023)
+[2023-08-31 04:46:35,361][00354] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 2971.3). Total num frames: 2297856. Throughput: 0: 765.6. Samples: 574234. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-08-31 04:46:35,364][00354] Avg episode reward: [(0, '20.652')]
+[2023-08-31 04:46:35,384][07777] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000561_2297856.pth...
+[2023-08-31 04:46:35,545][07777] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000389_1593344.pth
+[2023-08-31 04:46:40,364][00354] Fps is (10 sec: 2866.5, 60 sec: 3071.9, 300 sec: 2957.4). Total num frames: 2310144. Throughput: 0: 747.0. Samples: 578062. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-08-31 04:46:40,367][00354] Avg episode reward: [(0, '21.272')]
+[2023-08-31 04:46:40,370][07777] Saving new best policy, reward=21.272!
+[2023-08-31 04:46:45,362][00354] Fps is (10 sec: 2457.5, 60 sec: 2935.5, 300 sec: 2943.6). Total num frames: 2322432. Throughput: 0: 746.5. Samples: 581612. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-08-31 04:46:45,364][00354] Avg episode reward: [(0, '20.506')]
+[2023-08-31 04:46:47,880][07790] Updated weights for policy 0, policy_version 570 (0.0016)
+[2023-08-31 04:46:50,361][00354] Fps is (10 sec: 3277.6, 60 sec: 3003.7, 300 sec: 2957.5). Total num frames: 2342912. Throughput: 0: 762.3. Samples: 584316. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-08-31 04:46:50,364][00354] Avg episode reward: [(0, '20.206')]
+[2023-08-31 04:46:55,364][00354] Fps is (10 sec: 3685.6, 60 sec: 3071.9, 300 sec: 2971.3). Total num frames: 2359296. Throughput: 0: 764.7. Samples: 590078. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-08-31 04:46:55,371][00354] Avg episode reward: [(0, '20.107')]
+[2023-08-31 04:47:00,361][00354] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 2971.3). Total num frames: 2371584. Throughput: 0: 748.9. Samples: 594176. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-08-31 04:47:00,369][00354] Avg episode reward: [(0, '20.018')]
+[2023-08-31 04:47:00,520][07790] Updated weights for policy 0, policy_version 580 (0.0014)
+[2023-08-31 04:47:05,365][00354] Fps is (10 sec: 2457.4, 60 sec: 3003.6, 300 sec: 2943.5). Total num frames: 2383872. Throughput: 0: 751.2. Samples: 595952. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-08-31 04:47:05,367][00354] Avg episode reward: [(0, '20.507')]
+[2023-08-31 04:47:10,361][00354] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 2957.5). Total num frames: 2404352. Throughput: 0: 769.8. Samples: 600366. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-08-31 04:47:10,364][00354] Avg episode reward: [(0, '20.066')]
+[2023-08-31 04:47:13,596][07790] Updated weights for policy 0, policy_version 590 (0.0022)
+[2023-08-31 04:47:15,361][00354] Fps is (10 sec: 3687.7, 60 sec: 3072.0, 300 sec: 2957.5). Total num frames: 2420736. Throughput: 0: 771.0. Samples: 606146. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-08-31 04:47:15,367][00354] Avg episode reward: [(0, '19.159')]
+[2023-08-31 04:47:20,361][00354] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 2971.4). Total num frames: 2433024. Throughput: 0: 763.7. Samples: 608602. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-08-31 04:47:20,364][00354] Avg episode reward: [(0, '19.183')]
+[2023-08-31 04:47:25,361][00354] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 2957.5). Total num frames: 2445312. Throughput: 0: 741.7. Samples: 611436. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-08-31 04:47:25,364][00354] Avg episode reward: [(0, '18.796')]
+[2023-08-31 04:47:30,362][00354] Fps is (10 sec: 2047.8, 60 sec: 2867.2, 300 sec: 2915.8). Total num frames: 2453504. Throughput: 0: 724.3. Samples: 614208. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-08-31 04:47:30,367][00354] Avg episode reward: [(0, '18.458')]
+[2023-08-31 04:47:31,783][07790] Updated weights for policy 0, policy_version 600 (0.0016)
+[2023-08-31 04:47:35,361][00354] Fps is (10 sec: 2048.0, 60 sec: 2798.9, 300 sec: 2915.8). Total num frames: 2465792. Throughput: 0: 698.4. Samples: 615744. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-08-31 04:47:35,369][00354] Avg episode reward: [(0, '18.006')]
+[2023-08-31 04:47:40,361][00354] Fps is (10 sec: 3277.1, 60 sec: 2935.6, 300 sec: 2943.6). Total num frames: 2486272. Throughput: 0: 683.6. Samples: 620836. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-08-31 04:47:40,363][00354] Avg episode reward: [(0, '18.443')]
+[2023-08-31 04:47:43,829][07790] Updated weights for policy 0, policy_version 610 (0.0025)
+[2023-08-31 04:47:45,362][00354] Fps is (10 sec: 3276.7, 60 sec: 2935.5, 300 sec: 2929.7). Total num frames: 2498560. Throughput: 0: 706.0. Samples: 625948. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-08-31 04:47:45,368][00354] Avg episode reward: [(0, '18.832')]
+[2023-08-31 04:47:50,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2798.9, 300 sec: 2915.8). Total num frames: 2510848. Throughput: 0: 706.1. Samples: 627726. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-08-31 04:47:50,370][00354] Avg episode reward: [(0, '18.870')]
+[2023-08-31 04:47:55,361][00354] Fps is (10 sec: 2867.3, 60 sec: 2799.1, 300 sec: 2915.8). Total num frames: 2527232. Throughput: 0: 687.2. Samples: 631288. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-08-31 04:47:55,369][00354] Avg episode reward: [(0, '18.433')]
+[2023-08-31 04:47:58,400][07790] Updated weights for policy 0, policy_version 620 (0.0025)
+[2023-08-31 04:48:00,361][00354] Fps is (10 sec: 3276.8, 60 sec: 2867.2, 300 sec: 2929.7). Total num frames: 2543616. Throughput: 0: 681.9. Samples: 636832. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-08-31 04:48:00,374][00354] Avg episode reward: [(0, '18.325')]
+[2023-08-31 04:48:05,361][00354] Fps is (10 sec: 3686.4, 60 sec: 3003.9, 300 sec: 2957.5). Total num frames: 2564096. Throughput: 0: 690.7. Samples: 639682. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-08-31 04:48:05,368][00354] Avg episode reward: [(0, '17.072')]
+[2023-08-31 04:48:10,368][00354] Fps is (10 sec: 2865.4, 60 sec: 2798.6, 300 sec: 2915.7). Total num frames: 2572288. Throughput: 0: 719.9. Samples: 643834. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-08-31 04:48:10,379][00354] Avg episode reward: [(0, '16.775')]
+[2023-08-31 04:48:12,609][07790] Updated weights for policy 0, policy_version 630 (0.0024)
+[2023-08-31 04:48:15,361][00354] Fps is (10 sec: 2048.0, 60 sec: 2730.7, 300 sec: 2915.8). Total num frames: 2584576. Throughput: 0: 727.2. Samples: 646932. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-08-31 04:48:15,367][00354] Avg episode reward: [(0, '17.388')]
+[2023-08-31 04:48:20,361][00354] Fps is (10 sec: 2869.0, 60 sec: 2798.9, 300 sec: 2929.7). Total num frames: 2600960. Throughput: 0: 734.3. Samples: 648788. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-08-31 04:48:20,368][00354] Avg episode reward: [(0, '18.010')]
+[2023-08-31 04:48:25,362][00354] Fps is (10 sec: 3276.7, 60 sec: 2867.2, 300 sec: 2943.6). Total num frames: 2617344. Throughput: 0: 748.3. Samples: 654508. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-08-31 04:48:25,369][00354] Avg episode reward: [(0, '19.122')]
+[2023-08-31 04:48:25,541][07790] Updated weights for policy 0, policy_version 640 (0.0030)
+[2023-08-31 04:48:30,362][00354] Fps is (10 sec: 3276.7, 60 sec: 3003.8, 300 sec: 2943.6). Total num frames: 2633728. Throughput: 0: 743.8. Samples: 659418. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-08-31 04:48:30,364][00354] Avg episode reward: [(0, '20.998')]
+[2023-08-31 04:48:35,366][00354] Fps is (10 sec: 2865.8, 60 sec: 3003.5, 300 sec: 2915.7). Total num frames: 2646016. Throughput: 0: 743.7. Samples: 661198. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-08-31 04:48:35,369][00354] Avg episode reward: [(0, '21.082')]
+[2023-08-31 04:48:35,386][07777] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000646_2646016.pth...
+[2023-08-31 04:48:35,581][07777] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000473_1937408.pth
+[2023-08-31 04:48:40,361][00354] Fps is (10 sec: 2457.7, 60 sec: 2867.2, 300 sec: 2915.8). Total num frames: 2658304. Throughput: 0: 742.2. Samples: 664686. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-08-31 04:48:40,367][00354] Avg episode reward: [(0, '21.686')]
+[2023-08-31 04:48:40,370][07777] Saving new best policy, reward=21.686!
+[2023-08-31 04:48:40,708][07790] Updated weights for policy 0, policy_version 650 (0.0028)
+[2023-08-31 04:48:45,361][00354] Fps is (10 sec: 3278.4, 60 sec: 3003.7, 300 sec: 2943.6). Total num frames: 2678784. Throughput: 0: 742.6. Samples: 670250. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-08-31 04:48:45,364][00354] Avg episode reward: [(0, '21.983')]
+[2023-08-31 04:48:45,378][07777] Saving new best policy, reward=21.983!
+[2023-08-31 04:48:50,361][00354] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 2957.5). Total num frames: 2695168. Throughput: 0: 741.7. Samples: 673060. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-08-31 04:48:50,364][00354] Avg episode reward: [(0, '21.456')]
+[2023-08-31 04:48:52,931][07790] Updated weights for policy 0, policy_version 660 (0.0022)
+[2023-08-31 04:48:55,361][00354] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2929.7). Total num frames: 2707456. Throughput: 0: 738.1. Samples: 677044. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-08-31 04:48:55,364][00354] Avg episode reward: [(0, '21.973')]
+[2023-08-31 04:49:00,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 2915.8). Total num frames: 2719744. Throughput: 0: 747.6. Samples: 680576. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-08-31 04:49:00,371][00354] Avg episode reward: [(0, '21.627')]
+[2023-08-31 04:49:05,361][00354] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2943.6). Total num frames: 2736128. Throughput: 0: 767.3. Samples: 683316. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-08-31 04:49:05,368][00354] Avg episode reward: [(0, '20.913')]
+[2023-08-31 04:49:06,560][07790] Updated weights for policy 0, policy_version 670 (0.0024)
+[2023-08-31 04:49:10,361][00354] Fps is (10 sec: 3686.4, 60 sec: 3072.3, 300 sec: 2971.3). Total num frames: 2756608. Throughput: 0: 766.0. Samples: 688978. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-08-31 04:49:10,366][00354] Avg episode reward: [(0, '21.678')]
+[2023-08-31 04:49:15,361][00354] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 2957.5). Total num frames: 2768896. Throughput: 0: 748.0. Samples: 693080. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-08-31 04:49:15,367][00354] Avg episode reward: [(0, '21.909')]
+[2023-08-31 04:49:20,361][00354] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 2943.6). Total num frames: 2781184. Throughput: 0: 747.6. Samples: 694834. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-08-31 04:49:20,363][00354] Avg episode reward: [(0, '23.062')]
+[2023-08-31 04:49:20,373][07777] Saving new best policy, reward=23.062!
+[2023-08-31 04:49:21,467][07790] Updated weights for policy 0, policy_version 680 (0.0033)
+[2023-08-31 04:49:25,361][00354] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2957.5). Total num frames: 2797568. Throughput: 0: 766.4. Samples: 699174. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-08-31 04:49:25,370][00354] Avg episode reward: [(0, '23.100')]
+[2023-08-31 04:49:25,382][07777] Saving new best policy, reward=23.100!
+[2023-08-31 04:49:30,361][00354] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 2985.2). Total num frames: 2818048. Throughput: 0: 768.5. Samples: 704832. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-08-31 04:49:30,371][00354] Avg episode reward: [(0, '24.390')]
+[2023-08-31 04:49:30,373][07777] Saving new best policy, reward=24.390!
+[2023-08-31 04:49:32,905][07790] Updated weights for policy 0, policy_version 690 (0.0018)
+[2023-08-31 04:49:35,361][00354] Fps is (10 sec: 3276.8, 60 sec: 3072.3, 300 sec: 2971.3). Total num frames: 2830336. Throughput: 0: 756.9. Samples: 707120. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-08-31 04:49:35,364][00354] Avg episode reward: [(0, '24.745')]
+[2023-08-31 04:49:35,383][07777] Saving new best policy, reward=24.745!
+[2023-08-31 04:49:40,362][00354] Fps is (10 sec: 2457.5, 60 sec: 3072.0, 300 sec: 2957.5). Total num frames: 2842624. Throughput: 0: 746.5. Samples: 710636. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-08-31 04:49:40,370][00354] Avg episode reward: [(0, '24.379')]
+[2023-08-31 04:49:45,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 2957.5). Total num frames: 2854912. Throughput: 0: 762.0. Samples: 714864. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-08-31 04:49:45,364][00354] Avg episode reward: [(0, '23.239')]
+[2023-08-31 04:49:47,589][07790] Updated weights for policy 0, policy_version 700 (0.0036)
+[2023-08-31 04:49:50,361][00354] Fps is (10 sec: 3277.0, 60 sec: 3003.7, 300 sec: 2985.3). Total num frames: 2875392. Throughput: 0: 764.4. Samples: 717714. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-08-31 04:49:50,364][00354] Avg episode reward: [(0, '23.278')]
+[2023-08-31 04:49:55,363][00354] Fps is (10 sec: 3276.4, 60 sec: 3003.7, 300 sec: 2999.1). Total num frames: 2887680. Throughput: 0: 738.9. Samples: 722230. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-08-31 04:49:55,365][00354] Avg episode reward: [(0, '22.496')]
+[2023-08-31 04:50:00,361][00354] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 2971.3). Total num frames: 2899968. Throughput: 0: 720.1. Samples: 725484. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-08-31 04:50:00,364][00354] Avg episode reward: [(0, '22.569')]
+[2023-08-31 04:50:03,657][07790] Updated weights for policy 0, policy_version 710 (0.0026)
+[2023-08-31 04:50:05,361][00354] Fps is (10 sec: 2457.9, 60 sec: 2935.5, 300 sec: 2957.5). Total num frames: 2912256. Throughput: 0: 719.4. Samples: 727206. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-08-31 04:50:05,364][00354] Avg episode reward: [(0, '21.758')]
+[2023-08-31 04:50:10,361][00354] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2971.3). Total num frames: 2928640. Throughput: 0: 733.6. Samples: 732184. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-08-31 04:50:10,370][00354] Avg episode reward: [(0, '21.902')]
+[2023-08-31 04:50:14,899][07790] Updated weights for policy 0, policy_version 720 (0.0014)
+[2023-08-31 04:50:15,361][00354] Fps is (10 sec: 3686.4, 60 sec: 3003.7, 300 sec: 2999.1). Total num frames: 2949120. Throughput: 0: 734.4. Samples: 737882. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-08-31 04:50:15,367][00354] Avg episode reward: [(0, '22.005')]
+[2023-08-31 04:50:20,361][00354] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 2957.5). Total num frames: 2957312. Throughput: 0: 720.1. Samples: 739524. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-08-31 04:50:20,367][00354] Avg episode reward: [(0, '22.061')]
+[2023-08-31 04:50:25,361][00354] Fps is (10 sec: 2048.0, 60 sec: 2867.2, 300 sec: 2943.6). Total num frames: 2969600. Throughput: 0: 703.8. Samples: 742306. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-08-31 04:50:25,364][00354] Avg episode reward: [(0, '22.037')]
+[2023-08-31 04:50:30,363][00354] Fps is (10 sec: 2047.7, 60 sec: 2662.3, 300 sec: 2929.7). Total num frames: 2977792. Throughput: 0: 672.6. Samples: 745132. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-08-31 04:50:30,365][00354] Avg episode reward: [(0, '23.112')]
+[2023-08-31 04:50:34,166][07790] Updated weights for policy 0, policy_version 730 (0.0041)
+[2023-08-31 04:50:35,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2943.6). Total num frames: 2994176. Throughput: 0: 648.2. Samples: 746884. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-08-31 04:50:35,369][00354] Avg episode reward: [(0, '23.902')]
+[2023-08-31 04:50:35,378][07777] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000731_2994176.pth...
+[2023-08-31 04:50:35,504][07777] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000561_2297856.pth
+[2023-08-31 04:50:40,362][00354] Fps is (10 sec: 3277.1, 60 sec: 2798.9, 300 sec: 2929.7). Total num frames: 3010560. Throughput: 0: 672.6. Samples: 752496. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-08-31 04:50:40,370][00354] Avg episode reward: [(0, '22.515')]
+[2023-08-31 04:50:45,361][00354] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2915.8). Total num frames: 3022848. Throughput: 0: 697.1. Samples: 756852. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-08-31 04:50:45,366][00354] Avg episode reward: [(0, '23.242')]
+[2023-08-31 04:50:47,645][07790] Updated weights for policy 0, policy_version 740 (0.0013)
+[2023-08-31 04:50:50,363][00354] Fps is (10 sec: 2457.2, 60 sec: 2662.3, 300 sec: 2915.8). Total num frames: 3035136. Throughput: 0: 697.0. Samples: 758574. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-08-31 04:50:50,373][00354] Avg episode reward: [(0, '22.824')]
+[2023-08-31 04:50:55,361][00354] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2929.7). Total num frames: 3051520. Throughput: 0: 679.8. Samples: 762774. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-08-31 04:50:55,364][00354] Avg episode reward: [(0, '22.170')]
+[2023-08-31 04:51:00,132][07790] Updated weights for policy 0, policy_version 750 (0.0026)
+[2023-08-31 04:51:00,361][00354] Fps is (10 sec: 3687.1, 60 sec: 2867.2, 300 sec: 2943.6). Total num frames: 3072000. Throughput: 0: 681.6. Samples: 768556. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-08-31 04:51:00,372][00354] Avg episode reward: [(0, '22.842')]
+[2023-08-31 04:51:05,363][00354] Fps is (10 sec: 3276.4, 60 sec: 2867.1, 300 sec: 2915.8). Total num frames: 3084288. Throughput: 0: 702.6. Samples: 771144. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-08-31 04:51:05,368][00354] Avg episode reward: [(0, '22.206')]
+[2023-08-31 04:51:10,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2798.9, 300 sec: 2915.8). Total num frames: 3096576. Throughput: 0: 719.9. Samples: 774700. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-08-31 04:51:10,364][00354] Avg episode reward: [(0, '21.435')]
+[2023-08-31 04:51:14,979][07790] Updated weights for policy 0, policy_version 760 (0.0020)
+[2023-08-31 04:51:15,361][00354] Fps is (10 sec: 2867.5, 60 sec: 2730.7, 300 sec: 2929.7). Total num frames: 3112960. Throughput: 0: 747.8. Samples: 778784. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-08-31 04:51:15,366][00354] Avg episode reward: [(0, '21.171')]
+[2023-08-31 04:51:20,361][00354] Fps is (10 sec: 3276.8, 60 sec: 2867.2, 300 sec: 2929.7). Total num frames: 3129344. Throughput: 0: 772.4. Samples: 781640. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-08-31 04:51:20,364][00354] Avg episode reward: [(0, '21.342')]
+[2023-08-31 04:51:25,361][00354] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 2929.7). Total num frames: 3145728. Throughput: 0: 773.6. Samples: 787306. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-08-31 04:51:25,364][00354] Avg episode reward: [(0, '21.654')]
+[2023-08-31 04:51:27,124][07790] Updated weights for policy 0, policy_version 770 (0.0024)
+[2023-08-31 04:51:30,361][00354] Fps is (10 sec: 2867.2, 60 sec: 3003.8, 300 sec: 2915.8). Total num frames: 3158016. Throughput: 0: 755.2. Samples: 790836. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2023-08-31 04:51:30,365][00354] Avg episode reward: [(0, '22.097')]
+[2023-08-31 04:51:35,362][00354] Fps is (10 sec: 2457.5, 60 sec: 2935.4, 300 sec: 2915.8). Total num frames: 3170304. Throughput: 0: 748.5. Samples: 792256. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-08-31 04:51:35,366][00354] Avg episode reward: [(0, '22.690')]
+[2023-08-31 04:51:40,361][00354] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 2929.7). Total num frames: 3186688. Throughput: 0: 745.9. Samples: 796338. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-08-31 04:51:40,364][00354] Avg episode reward: [(0, '23.602')]
+[2023-08-31 04:51:42,193][07790] Updated weights for policy 0, policy_version 780 (0.0013)
+[2023-08-31 04:51:45,361][00354] Fps is (10 sec: 3686.6, 60 sec: 3072.0, 300 sec: 2929.7). Total num frames: 3207168. Throughput: 0: 745.3. Samples: 802094. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-08-31 04:51:45,370][00354] Avg episode reward: [(0, '24.074')]
+[2023-08-31 04:51:50,361][00354] Fps is (10 sec: 3276.8, 60 sec: 3072.1, 300 sec: 2915.8). Total num frames: 3219456. Throughput: 0: 742.0. Samples: 804534. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-08-31 04:51:50,364][00354] Avg episode reward: [(0, '24.289')]
+[2023-08-31 04:51:55,361][00354] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 2915.8). Total num frames: 3231744. Throughput: 0: 744.9. Samples: 808220. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-08-31 04:51:55,369][00354] Avg episode reward: [(0, '23.803')]
+[2023-08-31 04:51:56,251][07790] Updated weights for policy 0, policy_version 790 (0.0046)
+[2023-08-31 04:52:00,361][00354] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 2929.7). Total num frames: 3248128. Throughput: 0: 745.2. Samples: 812320. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-08-31 04:52:00,364][00354] Avg episode reward: [(0, '23.822')]
+[2023-08-31 04:52:05,361][00354] Fps is (10 sec: 3276.8, 60 sec: 3003.8, 300 sec: 2915.8). Total num frames: 3264512. Throughput: 0: 745.6. Samples: 815194. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-08-31 04:52:05,364][00354] Avg episode reward: [(0, '23.815')]
+[2023-08-31 04:52:07,882][07790] Updated weights for policy 0, policy_version 800 (0.0035)
+[2023-08-31 04:52:10,361][00354] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 2915.8). Total num frames: 3280896. Throughput: 0: 741.4. Samples: 820670. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-08-31 04:52:10,364][00354] Avg episode reward: [(0, '23.968')]
+[2023-08-31 04:52:15,362][00354] Fps is (10 sec: 2867.1, 60 sec: 3003.7, 300 sec: 2915.8). Total num frames: 3293184. Throughput: 0: 743.7. Samples: 824302. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-08-31 04:52:15,365][00354] Avg episode reward: [(0, '23.459')]
+[2023-08-31 04:52:20,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 2915.8). Total num frames: 3305472. Throughput: 0: 750.7. Samples: 826036. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-08-31 04:52:20,366][00354] Avg episode reward: [(0, '23.800')]
+[2023-08-31 04:52:22,820][07790] Updated weights for policy 0, policy_version 810 (0.0030)
+[2023-08-31 04:52:25,361][00354] Fps is (10 sec: 3276.9, 60 sec: 3003.7, 300 sec: 2957.5). Total num frames: 3325952. Throughput: 0: 772.4. Samples: 831096. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-08-31 04:52:25,368][00354] Avg episode reward: [(0, '23.497')]
+[2023-08-31 04:52:30,361][00354] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 2971.3). Total num frames: 3342336. Throughput: 0: 771.7. Samples: 836820. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-08-31 04:52:30,364][00354] Avg episode reward: [(0, '24.315')]
+[2023-08-31 04:52:35,362][00354] Fps is (10 sec: 2866.9, 60 sec: 3072.0, 300 sec: 2943.6). Total num frames: 3354624. Throughput: 0: 757.0. Samples: 838600. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-08-31 04:52:35,370][00354] Avg episode reward: [(0, '24.068')]
+[2023-08-31 04:52:35,383][07777] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000819_3354624.pth...
+[2023-08-31 04:52:35,574][07777] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000646_2646016.pth
+[2023-08-31 04:52:35,688][07790] Updated weights for policy 0, policy_version 820 (0.0013)
+[2023-08-31 04:52:40,362][00354] Fps is (10 sec: 2457.5, 60 sec: 3003.7, 300 sec: 2943.6). Total num frames: 3366912. Throughput: 0: 754.3. Samples: 842166. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-08-31 04:52:40,371][00354] Avg episode reward: [(0, '24.266')]
+[2023-08-31 04:52:45,361][00354] Fps is (10 sec: 3277.1, 60 sec: 3003.7, 300 sec: 2971.3). Total num frames: 3387392. Throughput: 0: 773.2. Samples: 847114. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-08-31 04:52:45,364][00354] Avg episode reward: [(0, '22.881')]
+[2023-08-31 04:52:48,520][07790] Updated weights for policy 0, policy_version 830 (0.0018)
+[2023-08-31 04:52:50,361][00354] Fps is (10 sec: 3686.6, 60 sec: 3072.0, 300 sec: 2971.3). Total num frames: 3403776. Throughput: 0: 772.6. Samples: 849960. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-08-31 04:52:50,370][00354] Avg episode reward: [(0, '21.749')]
+[2023-08-31 04:52:55,361][00354] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 2957.5). Total num frames: 3416064. Throughput: 0: 757.9. Samples: 854774. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-08-31 04:52:55,364][00354] Avg episode reward: [(0, '21.403')]
+[2023-08-31 04:53:00,362][00354] Fps is (10 sec: 2457.3, 60 sec: 3003.7, 300 sec: 2929.7). Total num frames: 3428352. Throughput: 0: 757.1. Samples: 858374. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-08-31 04:53:00,366][00354] Avg episode reward: [(0, '20.985')]
+[2023-08-31 04:53:03,355][07790] Updated weights for policy 0, policy_version 840 (0.0015)
+[2023-08-31 04:53:05,361][00354] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2957.5). Total num frames: 3444736. Throughput: 0: 762.0. Samples: 860324. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-08-31 04:53:05,367][00354] Avg episode reward: [(0, '20.792')]
+[2023-08-31 04:53:10,361][00354] Fps is (10 sec: 3686.8, 60 sec: 3072.0, 300 sec: 2985.2). Total num frames: 3465216. Throughput: 0: 777.2. Samples: 866068. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-08-31 04:53:10,368][00354] Avg episode reward: [(0, '21.642')]
+[2023-08-31 04:53:15,364][00354] Fps is (10 sec: 3276.0, 60 sec: 3071.9, 300 sec: 2971.3). Total num frames: 3477504. Throughput: 0: 745.7. Samples: 870378. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-08-31 04:53:15,371][00354] Avg episode reward: [(0, '21.527')]
+[2023-08-31 04:53:16,290][07790] Updated weights for policy 0, policy_version 850 (0.0034)
+[2023-08-31 04:53:20,361][00354] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 2957.5). Total num frames: 3489792. Throughput: 0: 740.5. Samples: 871924. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-08-31 04:53:20,364][00354] Avg episode reward: [(0, '22.411')]
+[2023-08-31 04:53:25,361][00354] Fps is (10 sec: 2048.5, 60 sec: 2867.2, 300 sec: 2929.7). Total num frames: 3497984. Throughput: 0: 733.5. Samples: 875172. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-08-31 04:53:25,364][00354] Avg episode reward: [(0, '23.035')]
+[2023-08-31 04:53:30,367][00354] Fps is (10 sec: 2046.9, 60 sec: 2798.7, 300 sec: 2929.7). Total num frames: 3510272. Throughput: 0: 693.3. Samples: 878316. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-08-31 04:53:30,369][00354] Avg episode reward: [(0, '23.489')]
+[2023-08-31 04:53:34,761][07790] Updated weights for policy 0, policy_version 860 (0.0020)
+[2023-08-31 04:53:35,362][00354] Fps is (10 sec: 2457.5, 60 sec: 2799.0, 300 sec: 2929.7). Total num frames: 3522560. Throughput: 0: 670.2. Samples: 880120. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-08-31 04:53:35,364][00354] Avg episode reward: [(0, '24.509')]
+[2023-08-31 04:53:40,361][00354] Fps is (10 sec: 2868.8, 60 sec: 2867.2, 300 sec: 2915.8). Total num frames: 3538944. Throughput: 0: 665.4. Samples: 884718. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-08-31 04:53:40,366][00354] Avg episode reward: [(0, '26.315')]
+[2023-08-31 04:53:40,373][07777] Saving new best policy, reward=26.315!
+[2023-08-31 04:53:45,361][00354] Fps is (10 sec: 2867.3, 60 sec: 2730.7, 300 sec: 2901.9). Total num frames: 3551232. Throughput: 0: 663.8. Samples: 888244. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-08-31 04:53:45,366][00354] Avg episode reward: [(0, '26.855')]
+[2023-08-31 04:53:45,383][07777] Saving new best policy, reward=26.855!
+[2023-08-31 04:53:49,810][07790] Updated weights for policy 0, policy_version 870 (0.0043)
+[2023-08-31 04:53:50,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2662.4, 300 sec: 2901.9). Total num frames: 3563520. Throughput: 0: 658.9. Samples: 889976. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-08-31 04:53:50,372][00354] Avg episode reward: [(0, '26.522')]
+[2023-08-31 04:53:55,361][00354] Fps is (10 sec: 3276.8, 60 sec: 2798.9, 300 sec: 2929.7). Total num frames: 3584000. Throughput: 0: 650.9. Samples: 895358. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-08-31 04:53:55,371][00354] Avg episode reward: [(0, '27.020')]
+[2023-08-31 04:53:55,384][07777] Saving new best policy, reward=27.020!
+[2023-08-31 04:54:00,361][00354] Fps is (10 sec: 3686.4, 60 sec: 2867.3, 300 sec: 2929.7). Total num frames: 3600384. Throughput: 0: 673.9. Samples: 900702. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-08-31 04:54:00,370][00354] Avg episode reward: [(0, '27.742')]
+[2023-08-31 04:54:00,373][07777] Saving new best policy, reward=27.742!
+[2023-08-31 04:54:01,985][07790] Updated weights for policy 0, policy_version 880 (0.0023)
+[2023-08-31 04:54:05,361][00354] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2901.9). Total num frames: 3612672. Throughput: 0: 678.3. Samples: 902446. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-08-31 04:54:05,366][00354] Avg episode reward: [(0, '27.729')]
+[2023-08-31 04:54:10,361][00354] Fps is (10 sec: 2048.0, 60 sec: 2594.1, 300 sec: 2888.0). Total num frames: 3620864. Throughput: 0: 684.0. Samples: 905952. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-08-31 04:54:10,371][00354] Avg episode reward: [(0, '28.395')]
+[2023-08-31 04:54:10,412][07777] Saving new best policy, reward=28.395!
+[2023-08-31 04:54:15,361][00354] Fps is (10 sec: 2867.2, 60 sec: 2730.8, 300 sec: 2915.8). Total num frames: 3641344. Throughput: 0: 725.9. Samples: 910976. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 04:54:15,364][00354] Avg episode reward: [(0, '27.053')] +[2023-08-31 04:54:15,960][07790] Updated weights for policy 0, policy_version 890 (0.0033) +[2023-08-31 04:54:20,361][00354] Fps is (10 sec: 4096.0, 60 sec: 2867.2, 300 sec: 2929.7). Total num frames: 3661824. Throughput: 0: 749.2. Samples: 913832. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-08-31 04:54:20,368][00354] Avg episode reward: [(0, '27.045')] +[2023-08-31 04:54:25,364][00354] Fps is (10 sec: 3276.0, 60 sec: 2935.3, 300 sec: 2901.9). Total num frames: 3674112. Throughput: 0: 747.8. Samples: 918370. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-08-31 04:54:25,369][00354] Avg episode reward: [(0, '26.975')] +[2023-08-31 04:54:30,262][07790] Updated weights for policy 0, policy_version 900 (0.0017) +[2023-08-31 04:54:30,362][00354] Fps is (10 sec: 2457.4, 60 sec: 2935.7, 300 sec: 2901.9). Total num frames: 3686400. Throughput: 0: 750.5. Samples: 922016. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 04:54:30,369][00354] Avg episode reward: [(0, '26.567')] +[2023-08-31 04:54:35,361][00354] Fps is (10 sec: 2867.9, 60 sec: 3003.7, 300 sec: 2915.8). Total num frames: 3702784. Throughput: 0: 759.6. Samples: 924160. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 04:54:35,369][00354] Avg episode reward: [(0, '26.289')] +[2023-08-31 04:54:35,385][07777] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000904_3702784.pth... +[2023-08-31 04:54:35,534][07777] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000731_2994176.pth +[2023-08-31 04:54:40,361][00354] Fps is (10 sec: 3277.0, 60 sec: 3003.7, 300 sec: 2929.7). Total num frames: 3719168. Throughput: 0: 765.6. Samples: 929808. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 04:54:40,364][00354] Avg episode reward: [(0, '26.571')] +[2023-08-31 04:54:42,005][07790] Updated weights for policy 0, policy_version 910 (0.0030) +[2023-08-31 04:54:45,361][00354] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2901.9). Total num frames: 3731456. Throughput: 0: 749.6. Samples: 934436. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2023-08-31 04:54:45,363][00354] Avg episode reward: [(0, '26.216')] +[2023-08-31 04:54:50,361][00354] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 2901.9). Total num frames: 3743744. Throughput: 0: 751.2. Samples: 936250. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2023-08-31 04:54:50,365][00354] Avg episode reward: [(0, '25.752')] +[2023-08-31 04:54:55,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 2901.9). Total num frames: 3756032. Throughput: 0: 741.0. Samples: 939298. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-08-31 04:54:55,364][00354] Avg episode reward: [(0, '24.488')] +[2023-08-31 04:54:58,296][07790] Updated weights for policy 0, policy_version 920 (0.0018) +[2023-08-31 04:55:00,362][00354] Fps is (10 sec: 2867.1, 60 sec: 2867.2, 300 sec: 2915.8). Total num frames: 3772416. Throughput: 0: 740.1. Samples: 944280. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-08-31 04:55:00,369][00354] Avg episode reward: [(0, '25.034')] +[2023-08-31 04:55:05,361][00354] Fps is (10 sec: 3686.4, 60 sec: 3003.7, 300 sec: 2929.7). Total num frames: 3792896. Throughput: 0: 736.6. Samples: 946980. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-08-31 04:55:05,367][00354] Avg episode reward: [(0, '25.357')] +[2023-08-31 04:55:10,361][00354] Fps is (10 sec: 3276.9, 60 sec: 3072.0, 300 sec: 2901.9). Total num frames: 3805184. Throughput: 0: 732.5. Samples: 951332. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-08-31 04:55:10,364][00354] Avg episode reward: [(0, '25.355')] +[2023-08-31 04:55:11,867][07790] Updated weights for policy 0, policy_version 930 (0.0013) +[2023-08-31 04:55:15,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 2915.8). Total num frames: 3817472. Throughput: 0: 730.9. Samples: 954904. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2023-08-31 04:55:15,370][00354] Avg episode reward: [(0, '25.129')] +[2023-08-31 04:55:20,362][00354] Fps is (10 sec: 2867.1, 60 sec: 2867.2, 300 sec: 2929.7). Total num frames: 3833856. Throughput: 0: 733.4. Samples: 957162. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-08-31 04:55:20,374][00354] Avg episode reward: [(0, '25.061')] +[2023-08-31 04:55:24,101][07790] Updated weights for policy 0, policy_version 940 (0.0040) +[2023-08-31 04:55:25,361][00354] Fps is (10 sec: 3686.4, 60 sec: 3003.9, 300 sec: 2971.3). Total num frames: 3854336. Throughput: 0: 737.2. Samples: 962982. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-08-31 04:55:25,364][00354] Avg episode reward: [(0, '24.835')] +[2023-08-31 04:55:30,365][00354] Fps is (10 sec: 3275.8, 60 sec: 3003.6, 300 sec: 2957.4). Total num frames: 3866624. Throughput: 0: 732.2. Samples: 967386. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 04:55:30,368][00354] Avg episode reward: [(0, '25.542')] +[2023-08-31 04:55:35,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 2943.6). Total num frames: 3878912. Throughput: 0: 732.3. Samples: 969202. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 04:55:35,365][00354] Avg episode reward: [(0, '25.353')] +[2023-08-31 04:55:39,151][07790] Updated weights for policy 0, policy_version 950 (0.0022) +[2023-08-31 04:55:40,361][00354] Fps is (10 sec: 2868.2, 60 sec: 2935.5, 300 sec: 2957.5). Total num frames: 3895296. Throughput: 0: 754.4. Samples: 973246. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 04:55:40,371][00354] Avg episode reward: [(0, '24.975')] +[2023-08-31 04:55:45,362][00354] Fps is (10 sec: 3276.7, 60 sec: 3003.7, 300 sec: 2971.4). Total num frames: 3911680. Throughput: 0: 770.0. Samples: 978928. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-08-31 04:55:45,368][00354] Avg episode reward: [(0, '26.223')] +[2023-08-31 04:55:50,361][00354] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 2971.3). Total num frames: 3928064. Throughput: 0: 771.3. Samples: 981688. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-08-31 04:55:50,365][00354] Avg episode reward: [(0, '25.555')] +[2023-08-31 04:55:51,311][07790] Updated weights for policy 0, policy_version 960 (0.0021) +[2023-08-31 04:55:55,361][00354] Fps is (10 sec: 2867.3, 60 sec: 3072.0, 300 sec: 2943.6). Total num frames: 3940352. Throughput: 0: 753.8. Samples: 985252. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 04:55:55,370][00354] Avg episode reward: [(0, '25.200')] +[2023-08-31 04:56:00,361][00354] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 2943.6). Total num frames: 3952640. Throughput: 0: 759.2. Samples: 989066. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 04:56:00,363][00354] Avg episode reward: [(0, '26.310')] +[2023-08-31 04:56:04,987][07790] Updated weights for policy 0, policy_version 970 (0.0018) +[2023-08-31 04:56:05,361][00354] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 2971.3). Total num frames: 3973120. Throughput: 0: 772.1. Samples: 991908. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 04:56:05,364][00354] Avg episode reward: [(0, '25.600')] +[2023-08-31 04:56:10,361][00354] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 2971.3). Total num frames: 3989504. Throughput: 0: 770.5. Samples: 997654. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 04:56:10,369][00354] Avg episode reward: [(0, '25.587')] +[2023-08-31 04:56:15,361][00354] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 2957.5). Total num frames: 4001792. Throughput: 0: 752.5. Samples: 1001248. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 04:56:15,370][00354] Avg episode reward: [(0, '25.563')] +[2023-08-31 04:56:19,681][07790] Updated weights for policy 0, policy_version 980 (0.0029) +[2023-08-31 04:56:20,365][00354] Fps is (10 sec: 2456.8, 60 sec: 3003.6, 300 sec: 2943.5). Total num frames: 4014080. Throughput: 0: 751.6. Samples: 1003028. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 04:56:20,367][00354] Avg episode reward: [(0, '24.743')] +[2023-08-31 04:56:25,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 2943.6). Total num frames: 4026368. Throughput: 0: 752.7. Samples: 1007116. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 04:56:25,366][00354] Avg episode reward: [(0, '25.091')] +[2023-08-31 04:56:30,361][00354] Fps is (10 sec: 2458.4, 60 sec: 2867.4, 300 sec: 2943.6). Total num frames: 4038656. Throughput: 0: 704.7. Samples: 1010638. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 04:56:30,367][00354] Avg episode reward: [(0, '24.619')] +[2023-08-31 04:56:35,364][00354] Fps is (10 sec: 2047.5, 60 sec: 2798.8, 300 sec: 2915.8). Total num frames: 4046848. Throughput: 0: 675.0. Samples: 1012064. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 04:56:35,367][00354] Avg episode reward: [(0, '24.549')] +[2023-08-31 04:56:35,376][07777] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000988_4046848.pth... 
+[2023-08-31 04:56:35,549][07777] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000819_3354624.pth +[2023-08-31 04:56:38,092][07790] Updated weights for policy 0, policy_version 990 (0.0042) +[2023-08-31 04:56:40,366][00354] Fps is (10 sec: 2047.1, 60 sec: 2730.5, 300 sec: 2888.0). Total num frames: 4059136. Throughput: 0: 657.6. Samples: 1014848. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 04:56:40,368][00354] Avg episode reward: [(0, '24.180')] +[2023-08-31 04:56:45,361][00354] Fps is (10 sec: 2458.2, 60 sec: 2662.4, 300 sec: 2888.0). Total num frames: 4071424. Throughput: 0: 653.3. Samples: 1018466. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-08-31 04:56:45,370][00354] Avg episode reward: [(0, '23.971')] +[2023-08-31 04:56:50,361][00354] Fps is (10 sec: 2868.5, 60 sec: 2662.4, 300 sec: 2901.9). Total num frames: 4087808. Throughput: 0: 650.6. Samples: 1021186. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2023-08-31 04:56:50,375][00354] Avg episode reward: [(0, '25.600')] +[2023-08-31 04:56:51,822][07790] Updated weights for policy 0, policy_version 1000 (0.0014) +[2023-08-31 04:56:55,361][00354] Fps is (10 sec: 3276.8, 60 sec: 2730.7, 300 sec: 2901.9). Total num frames: 4104192. Throughput: 0: 644.1. Samples: 1026638. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-08-31 04:56:55,366][00354] Avg episode reward: [(0, '25.601')] +[2023-08-31 04:57:00,362][00354] Fps is (10 sec: 2867.1, 60 sec: 2730.6, 300 sec: 2888.0). Total num frames: 4116480. Throughput: 0: 646.2. Samples: 1030326. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2023-08-31 04:57:00,366][00354] Avg episode reward: [(0, '25.828')] +[2023-08-31 04:57:05,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2594.1, 300 sec: 2874.1). Total num frames: 4128768. Throughput: 0: 643.3. Samples: 1031976. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-08-31 04:57:05,371][00354] Avg episode reward: [(0, '26.443')] +[2023-08-31 04:57:07,454][07790] Updated weights for policy 0, policy_version 1010 (0.0025) +[2023-08-31 04:57:10,361][00354] Fps is (10 sec: 2867.3, 60 sec: 2594.1, 300 sec: 2888.0). Total num frames: 4145152. Throughput: 0: 645.1. Samples: 1036144. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 04:57:10,364][00354] Avg episode reward: [(0, '26.141')] +[2023-08-31 04:57:15,361][00354] Fps is (10 sec: 3276.8, 60 sec: 2662.4, 300 sec: 2901.9). Total num frames: 4161536. Throughput: 0: 684.1. Samples: 1041424. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 04:57:15,367][00354] Avg episode reward: [(0, '26.462')] +[2023-08-31 04:57:20,361][00354] Fps is (10 sec: 2867.2, 60 sec: 2662.5, 300 sec: 2874.1). Total num frames: 4173824. Throughput: 0: 700.3. Samples: 1043576. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 04:57:20,366][00354] Avg episode reward: [(0, '26.408')] +[2023-08-31 04:57:20,958][07790] Updated weights for policy 0, policy_version 1020 (0.0017) +[2023-08-31 04:57:25,363][00354] Fps is (10 sec: 2457.1, 60 sec: 2662.3, 300 sec: 2860.2). Total num frames: 4186112. Throughput: 0: 712.4. Samples: 1046906. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 04:57:25,370][00354] Avg episode reward: [(0, '25.523')] +[2023-08-31 04:57:30,361][00354] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2874.2). Total num frames: 4202496. Throughput: 0: 725.3. Samples: 1051106. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 04:57:30,369][00354] Avg episode reward: [(0, '25.234')] +[2023-08-31 04:57:34,689][07790] Updated weights for policy 0, policy_version 1030 (0.0032) +[2023-08-31 04:57:35,361][00354] Fps is (10 sec: 3277.5, 60 sec: 2867.3, 300 sec: 2888.0). Total num frames: 4218880. Throughput: 0: 724.4. Samples: 1053782. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 04:57:35,370][00354] Avg episode reward: [(0, '24.864')] +[2023-08-31 04:57:40,361][00354] Fps is (10 sec: 2867.2, 60 sec: 2867.4, 300 sec: 2860.3). Total num frames: 4231168. Throughput: 0: 713.5. Samples: 1058744. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-08-31 04:57:40,369][00354] Avg episode reward: [(0, '23.971')] +[2023-08-31 04:57:45,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 2846.4). Total num frames: 4243456. Throughput: 0: 705.3. Samples: 1062066. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-08-31 04:57:45,365][00354] Avg episode reward: [(0, '24.684')] +[2023-08-31 04:57:50,189][07790] Updated weights for policy 0, policy_version 1040 (0.0022) +[2023-08-31 04:57:50,361][00354] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2860.3). Total num frames: 4259840. Throughput: 0: 705.9. Samples: 1063742. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-08-31 04:57:50,363][00354] Avg episode reward: [(0, '25.454')] +[2023-08-31 04:57:55,361][00354] Fps is (10 sec: 3276.8, 60 sec: 2867.2, 300 sec: 2874.2). Total num frames: 4276224. Throughput: 0: 729.2. Samples: 1068956. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 04:57:55,366][00354] Avg episode reward: [(0, '24.885')] +[2023-08-31 04:58:00,361][00354] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 2874.1). Total num frames: 4292608. Throughput: 0: 728.4. Samples: 1074204. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 04:58:00,371][00354] Avg episode reward: [(0, '25.394')] +[2023-08-31 04:58:02,644][07790] Updated weights for policy 0, policy_version 1050 (0.0029) +[2023-08-31 04:58:05,361][00354] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 2846.4). Total num frames: 4304896. Throughput: 0: 718.6. Samples: 1075912. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-08-31 04:58:05,369][00354] Avg episode reward: [(0, '26.231')] +[2023-08-31 04:58:10,362][00354] Fps is (10 sec: 2457.4, 60 sec: 2867.2, 300 sec: 2846.4). Total num frames: 4317184. Throughput: 0: 726.4. Samples: 1079592. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 04:58:10,373][00354] Avg episode reward: [(0, '26.984')] +[2023-08-31 04:58:15,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2798.9, 300 sec: 2846.4). Total num frames: 4329472. Throughput: 0: 720.2. Samples: 1083514. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 04:58:15,364][00354] Avg episode reward: [(0, '27.933')] +[2023-08-31 04:58:17,662][07790] Updated weights for policy 0, policy_version 1060 (0.0029) +[2023-08-31 04:58:20,361][00354] Fps is (10 sec: 3277.1, 60 sec: 2935.5, 300 sec: 2888.0). Total num frames: 4349952. Throughput: 0: 723.1. Samples: 1086320. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-08-31 04:58:20,364][00354] Avg episode reward: [(0, '28.293')] +[2023-08-31 04:58:25,361][00354] Fps is (10 sec: 3276.8, 60 sec: 2935.6, 300 sec: 2888.1). Total num frames: 4362240. Throughput: 0: 715.0. Samples: 1090920. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-08-31 04:58:25,364][00354] Avg episode reward: [(0, '28.026')] +[2023-08-31 04:58:30,362][00354] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 2888.0). Total num frames: 4374528. Throughput: 0: 722.5. Samples: 1094578. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 04:58:30,370][00354] Avg episode reward: [(0, '27.295')] +[2023-08-31 04:58:32,445][07790] Updated weights for policy 0, policy_version 1070 (0.0027) +[2023-08-31 04:58:35,361][00354] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2888.0). Total num frames: 4390912. Throughput: 0: 735.1. Samples: 1096822. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 04:58:35,367][00354] Avg episode reward: [(0, '26.681')] +[2023-08-31 04:58:35,382][07777] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001072_4390912.pth... +[2023-08-31 04:58:35,514][07777] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000904_3702784.pth +[2023-08-31 04:58:40,361][00354] Fps is (10 sec: 3686.4, 60 sec: 3003.7, 300 sec: 2915.8). Total num frames: 4411392. Throughput: 0: 742.9. Samples: 1102388. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-08-31 04:58:40,369][00354] Avg episode reward: [(0, '25.211')] +[2023-08-31 04:58:44,037][07790] Updated weights for policy 0, policy_version 1080 (0.0013) +[2023-08-31 04:58:45,361][00354] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 2915.8). Total num frames: 4423680. Throughput: 0: 726.7. Samples: 1106906. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 04:58:45,368][00354] Avg episode reward: [(0, '24.842')] +[2023-08-31 04:58:50,362][00354] Fps is (10 sec: 2457.5, 60 sec: 2935.5, 300 sec: 2888.0). Total num frames: 4435968. Throughput: 0: 728.8. Samples: 1108710. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-08-31 04:58:50,371][00354] Avg episode reward: [(0, '25.570')] +[2023-08-31 04:58:55,361][00354] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 2888.0). Total num frames: 4452352. Throughput: 0: 733.5. Samples: 1112598. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-08-31 04:58:55,370][00354] Avg episode reward: [(0, '23.626')] +[2023-08-31 04:58:58,465][07790] Updated weights for policy 0, policy_version 1090 (0.0034) +[2023-08-31 04:59:00,361][00354] Fps is (10 sec: 3276.9, 60 sec: 2935.5, 300 sec: 2901.9). Total num frames: 4468736. Throughput: 0: 773.6. Samples: 1118324. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-08-31 04:59:00,369][00354] Avg episode reward: [(0, '24.608')] +[2023-08-31 04:59:05,361][00354] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 2929.7). Total num frames: 4485120. Throughput: 0: 774.6. Samples: 1121176. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-08-31 04:59:05,370][00354] Avg episode reward: [(0, '24.428')] +[2023-08-31 04:59:10,364][00354] Fps is (10 sec: 2866.6, 60 sec: 3003.7, 300 sec: 2901.9). Total num frames: 4497408. Throughput: 0: 752.2. Samples: 1124770. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-08-31 04:59:10,372][00354] Avg episode reward: [(0, '24.946')] +[2023-08-31 04:59:12,716][07790] Updated weights for policy 0, policy_version 1100 (0.0021) +[2023-08-31 04:59:15,361][00354] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 2874.1). Total num frames: 4509696. Throughput: 0: 754.4. Samples: 1128524. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 04:59:15,364][00354] Avg episode reward: [(0, '25.433')] +[2023-08-31 04:59:20,361][00354] Fps is (10 sec: 3277.5, 60 sec: 3003.7, 300 sec: 2901.9). Total num frames: 4530176. Throughput: 0: 766.7. Samples: 1131324. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 04:59:20,364][00354] Avg episode reward: [(0, '24.607')] +[2023-08-31 04:59:24,454][07790] Updated weights for policy 0, policy_version 1110 (0.0030) +[2023-08-31 04:59:25,361][00354] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 2915.8). Total num frames: 4546560. Throughput: 0: 769.2. Samples: 1137002. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 04:59:25,369][00354] Avg episode reward: [(0, '23.651')] +[2023-08-31 04:59:30,361][00354] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 2888.0). Total num frames: 4554752. Throughput: 0: 732.8. Samples: 1139880. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 04:59:30,364][00354] Avg episode reward: [(0, '23.619')] +[2023-08-31 04:59:35,363][00354] Fps is (10 sec: 1638.1, 60 sec: 2867.1, 300 sec: 2860.2). Total num frames: 4562944. Throughput: 0: 724.3. Samples: 1141306. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-08-31 04:59:35,371][00354] Avg episode reward: [(0, '24.351')] +[2023-08-31 04:59:40,361][00354] Fps is (10 sec: 2048.0, 60 sec: 2730.7, 300 sec: 2860.3). Total num frames: 4575232. Throughput: 0: 702.1. Samples: 1144194. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2023-08-31 04:59:40,370][00354] Avg episode reward: [(0, '24.502')] +[2023-08-31 04:59:43,763][07790] Updated weights for policy 0, policy_version 1120 (0.0039) +[2023-08-31 04:59:45,361][00354] Fps is (10 sec: 2867.7, 60 sec: 2798.9, 300 sec: 2874.1). Total num frames: 4591616. Throughput: 0: 675.4. Samples: 1148718. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-08-31 04:59:45,370][00354] Avg episode reward: [(0, '25.050')] +[2023-08-31 04:59:50,361][00354] Fps is (10 sec: 3686.4, 60 sec: 2935.5, 300 sec: 2901.9). Total num frames: 4612096. Throughput: 0: 674.4. Samples: 1151526. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 04:59:50,369][00354] Avg episode reward: [(0, '25.874')] +[2023-08-31 04:59:55,361][00354] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2874.1). Total num frames: 4620288. Throughput: 0: 687.9. Samples: 1155724. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 04:59:55,364][00354] Avg episode reward: [(0, '26.150')] +[2023-08-31 04:59:57,824][07790] Updated weights for policy 0, policy_version 1130 (0.0026) +[2023-08-31 05:00:00,366][00354] Fps is (10 sec: 2047.1, 60 sec: 2730.5, 300 sec: 2846.3). Total num frames: 4632576. Throughput: 0: 676.9. Samples: 1158988. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 05:00:00,378][00354] Avg episode reward: [(0, '26.342')] +[2023-08-31 05:00:05,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2662.4, 300 sec: 2846.4). Total num frames: 4644864. Throughput: 0: 650.8. Samples: 1160612. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 05:00:05,366][00354] Avg episode reward: [(0, '26.202')] +[2023-08-31 05:00:10,361][00354] Fps is (10 sec: 3278.3, 60 sec: 2799.0, 300 sec: 2874.1). Total num frames: 4665344. Throughput: 0: 638.3. Samples: 1165726. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 05:00:10,369][00354] Avg episode reward: [(0, '26.140')] +[2023-08-31 05:00:11,439][07790] Updated weights for policy 0, policy_version 1140 (0.0036) +[2023-08-31 05:00:15,370][00354] Fps is (10 sec: 3683.3, 60 sec: 2866.8, 300 sec: 2874.1). Total num frames: 4681728. Throughput: 0: 686.0. Samples: 1170756. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 05:00:15,376][00354] Avg episode reward: [(0, '27.548')] +[2023-08-31 05:00:20,364][00354] Fps is (10 sec: 2457.0, 60 sec: 2662.3, 300 sec: 2832.5). Total num frames: 4689920. Throughput: 0: 691.5. Samples: 1172422. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 05:00:20,368][00354] Avg episode reward: [(0, '26.347')] +[2023-08-31 05:00:25,361][00354] Fps is (10 sec: 2049.7, 60 sec: 2594.1, 300 sec: 2832.5). Total num frames: 4702208. Throughput: 0: 705.0. Samples: 1175920. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-08-31 05:00:25,364][00354] Avg episode reward: [(0, '26.316')] +[2023-08-31 05:00:26,966][07790] Updated weights for policy 0, policy_version 1150 (0.0030) +[2023-08-31 05:00:30,361][00354] Fps is (10 sec: 3277.6, 60 sec: 2798.9, 300 sec: 2860.3). Total num frames: 4722688. Throughput: 0: 721.3. Samples: 1181176. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 05:00:30,369][00354] Avg episode reward: [(0, '25.695')] +[2023-08-31 05:00:35,362][00354] Fps is (10 sec: 3686.3, 60 sec: 2935.5, 300 sec: 2860.3). Total num frames: 4739072. Throughput: 0: 721.4. Samples: 1183990. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 05:00:35,364][00354] Avg episode reward: [(0, '24.913')] +[2023-08-31 05:00:35,382][07777] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001157_4739072.pth... +[2023-08-31 05:00:35,552][07777] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000988_4046848.pth +[2023-08-31 05:00:39,252][07790] Updated weights for policy 0, policy_version 1160 (0.0025) +[2023-08-31 05:00:40,363][00354] Fps is (10 sec: 2866.8, 60 sec: 2935.4, 300 sec: 2846.4). Total num frames: 4751360. Throughput: 0: 723.5. Samples: 1188282. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 05:00:40,366][00354] Avg episode reward: [(0, '25.767')] +[2023-08-31 05:00:45,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 2832.5). Total num frames: 4763648. Throughput: 0: 727.5. Samples: 1191720. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 05:00:45,364][00354] Avg episode reward: [(0, '26.224')] +[2023-08-31 05:00:50,361][00354] Fps is (10 sec: 2867.6, 60 sec: 2798.9, 300 sec: 2846.4). Total num frames: 4780032. Throughput: 0: 742.6. Samples: 1194030. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 05:00:50,364][00354] Avg episode reward: [(0, '24.851')] +[2023-08-31 05:00:52,921][07790] Updated weights for policy 0, policy_version 1170 (0.0028) +[2023-08-31 05:00:55,361][00354] Fps is (10 sec: 3686.4, 60 sec: 3003.7, 300 sec: 2874.1). Total num frames: 4800512. Throughput: 0: 755.7. Samples: 1199734. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 05:00:55,369][00354] Avg episode reward: [(0, '24.633')] +[2023-08-31 05:01:00,362][00354] Fps is (10 sec: 3276.6, 60 sec: 3003.9, 300 sec: 2846.4). Total num frames: 4812800. Throughput: 0: 743.1. Samples: 1204188. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 05:01:00,367][00354] Avg episode reward: [(0, '25.374')] +[2023-08-31 05:01:05,362][00354] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 2832.5). Total num frames: 4825088. Throughput: 0: 745.1. Samples: 1205952. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-08-31 05:01:05,366][00354] Avg episode reward: [(0, '25.091')] +[2023-08-31 05:01:07,785][07790] Updated weights for policy 0, policy_version 1180 (0.0025) +[2023-08-31 05:01:10,361][00354] Fps is (10 sec: 2867.4, 60 sec: 2935.5, 300 sec: 2846.4). Total num frames: 4841472. Throughput: 0: 755.5. Samples: 1209918. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 05:01:10,365][00354] Avg episode reward: [(0, '24.543')] +[2023-08-31 05:01:15,361][00354] Fps is (10 sec: 3276.9, 60 sec: 2935.9, 300 sec: 2860.3). Total num frames: 4857856. Throughput: 0: 764.1. Samples: 1215560. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-08-31 05:01:15,364][00354] Avg episode reward: [(0, '25.787')] +[2023-08-31 05:01:19,295][07790] Updated weights for policy 0, policy_version 1190 (0.0027) +[2023-08-31 05:01:20,361][00354] Fps is (10 sec: 3276.8, 60 sec: 3072.1, 300 sec: 2874.1). Total num frames: 4874240. Throughput: 0: 764.1. Samples: 1218376. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 05:01:20,364][00354] Avg episode reward: [(0, '25.708')] +[2023-08-31 05:01:25,361][00354] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 2874.1). Total num frames: 4886528. Throughput: 0: 749.9. Samples: 1222028. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 05:01:25,372][00354] Avg episode reward: [(0, '27.323')] +[2023-08-31 05:01:30,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 2888.1). Total num frames: 4898816. Throughput: 0: 756.0. Samples: 1225742. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 05:01:30,365][00354] Avg episode reward: [(0, '26.993')] +[2023-08-31 05:01:35,248][07790] Updated weights for policy 0, policy_version 1200 (0.0046) +[2023-08-31 05:01:35,361][00354] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 2902.0). Total num frames: 4915200. Throughput: 0: 751.2. Samples: 1227836. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 05:01:35,365][00354] Avg episode reward: [(0, '26.368')] +[2023-08-31 05:01:40,361][00354] Fps is (10 sec: 3276.8, 60 sec: 3003.8, 300 sec: 2915.8). Total num frames: 4931584. Throughput: 0: 737.5. Samples: 1232922. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 05:01:40,372][00354] Avg episode reward: [(0, '25.632')] +[2023-08-31 05:01:45,361][00354] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2901.9). Total num frames: 4943872. Throughput: 0: 723.3. Samples: 1236738. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 05:01:45,367][00354] Avg episode reward: [(0, '25.870')] +[2023-08-31 05:01:49,721][07790] Updated weights for policy 0, policy_version 1210 (0.0027) +[2023-08-31 05:01:50,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 2888.0). Total num frames: 4956160. Throughput: 0: 723.6. Samples: 1238512. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 05:01:50,368][00354] Avg episode reward: [(0, '26.224')] +[2023-08-31 05:01:55,361][00354] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2901.9). Total num frames: 4972544. Throughput: 0: 737.3. Samples: 1243096. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 05:01:55,369][00354] Avg episode reward: [(0, '26.170')] +[2023-08-31 05:02:00,361][00354] Fps is (10 sec: 3686.4, 60 sec: 3003.8, 300 sec: 2929.7). Total num frames: 4993024. Throughput: 0: 738.4. Samples: 1248786. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-08-31 05:02:00,369][00354] Avg episode reward: [(0, '26.716')] +[2023-08-31 05:02:01,311][07790] Updated weights for policy 0, policy_version 1220 (0.0025) +[2023-08-31 05:02:05,361][00354] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 2915.8). Total num frames: 5005312. Throughput: 0: 722.8. Samples: 1250904. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 05:02:05,366][00354] Avg episode reward: [(0, '26.205')] +[2023-08-31 05:02:10,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 2901.9). Total num frames: 5017600. Throughput: 0: 720.8. Samples: 1254464. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-08-31 05:02:10,365][00354] Avg episode reward: [(0, '27.626')] +[2023-08-31 05:02:15,361][00354] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 2915.8). Total num frames: 5033984. Throughput: 0: 740.3. Samples: 1259056. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 05:02:15,369][00354] Avg episode reward: [(0, '28.873')] +[2023-08-31 05:02:15,381][07777] Saving new best policy, reward=28.873! +[2023-08-31 05:02:16,292][07790] Updated weights for policy 0, policy_version 1230 (0.0029) +[2023-08-31 05:02:20,361][00354] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 2929.7). Total num frames: 5050368. Throughput: 0: 754.8. Samples: 1261804. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 05:02:20,366][00354] Avg episode reward: [(0, '27.499')] +[2023-08-31 05:02:25,361][00354] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 2929.7). Total num frames: 5066752. Throughput: 0: 757.4. Samples: 1267006. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 05:02:25,364][00354] Avg episode reward: [(0, '27.328')] +[2023-08-31 05:02:30,354][07790] Updated weights for policy 0, policy_version 1240 (0.0041) +[2023-08-31 05:02:30,362][00354] Fps is (10 sec: 2867.1, 60 sec: 3003.7, 300 sec: 2915.8). Total num frames: 5079040. Throughput: 0: 744.3. Samples: 1270234. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 05:02:30,365][00354] Avg episode reward: [(0, '27.535')] +[2023-08-31 05:02:35,362][00354] Fps is (10 sec: 2047.9, 60 sec: 2867.2, 300 sec: 2901.9). Total num frames: 5087232. Throughput: 0: 735.2. Samples: 1271596. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 05:02:35,364][00354] Avg episode reward: [(0, '25.403')] +[2023-08-31 05:02:35,384][07777] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001242_5087232.pth... +[2023-08-31 05:02:35,605][07777] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001072_4390912.pth +[2023-08-31 05:02:40,364][00354] Fps is (10 sec: 1638.1, 60 sec: 2730.6, 300 sec: 2888.0). Total num frames: 5095424. Throughput: 0: 701.0. Samples: 1274644. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 05:02:40,366][00354] Avg episode reward: [(0, '25.079')] +[2023-08-31 05:02:45,361][00354] Fps is (10 sec: 2867.3, 60 sec: 2867.2, 300 sec: 2901.9). Total num frames: 5115904. Throughput: 0: 678.2. Samples: 1279306. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 05:02:45,367][00354] Avg episode reward: [(0, '24.850')] +[2023-08-31 05:02:46,476][07790] Updated weights for policy 0, policy_version 1250 (0.0020) +[2023-08-31 05:02:50,361][00354] Fps is (10 sec: 3277.6, 60 sec: 2867.2, 300 sec: 2888.0). Total num frames: 5128192. Throughput: 0: 689.1. Samples: 1281912. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-08-31 05:02:50,364][00354] Avg episode reward: [(0, '24.332')] +[2023-08-31 05:02:55,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2798.9, 300 sec: 2874.1). Total num frames: 5140480. Throughput: 0: 691.1. Samples: 1285562. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-08-31 05:02:55,364][00354] Avg episode reward: [(0, '24.630')] +[2023-08-31 05:03:00,361][00354] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2888.0). Total num frames: 5156864. Throughput: 0: 678.0. Samples: 1289566. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-08-31 05:03:00,364][00354] Avg episode reward: [(0, '25.302')] +[2023-08-31 05:03:01,295][07790] Updated weights for policy 0, policy_version 1260 (0.0025) +[2023-08-31 05:03:05,361][00354] Fps is (10 sec: 3276.8, 60 sec: 2798.9, 300 sec: 2901.9). Total num frames: 5173248. Throughput: 0: 680.3. Samples: 1292418. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-08-31 05:03:05,364][00354] Avg episode reward: [(0, '25.926')] +[2023-08-31 05:03:10,362][00354] Fps is (10 sec: 3276.6, 60 sec: 2867.2, 300 sec: 2915.8). Total num frames: 5189632. Throughput: 0: 687.9. Samples: 1297964. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 05:03:10,369][00354] Avg episode reward: [(0, '26.457')] +[2023-08-31 05:03:14,408][07790] Updated weights for policy 0, policy_version 1270 (0.0034) +[2023-08-31 05:03:15,362][00354] Fps is (10 sec: 2867.0, 60 sec: 2798.9, 300 sec: 2888.0). Total num frames: 5201920. Throughput: 0: 688.8. Samples: 1301232. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-08-31 05:03:15,367][00354] Avg episode reward: [(0, '27.876')] +[2023-08-31 05:03:20,361][00354] Fps is (10 sec: 2457.7, 60 sec: 2730.7, 300 sec: 2888.0). Total num frames: 5214208. Throughput: 0: 690.2. Samples: 1302654. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 05:03:20,368][00354] Avg episode reward: [(0, '28.729')] +[2023-08-31 05:03:25,361][00354] Fps is (10 sec: 2867.4, 60 sec: 2730.7, 300 sec: 2901.9). Total num frames: 5230592. Throughput: 0: 721.5. Samples: 1307112. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 05:03:25,367][00354] Avg episode reward: [(0, '28.593')] +[2023-08-31 05:03:28,338][07790] Updated weights for policy 0, policy_version 1280 (0.0018) +[2023-08-31 05:03:30,361][00354] Fps is (10 sec: 3276.8, 60 sec: 2799.0, 300 sec: 2901.9). Total num frames: 5246976. Throughput: 0: 744.4. Samples: 1312804. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 05:03:30,364][00354] Avg episode reward: [(0, '27.650')] +[2023-08-31 05:03:35,361][00354] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 2888.0). Total num frames: 5263360. Throughput: 0: 738.2. Samples: 1315132. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-08-31 05:03:35,365][00354] Avg episode reward: [(0, '26.305')] +[2023-08-31 05:03:40,362][00354] Fps is (10 sec: 2867.1, 60 sec: 3003.8, 300 sec: 2888.0). Total num frames: 5275648. Throughput: 0: 736.5. Samples: 1318704. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-08-31 05:03:40,368][00354] Avg episode reward: [(0, '27.274')] +[2023-08-31 05:03:43,549][07790] Updated weights for policy 0, policy_version 1290 (0.0023) +[2023-08-31 05:03:45,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 2888.0). Total num frames: 5287936. Throughput: 0: 741.5. Samples: 1322934. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-08-31 05:03:45,363][00354] Avg episode reward: [(0, '28.016')] +[2023-08-31 05:03:50,361][00354] Fps is (10 sec: 3276.9, 60 sec: 3003.7, 300 sec: 2901.9). Total num frames: 5308416. Throughput: 0: 741.0. Samples: 1325762. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-08-31 05:03:50,364][00354] Avg episode reward: [(0, '27.967')] +[2023-08-31 05:03:54,918][07790] Updated weights for policy 0, policy_version 1300 (0.0015) +[2023-08-31 05:03:55,361][00354] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 2901.9). Total num frames: 5324800. Throughput: 0: 737.3. Samples: 1331142. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-08-31 05:03:55,373][00354] Avg episode reward: [(0, '26.747')] +[2023-08-31 05:04:00,361][00354] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2888.0). Total num frames: 5337088. Throughput: 0: 744.1. Samples: 1334714. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-08-31 05:04:00,364][00354] Avg episode reward: [(0, '26.655')] +[2023-08-31 05:04:05,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 2888.0). Total num frames: 5349376. Throughput: 0: 752.4. Samples: 1336512. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 05:04:05,365][00354] Avg episode reward: [(0, '26.703')] +[2023-08-31 05:04:09,340][07790] Updated weights for policy 0, policy_version 1310 (0.0033) +[2023-08-31 05:04:10,362][00354] Fps is (10 sec: 3276.6, 60 sec: 3003.7, 300 sec: 2915.8). Total num frames: 5369856. Throughput: 0: 771.0. Samples: 1341808. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 05:04:10,369][00354] Avg episode reward: [(0, '28.178')] +[2023-08-31 05:04:15,361][00354] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 2901.9). Total num frames: 5386240. Throughput: 0: 765.0. Samples: 1347230. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-08-31 05:04:15,371][00354] Avg episode reward: [(0, '28.194')] +[2023-08-31 05:04:20,361][00354] Fps is (10 sec: 2867.4, 60 sec: 3072.0, 300 sec: 2888.0). Total num frames: 5398528. Throughput: 0: 753.7. Samples: 1349050. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-08-31 05:04:20,368][00354] Avg episode reward: [(0, '26.742')] +[2023-08-31 05:04:23,080][07790] Updated weights for policy 0, policy_version 1320 (0.0051) +[2023-08-31 05:04:25,361][00354] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 2901.9). Total num frames: 5410816. Throughput: 0: 754.6. Samples: 1352660. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-08-31 05:04:25,368][00354] Avg episode reward: [(0, '26.382')] +[2023-08-31 05:04:30,361][00354] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2929.7). Total num frames: 5427200. Throughput: 0: 774.5. Samples: 1357788. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-08-31 05:04:30,364][00354] Avg episode reward: [(0, '25.529')] +[2023-08-31 05:04:34,969][07790] Updated weights for policy 0, policy_version 1330 (0.0027) +[2023-08-31 05:04:35,362][00354] Fps is (10 sec: 3686.3, 60 sec: 3072.0, 300 sec: 2957.4). Total num frames: 5447680. Throughput: 0: 773.8. Samples: 1360584. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-08-31 05:04:35,367][00354] Avg episode reward: [(0, '25.206')] +[2023-08-31 05:04:35,381][07777] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001330_5447680.pth... +[2023-08-31 05:04:35,512][07777] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001157_4739072.pth +[2023-08-31 05:04:40,361][00354] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 2943.6). Total num frames: 5459968. Throughput: 0: 755.3. Samples: 1365130. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-08-31 05:04:40,364][00354] Avg episode reward: [(0, '24.881')] +[2023-08-31 05:04:45,362][00354] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 2915.8). Total num frames: 5472256. Throughput: 0: 754.2. Samples: 1368652. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-08-31 05:04:45,370][00354] Avg episode reward: [(0, '24.437')] +[2023-08-31 05:04:50,117][07790] Updated weights for policy 0, policy_version 1340 (0.0017) +[2023-08-31 05:04:50,361][00354] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2943.6). Total num frames: 5488640. Throughput: 0: 762.5. Samples: 1370826. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 05:04:50,366][00354] Avg episode reward: [(0, '24.477')] +[2023-08-31 05:04:55,362][00354] Fps is (10 sec: 3276.7, 60 sec: 3003.7, 300 sec: 2957.5). Total num frames: 5505024. Throughput: 0: 770.8. Samples: 1376496. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 05:04:55,365][00354] Avg episode reward: [(0, '24.728')] +[2023-08-31 05:05:00,361][00354] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2957.5). Total num frames: 5517312. Throughput: 0: 724.5. Samples: 1379832. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-08-31 05:05:00,366][00354] Avg episode reward: [(0, '25.396')] +[2023-08-31 05:05:05,164][07790] Updated weights for policy 0, policy_version 1350 (0.0028) +[2023-08-31 05:05:05,363][00354] Fps is (10 sec: 2457.4, 60 sec: 3003.7, 300 sec: 2929.7). Total num frames: 5529600. Throughput: 0: 722.4. Samples: 1381560. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-08-31 05:05:05,368][00354] Avg episode reward: [(0, '25.896')] +[2023-08-31 05:05:10,362][00354] Fps is (10 sec: 2457.5, 60 sec: 2867.2, 300 sec: 2915.9). Total num frames: 5541888. Throughput: 0: 719.2. Samples: 1385022. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) +[2023-08-31 05:05:10,365][00354] Avg episode reward: [(0, '24.181')] +[2023-08-31 05:05:15,361][00354] Fps is (10 sec: 2867.6, 60 sec: 2867.2, 300 sec: 2943.6). Total num frames: 5558272. Throughput: 0: 725.1. Samples: 1390416. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) +[2023-08-31 05:05:15,369][00354] Avg episode reward: [(0, '23.907')] +[2023-08-31 05:05:17,895][07790] Updated weights for policy 0, policy_version 1360 (0.0044) +[2023-08-31 05:05:20,361][00354] Fps is (10 sec: 3276.9, 60 sec: 2935.5, 300 sec: 2957.5). Total num frames: 5574656. Throughput: 0: 725.5. Samples: 1393232. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-08-31 05:05:20,370][00354] Avg episode reward: [(0, '24.272')] +[2023-08-31 05:05:25,361][00354] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 2929.7). Total num frames: 5586944. Throughput: 0: 707.2. Samples: 1396954. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-08-31 05:05:25,364][00354] Avg episode reward: [(0, '24.024')] +[2023-08-31 05:05:30,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 2915.8). Total num frames: 5599232. Throughput: 0: 711.2. Samples: 1400658. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-08-31 05:05:30,369][00354] Avg episode reward: [(0, '24.347')] +[2023-08-31 05:05:34,004][07790] Updated weights for policy 0, policy_version 1370 (0.0034) +[2023-08-31 05:05:35,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2915.8). Total num frames: 5611520. Throughput: 0: 709.6. Samples: 1402760. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-08-31 05:05:35,364][00354] Avg episode reward: [(0, '24.284')] +[2023-08-31 05:05:40,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2915.8). Total num frames: 5623808. Throughput: 0: 663.3. Samples: 1406342. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 05:05:40,372][00354] Avg episode reward: [(0, '25.540')] +[2023-08-31 05:05:45,367][00354] Fps is (10 sec: 2046.9, 60 sec: 2662.2, 300 sec: 2888.0). Total num frames: 5632000. Throughput: 0: 657.7. Samples: 1409430. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-08-31 05:05:45,369][00354] Avg episode reward: [(0, '25.121')] +[2023-08-31 05:05:50,361][00354] Fps is (10 sec: 2048.0, 60 sec: 2594.1, 300 sec: 2860.3). Total num frames: 5644288. Throughput: 0: 658.0. Samples: 1411168. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-08-31 05:05:50,364][00354] Avg episode reward: [(0, '25.653')] +[2023-08-31 05:05:52,431][07790] Updated weights for policy 0, policy_version 1380 (0.0052) +[2023-08-31 05:05:55,364][00354] Fps is (10 sec: 2868.1, 60 sec: 2594.0, 300 sec: 2874.1). Total num frames: 5660672. Throughput: 0: 675.5. Samples: 1415422. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-08-31 05:05:55,366][00354] Avg episode reward: [(0, '27.189')] +[2023-08-31 05:06:00,361][00354] Fps is (10 sec: 3686.4, 60 sec: 2730.7, 300 sec: 2901.9). Total num frames: 5681152. Throughput: 0: 683.2. Samples: 1421158. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-08-31 05:06:00,370][00354] Avg episode reward: [(0, '26.505')] +[2023-08-31 05:06:03,267][07790] Updated weights for policy 0, policy_version 1390 (0.0024) +[2023-08-31 05:06:05,361][00354] Fps is (10 sec: 3687.3, 60 sec: 2799.0, 300 sec: 2901.9). Total num frames: 5697536. Throughput: 0: 678.5. Samples: 1423766. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-08-31 05:06:05,364][00354] Avg episode reward: [(0, '27.090')] +[2023-08-31 05:06:10,361][00354] Fps is (10 sec: 2867.2, 60 sec: 2799.0, 300 sec: 2888.0). Total num frames: 5709824. Throughput: 0: 674.0. Samples: 1427284. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-08-31 05:06:10,365][00354] Avg episode reward: [(0, '28.229')] +[2023-08-31 05:06:15,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2874.1). Total num frames: 5722112. Throughput: 0: 683.9. Samples: 1431432. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 05:06:15,364][00354] Avg episode reward: [(0, '27.243')] +[2023-08-31 05:06:18,008][07790] Updated weights for policy 0, policy_version 1400 (0.0022) +[2023-08-31 05:06:20,361][00354] Fps is (10 sec: 3276.8, 60 sec: 2798.9, 300 sec: 2901.9). Total num frames: 5742592. Throughput: 0: 699.0. Samples: 1434216. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 05:06:20,364][00354] Avg episode reward: [(0, '26.438')] +[2023-08-31 05:06:25,365][00354] Fps is (10 sec: 3685.1, 60 sec: 2867.0, 300 sec: 2915.8). Total num frames: 5758976. Throughput: 0: 746.2. Samples: 1439922. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 05:06:25,368][00354] Avg episode reward: [(0, '26.759')] +[2023-08-31 05:06:30,361][00354] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2901.9). Total num frames: 5771264. Throughput: 0: 758.3. Samples: 1443548. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-08-31 05:06:30,366][00354] Avg episode reward: [(0, '27.389')] +[2023-08-31 05:06:31,448][07790] Updated weights for policy 0, policy_version 1410 (0.0015) +[2023-08-31 05:06:35,365][00354] Fps is (10 sec: 2457.6, 60 sec: 2867.0, 300 sec: 2888.0). Total num frames: 5783552. Throughput: 0: 758.2. Samples: 1445288. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 05:06:35,371][00354] Avg episode reward: [(0, '27.324')] +[2023-08-31 05:06:35,386][07777] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001412_5783552.pth... +[2023-08-31 05:06:35,540][07777] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001242_5087232.pth +[2023-08-31 05:06:40,361][00354] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 2888.0). Total num frames: 5795840. Throughput: 0: 746.9. Samples: 1449030. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 05:06:40,374][00354] Avg episode reward: [(0, '26.097')] +[2023-08-31 05:06:44,993][07790] Updated weights for policy 0, policy_version 1420 (0.0032) +[2023-08-31 05:06:45,361][00354] Fps is (10 sec: 3278.0, 60 sec: 3072.3, 300 sec: 2915.8). Total num frames: 5816320. Throughput: 0: 746.0. Samples: 1454728. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 05:06:45,364][00354] Avg episode reward: [(0, '24.801')] +[2023-08-31 05:06:50,361][00354] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 2901.9). Total num frames: 5828608. Throughput: 0: 744.1. Samples: 1457250. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 05:06:50,369][00354] Avg episode reward: [(0, '25.528')] +[2023-08-31 05:06:55,364][00354] Fps is (10 sec: 2457.0, 60 sec: 3003.7, 300 sec: 2874.1). Total num frames: 5840896. Throughput: 0: 746.0. Samples: 1460856. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 05:06:55,371][00354] Avg episode reward: [(0, '25.134')] +[2023-08-31 05:06:59,915][07790] Updated weights for policy 0, policy_version 1430 (0.0028) +[2023-08-31 05:07:00,361][00354] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 2888.0). Total num frames: 5857280. Throughput: 0: 745.2. Samples: 1464968. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 05:07:00,369][00354] Avg episode reward: [(0, '25.810')] +[2023-08-31 05:07:05,361][00354] Fps is (10 sec: 3687.3, 60 sec: 3003.7, 300 sec: 2915.8). Total num frames: 5877760. Throughput: 0: 747.5. Samples: 1467852. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-08-31 05:07:05,364][00354] Avg episode reward: [(0, '25.435')] +[2023-08-31 05:07:10,366][00354] Fps is (10 sec: 3684.8, 60 sec: 3071.8, 300 sec: 2915.8). Total num frames: 5894144. Throughput: 0: 744.7. Samples: 1473436. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-08-31 05:07:10,369][00354] Avg episode reward: [(0, '25.155')] +[2023-08-31 05:07:11,730][07790] Updated weights for policy 0, policy_version 1440 (0.0016) +[2023-08-31 05:07:15,366][00354] Fps is (10 sec: 2865.8, 60 sec: 3071.8, 300 sec: 2901.9). Total num frames: 5906432. Throughput: 0: 745.0. Samples: 1477078. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-08-31 05:07:15,372][00354] Avg episode reward: [(0, '26.089')] +[2023-08-31 05:07:20,361][00354] Fps is (10 sec: 2458.7, 60 sec: 2935.5, 300 sec: 2888.0). Total num frames: 5918720. Throughput: 0: 746.6. Samples: 1478884. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-08-31 05:07:20,369][00354] Avg episode reward: [(0, '26.486')] +[2023-08-31 05:07:25,361][00354] Fps is (10 sec: 2868.5, 60 sec: 2935.6, 300 sec: 2901.9). Total num frames: 5935104. Throughput: 0: 774.8. Samples: 1483894. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-08-31 05:07:25,370][00354] Avg episode reward: [(0, '25.124')] +[2023-08-31 05:07:25,665][07790] Updated weights for policy 0, policy_version 1450 (0.0024) +[2023-08-31 05:07:30,361][00354] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 2943.6). Total num frames: 5955584. Throughput: 0: 774.8. Samples: 1489596. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-08-31 05:07:30,364][00354] Avg episode reward: [(0, '24.840')] +[2023-08-31 05:07:35,361][00354] Fps is (10 sec: 3276.8, 60 sec: 3072.2, 300 sec: 2957.5). Total num frames: 5967872. Throughput: 0: 760.0. Samples: 1491450. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-08-31 05:07:35,364][00354] Avg episode reward: [(0, '24.522')] +[2023-08-31 05:07:39,962][07790] Updated weights for policy 0, policy_version 1460 (0.0027) +[2023-08-31 05:07:40,361][00354] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 2929.7). Total num frames: 5980160. Throughput: 0: 760.6. Samples: 1495082. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-08-31 05:07:40,364][00354] Avg episode reward: [(0, '23.759')] +[2023-08-31 05:07:45,361][00354] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2943.6). Total num frames: 5996544. Throughput: 0: 774.4. Samples: 1499818. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-08-31 05:07:45,367][00354] Avg episode reward: [(0, '24.447')] +[2023-08-31 05:07:47,018][07777] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001466_6004736.pth... +[2023-08-31 05:07:47,024][00354] Component Batcher_0 stopped! +[2023-08-31 05:07:47,020][07777] Stopping Batcher_0... +[2023-08-31 05:07:47,037][07777] Loop batcher_evt_loop terminating... +[2023-08-31 05:07:47,092][00354] Component RolloutWorker_w6 stopped! +[2023-08-31 05:07:47,097][00354] Component RolloutWorker_w7 stopped! +[2023-08-31 05:07:47,099][07797] Stopping RolloutWorker_w6... +[2023-08-31 05:07:47,100][07797] Loop rollout_proc6_evt_loop terminating... +[2023-08-31 05:07:47,113][07790] Weights refcount: 2 0 +[2023-08-31 05:07:47,093][07798] Stopping RolloutWorker_w7... +[2023-08-31 05:07:47,121][07798] Loop rollout_proc7_evt_loop terminating... +[2023-08-31 05:07:47,122][00354] Component InferenceWorker_p0-w0 stopped! +[2023-08-31 05:07:47,127][07790] Stopping InferenceWorker_p0-w0... +[2023-08-31 05:07:47,127][07790] Loop inference_proc0-0_evt_loop terminating... +[2023-08-31 05:07:47,137][07795] Stopping RolloutWorker_w5... +[2023-08-31 05:07:47,136][00354] Component RolloutWorker_w3 stopped! +[2023-08-31 05:07:47,140][00354] Component RolloutWorker_w5 stopped! +[2023-08-31 05:07:47,136][07794] Stopping RolloutWorker_w3... +[2023-08-31 05:07:47,150][00354] Component RolloutWorker_w0 stopped! +[2023-08-31 05:07:47,138][07795] Loop rollout_proc5_evt_loop terminating... +[2023-08-31 05:07:47,160][07796] Stopping RolloutWorker_w4... +[2023-08-31 05:07:47,160][07796] Loop rollout_proc4_evt_loop terminating... 
+[2023-08-31 05:07:47,161][07792] Stopping RolloutWorker_w1... +[2023-08-31 05:07:47,156][00354] Component RolloutWorker_w4 stopped! +[2023-08-31 05:07:47,164][00354] Component RolloutWorker_w1 stopped! +[2023-08-31 05:07:47,145][07794] Loop rollout_proc3_evt_loop terminating... +[2023-08-31 05:07:47,161][07792] Loop rollout_proc1_evt_loop terminating... +[2023-08-31 05:07:47,153][07791] Stopping RolloutWorker_w0... +[2023-08-31 05:07:47,179][07791] Loop rollout_proc0_evt_loop terminating... +[2023-08-31 05:07:47,183][00354] Component RolloutWorker_w2 stopped! +[2023-08-31 05:07:47,190][07793] Stopping RolloutWorker_w2... +[2023-08-31 05:07:47,190][07793] Loop rollout_proc2_evt_loop terminating... +[2023-08-31 05:07:47,237][07777] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001330_5447680.pth +[2023-08-31 05:07:47,257][07777] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001466_6004736.pth... +[2023-08-31 05:07:47,435][00354] Component LearnerWorker_p0 stopped! +[2023-08-31 05:07:47,444][00354] Waiting for process learner_proc0 to stop... +[2023-08-31 05:07:47,452][07777] Stopping LearnerWorker_p0... +[2023-08-31 05:07:47,453][07777] Loop learner_proc0_evt_loop terminating... +[2023-08-31 05:07:49,490][00354] Waiting for process inference_proc0-0 to join... +[2023-08-31 05:07:49,497][00354] Waiting for process rollout_proc0 to join... +[2023-08-31 05:07:51,089][00354] Waiting for process rollout_proc1 to join... +[2023-08-31 05:07:51,154][00354] Waiting for process rollout_proc2 to join... +[2023-08-31 05:07:51,159][00354] Waiting for process rollout_proc3 to join... +[2023-08-31 05:07:51,161][00354] Waiting for process rollout_proc4 to join... +[2023-08-31 05:07:51,168][00354] Waiting for process rollout_proc5 to join... +[2023-08-31 05:07:51,169][00354] Waiting for process rollout_proc6 to join... +[2023-08-31 05:07:51,170][00354] Waiting for process rollout_proc7 to join... 
+[2023-08-31 05:07:51,172][00354] Batcher 0 profile tree view: +batching: 46.4333, releasing_batches: 0.0373 +[2023-08-31 05:07:51,173][00354] InferenceWorker_p0-w0 profile tree view: +wait_policy: 0.0001 + wait_policy_total: 946.9288 +update_model: 14.1219 + weight_update: 0.0029 +one_step: 0.0050 + handle_policy_step: 1034.2627 + deserialize: 27.9700, stack: 5.3380, obs_to_device_normalize: 197.4114, forward: 569.1924, send_messages: 48.7170 + prepare_outputs: 136.4909 + to_cpu: 77.6260 +[2023-08-31 05:07:51,175][00354] Learner 0 profile tree view: +misc: 0.0100, prepare_batch: 27.5837 +train: 115.5694 + epoch_init: 0.0202, minibatch_init: 0.0151, losses_postprocess: 1.0120, kl_divergence: 1.0395, after_optimizer: 6.4908 + calculate_losses: 39.7318 + losses_init: 0.0061, forward_head: 1.9836, bptt_initial: 25.7143, tail: 1.7919, advantages_returns: 0.4549, losses: 5.6091 + bptt: 3.6175 + bptt_forward_core: 3.4860 + update: 66.0824 + clip: 48.8780 +[2023-08-31 05:07:51,176][00354] RolloutWorker_w0 profile tree view: +wait_for_trajectories: 0.6482, enqueue_policy_requests: 284.4727, env_step: 1545.0806, overhead: 43.7061, complete_rollouts: 12.8863 +save_policy_outputs: 39.1894 + split_output_tensors: 18.7165 +[2023-08-31 05:07:51,178][00354] RolloutWorker_w7 profile tree view: +wait_for_trajectories: 0.7754, enqueue_policy_requests: 289.1712, env_step: 1541.5377, overhead: 44.2261, complete_rollouts: 13.0093 +save_policy_outputs: 38.7855 + split_output_tensors: 18.1956 +[2023-08-31 05:07:51,179][00354] Loop Runner_EvtLoop terminating... 
+[2023-08-31 05:07:51,181][00354] Runner profile tree view: +main_loop: 2109.1443 +[2023-08-31 05:07:51,182][00354] Collected {0: 6004736}, FPS: 2847.0 +[2023-08-31 05:07:56,202][00354] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2023-08-31 05:07:56,204][00354] Overriding arg 'num_workers' with value 1 passed from command line +[2023-08-31 05:07:56,207][00354] Adding new argument 'no_render'=True that is not in the saved config file! +[2023-08-31 05:07:56,210][00354] Adding new argument 'save_video'=True that is not in the saved config file! +[2023-08-31 05:07:56,213][00354] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2023-08-31 05:07:56,215][00354] Adding new argument 'video_name'=None that is not in the saved config file! +[2023-08-31 05:07:56,219][00354] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2023-08-31 05:07:56,220][00354] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2023-08-31 05:07:56,221][00354] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2023-08-31 05:07:56,223][00354] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2023-08-31 05:07:56,225][00354] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2023-08-31 05:07:56,227][00354] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2023-08-31 05:07:56,229][00354] Adding new argument 'train_script'=None that is not in the saved config file! +[2023-08-31 05:07:56,230][00354] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
+[2023-08-31 05:07:56,232][00354] Using frameskip 1 and render_action_repeat=4 for evaluation +[2023-08-31 05:07:56,268][00354] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-08-31 05:07:56,272][00354] RunningMeanStd input shape: (3, 72, 128) +[2023-08-31 05:07:56,280][00354] RunningMeanStd input shape: (1,) +[2023-08-31 05:07:56,297][00354] ConvEncoder: input_channels=3 +[2023-08-31 05:07:56,432][00354] Conv encoder output size: 512 +[2023-08-31 05:07:56,434][00354] Policy head output size: 512 +[2023-08-31 05:07:59,067][00354] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001466_6004736.pth... +[2023-08-31 05:08:00,498][00354] Num frames 100... +[2023-08-31 05:08:00,647][00354] Num frames 200... +[2023-08-31 05:08:00,785][00354] Num frames 300... +[2023-08-31 05:08:00,928][00354] Num frames 400... +[2023-08-31 05:08:01,072][00354] Num frames 500... +[2023-08-31 05:08:01,214][00354] Num frames 600... +[2023-08-31 05:08:01,358][00354] Num frames 700... +[2023-08-31 05:08:01,493][00354] Num frames 800... +[2023-08-31 05:08:01,632][00354] Num frames 900... +[2023-08-31 05:08:01,775][00354] Num frames 1000... +[2023-08-31 05:08:01,928][00354] Num frames 1100... +[2023-08-31 05:08:02,078][00354] Num frames 1200... +[2023-08-31 05:08:02,175][00354] Avg episode rewards: #0: 29.240, true rewards: #0: 12.240 +[2023-08-31 05:08:02,177][00354] Avg episode reward: 29.240, avg true_objective: 12.240 +[2023-08-31 05:08:02,286][00354] Num frames 1300... +[2023-08-31 05:08:02,439][00354] Num frames 1400... +[2023-08-31 05:08:02,587][00354] Num frames 1500... +[2023-08-31 05:08:02,726][00354] Num frames 1600... +[2023-08-31 05:08:02,866][00354] Num frames 1700... +[2023-08-31 05:08:03,014][00354] Num frames 1800... +[2023-08-31 05:08:03,163][00354] Num frames 1900... +[2023-08-31 05:08:03,307][00354] Num frames 2000... +[2023-08-31 05:08:03,456][00354] Num frames 2100... 
+[2023-08-31 05:08:03,600][00354] Num frames 2200... +[2023-08-31 05:08:03,739][00354] Num frames 2300... +[2023-08-31 05:08:03,891][00354] Num frames 2400... +[2023-08-31 05:08:04,049][00354] Avg episode rewards: #0: 28.360, true rewards: #0: 12.360 +[2023-08-31 05:08:04,051][00354] Avg episode reward: 28.360, avg true_objective: 12.360 +[2023-08-31 05:08:04,104][00354] Num frames 2500... +[2023-08-31 05:08:04,245][00354] Num frames 2600... +[2023-08-31 05:08:04,389][00354] Num frames 2700... +[2023-08-31 05:08:04,527][00354] Num frames 2800... +[2023-08-31 05:08:04,706][00354] Num frames 2900... +[2023-08-31 05:08:04,915][00354] Num frames 3000... +[2023-08-31 05:08:05,130][00354] Num frames 3100... +[2023-08-31 05:08:05,340][00354] Num frames 3200... +[2023-08-31 05:08:05,546][00354] Num frames 3300... +[2023-08-31 05:08:05,774][00354] Avg episode rewards: #0: 24.944, true rewards: #0: 11.277 +[2023-08-31 05:08:05,776][00354] Avg episode reward: 24.944, avg true_objective: 11.277 +[2023-08-31 05:08:05,814][00354] Num frames 3400... +[2023-08-31 05:08:06,027][00354] Num frames 3500... +[2023-08-31 05:08:06,238][00354] Num frames 3600... +[2023-08-31 05:08:06,439][00354] Num frames 3700... +[2023-08-31 05:08:06,651][00354] Num frames 3800... +[2023-08-31 05:08:06,860][00354] Num frames 3900... +[2023-08-31 05:08:07,072][00354] Num frames 4000... +[2023-08-31 05:08:07,286][00354] Num frames 4100... +[2023-08-31 05:08:07,491][00354] Num frames 4200... +[2023-08-31 05:08:07,693][00354] Num frames 4300... +[2023-08-31 05:08:07,776][00354] Avg episode rewards: #0: 24.278, true rewards: #0: 10.777 +[2023-08-31 05:08:07,778][00354] Avg episode reward: 24.278, avg true_objective: 10.777 +[2023-08-31 05:08:07,963][00354] Num frames 4400... +[2023-08-31 05:08:08,167][00354] Num frames 4500... +[2023-08-31 05:08:08,387][00354] Num frames 4600... +[2023-08-31 05:08:08,600][00354] Num frames 4700... +[2023-08-31 05:08:08,807][00354] Num frames 4800... 
+[2023-08-31 05:08:08,987][00354] Num frames 4900...
+[2023-08-31 05:08:09,132][00354] Num frames 5000...
+[2023-08-31 05:08:09,273][00354] Num frames 5100...
+[2023-08-31 05:08:09,422][00354] Num frames 5200...
+[2023-08-31 05:08:09,564][00354] Num frames 5300...
+[2023-08-31 05:08:09,705][00354] Num frames 5400...
+[2023-08-31 05:08:09,845][00354] Num frames 5500...
+[2023-08-31 05:08:09,995][00354] Num frames 5600...
+[2023-08-31 05:08:10,134][00354] Num frames 5700...
+[2023-08-31 05:08:10,308][00354] Avg episode rewards: #0: 27.366, true rewards: #0: 11.566
+[2023-08-31 05:08:10,310][00354] Avg episode reward: 27.366, avg true_objective: 11.566
+[2023-08-31 05:08:10,347][00354] Num frames 5800...
+[2023-08-31 05:08:10,491][00354] Num frames 5900...
+[2023-08-31 05:08:10,634][00354] Num frames 6000...
+[2023-08-31 05:08:10,771][00354] Num frames 6100...
+[2023-08-31 05:08:10,916][00354] Num frames 6200...
+[2023-08-31 05:08:11,054][00354] Num frames 6300...
+[2023-08-31 05:08:11,196][00354] Num frames 6400...
+[2023-08-31 05:08:11,352][00354] Num frames 6500...
+[2023-08-31 05:08:11,496][00354] Num frames 6600...
+[2023-08-31 05:08:11,636][00354] Num frames 6700...
+[2023-08-31 05:08:11,780][00354] Num frames 6800...
+[2023-08-31 05:08:11,922][00354] Num frames 6900...
+[2023-08-31 05:08:11,984][00354] Avg episode rewards: #0: 27.505, true rewards: #0: 11.505
+[2023-08-31 05:08:11,985][00354] Avg episode reward: 27.505, avg true_objective: 11.505
+[2023-08-31 05:08:12,125][00354] Num frames 7000...
+[2023-08-31 05:08:12,268][00354] Num frames 7100...
+[2023-08-31 05:08:12,431][00354] Num frames 7200...
+[2023-08-31 05:08:12,578][00354] Num frames 7300...
+[2023-08-31 05:08:12,718][00354] Num frames 7400...
+[2023-08-31 05:08:12,855][00354] Num frames 7500...
+[2023-08-31 05:08:12,998][00354] Num frames 7600...
+[2023-08-31 05:08:13,138][00354] Num frames 7700...
+[2023-08-31 05:08:13,276][00354] Num frames 7800...
+[2023-08-31 05:08:13,426][00354] Num frames 7900...
+[2023-08-31 05:08:13,568][00354] Num frames 8000...
+[2023-08-31 05:08:13,720][00354] Num frames 8100...
+[2023-08-31 05:08:13,914][00354] Num frames 8200...
+[2023-08-31 05:08:14,154][00354] Num frames 8300...
+[2023-08-31 05:08:14,366][00354] Num frames 8400...
+[2023-08-31 05:08:14,576][00354] Num frames 8500...
+[2023-08-31 05:08:14,779][00354] Num frames 8600...
+[2023-08-31 05:08:14,960][00354] Avg episode rewards: #0: 29.656, true rewards: #0: 12.370
+[2023-08-31 05:08:14,962][00354] Avg episode reward: 29.656, avg true_objective: 12.370
+[2023-08-31 05:08:15,052][00354] Num frames 8700...
+[2023-08-31 05:08:15,259][00354] Num frames 8800...
+[2023-08-31 05:08:15,470][00354] Num frames 8900...
+[2023-08-31 05:08:15,673][00354] Num frames 9000...
+[2023-08-31 05:08:15,875][00354] Num frames 9100...
+[2023-08-31 05:08:16,018][00354] Num frames 9200...
+[2023-08-31 05:08:16,156][00354] Num frames 9300...
+[2023-08-31 05:08:16,306][00354] Num frames 9400...
+[2023-08-31 05:08:16,449][00354] Num frames 9500...
+[2023-08-31 05:08:16,606][00354] Num frames 9600...
+[2023-08-31 05:08:16,747][00354] Num frames 9700...
+[2023-08-31 05:08:16,893][00354] Num frames 9800...
+[2023-08-31 05:08:17,030][00354] Num frames 9900...
+[2023-08-31 05:08:17,168][00354] Num frames 10000...
+[2023-08-31 05:08:17,316][00354] Num frames 10100...
+[2023-08-31 05:08:17,457][00354] Num frames 10200...
+[2023-08-31 05:08:17,604][00354] Num frames 10300...
+[2023-08-31 05:08:17,752][00354] Num frames 10400...
+[2023-08-31 05:08:17,889][00354] Num frames 10500...
+[2023-08-31 05:08:18,011][00354] Avg episode rewards: #0: 31.809, true rewards: #0: 13.184
+[2023-08-31 05:08:18,013][00354] Avg episode reward: 31.809, avg true_objective: 13.184
+[2023-08-31 05:08:18,093][00354] Num frames 10600...
+[2023-08-31 05:08:18,232][00354] Num frames 10700...
+[2023-08-31 05:08:18,378][00354] Num frames 10800...
+[2023-08-31 05:08:18,525][00354] Num frames 10900...
+[2023-08-31 05:08:18,667][00354] Num frames 11000...
+[2023-08-31 05:08:18,802][00354] Num frames 11100...
+[2023-08-31 05:08:18,949][00354] Num frames 11200...
+[2023-08-31 05:08:19,176][00354] Avg episode rewards: #0: 29.869, true rewards: #0: 12.536
+[2023-08-31 05:08:19,179][00354] Avg episode reward: 29.869, avg true_objective: 12.536
+[2023-08-31 05:08:19,217][00354] Num frames 11300...
+[2023-08-31 05:08:19,416][00354] Num frames 11400...
+[2023-08-31 05:08:19,617][00354] Num frames 11500...
+[2023-08-31 05:08:19,822][00354] Num frames 11600...
+[2023-08-31 05:08:20,029][00354] Num frames 11700...
+[2023-08-31 05:08:20,235][00354] Num frames 11800...
+[2023-08-31 05:08:20,452][00354] Num frames 11900...
+[2023-08-31 05:08:20,663][00354] Num frames 12000...
+[2023-08-31 05:08:20,730][00354] Avg episode rewards: #0: 28.603, true rewards: #0: 12.003
+[2023-08-31 05:08:20,736][00354] Avg episode reward: 28.603, avg true_objective: 12.003
+[2023-08-31 05:09:47,567][00354] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
+[2023-08-31 05:10:01,618][00354] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2023-08-31 05:10:01,620][00354] Overriding arg 'num_workers' with value 1 passed from command line
+[2023-08-31 05:10:01,623][00354] Adding new argument 'no_render'=True that is not in the saved config file!
+[2023-08-31 05:10:01,624][00354] Adding new argument 'save_video'=True that is not in the saved config file!
+[2023-08-31 05:10:01,625][00354] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2023-08-31 05:10:01,627][00354] Adding new argument 'video_name'=None that is not in the saved config file!
+[2023-08-31 05:10:01,628][00354] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2023-08-31 05:10:01,630][00354] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2023-08-31 05:10:01,631][00354] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2023-08-31 05:10:01,632][00354] Adding new argument 'hf_repository'='AdanLee/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2023-08-31 05:10:01,634][00354] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2023-08-31 05:10:01,636][00354] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2023-08-31 05:10:01,637][00354] Adding new argument 'train_script'=None that is not in the saved config file!
+[2023-08-31 05:10:01,646][00354] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2023-08-31 05:10:01,651][00354] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2023-08-31 05:10:01,684][00354] RunningMeanStd input shape: (3, 72, 128)
+[2023-08-31 05:10:01,688][00354] RunningMeanStd input shape: (1,)
+[2023-08-31 05:10:01,702][00354] ConvEncoder: input_channels=3
+[2023-08-31 05:10:01,742][00354] Conv encoder output size: 512
+[2023-08-31 05:10:01,744][00354] Policy head output size: 512
+[2023-08-31 05:10:01,766][00354] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001466_6004736.pth...
+[2023-08-31 05:10:02,269][00354] Num frames 100...
+[2023-08-31 05:10:02,417][00354] Num frames 200...
+[2023-08-31 05:10:02,553][00354] Num frames 300...
+[2023-08-31 05:10:02,699][00354] Num frames 400...
+[2023-08-31 05:10:02,809][00354] Avg episode rewards: #0: 6.370, true rewards: #0: 4.370
+[2023-08-31 05:10:02,810][00354] Avg episode reward: 6.370, avg true_objective: 4.370
+[2023-08-31 05:10:02,904][00354] Num frames 500...
+[2023-08-31 05:10:03,047][00354] Num frames 600...
+[2023-08-31 05:10:03,196][00354] Num frames 700...
+[2023-08-31 05:10:03,354][00354] Num frames 800...
+[2023-08-31 05:10:03,498][00354] Num frames 900...
+[2023-08-31 05:10:03,643][00354] Num frames 1000...
+[2023-08-31 05:10:03,792][00354] Num frames 1100...
+[2023-08-31 05:10:03,938][00354] Num frames 1200...
+[2023-08-31 05:10:04,078][00354] Num frames 1300...
+[2023-08-31 05:10:04,237][00354] Num frames 1400...
+[2023-08-31 05:10:04,382][00354] Num frames 1500...
+[2023-08-31 05:10:04,562][00354] Avg episode rewards: #0: 15.945, true rewards: #0: 7.945
+[2023-08-31 05:10:04,565][00354] Avg episode reward: 15.945, avg true_objective: 7.945
+[2023-08-31 05:10:04,584][00354] Num frames 1600...
+[2023-08-31 05:10:04,727][00354] Num frames 1700...
+[2023-08-31 05:10:04,869][00354] Num frames 1800...
+[2023-08-31 05:10:05,014][00354] Num frames 1900...
+[2023-08-31 05:10:05,165][00354] Num frames 2000...
+[2023-08-31 05:10:05,306][00354] Num frames 2100...
+[2023-08-31 05:10:05,451][00354] Num frames 2200...
+[2023-08-31 05:10:05,592][00354] Num frames 2300...
+[2023-08-31 05:10:05,734][00354] Num frames 2400...
+[2023-08-31 05:10:05,883][00354] Num frames 2500...
+[2023-08-31 05:10:06,029][00354] Num frames 2600...
+[2023-08-31 05:10:06,154][00354] Avg episode rewards: #0: 18.150, true rewards: #0: 8.817
+[2023-08-31 05:10:06,157][00354] Avg episode reward: 18.150, avg true_objective: 8.817
+[2023-08-31 05:10:06,279][00354] Num frames 2700...
+[2023-08-31 05:10:06,486][00354] Num frames 2800...
+[2023-08-31 05:10:06,694][00354] Num frames 2900...
+[2023-08-31 05:10:06,898][00354] Num frames 3000...
+[2023-08-31 05:10:07,098][00354] Num frames 3100...
+[2023-08-31 05:10:07,313][00354] Num frames 3200...
+[2023-08-31 05:10:07,519][00354] Num frames 3300...
+[2023-08-31 05:10:07,727][00354] Num frames 3400...
+[2023-08-31 05:10:07,936][00354] Num frames 3500...
+[2023-08-31 05:10:08,140][00354] Num frames 3600...
+[2023-08-31 05:10:08,349][00354] Num frames 3700...
+[2023-08-31 05:10:08,566][00354] Num frames 3800...
+[2023-08-31 05:10:08,787][00354] Num frames 3900...
+[2023-08-31 05:10:09,040][00354] Avg episode rewards: #0: 22.230, true rewards: #0: 9.980
+[2023-08-31 05:10:09,042][00354] Avg episode reward: 22.230, avg true_objective: 9.980
+[2023-08-31 05:10:09,062][00354] Num frames 4000...
+[2023-08-31 05:10:09,274][00354] Num frames 4100...
+[2023-08-31 05:10:09,490][00354] Num frames 4200...
+[2023-08-31 05:10:09,698][00354] Num frames 4300...
+[2023-08-31 05:10:09,905][00354] Num frames 4400...
+[2023-08-31 05:10:10,111][00354] Num frames 4500...
+[2023-08-31 05:10:10,323][00354] Num frames 4600...
+[2023-08-31 05:10:10,534][00354] Num frames 4700...
+[2023-08-31 05:10:10,689][00354] Num frames 4800...
+[2023-08-31 05:10:10,828][00354] Num frames 4900...
+[2023-08-31 05:10:10,969][00354] Num frames 5000...
+[2023-08-31 05:10:11,108][00354] Num frames 5100...
+[2023-08-31 05:10:11,245][00354] Num frames 5200...
+[2023-08-31 05:10:11,391][00354] Num frames 5300...
+[2023-08-31 05:10:11,454][00354] Avg episode rewards: #0: 24.008, true rewards: #0: 10.608
+[2023-08-31 05:10:11,455][00354] Avg episode reward: 24.008, avg true_objective: 10.608
+[2023-08-31 05:10:11,597][00354] Num frames 5400...
+[2023-08-31 05:10:11,732][00354] Num frames 5500...
+[2023-08-31 05:10:11,875][00354] Num frames 5600...
+[2023-08-31 05:10:12,019][00354] Num frames 5700...
+[2023-08-31 05:10:12,155][00354] Num frames 5800...
+[2023-08-31 05:10:12,244][00354] Avg episode rewards: #0: 21.705, true rewards: #0: 9.705
+[2023-08-31 05:10:12,246][00354] Avg episode reward: 21.705, avg true_objective: 9.705
+[2023-08-31 05:10:12,360][00354] Num frames 5900...
+[2023-08-31 05:10:12,515][00354] Num frames 6000...
+[2023-08-31 05:10:12,657][00354] Num frames 6100...
+[2023-08-31 05:10:12,795][00354] Num frames 6200...
+[2023-08-31 05:10:12,941][00354] Num frames 6300...
+[2023-08-31 05:10:13,079][00354] Num frames 6400...
+[2023-08-31 05:10:13,220][00354] Num frames 6500...
+[2023-08-31 05:10:13,362][00354] Num frames 6600...
+[2023-08-31 05:10:13,511][00354] Num frames 6700...
+[2023-08-31 05:10:13,584][00354] Avg episode rewards: #0: 21.587, true rewards: #0: 9.587
+[2023-08-31 05:10:13,587][00354] Avg episode reward: 21.587, avg true_objective: 9.587
+[2023-08-31 05:10:13,721][00354] Num frames 6800...
+[2023-08-31 05:10:13,862][00354] Num frames 6900...
+[2023-08-31 05:10:13,999][00354] Num frames 7000...
+[2023-08-31 05:10:14,143][00354] Num frames 7100...
+[2023-08-31 05:10:14,279][00354] Num frames 7200...
+[2023-08-31 05:10:14,448][00354] Num frames 7300...
+[2023-08-31 05:10:14,611][00354] Num frames 7400...
+[2023-08-31 05:10:14,747][00354] Num frames 7500...
+[2023-08-31 05:10:14,884][00354] Num frames 7600...
+[2023-08-31 05:10:15,023][00354] Num frames 7700...
+[2023-08-31 05:10:15,159][00354] Num frames 7800...
+[2023-08-31 05:10:15,343][00354] Avg episode rewards: #0: 22.369, true rewards: #0: 9.869
+[2023-08-31 05:10:15,345][00354] Avg episode reward: 22.369, avg true_objective: 9.869
+[2023-08-31 05:10:15,358][00354] Num frames 7900...
+[2023-08-31 05:10:15,508][00354] Num frames 8000...
+[2023-08-31 05:10:15,646][00354] Num frames 8100...
+[2023-08-31 05:10:15,780][00354] Num frames 8200...
+[2023-08-31 05:10:15,925][00354] Num frames 8300...
+[2023-08-31 05:10:16,061][00354] Num frames 8400...
+[2023-08-31 05:10:16,196][00354] Num frames 8500...
+[2023-08-31 05:10:16,341][00354] Num frames 8600...
+[2023-08-31 05:10:16,485][00354] Num frames 8700...
+[2023-08-31 05:10:16,624][00354] Num frames 8800...
+[2023-08-31 05:10:16,763][00354] Num frames 8900...
+[2023-08-31 05:10:16,913][00354] Num frames 9000...
+[2023-08-31 05:10:17,053][00354] Num frames 9100...
+[2023-08-31 05:10:17,202][00354] Num frames 9200...
+[2023-08-31 05:10:17,344][00354] Num frames 9300...
+[2023-08-31 05:10:17,487][00354] Num frames 9400...
+[2023-08-31 05:10:17,641][00354] Num frames 9500...
+[2023-08-31 05:10:17,783][00354] Num frames 9600...
+[2023-08-31 05:10:17,924][00354] Num frames 9700...
+[2023-08-31 05:10:18,054][00354] Avg episode rewards: #0: 25.726, true rewards: #0: 10.837
+[2023-08-31 05:10:18,056][00354] Avg episode reward: 25.726, avg true_objective: 10.837
+[2023-08-31 05:10:18,125][00354] Num frames 9800...
+[2023-08-31 05:10:18,262][00354] Num frames 9900...
+[2023-08-31 05:10:18,406][00354] Num frames 10000...
+[2023-08-31 05:10:18,559][00354] Num frames 10100...
+[2023-08-31 05:10:18,699][00354] Num frames 10200...
+[2023-08-31 05:10:18,837][00354] Num frames 10300...
+[2023-08-31 05:10:18,987][00354] Num frames 10400...
+[2023-08-31 05:10:19,139][00354] Num frames 10500...
+[2023-08-31 05:10:19,276][00354] Avg episode rewards: #0: 24.953, true rewards: #0: 10.553
+[2023-08-31 05:10:19,278][00354] Avg episode reward: 24.953, avg true_objective: 10.553
+[2023-08-31 05:11:33,142][00354] Replay video saved to /content/train_dir/default_experiment/replay.mp4!