|
Training learned + default: 5%|████▎ | 500/10000 [20:16<5:34:40, 2.11s/it, loss=5.1522, lr=5.98e-04, mfu=9.54%, time_per_iter_ms=2115.61ms] |
|
|
|
Step 100: |
|
Train loss: 6.8032, Val loss: 6.7993 |
|
wikitext-103-v1 - Train loss: 7.9265, Val loss: 7.9253 |
|
ptb - Train loss: 7.8234, Val loss: 7.8409 |
|
lambada - Train loss: 6.6363, Val loss: 6.6338 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 200: |
|
Train loss: 5.9722, Val loss: 5.9627 |
|
wikitext-103-v1 - Train loss: 7.2738, Val loss: 7.2757 |
|
ptb - Train loss: 7.6326, Val loss: 7.6526 |
|
lambada - Train loss: 5.7320, Val loss: 5.7384 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 300: |
|
Train loss: 5.5839, Val loss: 5.5772 |
|
wikitext-103-v1 - Train loss: 6.9805, Val loss: 6.9747 |
|
ptb - Train loss: 7.3015, Val loss: 7.3389 |
|
lambada - Train loss: 5.4309, Val loss: 5.4437 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 400: |
|
Train loss: 5.3047, Val loss: 5.3081 |
|
wikitext-103-v1 - Train loss: 6.7798, Val loss: 6.7778 |
|
ptb - Train loss: 7.1573, Val loss: 7.2023 |
|
lambada - Train loss: 5.2620, Val loss: 5.2760 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 500: |
|
Train loss: 5.1114, Val loss: 5.1103 |
|
wikitext-103-v1 - Train loss: 6.5907, Val loss: 6.5975 |
|
ptb - Train loss: 7.0237, Val loss: 7.0712 |
|
lambada - Train loss: 5.1399, Val loss: 5.1552 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 600: |
|
Train loss: 4.9545, Val loss: 4.9576 |
|
wikitext-103-v1 - Train loss: 6.4569, Val loss: 6.4556 |
|
ptb - Train loss: 6.8527, Val loss: 6.9085 |
|
lambada - Train loss: 5.0597, Val loss: 5.0714 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 700: |
|
Train loss: 4.8290, Val loss: 4.8339 |
|
wikitext-103-v1 - Train loss: 6.2913, Val loss: 6.2909 |
|
ptb - Train loss: 6.7313, Val loss: 6.7954 |
|
lambada - Train loss: 4.9799, Val loss: 4.9836 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 800: |
|
Train loss: 4.7139, Val loss: 4.7239 |
|
wikitext-103-v1 - Train loss: 6.0997, Val loss: 6.0978 |
|
ptb - Train loss: 6.5667, Val loss: 6.6242 |
|
lambada - Train loss: 4.9063, Val loss: 4.9139 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 900: |
|
Train loss: 4.6221, Val loss: 4.6155 |
|
wikitext-103-v1 - Train loss: 5.9550, Val loss: 5.9595 |
|
ptb - Train loss: 6.2530, Val loss: 6.3241 |
|
lambada - Train loss: 4.8532, Val loss: 4.8637 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 1000: |
|
Train loss: 4.5327, Val loss: 4.5270 |
|
wikitext-103-v1 - Train loss: 5.7643, Val loss: 5.7591 |
|
ptb - Train loss: 5.9274, Val loss: 6.0056 |
|
lambada - Train loss: 4.8230, Val loss: 4.8276 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 1100: |
|
Train loss: 4.4717, Val loss: 4.4726 |
|
wikitext-103-v1 - Train loss: 5.7093, Val loss: 5.6960 |
|
ptb - Train loss: 5.7632, Val loss: 5.8636 |
|
lambada - Train loss: 4.7889, Val loss: 4.7901 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 1200: |
|
Train loss: 4.4203, Val loss: 4.4154 |
|
wikitext-103-v1 - Train loss: 5.5998, Val loss: 5.5942 |
|
ptb - Train loss: 5.6131, Val loss: 5.7138 |
|
lambada - Train loss: 4.7443, Val loss: 4.7461 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 1300: |
|
Train loss: 4.3688, Val loss: 4.3721 |
|
wikitext-103-v1 - Train loss: 5.5592, Val loss: 5.5410 |
|
ptb - Train loss: 5.5749, Val loss: 5.6745 |
|
lambada - Train loss: 4.7367, Val loss: 4.7359 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 1400: |
|
Train loss: 4.3226, Val loss: 4.3247 |
|
wikitext-103-v1 - Train loss: 5.4883, Val loss: 5.4766 |
|
ptb - Train loss: 5.4738, Val loss: 5.5849 |
|
lambada - Train loss: 4.7088, Val loss: 4.7114 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 1500: |
|
Train loss: 4.3097, Val loss: 4.3124 |
|
wikitext-103-v1 - Train loss: 5.4808, Val loss: 5.4717 |
|
ptb - Train loss: 5.4547, Val loss: 5.5645 |
|
lambada - Train loss: 4.6809, Val loss: 4.6836 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 1600: |
|
Train loss: 4.2790, Val loss: 4.2761 |
|
wikitext-103-v1 - Train loss: 5.4472, Val loss: 5.4361 |
|
ptb - Train loss: 5.4254, Val loss: 5.5293 |
|
lambada - Train loss: 4.6738, Val loss: 4.6735 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 1700: |
|
Train loss: 4.2513, Val loss: 4.2528 |
|
wikitext-103-v1 - Train loss: 5.4258, Val loss: 5.4111 |
|
ptb - Train loss: 5.3701, Val loss: 5.4803 |
|
lambada - Train loss: 4.6548, Val loss: 4.6593 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 1800: |
|
Train loss: 4.2352, Val loss: 4.2316 |
|
wikitext-103-v1 - Train loss: 5.4078, Val loss: 5.3891 |
|
ptb - Train loss: 5.3572, Val loss: 5.4603 |
|
lambada - Train loss: 4.6556, Val loss: 4.6565 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 1900: |
|
Train loss: 4.2195, Val loss: 4.2252 |
|
wikitext-103-v1 - Train loss: 5.3737, Val loss: 5.3626 |
|
ptb - Train loss: 5.3345, Val loss: 5.4339 |
|
lambada - Train loss: 4.6529, Val loss: 4.6563 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 2000: |
|
Train loss: 4.2012, Val loss: 4.2005 |
|
wikitext-103-v1 - Train loss: 5.3446, Val loss: 5.3297 |
|
ptb - Train loss: 5.3035, Val loss: 5.4030 |
|
lambada - Train loss: 4.6363, Val loss: 4.6326 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 2100: |
|
Train loss: 4.1822, Val loss: 4.1835 |
|
wikitext-103-v1 - Train loss: 5.3210, Val loss: 5.3054 |
|
ptb - Train loss: 5.2642, Val loss: 5.3595 |
|
lambada - Train loss: 4.6256, Val loss: 4.6323 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 2200: |
|
Train loss: 4.1684, Val loss: 4.1703 |
|
wikitext-103-v1 - Train loss: 5.3057, Val loss: 5.2861 |
|
ptb - Train loss: 5.2387, Val loss: 5.3464 |
|
lambada - Train loss: 4.6153, Val loss: 4.6231 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 2300: |
|
Train loss: 4.1523, Val loss: 4.1526 |
|
wikitext-103-v1 - Train loss: 5.2923, Val loss: 5.2744 |
|
ptb - Train loss: 5.2167, Val loss: 5.3225 |
|
lambada - Train loss: 4.6124, Val loss: 4.6114 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 2400: |
|
Train loss: 4.1516, Val loss: 4.1510 |
|
wikitext-103-v1 - Train loss: 5.2770, Val loss: 5.2597 |
|
ptb - Train loss: 5.1981, Val loss: 5.2995 |
|
lambada - Train loss: 4.6100, Val loss: 4.6138 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 2500: |
|
Train loss: 4.1383, Val loss: 4.1276 |
|
wikitext-103-v1 - Train loss: 5.2818, Val loss: 5.2594 |
|
ptb - Train loss: 5.2314, Val loss: 5.3417 |
|
lambada - Train loss: 4.5945, Val loss: 4.5930 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 2600: |
|
Train loss: 4.1288, Val loss: 4.1315 |
|
wikitext-103-v1 - Train loss: 5.2726, Val loss: 5.2537 |
|
ptb - Train loss: 5.1872, Val loss: 5.2890 |
|
lambada - Train loss: 4.5969, Val loss: 4.5930 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 2700: |
|
Train loss: 4.1171, Val loss: 4.1135 |
|
wikitext-103-v1 - Train loss: 5.2677, Val loss: 5.2534 |
|
ptb - Train loss: 5.1779, Val loss: 5.2767 |
|
lambada - Train loss: 4.5791, Val loss: 4.5822 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 2800: |
|
Train loss: 4.1073, Val loss: 4.1115 |
|
wikitext-103-v1 - Train loss: 5.2372, Val loss: 5.2228 |
|
ptb - Train loss: 5.1593, Val loss: 5.2606 |
|
lambada - Train loss: 4.5610, Val loss: 4.5603 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 2900: |
|
Train loss: 4.1054, Val loss: 4.1016 |
|
wikitext-103-v1 - Train loss: 5.2383, Val loss: 5.2185 |
|
ptb - Train loss: 5.1539, Val loss: 5.2590 |
|
lambada - Train loss: 4.5615, Val loss: 4.5572 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 3000: |
|
Train loss: 4.0893, Val loss: 4.0916 |
|
wikitext-103-v1 - Train loss: 5.2253, Val loss: 5.2072 |
|
ptb - Train loss: 5.1492, Val loss: 5.2536 |
|
lambada - Train loss: 4.5675, Val loss: 4.5651 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 3100: |
|
Train loss: 4.0689, Val loss: 4.0765 |
|
wikitext-103-v1 - Train loss: 5.2255, Val loss: 5.2005 |
|
ptb - Train loss: 5.1293, Val loss: 5.2378 |
|
lambada - Train loss: 4.5673, Val loss: 4.5667 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 3200: |
|
Train loss: 4.0737, Val loss: 4.0775 |
|
wikitext-103-v1 - Train loss: 5.1956, Val loss: 5.1803 |
|
ptb - Train loss: 5.1232, Val loss: 5.2260 |
|
lambada - Train loss: 4.5593, Val loss: 4.5600 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 3300: |
|
Train loss: 4.0583, Val loss: 4.0664 |
|
wikitext-103-v1 - Train loss: 5.2007, Val loss: 5.1825 |
|
ptb - Train loss: 5.1177, Val loss: 5.2178 |
|
lambada - Train loss: 4.5612, Val loss: 4.5585 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 3400: |
|
Train loss: 4.0600, Val loss: 4.0617 |
|
wikitext-103-v1 - Train loss: 5.2096, Val loss: 5.1810 |
|
ptb - Train loss: 5.1161, Val loss: 5.2188 |
|
lambada - Train loss: 4.5469, Val loss: 4.5447 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 3500: |
|
Train loss: 4.0536, Val loss: 4.0544 |
|
wikitext-103-v1 - Train loss: 5.2071, Val loss: 5.1921 |
|
ptb - Train loss: 5.1391, Val loss: 5.2390 |
|
lambada - Train loss: 4.5528, Val loss: 4.5518 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 3600: |
|
Train loss: 4.0476, Val loss: 4.0560 |
|
wikitext-103-v1 - Train loss: 5.1899, Val loss: 5.1726 |
|
ptb - Train loss: 5.1008, Val loss: 5.2113 |
|
lambada - Train loss: 4.5439, Val loss: 4.5402 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 3700: |
|
Train loss: 4.0375, Val loss: 4.0389 |
|
wikitext-103-v1 - Train loss: 5.1512, Val loss: 5.1296 |
|
ptb - Train loss: 5.0778, Val loss: 5.1806 |
|
lambada - Train loss: 4.5436, Val loss: 4.5419 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 3800: |
|
Train loss: 4.0322, Val loss: 4.0433 |
|
wikitext-103-v1 - Train loss: 5.1781, Val loss: 5.1482 |
|
ptb - Train loss: 5.0573, Val loss: 5.1706 |
|
lambada - Train loss: 4.5353, Val loss: 4.5294 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 3900: |
|
Train loss: 4.0340, Val loss: 4.0310 |
|
wikitext-103-v1 - Train loss: 5.1386, Val loss: 5.1170 |
|
ptb - Train loss: 5.0528, Val loss: 5.1560 |
|
lambada - Train loss: 4.5376, Val loss: 4.5344 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 4000: |
|
Train loss: 4.0235, Val loss: 4.0245 |
|
wikitext-103-v1 - Train loss: 5.1477, Val loss: 5.1248 |
|
ptb - Train loss: 5.0713, Val loss: 5.1788 |
|
lambada - Train loss: 4.5250, Val loss: 4.5239 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 4100: |
|
Train loss: 4.0182, Val loss: 4.0258 |
|
wikitext-103-v1 - Train loss: 5.1462, Val loss: 5.1180 |
|
ptb - Train loss: 5.0588, Val loss: 5.1628 |
|
lambada - Train loss: 4.5343, Val loss: 4.5286 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 4200: |
|
Train loss: 4.0123, Val loss: 4.0196 |
|
wikitext-103-v1 - Train loss: 5.1272, Val loss: 5.1129 |
|
ptb - Train loss: 5.0320, Val loss: 5.1364 |
|
lambada - Train loss: 4.5119, Val loss: 4.5056 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 4300: |
|
Train loss: 4.0150, Val loss: 4.0128 |
|
wikitext-103-v1 - Train loss: 5.1390, Val loss: 5.1290 |
|
ptb - Train loss: 5.0476, Val loss: 5.1519 |
|
lambada - Train loss: 4.5109, Val loss: 4.5126 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 4400: |
|
Train loss: 4.0048, Val loss: 4.0103 |
|
wikitext-103-v1 - Train loss: 5.1374, Val loss: 5.1172 |
|
ptb - Train loss: 5.0301, Val loss: 5.1310 |
|
lambada - Train loss: 4.5117, Val loss: 4.5070 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 4500: |
|
Train loss: 3.9993, Val loss: 4.0021 |
|
wikitext-103-v1 - Train loss: 5.1364, Val loss: 5.1178 |
|
ptb - Train loss: 5.0345, Val loss: 5.1396 |
|
lambada - Train loss: 4.5214, Val loss: 4.5189 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 4600: |
|
Train loss: 3.9977, Val loss: 4.0002 |
|
wikitext-103-v1 - Train loss: 5.1226, Val loss: 5.1113 |
|
ptb - Train loss: 5.0425, Val loss: 5.1453 |
|
lambada - Train loss: 4.5012, Val loss: 4.5054 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 4700: |
|
Train loss: 3.9946, Val loss: 3.9956 |
|
wikitext-103-v1 - Train loss: 5.1216, Val loss: 5.0983 |
|
ptb - Train loss: 5.0082, Val loss: 5.1103 |
|
lambada - Train loss: 4.5028, Val loss: 4.4988 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 4800: |
|
Train loss: 3.9908, Val loss: 3.9915 |
|
wikitext-103-v1 - Train loss: 5.1079, Val loss: 5.0938 |
|
ptb - Train loss: 5.0188, Val loss: 5.1317 |
|
lambada - Train loss: 4.5031, Val loss: 4.5009 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 4900: |
|
Train loss: 3.9890, Val loss: 3.9908 |
|
wikitext-103-v1 - Train loss: 5.0976, Val loss: 5.0793 |
|
ptb - Train loss: 5.0167, Val loss: 5.1213 |
|
lambada - Train loss: 4.4953, Val loss: 4.4882 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 5000: |
|
Train loss: 3.9935, Val loss: 3.9828 |
|
wikitext-103-v1 - Train loss: 5.0895, Val loss: 5.0715 |
|
ptb - Train loss: 5.0054, Val loss: 5.1086 |
|
lambada - Train loss: 4.4902, Val loss: 4.4907 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 5100: |
|
Train loss: 3.9891, Val loss: 3.9839 |
|
wikitext-103-v1 - Train loss: 5.1023, Val loss: 5.0816 |
|
ptb - Train loss: 5.0059, Val loss: 5.1125 |
|
lambada - Train loss: 4.5029, Val loss: 4.5041 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 5200: |
|
Train loss: 3.9840, Val loss: 3.9829 |
|
wikitext-103-v1 - Train loss: 5.0980, Val loss: 5.0919 |
|
ptb - Train loss: 5.0296, Val loss: 5.1328 |
|
lambada - Train loss: 4.5027, Val loss: 4.5026 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 5300: |
|
Train loss: 3.9795, Val loss: 3.9726 |
|
wikitext-103-v1 - Train loss: 5.1008, Val loss: 5.0863 |
|
ptb - Train loss: 4.9883, Val loss: 5.0865 |
|
lambada - Train loss: 4.4961, Val loss: 4.4934 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 5400: |
|
Train loss: 3.9786, Val loss: 3.9782 |
|
wikitext-103-v1 - Train loss: 5.0820, Val loss: 5.0668 |
|
ptb - Train loss: 5.0117, Val loss: 5.1150 |
|
lambada - Train loss: 4.4934, Val loss: 4.4912 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 5500: |
|
Train loss: 3.9630, Val loss: 3.9652 |
|
wikitext-103-v1 - Train loss: 5.0868, Val loss: 5.0748 |
|
ptb - Train loss: 4.9915, Val loss: 5.0996 |
|
lambada - Train loss: 4.4840, Val loss: 4.4843 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 5600: |
|
Train loss: 3.9606, Val loss: 3.9665 |
|
wikitext-103-v1 - Train loss: 5.0826, Val loss: 5.0593 |
|
ptb - Train loss: 4.9854, Val loss: 5.0888 |
|
lambada - Train loss: 4.4885, Val loss: 4.4871 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 5700: |
|
Train loss: 3.9574, Val loss: 3.9657 |
|
wikitext-103-v1 - Train loss: 5.0830, Val loss: 5.0589 |
|
ptb - Train loss: 4.9841, Val loss: 5.0851 |
|
lambada - Train loss: 4.4843, Val loss: 4.4842 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 5800: |
|
Train loss: 3.9602, Val loss: 3.9628 |
|
wikitext-103-v1 - Train loss: 5.0794, Val loss: 5.0630 |
|
ptb - Train loss: 4.9863, Val loss: 5.0912 |
|
lambada - Train loss: 4.4771, Val loss: 4.4706 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 5900: |
|
Train loss: 3.9600, Val loss: 3.9594 |
|
wikitext-103-v1 - Train loss: 5.0722, Val loss: 5.0742 |
|
ptb - Train loss: 4.9944, Val loss: 5.0966 |
|
lambada - Train loss: 4.4791, Val loss: 4.4735 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 6000: |
|
Train loss: 3.9588, Val loss: 3.9564 |
|
wikitext-103-v1 - Train loss: 5.0610, Val loss: 5.0431 |
|
ptb - Train loss: 4.9591, Val loss: 5.0625 |
|
lambada - Train loss: 4.4814, Val loss: 4.4781 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 6100: |
|
Train loss: 3.9546, Val loss: 3.9566 |
|
wikitext-103-v1 - Train loss: 5.0793, Val loss: 5.0536 |
|
ptb - Train loss: 4.9757, Val loss: 5.0808 |
|
lambada - Train loss: 4.4890, Val loss: 4.4869 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 6200: |
|
Train loss: 3.9561, Val loss: 3.9541 |
|
wikitext-103-v1 - Train loss: 5.0793, Val loss: 5.0652 |
|
ptb - Train loss: 4.9814, Val loss: 5.0856 |
|
lambada - Train loss: 4.4658, Val loss: 4.4646 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 6300: |
|
Train loss: 3.9403, Val loss: 3.9502 |
|
wikitext-103-v1 - Train loss: 5.0766, Val loss: 5.0515 |
|
ptb - Train loss: 4.9730, Val loss: 5.0778 |
|
lambada - Train loss: 4.4707, Val loss: 4.4703 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 6400: |
|
Train loss: 3.9567, Val loss: 3.9471 |
|
wikitext-103-v1 - Train loss: 5.0741, Val loss: 5.0497 |
|
ptb - Train loss: 4.9640, Val loss: 5.0730 |
|
lambada - Train loss: 4.4720, Val loss: 4.4707 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 6500: |
|
Train loss: 3.9452, Val loss: 3.9472 |
|
wikitext-103-v1 - Train loss: 5.0761, Val loss: 5.0524 |
|
ptb - Train loss: 4.9755, Val loss: 5.0792 |
|
lambada - Train loss: 4.4621, Val loss: 4.4622 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 6600: |
|
Train loss: 3.9384, Val loss: 3.9474 |
|
wikitext-103-v1 - Train loss: 5.0562, Val loss: 5.0416 |
|
ptb - Train loss: 4.9626, Val loss: 5.0699 |
|
lambada - Train loss: 4.4697, Val loss: 4.4686 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 6700: |
|
Train loss: 3.9335, Val loss: 3.9443 |
|
wikitext-103-v1 - Train loss: 5.0542, Val loss: 5.0332 |
|
ptb - Train loss: 4.9656, Val loss: 5.0712 |
|
lambada - Train loss: 4.4697, Val loss: 4.4654 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 6800: |
|
Train loss: 3.9380, Val loss: 3.9330 |
|
wikitext-103-v1 - Train loss: 5.0613, Val loss: 5.0359 |
|
ptb - Train loss: 4.9592, Val loss: 5.0638 |
|
lambada - Train loss: 4.4643, Val loss: 4.4640 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 6900: |
|
Train loss: 3.9411, Val loss: 3.9336 |
|
wikitext-103-v1 - Train loss: 5.0485, Val loss: 5.0398 |
|
ptb - Train loss: 4.9616, Val loss: 5.0670 |
|
lambada - Train loss: 4.4633, Val loss: 4.4630 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 7000: |
|
Train loss: 3.9413, Val loss: 3.9381 |
|
wikitext-103-v1 - Train loss: 5.0501, Val loss: 5.0340 |
|
ptb - Train loss: 4.9666, Val loss: 5.0762 |
|
lambada - Train loss: 4.4721, Val loss: 4.4664 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 7100: |
|
Train loss: 3.9324, Val loss: 3.9473 |
|
wikitext-103-v1 - Train loss: 5.0561, Val loss: 5.0308 |
|
ptb - Train loss: 4.9581, Val loss: 5.0628 |
|
lambada - Train loss: 4.4575, Val loss: 4.4547 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 7200: |
|
Train loss: 3.9319, Val loss: 3.9323 |
|
wikitext-103-v1 - Train loss: 5.0370, Val loss: 5.0313 |
|
ptb - Train loss: 4.9458, Val loss: 5.0583 |
|
lambada - Train loss: 4.4593, Val loss: 4.4594 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 7300: |
|
Train loss: 3.9288, Val loss: 3.9323 |
|
wikitext-103-v1 - Train loss: 5.0520, Val loss: 5.0356 |
|
ptb - Train loss: 4.9514, Val loss: 5.0590 |
|
lambada - Train loss: 4.4611, Val loss: 4.4602 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 7400: |
|
Train loss: 3.9255, Val loss: 3.9260 |
|
wikitext-103-v1 - Train loss: 5.0381, Val loss: 5.0242 |
|
ptb - Train loss: 4.9413, Val loss: 5.0469 |
|
lambada - Train loss: 4.4586, Val loss: 4.4515 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 7500: |
|
Train loss: 3.9298, Val loss: 3.9267 |
|
wikitext-103-v1 - Train loss: 5.0532, Val loss: 5.0335 |
|
ptb - Train loss: 4.9406, Val loss: 5.0502 |
|
lambada - Train loss: 4.4540, Val loss: 4.4542 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 7600: |
|
Train loss: 3.9303, Val loss: 3.9294 |
|
wikitext-103-v1 - Train loss: 5.0450, Val loss: 5.0285 |
|
ptb - Train loss: 4.9450, Val loss: 5.0523 |
|
lambada - Train loss: 4.4614, Val loss: 4.4600 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 7700: |
|
Train loss: 3.9257, Val loss: 3.9274 |
|
wikitext-103-v1 - Train loss: 5.0441, Val loss: 5.0200 |
|
ptb - Train loss: 4.9426, Val loss: 5.0469 |
|
lambada - Train loss: 4.4567, Val loss: 4.4560 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 7800: |
|
Train loss: 3.9220, Val loss: 3.9184 |
|
wikitext-103-v1 - Train loss: 5.0421, Val loss: 5.0188 |
|
ptb - Train loss: 4.9329, Val loss: 5.0434 |
|
lambada - Train loss: 4.4480, Val loss: 4.4500 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 7900: |
|
Train loss: 3.9191, Val loss: 3.9195 |
|
wikitext-103-v1 - Train loss: 5.0436, Val loss: 5.0185 |
|
ptb - Train loss: 4.9382, Val loss: 5.0394 |
|
lambada - Train loss: 4.4588, Val loss: 4.4558 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 8000: |
|
Train loss: 3.9208, Val loss: 3.9182 |
|
wikitext-103-v1 - Train loss: 5.0295, Val loss: 5.0208 |
|
ptb - Train loss: 4.9420, Val loss: 5.0425 |
|
lambada - Train loss: 4.4501, Val loss: 4.4505 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 8100: |
|
Train loss: 3.9196, Val loss: 3.9235 |
|
wikitext-103-v1 - Train loss: 5.0310, Val loss: 5.0172 |
|
ptb - Train loss: 4.9346, Val loss: 5.0509 |
|
lambada - Train loss: 4.4465, Val loss: 4.4457 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 8200: |
|
Train loss: 3.9149, Val loss: 3.9175 |
|
wikitext-103-v1 - Train loss: 5.0358, Val loss: 5.0145 |
|
ptb - Train loss: 4.9406, Val loss: 5.0437 |
|
lambada - Train loss: 4.4499, Val loss: 4.4472 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 8300: |
|
Train loss: 3.9136, Val loss: 3.9233 |
|
wikitext-103-v1 - Train loss: 5.0356, Val loss: 5.0177 |
|
ptb - Train loss: 4.9338, Val loss: 5.0402 |
|
lambada - Train loss: 4.4442, Val loss: 4.4447 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 8400: |
|
Train loss: 3.9188, Val loss: 3.9203 |
|
wikitext-103-v1 - Train loss: 5.0187, Val loss: 5.0067 |
|
ptb - Train loss: 4.9345, Val loss: 5.0414 |
|
lambada - Train loss: 4.4451, Val loss: 4.4424 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 8500: |
|
Train loss: 3.9075, Val loss: 3.9216 |
|
wikitext-103-v1 - Train loss: 5.0210, Val loss: 5.0066 |
|
ptb - Train loss: 4.9259, Val loss: 5.0462 |
|
lambada - Train loss: 4.4508, Val loss: 4.4502 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 8600: |
|
Train loss: 3.9094, Val loss: 3.9186 |
|
wikitext-103-v1 - Train loss: 5.0330, Val loss: 5.0151 |
|
ptb - Train loss: 4.9362, Val loss: 5.0413 |
|
lambada - Train loss: 4.4492, Val loss: 4.4454 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 8700: |
|
Train loss: 3.9076, Val loss: 3.9102 |
|
wikitext-103-v1 - Train loss: 5.0290, Val loss: 5.0129 |
|
ptb - Train loss: 4.9317, Val loss: 5.0469 |
|
lambada - Train loss: 4.4461, Val loss: 4.4453 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 8800: |
|
Train loss: 3.9088, Val loss: 3.9108 |
|
wikitext-103-v1 - Train loss: 5.0324, Val loss: 5.0109 |
|
ptb - Train loss: 4.9341, Val loss: 5.0367 |
|
lambada - Train loss: 4.4499, Val loss: 4.4466 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 8900: |
|
Train loss: 3.9142, Val loss: 3.9239 |
|
wikitext-103-v1 - Train loss: 5.0218, Val loss: 5.0109 |
|
ptb - Train loss: 4.9332, Val loss: 5.0430 |
|
lambada - Train loss: 4.4453, Val loss: 4.4456 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 9000: |
|
Train loss: 3.9141, Val loss: 3.9147 |
|
wikitext-103-v1 - Train loss: 5.0364, Val loss: 5.0022 |
|
ptb - Train loss: 4.9214, Val loss: 5.0312 |
|
lambada - Train loss: 4.4454, Val loss: 4.4421 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 9100: |
|
Train loss: 3.9051, Val loss: 3.9097 |
|
wikitext-103-v1 - Train loss: 5.0231, Val loss: 5.0112 |
|
ptb - Train loss: 4.9258, Val loss: 5.0377 |
|
lambada - Train loss: 4.4417, Val loss: 4.4405 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 9200: |
|
Train loss: 3.8955, Val loss: 3.9112 |
|
wikitext-103-v1 - Train loss: 5.0082, Val loss: 4.9994 |
|
ptb - Train loss: 4.9236, Val loss: 5.0343 |
|
lambada - Train loss: 4.4442, Val loss: 4.4422 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 9300: |
|
Train loss: 3.9086, Val loss: 3.9036 |
|
wikitext-103-v1 - Train loss: 5.0220, Val loss: 5.0079 |
|
ptb - Train loss: 4.9266, Val loss: 5.0379 |
|
lambada - Train loss: 4.4495, Val loss: 4.4489 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 9400: |
|
Train loss: 3.9064, Val loss: 3.9140 |
|
wikitext-103-v1 - Train loss: 5.0141, Val loss: 5.0081 |
|
ptb - Train loss: 4.9166, Val loss: 5.0236 |
|
lambada - Train loss: 4.4470, Val loss: 4.4427 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 9500: |
|
Train loss: 3.9090, Val loss: 3.9005 |
|
wikitext-103-v1 - Train loss: 5.0244, Val loss: 5.0046 |
|
ptb - Train loss: 4.9206, Val loss: 5.0293 |
|
lambada - Train loss: 4.4422, Val loss: 4.4434 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 9600: |
|
Train loss: 3.9021, Val loss: 3.9091 |
|
wikitext-103-v1 - Train loss: 5.0207, Val loss: 4.9965 |
|
ptb - Train loss: 4.9315, Val loss: 5.0431 |
|
lambada - Train loss: 4.4427, Val loss: 4.4371 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 9700: |
|
Train loss: 3.9026, Val loss: 3.9066 |
|
wikitext-103-v1 - Train loss: 5.0207, Val loss: 5.0015 |
|
ptb - Train loss: 4.9328, Val loss: 5.0470 |
|
lambada - Train loss: 4.4405, Val loss: 4.4364 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 9800: |
|
Train loss: 3.9001, Val loss: 3.9033 |
|
wikitext-103-v1 - Train loss: 5.0229, Val loss: 4.9957 |
|
ptb - Train loss: 4.9290, Val loss: 5.0358 |
|
lambada - Train loss: 4.4406, Val loss: 4.4387 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 9900: |
|
Train loss: 3.9109, Val loss: 3.9027 |
|
wikitext-103-v1 - Train loss: 5.0154, Val loss: 4.9953 |
|
ptb - Train loss: 4.9229, Val loss: 5.0341 |
|
lambada - Train loss: 4.4481, Val loss: 4.4403 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
|
Step 10000: |
|
Train loss: 3.9049, Val loss: 3.8999 |
|
wikitext-103-v1 - Train loss: 5.0177, Val loss: 5.0028 |
|
ptb - Train loss: 4.9259, Val loss: 5.0335 |
|
lambada - Train loss: 4.4393, Val loss: 4.4372 |
|
Saving checkpoint to out/ckpt_learned_default.pt |
|
|
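The evaluation blocks above follow a regular pattern: a `Step N:` line, one unlabelled `Train loss / Val loss` line, then one line per named dataset. The sketch below shows one possible way to read them into per-dataset curves. It is not part of the original training script. The function name `parse_log` and the label `main` for the unlabelled loss line are assumptions made for illustration, since the log does not say which dataset that line refers to. Trailing `|` characters are treated as extraction residue and stripped.

```python
import re
from collections import defaultdict

# "Step 100:" header that opens each evaluation block.
STEP_RE = re.compile(r"^Step (\d+):")
# "Train loss: a, Val loss: b", optionally prefixed by "<dataset> - ".
LOSS_RE = re.compile(r"^(?:(\S+) - )?Train loss: ([\d.]+), Val loss: ([\d.]+)")


def parse_log(text):
    """Return {dataset: [(step, train_loss, val_loss), ...]} from log text."""
    curves = defaultdict(list)
    step = None
    for raw in text.splitlines():
        # Drop surrounding whitespace and any trailing table pipe.
        line = raw.strip().rstrip("|").strip()
        m = STEP_RE.match(line)
        if m:
            step = int(m.group(1))
            continue
        m = LOSS_RE.match(line)
        if m and step is not None:
            # "main" is a hypothetical label for the unlabelled loss line.
            name = m.group(1) or "main"
            curves[name].append((step, float(m.group(2)), float(m.group(3))))
    return dict(curves)


def best_val(curves):
    """Return {dataset: (step, val_loss)} for the lowest val loss seen."""
    return {
        name: min(((s, v) for s, _, v in rows), key=lambda sv: sv[1])
        for name, rows in curves.items()
    }
```

Calling `best_val(parse_log(open(path).read()))` on a saved copy of this log, where `path` is wherever the log was written, would give the step with the lowest validation loss for each dataset. Lines that match neither pattern, such as the progress bar and the checkpoint messages, are ignored.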