An experiment with the recent Titans architecture

Note: this model is heavily undertrained.

A new run is being trained from scratch with a few fixes and will hopefully perform better. This repo will be deleted once the new one is uploaded.

Third Run (coming soon, with another 2k or 4k epochs of further training)

Goal: a loss below 2, or even below 1 (though that is unrealistic with so little data).
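To put these loss targets in perspective, here is a small sketch that converts loss to perplexity. It assumes the reported "Loss" is the mean per-token cross-entropy in nats (the default reduction of PyTorch's `cross_entropy`); the card does not state this, so treat the numbers as a rough guide only.

```python
import math

# Assumption: "Loss" in this card is mean per-token cross-entropy in nats,
# so perplexity = exp(loss).
def perplexity(loss: float) -> float:
    return math.exp(loss)

# Loss values copied from the runs described in this card, plus the goals.
reported = {
    "first run": 4.703743934631348,
    "second run": 3.187483549118042,
    "goal": 2.0,
    "stretch goal": 1.0,
}

for name, loss in reported.items():
    print(f"{name}: loss={loss:.3f} -> perplexity ~ {perplexity(loss):.1f}")
```

Under that assumption, reaching a loss of 2 would mean the model is, on average, about as uncertain as choosing uniformly among roughly 7 tokens, compared with roughly 24 for the second run.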

Second Run:

Training Details:

Parameters: 215.35 million

Epochs: 2000 planned (training crashed at epoch 1500)

Dataset: HuggingFaceFW/fineweb-edu

  • subset: CC-MAIN-2016-26
  • split: train
  • total: 100%
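The subset, split, and fraction above map onto the Hugging Face `datasets` library's `load_dataset` call and its split-slicing syntax. The sketch below is an assumption, not the author's training code (which this card does not show). `split_for_fraction` and `load_fineweb_edu_subset` are hypothetical helper names. Note that `train[:10%]` takes the first 10% of rows; the card does not say how the 10% used in the first run was selected.

```python
# Hypothetical helper: turn the "total" percentage from this card into a
# `datasets` split string (100% -> "train", 10% -> "train[:10%]").
def split_for_fraction(percent: int, split: str = "train") -> str:
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    return split if percent == 100 else f"{split}[:{percent}%]"


def load_fineweb_edu_subset(percent: int = 100):
    # Imported lazily so split_for_fraction works without `datasets` installed.
    from datasets import load_dataset

    # "CC-MAIN-2016-26" is the subset (config name) listed in this card.
    return load_dataset(
        "HuggingFaceFW/fineweb-edu",
        name="CC-MAIN-2016-26",
        split=split_for_fraction(percent),
    )


if __name__ == "__main__":
    print(split_for_fraction(100))  # second run: full subset
    print(split_for_fraction(10))   # first run: 10% of the subset
```

Calling `load_fineweb_edu_subset(10)` would download the selected slice of the subset, which can be large.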

Loss: 3.187483549118042

Code: titans-pytorch | version = "0.1.9"

Tokenizer: Gemma-2-9B

Inference test:

prompt:

researchers have found

model output:

in the 1900s and are still an important part of the world’s most popular work in a new book. They have the only real-day understanding of the United States and their children and the world, but the 15th century is also an early 900-year-old children.
The U.S. Department of Health and New York State University's (see the National Institute for Health), has been a major role in the history of the future. The American School of Research and the University of the National Science Foundation, published in 2007, and has been shown to be the first-based project, and the “K-ray.”
“I have used the best to have an estimated 3.3 million years in 2004, and it’s no one of the most important studies for the development of the public-related people in the 1990s.
In the 1970s, the role of the history of the community has been done. In 1972, the European Space Agency, and the first year 2013 research published in the 1919 issue of the World War and the 2005 U.S. in 1999, the U.S. Department of Medicine, in 1949, the public and the first three decades of students in 2010.
In the 1930s the U.S. National Laboratory and the 1920s, the first of 2008 to the National Association in 1984, the first of 1920-0000-2016-0005.
“If it is the last 1500 years, there were some of the most significant risk to the current power.”
The new study, which published the 18988 in the U.S. government, developed by 1962, when a series of men are not given.
The second of the most significant importance in the 1914, which had a higher than the 45 million people. The 2013 50, 1837, has been a highly long-term in 1623.
The researchers at the 1800-

First Run:

Training Details:

Parameters: 215.35 million

Epochs: 1000

Dataset: HuggingFaceFW/fineweb-edu

  • subset: CC-MAIN-2016-26
  • split: train
  • total: 10%

Loss: 4.703743934631348

Code: titans-pytorch | version = "0.1.5"

Tokenizer: Gemma-2-9B

Inference test:

prompt:

Beer is

model output:

 a specific role in a result of 200,000 per 30.
The study is a part of a small study of a number of 8,000 per day.
The most important people and other 18 percent of the same-and-old water-day "in" and "The main cause of the next day of the 1100s.
The most of the other is to be a popular home that has the same 6th century. The idea is not the time it is also a few years. The first, a 650 square, which was the same time as an “15-3700s) that was the 114th century that was to be a few days.”
The most popular-term types of the National Academy of 2011.
The researchers began with a new study, or 51, in the 1984-1774. The study, in 2012, was a more than the 50,000.
In 2006, a new study published a 51,000 miles per 1949, and the 5013-502728 in 1015, when the 1951 is one of the two-3-10.5 miles for a few years, but the world's name has been a key for the time. The two, in 1947, in the 1969-2010, is not the only year in the same-term, the first half the first of the 1500-402,000 per 8162. The year, 35,000 people.
The American National Academy of 1869 is an early 3-800 million years of the 15, 1928, 400, 67, 2011. The 59644861
The 2000, 13:178
This study is an important 14-67-378. The first century was a number of the United States to be a similar-in-in-45.
The most popular

Dataset used to train Lyte/Titans-MAC-test-bad-run-with-bug