Akarshan Biswas
qnixsynapse
AI & ML interests
NLP, models, quantization
Recent Activity
new activity 4 days ago
google/gemma-2-9b-it: Tool calling support in Gemma 2
liked a Space 4 days ago
webml-community/attention-visualization
reacted to suayptalha's post 7 days ago
Introducing First Hugging Face Integration of minGRU Models from the paper Were RNNs All We Needed?
I have integrated next-generation RNNs, specifically minGRU, which offer faster performance compared to Transformer architectures, into HuggingFace. This allows users to leverage the lighter and more efficient minGRU models with the "transformers" library for both usage and training.
I integrated two main tasks: MinGRUForSequenceClassification and MinGRUForCausalLM.
MinGRUForSequenceClassification:
You can use this class for Sequence Classification tasks. I also trained a Sentiment Analysis model on the stanfordnlp/imdb dataset.
MinGRUForCausalLM:
You can use this class for Causal Language Model tasks, as with GPT or Llama. I also trained an example model on the roneneldan/TinyStories dataset. You can fine-tune and use it!
Links:
Models: https://huggingface.co/collections/suayptalha/mingru-676fe8d90760d01b7955d7ab
GitHub: https://github.com/suayptalha/minGRU-hf
LinkedIn Post: https://www.linkedin.com/posts/suayp-talha-kocabay_mingru-a-suayptalha-collection-activity-7278755484172439552-wNY1
Credits:
Paper Link: https://arxiv.org/abs/2410.01201
I am thankful to Leo Feng, Frederick Tung, Mohamed Osama Ahmed, Yoshua Bengio and Hossein Hajimirsadeghi for their paper.
Organizations
None yet
qnixsynapse's activity
Tool calling support in Gemma 2
2
#50 opened 27 days ago by qnixsynapse
Is this really an Instruct model?
#1 opened 4 months ago by qnixsynapse
[MODELS] Discussion
546
#372 opened 11 months ago by victor
[TOOLS] Community Discussion
27
#455 opened 8 months ago by victor
Wrong number of tensors; expected 292, got 291
6
#69 opened 5 months ago by KingBadger
[FEATURE] Tools
69
#470 opened 7 months ago by victor
Utterly based
1
#9 opened 6 months ago by llama-anon
Add IQ Quantization support with the help of imatrix and GPUs
8
#35 opened 9 months ago by qnixsynapse
Suggestion: Host Gemma2 using keras_nlp instead of transformers library for the time being
2
#498 opened 6 months ago by qnixsynapse
The best 8B in the planet right now. PERIOD!
2
#22 opened 9 months ago by cyberneticos
How many active parameters does this model have?
3
#6 opened 9 months ago by lewtun
7B or 8B?
4
#24 opened 11 months ago by amgadhasan
Which model is responsible for naming of the thread?
8
#402 opened 9 months ago by qnixsynapse
Consider adding <start_of_context> and <stop_of_context> or similar special tokens for context ingestion.
#13 opened 9 months ago by qnixsynapse
Number of parameters
7
#9 opened 9 months ago by HugoLaurencon
RMSNorm eps value is wrong
#20 opened 11 months ago by qnixsynapse
RMSNorm eps value is wrong
#19 opened 11 months ago by qnixsynapse
Loading the model
3
#3 opened over 1 year ago by PyrroAiakid
Looking for GGUF format for this model
1
#14 opened over 1 year ago by barha
Help needed to load model
19
#13 opened over 1 year ago by sanjay-dev-ds-28