Akarshan Biswas

qnixsynapse

AI & ML interests

NLP, models, quantization

Recent Activity

new activity 4 days ago

google/gemma-2-9b-it:Tool calling support in Gemma 2

liked a Space 4 days ago

webml-community/attention-visualization

reacted to suayptalha's post with 👀 7 days ago

🚀 Introducing First Hugging Face Integration of minGRU Models from the paper "Were RNNs All We Needed?"

🖥 I have integrated next-generation RNNs, specifically minGRU, which offer faster performance compared to Transformer architectures, into HuggingFace. This allows users to leverage the lighter and more efficient minGRU models with the "transformers" library for both usage and training.

💻 I integrated two main tasks: MinGRUForSequenceClassification and MinGRUForCausalLM.

MinGRUForSequenceClassification: You can use this class for Sequence Classification tasks. I also trained a Sentiment Analysis model with stanfordnlp/imdb dataset.

MinGRUForCausalLM: You can use this class for Causal Language Model tasks such as GPT, Llama. I also trained an example model with roneneldan/TinyStories dataset. You can fine-tune and use it!

🔗 Links:
Models: https://huggingface.co/collections/suayptalha/mingru-676fe8d90760d01b7955d7ab
GitHub: https://github.com/suayptalha/minGRU-hf
LinkedIn Post: https://www.linkedin.com/posts/suayp-talha-kocabay_mingru-a-suayptalha-collection-activity-7278755484172439552-wNY1

📰 Credits:
Paper Link: https://arxiv.org/abs/2410.01201
I am thankful to Leo Feng, Frederick Tung, Mohamed Osama Ahmed, Yoshua Bengio and Hossein Hajimirsadeghi for their papers.

Organizations

None yet

qnixsynapse's activity

New activity in google/gemma-2-9b-it 4 days ago

Tool calling support in Gemma 2

#50 opened 27 days ago by

qnixsynapse

New activity in bartowski/OLMoE-1B-7B-0924-Instruct-GGUF 4 months ago

Is this really an Instruct model?

#1 opened 4 months ago by

qnixsynapse

New activity in huggingchat/chat-ui 5 months ago

[MODELS] Discussion

#372 opened 11 months ago by

victor

[TOOLS] Community Discussion

#455 opened 8 months ago by

victor

New activity in meta-llama/Llama-3.1-8B-Instruct 5 months ago

Wrong number of tensors; expected 292, got 291

#69 opened 5 months ago by

KingBadger

New activity in huggingchat/chat-ui 5 months ago

[FEATURE] Tools

#470 opened 7 months ago by

victor

New activity in meta-llama/Llama-3.1-8B-Instruct 6 months ago

Utterly based

#9 opened 6 months ago by

llama-anon

New activity in ggml-org/gguf-my-repo 6 months ago

Add IQ Quantization support with the help of imatrix and GPUs

#35 opened 9 months ago by

qnixsynapse

New activity in huggingchat/chat-ui 6 months ago

Suggestion: Host Gemma2 using keras_nlp instead of transformers library for the time being

#498 opened 6 months ago by

qnixsynapse

New activity in meta-llama/Meta-Llama-3-8B-Instruct 9 months ago

The best 8B in the planet right now. PERIOD!

#22 opened 9 months ago by

cyberneticos

New activity in mistral-community/Mixtral-8x22B-v0.1 9 months ago

How many active parameters does this model have?

#6 opened 9 months ago by

lewtun

New activity in google/gemma-7b 9 months ago

7B or 8B?

#24 opened 11 months ago by

amgadhasan

New activity in huggingchat/chat-ui 9 months ago

Which model is responsible for naming of the thread?

#402 opened 9 months ago by

qnixsynapse

New activity in google/gemma-1.1-7b-it 9 months ago

Consider adding <start_of_context> and <stop_of_context> or similar special tokens for context ingestion.

#13 opened 9 months ago by

qnixsynapse

Number of parameters

#9 opened 9 months ago by

HugoLaurencon

New activity in TheBloke/Llama-2-7B-Chat-GGUF 11 months ago

RMSNorm eps value is wrong

#20 opened 11 months ago by

qnixsynapse

RMSNorm eps value is wrong

#19 opened 11 months ago by

qnixsynapse

New activity in TheBloke/llama2_70b_chat_uncensored-GGML over 1 year ago

Loading the model

#3 opened over 1 year ago by

PyrroAiakid

New activity in TheBloke/Llama-2-70B-Chat-GGML over 1 year ago

Looking for GGUF format for this model

#14 opened over 1 year ago by

barha

New activity in TheBloke/Llama-2-13B-chat-GGML over 1 year ago

Help needed to load model

#13 opened over 1 year ago by

sanjay-dev-ds-28