Stefano Fiorucci

anakin87

AI & ML interests

Contributing to the Haystack LLM framework 🏗️. Language Models: orchestration, post-training, synthetic data...

Recent Activity

new activity about 4 hours ago
google/gemma-2-9b: Fine-tuning Hyperparameters
liked a dataset 1 day ago
mlabonne/orpo-dpo-mix-40k
replied to their post 2 days ago
ššžš° šˆš­ššš„š¢ššš§ š’š¦ššš„š„ š‹ššš§š š®ššš šž šŒšØššžš„š¬: š†šžš¦š¦šš ššžšØš šžš§šžš¬š¢š¬ šœšØš„š„šžšœš­š¢šØš§ šŸ’ŽšŸŒšŸ‡®šŸ‡¹ I am happy to release two new language models for the Italian Language! šŸ’Ŗ Gemma 2 9B Neogenesis ITA https://huggingface.co/anakin87/gemma-2-9b-neogenesis-ita Building on the impressive work by VAGO Solutions, I applied Direct Preference Optimization with a mix of Italian and English data. Using Spectrum, I trained 20% of model layers. šŸ“Š Evaluated on the Open ITA LLM leaderboard (https://huggingface.co/spaces/mii-llm/open_ita_llm_leaderboard), this model achieves strong performance. To beat it on this benchmark, you'd need a 27B model šŸ˜Ž šŸ¤ Gemma 2 2B Neogenesis ITA https://huggingface.co/anakin87/gemma-2-2b-neogenesis-ita This smaller variant is fine-tuned from the original Gemma 2 2B it by Google. Through a combination of Supervised Fine-Tuning and Direct Preference Optimization, I trained 25% of the layers using Spectrum. šŸ“ˆ Compared to the original model, it shows improved Italian proficiency, good for its small size. Both models were developed during the recent #gemma competition on Kaggle. šŸ““ Training code: https://www.kaggle.com/code/anakin87/post-training-gemma-for-italian-and-beyond šŸ™ Thanks @FinancialSupport and mii-llm for the help during evaluation.

Organizations

deepset · Blog-explorers · ZeroGPU Explorers · Hugging Face Discord Community

anakin87's activity

liked a Space about 1 month ago