Kaito Sugimoto

kaisugi

AI & ML interests

Japanese LLMs

Recent Activity

liked a model 1 day ago
deepseek-ai/DeepSeek-R1
liked a Space 2 days ago
JMMMU/JMMMU_Leaderboard
reacted to lianghsun's post with 👍 7 days ago

Organizations

Aizawa Laboratory at NII · Team Hatakeyama · Hugging Face Discord Community

kaisugi's activity

reacted to lianghsun's post with 👍 7 days ago
🖖 Let me introduce the work I've done over the past three months: Llama-3.2-Taiwan-3B and Llama-3.2-Taiwan-3B-Instruct, now open-sourced on 🤗 Hugging Face.

๐—น๐—ถ๐—ฎ๐—ป๐—ด๐—ต๐˜€๐˜‚๐—ป/๐—Ÿ๐—น๐—ฎ๐—บ๐—ฎ-๐Ÿฏ.๐Ÿฎ-๐—ง๐—ฎ๐—ถ๐˜„๐—ฎ๐—ป-๐Ÿฏ๐—•: This model is built on top of ๐—บ๐—ฒ๐˜๐—ฎ-๐—น๐—น๐—ฎ๐—บ๐—ฎ/๐—Ÿ๐—น๐—ฎ๐—บ๐—ฎ-๐Ÿฏ.๐Ÿฎ-๐Ÿฏ๐—• with continual pretraining. The training dataset consists of a mixture of Traditional Chinese and multilingual texts in specific proportions, including 20B tokens of Traditional Chinese text.

๐—น๐—ถ๐—ฎ๐—ป๐—ด๐—ต๐˜€๐˜‚๐—ป/๐—Ÿ๐—น๐—ฎ๐—บ๐—ฎ-๐Ÿฏ.๐Ÿฎ-๐—ง๐—ฎ๐—ถ๐˜„๐—ฎ๐—ป-๐Ÿฏ๐—•-๐—œ๐—ป๐˜€๐˜๐—ฟ๐˜‚๐—ฐ๐˜: This is a fine-tuned conversational model based on the foundation model.

This Llama-3.2-Taiwan open-source project is currently a one-person effort (yes, I did everything from text preparation onward, so exhausting!). If you're interested, feel free to join the Discord server for discussions.

BENCHMARKING

The evaluation was conducted using ikala/tmmluplus (https://huggingface.co/datasets/ikala/tmmluplus), though the README page does not yet reflect the latest results. Performance is close to the previous versions, which suggests that further improvements may require adding more specialized knowledge to the training data.

A CALL FOR SUPPORT

If anyone is willing to provide compute resources, it would be greatly appreciated and would help this project continue and grow. 💪

---
๐Ÿ”๏ธ Foundation model: lianghsun/Llama-3.2-Taiwan-3B
๐Ÿค– Instruction model: lianghsun/Llama-3.2-Taiwan-3B-Instruct
โšก GGUF: lianghsun/Llama-3.2-Taiwan-3B-Instruct-GGUF
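
For readers who want to try the instruct model, a minimal sketch with the transformers library might look like the following; the prompt and sampling settings are illustrative, and the repo is assumed to ship a Llama-3-style chat template.

```python
# Minimal sketch: chatting with lianghsun/Llama-3.2-Taiwan-3B-Instruct via transformers.
# Assumptions: the repo ships a chat template; prompt and sampling settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lianghsun/Llama-3.2-Taiwan-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "請用繁體中文簡單介紹台灣的夜市文化。"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

The GGUF repo listed above is instead intended for llama.cpp-compatible runtimes.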
replied to AkimfromParis's post 2 months ago
reacted to AkimfromParis's post with 👍 2 months ago
🇯🇵 The Open Japanese LLM Leaderboard created by LLM-jp 🌸 in partnership with Hugging Face 🤗 was released today!

Blog: https://huggingface.co/blog/leaderboard-japanese
Space: llm-jp/open-japanese-llm-leaderboard

๐ŸŒ The leaderboard is available in both Japanese and English
๐Ÿ“š Based on the evaluation tool, llm-jp-eval with more than 20 datasets for Japanese LLMs
๐Ÿ“Š The leaderboard showcases all the metrics for NLP experts, plus averages for NLP beginners
๐Ÿ’ป For the comfort of users, we chose a horizontal UI, and implemented it in a light and dark theme on Gradio
๐Ÿ”ฌ The radar chart provides a very interesting visualization of metrics!
๐ŸŒฑ We are using the Japanese research platform, MDX, so please be patient!
โšก LLMs bigger than +70B will be evaluated soonโ€ฆ

How do you say "GPUs Go Brrr" in Japanese? -> GPUがブンブン～! (pronounced "GPU ga bunbun!") 🔥
posted an update 7 months ago
🚀 Llama-3-ELYZA-JP-8B

ELYZA, Inc. has developed two large language models (LLMs) for Japanese, "Llama-3-ELYZA-JP-70B" with 70 billion parameters and "Llama-3-ELYZA-JP-8B" with 8 billion parameters, based on Meta's Llama 3 series. The models were enhanced through additional pre-training and post-training to significantly improve their Japanese language capabilities.

Key Points:

Performance:
- Llama-3-ELYZA-JP-70B surpasses global models such as GPT-4, Claude 3 Sonnet, and Gemini 1.5 Flash.
- Llama-3-ELYZA-JP-8B matches models like GPT-3.5 Turbo and Claude 3 Haiku despite having fewer parameters.

Availability:
- The 8B model is available on Hugging Face Hub and can be used for both research and commercial purposes under the Llama 3 Community License.

Methodology:
- ELYZA enhanced the Japanese performance of the Llama 3 models through additional training with high-quality Japanese corpora and Instruction Tuning with proprietary datasets.

Benchmarks:
- Evaluations using ELYZA Tasks 100 and Japanese MT-Bench showed significant improvements in Japanese language generation.

Inference Speed:
- To address inference speed issues due to model size, ELYZA implemented Speculative Decoding, which achieved up to 1.6 times faster inference for the 70B model.
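
As a rough illustration (not ELYZA's actual setup, which the announcement does not detail), speculative decoding can be approximated with transformers' assisted generation by pairing the released 8B model with a smaller draft model; the draft model below is an assumption chosen only because it shares the Llama 3 vocabulary.

```python
# Sketch of assisted (speculative) decoding with transformers.
# The target is the released elyza/Llama-3-ELYZA-JP-8B; the draft model is an
# illustrative assumption and must share the target's tokenizer/vocabulary.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "elyza/Llama-3-ELYZA-JP-8B"
draft_id = "meta-llama/Llama-3.2-1B"  # assumed draft model (same Llama 3 vocabulary)

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(target_id, torch_dtype=torch.bfloat16, device_map="auto")
draft = AutoModelForCausalLM.from_pretrained(draft_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "日本で一番高い山について教えてください。"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(target.device)

# The draft model proposes several tokens at a time and the target verifies them,
# which can speed up decoding without changing what the target would generate.
output = target.generate(input_ids, assistant_model=draft, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```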

Overall, ELYZA's models demonstrate state-of-the-art performance in Japanese language tasks and are optimized for both efficiency and effectiveness.

Model URLs:
- elyza/Llama-3-ELYZA-JP-8B
- elyza/Llama-3-ELYZA-JP-8B-AWQ
- elyza/Llama-3-ELYZA-JP-8B-GGUF

Blog post (in Japanese):
https://note.com/elyza/n/n360b6084fdbd
posted an update 7 months ago
🚀 KARAKURI LM 8x7B Instruct v0.1

KARAKURI Inc. has publicly released "KARAKURI LM 8x7B Instruct v0.1", the first domestically developed large language model (LLM) in Japan to support function calling and Retrieval-Augmented Generation (RAG). This AI agent can handle tasks across various applications autonomously, significantly reducing implementation costs compared to traditional models.

Model Features:
- Capable of autonomously choosing optimal documents and databases for various tasks.
- Applied extensively in customer support for automating responses and processes, analyzing Voice of Customer (VoC), and predicting optimal outreach timings.

Model URL:
karakuri-ai/karakuri-lm-8x7b-instruct-v0.1

Detailed press release (in Japanese):
https://karakuri.ai/seminar/news/karakuri-lm-8x7b-instruct-v0-1/
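
To make the RAG use case above concrete, here is a generic sketch: the prompt layout and placeholder documents are assumptions rather than KARAKURI's documented format, retrieval itself is out of scope, and the repo's chat template is assumed to accept a plain user turn.

```python
# Generic RAG-style sketch (assumed prompt layout, not KARAKURI's documented format):
# retrieved passages are placed in the prompt and the model answers only from them.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "karakuri-ai/karakuri-lm-8x7b-instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Note: an 8x7B mixture-of-experts model is large; multi-GPU or quantization is usually needed.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

retrieved_docs = [
    "[Doc 1] (placeholder) Refund policy text retrieved from an internal knowledge base.",
    "[Doc 2] (placeholder) FAQ entry about shipping times for damaged items.",
]
question = "What is the refund policy for damaged items?"

prompt = (
    "Answer the question using only the documents below.\n\n"
    + "\n".join(retrieved_docs)
    + f"\n\nQuestion: {question}"
)
messages = [{"role": "user", "content": prompt}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```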
posted an update 7 months ago
🚀 Sarashina1-65B

SB Intuitions has announced the release of Japanese large language models (LLMs) with 7 billion, 13 billion, and 65 billion parameters to aid academic and industrial research and development. The company plans to develop a 390-billion-parameter model by the end of 2024. The models, named Sarashina1 and Sarashina2, show significant performance improvements, especially Sarashina2, which is an enhanced version of Sarashina1.

Performance evaluations using five Japanese language datasets reveal that Sarashina2 outperforms other models, including continued pre-trained models. The name "Sarashina" originates from a historical diary linked to the headquarters' location in Tokyo's Takeshiba area, symbolizing the company's ambition to create globally utilized models from Japan.

Model URLs:
- sbintuitions/sarashina1-65b
- sbintuitions/sarashina2-13b

Detailed press release (in Japanese):
https://www.sbintuitions.co.jp/news/press/20240614_01/
posted an update 7 months ago
🚀 llava-calm2-siglip

CyberAgent Inc. has announced the public release of "llava-calm2-siglip," a 7.5-billion-parameter Vision Language Model (VLM) for Japanese that is available for commercial use. The model, trained primarily on a high-quality Japanese dataset, is accessible on the Hugging Face Hub under an Apache-2.0 license. The release aims to strengthen Japanese-specific VLMs, which are still far fewer in number than English-centric models.

Model URL:
cyberagent/llava-calm2-siglip

Demo URL:
cyberagent/llava-calm2-preview

Detailed press release (in Japanese): https://www.cyberagent.co.jp/news/detail/id=30344
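
A minimal usage sketch might look like the following, assuming the repo loads with transformers' LLaVA classes as LLaVA-style releases usually do; the prompt layout and image path are assumptions, so the model card should be checked for the exact template.

```python
# Minimal VLM sketch for llava-calm2-siglip. Assumptions: the repo is compatible with
# LlavaForConditionalGeneration, and the USER/ASSISTANT prompt layout is illustrative.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "cyberagent/llava-calm2-siglip"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("example.jpg")  # placeholder path to a local image
prompt = "USER: <image>\nこの画像に写っているものを日本語で説明してください。\nASSISTANT: "  # assumed format

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device, torch.bfloat16)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```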
reacted to leonardlin's post with 👍 8 months ago
replied to their post 8 months ago

That's a good point.
I'm not an employee of this company or working in the financial sector, but I do know that the people involved have actively discussed in which cases they should make use of LLMs. I guess LLMs won't replace humans' decision-making processes, but rather augment them.

posted an update 8 months ago
🚀 Stockmark-100b

Stockmark Inc. has developed and released one of Japan's largest commercial-scale language models (LLMs), with 100 billion parameters, named "Stockmark-LLM-100b". This model significantly reduces hallucinations and provides accurate responses to complex business-related queries. Developed from scratch with a focus on Japanese business data, the model aims to be reliable for high-stakes business environments. It is open-source and available for commercial use.

Key highlights:
- The model reduces hallucinations (incorrect confident responses that AI models sometimes generate).
- Stockmark-LLM-100b can answer basic business questions and specialized queries in industries like manufacturing.
- The model's performance surpasses GPT-4-turbo in accuracy for business-specific queries.
- Evaluation benchmarks (VicunaQA) show high performance.
- Fast inference speed, generating 100-character Japanese text in 1.86 seconds.

stockmark/stockmark-100b
stockmark/stockmark-100b-instruct-v0.1

Detailed press release (in Japanese): https://stockmark.co.jp/news/20240516
reacted to leonardlin's post with 👍 8 months ago
llm-jp-eval is currently one of the most widely used benchmarks for Japanese LLMs and makes up half of WandB's comprehensive Nejumi LLM Leaderboard scoring. I was seeing some weirdness in the results I was getting and ended up in a bit of a rabbit hole. Here's my article on evaluating llm-jp-eval: https://huggingface.co/blog/leonardlin/llm-jp-eval-eval

I've set up a fork of Lightblue's Shaberi testing framework, which uses LLM-as-a-Judge style benchmarks as something probably more representative of real-world LLM strength in Japanese. Here's how the new base model ablations are looking: