Le Duc Khai

leduckhai

AI & ML interests

None yet

leduckhai's activity

updated a model about 2 months ago

leduckhai/ViT5-VietMedSum

Summarization • Updated Nov 9, 2024 • 22

updated a dataset about 2 months ago

leduckhai/VietMed-Sum

Viewer • Updated Nov 9, 2024 • 106k • 60 • 1

updated a dataset 3 months ago

leduckhai/MultiMed

Viewer • Updated Sep 28, 2024 • 48.4k • 36

authored 2 papers 6 months ago

Development of Hybrid ASR Systems for Low Resource Medical Domain Conversational Telephone Speech

Paper • 2210.13397 • Published Oct 24, 2022

Unsupervised Pre-Training for Vietnamese Automatic Speech Recognition in the HYKIST Project

Paper • 2309.15869 • Published Sep 26, 2023

New activity in leduckhai/VietMed-Sum 6 months ago

[bot] Conversion to Parquet: #1 opened 6 months ago by parquet-converter

authored 2 papers 6 months ago

Medical Spoken Named Entity Recognition

Paper • 2406.13337 • Published Jun 19, 2024

Real-time Speech Summarization for Medical Conversations

Paper • 2406.15888 • Published Jun 22, 2024 • 1

upvoted a paper 6 months ago

Real-time Speech Summarization for Medical Conversations

Paper • 2406.15888 • Published Jun 22, 2024 • 1

updated a dataset 7 months ago

leduckhai/VietMed-NER

Viewer • Updated Jun 21, 2024 • 9.27k • 112

reacted to merve's post with 🔥 7 months ago

Post

Florence-2 is a new vision foundation model capable of a wide variety of tasks 🤯
Demo 👉🏻 gokaygokay/Florence-2
Collection 👉🏻 microsoft/florence-6669f44df0d87d9c3bfb76de

This model can handle tasks ranging from OCR to semantic segmentation.

What sets it apart from previous models is its training data: the authors compiled a dataset of 126M images with 5.4B annotations, produced by their own data engine, which pseudo-labels the images using smaller specialized models and APIs.

The architecture is similar to that of previous models: an image encoder plus a multimodal encoder with a text decoder. The authors built the multitask dataset with a dedicated prompt for each task.

You can also fine-tune this model on any task of your choice. The authors report results on various downstream tasks, comparing a frozen and an unfrozen vision encoder 🤓📉
They have also released fine-tuned models, which you can find in the collection above 🤗

authored a paper 7 months ago

VietMed: A Dataset and Benchmark for Automatic Speech Recognition of Vietnamese in the Medical Domain

Paper • 2404.05659 • Published Apr 8, 2024 • 2

liked a dataset 8 months ago

leduckhai/VietMed

Preview • Updated May 25, 2024 • 117 • 15

updated a dataset 8 months ago

leduckhai/VietMed

Preview • Updated May 25, 2024 • 117 • 15

upvoted a paper 9 months ago

VietMed: A Dataset and Benchmark for Automatic Speech Recognition of Vietnamese in the Medical Domain

Paper • 2404.05659 • Published Apr 8, 2024 • 2