DEWA SAHU

Dewa

AI & ML interests

Distributed Machine Learning, ML infra, LLM, Reinforcement Learning, Diffusion Models

Recent Activity

Organizations

None yet

Dewa's activity

replied to Sentdex's post 12 months ago
reacted to Sentdex's post with 🤗❤️ 12 months ago
Post
Hi, welcome to my first post here!

I am slowly wrangling about 5 years of Reddit comments (2015-2020). It totals billions of samples that can be organized as comment-reply pairs or chains of discussion, and filtered by subreddit, up/down votes, controversy, sentiment, and more.

Any requests or ideas for curated datasets from here? I'll also tinker with uploading the entire dataset, potentially in chunks, but it's quite a few terabytes in total, so I'll still need to break it up. I have some ideas for datasets I personally want too, but I'm curious whether anyone has something they'd really like to see.
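
As a rough illustration of the comment-reply pairing and filtering mentioned above, here is a minimal sketch. It assumes each comment is a dict carrying the `id`, `parent_id`, `subreddit`, `score`, and `body` fields used in Reddit's comment JSON; the function name `comment_reply_pairs`, its parameters, and the sample rows are made up for the example and are not taken from the actual dataset.

```python
def comment_reply_pairs(comments, subreddit=None, min_score=0):
    """Pair each reply with its parent comment, with optional filters.

    `comments` is a list of dicts (assumed fields: id, parent_id,
    subreddit, score, body). Returns a list of (parent_body, reply_body).
    """
    by_id = {c["id"]: c for c in comments}
    pairs = []
    for reply in comments:
        parent_ref = reply.get("parent_id", "")
        # A "t1_" prefix means the parent is a comment; "t3_" is a submission.
        if not parent_ref.startswith("t1_"):
            continue
        parent = by_id.get(parent_ref[3:])
        if parent is None:
            continue  # parent comment is not in this chunk
        if subreddit is not None and reply["subreddit"] != subreddit:
            continue
        if parent["score"] < min_score or reply["score"] < min_score:
            continue
        pairs.append((parent["body"], reply["body"]))
    return pairs


# Made-up rows, only to show the expected input shape.
sample = [
    {"id": "a1", "parent_id": "t3_x9", "subreddit": "python",
     "score": 12, "body": "How do I read a CSV?"},
    {"id": "a2", "parent_id": "t1_a1", "subreddit": "python",
     "score": 7, "body": "Use the csv module."},
    {"id": "a3", "parent_id": "t1_a1", "subreddit": "python",
     "score": -3, "body": "Just google it."},
]

pairs = comment_reply_pairs(sample, subreddit="python", min_score=1)
print(pairs)
```

Because the full dataset is several terabytes, a real version would probably need to process it in chunks and handle parents that fall outside the current chunk; this sketch simply skips those.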
replied to Sentdex's post 12 months ago

Could you add a column describing the tone of the text, e.g. sarcastic, angry, or happy? It would come in handy when anybody tries to fine-tune an LLM to generate sarcastic or more human-like content.