Shrirang Mahajan
NotShrirang
AI & ML interests
Deep Learning, LLMs, Machine Learning, Generative AI
Recent Activity
liked a model 4 days ago: deepseek-ai/DeepSeek-R1-Distill-Llama-70B
liked a model 4 days ago: deepseek-ai/DeepSeek-V3
liked a model 4 days ago: deepseek-ai/DeepSeek-R1
NotShrirang's activity
Text Generation • Updated • 127k • 421
deepseek-ai/DeepSeek-V3
Text Generation • Updated • 840k • 2.98k
deepseek-ai/DeepSeek-R1
Text Generation • Updated • 755k • 5.82k
commented on Simplifying Alignment: From RLHF to Direct Preference Optimization (DPO) 12 days ago
@ariG23498 Hey, I read the article again, and it is a lot easier to read now. Kudos on your quick response! I know changing something you have put effort into is not easy.
Thanks for directly aligning with my preference!
Loss:
reacted to burtenshaw's post with 🔥 13 days ago
Post
We're launching a FREE and CERTIFIED course on Agents!
We're thrilled to announce the launch of the Hugging Face Agents course on Learn! This interactive, certified course will guide you through building and deploying your own AI agents.
Here's what you'll learn:
- Understanding Agents: We'll break down the fundamentals of AI agents, showing you how they use LLMs to perceive their environment (observations), reason about it (thoughts), and take actions. Think of a smart assistant that can book appointments, answer emails, or even write code based on your instructions.
- Building with Frameworks: You'll dive into popular agent frameworks like LangChain, LlamaIndex and smolagents. These tools provide the building blocks for creating complex agent behaviors.
- Real-World Applications: See how agents are used in practice, from automating SQL queries to generating code and summarizing complex documents.
- Certification: Earn a certification by completing the course modules, implementing a use case, and passing a benchmark assessment. This proves your skills in building and deploying AI agents.
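The observe, reason, act cycle described in the first bullet can be sketched as a small loop. This is a minimal sketch, not the course's material or the API of smolagents, LangChain, or LlamaIndex: `fake_llm`, `get_time`, `TOOLS`, and `run_agent` are hypothetical names, and the "LLM" is a scripted stub so the loop runs offline.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str, str]  # (thought, action, argument)

def fake_llm(history: List[str]) -> Step:
    """Hypothetical scripted stand-in for an LLM call."""
    if not any(line.startswith("Observation:") for line in history):
        return ("I need the current time first.", "get_time", "")
    return ("I have enough information.", "final_answer", "Booked for 10:00")

# Hypothetical tool registry: tool name -> function taking a string argument.
TOOLS: Dict[str, Callable[[str], str]] = {
    "get_time": lambda _arg: "09:30",
}

def run_agent(task: str, llm: Callable[[List[str]], Step] = fake_llm,
              max_steps: int = 5) -> str:
    history: List[str] = [f"Task: {task}"]
    for _ in range(max_steps):
        thought, action, arg = llm(history)            # reason about what has been seen
        history.append(f"Thought: {thought}")
        if action == "final_answer":
            return arg
        observation = TOOLS[action](arg)                # take an action with a tool
        history.append(f"Action: {action}({arg})")
        history.append(f"Observation: {observation}")   # feed the result back in
    return "Stopped: step limit reached"

print(run_agent("Book a meeting"))  # prints: Booked for 10:00
```

A real agent would replace `fake_llm` with a model call and parse the thought and action out of its text output; the step limit guards against a model that never decides to stop.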
Audience
This course is designed for anyone interested in the future of AI. Whether you're a developer, data scientist, or simply curious about AI, this course will equip you with the knowledge and skills to build your own intelligent agents.
Enroll today and start building the next generation of AI agent applications!
https://bit.ly/hf-learn-agents
commented on Simplifying Alignment: From RLHF to Direct Preference Optimization (DPO) 13 days ago
Great explanation! How they were able to turn the RLHF optimization problem into a simple differentiable loss is just amazing!
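The differentiable objective DPO arrives at, as stated in the DPO paper, is a logistic loss on the gap between the policy-to-reference log-ratios of the preferred response y_w and the rejected response y_l. Below is a minimal sketch for a single preference pair, assuming the summed log-probabilities of each complete response under the policy and the frozen reference model are already computed; the function names and the default `beta = 0.1` are illustrative choices, not taken from the article.

```python
import math

def neg_log_sigmoid(z: float) -> float:
    """Numerically stable -log(sigmoid(z)) = log(1 + exp(-z))."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))

def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one pair, from summed sequence log-probs (hypothetical helper)."""
    chosen_logratio = policy_chosen - ref_chosen        # log pi(y_w|x) / pi_ref(y_w|x)
    rejected_logratio = policy_rejected - ref_rejected  # log pi(y_l|x) / pi_ref(y_l|x)
    logits = beta * (chosen_logratio - rejected_logratio)
    return neg_log_sigmoid(logits)                      # -log sigmoid(beta * margin)

# Policy identical to the reference: margin is 0, so the loss is log 2.
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
```

When the policy still matches the reference the loss is log 2, and shifting probability toward y_w relative to y_l lowers it. In actual training the same expression would be written with tensors so that autograd can differentiate it with respect to the policy's parameters.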
I was recently trying to understand what DPO does under the hood, and I watched this video by @hkproj. Great work!
Also, just filling in for newbies like me:
- The maximization equation in the 3rd step of "Reformulating the RLHF Objective": we divide it by −β, and because of the minus sign, it becomes a minimization problem.
- In "Introducing the Partition Function", Z(x) is a normalization constant. I wasn't able to understand how the term Z(x) came into the picture and how it is substituted, so I asked ChatGPT and got this! It makes a little bit of sense, but I have not verified whether it is correct.
- There are some helpful steps in the "Mathematical Derivations" section of the DPO paper: https://arxiv.org/pdf/2305.18290
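The two steps in the list above (dividing by −β, then bringing in Z(x)) can be written out as follows, following the appendix derivation of the DPO paper, with r the reward, π_ref the reference policy, and β > 0 the KL coefficient. Dividing by −β flips the max into a min; Z(x) is defined only so that π* is a proper probability distribution, and substituting π_ref · exp(r/β) = Z(x) · π* turns the objective into a KL term plus −log Z(x).

```latex
\begin{aligned}
\max_{\pi}\; \mathbb{E}_{x\sim\mathcal{D},\,y\sim\pi(y\mid x)}\big[r(x,y)\big]
  - \beta\,\mathbb{D}_{\mathrm{KL}}\big[\pi(y\mid x)\,\|\,\pi_{\mathrm{ref}}(y\mid x)\big]
&= \max_{\pi}\; \mathbb{E}_{x,y}\Big[r(x,y) - \beta\log\frac{\pi(y\mid x)}{\pi_{\mathrm{ref}}(y\mid x)}\Big] \\
&= \min_{\pi}\; \mathbb{E}_{x,y}\Big[\log\frac{\pi(y\mid x)}{\pi_{\mathrm{ref}}(y\mid x)} - \frac{1}{\beta}\,r(x,y)\Big]
  \quad \text{(divide by } -\beta\text{)} \\[6pt]
Z(x) &= \sum_{y}\pi_{\mathrm{ref}}(y\mid x)\exp\!\Big(\frac{1}{\beta}\,r(x,y)\Big),
\qquad
\pi^{*}(y\mid x) = \frac{1}{Z(x)}\,\pi_{\mathrm{ref}}(y\mid x)\exp\!\Big(\frac{1}{\beta}\,r(x,y)\Big) \\[6pt]
\log\frac{\pi(y\mid x)}{\pi_{\mathrm{ref}}(y\mid x)} - \frac{1}{\beta}\,r(x,y)
&= \log\frac{\pi(y\mid x)}{\pi_{\mathrm{ref}}(y\mid x)\exp\!\big(\frac{1}{\beta}\,r(x,y)\big)}
 = \log\frac{\pi(y\mid x)}{\pi^{*}(y\mid x)} - \log Z(x) \\[6pt]
\min_{\pi}\; \mathbb{E}_{x,y}\Big[\log\frac{\pi(y\mid x)}{\pi_{\mathrm{ref}}(y\mid x)} - \frac{1}{\beta}\,r(x,y)\Big]
&= \min_{\pi}\; \mathbb{E}_{x}\Big[\mathbb{D}_{\mathrm{KL}}\big[\pi(\cdot\mid x)\,\|\,\pi^{*}(\cdot\mid x)\big] - \log Z(x)\Big]
\end{aligned}
```

Since Z(x) does not depend on π and the KL divergence is non-negative and zero only when the two distributions match, the minimum is reached at π = π*.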
Sadly, this interface doesn't let me select text and comment on it directly, the way the ChatGPT interface does. Maybe a future UI feature? :)
upvoted an article 13 days ago
Article: Simplifying Alignment: From RLHF to Direct Preference Optimization (DPO)
By • • 13
upvoted a paper 22 days ago
Notifications from parquet-converter
#1 opened over 1 year ago by parquet-converter