Saarthak2002
committed on
Upload 3 files
- analytics_vidhya_data.csv +24 -0
- app.py +105 -0
- requirements.txt +6 -0
analytics_vidhya_data.csv
ADDED
@@ -0,0 +1,24 @@
Course Title,Course Description ,Tags,Category,Level of Difficulty,Instructor Name,Duration,Rating,Certification,Prerequisite,Course url
Introduction to Generative AI,"This course provides a comprehensive understanding of generative AI, including text and image generation techniques. By the end, you will learn to use generative AI tools to create diverse content and implement these techniques in real-world projects through practical exercises","Generative AI, Text Generation, Image Generation, AI",AI & Machine Learning,Intermediate,N/A,N/A,N/A,Yes,N/A,https://courses.analyticsvidhya.com/courses/introduction-to-generative-ai
Introduction to AI & ML,"Artificial Intelligence (AI) and Machine Learning (ML) are disrupting industries globally. This course provides an overview of AI and ML concepts, current trends, and their impact on businesses. It helps learners understand how AI and ML work and why they are essential in today's competitive landscape.","AI, Machine Learning, Data Science, Industry Disruption",AI & Machine Learning,Beginner,Kunal Jain,2 Hours,4.8,Yes,Curiosity about AI & ML and a working Internet connection,https://courses.analyticsvidhya.com/courses/introduction-to-ai-ml
Introduction to Business Analytics,"Business analytics is in high demand as organizations are increasingly integrating data science solutions. This course covers the basics of business analytics, its applications, and tools. Learners will explore how business analytics solves business problems and how they can start their journey in this field.","Business Analytics, Data Science, Career",Business Analytics,Beginner,Kunal Jain,1 Hour,4.6,Yes,Curiosity about business analytics and a working Internet connection,https://courses.analyticsvidhya.com/courses/introduction-to-analytics
Getting Started with Large Language Model,This course covers the fundamentals of Large Language Models (LLMs) and natural language processing (NLP) applications using PyTorch. Learners will gain hands-on experience building and fine-tuning LLMs to generate human-like text. The course includes curated resources and exercises to help professionals and students become adept in NLP.,"NLP, PyTorch, LLMs, Generative AI",Large Language Models,Beginner,Kunal Jain,1.2 Hours,4.5,Yes,"Knowledge of deep learning, interest in NLP, and a working Internet connection",https://courses.analyticsvidhya.com/courses/getting-started-with-llms
Building LLM Applications using Prompt Engineering ,"This course will guide you through building your first Retrieval-Augmented Generation (RAG) system using LlamaIndex. You will start with data ingestion by loading a file into the system, followed by indexing the data for efficient retrieval. Next, you will set up retrieval configurations and use a response synthesizer to combine data into a coherent response. Finally, you will employ a query engine to generate responses. By the end of this course, you will have a solid understanding of these processes and be able to build an RAG system using LlamaIndex code effectively.","RAG, LlamaIndex, AI, Deep Learning",Data Science,Intermediate,"Dr. Prashant Sahu, Ph.D",1.1 Hours,4.8,Yes,"Knowledge of Deep Learning, LLM",https://courses.analyticsvidhya.com/courses/building-first-rag-systems-using-llamaindex
MidJourney: From Inspiration to Implementation ,"This course provides a practical understanding of MidJourney tools. By the end, you will be able to utilize MidJourney for creative projects and explore alternative tools. Learn how to draw inspiration, use MidJourney's features, and understand its applications.","MidJourney, Creative Projects, Visual Storytelling",Image Generation,Intermediate,Sandeep Singh ,33 mins,4.6,Yes,Basic knowledge of Stable Diffusion and Prompt Engineering,https://courses.analyticsvidhya.com/courses/midjourney_from_inspiration_to_implementation
Building Your first RAG System using LlamaIndex ,"This course will guide you through building your first Retrieval-Augmented Generation (RAG) system using LlamaIndex. It covers data ingestion, indexing, retrieval configurations, response synthesis, and query engine usage. By the end of the course, you will have a solid understanding of building RAG systems and applying them to real-world scenarios.","RAG System, LlamaIndex, Data Ingestion, NLP","AI/ML, Natural Language Processing",Intermediate,"Dr. Prashant Sahu, Ph.D",1.1 Hours,4.8,Yes,"Knowledge of Deep Learning, Understanding of Large Language Models",https://courses.analyticsvidhya.com/courses/building-first-rag-systems-using-llamaindex
Exploring Stability.AI ,"This course provides hands-on experience with Stability.AI tools. Learn to deploy SD WebUI, customize settings, and use Automatic1111 WebUI on RunPod GPU environments. Gain skills in installation, setup, generation, and fine-tuning of SD WebUI, enabling you to fully harness Stability.AI's capabilities for your projects.","Stability.AI, SD WebUI, Generative AI, Automatic1111","AI/ML, Generative AI",Beginner,Sandeep Singh ,1 Hour,4.5,Yes,"Basic ML, DL, Python knowledge, Understanding of RunPod Environment",https://courses.analyticsvidhya.com/courses/exploring-stability-ai
Introduction to Python,"This course is designed for beginners with no coding or Data Science background. Learn the fundamentals of Python for data science, covering libraries like Pandas for data analysis and Matplotlib for data visualization. By the end, you'll be equipped to start using Python for data analysis and machine learning.","Python, Data Science, Pandas, Numpy, Matplotlib, Seaborn",Data Science,Beginner,N/A,1 Hour,4.8,Yes,"No coding experience required, Basic understanding of Python",https://courses.analyticsvidhya.com/courses/introduction-to-data-science
The Working of Neural Networks ,"This course is a beginner's guide to understanding neural networks and their workings, including forward propagation, loss functions, optimization techniques, and backpropagation. By the end, learners will be able to build advanced deep learning models using the PyTorch framework. It is recommended to complete an advanced Machine Learning course before taking this.","Neural Networks, Deep Learning, PyTorch, AI, Machine Learning",Deep Learning,Beginner,N/A,N/A,N/A,No,"Advanced Machine Learning course suggested, basic deep learning",https://courses.analyticsvidhya.com/courses/The%20Working%20of%20Neural%20Networks
Understanding Linear Regression ,"This course covers the foundational concepts of linear regression, helping learners understand and build predictive models using a business case study. It covers the significance of slope and intercept, model building, and understanding using both descriptive and predictive approaches.","Linear Regression, Machine Learning, Predictive Modeling",Machine Learning,Beginner,N/A,N/A,N/A,No,Basic understanding of regression techniques,https://courses.analyticsvidhya.com/courses/free-understanding-linear-regression
Building a Text Classification Model with Natural Language Processing ,"This course covers the essentials of Natural Language Processing (NLP) using PyTorch. It provides hands-on experience in building text classification models, text preprocessing techniques, and practical applications of NLP in real-world scenarios. Suitable for professionals and beginners alike.","NLP, Text Classification, PyTorch, AI, Text Analysis",Natural Language Processing,Intermediate,Apoorv Vishnoi,70 mins,4.7,Yes,Basic understanding of Python and AI concepts,https://courses.analyticsvidhya.com/courses/free-building-textclassification-natural-language-processing
The A to Z of Unsupervised ML ,"This free course covers popular unsupervised machine learning models such as clustering algorithms and DBSCAN. You will apply these models to a real-world business problem and learn how to uncover hidden patterns in datasets without labels. Ideal for professionals and aspiring students aiming to expand their skill set in exploratory data analysis, dimensionality reduction, and insights discovery.","Unsupervised Learning, Clustering, DBSCAN, k-means, Machine Learning",Machine Learning,Intermediate,N/A,52 mins,4.6,Yes,Basic understanding of machine learning concepts,https://courses.analyticsvidhya.com/courses/free-unsupervised-ml-guide
Bagging and Boosting ML Algorithms ,"This free course provides a hands-on understanding of Bagging and Boosting techniques in machine learning. You'll learn to implement and tune ensemble methods like Random Forest, AdaBoost, and Gradient Boosting to enhance model performance and predictive accuracy. Perfect for professionals and aspiring students looking to apply advanced ML techniques to solve complex real-world business problems.","Bagging, Boosting, Random Forest, AdaBoost, Machine Learning",Machine Learning,Intermediate,N/A,N/A,N/A,No,"Basic ML knowledge (Regression, Decision Trees), Python",https://courses.analyticsvidhya.com/courses/bagging-boosting-ML-Algorithms
Data Preprocessing on a Real-World Problem Statement ,"This free course offers practical knowledge on data preprocessing techniques, preparing datasets for machine learning models. You'll learn to handle missing values, detect and treat outliers, and combine multiple data tables for efficient modeling. Ideal for beginners looking to step into Data Science and professionals seeking to improve data cleaning and preparation skills.","Data Preprocessing, EDA, Missing Values, Outlier Detection",Data Science,Beginner,N/A,N/A,N/A,No,"Basic ML theory, Python, Jupyter Notebook or any IDE",https://courses.analyticsvidhya.com/courses/data-preprocessing
Introduction to Business Analytics,"This course provides an introduction to Business Analytics, covering key topics like its growing popularity, applications, and how to start learning from scratch. It explores tools, techniques, and applications of Business Analytics to solve real-world problems across various industries and roles, making it an essential field for forward-thinking organizations.","Business Analytics, Data Science",Business Analytics,Beginner,Kunal Jain,1 Hour,4.6,Yes,None,https://courses.analyticsvidhya.com/courses/introduction-to-analytics
Microsoft Excel: Formulas & Functions,"This course provides an in-depth understanding of Microsoft Excel's vast array of formulas and functions. Covering everything from basic arithmetic to advanced LookUp functions (VLookUp, HLookUp), logical functions, and more, it is designed to help anyone master Excel for data analysis. Whether you're a beginner or looking to sharpen your skills, the course will equip you with tools essential for business analytics and real-world projects.","Microsoft Excel, Data Analysis, Formula",Data Analysis,Beginner,N/A,3 Hours,4.8,Yes,None,https://courses.analyticsvidhya.com/courses/microsoft-excel-formulas-functions
Tableau for Beginner,"This course introduces Tableau as the tool of choice for business intelligence, analytics, and data visualization. You'll learn to build impactful visualizations using Tableau's intuitive drag-and-drop interface. The course covers various charts (bar, line, pie) and introduces geospatial analysis with map visualizations. It is perfect for beginners looking to get started with data visualization and business intelligence.","Tableau, Data Visualization, Analytics",Business Intelligence,Beginner,N/A,15 mins,4.6,Yes,None,https://courses.analyticsvidhya.com/courses/tableau-for-beginners
Loan Prediction Practice Problem (Using Python),"This course is designed for people who want to solve binary classification problems using Python. By working on a real-life case study of Dream Housing Finance, you will learn how to automate the loan eligibility process based on customer details. The course covers data analysis, feature engineering, model building using logistic regression with stratified k-folds cross-validation, and evaluation metrics for classification problems.","Loan Prediction, Classification, Python, ML","Data Science, Machine Learning",Intermediate,N/A,N/A,4.7,Yes,Familiarity with Python,https://courses.analyticsvidhya.com/courses/loan-prediction-practice-problem-using-python
Twitter Sentiment Analysis,"This course focuses on sentiment analysis using Python, which is a technique to extract the attitude or opinion of a text and categorize it as positive, negative, or neutral. Sentiment analysis has applications in product feedback, social media analysis, and more. You will learn data cleaning, feature extraction, and model building using algorithms like Logistic Regression, SVM, RandomForest, and XGBoost. By the end of the course, you will have hands-on experience with Twitter sentiment analysis.","Sentiment Analysis, Twitter, Python, ML","Data Science, NLP",Intermediate,N/A,N/A,4.7,Yes,Familiarity with Python,https://courses.analyticsvidhya.com/courses/twitter-sentiment-analysis
Introduction to Web Scraping using Python,"This course covers the fundamentals of web scraping, showcasing how to perform it using Python libraries such as BeautifulSoup and Scrapy. It explains the need for web scraping in data science projects and provides hands-on experience in extracting data, images, and emails from websites. You will learn the components and procedures for web scraping, as well as how to handle page loads and work with popular Python libraries.","Web Scraping, Python, Data Science","Data Science, Web Scraping",Beginner,N/A,N/A,4.7,Yes,Basic Knowledge of Python,https://courses.analyticsvidhya.com/courses/introduction-to-web-scraping
Big Mart Sales Prediction Using R,"This course teaches the essentials of sales prediction, focusing on the Big Mart Sales Prediction Challenge. It covers skills and techniques required for solving regression problems using R. Participants will engage in hypothesis generation, univariate and bivariate analysis, feature engineering, and model building using methods like Linear Regression, Random Forest, and XGBoost.","Sales Prediction, R, Regression","Data Science, Machine Learning",Intermediate,N/A,N/A,4.6,Yes,Familiarity with R,https://courses.analyticsvidhya.com/courses/big-mart-sales-prediction-using-r
Time Series Forecasting using Python,"Learn time series analysis and forecasting using Python. This course covers ARIMA, Holt's Winter, and other methods for real-life industry use cases. Participants will develop skills in tuning parameters, handling seasonality, and evaluating models.","Time Series, ARIMA, Exponential Smoothing","Data Science, Machine Learning",Intermediate,N/A,N/A,4.6,Yes,Familiarity with Python,https://courses.analyticsvidhya.com/courses/creating-time-series-forecast-using-python
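
The rows above rely on quoted fields for values that contain commas, and on the literal placeholder `N/A` for missing values. A minimal sketch, using only the standard library, of how such rows parse and how the placeholders could be normalized before use. The helper name `load_courses` and the two-row sample (descriptions shortened) are illustrative assumptions, not part of the commit:

```python
import csv
import io

# Two rows abbreviated from analytics_vidhya_data.csv (descriptions shortened for the example)
SAMPLE = '''Course Title,Course Description ,Tags,Category,Level of Difficulty,Instructor Name,Duration,Rating,Certification,Prerequisite,Course url
Introduction to Generative AI,"Text and image generation techniques","Generative AI, Text Generation",AI & Machine Learning,Intermediate,N/A,N/A,N/A,Yes,N/A,https://courses.analyticsvidhya.com/courses/introduction-to-generative-ai
Introduction to AI & ML,"An overview of AI and ML concepts","AI, Machine Learning",AI & Machine Learning,Beginner,Kunal Jain,2 Hours,4.8,Yes,Curiosity about AI & ML and a working Internet connection,https://courses.analyticsvidhya.com/courses/introduction-to-ai-ml
'''

def load_courses(text):
    """Parse CSV text into dicts, stripping header whitespace and mapping 'N/A' to None."""
    reader = csv.DictReader(io.StringIO(text))
    courses = []
    for row in reader:
        cleaned = {}
        for key, value in row.items():
            value = value.strip()
            # Hypothetical normalization: treat the literal "N/A" as a missing value
            cleaned[key.strip()] = None if value == "N/A" else value
        courses.append(cleaned)
    return courses

courses = load_courses(SAMPLE)
print(courses[0]["Instructor Name"])  # missing instructor becomes None
print(courses[1]["Tags"])             # quoted field keeps its internal comma
```

Stripping the header names also removes the trailing space in `Course Description `, so columns can be addressed by name instead of by position.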
app.py
ADDED
@@ -0,0 +1,105 @@
import pandas as pd
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline
import torch
import gradio as gr

# Load the dataset (relative path: the CSV is uploaded next to app.py in this commit)
df = pd.read_csv("analytics_vidhya_data.csv", encoding='ISO-8859-1')

# Load the pre-trained model for embeddings (using SentenceTransformers)
model = SentenceTransformer('multi-qa-mpnet-base-dot-v1')

# Combine title, description, instructor, rating and category into one text per course.
# pandas reads "N/A" cells as NaN, so missing values are skipped, and each value is
# converted with str() individually (str(df['Rating']) would stringify the whole column).
parts = [df.iloc[:, 0], df.iloc[:, 1], df['Instructor Name'], df['Rating'], df['Category']]
df['full_text'] = pd.concat(parts, axis=1).apply(
    lambda row: " ".join(str(value) for value in row if pd.notna(value)), axis=1)

# Convert full course texts into embeddings
course_embeddings = model.encode(df['full_text'].tolist(), convert_to_tensor=True)

# Load the paraphrasing model once at startup rather than on every query
paraphraser = pipeline('text2text-generation', model='Vamsi/T5_Paraphrase_Paws')

# Function to expand the query using paraphrasing
def expand_query(query):
    expanded_queries = paraphraser(query, num_return_sequences=3, max_length=50, do_sample=True)
    return [q['generated_text'] for q in expanded_queries]

# Function to search for the most relevant courses
def search_courses(query, level_filter=None, category_filter=None, top_k=3):
    # Step 1: Expand the query using paraphrasing
    expanded_queries = expand_query(query)

    # Step 2: Score every course against each expanded query
    all_similarities = []
    for expanded_query in expanded_queries:
        # Convert each expanded query into an embedding
        query_embedding = model.encode(expanded_query, convert_to_tensor=True)

        # Compute cosine similarities between the query embedding and course embeddings
        similarities = util.pytorch_cos_sim(query_embedding, course_embeddings)[0]
        all_similarities.append(similarities)

    # Step 3: Keep, for each course, its maximum similarity across the expanded queries
    aggregated_similarities = torch.max(torch.stack(all_similarities), dim=0)[0]

    # Step 4: Apply filters
    filtered_df = df.copy()
    if level_filter:
        filtered_df = filtered_df[filtered_df['Level of Difficulty'] == level_filter]
    if category_filter:
        # Substring match: CSV categories such as "AI & Machine Learning" or
        # "Data Science, NLP" never equal a dropdown choice like "AI" or "NLP" exactly
        filtered_df = filtered_df[filtered_df['Category'].str.contains(category_filter, case=False, regex=False, na=False)]

    if filtered_df.empty:
        return "<p>No matching courses found.</p>"

    # Select the similarities of the rows that survived the filters
    filtered_similarities = aggregated_similarities[filtered_df.index.to_list()]

    # Step 5: Get top_k most similar courses
    top_results = filtered_similarities.topk(k=min(top_k, len(filtered_similarities)))

    # Prepare the output as clickable links
    results = []
    for idx in top_results.indices:
        # topk indices are positions within filtered_df, hence iloc
        idx = int(idx)
        course_title = filtered_df.iloc[idx]['Course Title']
        course_description = filtered_df.iloc[idx, 1]
        # Look the URL up by name: the last column is now 'full_text', not the URL
        course_url = filtered_df.iloc[idx]['Course url']

        # Format the result as a clickable hyperlink using raw HTML
        course_link = f'<a href="{course_url}" target="_blank">{course_title}</a>'
        results.append(f"<strong>{course_link}</strong><br>{course_description}<br><br>")

    # Combine all results into an HTML formatted list
    return "<ol>" + "".join([f"<li>{result}</li>" for result in results]) + "</ol>"

# Create Gradio UI
def create_gradio_interface():
    with gr.Blocks() as demo:
        gr.Markdown("# 📚 Analytics Vidhya Free Courses")
        gr.Markdown("Enter your query and use filters to narrow down the search.")

        # Input elements
        query = gr.Textbox(label="🔍 Search for a course", placeholder="Enter course topic or description")

        # Filters (in a collapsible form)
        with gr.Accordion("🔍 Filters", open=False):
            level_filter = gr.Dropdown(choices=["Beginner", "Intermediate", "Advanced"], label="📚 Course Level", multiselect=False)
            category_filter = gr.Dropdown(choices=["Data Science", "Machine Learning", "Deep Learning", "AI", "NLP"], label="📂 Category", multiselect=False)

        # Search button
        search_button = gr.Button("Search")

        # Output HTML for displaying results
        output = gr.HTML(label="Search Results")

        # On button click, trigger the search function
        search_button.click(fn=search_courses, inputs=[query, level_filter, category_filter], outputs=output)

    return demo

# Launch Gradio interface
demo = create_gradio_interface()
demo.launch(share=True, debug=True)
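
The ranking logic in `search_courses` (take each course's best score across the paraphrased queries, apply the filters, then keep the top `k`) can be illustrated without loading any model. A minimal sketch with plain Python lists; the function name `rank_courses` and the similarity scores are made up for illustration and do not come from the app:

```python
def rank_courses(similarity_rows, keep_mask, top_k=3):
    """Return indices of the top_k kept courses, ranked by max similarity.

    similarity_rows: one list of scores per expanded query (same length each).
    keep_mask: one boolean per course, True if it passed the filters.
    """
    # Column-wise maximum: best score of each course over all queries
    aggregated = [max(column) for column in zip(*similarity_rows)]

    # Keep only courses that passed the filters, remembering original indices
    candidates = [(score, idx) for idx, score in enumerate(aggregated) if keep_mask[idx]]

    # Highest score first; ties broken by lower original index
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in candidates[:top_k]]

# Hypothetical scores: 3 paraphrased queries x 4 courses
scores = [
    [0.10, 0.80, 0.30, 0.55],
    [0.20, 0.60, 0.90, 0.50],
    [0.15, 0.70, 0.40, 0.45],
]
passed_filters = [True, True, False, True]

print(rank_courses(scores, passed_filters, top_k=2))  # prints [1, 3]
```

Course 2 has the highest aggregated score (0.90) but is excluded by the filter, so courses 1 and 3 are returned. This sketch returns indices into the unfiltered list, whereas the app indexes positionally into the filtered frame; either works as long as the lookup matches the indexing scheme.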
requirements.txt
ADDED
@@ -0,0 +1,6 @@
pandas
sentence-transformers
torch
transformers
gradio
sentencepiece