---
title: 🎹🥁🎸DeepResearchEvaluator
emoji: 🎹🥁🎸
colorFrom: red
colorTo: purple
sdk: streamlit
sdk_version: 1.41.1
app_file: app.py
pinned: true
license: mit
short_description: Deep Research Evaluator for Long Horizon Learning Tasks
---

# 🎵🎶🎸🎹🎺🎷🥁🎻

A Deep Research Evaluator is a conceptual AI system designed to analyze and synthesize information from extensive research literature, such as arXiv papers, to learn about specific topics and generate code applicable to long-horizon tasks in AI. This involves understanding complex subjects, identifying relevant methodologies, and implementing solutions that require planning and execution over extended sequences.

## Key Topics and Related Papers

### Long-Horizon Task Planning in Robotics

- **MLDT: Multi-Level Decomposition for Complex Long-Horizon Robotic Task Planning with Open-Source Large Language Model** (arXiv)
  - Authors: Yike Wu, Jiatao Zhang, Nan Hu, LanLing Tang, Guilin Qi, Jun Shao, Jie Ren, Wei Song
  - This paper introduces a method that decomposes complex tasks at multiple levels to strengthen the planning capabilities of open-source large language models.

- **ISR-LLM: Iterative Self-Refined Large Language Model for Long-Horizon Sequential Task Planning** (arXiv)
  - Authors: Zhehua Zhou, Jiayang Song, Kunpeng Yao, Zhan Shu, Lei Ma
  - The study presents a framework that improves LLM-based planning through an iterative self-refinement process, increasing the feasibility and correctness of the generated task plans.

### Skill-Based Reinforcement Learning

- **Skill Reinforcement Learning and Planning for Open-World Long-Horizon Tasks** (arXiv)
  - Authors: Haoqi Yuan, Chi Zhang, Hongcheng Wang, Feiyang Xie, Penglin Cai, Hao Dong, Zongqing Lu
  - This research builds multi-task agents for open-world environments by learning basic skills and then planning over those skills to accomplish long-horizon tasks efficiently.

- **SkillTree: Explainable Skill-Based Deep Reinforcement Learning for Long-Horizon Control Tasks** (arXiv)
  - Authors: Yongyan Wen, Siyuan Li, Rongchang Zuo, Lei Yuan, Hangyu Mao, Peng Liu
  - The paper proposes a framework that integrates a differentiable decision tree into the high-level policy to generate skill embeddings, making decision-making in complex tasks more explainable.

### Neuro-Symbolic Approaches

- **Learning for Long-Horizon Planning via Neuro-Symbolic Abductive Imitation** (arXiv)
  - Authors: Jie-Jing Shao, Hao-Ran Hao, Xiao-Wen Yang, Yu-Feng Li
  - This work introduces a framework that combines data-driven learning with symbolic reasoning to enable long-horizon planning through abductive imitation learning.

- **CaStL: Constraints as Specifications through LLM Translation for Long-Horizon Task and Motion Planning** (arXiv)
  - Authors: [Authors not specified]
  - The study presents a method that uses large language models to translate constraints into formal specifications, facilitating long-horizon task and motion planning.

### Evaluation Frameworks for AI Models

- **ASI: Accuracy-Stability Index for Evaluating Deep Learning Models** (arXiv)
  - Authors: Wei Dai, Daniel Berleant
  - The paper introduces the Accuracy-Stability Index (ASI), a quantitative measure that combines accuracy and stability for assessing deep learning models.

- **Benchmarks for Deep Off-Policy Evaluation** (arXiv)
  - Authors: Justin Fu, Mohammad Norouzi, Ofir Nachum, George Tucker, Ziyu Wang, Alexander Novikov, Mengjiao Yang, Michael R. Zhang, Yutian Chen, Aviral Kumar, Cosmin Paduraru, Sergey Levine, Tom Le Paine
  - This research provides a collection of policies that, together with existing offline datasets, can be used to benchmark off-policy evaluation methods in deep reinforcement learning.

These topics and papers contribute to the development of AI systems capable of understanding research literature and applying the acquired knowledge to complex, long-horizon tasks, thereby advancing the field of artificial intelligence.

---


## Features

🎯 Core Configuration & Setup

Configures Streamlit page with title "🚲BikeAI🏆 Claude/GPT Research"


🔑 API Setup & Clients

Initializes OpenAI, Anthropic, and HuggingFace API clients configured from environment variables


📝 Session State Management

Manages conversation history, transcripts, file editing states, and model selections


🧠 get_high_info_terms()

Extracts meaningful keywords from text while filtering common stop words
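The README describes `get_high_info_terms()` in a single line. A minimal sketch of stop-word-filtered keyword extraction follows; it assumes a simple frequency ranking and a small hand-picked stop-word list, both hypothetical and not taken from the app itself.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the app's actual list is not documented.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "with",
}


def get_high_info_terms(text: str, top_n: int = 5) -> list[str]:
    """Return up to top_n frequent non-stop-word terms, most frequent first."""
    # Lowercase and split into alphanumeric words, keeping inner hyphens
    # so terms like "long-horizon" survive as one token.
    words = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    # Drop stop words and very short tokens, which carry little information.
    candidates = [w for w in words if w not in STOP_WORDS and len(w) > 2]
    # most_common keeps first-occurrence order among equally frequent terms.
    return [term for term, _ in Counter(candidates).most_common(top_n)]
```

For example, `get_high_info_terms("The planner decomposes long-horizon tasks; the planner refines tasks.", top_n=3)` ranks `planner` and `tasks` first because each appears twice.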


🏷️ clean_text_for_filename()

Sanitizes text to create valid filenames by removing special characters
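One plausible way to sanitize text into a filename is sketched below. The lowercase, underscore-separated output and the 50-character limit are assumptions for illustration, not the app's documented behavior.

```python
import re


def clean_text_for_filename(text: str, max_length: int = 50) -> str:
    """Turn arbitrary text into a safe, lowercase, underscore-separated name."""
    text = text.lower()
    # Replace anything other than ASCII letters, digits, whitespace, or hyphens.
    text = re.sub(r"[^a-z0-9\s-]", " ", text)
    # Collapse runs of whitespace and hyphens into single underscores.
    text = re.sub(r"[\s-]+", "_", text).strip("_")
    # Truncate, then trim again in case the cut landed on an underscore.
    return text[:max_length].rstrip("_")
```

With these rules, `"Deep Research: Long-Horizon Tasks!"` becomes `deep_research_long_horizon_tasks`.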


📄 generate_filename()

Creates intelligent filenames based on content and timestamps
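A filename built from content and a timestamp can be sketched as below. The `YYMMDD_HHMM` prefix, the five-word slug, and the `now` parameter (added so the output is reproducible) are hypothetical choices, not the app's confirmed scheme.

```python
import re
from datetime import datetime
from typing import Optional


def generate_filename(text: str, ext: str = "md",
                      now: Optional[datetime] = None, max_words: int = 5) -> str:
    """Build '<YYMMDD_HHMM>_<first words of text>.<ext>' (hypothetical format)."""
    # Timestamp prefix makes names sort chronologically in a file listing.
    stamp = (now or datetime.now()).strftime("%y%m%d_%H%M")
    # Content suffix: the first few lowercase alphanumeric words of the text.
    words = re.findall(r"[a-z0-9]+", text.lower())[:max_words]
    slug = "_".join(words) or "untitled"
    return f"{stamp}_{slug}.{ext}"
```

Putting the timestamp first is a deliberate choice: a plain alphabetical sort of the directory then doubles as a chronological sort.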


💾 create_file()

Saves prompt and response content to files with smart naming
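A minimal sketch of saving a prompt/response pair follows. It assumes a caller-supplied filename (for example, one produced by the app's `generate_filename()`) and a hypothetical two-section Markdown layout.

```python
from pathlib import Path


def create_file(directory: str, filename: str, prompt: str, response: str) -> Path:
    """Write prompt and response into one Markdown file and return its path."""
    path = Path(directory) / filename
    # Create the target directory on first use.
    path.parent.mkdir(parents=True, exist_ok=True)
    # Hypothetical layout: one heading per section.
    content = f"# Prompt\n\n{prompt}\n\n# Response\n\n{response}\n"
    path.write_text(content, encoding="utf-8")
    return path
```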


🔗 get_download_link()

Generates base64-encoded download links for files
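A base64 download link embeds the file's bytes directly in a `data:` URI, so no separate file server is needed. The sketch below assumes a Markdown MIME type by default. In Streamlit, such raw HTML is typically rendered with `st.markdown(link, unsafe_allow_html=True)`, although the app's exact call is not shown in this README.

```python
import base64
import html
from pathlib import Path


def get_download_link(path: str, mime: str = "text/markdown") -> str:
    """Return an HTML <a> tag that embeds the file as a base64 data URI."""
    p = Path(path)
    # Encode the raw bytes so binary files (zip, mp3) work as well as text.
    b64 = base64.b64encode(p.read_bytes()).decode("ascii")
    # Escape the name because it is placed inside HTML attributes and text.
    name = html.escape(p.name, quote=True)
    return f'<a href="data:{mime};base64,{b64}" download="{name}">Download {name}</a>'
```

Because the whole file ends up in the page, this approach suits small files. Large audio or zip files inflate the HTML by roughly a third due to base64 overhead.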


🎤 clean_for_speech()

Prepares text for speech synthesis by removing special characters
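Text headed for a speech engine usually needs Markdown syntax and URLs removed so they are not read aloud. The sketch below assumes that is what "special characters" means here; the exact character set is a hypothetical choice.

```python
import re


def clean_for_speech(text: str) -> str:
    """Strip Markdown and URL noise so a TTS engine reads only the words."""
    # Replace [label](url) with just the label. This runs before URL removal
    # so the link syntax is still intact when matched.
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)
    # Drop bare URLs entirely.
    text = re.sub(r"https?://\S+", "", text)
    # Remove common Markdown symbols (headings, emphasis, code, quotes, tables).
    text = re.sub(r"[#*_`>|~]", "", text)
    # Collapse newlines and repeated spaces into single spaces.
    text = re.sub(r"\s+", " ", text)
    return text.strip()
```

For instance, `"## Title\n**Bold** and [link](https://example.org/x)"` becomes `"Title Bold and link"`.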


🗣️ speech_synthesis_html()


Creates HTML for browser-based speech synthesis
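Browser-based speech synthesis relies on the Web Speech API (`SpeechSynthesisUtterance` and `window.speechSynthesis.speak`). The sketch below shows one way to generate that HTML from Python. The `rate` parameter is a hypothetical addition. Rendering the snippet, for example with `streamlit.components.v1.html(...)`, is likewise an assumption about how the app embeds it.

```python
import json


def speech_synthesis_html(text: str, rate: float = 1.0) -> str:
    """Return an HTML snippet that speaks `text` via the browser Web Speech API."""
    # json.dumps produces a valid JavaScript string literal; escaping "</"
    # stops a "</script>" inside the text from closing the tag early.
    js_text = json.dumps(text).replace("</", "<\\/")
    return f"""<script>
const utterance = new SpeechSynthesisUtterance({js_text});
utterance.rate = {float(rate)};
window.speechSynthesis.speak(utterance);
</script>"""
```

Serializing with `json.dumps` avoids hand-rolled quote escaping, which is the usual source of broken scripts when the text contains quotes or newlines.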


🔊 edge_tts_generate_audio()


Generates MP3 audio files using Edge TTS


🎵 speak_with_edge_tts()


Wrapper for Edge TTS audio generation


🎧 play_and_download_audio()


Creates audio player interface with download option


📸 process_image()


Analyzes images using GPT-4V


🎙️ process_audio()


Transcribes audio using Whisper


🎥 process_video()


Extracts frames from video files


🤖 process_video_with_gpt()


Analyzes video frames using GPT-4V


📚 parse_arxiv_refs()


Parses research paper references into structured format
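The README does not specify the reference format that `parse_arxiv_refs()` consumes. The sketch below assumes, hypothetically, Markdown links to arXiv abstract or PDF pages. It turns them into records with a title, an arXiv ID, and a URL, and keeps only the first link to each paper.

```python
import re
from typing import List, TypedDict


class ArxivRef(TypedDict):
    title: str
    arxiv_id: str
    url: str


# Hypothetical input format: [Title](https://arxiv.org/abs/YYMM.NNNNN), with an
# optional "vN" version and ".pdf" suffix. Old-style IDs (e.g. cs/...) are not matched.
_ARXIV_LINK = re.compile(
    r"\[([^\]]+)\]\((https?://arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})(?:v\d+)?(?:\.pdf)?)\)"
)


def parse_arxiv_refs(text: str) -> List[ArxivRef]:
    """Return one record per distinct arXiv ID, in order of first appearance."""
    refs: List[ArxivRef] = []
    seen = set()
    for title, url, arxiv_id in _ARXIV_LINK.findall(text):
        if arxiv_id in seen:  # skip repeat links to the same paper
            continue
        seen.add(arxiv_id)
        refs.append(ArxivRef(title=title.strip(), arxiv_id=arxiv_id, url=url))
    return refs
```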


🔍 perform_ai_lookup()


Searches and processes arXiv papers with audio summaries


📁 create_zip_of_files()


Bundles multiple files into a zip with smart naming
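Bundling files is a direct use of Python's standard `zipfile` module. In this minimal sketch, the caller chooses the archive name, and the base names of the input files are assumed to be unique.

```python
import zipfile
from pathlib import Path


def create_zip_of_files(paths: list[str], zip_path: str) -> str:
    """Bundle the given files into one compressed zip archive; return its path."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in paths:
            # Store only the base name so the archive contains no absolute paths.
            zf.write(p, arcname=Path(p).name)
    return zip_path
```

Storing base names keeps the archive portable. The trade-off is that two inputs with the same name in different directories would collide, which is why unique names are assumed.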


📂 load_files_for_sidebar()


Organizes files by timestamp for sidebar display
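Grouping files by timestamp for a sidebar can be sketched as follows. This version assumes grouping by local modification date, with the newest day and newest file first. The app might instead group by the timestamp prefix in its filenames.

```python
from collections import defaultdict
from datetime import datetime
from pathlib import Path


def load_files_for_sidebar(directory: str, pattern: str = "*.md") -> dict[str, list[Path]]:
    """Group matching files by modification date (YYYY-MM-DD), newest first."""
    # Sort newest first so both the day groups and the files inside them
    # come out in reverse chronological order (dicts keep insertion order).
    files = sorted(Path(directory).glob(pattern),
                   key=lambda p: p.stat().st_mtime, reverse=True)
    groups: dict[str, list[Path]] = defaultdict(list)
    for f in files:
        day = datetime.fromtimestamp(f.stat().st_mtime).strftime("%Y-%m-%d")
        groups[day].append(f)
    return dict(groups)
```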


🏷️ extract_keywords_from_md()


Pulls keywords from markdown files for organization
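One way to pull keywords from a Markdown file is to harvest words from its headings and bold spans, since authors tend to put key terms there. This is an assumed heuristic for illustration only; the README does not say which parts of the file `extract_keywords_from_md()` reads.

```python
import re
from pathlib import Path

# Hypothetical stop-word list used only by this sketch.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def extract_keywords_from_md(path: str, limit: int = 5) -> list[str]:
    """Collect unique lowercase words from '#' headings and **bold** spans."""
    text = Path(path).read_text(encoding="utf-8")
    # Heading text after 1-6 '#' characters, one match per line.
    headings = re.findall(r"^#{1,6}[ \t]+(.+)$", text, flags=re.MULTILINE)
    # Shortest spans between pairs of '**'.
    bold = re.findall(r"\*\*(.+?)\*\*", text)
    keywords: list[str] = []
    for phrase in headings + bold:
        for word in re.findall(r"[a-z0-9][a-z0-9-]*", phrase.lower()):
            if word not in STOP_WORDS and word not in keywords:
                keywords.append(word)
    return keywords[:limit]
```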


📊 display_file_manager_sidebar()


Creates interactive sidebar for file management


🎬 main()


Orchestrates overall application flow and UI components