Line breaks
common.py CHANGED
@@ -35,6 +35,7 @@ Score:
 A score of 1 means that the response's answer meets all of the evaluation criteria.
 A score of 0 means that the response's answer does not meet all of the evaluation criteria.

+Here is the data:
 [BEGIN DATA]
 ***
 [User Query]: {{input}}
@@ -71,11 +72,11 @@ POLICY_CONTENT = """
 # About Atla

 Atla is an applied research organization that trains models as evaluators to capture human preferences. We're a team of researchers, engineers, and operational leaders, with experience spanning a variety of disciplines, all working together to build reliable and understandable AI systems. Our research is informed by our experiences conducting AI safety research at the UK AI Task Force, OpenAI and the Stanford Existential Risks Initiative.
-
+<br><br>
 # Our Mission

 By creating advanced evaluation models, we enable AI developers to identify and fix risks, leading to safer, more reliable AI that can be trusted and widely used. Our aim is to surpass the current state-of-the-art evaluation methods by training models specifically for evaluation. AIs will probably become very powerful, and perform tasks that are difficult for us to verify. We want to enable humans to oversee AI systems that are solving tasks too difficult for humans to evaluate. We have written more about [our approach to scalable oversight](https://www.atla-ai.com/post/scaling-alignment) on our blog.
-
+<br><br>
 # Judge Arena Policy

 ## Overview
@@ -120,7 +121,7 @@ Judge Arena is specifically designed to assess AI models that function as evalua

 - **Ongoing Revisions**: This policy may be updated to reflect changes in our practices or in response to community feedback.
 - **Notification of Changes**: Policy changes will be communicated to users and stakeholders on this page.
-
+<br><br>
 # FAQ

 **Isn't this the same as Chatbot Arena?**
@@ -138,6 +139,6 @@ Judge Arena is specifically designed to assess AI models that function as evalua
 \n\n**What is Atla working on?**

 - We are training a general-purpose evaluator that you will soon be able to run in this Judge Arena. Our next step will be to open-source a powerful model that the community can use to run fast and accurate evaluations.
-
+<br><br>
 ## Get in touch
 Feel free to email us at [[email protected]](mailto:[email protected]) or leave feedback on our [Github](https://github.com/atla-ai/judge-arena)!"""
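The first hunk adds a "Here is the data:" line to the judge prompt template, which uses `{{input}}`-style placeholders. As a rough illustration of how such a template might be filled before being sent to a judge model, here is a minimal sketch; the `fill_template` helper and the example query are hypothetical, and only the rubric lines, the new "Here is the data:" line, and the `{{input}}` placeholder come from the diff above.

```python
# Hypothetical sketch of filling a {{placeholder}}-style judge prompt.
# Only the rubric text and the {{input}} placeholder come from the hunk above;
# the helper and example query are illustrative.
PROMPT_FRAGMENT = (
    "A score of 1 means that the response's answer meets all of the evaluation criteria.\n"
    "A score of 0 means that the response's answer does not meet all of the evaluation criteria.\n"
    "\n"
    "Here is the data:\n"
    "[BEGIN DATA]\n"
    "***\n"
    "[User Query]: {{input}}\n"
)


def fill_template(template: str, **values: str) -> str:
    """Substitute {{name}} placeholders with the supplied values."""
    for name, value in values.items():
        template = template.replace("{{" + name + "}}", value)
    return template


if __name__ == "__main__":
    print(fill_template(PROMPT_FRAGMENT, input="Summarise the policy in two sentences."))
```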
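The remaining hunks swap the blank lines between `POLICY_CONTENT` sections for explicit `<br><br>` tags, so the spacing before each heading survives rendering. A minimal sketch of how that string might be displayed, assuming the Space shows it through Gradio's Markdown component; the `Blocks` wiring below is illustrative, only `POLICY_CONTENT` and the module name `common` come from the diff.

```python
# Minimal sketch, assuming Judge Arena displays POLICY_CONTENT via Gradio's
# Markdown component; the Blocks layout here is illustrative, not the app's
# actual wiring. The explicit <br><br> tags keep a visible gap before each
# heading where plain blank lines can be collapsed by the renderer.
import gradio as gr

from common import POLICY_CONTENT  # string defined in the file diffed above

with gr.Blocks() as demo:
    gr.Markdown(POLICY_CONTENT)

if __name__ == "__main__":
    demo.launch()
```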