Adding the Open Portuguese LLM Leaderboard Evaluation Results

fec851d verified 3 months ago

5.38 kB

	---
	language:
	- en
	license: mit
	tags:
	- ORPO
	datasets:
	- mlabonne/orpo-dpo-mix-40k
	model-index:
	- name: Barcenas-14b-Phi-3-medium-ORPO
	results:
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: ENEM Challenge (No Images)
	type: eduagarcia/enem_challenge
	split: train
	args:
	num_few_shot: 3
	metrics:
	- type: acc
	value: 73.2
	name: accuracy
	source:
	url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Danielbrdz/Barcenas-14b-Phi-3-medium-ORPO
	name: Open Portuguese LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: BLUEX (No Images)
	type: eduagarcia-temp/BLUEX_without_images
	split: train
	args:
	num_few_shot: 3
	metrics:
	- type: acc
	value: 65.79
	name: accuracy
	source:
	url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Danielbrdz/Barcenas-14b-Phi-3-medium-ORPO
	name: Open Portuguese LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: OAB Exams
	type: eduagarcia/oab_exams
	split: train
	args:
	num_few_shot: 3
	metrics:
	- type: acc
	value: 51.03
	name: accuracy
	source:
	url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Danielbrdz/Barcenas-14b-Phi-3-medium-ORPO
	name: Open Portuguese LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: Assin2 RTE
	type: assin2
	split: test
	args:
	num_few_shot: 15
	metrics:
	- type: f1_macro
	value: 92.6
	name: f1-macro
	source:
	url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Danielbrdz/Barcenas-14b-Phi-3-medium-ORPO
	name: Open Portuguese LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: Assin2 STS
	type: eduagarcia/portuguese_benchmark
	split: test
	args:
	num_few_shot: 15
	metrics:
	- type: pearson
	value: 71.45
	name: pearson
	source:
	url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Danielbrdz/Barcenas-14b-Phi-3-medium-ORPO
	name: Open Portuguese LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: FaQuAD NLI
	type: ruanchaves/faquad-nli
	split: test
	args:
	num_few_shot: 15
	metrics:
	- type: f1_macro
	value: 69.06
	name: f1-macro
	source:
	url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Danielbrdz/Barcenas-14b-Phi-3-medium-ORPO
	name: Open Portuguese LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: HateBR Binary
	type: ruanchaves/hatebr
	split: test
	args:
	num_few_shot: 25
	metrics:
	- type: f1_macro
	value: 84.6
	name: f1-macro
	source:
	url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Danielbrdz/Barcenas-14b-Phi-3-medium-ORPO
	name: Open Portuguese LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: PT Hate Speech Binary
	type: hate_speech_portuguese
	split: test
	args:
	num_few_shot: 25
	metrics:
	- type: f1_macro
	value: 73.55
	name: f1-macro
	source:
	url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Danielbrdz/Barcenas-14b-Phi-3-medium-ORPO
	name: Open Portuguese LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: tweetSentBR
	type: eduagarcia/tweetsentbr_fewshot
	split: test
	args:
	num_few_shot: 25
	metrics:
	- type: f1_macro
	value: 67.01
	name: f1-macro
	source:
	url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Danielbrdz/Barcenas-14b-Phi-3-medium-ORPO
	name: Open Portuguese LLM Leaderboard
	---
	Barcenas-14b-Phi-3-medium-ORPO

	Model trained with the innovative ORPO method, based on the robust VAGOsolutions/SauerkrautLM-Phi-3-medium.

	The model was trained with the dataset: mlabonne/orpo-dpo-mix-40k, which combines diverse data sources to enhance conversational capabilities and contextual understanding.

	Made with ❤️ in Guadalupe, Nuevo Leon, Mexico 🇲🇽


	# Open Portuguese LLM Leaderboard Evaluation Results

	Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/Danielbrdz/Barcenas-14b-Phi-3-medium-ORPO) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)

	\| Metric \| Value \|
	\|--------------------------\|---------\|
	\|Average \|72.03\|
	\|ENEM Challenge (No Images)\| 73.20\|
	\|BLUEX (No Images) \| 65.79\|
	\|OAB Exams \| 51.03\|
	\|Assin2 RTE \| 92.60\|
	\|Assin2 STS \| 71.45\|
	\|FaQuAD NLI \| 69.06\|
	\|HateBR Binary \| 84.60\|
	\|PT Hate Speech Binary \| 73.55\|
	\|tweetSentBR \| 67.01\|