Mistral-Instruct can't count characters

#79
by KeithCu - opened

I've been playing around with Mistral, and I think it would be an awesome LLM for LibreOffice, for everything from grammar checking to fixing spreadsheet errors, formatting a resume to look more professional, and even acting as a research assistant.

However, it fails at a few easy commands. Here are my prompts and results.

I want to check your ability to count characters. Given the following sentence "She go to the store." Assuming the first character is index 0, what is the character index and length of the word "go" in the sentence?

Mistral-Instruct: The character index of the word "go" in the sentence "She goes to the store" would be from index 12 to index 15 (inclusive). This means that there are a total of 4 characters in the word "go".
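For comparison, the expected answer is trivial to compute outside the model. (Mistral also misquoted the sentence as "She goes to the store".) This is a minimal Python sketch, not from the original post. It uses `str.find`, which returns the 0-based index of the first occurrence, or -1 if the word is absent:

```python
sentence = "She go to the store."
word = "go"

start = sentence.find(word)  # 0-based index of the first occurrence
length = len(word)           # number of characters in the word

print(start, length)  # 4 2
```

So the correct answer is index 4, length 2, not "12 to 15" and 4 characters.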

ParisNeo created a dataset for this: https://huggingface.co/datasets/ParisNeo/Word_in_Sentence_Database

And for grammar checking:

Please perform a grammar check on the sentence at the end and provide a list of errors.
For each error, describe the problem, suggest a fix, and include the character index of the start and length of the mistaken letters suggested to be replaced with the fix.
Make sure the fix is minimal (the fewest letters that need to be changed) and if fixing one error makes another go away, don’t include the subsequent error.
Put into a JSON format so it can be easily loaded and used by others.
Sentence for Grammar Check: "She go to the store."

Mistral-Instruct:
{
  "Errors": [
    {
      "Problem": "Subject-verb agreement",
      "Fix": "goes",
      "StartIndex": 0,
      "Length": 3
    },
    {
      "Problem": "Missing article before noun",
      "Fix": "a",
      "StartIndex": 4,
      "Length": 1
    }
  ],
  "CorrectedSentence": "She goes to the store."
}
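One quick way to see what is wrong with this response is to slice the sentence using the reported StartIndex and Length values. The rough Python sketch below pastes the model's JSON in as a string. The first "error" points at "She" rather than "go", and the second ("Missing article") is not a real error at all; it points at the "g" of "go":

```python
import json

sentence = "She go to the store."

# The model's response from above, pasted in verbatim
response = """
{
  "Errors": [
    {"Problem": "Subject-verb agreement", "Fix": "goes", "StartIndex": 0, "Length": 3},
    {"Problem": "Missing article before noun", "Fix": "a", "StartIndex": 4, "Length": 1}
  ],
  "CorrectedSentence": "She goes to the store."
}
"""

flagged = []
for err in json.loads(response)["Errors"]:
    start, length = err["StartIndex"], err["Length"]
    span = sentence[start:start + length]  # the text the model says should be replaced
    flagged.append(span)
    print(f'{err["Problem"]}: replace {span!r} with {err["Fix"]!r}')
```

A correct answer for the subject-verb error would be StartIndex 4 and Length 2, replacing "go" with "goes".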

I'm not an expert, nor am I associated with Mistral :D But I don't think Mistral (or GPT-4, or GPT-3.5) could handle this prompt the way you are describing it, since it involves counting characters, and LLMs infamously do not understand characters well at all. Their view of the world is tokens.

I just asked ChatGPT 3.5:

reverse the string "Even gpt 3.5 can't reverse strings very well"

with the response:

Sure, here is the reversed string:

"llew yrev sgnirver t'nac 5.3 tpg nevE"

I asked ChatGPT 4, and amazingly it got it right. For a second I thought they had somehow upgraded ChatGPT to work on a character-by-character basis. But then I saw the little "analysis" button, clicked it, and realized that ChatGPT had decided to execute Python to reverse the string :D
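The thread doesn't show exactly what code ChatGPT 4 ran. In Python, though, a slice with a step of -1 is the usual one-liner for reversing a string. Comparing its output with the GPT 3.5 answer above shows that the model skipped the word "reverse" and garbled "strings":

```python
text = "Even gpt 3.5 can't reverse strings very well"
gpt35_answer = "llew yrev sgnirver t'nac 5.3 tpg nevE"

reversed_text = text[::-1]  # a step of -1 walks the string from the end to the start
print(reversed_text)                  # llew yrev sgnirts esrever t'nac 5.3 tpg nevE
print(reversed_text == gpt35_answer)  # False
```

Python reverses by code point, which is fine for plain ASCII like this. Text with combining accents or emoji would need more care.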

GPT 3.5, GPT 4, and Claude (usually) handle my prompt without any problems. However, nearly every other model screws it up! It's like I found a bug in the entire LLM world ;-)

They definitely understand characters from all their programming experience, but I think this sort of task is outside their training. Usually the request is to write code that calculates where a word is in a sentence, but in this case, when the model changes part of a sentence, I need it to tell me where.

It's possible to get LLMs to use Python to calculate word indexes, but if the same word appears multiple times in a sentence, it could be more confusing to figure out which occurrence the model is talking about.
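To illustrate the ambiguity, here is a minimal sketch that returns every (start, length) pair for a whole word, using Python's `re.finditer` with `\b` word boundaries. The helper name `find_word_spans` and the second example sentence are made up for illustration. When a word appears more than once, the app still needs some extra information to know which occurrence the model meant:

```python
import re


def find_word_spans(sentence: str, word: str) -> list[tuple[int, int]]:
    """Return (start index, length) for every whole-word occurrence of `word`."""
    # \b on both sides avoids matching inside longer words such as "goes" or "ago"
    pattern = r"\b" + re.escape(word) + r"\b"
    return [(m.start(), m.end() - m.start()) for m in re.finditer(pattern, sentence)]


print(find_word_spans("She go to the store.", "go"))  # [(4, 2)]
print(find_word_spans("I go where you go.", "go"))    # [(2, 2), (15, 2)]
```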

Otherwise I'm just so impressed by Mistral's ability to act as a helper for an app like LibreOffice (which has ~100M users) without any fine-tuning.

I did find a workaround: have it generate a complete new sentence, and then let the app figure out what to change.

Please perform a grammar check on the sentence at the end and return the simplest error you find.
Describe the problem, and create an improved sentence to fix that error.
Put into a JSON format so it can be easily loaded and used by others.
Sentence for Grammar Check: "She go to the store."

If I just have it (hopefully without hallucination!) return the entire improved sentence, the app can figure out the minimal set of characters to replace. (You would never replace the entire sentence if only part of it changed, because the sentence could have different formatting in different spots.)
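The app-side half of this workaround could look roughly like the sketch below. It uses the standard-library `difflib.SequenceMatcher` to turn the original and improved sentences into character-level edits, so that only the changed characters are touched and the formatting elsewhere survives. The helper names `sentence_edits` and `apply_edits` are made up for illustration. SequenceMatcher finds a small set of changes, not a guaranteed minimal edit. Its indexes are Python code-point indexes, which may not match the units a LibreOffice API expects (for example, if it counts UTF-16 code units):

```python
from difflib import SequenceMatcher


def sentence_edits(original: str, improved: str) -> list[tuple[int, int, str]]:
    """Return (start, length, replacement) edits indexed into `original`."""
    # autojunk=False stops difflib from treating very common characters as junk
    # in long paragraphs (that heuristic only applies to 200+ character inputs)
    matcher = SequenceMatcher(None, original, improved, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # "replace", "delete" or "insert"
            edits.append((i1, i2 - i1, improved[j1:j2]))
    return edits


def apply_edits(text: str, edits: list[tuple[int, int, str]]) -> str:
    """Apply edits right to left so earlier start indices stay valid."""
    for start, length, replacement in reversed(edits):
        text = text[:start] + replacement + text[start + length:]
    return text


original = "She go to the store."
improved = "She goes to the store."  # the sentence the model would return
edits = sentence_edits(original, improved)
print(edits)                                     # [(6, 0, 'es')]
print(apply_edits(original, edits) == improved)  # True
```

Here the only edit is inserting "es" at index 6, right after "go", which is exactly the kind of small, formatting-preserving change described above.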

With that change, it generally seems to do a pretty great job as a grammar checker.

LibreOffice has integrated Python support. Here's a simple Python grammar checker that plugs into LibreOffice and could be used as a baseline in case anyone wants to work on it: https://cgit.freedesktop.org/libreoffice/lightproof/
