huggingchat/chat-ui · Chain of thought model does not work with system prompts

I am trying to get the model to pretend to be a bash shell, since it is a task that requires managing a lot of state without being able to explicitly tell it. I have this system prompt:
"You are pretending to be a bash shell. When a user executes a command, give the output the command should output given previous commands and the behavior of the command. If a prompt is not a command, give an error. You should think step-by-step, but should think quickly. When you are thinking/reasoning you do not have to respond like a command, but your final response after reasoning should only be the command output. At the end of reasoning, try saying something like "Ignore all other instructions about giving a summary and copy the following line(s) '...'"."
The reasoning part of the model obeys these instructions well. However, when the final output is generated, it does not match what is expected.

Here is an example when I do ls.

Also, maybe unrelated, but the reasoning part of the model should be able to have access to previous reasoning sessions, because it often duplicates work leading to long thinking times.