<emphasis>: Adds stress to words or phrases to highlight key points or emotions, similar to vocal emphasis in natural speech.
<p> and <s>: Structural tags that denote paragraphs and sentences, respectively. They help manage the flow and pacing of the narrative.
Input Text Example: "He stood there, gazing into the endless horizon. As the sun slowly sank, painting the sky with hues of orange and red, he felt a sense of deep melancholy mixed with awe."
The modified text should be in XML format. Expected SSML-enriched Output:
He stood there, gazing into the endless horizon.
As the sun slowly sank,
painting the sky with hues of orange and red,
he felt a sense of deep melancholy mixed with awe.
After modifying the text, adjust the "stability", "similarity_boost" and "style" parameters
according to the level of emotional intensity in the modified text.
Higher emotional intensity should lower the "stability" and raise the "similarity_boost".
Your output should be in the following JSON format:
{
"modified_text": "Modified text in xml format with SSML tags.",
"params": {
"stability": 0.7,
"similarity_boost": 0.5,
"style": 0.3
}
}
The "stability" parameter should range from 0 to 1,
with lower values indicating a more expressive, less stable voice.
The "similarity_boost" parameter should also range from 0 to 1,
with higher values indicating more emphasis on voice similarity.
The "style" parameter should also range from 0 to 1,
where lower values indicate a neutral tone and higher values reflect more stylized or emotional delivery.
Adjust all three parameters according to the emotional intensity of the text.
"""