---
language:
- en
---

# TalkBank Batchalign CHATUtterance

CHATUtterance is a series of BERT-derivative models designed for the task of utterance segmentation, released by the TalkBank project and trained on the utterance diarization samples from [The Michigan Corpus of Academic Spoken English](https://ca.talkbank.org/access/MICASE.html).

## Usage

The models can be used directly as a BERT-class token classification model following the [instructions from Huggingface](https://huggingface.co/docs/transformers/tasks/token_classification); a minimal sketch is given after the label list below. Feel free to inspect [this file](https://github.com/TalkBank/batchalign/blob/73ec04761ed3ee2eba04ba0cf14dc898f88b72f7/baln/utokengine.py#L85-L94) for a sense of what the classes mean.

Alternatively, to get the full analysis possible with the model, it is best combined with the TalkBank Batchalign suite of analysis software, [available here](https://github.com/talkbank/batchalign2), using `transcribe` mode.

Target labels:

- `0`: regular form
- `1`: start of utterance/capitalized word
- `2`: end of declarative utterance (end this utterance with a `.`)
- `3`: end of interrogative utterance (end this utterance with a `?`)
- `4`: end of exclamatory utterance (end this utterance with a `!`)
- `5`: break in the utterance; depending on orthography, one can insert a `,`
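
Below is a minimal sketch of direct use as a token classification model and of decoding the labels above into punctuated utterances. The model id `talkbank/CHATUtterance-en` is an assumption for illustration; substitute the repository id of this model card.

```python
# Minimal sketch: run CHATUtterance as a standard token classifier and decode
# its labels into punctuated utterances. MODEL_ID is a placeholder assumption;
# replace it with the repository id of this model card.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL_ID = "talkbank/CHATUtterance-en"  # assumed id; replace with this repo's id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForTokenClassification.from_pretrained(MODEL_ID)
model.eval()

# Unpunctuated, lowercased ASR-style word stream
words = "okay so what do you think about that i think it went well".split()

# Tokenize with word alignment so sub-token labels can be mapped back to words
inputs = tokenizer(words, is_split_into_words=True, return_tensors="pt")
with torch.no_grad():
    predictions = model(**inputs).logits.argmax(dim=-1)[0].tolist()

# Keep the label of each word's first sub-token
word_labels = {}
for token_index, word_index in enumerate(inputs.word_ids()):
    if word_index is not None and word_index not in word_labels:
        word_labels[word_index] = predictions[token_index]

# Decode labels into utterances following the target-label scheme above
END_MARKS = {2: ".", 3: "?", 4: "!"}
utterances, current = [], []
for i, word in enumerate(words):
    label = word_labels[i]
    if label == 1:              # start of utterance: capitalize the word
        word = word.capitalize()
    elif label == 5:            # utterance-internal break
        word = word + ","
    elif label in END_MARKS:    # end of utterance: attach the closing mark
        word = word + END_MARKS[label]
    current.append(word)
    if label in END_MARKS:
        utterances.append(" ".join(current))
        current = []
if current:
    utterances.append(" ".join(current))

print(utterances)
```

The decoding loop simply applies the label conventions listed above; Batchalign's own pipeline (linked earlier) performs the full segmentation and CHAT-format generation.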