---
language:
- en
---

# TalkBank Batchalign CHATUtterance

CHATUtterance is a series of BERT-derivative models for utterance segmentation, released by the TalkBank project. The models are trained on the utterance diarization samples from [The Michigan Corpus of Academic Spoken English](https://ca.talkbank.org/access/MICASE.html).

## Usage

The models can be used directly as BERT-class token classification models, following the [instructions from Hugging Face](https://huggingface.co/docs/transformers/tasks/token_classification). See [this file](https://github.com/TalkBank/batchalign/blob/73ec04761ed3ee2eba04ba0cf14dc898f88b72f7/baln/utokengine.py#L85-L94) for a sense of what the classes mean. Alternatively, for the full analysis the model supports, combine it with the TalkBank Batchalign suite of analysis software, [available here](https://github.com/talkbank/batchalign2), using `transcribe` mode.

Target labels:

- `0`: regular form
- `1`: start of utterance/capitalized word
- `2`: end of declarative utterance (end this utterance with a `.`)
- `3`: end of interrogative utterance (end this utterance with a `?`)
- `4`: end of exclamatory utterance (end this utterance with a `!`)
- `5`: break in the utterance; depending on orthography one can insert a `,`
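
The label scheme above can be turned back into punctuated utterances with a small post-processing step. The following is a minimal sketch, not the Batchalign implementation: the helper name `labels_to_utterances` is hypothetical, and it assumes the per-token predictions have already been reduced to one label per word. It also assumes that label `1` only capitalizes a word and that utterances are split solely at labels `2`, `3`, and `4`.

```python
from typing import List

# Label ids as listed in the "Target labels" section of this card.
REGULAR, CAPITALIZE, END_DECL, END_QUES, END_EXCL, BREAK = range(6)

# Punctuation appended for each utterance-final label.
END_PUNCT = {END_DECL: ".", END_QUES: "?", END_EXCL: "!"}


def labels_to_utterances(words: List[str], labels: List[int]) -> List[str]:
    """Rebuild punctuated utterances from one label per word.

    Hypothetical helper for illustration; assumes word-level labels.
    """
    if len(words) != len(labels):
        raise ValueError("words and labels must have the same length")

    utterances = []
    current = []
    for word, label in zip(words, labels):
        if label == CAPITALIZE:
            # Assumption: label 1 only capitalizes; it does not force a split.
            word = word[:1].upper() + word[1:]
        elif label == BREAK:
            # Assumption: always render a break as a comma.
            word = word + ","
        current.append(word)
        if label in END_PUNCT:
            # Close the utterance with the matching end punctuation.
            utterances.append(" ".join(current) + END_PUNCT[label])
            current = []
    if current:
        # Trailing words with no end label are kept as an unterminated utterance.
        utterances.append(" ".join(current))
    return utterances


words = ["hello", "there", "how", "are", "you", "well", "i", "am", "fine"]
labels = [1, 2, 1, 0, 3, 5, 1, 0, 4]
print(labels_to_utterances(words, labels))
# → ['Hello there.', 'How are you?', 'well, I am fine!']
```

Keeping the split logic tied only to the end-of-utterance labels means a mid-utterance capitalized word (for example a proper noun tagged `1`) does not start a new utterance. Whether that matches the model's intended behavior should be checked against the linked source file.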