

	---
	language:
	- en
	---

	# TalkBank Batchalign CHATUtterance
	CHATUtterance is a series of BERT-derivative models for utterance segmentation, released by the TalkBank project. The models are trained on the utterance diarization samples from [The Michigan Corpus of Academic Spoken English](https://ca.talkbank.org/access/MICASE.html).

	## Usage
	The models can be used directly as BERT-class token classification models following the [instructions from Hugging Face](https://huggingface.co/docs/transformers/tasks/token_classification). See [this file](https://github.com/TalkBank/batchalign/blob/73ec04761ed3ee2eba04ba0cf14dc898f88b72f7/baln/utokengine.py#L85-L94) for a sense of what the classes mean. Alternatively, to get the full analysis the model supports, combine it with the TalkBank Batchalign suite of analysis software, [available here](https://github.com/talkbank/batchalign2), using `transcribe` mode.
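
As a rough illustration of the direct token-classification route, the sketch below loads the checkpoint with the generic `transformers` Auto classes and keeps one label per word by reading the prediction at each word's first subword token. The model id `talkbank/CHATUtterance-en`, the pre-split lowercase input, and the first-subword rule are assumptions made for illustration; the Batchalign file linked above remains the authority on the actual preprocessing.

```python
from typing import List, Optional, Sequence


def first_subword_labels(word_ids: Sequence[Optional[int]],
                         token_labels: Sequence[int]) -> List[int]:
    """Collapse per-token labels to per-word labels (first subword wins)."""
    labels: List[int] = []
    seen = set()
    for word_id, label in zip(word_ids, token_labels):
        # word_id is None for special tokens such as [CLS] and [SEP].
        if word_id is None or word_id in seen:
            continue
        seen.add(word_id)
        labels.append(label)
    return labels


def predict_word_labels(words: List[str],
                        model_id: str = "talkbank/CHATUtterance-en") -> List[int]:
    """Run the model on pre-split words and return one label id per word.

    The default model_id is an assumption based on the repository name.
    """
    # Imported lazily so the helper above works without these packages.
    import torch
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForTokenClassification.from_pretrained(model_id)
    model.eval()

    # Words past the model's maximum length are dropped by truncation.
    encoded = tokenizer(words, is_split_into_words=True,
                        return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**encoded).logits  # shape: (1, n_tokens, n_labels)
    token_labels = logits.argmax(dim=-1)[0].tolist()
    return first_subword_labels(encoded.word_ids(batch_index=0), token_labels)
```

Calling `predict_word_labels("so what did you think it was pretty good".split())` downloads the weights on first use and should return one integer per word, using the target label ids listed in this card.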

	Target labels:

	- `0`: regular form
	- `1`: start of utterance/capitalized word
	- `2`: end of declarative utterance (end this utterance with a `.`)
	- `3`: end of interrogative utterance (end this utterance with a `?`)
	- `4`: end of exclamatory utterance (end this utterance with a `!`)
	- `5`: break in the utterance; depending on orthography one can insert a `,`
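
The label scheme above can be turned back into punctuated text with a small decoder. The sketch below is a hypothetical helper, not Batchalign's implementation: it assumes one label id per word, capitalizes the word on `1`, appends `.`, `?`, or `!` on `2` through `4` and then starts a new utterance, and appends `,` on `5`. The example labels are written by hand rather than produced by the model.

```python
from typing import List

# Punctuation implied by each utterance-final label id.
END_MARKS = {2: ".", 3: "?", 4: "!"}


def decode_utterances(words: List[str], labels: List[int]) -> List[str]:
    """Rebuild punctuated utterances from words and per-word label ids."""
    if len(words) != len(labels):
        raise ValueError("words and labels must have the same length")

    utterances: List[str] = []
    current: List[str] = []
    for word, label in zip(words, labels):
        if label == 1:
            # Start of utterance or capitalized word.
            word = word[:1].upper() + word[1:]
        elif label == 5:
            # Break inside the utterance.
            word += ","
        elif label in END_MARKS:
            word += END_MARKS[label]
        current.append(word)
        if label in END_MARKS:
            # An end label closes the current utterance.
            utterances.append(" ".join(current))
            current = []
    if current:
        # Keep trailing words that never received an end label.
        utterances.append(" ".join(current))
    return utterances


words = "so what did you think it was pretty good".split()
labels = [1, 0, 0, 0, 3, 1, 0, 0, 2]  # hand-written example labels
print(decode_utterances(words, labels))
# ['So what did you think?', 'It was pretty good.']
```

Trailing words without an end label are returned as a final, unpunctuated utterance instead of being discarded, which is one possible design choice for incomplete input.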