	---
	license: apache-2.0
	library_name: span-marker
	tags:
	- span-marker
	- token-classification
	- ner
	- named-entity-recognition
	pipeline_tag: token-classification
	widget:
	- text: "X-Linked adrenoleukodystrophy (ALD) is a genetic disease associated with demyelination of the central nervous system, adrenal insufficiency, and accumulation of very long chain fatty acids in tissue and body fluids."
	  example_title: "Example 1"
	- text: "Canavan disease is inherited as an autosomal recessive trait that is caused by the deficiency of aspartoacylase (ASPA)."
	  example_title: "Example 2"
	- text: "However, both models lack other frequent DM symptoms including the fibre-type dependent atrophy, myotonia, cataract and male-infertility."
	  example_title: "Example 3"
	model-index:
	- name: SpanMarker w. bert-base-cased on NCBI Disease by Tom Aarsen
	  results:
	  - task:
	      type: token-classification
	      name: Named Entity Recognition
	    dataset:
	      type: ncbi_disease
	      name: NCBI Disease
	      split: test
	      revision: acd0e6451198d5b615c12356ab6a05fff4610920
	    metrics:
	    - type: f1
	      value: 0.8813
	      name: F1
	    - type: precision
	      value: 0.8661
	      name: Precision
	    - type: recall
	      value: 0.8971
	      name: Recall
	datasets:
	- ncbi_disease
	language:
	- en
	metrics:
	- f1
	- recall
	- precision
	---

	# SpanMarker for Disease Named Entity Recognition

	This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model trained on the [ncbi_disease](https://huggingface.co/datasets/ncbi_disease) dataset. In particular, this SpanMarker model uses [bert-base-cased](https://huggingface.co/bert-base-cased) as the underlying encoder. See [train.py](train.py) for the training script.

	## Metrics

	This model achieves the following results on the test set:
	- Overall Precision: 0.8661
	- Overall Recall: 0.8971
	- Overall F1: 0.8813
	- Overall Accuracy: 0.9837
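
As a quick sanity check, the reported F1 can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The sketch below is illustrative only: the helper name `f1_score` is hypothetical and not part of the SpanMarker library, and because the inputs are already rounded to four decimals, the result is only expected to agree to that precision.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the list above.
f1 = f1_score(0.8661, 0.8971)
print(f"{f1:.4f}")  # prints 0.8813
```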

	## Labels

	| Label | Examples |
	|-----------|--------------|
	| DISEASE | "ataxia-telangiectasia", "T-cell leukaemia", "C5D", "neutrophilic leukocytosis", "pyogenic infection" |

	## Usage

	To use this model for inference, first install the `span_marker` library:

	```bash
	pip install span_marker
	```

	You can then run inference with this model like so:

	```python
	from span_marker import SpanMarkerModel

	# Download from the 🤗 Hub
	model = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-bert-base-ncbi-disease")
	# Run inference
	entities = model.predict("Canavan disease is inherited as an autosomal recessive trait that is caused by the deficiency of aspartoacylase (ASPA).")
	```
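
The result of `model.predict` can then be post-processed. The sketch below assumes that, for a single input string, `predict` returns a list of dictionaries with the keys `span`, `label`, `score`, `char_start_index` and `char_end_index`, as described in the SpanMarker documentation; check this against your installed version. So that the snippet runs without downloading the model, `entities` is a hand-written placeholder rather than real model output, and the helpers `placeholder_entity` and `filter_entities` are hypothetical names introduced for illustration.

```python
from typing import Dict, List

TEXT = (
    "Canavan disease is inherited as an autosomal recessive trait that is "
    "caused by the deficiency of aspartoacylase (ASPA)."
)


def placeholder_entity(span: str, score: float) -> Dict:
    """Build a dict shaped like one prediction (hand-written, NOT model output)."""
    start = TEXT.index(span)
    return {
        "span": span,
        "label": "DISEASE",
        "score": score,
        "char_start_index": start,
        "char_end_index": start + len(span),
    }


# Placeholder values standing in for `model.predict(TEXT)`; the scores are made up.
entities: List[Dict] = [
    placeholder_entity("Canavan disease", 0.98),
    placeholder_entity("deficiency of aspartoacylase", 0.41),
]


def filter_entities(entities: List[Dict], min_score: float = 0.5) -> List[Dict]:
    """Keep only the predictions whose score is at least `min_score`."""
    return [entity for entity in entities if entity["score"] >= min_score]


for entity in filter_entities(entities):
    start, end = entity["char_start_index"], entity["char_end_index"]
    # Slice the original text with the character offsets to recover the span.
    print(f'{entity["label"]}: {TEXT[start:end]!r} (chars {start}-{end})')
```

The 0.5 threshold is arbitrary; a suitable value depends on how you want to trade precision against recall.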

	See the [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) repository for documentation and additional information on this library.

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-05
	- train_batch_size: 32
	- eval_batch_size: 32
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 3
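
To make the `linear` scheduler with a warmup ratio of 0.1 concrete, the sketch below computes the learning rate at a given step in plain Python. It follows the usual shape of a linear warmup followed by linear decay to zero and is not taken from [train.py](train.py). The function name `linear_lr` is hypothetical, and the total of 640 steps is an assumption chosen for illustration, not a figure stated in this card.

```python
import math


def linear_lr(step: int, total_steps: int, base_lr: float = 5e-5,
              warmup_ratio: float = 0.1) -> float:
    """Learning rate under linear warmup followed by linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from base_lr to 0 at the final step.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


TOTAL_STEPS = 640  # assumed for illustration only

for step in (0, 32, 64, 352, 640):
    print(step, linear_lr(step, TOTAL_STEPS))
```

With these assumptions, the rate peaks at `5e-05` once warmup ends at step 64 and reaches zero at the final step.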

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Overall Precision | Overall Recall | Overall F1 | Overall Accuracy |
	|:-------------:|:-----:|:----:|:---------------:|:-----------------:|:--------------:|:----------:|:----------------:|
	| 0.0038 | 1.41 | 300 | 0.0059 | 0.8141 | 0.8579 | 0.8354 | 0.9818 |
	| 0.0018 | 2.82 | 600 | 0.0054 | 0.8315 | 0.8720 | 0.8513 | 0.9840 |


	### Framework versions

	- SpanMarker 1.2.4
	- Transformers 4.31.0
	- PyTorch 1.13.1+cu117
	- Datasets 2.14.3
	- Tokenizers 0.13.2
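
When reproducing results, it can help to compare the versions listed above with those in your own environment. The sketch below uses the standard-library `importlib.metadata`. The distribution names in `LISTED` (for example `torch` for PyTorch) are assumptions about how the packages are named on PyPI, and `compare_versions` is a hypothetical helper, not part of any of these libraries.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions as listed in this card, keyed by assumed PyPI distribution name.
LISTED = {
    "span_marker": "1.2.4",
    "transformers": "4.31.0",
    "torch": "1.13.1+cu117",
    "datasets": "2.14.3",
    "tokenizers": "0.13.2",
}


def compare_versions(listed: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package to (listed version, installed version or None)."""
    report = {}
    for name, expected in listed.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (expected, installed)
    return report


for name, (expected, installed) in compare_versions(LISTED).items():
    status = "not installed" if installed is None else installed
    print(f"{name}: card lists {expected}, environment has {status}")
```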