DanL committed on
Commit 73ddbab · 1 Parent(s): fff6725

Update README.md

Files changed (1)
  1. README.md +28 -3
README.md CHANGED
@@ -13,8 +13,12 @@ model-index:
 
  # scientific-challenges-and-directions
 
- We present a novel resource to help scientists and medical professionals discover challenges and potential directions across scientific literature, focusing on a broad corpus pertaining to the COVID-19 pandemic and related historical research.
+ We present a novel resource to help scientists and medical professionals discover challenges and potential directions across scientific literature, focusing on a broad corpus pertaining to the COVID-19 pandemic and related historical research. At a high level, the _challenges_ and _directions_ are defined as follows:
 
+ * **Challenge**: A sentence mentioning a problem, difficulty, flaw, limitation, failure, lack of clarity, or knowledge gap.
+ * **Research direction**: A sentence mentioning suggestions or needs for further research, hypotheses, speculations, indications or hints that an issue is worthy of exploration.
+
+ * This model is described in our paper: [A Search Engine for Discovery of Scientific Challenges and Directions](https://arxiv.org/abs/2108.13751) (we have since upgraded the infrastructure, so results differ slightly from those reported in the paper).
  * Please cite our paper if you use our datasets or models in your project. See the [BibTeX](#citation).
  * Feel free to [email us](#contact-us).
  * Also, check out [our search engine](https://challenges.apps.allenai.org/), as an example application.
@@ -22,9 +26,9 @@ We present a novel resource to help scientists and medical professionals discove
  ## Model description
  This model is a fine-tuned version of [PubMedBERT](https://huggingface.co/microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext) on the X dataset, designed for multi-label text classification.
 
- ## Training and evaluation data
 
- More information needed
+ ## Training and evaluation data
+ The scientific-challenges-and-directions model is trained on a dataset of 2894 sentences and their surrounding contexts, drawn from 1786 full-text papers in the CORD-19 corpus and labeled for challenges and directions by expert annotators with biomedical and bioNLP backgrounds. For full details on the train/test split, see Section 3.1 of our [paper](https://arxiv.org/abs/2108.13751).
 
  ## Training procedure
 
@@ -58,3 +62,24 @@ The model achieves the following results on the test set:
  - Pytorch 1.10.0+cu111
  - Datasets 1.17.0
  - Tokenizers 0.10.3
+
+ ## Citation
+
+ If you use our dataset and models, please cite:
+
+ ```
+ @misc{lahav2021search,
+ title={A Search Engine for Discovery of Scientific Challenges and Directions},
+ author={Dan Lahav and Jon Saad Falcon and Bailey Kuehl and Sophie Johnson and Sravanthi Parasa and Noam Shomron and Duen Horng Chau and Diyi Yang and Eric Horvitz and Daniel S. Weld and Tom Hope},
+ year={2021},
+ eprint={2108.13751},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
+ }
+ ```
+
+ ## Contact us
+
+ Please don't hesitate to reach out.
+
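The model card describes this model as designed for multi-label text classification, and defines two labels (challenge and research direction), so a sentence may receive both labels, one, or neither. A common way to turn a multi-label classifier's raw scores into labels is an independent sigmoid per label followed by a threshold. The sketch below shows only that decoding step in plain Python. It assumes the model emits one logit per label; the label names `challenge`/`direction`, their order, the 0.5 threshold, and the helper `decode_multilabel` are hypothetical and are not read from the released model's config.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical label names and order (assumption): the real model's labels
# should be taken from its config (id2label), not hard-coded like this.
LABELS: List[str] = ["challenge", "direction"]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_multilabel(
    logits: Sequence[float],
    labels: Sequence[str] = LABELS,
    threshold: float = 0.5,  # assumed cutoff; in practice tuned on held-out data
) -> Dict[str, float]:
    """Return {label: probability} for each label whose probability >= threshold.

    Each label gets its own sigmoid (no softmax across labels), so a sentence
    can be tagged as both a challenge and a direction, or as neither.
    """
    if len(logits) != len(labels):
        raise ValueError("expected exactly one logit per label")
    probs = {label: sigmoid(logit) for label, logit in zip(labels, logits)}
    return {label: p for label, p in probs.items() if p >= threshold}


if __name__ == "__main__":
    # Made-up logits, for illustration only.
    print(decode_multilabel([2.0, 1.5]))    # both labels kept
    print(decode_multilabel([2.0, -3.0]))   # only "challenge" kept
    print(decode_multilabel([-1.0, -2.0]))  # neither label: {}
```

A softmax over the two scores would force their probabilities to sum to 1 and could not express "both" or "neither", which is why this sketch scores each label independently.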