---
license: cc-by-4.0
datasets:
- gtfintechlab/subjectiveqa
language:
- en
metrics:
- accuracy
- f1
- precision
- recall
base_model:
- google-bert/bert-base-uncased
pipeline_tag: text-classification
library_name: transformers
---

# SubjECTiveQA-RELEVANT Model

**Model Name:** SubjECTiveQA-RELEVANT  
**Model Type:** Text Classification  
**Language:** English  
**License:** [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)  
**Base Model:** [google-bert/bert-base-uncased](https://huggingface.co/google/bert-base-uncased)  
**Dataset Used for Training:** [gtfintechlab/SubjECTive-QA](https://huggingface.co/datasets/gtfintechlab/SubjECTive-QA)

## Model Overview

SubjECTiveQA-RELEVANT is a fine-tuned BERT-based model that classifies text according to the 'RELEVANT' attribute. 'RELEVANT' is one of several subjective attributes annotated in the SubjECTive-QA dataset, which focuses on subjective question-answer pairs in financial contexts.

## Intended Use

This model is intended for researchers and practitioners working on subjective text classification, particularly in financial domains. It is designed to assess the 'RELEVANT' attribute of question-answer pairs, supporting the analysis of subjective content in financial communications.

## How to Use

To use this model, load it with the Hugging Face `transformers` library:

```python
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification, AutoConfig

# Load the tokenizer, model, and configuration
tokenizer = AutoTokenizer.from_pretrained("gtfintechlab/SubjECTiveQA-RELEVANT", do_lower_case=True, do_basic_tokenize=True)
model = AutoModelForSequenceClassification.from_pretrained("gtfintechlab/SubjECTiveQA-RELEVANT", num_labels=3)
config = AutoConfig.from_pretrained("gtfintechlab/SubjECTiveQA-RELEVANT")

# Initialize the text classification pipeline
classifier = pipeline('text-classification', model=model, tokenizer=tokenizer, config=config, framework="pt")

# Classify the 'RELEVANT' attribute in your question-answer pairs
qa_pairs = [
    "Question: What are your company's projections for the next quarter? Answer: We anticipate a 10% increase in revenue due to the launch of our new product line.",
    "Question: Can you explain the recent decline in stock prices? Answer: Market fluctuations are normal, and we are confident in our long-term strategy."
]

results = classifier(qa_pairs, batch_size=128, truncation="only_first")
print(results)
```

In this script:

- **Tokenizer and Model Loading:** The `AutoTokenizer` and `AutoModelForSequenceClassification` classes load the pre-trained tokenizer and model from the `gtfintechlab/SubjECTiveQA-RELEVANT` repository.
- **Configuration:** The `AutoConfig` class loads the model configuration, including parameters such as the number of labels.
- **Pipeline Initialization:** The `pipeline` function builds a text classification pipeline from the loaded model, tokenizer, and configuration.
- **Classification:** The `classifier` processes a list of question-answer pairs and scores the 'RELEVANT' attribute. The `batch_size` parameter controls how many samples are processed at once, and `truncation="only_first"` truncates only the first sequence of each pair if it exceeds the model's maximum input length.

Ensure that your environment has the necessary dependencies installed (e.g., `transformers` and a PyTorch backend).
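Each entry in `results` is a dictionary with a `label` string and a confidence `score`. Continuing from the script above, the following is a minimal post-processing sketch; it assumes the checkpoint keeps the default `LABEL_0`/`LABEL_1`/`LABEL_2` names (see the label interpretation below), so adapt the mapping if the config defines custom `id2label` entries.

```python
# Post-processing sketch (assumes default LABEL_0/LABEL_1/LABEL_2 names;
# check config.id2label if the checkpoint defines custom label names).
label_to_score = {"LABEL_0": 0, "LABEL_1": 1, "LABEL_2": 2}

for qa, result in zip(qa_pairs, results):
    relevance = label_to_score.get(result["label"], result["label"])
    print(f"RELEVANT score: {relevance} (confidence: {result['score']:.3f})")
    print(f"  Input: {qa[:80]}...")
```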
## Label Interpretation

- **LABEL_0:** Negatively Demonstrative of 'RELEVANT' (0); the response lacks relevance.
- **LABEL_1:** Neutral Demonstration of 'RELEVANT' (1); the response has an average level of relevance.
- **LABEL_2:** Positively Demonstrative of 'RELEVANT' (2); the response is highly relevant.

## Training Data

The model was trained on the SubjECTive-QA dataset, which comprises question-answer pairs from financial contexts annotated with several subjective attributes, including 'RELEVANT'. The dataset is split into training, validation, and test sets, supporting robust model training and evaluation; a minimal loading sketch is included at the end of this card.

## Citation

If you use this model in your research, please cite the SubjECTive-QA dataset:

```
@article{SubjECTiveQA,
  title={SubjECTive-QA: Measuring Subjectivity in Earnings Call Transcripts’ QA Through Six-Dimensional Feature Analysis},
  author={Huzaifa Pardawala and Siddhant Sukhani and Agam Shah and Veer Kejriwal and Abhishek Pillai and Rohan Bhasin and Andrew DiBiasio and Tarun Mandapati and Dhruv Adha and Sudheer Chava},
  journal={arXiv preprint arXiv:2410.20651},
  year={2024}
}
```

For more details, refer to the [SubjECTive-QA dataset documentation](https://huggingface.co/datasets/gtfintechlab/SubjECTive-QA).

## Contact

For any SubjECTive-QA related issues and questions, please contact:

- Huzaifa Pardawala: huzaifahp7[at]gatech[dot]edu
- Siddhant Sukhani: ssukhani3[at]gatech[dot]edu
- Agam Shah: ashah482[at]gatech[dot]edu
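As referenced in the Training Data section above, the sketch below shows one way to pull the dataset splits with the `datasets` library. It is a loading sketch only: the repository ID comes from the dataset link above, but the exact configuration and split names are assumptions, so consult the dataset card before relying on them.

```python
from datasets import load_dataset

# Loading sketch only: the repository ID is taken from the dataset card linked above,
# but the dataset may require an explicit configuration name; check
# https://huggingface.co/datasets/gtfintechlab/SubjECTive-QA for exact configs and splits.
dataset = load_dataset("gtfintechlab/SubjECTive-QA")

print(dataset)  # shows the available splits (e.g., train/validation/test) and columns
```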