SocialCompUW
committed on
Update README.md
README.md
CHANGED
@@ -37,10 +37,13 @@ The dataset was split 80-10-10 across the train (N=2180), validation (N=272), an
To get started, initialize the tokenizer and model using the AutoTokenizer and AutoModelForSequenceClassification classes. For the tokenizer, set the "use_fast" parameter to False, max_len to 1024, padding to "max_length", and truncation to True. For the model, set the "num_labels" parameter to 3.
-Next, with a YouTube video dataset with metadata, please concatenate each video's title, description, transcripts, and tags in the following manner:
input = 'VIDEO TITLE: ' + title + '\nVIDEO DESCRIPTION: ' + description + '\nVIDEO TRANSCRIPT: ' + transcript + '\nVIDEO TAGS: ' + tags
-Thus, each video in your dataset should have its input metadata formatted in the structure above. Finally, run the input into a tokenizer and feed the tokenized input into the model to obtain one of three predicted labels. Use the logit function to obtain the label:
## Training Data
To get started, initialize the tokenizer and model using the AutoTokenizer and AutoModelForSequenceClassification classes. For the tokenizer, set the "use_fast" parameter to False, max_len to 1024, padding to "max_length", and truncation to True. For the model, set the "num_labels" parameter to 3.
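The settings above could be wired together roughly as follows. This is a minimal sketch, not the authors' code: it assumes that the README's max_len corresponds to the `max_length` argument of the tokenizer call and that PyTorch tensors are wanted. `load_classifier` and its `model_id` argument are hypothetical names; the checkpoint ID is not given in this section, so the caller has to supply it.

```python
# Settings taken from the paragraph above, grouped by where they are passed.
TOKENIZER_LOAD_KWARGS = {"use_fast": False}
TOKENIZER_CALL_KWARGS = {
    "padding": "max_length",
    "truncation": True,
    "max_length": 1024,      # assumption: the README's "max_len" maps to max_length
    "return_tensors": "pt",  # assumption: PyTorch tensors are wanted
}
MODEL_LOAD_KWARGS = {"num_labels": 3}


def load_classifier(model_id):
    """Load the tokenizer and the 3-label classifier for a given checkpoint ID.

    `model_id` is a placeholder: pass the Hub ID or local path of the checkpoint.
    """
    # Imported here so the settings above can be inspected without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, **TOKENIZER_LOAD_KWARGS)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_id, **MODEL_LOAD_KWARGS
    )
    return tokenizer, model
```

With a tokenizer and model loaded this way, a formatted input string `text` would be encoded with `tokenizer(text, **TOKENIZER_CALL_KWARGS)` and the result passed to the model as keyword arguments.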
+Next, for each video in a YouTube dataset with metadata, concatenate the title, description, transcript, and tags in the following manner:
+
input = 'VIDEO TITLE: ' + title + '\nVIDEO DESCRIPTION: ' + description + '\nVIDEO TRANSCRIPT: ' + transcript + '\nVIDEO TAGS: ' + tags
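The concatenation can be wrapped in a small helper so that every video is formatted identically. The sketch below is illustrative: `build_model_input` is a hypothetical name, and joining list-valued tags with ", " is an assumption, since the README does not say how tags are stored.

```python
def build_model_input(title, description, transcript, tags):
    """Concatenate one video's metadata into the single string described above."""
    # Assumption: tags may arrive as a list of strings; join them so that
    # string concatenation works. A plain string is used unchanged.
    if not isinstance(tags, str):
        tags = ", ".join(tags)
    return (
        "VIDEO TITLE: " + title
        + "\nVIDEO DESCRIPTION: " + description
        + "\nVIDEO TRANSCRIPT: " + transcript
        + "\nVIDEO TAGS: " + tags
    )


# Made-up metadata, for illustration only.
example = build_model_input("t", "d", "tr", ["a", "b"])
```

Using a helper name other than `input` also avoids shadowing Python's built-in `input` function.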
+Each video in your dataset should have its metadata formatted in the structure above. Finally, tokenize the input and feed the tokenized input into the model to obtain one of three predicted labels. Take the index of the largest logit as the predicted label:
+
+_, pred_idx = outputs.logits.max(dim=1)
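`outputs.logits` has one row per input and one column per label, so `max(dim=1)` returns the largest logit in each row together with its column index, and that index is the predicted class. The dependency-free sketch below mirrors only the index part on plain Python lists. `predicted_indices` and the sample logits are made up for illustration, and the mapping from index to label name is not stated in this section.

```python
def predicted_indices(logits):
    """Return, for each row of logits, the index of the largest value."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


# Two inputs, three labels each (made-up numbers).
batch_logits = [[0.2, 1.7, -0.4], [2.1, 0.3, 0.0]]
pred_idx = predicted_indices(batch_logits)
```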
## Training Data