|
Description: |
|
|
|
- trained to classify the type of net zero target. The text comes from company ESG reports; the labels come from Net Zero Tracker. |
|
- text was truncated to 128 tokens before tokenization. |
|
|
|
|
|
|
|
|
Problems: |
|
- keeps outputting the same label regardless of input |
|
- The text column is quite unstructured: entries vary in length, some include URLs and some don't, some include excerpts from the ESG report, etc. |
|
- truncation might have resulted in loss of data |
|
- should try text generation task instead |
|
- too many labels make the model behave poorly. |
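
Two of the problems above can be checked on the data before changing anything else. A minimal sketch, assuming the labels and texts are available as plain Python lists. The function names and the sample labels are hypothetical, and the whitespace split is only a rough stand-in for the real tokenizer, not the actual pipeline:

```python
from collections import Counter


def label_distribution(labels):
    """Return {label: share of examples}, most frequent label first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.most_common()}


def share_truncated(texts, max_tokens=128):
    """Share of texts longer than max_tokens.

    Whitespace splitting is an assumption used here for simplicity;
    the model's own tokenizer usually produces more tokens per text.
    """
    if not texts:
        return 0.0
    too_long = sum(1 for t in texts if len(t.split()) > max_tokens)
    return too_long / len(texts)


# Hypothetical labels, for illustration only.
labels = ["Net zero", "Net zero", "Net zero", "Carbon neutral"]
dist = label_distribution(labels)
majority_label, majority_share = next(iter(dist.items()))
```

If one label dominates, a model that always predicts it already reaches `majority_share` accuracy, which may be part of why the output never changes. A high value from `share_truncated` would support the concern that truncation lost data.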
|
|
|
|
|
|
|
Moving Forward: |
|
|
|
- better text preprocessing (remove URLs, etc.) |
|
- change the task to text generation, which might perform better. (This means ClimateBert cannot be used as the base model.) |
|
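
The URL removal step could look roughly like this. It is a sketch only: the regular expression and the `clean_text` name are assumptions, and the pattern will not catch every URL form that appears in the text column:

```python
import re

# Simple pattern for http(s) and www-style URLs; an assumption that
# will not cover every URL form found in the data.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+")


def clean_text(text):
    """Remove URLs and collapse runs of whitespace."""
    without_urls = URL_PATTERN.sub(" ", text)
    return " ".join(without_urls.split())


# Hypothetical input, for illustration only.
sample = "Net zero by 2050. See https://example.com/esg-report.pdf for details."
cleaned = clean_text(sample)
```

Cleaning before tokenization also means fewer of the 128 tokens are spent on URLs.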