It looks like the config file at '/tmp/tmpjk2l9r6n' is not a valid JSON file.

by burcukoc - opened Apr 3, 2023

Hi,

I am trying to run the model and I’m getting this error when running select & load model.

Downloading (…)ptain-1337/CrudeBERT:
55.6k/? [00:00<00:00, 1.99MB/s]

JSONDecodeError Traceback (most recent call last)
/usr/local/lib/python3.9/dist-packages/transformers/configuration_utils.py in _get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
657 # Load config dict
--> 658 config_dict = cls._dict_from_json_file(resolved_config_file)
659 config_dict["_commit_hash"] = commit_hash

8 frames
JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

OSError Traceback (most recent call last)
/usr/local/lib/python3.9/dist-packages/transformers/configuration_utils.py in _get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
659 config_dict["_commit_hash"] = commit_hash
660 except (json.JSONDecodeError, UnicodeDecodeError):
--> 661 raise EnvironmentError(
662 f"It looks like the config file at '{resolved_config_file}' is not a valid JSON file."
663 )

OSError: It looks like the config file at '/tmp/tmpjk2l9r6n' is not a valid JSON file.

Can you help me with that? Thank you.

Captain-1337

Owner Apr 3, 2023

Hi,

I believe that error was caused on my side.
I'll upload the JSON and config files again.
Please let me know if that works for you.

Best Regards,

Captain-1337

Owner Apr 3, 2023

Hopefully it will work now :)
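
For anyone who lands here with the same traceback: "Expecting value: line 1 column 1 (char 0)" is what Python's json module reports when the file has no JSON value at its very first character, which happens for an empty file or for a non-JSON page saved in place of the config. Below is a minimal sketch for checking a downloaded config file locally. The helper name check_json_file and the demo file names are made up for illustration and are not part of transformers.

```python
import json
import tempfile
from pathlib import Path


def check_json_file(path):
    """Return (True, parsed_object) for valid JSON, else (False, error_message)."""
    try:
        text = Path(path).read_text(encoding="utf-8")
        return True, json.loads(text)
    except (json.JSONDecodeError, UnicodeDecodeError) as exc:
        return False, str(exc)


# Demo on two throwaway files: an empty one and a minimal valid config.
with tempfile.TemporaryDirectory() as tmp_dir:
    empty_file = Path(tmp_dir) / "empty_config.json"
    empty_file.write_text("", encoding="utf-8")
    print(check_json_file(empty_file))

    good_file = Path(tmp_dir) / "good_config.json"
    good_file.write_text('{"model_type": "bert"}', encoding="utf-8")
    print(check_json_file(good_file))
```

If the check fails on a file you downloaded yourself, re-downloading it is probably the first thing to try.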

rjwm

Apr 24, 2023

Hi Captain-1337! I read your paper and it looks cool. I'd like to play with your model, but I'm seeing this on the Hosted Inference API: "Can't load tokenizer using from_pretrained, please update its configuration: stat: path should be string, bytes, os.PathLike or integer, not NoneType"

Captain-1337

Owner Apr 24, 2023

Dear @rjwm

I appreciate your interest.
I'll try to fix it by Wednesday and notify you once done.

Best Regards

Rominahashami

Mar 26, 2024

Hi,

I would like to apply this model to my dataset, but I can't load the tokenizer using from_pretrained and get the same error:
stat: path should be string, bytes, os.PathLike or integer, not NoneType

Can you please help me?
Best regards

Captain-1337

Owner Mar 26, 2024

edited Mar 26, 2024

Hi,

I believe that error is unrelated to the tokenizer.
Try to set the path the following way:

import sys
from pathlib import Path

sys.path.append('C:/Users/USERNAME/Desktop/finbert')
project_dir = Path.cwd().parent

path = project_dir/'Language_Models'/'CrudeBERT'

Warmest regards

Captain-1337

Owner Oct 1, 2024

edited Oct 1, 2024

Here is a quick guide on how you can use CrudeBERT

Step one:

Download the two files (crude_bert_config.json and crude_bert_model.bin)
from https://huggingface.co/Captain-1337/CrudeBERT/tree/main
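
Before moving on, it may be worth confirming that both files actually sit in the folder the notebook will run from, since a missing file tends to surface later as a less obvious loading error. A small sketch, assuming the two file names from this step; the helper name find_missing_files is made up for illustration.

```python
import tempfile
from pathlib import Path

# File names taken from step one of the guide.
REQUIRED_FILES = ("crude_bert_config.json", "crude_bert_model.bin")


def find_missing_files(folder, required=REQUIRED_FILES):
    """Return the required file names that are not present in `folder`."""
    folder = Path(folder)
    return [name for name in required if not (folder / name).is_file()]


# Demo in a throwaway folder that contains only the config file.
with tempfile.TemporaryDirectory() as tmp_dir:
    (Path(tmp_dir) / "crude_bert_config.json").write_text("{}", encoding="utf-8")
    print(find_missing_files(tmp_dir))  # ['crude_bert_model.bin']
```

In a notebook you would call find_missing_files(".") and expect an empty list.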

Step two:

Create a Jupyter notebook in the same folder as the two downloaded files and add the code below:

Code:

import torch
from transformers import AutoConfig, AutoModelForSequenceClassification, AutoTokenizer
import pandas as pd

# List of example headlines
headlines = [
    "Major Explosion, Fire at Oil Refinery in Southeast Philadelphia",
    "PETROLEOS confirms Gulf of Mexico oil platform accident",
    "CASUALTIES FEARED AT OIL ACCIDENT NEAR IRANS BORDER",
    "EIA Chief expects Global Oil Demand Growth 1 M B/D to 2011",
    "Turkey Jan-Oct Crude Imports +98.5% To 57.9M MT",
    "China’s crude oil imports up 78.30% in February 2019",
    "Russia Energy Agency: Sees Oil Output put Flat In 2005",
    "Malaysia Oil Production Steady This Year At 700,000 B/D",
    "ExxonMobil:Nigerian Oil Output Unaffected By Union Threat",
    "Yukos July Oil Output Flat On Mo, 1.73M B/D - Prime-Tass",
    "2nd UPDATE: Mexico’s Oil Output Unaffected By Hurricane",
    "UPDATE: Ecuador July Oil Exports Flat On Mo At 337,000 B/D",
    "China February Crude Imports -16.0% On Year",
    "Turkey May Crude Imports down 11.0% On Year",
    "Japan June Crude Oil Imports decrease 10.9% On Yr",
    "Iran’s Feb Oil Exports +20.9% On Mo at 1.56M B/D - Official",
    "Apache announces large petroleum discovery in Philadelphia",
    "Turkey finds oil near Syria, Iraq border",
]

config_path = './crude_bert_config.json'
model_path = './crude_bert_model.bin'

# Load the configuration
config = AutoConfig.from_pretrained(config_path)

# Create the model from the configuration
model = AutoModelForSequenceClassification.from_config(config)

# Load the model's state dictionary (map_location="cpu" lets this run without a GPU)
state_dict = torch.load(model_path, map_location="cpu")

# Drop "bert.embeddings.position_ids" if present; newer transformers versions may not expect this key
state_dict.pop("bert.embeddings.position_ids", None)

# Load the adjusted state dictionary into the model
model.load_state_dict(state_dict, strict=False)  # strict=False ignores non-critical key mismatches

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

# Define the prediction function
def predict_to_df(texts, model, tokenizer):
    class_names = ['positive', 'negative', 'neutral']
    model.eval()
    data = []
    for text in texts:
        inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=64)
        with torch.no_grad():
            outputs = model(**inputs)
        logits = outputs.logits
        softmax_scores = torch.nn.functional.softmax(logits, dim=-1)
        pred_label_id = torch.argmax(softmax_scores, dim=-1).item()
        predicted_label = class_names[pred_label_id]
        data.append([text, predicted_label])
    df = pd.DataFrame(data, columns=["Headline", "Classification"])
    return df

# Create DataFrame
example_headlines = pd.DataFrame(headlines, columns=["Headline"])

# Apply classification
result_df = predict_to_df(example_headlines['Headline'].tolist(), model, tokenizer)
result_df
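
One caveat about strict=False: it also stays quiet when a weight the model really needs is absent from the checkpoint. load_state_dict returns an object with missing_keys and unexpected_keys, so the mismatches can be inspected instead of ignored. The sketch below shows this on a toy layer rather than on CrudeBERT; the nn.Linear stand-in, the "position_ids" entry, and the deleted bias are all made up for the demonstration.

```python
import torch
from torch import nn

# Toy stand-in for the real model (assumption: any nn.Module behaves the same way here).
toy_model = nn.Linear(4, 3)

# Build a checkpoint that has one stale extra key and lacks one real weight.
checkpoint = {name: tensor.clone() for name, tensor in toy_model.state_dict().items()}
checkpoint["position_ids"] = torch.arange(8)  # hypothetical leftover key
del checkpoint["bias"]                        # simulate a weight absent from the file

# strict=False does not raise, but it reports what did not line up.
result = toy_model.load_state_dict(checkpoint, strict=False)
print("missing:", result.missing_keys)        # keys the model wanted but did not get
print("unexpected:", result.unexpected_keys)  # keys the model did not know about
```

With the real model, an empty missing_keys list is a reasonable sign that all weights were loaded.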

Step three:

Execute the cells of the Jupyter Notebook.

If you face any difficulties or have other questions, contact me here or on LinkedIn.

FYI: I took the example headlines from one of our recent publications:

So, your classification output should reflect this as well.
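
To make the Classification column easier to interpret, here is a rough sketch of what the prediction function does with the model output: softmax turns the three logits into probabilities, and the index of the largest one selects the label. It uses plain Python instead of torch, the label order is copied from the guide above, and the logits are invented numbers, not real model output.

```python
import math

# Label order copied from predict_to_df in the guide.
CLASS_NAMES = ["positive", "negative", "neutral"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [value / total for value in exps]


def label_from_logits(logits, class_names=CLASS_NAMES):
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best_index = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best_index], probs[best_index]


# Invented logits for illustration only.
label, confidence = label_from_logits([0.2, 2.5, -1.0])
print(label, round(confidence, 3))
```

Returning the probability next to the label is a design choice of this sketch; the guide's function keeps only the label.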
