๐Ÿž๐Ÿœ๐Ÿ›My Submission has Failed [REPORT HERE]

#2
by picekl - opened
Bohemian Visual Recognition Alliance org

Before writing to this thread, ask yourself the following questions, and post only if you can answer YES to all of them.

  1. Did my code run in a local environment?
  2. Did I use python 3.10?
  3. Did I push my code to Hugging Face as a model?
  4. Can my model process 10,000 images in 60 minutes in an environment similar to t4-small?
  5. Did you use any non-standard library? [if yes, go to a different thread and request it]
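For question 4, the limit works out to 60 × 60 / 10,000 = 0.36 seconds per image. A rough way to check this locally is to time a small sample and extrapolate. This is only a sketch: `estimate_total_minutes` and `fits_budget` are hypothetical helper names, `predict_fn` stands in for your own inference call, and local timings will not match the evaluation machine exactly.

```python
import time


def estimate_total_minutes(predict_fn, sample_inputs, total_items=10_000):
    """Time predict_fn on a small sample and extrapolate to total_items."""
    if not sample_inputs:
        raise ValueError("sample_inputs must not be empty")
    start = time.perf_counter()
    for item in sample_inputs:
        predict_fn(item)
    elapsed = time.perf_counter() - start
    seconds_per_item = elapsed / len(sample_inputs)
    return seconds_per_item * total_items / 60.0


def fits_budget(seconds_per_item, total_items=10_000, budget_minutes=60):
    """Return True if the projected run time stays within the budget."""
    return seconds_per_item * total_items <= budget_minutes * 60
```

Leave yourself some margin, since model loading and image decoding also count toward the run time.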

โ—Please provide your submission ID while asking for an Error Log โ—

picekl pinned discussion

I reduced the batch size because I wasn't sure what was failing. It is genuinely disheartening that Hugging Face is so terrible and that the competition had so many issues as a result. It was a fun, well-motivated project that unfortunately did not come to fruition. Let me know when you open the late submission. I hope another competition platform can be considered in the future.

Can I get the stack trace for a45d4a7e-be41-475f-917c-a3612b574669? Apologies.

Bohemian Visual Recognition Alliance org

   raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

We do not have the budget for GPU inference, so we had to switch to CPU-only evaluation.

LP

Very strange. I'll have to do some digging. I explicitly set the device to "cpu" since I read your note elsewhere about the CPU restriction.

Bohemian Visual Recognition Alliance org

@jack-etheredge ,

Try this one:

state_dict = torch.load('classifier.pt', map_location=torch.device('cpu'))
net.load_state_dict(state_dict)  # loads the weights into net in place; the return value is not the model

LP

@picekl Looks like that probably did it. I guess in the future I need to make sure my device setting makes its way to setting the map_location param.
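One way to keep the device setting and `map_location` in sync is to route both through a single variable. The following is a minimal sketch, assuming the checkpoint stores a state dict (as in the reply above); `load_weights` and `DEVICE` are hypothetical names, not part of the competition code.

```python
import torch
from torch import nn

# Assumption: the evaluation server is CPU-only, as stated in this thread.
DEVICE = torch.device("cpu")


def load_weights(model: nn.Module, checkpoint_path: str,
                 device: torch.device = DEVICE) -> nn.Module:
    """Load a state dict onto `device`, even if it was saved from a CUDA device."""
    # map_location remaps the saved tensor storages onto `device` while loading.
    state_dict = torch.load(checkpoint_path, map_location=device)
    model.load_state_dict(state_dict)  # updates the model in place
    model.to(device)
    model.eval()  # inference mode for layers such as dropout and batch norm
    return model
```

Calling `model.to("cpu")` alone is not enough here, because the error is raised inside `torch.load`, before the model ever receives the weights.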

If I had a dollar for every time huggingfail died on me, I would have $20 by now. Can you check why this run failed? @picekl

74758e89-d7aa-46ba-a8ea-030a318eb03f

I made sure everything is on CPU

Bohemian Visual Recognition Alliance org
edited Jun 2, 2024

Hi @chychiu , and welcome back! Looks like this is still the same problem as previously.

  0%|          | 1/XXXXX. [00:49<182:27:29, 49.00s/it]2024-06-01 21:58:27.565 | ERROR    | __main__:generate_submission_file:72 - Subprocess didn't terminate successfully

Can you contact me via email? I might have a solution, but I do not want to bother everyone else in this thread.

Best,
Lukas
