Transformers.js error: can't find model.onnx_data file

#22
by timsek

I'm having trouble initializing this model using Transformers.js / Node and the example that was provided by Xenova in the readme.

When loading the model with dtype: 'q4', the model_q4.onnx file downloads to the cache, but then Node crashes with no error message.
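For reference, here's roughly the load call I'm using, following the readme pattern (a sketch from memory; the model id and option names are what I believe Transformers.js v3 expects):

```js
import { AutoModel, AutoProcessor } from '@huggingface/transformers';

// dtype: 'q4' should make Transformers.js fetch the quantized model_q4.onnx weights
const model = await AutoModel.from_pretrained('jinaai/jina-clip-v2', { dtype: 'q4' });
const processor = await AutoProcessor.from_pretrained('jinaai/jina-clip-v2');
```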

When I remove dtype altogether, I get an error from ONNX Runtime that it cannot locate the model.onnx_data file. I checked the cache, and Transformers.js only downloads the model.onnx file. As a potential fix, I downloaded the repo with all the ONNX models, placed the model.onnx_data file into the cache manually, and even tried pointing the cache at the repo folder, but the error persists:

```
Error: Exception during initialization: filesystem error: in file_size: No such file or directory ["model.onnx_data"]
    at new OnnxruntimeSessionHandler (/Users/timsekiguchi/recall/node_modules/onnxruntime-node/dist/backend.js:27:92)
    at Immediate.<anonymous> (/Users/timsekiguchi/recall/node_modules/onnxruntime-node/dist/backend.js:64:29)
    at process.processImmediate (node:internal/timers:483:21)
```
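For completeness, this is roughly how I tried pointing at the local repo folder (assuming env.localModelPath behaves as the Transformers.js docs describe; the path here is a placeholder):

```js
import { env, AutoModel } from '@huggingface/transformers';

// Resolve models from a local directory instead of the Hub cache;
// the loader should then find onnx/model.onnx and the model.onnx_data file next to it.
env.localModelPath = '/path/to/local/models'; // placeholder path
env.allowRemoteModels = false;

const model = await AutoModel.from_pretrained('jinaai/jina-clip-v2');
```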

Any suggestions on where to go from here, or what might be throwing the error? Has the implementation included by Xenova in the readme been tested?

Thank you!

Jina AI org

Hey @timsek, sorry for the late reply. Adding @Xenova to the loop.

All good! Thanks for replying. There were no error messages generated by Transformers.js, but I believe it was an OOM error when trying to load the fp32 model; at this point I can't remember exactly. I ended up getting jina-clip-v2 to work by using the ONNX Runtime JS bindings directly. I ran into a couple of issues, though, that prevented me from using it further:

  1. My use case is to generate embeddings for images and text separately, but the current ONNX models require both text and image inputs. This forced me to pass a dummy text or image when generating one or the other (see the sketch after this list).
  2. I'm generating embeddings on a local machine with Electron. I did some performance comparisons, and on my Mac M1 Max it was taking ~1700ms per image to generate embeddings, which unfortunately isn't feasible for a library of 50K+ images. I don't know if it's in the cards for Jina, but I would love a more optimized model that trades accuracy for speed. For comparison, the standard ViT-B-32 model runs at 8-12ms per image.
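For reference, item 1 looks roughly like this in my direct onnxruntime-node usage. The input names (input_ids, pixel_values) and the 512x512 image size are from my own inspection of the export, so treat them as assumptions:

```js
const ort = require('onnxruntime-node');

async function embedText(tokenIds) {
  const session = await ort.InferenceSession.create('model.onnx');

  // The graph wants both modalities, so feed a zero-filled dummy image
  // alongside the real token ids (and vice versa for image-only embedding).
  const feeds = {
    input_ids: new ort.Tensor('int64', BigInt64Array.from(tokenIds.map(BigInt)), [1, tokenIds.length]),
    pixel_values: new ort.Tensor('float32', new Float32Array(3 * 512 * 512), [1, 3, 512, 512]),
  };

  const results = await session.run(feeds);
  return results; // embedding output names depend on the export
}
```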
Jina AI org

For 2: although jina-clip-v2 is a ViT-L/14 model, that doesn't justify such a huge difference in runtimes. Have you tried the fp16 model?
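Something like this should pull the fp16 weights instead (assuming Transformers.js resolves dtype: 'fp16' to the fp16 export in the repo):

```js
import { AutoModel } from '@huggingface/transformers';

// Request the fp16 export rather than fp32 or q4
const model = await AutoModel.from_pretrained('jinaai/jina-clip-v2', { dtype: 'fp16' });
```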
