Transformers.js error: can't find model.onnx_data file
I'm having trouble initializing this model (jina-clip-v2) with Transformers.js in Node, using the example provided by Xenova in the readme.
When loading the model with `dtype: 'q4'`, the `model_q4.onnx` file downloads to the cache, but then Node crashes with no error message.
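For reference, this is roughly how I'm initializing it (from memory, not verbatim from the readme):

```js
import { AutoModel } from '@huggingface/transformers';

// Roughly the readme example; with 'q4', Node exits silently for me
// right after the download finishes.
const model = await AutoModel.from_pretrained('jinaai/jina-clip-v2', {
  dtype: 'q4',
});
```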
When I remove `dtype` altogether, I get an error from ONNX Runtime that it cannot locate the `model.onnx_data` file. I checked the cache, and Transformers.js only downloads the `model.onnx` file.
file. As a potential solution, I downloaded the repo and all the onnx models and placed the model.onnx_data
file into the cache manually, and even tried pointing the cache at a the repo folder, but the error persists:
```
Error: Exception during initialization: filesystem error: in file_size: No such file or directory ["model.onnx_data"]
    at new OnnxruntimeSessionHandler (/Users/timsekiguchi/recall/node_modules/onnxruntime-node/dist/backend.js:27:92)
    at Immediate.<anonymous> (/Users/timsekiguchi/recall/node_modules/onnxruntime-node/dist/backend.js:64:29)
    at process.processImmediate (node:internal/timers:483:21)
```
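This is roughly how I tried pointing Transformers.js at the local copy (the path is a placeholder for my actual folder):

```js
import { env, AutoModel } from '@huggingface/transformers';

// Point Transformers.js at a locally cloned copy of the repo instead of
// the HF cache. '/path/to/models' is a placeholder; it should contain a
// 'jinaai/jina-clip-v2' folder with the onnx/ files, including model.onnx_data.
env.localModelPath = '/path/to/models';
env.allowRemoteModels = false;

const model = await AutoModel.from_pretrained('jinaai/jina-clip-v2');
```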
Any suggestions on where to go from here, or what might be throwing? Has the example included by Xenova in the readme been tested?
Thank you!
All good! Thanks for replying. There were no error messages generated by Transformers.js, but I believe it was an OOM error when trying to load the fp32 model; at this point I can't remember exactly. I ended up getting Jina to work by using ONNX Runtime JS directly (see the sketch after the list below). I ran into a couple of issues, though, that prevented me from using it further:
- My use case is to generate embeddings for images and text separately, but the current ONNX models require both text and image input. This required me to pass dummy text/images when generating one or the other.
- I'm using a local machine with Electron to generate embeddings. I did some performance comparisons, and on my Mac M1 Max it was taking ~1700ms per image to generate embeddings, which unfortunately isn't feasible for a library of 50K+ images. I don't know if it's in the cards for Jina, but I would love to use a more optimized model that trades off accuracy for speed. For comparison, when running the standard ViT-B-32 model, inference time is 8-12ms per image.
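For context, here's roughly my direct ONNX Runtime path with the dummy-input workaround (the input names and 512x512 image size are assumptions from memory; check `session.inputNames` against the actual export):

```js
import * as ort from 'onnxruntime-node';

// Create a session over the exported model (path is from my local setup).
const session = await ort.InferenceSession.create('./onnx/model.onnx');

// The export expects both modalities, so a text-only pass feeds a dummy
// image batch alongside the real token ids (and vice versa for images).
const dummyPixels = new ort.Tensor(
  'float32',
  new Float32Array(1 * 3 * 512 * 512), // zero-filled placeholder image
  [1, 3, 512, 512],
);
const inputIds = new ort.Tensor('int64', BigInt64Array.from([0n, 1n, 2n]), [1, 3]);

const outputs = await session.run({ input_ids: inputIds, pixel_values: dummyPixels });
```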
For 1, you can use zero-sized tensors: https://huggingface.co/jinaai/jina-clip-v2/discussions/12#67445e1ae8ad555f8d307322
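A minimal sketch of what that looks like (input names and dims are assumptions; check `session.inputNames` on your export):

```js
import * as ort from 'onnxruntime-node';

// A zero-sized batch: the first dim is 0 and the backing buffer is empty,
// so the unused tower has nothing to compute.
const emptyImages = new ort.Tensor('float32', new Float32Array(0), [0, 3, 512, 512]);
const emptyText = new ort.Tensor('int64', new BigInt64Array(0), [0, 0]);
```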
For 2, although jina-clip-v2 is a ViT-L/14 model, that doesn't justify the huge difference in runtimes. Have you tried the fp16 model?
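For example, pointing the session at the fp16 export instead (the filename follows the usual onnx/ naming convention, but verify it in the repo):

```js
import * as ort from 'onnxruntime-node';

// Load the fp16 export instead of the fp32 one.
const session = await ort.InferenceSession.create('./onnx/model_fp16.onnx');
```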