Possibility to use on CUDA?

#12
by mstachow - opened

CPU generation is fairly slow. Can I move this model to CUDA?

You can do it like this:

from transformers import AutoProcessor, AutoModel

processor = AutoProcessor.from_pretrained("suno/bark-small")
model = AutoModel.from_pretrained("suno/bark-small").to("cuda")

inputs = processor(
    text=["Hello, my name is Suno. And, uh — and I like pizza. [laughs] But I also have other interests such as playing tic tac toe."],
    return_tensors="pt",
)
inputs = inputs.to("cuda")
speech_values = model.generate(**inputs, do_sample=True)

from IPython.display import Audio

sampling_rate = model.generation_config.sample_rate
Audio(speech_values.cpu().numpy().squeeze(), rate=sampling_rate)
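
If you are not sure whether the machine has a usable GPU, a small fallback avoids a hard crash on `.to("cuda")`. This is only a sketch: `pick_device` is a hypothetical helper name (it is not part of transformers or torch), and it assumes that silently falling back to CPU is acceptable for your use case.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" if a CUDA-enabled torch build can see a GPU, else "cpu".

    Hypothetical helper for illustration; not a transformers/torch API.
    """
    # Without torch installed there is nothing to move to a GPU.
    if importlib.util.find_spec("torch") is None:
        return "cpu"

    import torch

    # is_available() is False for CPU-only builds and for machines without a GPU.
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Using device: {device}")
```

You would then pass `device` instead of the literal `"cuda"` in both `.to(...)` calls in the snippet above.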

Thank you!

I tried this and got the error below. Any help, please?


AssertionError Traceback (most recent call last)
Cell In[15], line 4
1 from transformers import AutoProcessor, AutoModel
3 processor = AutoProcessor.from_pretrained("suno/bark-small")
----> 4 model = AutoModel.from_pretrained("suno/bark-small").to("cuda")
6 inputs = processor(
7 text=["Hello, my name is Suno. And, uh — and I like pizza. [laughs] But I also have other interests such as playing tic tac toe."],
8 return_tensors="pt",
9 )
10 inputs = inputs.to("cuda")

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\modeling_utils.py:2014, in PreTrainedModel.to(self, *args, **kwargs)
2009 raise ValueError(
2010 ".to is not supported for 4-bit or 8-bit bitsandbytes models. Please use the model as it is, since the"
2011 " model has already been set to the correct devices and casted to the correct dtype."
2012 )
2013 else:
-> 2014 return super().to(*args, **kwargs)

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py:1145, in Module.to(self, *args, **kwargs)
1141 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
1142 non_blocking, memory_format=convert_to_format)
1143 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
-> 1145 return self._apply(convert)

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py:797, in Module._apply(self, fn)
795 def _apply(self, fn):
796 for module in self.children():
--> 797 module._apply(fn)
799 def compute_should_use_set_data(tensor, tensor_applied):
800 if torch._has_compatible_shallow_copy_type(tensor, tensor_applied):
801 # If the new tensor has compatible tensor type as the existing tensor,
802 # the current behavior is to change the tensor in-place using .data =,
(...)
807 # global flag to let the user control whether they want the future
808 # behavior of overwriting the existing tensor or not.

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py:797, in Module._apply(self, fn)
795 def _apply(self, fn):
796 for module in self.children():
--> 797 module._apply(fn)
799 def compute_should_use_set_data(tensor, tensor_applied):
800 if torch._has_compatible_shallow_copy_type(tensor, tensor_applied):
801 # If the new tensor has compatible tensor type as the existing tensor,
802 # the current behavior is to change the tensor in-place using .data =,
(...)
807 # global flag to let the user control whether they want the future
808 # behavior of overwriting the existing tensor or not.

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py:820, in Module._apply(self, fn)
816 # Tensors stored in modules are graph leaves, and we don't want to
817 # track autograd history of param_applied, so we have to use
818 # with torch.no_grad():
819 with torch.no_grad():
--> 820 param_applied = fn(param)
821 should_use_set_data = compute_should_use_set_data(param, param_applied)
822 if should_use_set_data:

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py:1143, in Module.to.<locals>.convert(t)
1140 if convert_to_format is not None and t.dim() in (4, 5):
1141 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
1142 non_blocking, memory_format=convert_to_format)
-> 1143 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\cuda\__init__.py:239, in _lazy_init()
235 raise RuntimeError(
236 "Cannot re-initialize CUDA in forked subprocess. To use CUDA with "
237 "multiprocessing, you must use the 'spawn' start method")
238 if not hasattr(torch._C, '_cuda_getDeviceCount'):
--> 239 raise AssertionError("Torch not compiled with CUDA enabled")
240 if _cudart is None:
241 raise AssertionError(
242 "libcudart functions unavailable. It looks like you have a broken build?")

AssertionError: Torch not compiled with CUDA enabled

I ran this command to check whether CUDA is available:

python -c "import torch; print(torch.cuda.is_available())"

I get "False".

I tried installing CUDA via pip, but I still get the same error.

What's your OS? Did you install the CUDA build of torch?

Windows 11.

I'm not sure; I have installed loads of dependencies. I did check, and CUDA was on my system. Could you share the commands to install the correct version of CUDA, plus any dependencies I'm missing?

I don't think it's CUDA; I think it's torch. If you just run pip install torch, you don't necessarily get the CUDA-enabled build. Search for "PyTorch install with CUDA" and you will land on the official site, which gives you the exact install command you need.

OK, I'll try that, thanks!
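
To see which kind of torch build is actually installed, something like the following may help. It is a rough sketch: `describe_torch_build` is a hypothetical helper name, and the only torch attributes it relies on are `torch.__version__`, `torch.version.cuda` (which is `None` on CPU-only builds), and `torch.cuda.is_available()`.

```python
import importlib.util


def describe_torch_build() -> dict:
    """Collect basic facts about the installed torch build (hypothetical helper)."""
    info = {
        "installed": False,
        "version": None,
        "built_with_cuda": None,
        "cuda_available": False,
    }
    if importlib.util.find_spec("torch") is None:
        return info  # torch is not installed in this environment

    import torch

    info["installed"] = True
    info["version"] = str(torch.__version__)
    # CUDA version torch was compiled against; None means a CPU-only build.
    info["built_with_cuda"] = torch.version.cuda
    # True only if the build has CUDA support AND a usable GPU/driver is present.
    info["cuda_available"] = torch.cuda.is_available()
    return info


report = describe_torch_build()
for key, value in report.items():
    print(f"{key}: {value}")
```

If `built_with_cuda` prints `None`, the installed wheel is CPU-only, and installing CUDA separately will not change that; torch itself has to be reinstalled as a CUDA build.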

I am getting the same assertion error.
I did what you said and then checked my torch version. The output is below (note: I am still getting the assertion error):

Name: torch
Version: 2.0.0
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: [email protected]
License: BSD-3
Location: C:\Users\Greg\AppData\Local\Programs\Python\Python311\Lib\site-packages
Requires: filelock, jinja2, networkx, sympy, typing-extensions
Required-by: accelerate, fairscale, optimum, speechbrain, torchaudio, torchvision

Here is the full error:

C:\Users\Greg\AppData\Local\Programs\Python\Python311\Lib\site-packages\tqdm\auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
from .autonotebook import tqdm as notebook_tqdm

AssertionError Traceback (most recent call last)
Cell In[1], line 4
1 from transformers import AutoProcessor, AutoModel
3 processor = AutoProcessor.from_pretrained("suno/bark-small")
----> 4 model = AutoModel.from_pretrained("suno/bark-small").to("cuda")
6 inputs = processor(
7 text=["Hello, my name is Suno. And, uh — and I like pizza. [laughs] But I also have other interests such as playing tic tac toe."],
8 return_tensors="pt",
9 )
10 inputs = inputs.to("cuda")

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\modeling_utils.py:2014, in PreTrainedModel.to(self, *args, **kwargs)
2009 raise ValueError(
2010 ".to is not supported for 4-bit or 8-bit bitsandbytes models. Please use the model as it is, since the"
2011 " model has already been set to the correct devices and casted to the correct dtype."
2012 )
2013 else:
-> 2014 return super().to(*args, **kwargs)

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py:1145, in Module.to(self, *args, **kwargs)
1141 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
1142 non_blocking, memory_format=convert_to_format)
1143 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
-> 1145 return self._apply(convert)

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py:797, in Module._apply(self, fn)
795 def _apply(self, fn):
796 for module in self.children():
--> 797 module._apply(fn)
799 def compute_should_use_set_data(tensor, tensor_applied):
800 if torch._has_compatible_shallow_copy_type(tensor, tensor_applied):
801 # If the new tensor has compatible tensor type as the existing tensor,
802 # the current behavior is to change the tensor in-place using .data =,
(...)
807 # global flag to let the user control whether they want the future
808 # behavior of overwriting the existing tensor or not.

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py:797, in Module._apply(self, fn)
795 def _apply(self, fn):
796 for module in self.children():
--> 797 module._apply(fn)
799 def compute_should_use_set_data(tensor, tensor_applied):
800 if torch._has_compatible_shallow_copy_type(tensor, tensor_applied):
801 # If the new tensor has compatible tensor type as the existing tensor,
802 # the current behavior is to change the tensor in-place using .data =,
(...)
807 # global flag to let the user control whether they want the future
808 # behavior of overwriting the existing tensor or not.

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py:820, in Module._apply(self, fn)
816 # Tensors stored in modules are graph leaves, and we don't want to
817 # track autograd history of param_applied, so we have to use
818 # with torch.no_grad():
819 with torch.no_grad():
--> 820 param_applied = fn(param)
821 should_use_set_data = compute_should_use_set_data(param, param_applied)
822 if should_use_set_data:

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py:1143, in Module.to.<locals>.convert(t)
1140 if convert_to_format is not None and t.dim() in (4, 5):
1141 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
1142 non_blocking, memory_format=convert_to_format)
-> 1143 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\cuda\__init__.py:239, in _lazy_init()
235 raise RuntimeError(
236 "Cannot re-initialize CUDA in forked subprocess. To use CUDA with "
237 "multiprocessing, you must use the 'spawn' start method")
238 if not hasattr(torch._C, '_cuda_getDeviceCount'):
--> 239 raise AssertionError("Torch not compiled with CUDA enabled")
240 if _cudart is None:
241 raise AssertionError(
242 "libcudart functions unavailable. It looks like you have a broken build?")

AssertionError: Torch not compiled with CUDA enabled

"Torch not compiled with CUDA enabled" means the torch build you have installed is CPU-only. Install a GPU (CUDA) build of torch instead of the CPU one, and pick a build whose CUDA version your NVIDIA driver supports: https://download.pytorch.org/whl/torch_stable.html
Use the "nvidia-smi" and "nvcc -V" commands to check your NVIDIA driver and CUDA toolkit versions.
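
If you prefer to run both checks from Python (for example inside the same notebook), a sketch along these lines should work. `run_if_present` is a hypothetical helper name; it only assumes that the tools, when installed, are on your PATH.

```python
import shutil
import subprocess


def run_if_present(cmd):
    """Run a command and return its output, or None if it is missing or fails to start."""
    exe = shutil.which(cmd[0])
    if exe is None:
        return None  # tool is not installed or not on PATH
    try:
        result = subprocess.run(
            [exe, *cmd[1:]], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    # Some tools report problems on stderr only, so fall back to it.
    return result.stdout.strip() or result.stderr.strip()


for cmd in (["nvidia-smi"], ["nvcc", "-V"]):
    output = run_if_present(cmd)
    label = " ".join(cmd)
    print(f"--- {label} ---")
    print(output if output is not None else "not found on PATH")
```

A missing `nvcc` is usually not a problem for running models, since the CUDA builds of torch ship their own CUDA runtime; a missing `nvidia-smi` more likely points at a driver issue.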

When I add the voice_preset argument to the processor call, this script throws a runtime error:
```

RuntimeError Traceback (most recent call last)
in <cell line: 9>()
7 inputs = inputs.to('cuda')
8
----> 9 speech_output = model.generate(**inputs, do_sample=True).cpu().numpy().squeeze()
10 Audio(speech_output, rate=sampling_rate)

6 frames
/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
2549 # remove once script supports set_grad_enabled
2550 no_grad_embedding_renorm(weight, input, max_norm, norm_type)
-> 2551 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
2552
2553

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)
```
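
One possible explanation, which is an assumption and not something confirmed in this thread, is that the voice preset tensors sit in a nested structure inside the processor output and are left on the CPU by `inputs.to('cuda')`. A workaround sketch is to move every tensor recursively before calling `generate`. `move_to_device` is a hypothetical helper, and it returns plain dicts and lists in place of the original containers.

```python
from collections.abc import Mapping


def move_to_device(obj, device):
    """Recursively move anything with a .to() method onto `device`.

    Hypothetical helper: nested mappings come back as plain dicts,
    lists/tuples as lists, and everything else is returned unchanged.
    """
    if isinstance(obj, Mapping):
        return {key: move_to_device(value, device) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [move_to_device(value, device) for value in obj]
    if hasattr(obj, "to"):
        return obj.to(device)  # tensors (and tensor-like objects)
    return obj  # strings, ints, None, ...
```

With the script above, usage would look like `inputs = move_to_device(inputs, 'cuda')` followed by `model.generate(**inputs, do_sample=True)`. Upgrading transformers may also be worth trying, in case the behaviour differs in a newer release.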
