hubertsiuzdak committed
Commit b67b9e8 · verified · 1 parent: 31ec661

Update README.md

Files changed (1)
  1. README.md +52 -0
README.md CHANGED
@@ -1,3 +1,55 @@
  ---
  license: mit
  ---
+
+ # [WIP] SNAC 🍿
+
+ Multi-**S**cale **N**eural **A**udio **C**odec (SNAC) compresses 44.1 kHz audio into discrete codes at a low bitrate.
+
+ See the GitHub repository: https://github.com/hubertsiuzdak/snac/
+
+ ## Overview
+
+ SNAC encodes audio into hierarchical tokens, similarly to SoundStream, EnCodec, and DAC. However, SNAC introduces a simple change: coarse tokens are sampled less frequently,
+ so each one covers a broader time span.
+
+ This not only saves bitrate but, more importantly, may be very useful for language-modeling approaches to
+ audio generation. For example, with coarse tokens at ~10 Hz and a context window of 2048 tokens, you can effectively model the
+ consistent structure of an audio track for ~3 minutes.
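
The arithmetic behind the ~3 minute figure can be checked directly. This is a minimal sketch using only the two numbers quoted above; the variable names are illustrative and not part of the SNAC API.

```python
# Back-of-the-envelope check: how much audio fits in one context window
# when a language model attends only to the coarsest token stream.
coarse_rate_hz = 10      # approximate coarse token rate quoted above (~10 Hz)
context_tokens = 2048    # context window size quoted above

seconds = context_tokens / coarse_rate_hz
minutes = seconds / 60

print(f"{seconds:.1f} s = {minutes:.1f} min")  # 204.8 s = 3.4 min
```

At the finest token rate the same window would cover far less audio, which is why the coarse stream is the natural one for modeling long-range structure.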
+
+ ## Usage
+
+ Install SNAC with pip:
+
+ ```bash
+ pip install snac
+ ```
+
+ A pretrained model that compresses audio into discrete codes at a bitrate of 2.2 kbps is available
+ at [Hugging Face](https://huggingface.co/hubertsiuzdak/snac). It uses 4 RVQ levels with token rates of 12.5, 25, 50, and
+ 100 Hz.
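
As a rough sanity check, the four token rates can be summed and compared with the stated bitrate. The sketch below uses only the figures quoted above. The codebook size is not stated here, so the bits-per-token value is merely what the 2.2 kbps figure implies, not a documented parameter.

```python
# Token rates of the 4 RVQ levels and the stated bitrate, as quoted above.
token_rates_hz = [12.5, 25, 50, 100]
bitrate_bps = 2200  # 2.2 kbps

# Total number of tokens emitted per second across all levels.
total_tokens_per_second = sum(token_rates_hz)

# Bits each token would need to carry to reach the stated bitrate.
implied_bits_per_token = bitrate_bps / total_tokens_per_second

print(total_tokens_per_second)           # 187.5
print(round(implied_bits_per_token, 2))  # 11.73
```

Roughly 11.7 bits per token would be consistent with a codebook of about 4096 entries (12 bits) if 2.2 kbps is a rounded figure, but that codebook size is an assumption here.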
+
+ To encode (and reconstruct) audio with SNAC in Python, use the following code:
+
+ ```python
+ import torch
+ from snac import SNAC
+
+ # Load the pretrained model in eval mode and move it to the GPU
+ model = SNAC.from_pretrained("hubertsiuzdak/snac").eval().cuda()
+ audio = torch.randn(1, 1, 44100).cuda()  # B, 1, T (1 second of noise at 44.1 kHz)
+
+ with torch.inference_mode():
+     # audio_hat is the reconstruction; codes holds the discrete tokens
+     audio_hat, _, codes, _, _ = model(audio)
+ ```
+
+ ⚠️ Note that `codes` is a list of token sequences of different lengths, each corresponding to a different temporal
+ resolution.
+
+ ```
+ >>> [code.shape[1] for code in codes]
+ [13, 26, 52, 104]
+ ```
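
One way to account for these lengths: for 1 second of audio the nominal counts would be 12.5, 25, 50, and 100 tokens, but 12.5 is not a whole number. The sketch below assumes that the input is padded so that the coarsest level gets a whole number of tokens and each finer level is an exact multiple of it. This padding rule is an assumption inferred from the output above, not a description of the library's internals, and `expected_lengths` is a hypothetical helper, not part of the SNAC API.

```python
import math

def expected_lengths(duration_s, token_rates_hz):
    """Hypothetical helper: token count per level, assuming the coarsest
    level is rounded up to a whole token and finer levels scale from it."""
    coarsest_rate = min(token_rates_hz)
    n_coarse = math.ceil(duration_s * coarsest_rate)
    # Each finer level has an integer multiple of the coarsest token count.
    return [n_coarse * round(rate / coarsest_rate) for rate in token_rates_hz]

lengths = expected_lengths(1.0, [12.5, 25, 50, 100])
print(lengths)  # [13, 26, 52, 104]
```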
+
+ ## Acknowledgements
+
+ Module definitions are adapted from the [Descript Audio Codec](https://github.com/descriptinc/descript-audio-codec).