hubertsiuzdak/snac_44khz

hubertsiuzdak committed on Feb 27, 2024

Commit 4bb02aa (verified) · 1 parent: e715b4f

Update README.md

Files changed (1)

README.md +8 -12

@@ -1,10 +1,12 @@
 ---
 license: mit
 ---
-# [WIP] SNAC 🍿
-Multi-**S**cale **N**eural **A**udio **C**odec (SNAC) compresses 44.1 kHz audio into discrete codes at a low bitrate.
 See GitHub repository: https://github.com/hubertsiuzdak/snac/
@@ -13,9 +15,8 @@ See GitHub repository: https://github.com/hubertsiuzdak/snac/
 SNAC encodes audio into hierarchical tokens similarly to SoundStream, EnCodec, and DAC. However, SNAC introduces a simple change where coarse tokens are sampled less frequently,
 covering a broader time span.
-This can not only save on bitrate, but more importantly this might be very useful for language modeling approaches to
-audio generation. E.g. with coarse tokens of ~10 Hz and a context window of 2048 you can effectively model a
-consistent structure of an audio track for ~3 minutes.
 ## Usage
@@ -24,11 +25,6 @@ Install it using:
 ```bash
 pip install snac
 ```
-A pretrained model that compresses audio into discrete codes at a 2.2 kbps bitrate is available
-at [Hugging Face](https://huggingface.co/hubertsiuzdak/snac). It uses 4 RVQ levels with token rates of 12.5, 25, 50, and
-100 Hz.
 To encode (and reconstruct) audio with SNAC in Python, use the following code:
 ```python
@@ -47,9 +43,9 @@ resolution.
 ```
 >>> [code.shape[1] for code in codes]
-[13, 26, 52, 104]
 ```
 ## Acknowledgements
-Module definitions are adapted from the [Descript Audio Codec](https://github.com/descriptinc/descript-audio-codec).

 ---
 license: mit
+tags:
+- audio
 ---
+# SNAC 🍿
+Multi-**S**cale **N**eural **A**udio **C**odec (SNAC) compresses audio into discrete codes at a low bitrate.
 See GitHub repository: https://github.com/hubertsiuzdak/snac/
 SNAC encodes audio into hierarchical tokens similarly to SoundStream, EnCodec, and DAC. However, SNAC introduces a simple change where coarse tokens are sampled less frequently,
 covering a broader time span.
+This model compresses 44 kHz audio into discrete codes at a 2.6 kbps bitrate. It uses 4 RVQ levels with token rates of 14, 29, 57, and
+115 Hz.
 ## Usage
 ```bash
 pip install snac
 ```
 To encode (and reconstruct) audio with SNAC in Python, use the following code:
 ```python
 ```
 >>> [code.shape[1] for code in codes]
+[16, 32, 64, 128]
 ```
 ## Acknowledgements
+Module definitions are adapted from the [Descript Audio Codec](https://github.com/descriptinc/descript-audio-codec).
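
As a rough sanity check on the figures in this diff, the stated bitrates can be recovered from the per-level token rates. The sketch below is illustrative only: it assumes each RVQ codebook holds 4096 entries (12 bits per token), which is not stated anywhere in this commit, and the helper name `bitrate_bps` is hypothetical rather than part of the `snac` package.

```python
import math

# Token rates (Hz) per RVQ level, taken from the README text in this diff.
NEW_TOKEN_RATES_HZ = [14, 29, 57, 115]      # updated README (44 kHz model)
OLD_TOKEN_RATES_HZ = [12.5, 25, 50, 100]    # removed README text

# ASSUMPTION: 4096 entries per codebook (12 bits per token); not stated in the diff.
CODEBOOK_SIZE = 4096


def bitrate_bps(token_rates_hz, codebook_size):
    """Bits per second = (tokens per second over all levels) * (bits per token)."""
    bits_per_token = math.log2(codebook_size)
    return sum(token_rates_hz) * bits_per_token


new_bps = bitrate_bps(NEW_TOKEN_RATES_HZ, CODEBOOK_SIZE)
old_bps = bitrate_bps(OLD_TOKEN_RATES_HZ, CODEBOOK_SIZE)

print(f"new: {new_bps / 1000:.2f} kbps")  # 215 tokens/s * 12 bits
print(f"old: {old_bps / 1000:.2f} kbps")  # 187.5 tokens/s * 12 bits
```

Under that assumption the two results are 2.58 kbps and 2.25 kbps, which is consistent with the 2.6 kbps and 2.2 kbps quoted in the two versions of the README.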