Small README update for clarification
README.md
CHANGED
@@ -103,8 +103,8 @@ python main.py --config-path <path_to_config> --port <port_number> --host <host_
 - `--quant-text-enc`: Quantize the T5 text encoder to the given dtype (`qint4`, `qfloat8`, `qint2`, `qint8`, `bf16`), if `bf16`, will not quantize (default: `qfloat8`).
 - `--quant-ae`: Quantize the autoencoder with float8 linear layers, otherwise will use bfloat16 (default: False).
 - `--offload-flow`: Offload the flow model to the CPU when not being used to save memory (default: False).
-- `--no-offload-ae`: Disable offloading the autoencoder to the CPU when not being used to increase e2e inference speed (default: True).
-- `--no-offload-text-enc`: Disable offloading the text encoder to the CPU when not being used to increase e2e inference speed (default: True).
+- `--no-offload-ae`: Disable offloading the autoencoder to the CPU when not being used to increase e2e inference speed (default: True [implies it will offload, setting this flag sets it to False]).
+- `--no-offload-text-enc`: Disable offloading the text encoder to the CPU when not being used to increase e2e inference speed (default: True [implies it will offload, setting this flag sets it to False]).
 - `--prequantized-flow`: Load the flow model from a prequantized checkpoint, which reduces the size of the checkpoint by about 50% & reduces startup time (default: False).
 
 ## Examples
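
The default semantics described for `--no-offload-ae` and `--no-offload-text-enc` match how a negative flag usually behaves in Python's `argparse`: the stored value defaults to `True`, and passing the flag flips it to `False`. The sketch below is a minimal illustration under that assumption. The `dest` names (`offload_ae`, `offload_text_enc`, `offload_flow`) and the `build_parser` helper are hypothetical and are not taken from this repository's `main.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with flags named like those in the README (dest names are assumed)."""
    parser = argparse.ArgumentParser()
    # store_false: the stored value defaults to True and becomes False when the flag is passed.
    parser.add_argument("--no-offload-ae", dest="offload_ae", action="store_false")
    parser.add_argument("--no-offload-text-enc", dest="offload_text_enc", action="store_false")
    # store_true: the stored value defaults to False and becomes True when the flag is passed.
    parser.add_argument("--offload-flow", dest="offload_flow", action="store_true")
    return parser


# No flags given: both "no-offload" options keep their default of True (offloading stays on).
defaults = build_parser().parse_args([])

# Passing --no-offload-ae turns autoencoder offloading off; the text encoder is unaffected.
flagged = build_parser().parse_args(["--no-offload-ae"])

print(defaults.offload_ae, defaults.offload_text_enc, defaults.offload_flow)
print(flagged.offload_ae, flagged.offload_text_enc, flagged.offload_flow)
```

Under this reading, "default: True" refers to the stored offload setting rather than to the flag itself, which is the ambiguity the README change addresses.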