boris committed on
Commit 5542365 · 1 Parent(s): 0e8338d

doc: update README

Former-commit-id: ed59fc60cf2ffe870f02931c96dc114a7c87737b

Files changed (1)
  1. README.md +62 -28
README.md CHANGED
@@ -1,42 +1,76 @@
- ## DALL-E Mini - Generate image from text

- ## Tentative Strategy of training (proposed by Luke and Suraj)

- ### Data:
- * [Conceptual 12M](https://github.com/google-research-datasets/conceptual-12m) Dataset (already loaded and preprocessed in TPU VM by Luke).
- * [YFCC100M Subset](https://github.com/openai/CLIP/blob/main/data/yfcc100m.md)
- * [Coneptual Captions 3M](https://github.com/google-research-datasets/conceptual-captions)

- ### Architecture:
- * Use the Taming Transformers VQ-GAN (with 16384 tokens)
- * Use a seq2seq (language encoder --> image decoder) model with a pretrained non-autoregressive encoder (e.g. BERT) and an autoregressive decoder (like GPT).

- ### Remaining Architecture Questions:
- * Whether to freeze the text encoder?
- * Whether to finetune the VQ-GAN?
- * Which text encoder to use (e.g. BERT, RoBERTa, etc.)?
- * Hyperparameter choices for the decoder (e.g. positional embedding, initialization, etc.)

- ## TODO

- * experiment with flax/jax and setup of the TPU instance that we should get shortly
- * work on dataset loading - [see suggested datasets](https://discuss.huggingface.co/t/dall-e-mini-version/7324/4)
- * Optionally create the OpenAI YFCC100M subset (see [this post](https://discuss.huggingface.co/t/dall-e-mini-version/7324/30?u=boris))
- * work on text/image encoding
- * concatenate inputs (not sure if we need fixed length for text or use a special token separating text & image)
- * adapt training script
- * create inference function
- * integrate CLIP for better results (only if we have the time)
- * work on a demo (streamlit or colab or maybe just HF widget)
- * document (set up repo on model hub per instructions, start on README writeup…)
- * help with coordinating activities & progress


- ## Dependencies Installation
- You should create a new python virtual environment and install the project dependencies inside the virtual env. You need to use the `-f` (`--find-links`) option for `pip` to be able to find the appropriate `libtpu` required for the TPU hardware:

  ```
  $ pip install -r requirements.txt -f https://storage.googleapis.com/jax-releases/libtpu_releases.html
  ```

  If you use `conda`, you can create the virtual env and install everything using: `conda env update -f environments.yaml`
+ ---
+ title: Dalle Mini
+ emoji: 🎨
+ colorFrom: red
+ colorTo: blue
+ sdk: streamlit
+ app_file: app/app.py
+ pinned: false
+ ---

+ # DALL-E Mini

+ _Generate images from a text prompt_

+ TODO: add some cool example

+ ## [Create my own images with the demo →](TODO)

+ ## How does it work?

+ Refer to [our report](TODO).

+ ## Development

+ This section is for the adventurous people wanting to look into the code.
+
+ ### Dependencies Installation
+
+ The root folder and associated `requirements.txt` is only for the app.
+
+ You will find necessary requirements in each sub-section.
+
+ You should create a new python virtual environment and install the project dependencies inside the virtual env. You need to use the `-f` (`--find-links`) option for `pip` to be able to find the appropriate `libtpu` required for the TPU hardware.
+
+ Adapt the installation to your own hardware and follow library installation instructions.

  ```
  $ pip install -r requirements.txt -f https://storage.googleapis.com/jax-releases/libtpu_releases.html
  ```

  If you use `conda`, you can create the virtual env and install everything using: `conda env update -f environments.yaml`
+
+ ### Training of VQGAN
+
+ The VQGAN was trained using [taming-transformers](https://github.com/CompVis/taming-transformers).
+
+ We recommend using the latest version available.
+
+ ### Conversion of VQGAN to JAX
+
+ Use [patil-suraj/vqgan-jax](https://github.com/patil-suraj/vqgan-jax).
+
+ ### Training of Seq2Seq
+
+ Refer to `seq2seq` folder (some parameters may have been hardcoded for convenience when training on our TPU VM).
+
+ ### Inference
+
+ Refer to the demo notebooks.
+ TODO: add links
+
+ ## Authors
+
+ - [Boris Dayma](https://github.com/borisdayma)
+ - [Suraj Patil](https://github.com/patil-suraj)
+ - [Pedro Cuenca](https://github.com/pcuenca)
+ - [Khalid Saifullah](https://github.com/khalidsaifullaah)
+ - [Tanishq Abraham](https://github.com/tmabraham)
+ - [Phúc Lê Khắc](https://github.com/lkhphuc)
+ - [Luke Melas](https://github.com/lukemelas)
+ - [Ritobrata Ghosh](https://github.com/ghosh-r)
+
+ ## Acknowledgements
+
+ - 🤗 Hugging Face for organizing [the FLAX/JAX community week](https://github.com/huggingface/transformers/tree/master/examples/research_projects/jax-projects)
+ - Google Cloud team for providing access to TPU's
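
Note on the `Dependencies Installation` section in this diff: the README now says that each sub-section has its own requirements and that the `-f` link is what lets `pip` find `libtpu`. A minimal sketch of how that could be wrapped for repeated use is shown below. It is not part of the repository: the `install_deps` helper and the `DRY_RUN` switch are hypothetical names introduced here for illustration, and only the `pip install -r … -f …` command and the URL come from the README.

```shell
# Sketch only: install_deps and DRY_RUN are hypothetical, not part of the repo.

# Extra links page for libtpu wheels, as given in the README.
LIBTPU_LINKS="https://storage.googleapis.com/jax-releases/libtpu_releases.html"

install_deps() {
  # $1: requirements file to install (defaults to the root requirements.txt)
  req="${1:-requirements.txt}"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Default: only print the command so it can be reviewed first.
    echo "pip install -r $req -f $LIBTPU_LINKS"
  else
    # Run the same command the README documents.
    pip install -r "$req" -f "$LIBTPU_LINKS"
  fi
}
```

One way to use it would be to create and activate a virtual environment first (for example `python3 -m venv .venv && . .venv/bin/activate`, where `.venv` is an arbitrary directory name), call `install_deps` to review the printed command, then rerun with `DRY_RUN=0`. On non-TPU hardware the README asks you to adapt the installation, so the `-f` link may not apply.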