MaartenGr committed on
Commit 2cc15b6
1 Parent(s): 0a9c604

Update docs

Files changed (1)
  1. README.md +10 -0
README.md CHANGED
@@ -11,6 +11,10 @@ pipeline_tag: text-classification
 This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
 BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
 
+* Trained on ~1_000_000 Wikipedia pages (first paragraph of each page).
+* Data was retrieved from: https://huggingface.co/datasets/Cohere/wikipedia-22-12-en-embeddings
+
+
 ## Usage
 
 To use this model, please install BERTopic:
@@ -28,6 +32,12 @@ topic_model = BERTopic.load("MaartenGr/Wikipedia")
 topic_model.get_topic_info()
 ```
 
+## Topics 2D
+
+The top 50 topics visualized and reduced to 2-dimensional space using cuML's UMAP:
+
+!["visualization.png"](visualization.png)
+
 ## Topic overview
 
 * Number of topics: 2377
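The first added bullet says the model was trained on the first paragraph of roughly one million pages from the Cohere Wikipedia dataset. The commit does not include any preprocessing code, so the block below is only a minimal sketch of that selection step, not the author's pipeline. It makes the following assumptions:

- Each row is a dict with a `text` field and a `paragraph_id` field that is 0 for a page's opening paragraph. Check these field names against the dataset card.
- `first_paragraphs` and `max_docs` are hypothetical names chosen for illustration.

```python
from itertools import islice
from typing import Iterable, List, Mapping


def first_paragraphs(rows: Iterable[Mapping], max_docs: int) -> List[str]:
    """Return the opening paragraph of each page, up to max_docs pages.

    Hypothetical helper. Assumes each row has a "text" field and a
    "paragraph_id" field that is 0 for the first paragraph of a page.
    """
    # Lazily keep only opening paragraphs, so a streamed dataset is
    # consumed just far enough to collect max_docs documents.
    openers = (row["text"] for row in rows if row["paragraph_id"] == 0)
    return list(islice(openers, max_docs))


# Toy rows shaped like the assumed schema; these are not real dataset rows.
sample_rows = [
    {"title": "Page A", "paragraph_id": 0, "text": "First paragraph of page A."},
    {"title": "Page A", "paragraph_id": 1, "text": "Second paragraph of page A."},
    {"title": "Page B", "paragraph_id": 0, "text": "First paragraph of page B."},
    {"title": "Page C", "paragraph_id": 0, "text": "First paragraph of page C."},
]

docs = first_paragraphs(sample_rows, max_docs=2)
print(docs)  # ['First paragraph of page A.', 'First paragraph of page B.']
```

Against the real dataset, `rows` could come from a streaming `datasets.load_dataset("Cohere/wikipedia-22-12-en-embeddings", split="train", streaming=True)` call, which avoids loading the full corpus into memory. That call is left out here so the sketch runs offline. The resulting list of strings is the kind of input that `BERTopic.fit_transform` accepts as documents.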