
	---
	tags:
	- bertopic
	library_name: bertopic
	pipeline_tag: text-classification
	---

	# china-only-mar11

	This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
	BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

	## Usage

	To use this model, please install BERTopic:

	```
	pip install -U bertopic
	```

	You can use the model as follows:

	```python
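	from bertopic import BERTopic

	# Load the fitted topic model from the Hugging Face Hub
	topic_model = BERTopic.load("Thang203/china-only-mar11")

	# Return a table of topics with their sizes and labels
	topic_model.get_topic_info()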
	```
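Beyond inspecting the topic table, a loaded model can assign topics to unseen text. The following is a minimal sketch, assuming the standard `BERTopic.transform` method, which returns predicted topic IDs together with probabilities. The helper name `assign_topics` and the sample sentence are illustrative and not part of this repository.

```python
def assign_topics(topic_model, docs):
    """Pair each document with the topic ID predicted by a fitted BERTopic model.

    `assign_topics` is a hypothetical helper; `topic_model` is expected to be
    the object returned by `BERTopic.load(...)`.
    """
    # transform() returns (topic IDs, probabilities); topic -1 marks outliers.
    topics, _ = topic_model.transform(docs)
    return [(doc, int(topic)) for doc, topic in zip(docs, topics)]


# Example usage (downloads the model, so it is left commented out):
# from bertopic import BERTopic
# topic_model = BERTopic.load("Thang203/china-only-mar11")
# print(assign_topics(topic_model, ["A benchmark for multimodal image understanding."]))
```

The returned IDs can be matched against the `Topic ID` column in the overview table.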

	## Topic overview

	* Number of topics: 20
	* Number of training documents: 847

	<details>
	<summary>Click here for an overview of all topics.</summary>

	| Topic ID | Topic Keywords | Topic Frequency | Label |
	|----------|----------------|-----------------|-------|
	| -1 | language - llms - models - data - large | 21 | -1_language_llms_models_data |
	| 0 | visual - image - multimodal - models - language | 205 | 0_visual_image_multimodal_models |
	| 1 | embodied - driving - navigation - robot - robotic | 142 | 1_embodied_driving_navigation_robot |
	| 2 | recommendation - user - recommendations - systems - behavior | 16 | 2_recommendation_user_recommendations_systems |
	| 3 | agents - social - bots - interactions - ai agents | 16 | 3_agents_social_bots_interactions |
	| 4 | rl - reinforcement learning - reinforcement - learning - policy | 15 | 4_rl_reinforcement learning_reinforcement_learning |
	| 5 | molecular - design - property - prediction - gnns | 17 | 5_molecular_design_property_prediction |
	| 6 | code - code generation - generation - software - programming | 11 | 6_code_code generation_generation_software |
	| 7 | medical - knowledge - medical knowledge - llms - language | 73 | 7_medical_knowledge_medical knowledge_llms |
	| 8 | extraction - information extraction - event - information - relation | 16 | 8_extraction_information extraction_event_information |
	| 9 | safety - llms - robustness - instructions - assurance | 15 | 9_safety_llms_robustness_instructions |
	| 10 | reasoning - prompting - cot - llms - chainofthought | 14 | 10_reasoning_prompting_cot_llms |
	| 11 | knowledge - language - knowledge graph - web - kg | 52 | 11_knowledge_language_knowledge graph_web |
	| 12 | question - answering - commonsense - question answering - knowledge | 17 | 12_question_answering_commonsense_question answering |
	| 13 | models - language - model - training - language models | 18 | 13_models_language_model_training |
	| 14 | dialogue - dialog - models - responses - model | 104 | 14_dialogue_dialog_models_responses |
	| 15 | detection - fake - news - detectors - texts | 31 | 15_detection_fake_news_detectors |
	| 16 | chatgpt - sentiment - evaluation - sentiment analysis - human | 16 | 16_chatgpt_sentiment_evaluation_sentiment analysis |
	| 17 | chinese - evaluation - models - language - language models | 22 | 17_chinese_evaluation_models_language |
	| 18 | translation - arabic - languages - language - models | 26 | 18_translation_arabic_languages_language |

	</details>
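The frequencies in the table can be cross-checked against the totals stated at the top of this section. The snippet below is a small sketch that uses only the counts listed in the table; the variable names are illustrative.

```python
# Topic frequencies copied from the overview table (key: topic ID, -1 = outliers).
topic_counts = {
    -1: 21, 0: 205, 1: 142, 2: 16, 3: 16, 4: 15, 5: 17, 6: 11, 7: 73, 8: 16,
    9: 15, 10: 14, 11: 52, 12: 17, 13: 18, 14: 104, 15: 31, 16: 16, 17: 22, 18: 26,
}

# The counts should add up to the stated number of training documents.
total_docs = sum(topic_counts.values())
# The stated topic count includes the outlier topic -1.
n_topics = len(topic_counts)

# Share of the corpus covered by the three largest topics.
largest = sorted(topic_counts.items(), key=lambda kv: kv[1], reverse=True)[:3]
for topic_id, count in largest:
    print(f"topic {topic_id}: {count} docs ({count / total_docs:.1%})")
```

The counts sum to 847 across 20 rows, matching the stated totals. Topics 0, 1, and 14 together cover 451 documents, just over half of the corpus.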

	## Training hyperparameters

	* calculate_probabilities: False
	* language: english
	* low_memory: False
	* min_topic_size: 10
	* n_gram_range: (1, 1)
	* nr_topics: 20
	* seed_topic_list: None
	* top_n_words: 10
	* verbose: True
	* zeroshot_min_similarity: 0.7
	* zeroshot_topic_list: None
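
As a rough guide to reproducing this configuration, the values above map onto keyword arguments of the `BERTopic` constructor. This is a sketch under assumptions: the embedding model, the UMAP and HDBSCAN settings, any custom vectorizer, and the training corpus are not stated in this card, so library defaults would apply and the resulting topics would likely differ from the published model. The helper name `build_model` is illustrative.

```python
# Hyperparameters copied from the list above.
hyperparams = {
    "calculate_probabilities": False,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": 20,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
    "zeroshot_min_similarity": 0.7,
    "zeroshot_topic_list": None,
}


def build_model():
    """Create an unfitted BERTopic model with the settings listed above."""
    # Imported lazily so the dictionary can be inspected without bertopic installed.
    from bertopic import BERTopic

    return BERTopic(**hyperparams)
```

Calling `build_model().fit_transform(docs)` on a list of strings would then train a new model. Note that the topic table contains two-word keywords such as "reinforcement learning" even though `n_gram_range` is `(1, 1)`, which suggests, though the card does not confirm it, that a separate vectorizer was supplied during training.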

	## Framework versions

	* Numpy: 1.25.2
	* HDBSCAN: 0.8.33
	* UMAP: 0.5.5
	* Pandas: 1.5.3
	* Scikit-Learn: 1.2.2
	* Sentence-transformers: 2.6.1
	* Transformers: 4.38.2
	* Numba: 0.58.1
	* Plotly: 5.15.0
	* Python: 3.10.12
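
Version mismatches are a common cause of loading problems, so it can help to compare the local environment with the versions listed above. The following sketch uses only the standard library; the PyPI distribution names (for example `umap-learn` for UMAP) and the helper name `compare_versions` are assumptions added here, not part of the original card.

```python
import platform
from importlib import metadata

# Versions copied from the list above, keyed by PyPI distribution name.
expected_versions = {
    "numpy": "1.25.2",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.5",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.6.1",
    "transformers": "4.38.2",
    "numba": "0.58.1",
    "plotly": "5.15.0",
}


def compare_versions(expected):
    """Return {package: (expected, installed)}; installed is None if missing."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report


print(f"Python: expected 3.10.12, running {platform.python_version()}")
for name, (wanted, installed) in compare_versions(expected_versions).items():
    status = "ok" if installed == wanted else "differs"
    print(f"{name}: expected {wanted}, installed {installed} ({status})")
```

A differing version does not necessarily prevent the model from loading; the report is only a starting point for troubleshooting.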