---
tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
---

# cnn_dailymail_123_3000_1500_train

This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

## Usage

To use this model, please install BERTopic:

```
pip install -U bertopic
```

You can use the model as follows:

```python
from bertopic import BERTopic
topic_model = BERTopic.load("KingKazma/cnn_dailymail_123_3000_1500_train")

topic_model.get_topic_info()
```
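
Beyond inspecting the topic table, a loaded model can assign topics to unseen text. The helper below is a minimal sketch of that step: it relies on BERTopic's `transform` and `get_topic` methods, while the function name `assign_topics` and the `n_words` parameter are illustrative and not part of the library.

```python
def assign_topics(topic_model, docs, n_words=5):
    """Pair each document with its predicted topic ID and top keywords."""
    # transform() returns one topic ID per document plus probabilities.
    topics, _probs = topic_model.transform(docs)
    results = []
    for doc, topic_id in zip(docs, topics):
        # get_topic() returns (word, score) pairs, or False for an unknown ID.
        words = topic_model.get_topic(int(topic_id)) or []
        keywords = [word for word, _score in words[:n_words]]
        results.append((doc, int(topic_id), keywords))
    return results
```

With `topic_model` loaded as shown above, `assign_topics(topic_model, ["some news article text"])` returns a list of `(document, topic_id, keywords)` tuples. A topic ID of `-1` marks a document that BERTopic treats as an outlier.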

## Topic overview

* Number of topics: 57
* Number of training documents: 3000

<details>
  <summary>Click here for an overview of all topics.</summary>

| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | said - one - police - people - year | 10 | -1_said_one_police_people |
| 0 | league - player - cup - goal - game | 1070 | 0_league_player_cup_goal |
| 1 | police - said - home - murder - found | 320 | 1_police_said_home_murder |
| 2 | court - mr - said - year - sex | 142 | 2_court_mr_said_year |
| 3 | obama - president - republicans - house - republican | 113 | 3_obama_president_republicans_house |
| 4 | plane - flight - passenger - airport - aircraft | 89 | 4_plane_flight_passenger_airport |
| 5 | hospital - care - family - baby - mr | 59 | 5_hospital_care_family_baby |
| 6 | fashion - dress - style - look - collection | 57 | 6_fashion_dress_style_look |
| 7 | mr - minister - cameron - party - labour | 50 | 7_mr_minister_cameron_party |
| 8 | weight - diet - food - fat - school | 49 | 8_weight_diet_food_fat |
| 9 | mars - space - climate - nasa - mission | 43 | 9_mars_space_climate_nasa |
| 10 | apple - ipad - iphone - app - apples | 41 | 10_apple_ipad_iphone_app |
| 11 | shark - dolphin - fish - coast - water | 39 | 11_shark_dolphin_fish_coast |
| 12 | teacher - school - student - said - state | 37 | 12_teacher_school_student_said |
| 13 | murray - wimbledon - win - champion - match | 36 | 13_murray_wimbledon_win_champion |
| 14 | race - prix - hamilton - gold - world | 33 | 14_race_prix_hamilton_gold |
| 15 | dog - animal - owner - dogs - tiger | 32 | 15_dog_animal_owner_dogs |
| 16 | syrian - syria - isis - islamic - force | 32 | 16_syrian_syria_isis_islamic |
| 17 | storm - weather - lava - snow - said | 32 | 17_storm_weather_lava_snow |
| 18 | chocolate - sale - cent - online - caramel | 32 | 18_chocolate_sale_cent_online |
| 19 | afghanistan - afghan - pakistan - herat - taliban | 32 | 19_afghanistan_afghan_pakistan_herat |
| 20 | music - band - halen - song - album | 30 | 20_music_band_halen_song |
| 21 | beach - island - resort - park - hotel | 29 | 21_beach_island_resort_park |
| 22 | mcilroy - golf - round - shot - hole | 27 | 22_mcilroy_golf_round_shot |
| 23 | text - data - nsa - credit - email | 26 | 23_text_data_nsa_credit |
| 24 | show - film - movie - actor - griffiths | 26 | 24_show_film_movie_actor |
| 25 | putin - russian - russia - ukraine - moscow | 26 | 25_putin_russian_russia_ukraine |
| 26 | art - artist - work - painting - pinata | 25 | 26_art_artist_work_painting |
| 27 | economy - eurozone - european - euro - debt | 24 | 27_economy_eurozone_european_euro |
| 28 | north - kim - korea - korean - jong | 24 | 28_north_kim_korea_korean |
| 29 | ebola - virus - liberia - africa - outbreak | 22 | 29_ebola_virus_liberia_africa |
| 30 | bike - speed - road - driver - cyclist | 22 | 30_bike_speed_road_driver |
| 31 | car - accident - driver - scene - crash | 20 | 31_car_accident_driver_scene |
| 32 | price - london - house - home - property | 20 | 32_price_london_house_home |
| 33 | al - qaeda - yemen - us - yemeni | 20 | 33_al_qaeda_yemen_us |
| 34 | mrs - police - murder - greaves - mr | 20 | 34_mrs_police_murder_greaves |
| 35 | per - cent - people - age - average | 19 | 35_per_cent_people_age |
| 36 | philpott - court - berry - husband - dewani | 18 | 36_philpott_court_berry_husband |
| 37 | facebook - photo - user - instagram - cuddle | 17 | 37_facebook_photo_user_instagram |
| 38 | vaccine - meningitis - disease - flu - princeton | 17 | 38_vaccine_meningitis_disease_flu |
| 39 | bear - lion - gorilla - cub - zoo | 16 | 39_bear_lion_gorilla_cub |
| 40 | brain - drug - alzheimers - memory - patient | 16 | 40_brain_drug_alzheimers_memory |
| 41 | prince - royal - queen - duchess - duke | 16 | 41_prince_royal_queen_duchess |
| 42 | boat - ship - river - vessel - ferry | 15 | 42_boat_ship_river_vessel |
| 43 | china - chinese - chinas - organ - hong | 14 | 43_china_chinese_chinas_organ |
| 44 | egypt - election - egyptian - mubarak - protest | 13 | 44_egypt_election_egyptian_mubarak |
| 45 | mexico - mexican - cartel - mexicos - drug | 13 | 45_mexico_mexican_cartel_mexicos |
| 46 | cia - assange - snowden - us - interrogation | 13 | 46_cia_assange_snowden_us |
| 47 | police - hartman - hore - store - maitua | 13 | 47_police_hartman_hore_store |
| 48 | israeli - israel - palestinian - gaza - hamas | 12 | 48_israeli_israel_palestinian_gaza |
| 49 | pension - tax - scheme - energy - cent | 12 | 49_pension_tax_scheme_energy |
| 50 | council - neighbour - village - site - shed | 12 | 50_council_neighbour_village_site |
| 51 | occupy - protester - york - cosby - mayor | 11 | 51_occupy_protester_york_cosby |
| 52 | mould - allergic - allergy - reaction - hand | 11 | 52_mould_allergic_allergy_reaction |
| 53 | boko - haram - nigeria - sudan - isis | 11 | 53_boko_haram_nigeria_sudan |
| 54 | disaster - building - tsunami - people - quake | 11 | 54_disaster_building_tsunami_people |
| 55 | castro - sloot - der - ariel - aruba | 11 | 55_castro_sloot_der_ariel |

</details>
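
The frequencies in the overview are raw document counts, so it can help to read them as shares of the 3000 training documents. The snippet below is a small sketch of that arithmetic for the first few rows of the table; the names `topic_frequency`, `n_documents`, and `topic_share` are illustrative only.

```python
# Document counts copied from the first rows of the overview table
# (topic ID -> Topic Frequency).
topic_frequency = {-1: 10, 0: 1070, 1: 320, 2: 142, 3: 113}
n_documents = 3000  # "Number of training documents" listed above


def topic_share(topic_id):
    """Fraction of training documents assigned to the given topic."""
    return topic_frequency[topic_id] / n_documents


for topic_id, count in topic_frequency.items():
    print(f"topic {topic_id:>2}: {count:>4} docs, {topic_share(topic_id):.1%}")
```

Read this way, topic 0 (football) covers roughly 35.7% of the training documents, while the outlier topic `-1` holds only about 0.3%.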

## Training hyperparameters

* calculate_probabilities: True
* language: english
* low_memory: False
* min_topic_size: 10
* n_gram_range: (1, 1)
* nr_topics: None
* seed_topic_list: None
* top_n_words: 10
* verbose: False
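
The settings listed above correspond to keyword arguments of the `BERTopic` constructor. The following is a sketch of how a fresh model with the same settings could be created; the names `HYPERPARAMETERS` and `build_untrained_model` are illustrative. It is an assumption that this alone would not reproduce the published topics, since the card does not list the embedding model, the UMAP and HDBSCAN settings, or the exact training documents.

```python
# Values copied from the hyperparameter list above.
HYPERPARAMETERS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_untrained_model():
    """Create a fresh, unfitted BERTopic instance with the listed settings."""
    # Imported lazily so the dictionary can be inspected without BERTopic installed.
    from bertopic import BERTopic

    return BERTopic(**HYPERPARAMETERS)
```

Calling `build_untrained_model().fit_transform(docs)` on your own list of documents would then train a new model under these settings.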

## Framework versions

* Numpy: 1.22.4
* HDBSCAN: 0.8.33
* UMAP: 0.5.3
* Pandas: 1.5.3
* Scikit-Learn: 1.2.2
* Sentence-transformers: 2.2.2
* Transformers: 4.31.0
* Numba: 0.56.4
* Plotly: 5.13.1
* Python: 3.10.6
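
If loading or inference behaves unexpectedly, comparing the local environment against the versions above may help narrow things down. The check below is a minimal sketch using only the standard library; the helper name `version_mismatches` is illustrative, and the mapping from the names in the list to PyPI distribution names (for example, UMAP to `umap-learn`) is an assumption of this example.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above, keyed by PyPI distribution name.
EXPECTED_VERSIONS = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}


def version_mismatches(expected=EXPECTED_VERSIONS):
    """Return {package: (expected, installed)} for packages that differ."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


for package, (wanted, installed) in version_mismatches().items():
    print(f"{package}: card lists {wanted}, found {installed or 'not installed'}")
```

A mismatch does not necessarily mean the model will fail to load; it only shows where the environment differs from the one used for training.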