TeresaK committed
Commit 5d4054c · verified · 1 Parent(s): a0713be

Upload 35 files
.gitattributes CHANGED
@@ -1,35 +1,2 @@
- *.7z filter=lfs diff=lfs merge=lfs -text
- *.arrow filter=lfs diff=lfs merge=lfs -text
- *.bin filter=lfs diff=lfs merge=lfs -text
- *.bz2 filter=lfs diff=lfs merge=lfs -text
- *.ckpt filter=lfs diff=lfs merge=lfs -text
- *.ftz filter=lfs diff=lfs merge=lfs -text
- *.gz filter=lfs diff=lfs merge=lfs -text
- *.h5 filter=lfs diff=lfs merge=lfs -text
- *.joblib filter=lfs diff=lfs merge=lfs -text
- *.lfs.* filter=lfs diff=lfs merge=lfs -text
- *.mlmodel filter=lfs diff=lfs merge=lfs -text
- *.model filter=lfs diff=lfs merge=lfs -text
- *.msgpack filter=lfs diff=lfs merge=lfs -text
- *.npy filter=lfs diff=lfs merge=lfs -text
- *.npz filter=lfs diff=lfs merge=lfs -text
- *.onnx filter=lfs diff=lfs merge=lfs -text
- *.ot filter=lfs diff=lfs merge=lfs -text
- *.parquet filter=lfs diff=lfs merge=lfs -text
- *.pb filter=lfs diff=lfs merge=lfs -text
- *.pickle filter=lfs diff=lfs merge=lfs -text
- *.pkl filter=lfs diff=lfs merge=lfs -text
- *.pt filter=lfs diff=lfs merge=lfs -text
- *.pth filter=lfs diff=lfs merge=lfs -text
- *.rar filter=lfs diff=lfs merge=lfs -text
- *.safetensors filter=lfs diff=lfs merge=lfs -text
- saved_model/**/* filter=lfs diff=lfs merge=lfs -text
- *.tar.* filter=lfs diff=lfs merge=lfs -text
- *.tar filter=lfs diff=lfs merge=lfs -text
- *.tflite filter=lfs diff=lfs merge=lfs -text
- *.tgz filter=lfs diff=lfs merge=lfs -text
- *.wasm filter=lfs diff=lfs merge=lfs -text
- *.xz filter=lfs diff=lfs merge=lfs -text
- *.zip filter=lfs diff=lfs merge=lfs -text
- *.zst filter=lfs diff=lfs merge=lfs -text
- *tfevents* filter=lfs diff=lfs merge=lfs -text
+ database/document_store.pkl filter=lfs diff=lfs merge=lfs -text
+ data/inc_df_v6_small_4.csv filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,21 @@
+ # DS Store
+ data/.DS_Store
+ .DS_Store
+
+ __pycache__/
+
+ # sandbox
+ sandbox.py
+
+ # Environment Files
+ .env
+ .incenv
+
+ archive/
+
+ .env_example
+ .pre-commit-config.yaml
+ incenv/
+
+
+
.streamlit/config.toml ADDED
@@ -0,0 +1,7 @@
+ [theme]
+ primaryColor="#b50d1c"
+ backgroundColor="#FFFFFF"
+ secondaryBackgroundColor="#eceded"
+ textColor="#262730"
+ font="sans serif"
+ base="light"
README.md CHANGED
@@ -1,12 +1,82 @@
- ---
- title: NegotiateAI
- emoji: 🏆
- colorFrom: green
- colorTo: green
- sdk: streamlit
- sdk_version: 1.38.0
- app_file: app.py
- pinned: false
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # INC Plastic Treaty App
+
+ ## Set up Development
+
+ ### Install poetry environment
+
+ Please install [poetry](https://python-poetry.org/docs/) before executing the following command:
+
+ ```bash
+ poetry install
+ ```
+
+ #### Install pre-commit hooks
+
+ ```bash
+ pre-commit install
+ pre-commit
+ ```
+
+ #### Start app
+
+ - Execute the following command, replacing ```v1``` with the app version you want to run:
+
+ ```bash
+ streamlit run src/app/v1/app.py
+ ```
+
+ ## Data Overview
+
+ - Taxonomy-related data
+   - ```data/authors_taxonomy.json```: raw authors taxonomy (countries, state associations, INC, observers)
+   - ```data/draft_cat_taxonomy.json```: raw draft category taxonomy
+   - ```data/authors_filter.json```: processed authors taxonomy for frontend filtering
+   - ```data/draftcat_taxonomy_filter.json```: processed draft category taxonomy for frontend filtering
+   - ```data/inc_df_v6_small.csv```: processed data from scraping
+   - ```data/inc_df.csv```: data for the document store
+   - ```data/taxonomies.txt```: raw taxonomies collection
+ - Application-related data
+   - ```database/document_store.pkl```: document store
+   - ```database/meta_data.csv```: metadata of the document store, with countries and draft labs as columns
+
+ ## Code Structure
+
+ ### Data Preprocessing
+
+ - ```src/data_processing/document_store_data.py```: Applies the final processing steps and generates the data for the document store.
+ - ```src/data_processing/get_meta_data_filter.py```: Generates the metadata from the document store for filtering in the frontend.
+ - ```src/data_processing/taxonomy_processing.py```: Transforms the taxonomies into the filters used in the frontend.
+
+ ### Frontend
+
+ - ```src/app/```: Versions of the app.
+ - ```src/utils```: Functions used in the frontend application. Check the imports to see which functions are used where.
+ - ```styles```: CSS styles for the apps.
+ - ```.streamlit```: Basic theme of the frontend.
+ - Some settings and changes related to the Spaces app have to be made directly in the respective Streamlit app.
+
+ ### Backend
+
+ - ```src/document_store/document_store.py```: Generates the document store.
+ - ```src/rag/pipeline.py```: RAG pipeline.
+ - ```src/rag/prompt/prompt_template.yaml```: Prompt template for the RAG pipeline.
+
+ ## Changing the Document Storage
+
+ - **Step 1**: Delete ```database/document_store.pkl``` and ```database/meta_data.csv```, or rename them if you want to keep them until the document storage has been changed successfully.
+ - **Step 2**: Run the script ```src/data_processing/document_store_data.py```. This updates ```data/inc_df.csv```. Changes to ```data/inc_df_small_v6.csv``` have to be made either manually or with a new script that implements them.
+ - **Step 3**: Run the script ```src/data_processing/get_meta_data_filter.py```. This saves a new document store and the metadata of the document store.
+ - **Step 4**: If the changes also affect the taxonomy, update the taxonomies as well: first update ```data/authors_taxonomy.json``` and ```data/draftcat_taxonomy.json``` manually, then run the script ```src/data_processing/taxonomy_processing.py```.
+ - Frequent bugs after changing the data:
+   - A new or renamed country or draft lab category. Solution: check the taxonomy files and update ```src/data_processing/taxonomy_processing.py```.
+   - Countries with special characters. Solution: if ```src/data_processing/document_store_data.py``` fails, check ```data/inc_df_small_v6.csv```. If the frontend application fails after the changes, check ```database/meta_data.csv``` together with the taxonomies and filters.
+
+ ## Changing the App
+
+ - Always test changes to the app locally before pushing them to the respective Spaces app.
+ - To test the Spaces app locally, you can clone it like any Git repository. Avoid making changes directly in the Spaces web interface.
+ - When copying changes from Git to Spaces, copy only the files you have changed, and make the following adjustments before copying them:
+   - Remove all OpenAI keys from the app.py and pipeline.py files.
+   - Make sure pipeline.py contains ```OPENAI_API_KEY = os.environ.get("OPEN_API_KEY")```.
+   - Remove the ```src``` prefix from the imports in app.py and pipeline.py; otherwise you will get a ModuleNotFoundError.
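The key-handling adjustment above amounts to reading the OpenAI key from the environment instead of hard-coding it (Spaces exposes secrets to the app as environment variables). A minimal sketch, assuming the secret is stored under the `OPEN_API_KEY` name used in the README; the helper `get_openai_api_key` is hypothetical and not part of the repository. Unlike a bare `os.environ.get`, it fails at startup when the secret is missing instead of passing `None` on to the pipeline.

```python
import os


def get_openai_api_key(var_name: str = "OPEN_API_KEY") -> str:
    """Return the OpenAI key from the environment (hypothetical helper).

    Raises instead of returning None, so a missing or empty Spaces secret
    surfaces immediately rather than on the first model request.
    """
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(
            f"Environment variable {var_name!r} is not set; add it as a secret "
            "in the Space settings or export it before running the app locally."
        )
    return key


# In pipeline.py this would replace any hard-coded key, e.g.:
# OPENAI_API_KEY = get_openai_api_key()
```

Locally, the same code works once the variable is exported in the shell (or loaded from the git-ignored `.env` file), so app.py and pipeline.py need no edits between the Git repository and the Space for the key itself.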
data/authors_filter.json ADDED
@@ -0,0 +1,518 @@
+ {
+ "Members - Countries": [
+ "Algeria",
+ "Angola",
+ "Antigua And Barbuda",
+ "Argentina",
+ "Armenia",
+ "Australia",
+ "Azerbaijan",
+ "Bahrain",
+ "Bangladesh",
+ "Barbados",
+ "Benin",
+ "Bosnia And Herzegovina",
+ "Brazil",
+ "Burkina Faso",
+ "Cambodia",
+ "Cameroon",
+ "Canada",
+ "Chile",
+ "China",
+ "Colombia",
+ "Comoros",
+ "Cook Islands",
+ "Costa Rica",
+ "Cuba",
+ "Cote D'Ivoire",
+ "Democratic Republic Of The Congo",
+ "Dominican Republic",
+ "Ecuador",
+ "Egypt",
+ "El Salvador",
+ "Equatorial Guinea",
+ "Eritrea",
+ "Eswatini",
+ "Ethiopia",
+ "Fiji",
+ "France",
+ "Gabon",
+ "Gambia",
+ "Georgia",
+ "Germany",
+ "Ghana",
+ "Grenada",
+ "Guatemala",
+ "Guinea",
+ "Guinea Bissau",
+ "Honduras",
+ "Iceland",
+ "India",
+ "Indonesia",
+ "Iraq",
+ "Islamic Republic Of Iran",
+ "Israel",
+ "Italy",
+ "Jamaica",
+ "Japan",
+ "Jordan",
+ "Kazakhstan",
+ "Kenya",
+ "Kiribati",
+ "Kuwait",
+ "Libya",
+ "Madagascar",
+ "Malawi",
+ "Malaysia",
+ "Maldives",
+ "Mali",
+ "Marshall Islands",
+ "Mauritius",
+ "Mexico",
+ "Micronesia",
+ "Moldova",
+ "Monaco",
+ "Mongolia",
+ "Montenegro",
+ "Morocco",
+ "Nauru",
+ "Nepal",
+ "New Zealand",
+ "Niger",
+ "Nigeria",
+ "North Macedonia",
+ "Norway",
+ "Oman",
+ "Pakistan",
+ "Palau",
+ "Panama",
+ "Papua New Guinea",
+ "Paraguay",
+ "Peru",
+ "Philippines",
+ "Qatar",
+ "Republic Of Congo",
+ "Republic Of Korea",
+ "Republic Of Moldova",
+ "Romania",
+ "Russian Federation",
+ "Rwanda",
+ "Samoa",
+ "Saudi Arabia",
+ "Senegal",
+ "Seychelles",
+ "Sierra Leone",
+ "Singapore",
+ "Solomon Islands",
+ "Somalia",
+ "South Africa",
+ "Sri Lanka",
+ "State Of Palestine",
+ "Switzerland",
+ "Syrian Arab Republic",
+ "Thailand",
+ "Timor-Leste",
+ "Togo",
+ "Tonga",
+ "Trinidad And Tobago",
+ "Tunisia",
+ "Turkey",
+ "Tuvalu",
+ "Uganda",
+ "Ukraine",
+ "United Arab Emirates",
+ "United Kingdom Of Great Britain And Northern Ireland",
+ "United Republic Of Tanzania",
+ "United States Of America",
+ "Uruguay",
+ "Venezuela",
+ "Vietnam",
+ "Yemen",
+ "Zambia",
+ "Zimbabwe"
+ ],
+ "Members - International and Regional State Associations": [
+ "Asia-Pacific States",
+ "European Union (EU) And Its 27 Member States",
+ "Federated States Of Micronesia",
+ "Gulf Cooperation Council",
+ "High Ambition Coalition",
+ "Latin American And Caribbean Group (GRULAC)",
+ "Like Minded Countries",
+ "Pacific Small Island Developing States (PSIDS)",
+ "Alliance Of Small Island States (AOSIS)",
+ "The Federated States Of Micronesia",
+ "The Group Of African States",
+ "Western European And Other States",
+ "Coordinating Body on the Seas of East Asia (COBSEA)"
+ ],
+ "Intergovernmental Negotiation Committee": [
+ "Executive Secretary Of The INC",
+ "Intergovernmental Negotiation Committee"
+ ],
+ "Observers and Other Participants": [
+ "10YFP Secretariat (One Planet Network)",
+ "ABIPLAST - Brazilian Association Of The Plastic Industry",
+ "ACTS Organization",
+ "ASTM International (ASTM)",
+ "AWTAD Anti-Corruption Organization",
+ "Aaina",
+ "Action Et Education Pour Tous (AEPT)",
+ "Action On Smoking And Health (ASH)",
+ "Africa Climate And Environment Foundation (ACEF)",
+ "African Alliance For Health Research Economic Development",
+ "African Environmental Network",
+ "African Petroleum Producers\u2019 Organization (APPO)",
+ "Aguaclara",
+ "Alianza Basura Cero",
+ "Alianza Latinoamericana De Asociaciones De La Industria De Alimentos Y Bebidas (ALAIAB)",
+ "All-China Environment Federation",
+ "Alliance Panafricaine Multi-Acteurs Sur La Pollution Plastique",
+ "Alliance Pour Le Controle Du Tabac En Afrique (ACTA)",
+ "American Chemistry Council",
+ "American Fuel and Petrochemical Manufacturers (AFPM)",
+ "And Colectivo Jaguares De Nuestra Madre Tierra",
+ "Aotearoa Plastic Pollution Alliance (APPA)",
+ "Arctic Monitoring And Assessment Programme (AMAP)",
+ "Armenian Women For Health And Healthy Environment",
+ "Arnika",
+ "Asia Indigenous Peoples Pact And Indigenous Peoples",
+ "Asian Marine Conservation Association",
+ "Asociaci\u00f3n Ecol\u00f3gica Santo Tom\u00e1s",
+ "Asociaci\u00f3n Sustentar",
+ "Association For Promoting Sustainability In Campuses And Communities (APSCC)",
+ "Association For Rural Area Social Modification, Improvement And Nestling (ARASMIN)",
+ "Association For Supporting The SDGs For The United Nations",
+ "Association Institute Of Total Environment (INTEV)",
+ "Association Nationale Du Civisme",
+ "Association Of Leadership You-Lean",
+ "Association Of Plastic Recyclers",
+ "Association Of Solidarity Through Humanitarian Imperative Actions International (ASHIA)",
+ "Association Pour L'Integration Et La Developpement Durable Au Burundi",
+ "Australian Packaging Covenant Organisation (APCO)",
+ "Azul",
+ "BAN Toxics",
+ "BVRio",
+ "Barranquilla + 20",
+ "Basel Action Network (BAN)",
+ "Beijing Greenovation Institute For Public Welfare Development (GHub)",
+ "Brazilian Chemical Industry Association",
+ "Break Free From Plastic",
+ "Breathe Free Detroit",
+ "Bridgers Association Cameroon",
+ "Bureau Of International Recycling",
+ "Business And Industry Major Group Presented By World Business Council For Sustainable Development",
+ "Business Coalition For A Global Plastics Treaty Convened By The Ellen MacArthur Foundation And WWF, In Collaboration With Aligned Businesses And Financial Institutions, And Supported By NGO Partners",
+ "CAIRPLAS, C\u00e1mara Argentina De La Industria De Reciclados Plasticos",
+ "CDP",
+ "CEMPRE Colombia \u2013 Compromiso Empresarial Para El Reciclaje",
+ "CESTA",
+ "Carrizo Comecrudo Tribe",
+ "Center For Biological Diversity",
+ "Center For Coalfield Justice",
+ "Center For International Environmental Law (CIEL)",
+ "Center For International Law And Governance",
+ "Center For Islamic Studies Of Universitas Nasional, Jakarta",
+ "Center For Oceanic Awareness, Research, And Education (COARE)",
+ "Center For Public Health And Environmental Development (CEPHED)",
+ "Centre De Recherche Et D\u2019Education Pour Le D\u00e9veloppement (CREPD)",
+ "Centre D\u2019Accompagnement Des Alternatives Locales De D\u00e9veloppement (Caald)",
+ "Centre For Environmental Justice",
+ "Centre For Human Rights And Climate Change Research",
+ "Centre For Science And Environment (CSE)",
+ "Centre International De Droit Compar\u00e9 De L\u2019environnement",
+ "Centre International De Droit Compar\u00e9 De L\u2019environnement",
+ "Chemical And Allied Industries\u2019 Association (CAIA)",
+ "Chia Funkuin Foundation",
+ "Children And Youth Major Group (CYMG)",
+ "Children\u2019s Environmental Health Foundation",
+ "China Biodiversity Conservation And Green Development Foundation (CBCGDF)",
+ "Circular Economy For Flexible Packaging In Europe Initiative (CEFLEX)",
+ "Citeo",
+ "Citizen Consumer And Civic Action Group (CAG)",
+ "Civil Society And Rights Holder Coalition",
+ "Civil Society Organizations In Africa",
+ "Civil Society Organizations In Asia Pacific",
+ "Civil Society Organizations In Latin America",
+ "Co-habiter",
+ "Comision Centro Americana",
+ "Comit\u00e9 National Contre Le Tabagisme",
+ "Community Action Against Plastic Waste (CAPws)",
+ "Congregations Of St Joseph",
+ "Consciente Colectivo",
+ "Consumer Goods Forum",
+ "Consumers International",
+ "Contact Group 1",
+ "Contact Group 2",
+ "Convention On Biological Diversity (CBD)",
+ "Corporate Accountability",
+ "Council For Scientific And Industrial Research",
+ "C\u00e1mara De La Industria Qu\u00edmica Y Petroqu\u00edmica (CIQyP\u00ae), Miembro Del Concejo Internacional De Asociaciones Qu\u00edmicas (ICCA) Y EURECA",
+ "Danimer Scientific",
+ "EDANA And INDA",
+ "EPS Branchen-en Del Af Plastindustrien (Denmark)",
+ "EPS Industry Alliance",
+ "EURECA",
+ "Earth Day",
+ "Earth Law Center",
+ "Ecoplas",
+ "Ecoproject Partnership",
+ "Ellen MacArthur Foundation",
+ "Endocrine Society",
+ "Engineers Australia",
+ "Entidad Especializada En Pl\u00e1sticos Y Medio Ambiente Para Una Econom\u00eda Circular (ECOPLAS)",
+ "Entidades Unidas Reafirmando La Econom\u00eda Circular En Argentina (EURECA)",
+ "Environmental And Social Development Organization (ESDO)",
+ "Environmental Coalition On Standards (ECOS)",
+ "Environmental Development Association (FASEEL)",
+ "Environmental Investigation Agency",
+ "European Bioplastics (EUBP)",
+ "European Manufacturers Of EPS",
+ "European Manufacturers Of Expanded Polystyrene (EUMEPS)",
+ "Expanded Polystyrene Australia",
+ "Fauna and Flora International",
+ "Fauna and Flora International And Zoological Society Of London (ZSL)",
+ "Fenceline Watch",
+ "First Modern Agro Tools Common Initiative Group ( FI.MO.AT.C.I.G)",
+ "Fondation De La Mer",
+ "Food And Agriculture Organization Of The United Nations (FAO)",
+ "Food And Livestock Initiative (FLI Asbl)",
+ "Forum On Trade, Environment and The SDGs (TESS)",
+ "Foundation Of Fokus Nexus Tiga (Nexus3 Foundation)",
+ "French Water Partnership (Partenariat Fran\u00e7ais Pour L\u2019eau)",
+ "Friends World Committee For Consultation (FWCC)",
+ "Fronteras Comunes",
+ "Fundacion Avina",
+ "Fundaci\u00f3n Ambiente Y Recursos Naturales",
+ "Fundaci\u00f3n Interamericana Del Coraz\u00f3n (FIC)",
+ "GRID-Arendal",
+ "Galapagos Conservation Trust",
+ "Gallifrey Foundation",
+ "Geneva Cities Hub (GCH)",
+ "Gerakan Indonesia Diet Kantong Plastik (GIDKP) - The Indonesia Plastic Bag Diet Movement",
+ "Global Alliance For Incinerator Alternatives (GAIA)",
+ "Global Alliance On Health And Pollution (GAHP)",
+ "Global Cement And Concrete Association",
+ "Global Council For Science And The Environment (GCSE)",
+ "Global Plastics Policy Centre, University Of Portsmouth",
+ "Global Youth Coalition On Plastic Pollution (GYCPP)",
+ "Green Africa Youth Organization",
+ "Greenpeace International",
+ "GroundWork South Africa (GroundWorkSA)",
+ "Haitelmex Foundation",
+ "Hasiru Dala In Collaboration With Eleven Other Civil Society Organizations",
+ "Health And Environment Justice Support (HEJSupport)",
+ "Health Care Without Harm (HCWH)",
+ "Healthy Hospitals Project - PHS",
+ "Human Rights Watch",
+ "ICLEI - Local Governments For Sustainability",
+ "India Institute For Critical Action Centre In Movement (CACIM)",
+ "India Water Foundation",
+ "India Youth For Society",
+ "Indigenous Caucus",
+ "Indigenous Peoples And Their Communities Major Group",
+ "Indigenous Peoples Representatives",
+ "Indonesian Centre For Environmental Law (ICEL)",
+ "Innovazing Vision",
+ "Institute For Sustainable Development And Research (ISDR)",
+ "Integrated Strategies Forum",
+ "Interamerican Heart Foundation",
+ "International Air Transport Association",
+ "International Alliance Of Waste Pickers (IAWP)",
+ "International Alliance Of Waste-pickers",
+ "International Atomic Energy Agency (IAEA)",
+ "International Center Of Comparative Environmental Law (CIDCE)",
+ "International Centre For Environmental Education And Community Development (ICENECDEV)",
+ "International Chamber Of Commerce (ICC)",
+ "International Council Of Beverages Associations (ICBA)",
+ "International Council Of Chemical Associations (ICCA)",
+ "International Knowledge Hub Against Plastic Pollution (IKHAPP)",
+ "International Labour Office",
+ "International Labour Organization (ILO)",
+ "International Maritime Organization (IMO)",
+ "International Medical Crisis Response Alliance (IMCRA)",
+ "International Movement For Advancement Of Education Culture Social and Economic Development (IMAESED)",
+ "International Network For Bamboo And Rattan (INBAR)",
+ "International Organization For Standardization (ISO)",
+ "International Pollutants Elimination Network (IPEN)",
+ "International Science Council (ISC)",
+ "International Society Of Doctors For The Environment (ISDE)",
+ "International Solid Waste Association (ISWA)",
+ "International Trade Union Confederation (ITUC)",
+ "International Union For Conservation Of Nature And Natural Resources (IUCN)",
+ "Inuit Circumpolar Council (ICC)",
+ "Japan Clean Ocean Material Alliance (CLOMA)",
+ "King Abdullah Petroleum Studies And Research Center (KAPSARC)",
+ "Krityanand UNESCO Club, Jamshedpur",
+ "La Grande Puissance De Dieu",
+ "Latin American Organizations - Alianza Basura Cero",
+ "Ligue Camerounaise Des Droits De L'Homme",
+ "Litter4tokens South Africa NPO",
+ "Local And Subnational Government Working Group",
+ "Local Authorities",
+ "Loop",
+ "Major Alliance Education Centre (MAEC)",
+ "Major Group For Children And Youth",
+ "MarViva",
+ "Marine Ecosystems Protected Areas (MEPA) Trust",
+ "Members Of Microplastics Working Group",
+ "Mexican Network Of Ecological Action",
+ "Minderoo Foundation",
+ "Ministry Of Environment And Wildlife - Southwest State Of Somalia",
+ "Moms Clean Air Force",
+ "Multifaith Action Group On Pollution",
+ "NGO Major Group",
+ "NORCE On Behalf Of The North Atlantic Microplastic Centre (NAMC)",
+ "National Old Folks Of Liberia (NOFOL)",
+ "National Retail Association (NRA)",
+ "Natural Resources Defense Council (NRDC)",
+ "Neste",
+ "Nexus For Health, Environment, And Development (Nexus3) Foundation",
+ "Nipe Fagio",
+ "No Balloon Release Australia",
+ "No More Butts",
+ "Norwegian Academy Of International Law (NAIL)",
+ "Norwegian Institute For Water Research",
+ "Norwegian Institute For Water Research (NIVA)",
+ "Norwegian Research Centre (NORCE)",
+ "ONG Jeunesse Active De Guin\u00e9e (JAG)",
+ "Occidental Arts And Ecology Center (OAEC)",
+ "Ocean Conservancy",
+ "Ocean Recovery Alliance",
+ "Ocean. Now",
+ "OceanCare",
+ "OceanCare Global Ghost Gear Initiative",
+ "Office Of The UN High Commissioner For Human Rights (OHCHR)",
+ "OpenOceans Global",
+ "Organisation For Economic Co-operation And Development (OECD)",
+ "Organization Of Arab Petroleum Exporting Countries (OAPEC)",
+ "Organization Of The Petroleum Exporting Countries (OPEC)",
+ "Our Sea Of East Asia Network (OSEAN)",
+ "Out For Sustainability",
+ "PCX Solutions (HOPEx Environment Group, Inc)",
+ "Pacific Environment And Resources Center (Pacific Environment)",
+ "Pan American Neuroendocrine Society",
+ "Partnerships For Change",
+ "Paryavaran Mitra",
+ "PetStar",
+ "Planeteer Alliance And Captain Planet Foundation",
+ "Plastalliance - Alliance Plasturgie Et Composites Du Futur",
+ "Plastic Change",
+ "Plastic Free Foundation",
+ "Plastic Free Future",
+ "Plastic Oceans Australasia",
+ "Plastic Pollution Coalition",
+ "Plastics Federation Of South Africa",
+ "Plastics Industry Association",
+ "PlasticsEurope",
+ "Plasticulture",
+ "ProDelphinus",
+ "Public Services International (PSI)",
+ "RAPAL",
+ "Rapal Uruguay",
+ "Recycling Partnership",
+ "Red De Acci\u00f3n Ecol\u00f3gica De M\u00e9xico",
+ "Red De Acci\u00f3n Por Los Derechos Ambientales",
+ "Red Mexicana De Accion Ecologica (Accion Ecologica)",
+ "Regions4 Sustainable Development",
+ "Reloop Platform",
+ "Resolve",
+ "Royal Society Of Chemistry",
+ "Samo Foundation",
+ "Sanid Organization For Relief And Development (SORD)",
+ "Sasakawa Peace Foundation",
+ "Saudi Green Building Forum",
+ "Sciaena",
+ "Scientists\u2019 Coalition For An Effective Plastics Treaty (Scientists\u2019 Coalition)",
+ "Secretariat For The Pacific Regional Environment Programme",
+ "Secretariat Of The Basel, Rotterdam And Stockholm Conventions",
+ "Secretariat Of The Convention On The Protection And Use Of Transboundary Watercourses And International Lakes (Water Convention)",
+ "Secretariat Of The Pacific Regional Environment Programme (SPREP)",
+ "Secretariat Of The WHO Framework Convention On Tobacco Control",
+ "Secretariats Of The Basel, Rotterdam And Stockholm Conventions",
+ "Shenzhen Zero Waste",
+ "Smoke Free Partnership, A Member Of The Stop Tobacco Pollution Alliance (STPA)",
+ "Sociedad Peruana De Derecho Ambiental (Peruvian Society Of Environmental Law)",
+ "Somali Sustainable Development Organisation (SOSDO)",
+ "Somali Youth Development Foundation (SYDF)",
+ "South Asia Cooperative Environment Programme",
+ "Stand Earth",
+ "Stevenson Holistic Care Foundation (SHCF)",
+ "Stichting CEFLEX \u2013 The Circular Economy For Flexible Packaging Initiative",
+ "Stiftelsen Stockholm International Water Institute",
+ "Styrenics Industry",
+ "Sustainable Coastlines Charitable Trust",
+ "Sustainable Environment Food And Agriculture Initiative",
+ "Swedish Society For Nature Conservation (SSNC)",
+ "Systemiq",
+ "T/A Plastics SA",
+ "Take 3 For The Sea",
+ "Taller Ecologista",
+ "Tangaroa Blue Foundation",
+ "Tearfund",
+ "Thailand",
+ "The Australian Marine Conservation Society",
+ "The Center For Oceanic Awareness, Research, And Education (COARE)",
+ "The Descendants Project",
+ "The Fletcher School",
+ "The Global Organization For PHA (GO!PHA)",
+ "The Nature Conservancy",
+ "The Ocean Cleanup",
+ "The Pew Charitable Trusts",
+ "The Sea Cleaners",
+ "The Society Of Native Nations",
+ "The Terracycle Foundation",
+ "The Vinyl Institute",
+ "Toxics Link",
+ "Toxisphera, Mingas Por El Mar",
+ "Trade Unions Major Group",
+ "Trash Hero World",
+ "Trash4tokens NGO",
+ "Tufts University",
+ "U.S. Council For International Business (USCIB)",
+ "UN Women\u2019s Major Group",
+ "UNESCO Association - Guwahati",
+ "UNESCO Chair For Ocean Sustainability",
+ "Udisha",
+ "Unbutton Fashion",
+ "Unions Workers And Wastepickers",
+ "United Nations Association of Spain and the Government of Catalonia",
+ "United Nations Conference On Trade And Development (UNCTAD)",
+ "United Nations Development Programme (UNDP)",
+ "United Nations Economic Commission For Europe (UNECE)",
+ "United Nations Global Compact",
+ "United Nations Human Settlements Programme (UN-Habitat)",
+ "United Nations Industrial Development Organization (UNIDO)",
+ "United Nations Institute For Training And Research (UNITAR)",
+ "United Nations Office For Disaster Risk Reduction (UNDRR)",
+ "United Nations Office On Drugs And Crime (UNODC)",
+ "United States Council For Business (USCIB)",
+ "University Of Wollongong",
+ "Unplastify",
+ "Verra",
+ "Vital Strategies",
+ "WWF-Australia",
+ "Waste Free Oceans",
+ "Whole World",
+ "William Ruto",
+ "Women In Informal Employment Globalizing And Organizing (WIEGO)",
+ "Women Working Group",
+ "Wonjin Institute For Occupational And Environmental Health (WIOEH)",
+ "Workers And Trade Unions Major Group",
+ "Working Group On Marine Litter (WGML) Of Coordinating Body On The Seas Of East Asia (COBSEA)",
+ "World Against Single Use Plastic (WASUP)",
+ "World Business Council For Sustainable Development (WBCSD)",
+ "World Economic Forum And Global Plastic Action Partnership (GPAP)",
+ "World Health Organisation",
+ "World Health Organization, Including The Secretariat Of The WHO Framework Convention On Tobacco Control",
+ "World Plastics Council (WPC)",
+ "World Welfare Association",
+ "World Wide Fund For Nature (WWF)",
+ "Wrap",
+ "Youth Alive Uganda",
+ "Youth Focus Group",
+ "Yunus Environment Hub",
+ "Zero Waste Europe",
+ "Zoological Society Of London (ZSL)"
+ ]
+ }
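Comparing this file with `data/authors_taxonomy.json`, the filter file looks like a one-level version of the raw taxonomy: nested groups such as `"Members"` → `"Countries"` become a single key `"Members - Countries"`, while top-level lists such as `"Intergovernmental Negotiation Committee"` keep their key and the list order is unchanged. A minimal sketch of that flattening, assuming this is the essential step `src/data_processing/taxonomy_processing.py` performs for the authors taxonomy (the real script is not shown here and may do more); `flatten_taxonomy` and `write_filter_file` are hypothetical names.

```python
import json
from typing import Any, Dict, List


def flatten_taxonomy(
    taxonomy: Dict[str, Any], prefix: str = "", sep: str = " - "
) -> Dict[str, List[str]]:
    """Flatten nested taxonomy groups into single-level 'Parent - Child' keys.

    Lists are copied in their original order; nested dicts are walked
    recursively and their keys are joined to the parent key with `sep`.
    """
    flat: Dict[str, List[str]] = {}
    for key, value in taxonomy.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Sub-group: recurse, carrying the parent name as the prefix.
            flat.update(flatten_taxonomy(value, prefix=name, sep=sep))
        else:
            # Leaf list of labels: copy it so the input is not aliased.
            flat[name] = list(value)
    return flat


def write_filter_file(taxonomy_path: str, filter_path: str) -> None:
    """Read a nested taxonomy JSON file and write its flattened form (hypothetical helper)."""
    with open(taxonomy_path, encoding="utf-8") as f:
        nested = json.load(f)
    with open(filter_path, "w", encoding="utf-8") as f:
        # json.dump escapes non-ASCII characters by default (ensure_ascii=True),
        # which yields \u00f3-style escapes like those in the committed file.
        json.dump(flatten_taxonomy(nested), f, indent=4)
```

Usage would be `write_filter_file("data/authors_taxonomy.json", "data/authors_filter.json")`; the indentation of the committed file is not visible in this diff view, so `indent=4` is a guess.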
data/authors_taxonomy.json ADDED
@@ -0,0 +1,520 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
+ {
+ "Members": {
+ "Countries": [
+ "Algeria",
+ "Angola",
+ "Antigua And Barbuda",
+ "Argentina",
+ "Armenia",
+ "Australia",
+ "Azerbaijan",
+ "Bahrain",
+ "Bangladesh",
+ "Barbados",
+ "Benin",
+ "Bosnia And Herzegovina",
+ "Brazil",
+ "Burkina Faso",
+ "Cambodia",
+ "Cameroon",
+ "Canada",
+ "Chile",
+ "China",
+ "Colombia",
+ "Comoros",
+ "Cook Islands",
+ "Costa Rica",
+ "Cuba",
+ "Cote D'Ivoire",
+ "Democratic Republic Of The Congo",
+ "Dominican Republic",
+ "Ecuador",
+ "Egypt",
+ "El Salvador",
+ "Equatorial Guinea",
+ "Eritrea",
+ "Eswatini",
+ "Ethiopia",
+ "Fiji",
+ "France",
+ "Gabon",
+ "Gambia",
+ "Georgia",
+ "Germany",
+ "Ghana",
+ "Grenada",
+ "Guatemala",
+ "Guinea",
+ "Guinea Bissau",
+ "Honduras",
+ "Iceland",
+ "India",
+ "Indonesia",
+ "Iraq",
+ "Islamic Republic Of Iran",
+ "Israel",
+ "Italy",
+ "Jamaica",
+ "Japan",
+ "Jordan",
+ "Kazakhstan",
+ "Kenya",
+ "Kiribati",
+ "Kuwait",
+ "Libya",
+ "Madagascar",
+ "Malawi",
+ "Malaysia",
+ "Maldives",
+ "Mali",
+ "Marshall Islands",
+ "Mauritius",
+ "Mexico",
+ "Micronesia",
+ "Moldova",
+ "Monaco",
+ "Mongolia",
+ "Montenegro",
+ "Morocco",
+ "Nauru",
+ "Nepal",
+ "New Zealand",
+ "Niger",
+ "Nigeria",
+ "North Macedonia",
+ "Norway",
+ "Oman",
+ "Pakistan",
+ "Palau",
+ "Panama",
+ "Papua New Guinea",
+ "Paraguay",
+ "Peru",
+ "Philippines",
+ "Qatar",
+ "Republic Of Congo",
+ "Republic Of Korea",
+ "Republic Of Moldova",
+ "Romania",
+ "Russian Federation",
+ "Rwanda",
+ "Samoa",
+ "Saudi Arabia",
+ "Senegal",
+ "Seychelles",
+ "Sierra Leone",
+ "Singapore",
+ "Solomon Islands",
+ "Somalia",
+ "South Africa",
+ "Sri Lanka",
+ "State Of Palestine",
+ "Switzerland",
+ "Syrian Arab Republic",
+ "Thailand",
+ "Timor-Leste",
+ "Togo",
+ "Tonga",
+ "Trinidad And Tobago",
+ "Tunisia",
+ "Turkey",
+ "Tuvalu",
+ "Uganda",
+ "Ukraine",
+ "United Arab Emirates",
+ "United Kingdom Of Great Britain And Northern Ireland",
+ "United Republic Of Tanzania",
+ "United States Of America",
+ "Uruguay",
+ "Venezuela",
+ "Vietnam",
+ "Yemen",
+ "Zambia",
+ "Zimbabwe"
+ ],
+ "International and Regional State Associations": [
+ "Asia-Pacific States",
+ "European Union (EU) And Its 27 Member States",
+ "Federated States Of Micronesia",
+ "Gulf Cooperation Council",
+ "High Ambition Coalition",
+ "Latin American And Caribbean Group (GRULAC)",
+ "Like Minded Countries",
+ "Pacific Small Island Developing States (PSIDS)",
+ "Alliance Of Small Island States (AOSIS)",
+ "The Federated States Of Micronesia",
+ "The Group Of African States",
+ "Western European And Other States",
+ "Coordinating Body on the Seas of East Asia (COBSEA)"
+ ]
+ },
+ "Intergovernmental Negotiation Committee": [
+ "Executive Secretary Of The INC",
+ "Intergovernmental Negotiation Committee"
+ ],
+ "Observers and Other Participants": [
+ "10YFP Secretariat (One Planet Network)",
+ "ABIPLAST - Brazilian Association Of The Plastic Industry",
+ "ACTS Organization",
+ "ASTM International (ASTM)",
+ "AWTAD Anti-Corruption Organization",
+ "Aaina",
+ "Action Et Education Pour Tous (AEPT)",
+ "Action On Smoking And Health (ASH)",
+ "Africa Climate And Environment Foundation (ACEF)",
+ "African Alliance For Health Research Economic Development",
+ "African Environmental Network",
+ "African Petroleum Producers\u2019 Organization (APPO)",
+ "Aguaclara",
+ "Alianza Basura Cero",
+ "Alianza Latinoamericana De Asociaciones De La Industria De Alimentos Y Bebidas (ALAIAB)",
+ "All-China Environment Federation",
+ "Alliance Panafricaine Multi-Acteurs Sur La Pollution Plastique",
+ "Alliance Pour Le Controle Du Tabac En Afrique (ACTA)",
+ "American Chemistry Council",
+ "American Fuel and Petrochemical Manufacturers (AFPM)",
+ "And Colectivo Jaguares De Nuestra Madre Tierra",
+ "Aotearoa Plastic Pollution Alliance (APPA)",
+ "Arctic Monitoring And Assessment Programme (AMAP)",
+ "Armenian Women For Health And Healthy Environment",
+ "Arnika",
+ "Asia Indigenous Peoples Pact And Indigenous Peoples",
+ "Asian Marine Conservation Association",
+ "Asociaci\u00f3n Ecol\u00f3gica Santo Tom\u00e1s",
+ "Asociaci\u00f3n Sustentar",
+ "Association For Promoting Sustainability In Campuses And Communities (APSCC)",
+ "Association For Rural Area Social Modification, Improvement And Nestling (ARASMIN)",
+ "Association For Supporting The SDGs For The United Nations",
+ "Association Institute Of Total Environment (INTEV)",
+ "Association Nationale Du Civisme",
+ "Association Of Leadership You-Lean",
+ "Association Of Plastic Recyclers",
+ "Association Of Solidarity Through Humanitarian Imperative Actions International (ASHIA)",
+ "Association Pour L'Integration Et La Developpement Durable Au Burundi",
+ "Australian Packaging Covenant Organisation (APCO)",
+ "Azul",
+ "BAN Toxics",
+ "BVRio",
+ "Barranquilla + 20",
+ "Basel Action Network (BAN)",
+ "Beijing Greenovation Institute For Public Welfare Development (GHub)",
+ "Brazilian Chemical Industry Association",
+ "Break Free From Plastic",
+ "Breathe Free Detroit",
+ "Bridgers Association Cameroon",
+ "Bureau Of International Recycling",
+ "Business And Industry Major Group Presented By World Business Council For Sustainable Development",
+ "Business Coalition For A Global Plastics Treaty Convened By The Ellen MacArthur Foundation And WWF, In Collaboration With Aligned Businesses And Financial Institutions, And Supported By NGO Partners",
+ "CAIRPLAS, C\u00e1mara Argentina De La Industria De Reciclados Plasticos",
+ "CDP",
+ "CEMPRE Colombia \u2013 Compromiso Empresarial Para El Reciclaje",
+ "CESTA",
+ "Carrizo Comecrudo Tribe",
+ "Center For Biological Diversity",
+ "Center For Coalfield Justice",
+ "Center For International Environmental Law (CIEL)",
+ "Center For International Law And Governance",
+ "Center For Islamic Studies Of Universitas Nasional, Jakarta",
+ "Center For Oceanic Awareness, Research, And Education (COARE)",
+ "Center For Public Health And Environmental Development (CEPHED)",
+ "Centre De Recherche Et D\u2019Education Pour Le D\u00e9veloppement (CREPD)",
+ "Centre D\u2019Accompagnement Des Alternatives Locales De D\u00e9veloppement (Caald)",
+ "Centre For Environmental Justice",
+ "Centre For Human Rights And Climate Change Research",
+ "Centre For Science And Environment (CSE)",
+ "Centre International De Droit Compar\u00e9 De L\u2019environnement",
+ "Centre International De Droit Compar\u00e9 De L\u2019environnement",
+ "Chemical And Allied Industries\u2019 Association (CAIA)",
+ "Chia Funkuin Foundation",
+ "Children And Youth Major Group (CYMG)",
+ "Children\u2019s Environmental Health Foundation",
+ "China Biodiversity Conservation And Green Development Foundation (CBCGDF)",
+ "Circular Economy For Flexible Packaging In Europe Initiative (CEFLEX)",
+ "Citeo",
+ "Citizen Consumer And Civic Action Group (CAG)",
+ "Civil Society And Rights Holder Coalition",
+ "Civil Society Organizations In Africa",
+ "Civil Society Organizations In Asia Pacific",
+ "Civil Society Organizations In Latin America",
+ "Co-habiter",
+ "Comision Centro Americana",
+ "Comit\u00e9 National Contre Le Tabagisme",
+ "Community Action Against Plastic Waste (CAPws)",
+ "Congregations Of St Joseph",
+ "Consciente Colectivo",
+ "Consumer Goods Forum",
+ "Consumers International",
+ "Contact Group 1",
+ "Contact Group 2",
+ "Convention On Biological Diversity (CBD)",
+ "Corporate Accountability",
+ "Council For Scientific And Industrial Research",
+ "C\u00e1mara De La Industria Qu\u00edmica Y Petroqu\u00edmica (CIQyP\u00ae), Miembro Del Concejo Internacional De Asociaciones Qu\u00edmicas (ICCA) Y EURECA",
+ "Danimer Scientific",
+ "EDANA And INDA",
+ "EPS Branchen-en Del Af Plastindustrien (Denmark)",
+ "EPS Industry Alliance",
+ "EURECA",
+ "Earth Day",
+ "Earth Law Center",
+ "Ecoplas",
+ "Ecoproject Partnership",
+ "Ellen MacArthur Foundation",
+ "Endocrine Society",
+ "Engineers Australia",
+ "Entidad Especializada En Pl\u00e1sticos Y Medio Ambiente Para Una Econom\u00eda Circular (ECOPLAS)",
+ "Entidades Unidas Reafirmando La Econom\u00eda Circular En Argentina (EURECA)",
+ "Environmental And Social Development Organization (ESDO)",
+ "Environmental Coalition On Standards (ECOS)",
+ "Environmental Development Association (FASEEL)",
+ "Environmental Investigation Agency",
+ "European Bioplastics (EUBP)",
+ "European Manufacturers Of EPS",
+ "European Manufacturers Of Expanded Polystyrene (EUMEPS)",
+ "Expanded Polystyrene Australia",
+ "Fauna and Flora International",
+ "Fauna and Flora International And Zoological Society Of London (ZSL)",
+ "Fenceline Watch",
+ "First Modern Agro Tools Common Initiative Group ( FI.MO.AT.C.I.G)",
+ "Fondation De La Mer",
+ "Food And Agriculture Organization Of The United Nations (FAO)",
+ "Food And Livestock Initiative (FLI Asbl)",
+ "Forum On Trade, Environment and The SDGs (TESS)",
+ "Foundation Of Fokus Nexus Tiga (Nexus3 Foundation)",
+ "French Water Partnership (Partenariat Fran\u00e7ais Pour L\u2019eau)",
+ "Friends World Committee For Consultation (FWCC)",
+ "Fronteras Comunes",
+ "Fundacion Avina",
+ "Fundaci\u00f3n Ambiente Y Recursos Naturales",
+ "Fundaci\u00f3n Interamericana Del Coraz\u00f3n (FIC)",
+ "GRID-Arendal",
+ "Galapagos Conservation Trust",
+ "Gallifrey Foundation",
+ "Geneva Cities Hub (GCH)",
+ "Gerakan Indonesia Diet Kantong Plastik (GIDKP) - The Indonesia Plastic Bag Diet Movement",
+ "Global Alliance For Incinerator Alternatives (GAIA)",
+ "Global Alliance On Health And Pollution (GAHP)",
+ "Global Cement And Concrete Association",
+ "Global Council For Science And The Environment (GCSE)",
+ "Global Plastics Policy Centre, University Of Portsmouth",
+ "Global Youth Coalition On Plastic Pollution (GYCPP)",
+ "Green Africa Youth Organization",
+ "Greenpeace International",
+ "GroundWork South Africa (GroundWorkSA)",
+ "Haitelmex Foundation",
+ "Hasiru Dala In Collaboration With Eleven Other Civil Society Organizations",
+ "Health And Environment Justice Support (HEJSupport)",
+ "Health Care Without Harm (HCWH)",
+ "Healthy Hospitals Project - PHS",
+ "Human Rights Watch",
+ "ICLEI - Local Governments For Sustainability",
+ "India Institute For Critical Action Centre In Movement (CACIM)",
+ "India Water Foundation",
+ "India Youth For Society",
+ "Indigenous Caucus",
+ "Indigenous Peoples And Their Communities Major Group",
+ "Indigenous Peoples Representatives",
+ "Indonesian Centre For Environmental Law (ICEL)",
+ "Innovazing Vision",
+ "Institute For Sustainable Development And Research (ISDR)",
+ "Integrated Strategies Forum",
+ "Interamerican Heart Foundation",
+ "International Air Transport Association",
+ "International Alliance Of Waste Pickers (IAWP)",
+ "International Alliance Of Waste-pickers",
+ "International Atomic Energy Agency (IAEA)",
+ "International Center Of Comparative Environmental Law (CIDCE)",
+ "International Centre For Environmental Education And Community Development (ICENECDEV)",
+ "International Chamber Of Commerce (ICC)",
+ "International Council Of Beverages Associations (ICBA)",
+ "International Council Of Chemical Associations (ICCA)",
+ "International Knowledge Hub Against Plastic Pollution (IKHAPP)",
+ "International Labour Office",
+ "International Labour Organization (ILO)",
+ "International Maritime Organization (IMO)",
+ "International Medical Crisis Response Alliance (IMCRA)",
+ "International Movement For Advancement Of Education Culture Social and Economic Development (IMAESED)",
+ "International Network For Bamboo And Rattan (INBAR)",
+ "International Organization For Standardization (ISO)",
+ "International Pollutants Elimination Network (IPEN)",
+ "International Science Council (ISC)",
+ "International Society Of Doctors For The Environment (ISDE)",
+ "International Solid Waste Association (ISWA)",
+ "International Trade Union Confederation (ITUC)",
+ "International Union For Conservation Of Nature And Natural Resources (IUCN)",
+ "Inuit Circumpolar Council (ICC)",
+ "Japan Clean Ocean Material Alliance (CLOMA)",
+ "King Abdullah Petroleum Studies And Research Center (KAPSARC)",
+ "Krityanand UNESCO Club, Jamshedpur",
+ "La Grande Puissance De Dieu",
+ "Latin American Organizations - Alianza Basura Cero",
+ "Ligue Camerounaise Des Droits De L'Homme",
+ "Litter4tokens South Africa NPO",
+ "Local And Subnational Government Working Group",
+ "Local Authorities",
+ "Loop",
+ "Major Alliance Education Centre (MAEC)",
+ "Major Group For Children And Youth",
+ "MarViva",
+ "Marine Ecosystems Protected Areas (MEPA) Trust",
+ "Members Of Microplastics Working Group",
+ "Mexican Network Of Ecological Action",
+ "Minderoo Foundation",
+ "Ministry Of Environment And Wildlife - Southwest State Of Somalia",
+ "Moms Clean Air Force",
+ "Multifaith Action Group On Pollution",
+ "NGO Major Group",
+ "NORCE On Behalf Of The North Atlantic Microplastic Centre (NAMC)",
+ "National Old Folks Of Liberia (NOFOL)",
+ "National Retail Association (NRA)",
+ "Natural Resources Defense Council (NRDC)",
+ "Neste",
+ "Nexus For Health, Environment, And Development (Nexus3) Foundation",
+ "Nipe Fagio",
+ "No Balloon Release Australia",
+ "No More Butts",
+ "Norwegian Academy Of International Law (NAIL)",
+ "Norwegian Institute For Water Research",
+ "Norwegian Institute For Water Research (NIVA)",
+ "Norwegian Research Centre (NORCE)",
+ "ONG Jeunesse Active De Guin\u00e9e (JAG)",
+ "Occidental Arts And Ecology Center (OAEC)",
+ "Ocean Conservancy",
+ "Ocean Recovery Alliance",
+ "Ocean. Now",
+ "OceanCare",
+ "OceanCare Global Ghost Gear Initiative",
+ "Office Of The UN High Commissioner For Human Rights (OHCHR)",
+ "OpenOceans Global",
+ "Organisation For Economic Co-operation And Development (OECD)",
+ "Organization Of Arab Petroleum Exporting Countries (OAPEC)",
+ "Organization Of The Petroleum Exporting Countries (OPEC)",
+ "Our Sea Of East Asia Network (OSEAN)",
+ "Out For Sustainability",
+ "PCX Solutions (HOPEx Environment Group, Inc)",
+ "Pacific Environment And Resources Center (Pacific Environment)",
+ "Pan American Neuroendocrine Society",
+ "Partnerships For Change",
+ "Paryavaran Mitra",
+ "PetStar",
+ "Planeteer Alliance And Captain Planet Foundation",
+ "Plastalliance - Alliance Plasturgie Et Composites Du Futur",
+ "Plastic Change",
+ "Plastic Free Foundation",
+ "Plastic Free Future",
+ "Plastic Oceans Australasia",
+ "Plastic Pollution Coalition",
+ "Plastics Federation Of South Africa",
+ "Plastics Industry Association",
+ "PlasticsEurope",
+ "Plasticulture",
+ "ProDelphinus",
+ "Public Services International (PSI)",
+ "RAPAL",
+ "Rapal Uruguay",
+ "Recycling Partnership",
+ "Red De Acci\u00f3n Ecol\u00f3gica De M\u00e9xico",
+ "Red De Acci\u00f3n Por Los Derechos Ambientales",
+ "Red Mexicana De Accion Ecologica (Accion Ecologica)",
+ "Regions4 Sustainable Development",
+ "Reloop Platform",
+ "Resolve",
+ "Royal Society Of Chemistry",
+ "Samo Foundation",
+ "Sanid Organization For Relief And Development (SORD)",
+ "Sasakawa Peace Foundation",
+ "Saudi Green Building Forum",
+ "Sciaena",
+ "Scientists\u2019 Coalition For An Effective Plastics Treaty (Scientists\u2019 Coalition)",
+ "Secretariat For The Pacific Regional Environment Programme",
+ "Secretariat Of The Basel, Rotterdam And Stockholm Conventions",
+ "Secretariat Of The Convention On The Protection And Use Of Transboundary Watercourses And International Lakes (Water Convention)",
+ "Secretariat Of The Pacific Regional Environment Programme (SPREP)",
+ "Secretariat Of The WHO Framework Convention On Tobacco Control",
+ "Secretariats Of The Basel, Rotterdam And Stockholm Conventions",
+ "Shenzhen Zero Waste",
+ "Smoke Free Partnership, A Member Of The Stop Tobacco Pollution Alliance (STPA)",
+ "Sociedad Peruana De Derecho Ambiental (Peruvian Society Of Environmental Law)",
+ "Somali Sustainable Development Organisation (SOSDO)",
+ "Somali Youth Development Foundation (SYDF)",
+ "South Asia Cooperative Environment Programme",
+ "Stand Earth",
+ "Stevenson Holistic Care Foundation (SHCF)",
+ "Stichting CEFLEX \u2013 The Circular Economy For Flexible Packaging Initiative",
+ "Stiftelsen Stockholm International Water Institute",
+ "Styrenics Industry",
+ "Sustainable Coastlines Charitable Trust",
+ "Sustainable Environment Food And Agriculture Initiative",
+ "Swedish Society For Nature Conservation (SSNC)",
+ "Systemiq",
+ "T/A Plastics SA",
+ "Take 3 For The Sea",
+ "Taller Ecologista",
+ "Tangaroa Blue Foundation",
+ "Tearfund",
+ "Thailand",
+ "The Australian Marine Conservation Society",
+ "The Center For Oceanic Awareness, Research, And Education (COARE)",
+ "The Descendants Project",
+ "The Fletcher School",
+ "The Global Organization For PHA (GO!PHA)",
+ "The Nature Conservancy",
+ "The Ocean Cleanup",
+ "The Pew Charitable Trusts",
+ "The Sea Cleaners",
+ "The Society Of Native Nations",
+ "The Terracycle Foundation",
+ "The Vinyl Institute",
+ "Toxics Link",
+ "Toxisphera, Mingas Por El Mar",
+ "Trade Unions Major Group",
+ "Trash Hero World",
+ "Trash4tokens NGO",
+ "Tufts University",
+ "U.S. Council For International Business (USCIB)",
+ "UN Women\u2019s Major Group",
+ "UNESCO Association - Guwahati",
+ "UNESCO Chair For Ocean Sustainability",
+ "Udisha",
+ "Unbutton Fashion",
+ "Unions Workers And Wastepickers",
+ "United Nations Association of Spain and the Government of Catalonia",
+ "United Nations Conference On Trade And Development (UNCTAD)",
+ "United Nations Development Programme (UNDP)",
+ "United Nations Economic Commission For Europe (UNECE)",
+ "United Nations Global Compact",
+ "United Nations Human Settlements Programme (UN-Habitat)",
+ "United Nations Industrial Development Organization (UNIDO)",
+ "United Nations Institute For Training And Research (UNITAR)",
+ "United Nations Office For Disaster Risk Reduction (UNDRR)",
+ "United Nations Office On Drugs And Crime (UNODC)",
+ "United States Council For Business (USCIB)",
+ "University Of Wollongong",
+ "Unplastify",
+ "Verra",
+ "Vital Strategies",
+ "WWF-Australia",
+ "Waste Free Oceans",
+ "Whole World",
+ "William Ruto",
+ "Women In Informal Employment Globalizing And Organizing (WIEGO)",
+ "Women Working Group",
+ "Wonjin Institute For Occupational And Environmental Health (WIOEH)",
+ "Workers And Trade Unions Major Group",
+ "Working Group On Marine Litter (WGML) Of Coordinating Body On The Seas Of East Asia (COBSEA)",
+ "World Against Single Use Plastic (WASUP)",
+ "World Business Council For Sustainable Development (WBCSD)",
+ "World Economic Forum And Global Plastic Action Partnership (GPAP)",
+ "World Health Organisation",
+ "World Health Organization, Including The Secretariat Of The WHO Framework Convention On Tobacco Control",
+ "World Plastics Council (WPC)",
+ "World Welfare Association",
+ "World Wide Fund For Nature (WWF)",
+ "Wrap",
+ "Youth Alive Uganda",
+ "Youth Focus Group",
+ "Yunus Environment Hub",
+ "Zero Waste Europe",
+ "Zoological Society Of London (ZSL)"
+ ]
+ }
data/data wrangling/readme.txt ADDED
@@ -0,0 +1,2 @@
+ data validation: check data for inconsistencies and errors
+ data wrangling: update dataset
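The `data validation` step named in this readme could begin with a check for repeated names. The author lists above contain some. For example, `Centre International De Droit Comparé De L’environnement` appears twice among the observers, and `Federated States Of Micronesia` sits next to `The Federated States Of Micronesia`. Below is a minimal sketch. It assumes the author file has already been parsed into a nested dict of name lists, and the helpers `normalize` and `find_duplicates` are hypothetical rather than part of this repository:

```python
from collections import defaultdict


def normalize(name):
    """Hypothetical matching key: collapse whitespace, lowercase, and drop a
    leading 'the ' so that 'The X' and 'X' collide."""
    key = " ".join(name.split()).lower()
    return key[4:] if key.startswith("the ") else key


def find_duplicates(groups, path=()):
    """Walk a nested dict of name lists and report names that collide after
    normalization, as {"Parent/Child": [[spelling, spelling, ...], ...]}."""
    report = {}
    if isinstance(groups, dict):
        for key, value in groups.items():
            report.update(find_duplicates(value, path + (key,)))
    elif isinstance(groups, list):
        buckets = defaultdict(list)
        for name in groups:
            buckets[normalize(name)].append(name)
        clashes = [names for names in buckets.values() if len(names) > 1]
        if clashes:
            report["/".join(path)] = clashes
    return report


# Excerpt shaped like the author file above; in the repo you would json.load it.
sample = {
    "Members": {
        "International and Regional State Associations": [
            "Federated States Of Micronesia",
            "Gulf Cooperation Council",
            "The Federated States Of Micronesia",
        ]
    },
    "Observers and Other Participants": [
        "Centre International De Droit Comparé De L’environnement",
        "Citeo",
        "Centre International De Droit Comparé De L’environnement",
    ],
}
for group, clashes in find_duplicates(sample).items():
    print(group, clashes)
```

Ignoring a leading "The" is a judgment call. It catches the Micronesia and AOSIS pairs, but a human should still review each reported clash before merging entries.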
data/draftcat_taxonomy.json ADDED
@@ -0,0 +1,55 @@
+ { "part i": { "i": "part i: gerneral",
+ "i1": "preamble",
+ "i2": "objective",
+ "i3": "definition",
+ "i4": "principles",
+ "i5": "scope"},
+ "part ii": { "ii": "part ii: gerneral",
+ "ii1": "primary plastic polymers",
+ "ii2": "chemicals and polymers of concern",
+ "ii3": "problematic and avoidable plastic products, including short-lived and single-use plastic products and intentionally added microplastics",
+ "ii3a": "problematic and avoidable plastic products, including short-lived and single-use plastic products",
+ "ii3b": "intentionally added microplastics",
+ "ii4": "exemptions available to a Party upon request",
+ "ii4bis":"dedicated programmes of work",
+ "ii5": "product design, composition and performance",
+ "ii5a": "product design and performance",
+ "ii5b": "reduce, reuse, refill and repair of plastics and plastic products",
+ "ii5c": "use of recycled plastic contents",
+ "ii5d": "alternative plastics and plastic products",
+ "ii6": "non-plastic substitutes",
+ "ii7": "extended producer responsibility",
+ "ii8": "emissions and releases of plastic throughout its life cycle",
+ "ii9": "waste management",
+ "ii9a":"waste management",
+ "ii9b":"fishing gear",
+ "ii10": "trade in listed chemicals, polymers and products, and in plastic waste",
+ "ii10a": "trade in listed chemicals, polymers and products",
+ "ii10b": "transboundary movement of plastic waste",
+ "ii11": "existing plastic pollution, including in the marine environment",
+ "ii12": "just transition",
+ "ii13": "transparency, tracking, monitoring and labeling",
+ "ii13bis": "overarching provision related to part ii"},
+ "part iii": { "iii": "iii gerneral",
+ "iii1": "financing",
+ "iii2": "capacity-building, technical assistance and technology transfer"},
+ "part iv": { "iv": "part iv: gerneral",
+ "iv1": "national plans",
+ "iv2": "implementation and compliance",
+ "iv3": "reporting on progress",
+ "iv4": "periodic assessment and monitoring of the progress of implementation of the instrument* and effectiveness evaluation",
+ "iv4a": "effectiveness evaluation",
+ "iv4b": "review of chemicals and polymers of concern, microplastics and problematic and avoidable products",
+ "iv5": "international cooperation",
+ "iv6": "information exchange",
+ "iv7": "awareness-raising, education and research",
+ "iv8": "stakeholder engagement",
+ "iv8bis":"health aspects"},
+ "part v": {"v": "part v: gerneral",
+ "v0": "institutional arrangements",
+ "v1": "governing body",
+ "v2": "subsidiary bodies",
+ "v3": "secretariat"},
+ "part vi": {"vi": "part vi: gerneral",
+ "vi1": "final provisions"}
+ }
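`draftcat_taxonomy.json` nests short codes such as `i1`, `ii9b` or `iv8bis` under each part. If some code needs the label for a given code, flattening the two levels is enough. This is a sketch with hypothetical helpers `flatten_taxonomy` and `codes_by_label`. The reverse mapping is not one-to-one, because `ii9` and `ii9a` both carry the label "waste management":

```python
import json


def flatten_taxonomy(taxonomy):
    """Map each code (e.g. 'ii9b') to its label, dropping the 'part ...' level."""
    return {code: label
            for part in taxonomy.values()
            for code, label in part.items()}


def codes_by_label(flat):
    """Invert code -> label. A label may belong to several codes."""
    inverse = {}
    for code, label in flat.items():
        inverse.setdefault(label, []).append(code)
    return inverse


# Excerpt of draftcat_taxonomy.json; in practice you would json.load the file.
excerpt = json.loads("""
{"part ii": {"ii9": "waste management",
             "ii9a": "waste management",
             "ii9b": "fishing gear"}}
""")
flat = flatten_taxonomy(excerpt)
print(flat["ii9b"])
print(codes_by_label(flat)["waste management"])
```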
data/draftcat_taxonomy_filter.json ADDED
@@ -0,0 +1,64 @@
+ {
+ "Part I": [ "part i: general",
+ "preamble",
+ "objective",
+ "definition",
+ "principles",
+ "scope"
+ ],
+ "Part II": ["part ii: general",
+ "primary plastic polymers",
+ "chemicals and polymers of concern",
+ "problematic and avoidable plastic products, including short-lived and single-use plastic products and intentionally added microplastics",
+ "problematic and avoidable plastic products, including short-lived and single-use plastic products",
+ "intentionally added microplastics",
+ "micro- and nanoplastics",
+ "exemptions available to a party upon request",
+ "dedicated programmes of work",
+ "product design, composition and performance",
+ "product design and performance",
+ "reduce, reuse, refill and repair of plastics and plastic products",
+ "use of recycled plastic contents",
+ "alternative plastics and plastic products",
+ "non-plastic substitutes",
+ "extended producer responsibility",
+ "emissions and releases of plastic throughout its life cycle",
+ "waste management",
+ "waste management",
+ "fishing gear",
+ "trade in listed chemicals, polymers and products, and in plastic waste",
+ "trade in listed chemicals, polymers and products",
+ "transboundary movement of plastic waste",
+ "existing plastic pollution, including in the marine environment",
+ "just transition",
+ "transparency, tracking, monitoring and labeling",
+ "overarching provision related to part ii"
+ ],
+ "Part III": ["part iii: general",
+ "financing",
+ "capacity-building, technical assistance and technology transfer"
+ ],
+ "Part IV": ["part iv: general",
+ "national plans",
+ "implementation and compliance",
+ "reporting on progress",
+ "periodic assessment and monitoring of the progress of implementation of the instrument* and effectiveness evaluation",
+ "assessment and monitoring",
+ "effectiveness evaluation",
+ "review of chemicals and polymers of concern, microplastics and problematic and avoidable products",
+ "international cooperation",
+ "information exchange",
+ "awareness-raising, education and research",
+ "stakeholder engagement",
+ "health aspects"
+ ],
+ "Part V": [ "part v: general",
+ "institutional arrangements",
+ "governing body",
+ "subsidiary bodies",
+ "secretariat"
+ ],
+ "Part VI": [ "part vi: general",
+ "final provisions"
+ ]
+ }
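The filter file repeats the taxonomy labels as flat lists, but the two files have drifted apart. The part headers read `part i: gerneral` in `draftcat_taxonomy.json` and `part i: general` here, and the filter adds entries such as `micro- and nanoplastics` that have no code. Below is a rough cross-check. It assumes both files are loaded with `json.load` and that a key like `"part i"` corresponds to `"Part I"`. The helper `missing_from_filter` is hypothetical:

```python
def missing_from_filter(taxonomy, filter_lists):
    """Return {part: [taxonomy labels absent from the matching filter list]}.

    Part keys and labels are compared case-insensitively, so 'part i' matches
    'Part I' and 'a Party' matches 'a party'."""
    allowed = {part.lower(): {label.lower() for label in labels}
               for part, labels in filter_lists.items()}
    report = {}
    for part, codes in taxonomy.items():
        known = allowed.get(part.lower(), set())
        missing = [label for label in codes.values() if label.lower() not in known]
        if missing:
            report[part] = missing
    return report


# Small excerpts of the two files shown above.
taxonomy = {
    "part i": {"i": "part i: gerneral", "i1": "preamble"},
    "part ii": {"ii4": "exemptions available to a Party upon request"},
}
filter_lists = {
    "Part I": ["part i: general", "preamble"],
    "Part II": ["exemptions available to a party upon request"],
}
print(missing_from_filter(taxonomy, filter_lists))
```

Only the `gerneral` header is reported for this excerpt. The capitalization difference in the `ii4` label is tolerated by design.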
data/example_prompts.json ADDED
@@ -0,0 +1,14 @@
+ [
+ {
+ "id": 1,
+ "question": "What is Malaysia's position on chemicals and polymers of concern?"
+ },
+ {
+ "id": 2,
+ "question": "Compare India's and New Zealand's positions on just transition."
+ },
+ {
+ "id": 3,
+ "question": "Do the selected countries prefer a top-down instrument?"
+ }
+ ]
data/inc_df.csv ADDED
The diff for this file is too large to render.
data/inc_df_v6_small_4.csv ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:388930deaa6033b41f3c6fb550f841c1bd1384a7ff86f4d0e775922f08617029
+ size 28522667
data/taxonomies.txt ADDED
@@ -0,0 +1,497 @@
+ authors_dict = {'Members': {'Countries': ['Algeria',
+ 'Angola',
+ 'Antigua And Barbuda',
+ 'Argentina',
+ 'Armenia',
+ 'Australia',
+ 'Azerbaijan',
+ 'Bahrain',
+ 'Bangladesh',
+ 'Barbados',
+ 'Benin',
+ 'Bosnia And Herzegovina',
+ 'Brazil',
+ 'Burkina Faso',
+ 'Cambodia',
+ 'Cameroon',
+ 'Canada',
+ 'Chile',
+ 'China',
+ 'Colombia',
+ 'Cook Islands',
+ 'Costa Rica',
+ 'Cuba',
+ 'Democratic Republic Of The Congo',
+ 'Dominican Republic',
+ 'Ecuador',
+ 'Egypt',
+ 'El Salvador',
+ 'Equatorial Guinea',
+ 'Eritrea',
+ 'Ethiopia',
+ 'Fiji',
+ 'France',
+ 'Gabon',
+ 'Georgia',
+ 'Germany',
+ 'Ghana',
+ 'Grenada',
+ 'Guatemala',
+ 'Guinea',
+ 'Guinea Bissau',
+ 'Iceland',
+ 'India',
+ 'Indonesia',
+ 'Islamic Republic Of Iran',
+ 'Israel',
+ 'Italy',
+ 'Jamaica',
+ 'Japan',
+ 'Jordan',
+ 'Kenya',
+ 'Kiribati',
+ 'Kuwait',
+ 'Libya',
+ 'Madagascar',
+ 'Malawi',
+ 'Malaysia',
+ 'Mali',
+ 'Marshall Islands',
+ 'Mauritius',
+ 'Mexico',
+ 'Micronesia',
+ 'Moldova',
+ 'Monaco',
+ 'Mongolia',
+ 'Montenegro',
+ 'Morocco',
+ 'Nauru',
+ 'Nepal',
+ 'New Zealand',
+ 'Niger',
+ 'Nigeria',
+ 'Norway',
+ 'Oman',
+ 'Pakistan',
+ 'Palau',
+ 'Panama',
+ 'Papua New Guinea',
+ 'Paraguay',
+ 'Peru',
+ 'Philippines',
+ 'Qatar',
+ 'Republic Of Congo',
+ 'Republic Of Korea',
+ 'Republic Of Moldova',
+ 'Russian Federation',
+ 'Rwanda',
+ 'Samoa',
+ 'Saudi Arabia',
+ 'Senegal',
+ 'Seychelles',
+ 'Sierra Leone',
+ 'Singapore',
+ 'Somalia',
+ 'South Africa',
+ 'Sri Lanka',
+ 'State Of Palestine',
+ 'Switzerland',
+ 'Syrian Arab Republic',
+ 'Tonga',
+ 'Trinidad And Tobago',
+ 'Tunisia',
+ 'Turkey',
+ 'Uganda',
+ 'Ukraine',
+ 'United Arab Emirates',
+ 'United Kingdom Of Great Britain And Northern Ireland',
+ 'United Republic Of Tanzania',
+ 'United States Of America',
+ 'Uruguay',
+ 'Venezuela',
+ 'Vietnam',
+ 'Yemen',
+ 'Zambia'],
+ 'International and Regional State Associations': ['Alliance Of Small Island States (AOSIS)',
+ 'Asia-Pacific States',
+ 'European Union (EU) And Its 27 Member States',
+ 'Federated States Of Micronesia',
+ 'Gulf Cooperation Council',
+ 'High Ambition Coalition',
+ 'Latin American And Caribbean Group (GRULAC)',
+ 'Pacific Small Island Developing States (PSIDS)',
+ 'The Alliance Of Small Island States (AOSIS)',
+ 'The Federated States Of Micronesia',
+ 'The Group Of African States',
+ 'Western European And Other States']
+ },
+ 'Intergovernmental Negotiation Committee': ['Executive Secretary Of The INC', 'Intergovernmental Negotiation Committee'],
+ 'Observers and Other Participants': [
+ '10YFP Secretariat (One Planet Network)',
+ 'Aaina',
+ 'ABIPLAST - Brazilian Association Of The Plastic Industry',
+ 'Action Et Education Pour Tous (AEPT)',
+ 'Action On Smoking And Health (ASH)',
+ 'ACTS Organization',
+ 'Africa Climate And Environment Foundation (ACEF)',
+ 'African Alliance For Health Research Economic Development',
+ 'African Environmental Network',
+ 'African Petroleum Producers’ Organization (APPO)',
+ 'Aguaclara',
+ 'Alianza Basura Cero',
+ 'Alianza Latinoamericana De Asociaciones De La Industria De Alimentos Y Bebidas (ALAIAB)',
+ 'All-China Environment Federation',
+ 'Alliance Pour Le Controle Du Tabac En Afrique (ACTA)',
+ 'American Fuel and Petrochemical Manufacturers (AFPM)',
+ 'And Colectivo Jaguares De Nuestra Madre Tierra',
+ 'Aotearoa Plastic Pollution Alliance (APPA)',
+ 'Arctic Monitoring And Assessment Programme (AMAP)',
+ 'Arnika',
+ 'Asian Marine Conservation Association',
+ 'Asociación Ecológica Santo Tomás',
+ 'Asociación Sustentar',
+ 'Association For Promoting Sustainability In Campuses And Communities (APSCC)',
+ 'Association For Rural Area Social Modification, Improvement And Nestling (ARASMIN)',
+ 'Association For Supporting The SDGs For The United Nations',
+ 'Association Institute Of Total Environment (INTEV)',
+ 'Association Nationale Du Civisme',
+ 'Association Of Leadership You-Lean',
+ 'Association Of Plastic Recyclers',
+ 'Association Of Solidarity Through Humanitarian Imperative Actions International (ASHIA)',
+ 'ASTM International (ASTM)',
+ 'Australian Packaging Covenant Organisation (APCO)',
+ 'AWTAD Anti-Corruption Organization',
+ 'Azul',
+ 'BAN Toxics',
+ 'Barranquilla + 20',
+ 'Basel Action Network (BAN)',
+ 'Beijing Greenovation Institute For Public Welfare Development (GHub)',
+ 'Brazilian Chemical Industry Association',
+ 'Breathe Free Detroit',
+ 'Bridgers Association Cameroon',
+ 'Bureau Of International Recycling',
+ 'Business And Industry Major Group Presented By World Business Council For Sustainable Development',
+ 'Business Coalition For A Global Plastics Treaty Convened By The Ellen MacArthur Foundation And WWF, In Collaboration With Aligned Businesses And Financial Institutions, And Supported By NGO Partners',
+ 'BVRio',
+ 'CAIRPLAS, Cámara Argentina De La Industria De Reciclados Plasticos',
+ 'Cámara De La Industria Química Y Petroquímica (CIQyP®), Miembro Del Concejo Internacional De Asociaciones Químicas (ICCA) Y EURECA',
+ 'Carrizo Comecrudo Tribe',
+ 'CDP',
+ 'CEMPRE Colombia – Compromiso Empresarial Para El Reciclaje',
+ 'Center For Biological Diversity',
+ 'Center For International Environmental Law (CIEL)',
+ 'Center For International Law And Governance',
+ 'Center For Islamic Studies Of Universitas Nasional, Jakarta',
+ 'Center For Oceanic Awareness, Research, And Education (COARE)',
+ 'Center For Public Health And Environmental Development (CEPHED)',
+ 'Centre D’Accompagnement Des Alternatives Locales De Développement (Caald)',
+ 'Centre De Recherche Et D’Education Pour Le Développement (CREPD)',
+ 'Centre For Environmental Justice',
+ 'Centre For Human Rights And Climate Change Research',
+ 'Centre For Science And Environment (CSE)',
+ 'Centre International De Droit Comparé De L’environnement',
+ 'CESTA',
+ 'Chemical And Allied Industries’ Association (CAIA)',
+ 'Chia Funkuin Foundation',
+ 'Children And Youth Major Group (CYMG)',
+ 'Children’s Environmental Health Foundation',
+ 'China Biodiversity Conservation And Green Development Foundation (CBCGDF)',
+ 'Circular Economy For Flexible Packaging In Europe Initiative (CEFLEX)',
+ 'Citeo',
+ 'Citizen Consumer And Civic Action Group (CAG)',
+ 'Civil Society And Rights Holder Coalition',
+ 'Civil Society Organizations In Africa',
204
+ 'Civil Society Organizations In Asia Pacific',
205
+ 'Civil Society Organizations In Latin America',
206
+ 'Co-habiter',
207
+ 'Comité National Contre Le Tabagisme',
208
+ 'Community Action Against Plastic Waste (CAPws)',
209
+ 'Congregations Of St Joseph',
210
+ 'Consciente Colectivo',
211
+ 'Consumer Goods Forum',
212
+ 'Consumers International',
213
+ 'Contact Group 1',
214
+ 'Contact Group 2',
215
+ 'Convention On Biological Diversity (CBD)',
216
+ 'Corporate Accountability',
217
+ 'Council For Scientific And Industrial Research',
218
+ 'Danimer Scientific',
219
+ 'Earth Day',
220
+ 'Earth Law Center',
221
+ 'Ecoplas',
222
+ 'Ecoproject Partnership',
223
+ 'EDANA And INDA',
224
+ 'Ellen MacArthur Foundation',
225
+ 'Endocrine Society',
226
+ 'Engineers Australia',
227
+ 'Entidad Especializada En Plásticos Y Medio Ambiente Para Una Economía Circular (ECOPLAS)',
228
+ 'Entidades Unidas Reafirmando La Economía Circular En Argentina (EURECA)',
229
+ 'Environmental And Social Development Organization (ESDO)',
230
+ 'Environmental Coalition On Standards (ECOS)',
231
+ 'Environmental Development Association (FASEEL)',
232
+ 'Environmental Investigation Agency',
233
+ 'EPS Branchen-en Del Af Plastindustrien (Denmark)',
234
+ 'EPS Industry Alliance',
235
+ 'EURECA',
236
+ 'European Bioplastics (EUBP)',
237
+ 'European Manufacturers Of Expanded Polystyrene (EUMEPS)',
238
+ 'Expanded Polystyrene Australia',
239
+ 'Fauna and Flora International',
240
+ 'Fenceline Watch',
241
+ 'First Modern Agro Tools Common Initiative Group ( FI.MO.AT.C.I.G)',
242
+ 'Food And Agriculture Organization Of The United Nations (FAO)',
243
+ 'Food And Livestock Initiative (FLI Asbl)',
244
+ 'Forum On Trade, Environment and The SDGs (TESS)',
245
+ 'Foundation Of Fokus Nexus Tiga (Nexus3 Foundation)',
246
+ 'French Water Partnership (Partenariat Français Pour L’eau)',
247
+ 'Friends World Committee For Consultation (FWCC)',
248
+ 'Fronteras Comunes',
249
+ 'Fundación Ambiente Y Recursos Naturales',
250
+ 'Fundacion Avina',
251
+ 'Fundación Interamericana Del Corazón (FIC)',
252
+ 'Galapagos Conservation Trust',
253
+ 'Gallifrey Foundation',
254
+ 'Geneva Cities Hub (GCH)',
255
+ 'Gerakan Indonesia Diet Kantong Plastik (GIDKP) - The Indonesia Plastic Bag Diet Movement',
256
+ 'Global Alliance For Incinerator Alternatives (GAIA)',
257
+ 'Global Alliance On Health And Pollution (GAHP)',
258
+ 'Global Cement And Concrete Association',
259
+ 'Global Council For Science And The Environment (GCSE)',
260
+ 'Global Plastics Policy Centre, University Of Portsmouth',
261
+ 'Greenpeace International',
262
+ 'GRID-Arendal',
263
+ 'GroundWork South Africa (GroundWorkSA)',
264
+ 'Haitelmex Foundation',
265
+ 'Hasiru Dala In Collaboration With Eleven Other Civil Society Organizations',
266
+ 'Health And Environment Justice Support (HEJSupport)',
267
+ 'Health Care Without Harm (HCWH)',
268
+ 'Healthy Hospitals Project - PHS',
269
+ 'Human Rights Watch',
270
+ 'ICLEI - Local Governments For Sustainability',
271
+ 'India Institute For Critical Action Centre In Movement (CACIM)',
272
+ 'India Water Foundation',
273
+ 'Indigenous Peoples And Their Communities Major Group',
274
+ 'Indigenous Peoples Representatives',
275
+ 'Indonesian Centre For Environmental Law (ICEL)',
276
+ 'Innovazing Vision',
277
+ 'Institute For Sustainable Development And Research (ISDR)',
278
+ 'Integrated Strategies Forum',
279
+ 'Interamerican Heart Foundation',
280
+ 'International Air Transport Association',
281
+ 'International Alliance Of Waste-pickers',
282
+ 'International Alliance Of Waste Pickers (IAWP)',
283
+ 'International Atomic Energy Agency (IAEA)',
284
+ 'International Center Of Comparative Environmental Law (CIDCE)',
285
+ 'International Centre For Environmental Education And Community Development (ICENECDEV)',
286
+ 'International Chamber Of Commerce (ICC)',
287
+ 'International Council Of Beverages Associations (ICBA)',
288
+ 'International Council Of Chemical Associations (ICCA)',
289
+ 'International Knowledge Hub Against Plastic Pollution (IKHAPP)',
+ 'International Labour Organization (ILO)',
+ 'International Medical Crisis Response Alliance (IMCRA)',
+ 'International Movement For Advancement Of Education Culture Social and Economic Development (IMAESED)',
+ 'International Network For Bamboo And Rattan (INBAR)',
+ 'International Organization For Standardization (ISO)',
+ 'International Pollutants Elimination Network (IPEN)',
+ 'International Science Council (ISC)',
+ 'International Society Of Doctors For The Environment (ISDE)',
+ 'International Solid Waste Association (ISWA)',
+ 'International Trade Union Confederation (ITUC)',
+ 'International Union For Conservation Of Nature And Natural Resources (IUCN)',
+ 'Inuit Circumpolar Council (ICC)',
+ 'Japan Clean Ocean Material Alliance (CLOMA)',
+ 'King Abdullah Petroleum Studies And Research Center (KAPSARC)',
+ 'Krityanand UNESCO Club, Jamshedpur',
+ 'La Grande Puissance De Dieu',
+ 'Latin American Organizations - Alianza Basura Cero',
+ 'Litter4tokens South Africa NPO',
+ 'Major Alliance Education Centre (MAEC)',
+ 'Major Group For Children And Youth',
+ 'Marine Ecosystems Protected Areas (MEPA) Trust',
+ 'MarViva',
+ 'Members Of Microplastics Working Group',
+ 'Mexican Network Of Ecological Action',
+ 'Minderoo Foundation',
+ 'Ministry Of Environment And Wildlife - Southwest State Of Somalia',
+ 'Moms Clean Air Force',
+ 'Multifaith Action Group On Pollution',
+ 'National Old Folks Of Liberia (NOFOL)',
+ 'National Retail Association (NRA)',
+ 'Natural Resources Defense Council (NRDC)',
+ 'Neste',
+ 'Nexus For Health, Environment, And Development (Nexus3) Foundation',
+ 'NGO Major Group',
+ 'Nipe Fagio',
+ 'No Balloon Release Australia',
+ 'No More Butts',
+ 'NORCE On Behalf Of The North Atlantic Microplastic Centre (NAMC)',
+ 'Norwegian Academy Of International Law (NAIL)',
+ 'Norwegian Institute For Water Research (NIVA)',
+ 'Norwegian Research Centre (NORCE)',
+ 'Occidental Arts And Ecology Center (OAEC)',
+ 'Ocean Conservancy',
+ 'Ocean Recovery Alliance',
+ 'Ocean. Now',
+ 'OceanCare',
+ 'Office Of The UN High Commissioner For Human Rights (OHCHR)',
+ 'ONG Jeunesse Active De Guinée (JAG)',
+ 'OpenOceans Global',
+ 'Organisation For Economic Co-operation And Development (OECD)',
+ 'Organization Of Arab Petroleum Exporting Countries (OAPEC)',
+ 'Organization Of The Petroleum Exporting Countries (OPEC)',
+ 'Our Sea Of East Asia Network (OSEAN)',
+ 'Out For Sustainability',
+ 'Pacific Environment And Resources Center (Pacific Environment)',
+ 'Pan American Neuroendocrine Society',
+ 'Partnerships For Change',
+ 'Paryavaran Mitra',
+ 'PCX Solutions (HOPEx Environment Group, Inc)',
+ 'PetStar',
+ 'Planeteer Alliance And Captain Planet Foundation',
+ 'Plastalliance - Alliance Plasturgie Et Composites Du Futur',
+ 'Plastic Change',
+ 'Plastic Free Foundation',
+ 'Plastic Free Future',
+ 'Plastic Oceans Australasia',
+ 'Plastic Pollution Coalition',
+ 'Plastics Federation Of South Africa',
+ 'Plastics Industry Association',
+ 'Plasticulture',
+ 'ProDelphinus',
+ 'RAPAL',
+ 'Recycling Partnership',
+ 'Red De Acción Ecológica De México',
+ 'Red De Acción Por Los Derechos Ambientales',
+ 'Red Mexicana De Accion Ecologica (Accion Ecologica)',
+ 'Regions4 Sustainable Development',
+ 'Reloop Platform',
+ 'Resolve',
+ 'Samo Foundation',
+ 'Sanid Organization For Relief And Development (SORD)',
+ 'Sasakawa Peace Foundation',
+ 'Sciaena',
+ 'Scientists’ Coalition For An Effective Plastics Treaty (Scientists’ Coalition)',
+ 'Secretariat For The Pacific Regional Environment Programme',
+ 'Secretariat Of The Convention On The Protection And Use Of Transboundary Watercourses And International Lakes (Water Convention)',
+ 'Secretariat Of The Pacific Regional Environment Programme (SPREP)',
+ 'Secretariat Of The WHO Framework Convention On Tobacco Control',
+ 'Secretariats Of The Basel, Rotterdam And Stockholm Conventions',
+ 'Shenzhen Zero Waste',
+ 'Smoke Free Partnership, A Member Of The Stop Tobacco Pollution Alliance (STPA)',
+ 'Sociedad Peruana De Derecho Ambiental (Peruvian Society Of Environmental Law)',
+ 'Somali Sustainable Development Organisation (SOSDO)',
+ 'Somali Youth Development Foundation (SYDF)',
+ 'Stevenson Holistic Care Foundation (SHCF)',
+ 'Stichting CEFLEX – The Circular Economy For Flexible Packaging Initiative',
+ 'Stiftelsen Stockholm International Water Institute',
+ 'Styrenics Industry',
+ 'Sustainable Coastlines Charitable Trust',
+ 'Sustainable Environment Food And Agriculture Initiative',
+ 'Swedish Society For Nature Conservation (SSNC)',
+ 'T/A Plastics SA',
+ 'Take 3 For The Sea',
+ 'Taller Ecologista',
+ 'Tangaroa Blue Foundation',
+ 'Tearfund',
+ 'Thailand',
+ 'The Australian Marine Conservation Society',
+ 'The Center For Oceanic Awareness, Research, And Education (COARE)',
+ 'The Descendants Project',
+ 'The Fletcher School',
+ 'The Global Organization For PHA (GO!PHA)',
+ 'The Ocean Cleanup',
+ 'The Pew Charitable Trusts',
+ 'The Sea Cleaners',
+ 'The Society Of Native Nations',
+ 'The Vinyl Institute',
+ 'Toxisphera, Mingas Por El Mar',
+ 'Trade Unions Major Group',
+ 'Trash Hero World',
+ 'Trash4tokens NGO',
+ 'Tufts University',
+ 'U.S. Council For International Business (USCIB)',
+ 'Udisha',
+ 'Unbutton Fashion',
+ 'UNESCO Association - Guwahati',
+ 'UNESCO Chair For Ocean Sustainability',
+ 'Unions Workers And Wastepickers',
+ 'United Nations Conference On Trade And Development (UNCTAD)',
+ 'United Nations Development Programme (UNDP)',
+ 'United Nations Economic Commission For Europe (UNECE)',
+ 'United Nations Global Compact',
+ 'United Nations Human Settlements Programme (UN-Habitat)',
+ 'United Nations Industrial Development Organization (UNIDO)',
+ 'United Nations Institute For Training And Research (UNITAR)',
+ 'United Nations Office For Disaster Risk Reduction (UNDRR)',
+ 'United Nations Office On Drugs And Crime (UNODC)',
+ 'University Of Wollongong',
+ 'Unplastify',
+ 'Verra',
+ 'Vital Strategies',
+ 'Waste Free Oceans',
+ 'Women In Informal Employment Globalizing And Organizing (WIEGO)',
+ 'Wonjin Institute For Occupational And Environmental Health (WIOEH)',
+ 'World Against Single Use Plastic (WASUP)',
+ 'World Business Council For Sustainable Development (WBCSD)',
+ 'World Economic Forum And Global Plastic Action Partnership (GPAP)',
+ 'World Health Organisation',
+ 'World Health Organization, Including The Secretariat Of The WHO Framework Convention On Tobacco Control',
+ 'World Plastics Council (WPC)',
+ 'World Wide Fund For Nature (WWF)',
+ 'Wrap',
+ 'WWF-Australia',
+ 'Youth Focus Group',
+ 'Yunus Environment Hub',
+ 'Zero Waste Europe',
+ 'Zoological Society Of London (ZSL)']}
+
+ authors_dict
+
+ draftcat_dict = { 'part i': {'i1': 'preamble',
+ 'i2': 'objective',
+ 'i3': 'definition',
+ 'i4': 'principles',
+ 'i5': 'scope'},
+ 'part ii': {'ii1': 'primary plastic polymers',
+ 'ii2': 'chemicals and polymers of concern',
+ 'ii3': 'problematic and avoidable plastic products, including short-lived and single-use plastic products and intentionally added microplastics',
+ 'ii3a': 'problematic and avoidable plastic products, including short-lived and single-use plastic products',
+ 'ii3b': 'intentionally added microplastics',
+ 'ii4': 'exemptions available to a Party upon request',
+ 'ii5': 'product design, composition and performance',
+ 'ii5a': 'product design and performance',
+ 'ii5b': 'reduce, reuse, refill and repair of plastics and plastic products',
+ 'ii5c': 'use of recycled plastic contents',
+ 'ii5d': 'alternative plastics and plastic products',
+ 'ii6': 'non-plastic substitutes',
+ 'ii7': 'extended producer responsibility',
+ 'ii8': 'emissions and releases of plastic throughout its life cycle',
+ 'ii9': 'waste management',
+ 'ii9a': 'waste management',
+ 'ii9b': 'fishing gear',
+ 'ii10': 'trade in listed chemicals, polymers and products, and in plastic waste',
+ 'ii10a': 'trade in listed chemicals, polymers and products',
+ 'ii10b': 'transboundary movement of plastic waste',
+ 'ii11': 'existing plastic pollution, including in the marine environment',
+ 'ii12': 'just transition',
+ 'ii13': 'transparency, tracking, monitoring and labeling'},
+ 'part iii': {'iii1': 'financing',
+ 'iii2': 'capacity-building, technical assistance and technology transfer'},
+ 'part iv': {'iv1': 'national plans',
+ 'iv2': 'implementation and compliance',
+ 'iv3': 'reporting on progress ',
+ 'iv4': 'periodic assessment and monitoring of the progress of implementation of the instrument* and effectiveness evaluation',
+ 'iv4a': 'effectiveness evaluation',
+ 'iv4b': 'review of chemicals and polymers of concern, microplastics and problematic and avoidable products',
+ 'iv5': 'international cooperation',
+ 'iv6': 'information exchange',
+ 'iv7': 'awareness-raising, education and research',
+ 'iv8': 'stakeholder engagement'},
+ 'part v': {'part v': 'institutional arrangements',
+ 'v1': 'governing body',
+ 'v2': 'subsidiary bodies',
+ 'v3': 'secretariat'},
+ 'part vi': {'part vi': 'final provisions'}
+ }
+
+ draftcat_dict
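The draft-category taxonomy above is nested (part → code → label), and a few labels occur more than once, e.g. 'waste management' under both `ii9` and `ii9a`. As a minimal sketch, assuming one wants a flat, de-duplicated label list (similar in spirit to what `get_draft_cat_taxonomy` in `src/app/v2/app.py` builds from `data/draftcat_taxonomy_filter.json`, whose contents are not part of this diff), the dictionary can be flattened as follows. `draftcat_excerpt` and `flatten_labels` are illustrative names, not part of the repository.

```python
# Trimmed excerpt of draftcat_dict above; the full dictionary has the same shape.
draftcat_excerpt = {
    "part ii": {
        "ii9": "waste management",
        "ii9a": "waste management",
        "ii9b": "fishing gear",
    },
    "part vi": {"part vi": "final provisions"},
}


def flatten_labels(taxonomy: dict[str, dict[str, str]]) -> list[str]:
    """Return each label once, in the order it first appears (part by part)."""
    seen: set[str] = set()
    labels: list[str] = []
    for subpart in taxonomy.values():
        for label in subpart.values():
            if label not in seen:  # e.g. 'waste management' under ii9 and ii9a
                seen.add(label)
                labels.append(label)
    return labels


print(flatten_labels(draftcat_excerpt))
# ['waste management', 'fishing gear', 'final provisions']
```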
database/document_store.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8b056059cccd69629ccbb81260f3bf0f8cfa3513d75c56fc402495cf1af2b5a3
+ size 133
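`database/document_store.pkl` is tracked with Git LFS, so the repository itself only contains this three-line pointer: the spec version, the object's `sha256` id and its size in bytes. A minimal sketch of reading such a pointer, assuming only the v1 layout shown here (one space-separated `key value` pair per line, `oid` written as `<algorithm>:<hex digest>`); `parse_lfs_pointer` is a hypothetical helper, not part of the repository.

```python
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:8b056059cccd69629ccbb81260f3bf0f8cfa3513d75c56fc402495cf1af2b5a3
size 133
"""


def parse_lfs_pointer(text: str) -> dict[str, object]:
    """Split a Git LFS v1 pointer into version, hash algorithm, digest and size."""
    fields: dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    if fields.get("version") != "https://git-lfs.github.com/spec/v1":
        raise ValueError("not a Git LFS v1 pointer")
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real object in bytes
    }


info = parse_lfs_pointer(POINTER)
print(info["algorithm"], info["size"])  # sha256 133
```

A checkout made without Git LFS would contain only this pointer text where the pickle is expected, so loading the document store would most likely fail; running `git lfs pull` in the clone fetches the actual object.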
database/meta_data.csv ADDED
The diff for this file is too large to render.
 
poetry.lock ADDED
The diff for this file is too large to render.
 
pyproject.toml ADDED
@@ -0,0 +1,30 @@
+ [tool.poetry]
+ name = "plastic-treaty-app"
+ version = "0.1.0"
+ description = ""
+ authors = ["Rahkakavee Baskaran <[email protected]>"]
+ readme = "README.md"
+ packages = [{include = "plastic_treaty_app"}]
+
+ [tool.poetry.dependencies]
+ python = "^3.10"
+ streamlit = "^1.30.0"
+ farm-haystack = "^1.23.0"
+ pydantic = "<2"
+ load-dotenv = "^0.1.0"
+ torch = "^2.1.2"
+ nltk = "^3.8.1"
+ sentence-transformers = "^2.2.2"
+ scikit-learn = "^1.4.1.post1"
+ langchain = "^0.1.9"
+
+
+ [tool.poetry.group.dev.dependencies]
+ flake8 = "^7.0.0"
+ pre-commit = "^3.6.0"
+ mypy = "^1.8.0"
+ black = "^24.1.1"
+
+ [build-system]
+ requires = ["poetry-core"]
+ build-backend = "poetry.core.masonry.api"
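Most dependencies above use Poetry's caret (`^`) requirements, which accept updates that leave the left-most non-zero version component unchanged. For 0.x releases that range is narrow: `langchain = "^0.1.9"` admits 0.1.x but not 0.2.0. The sketch below only makes those ranges explicit; `caret_range` is an illustrative helper that handles plain numeric versions (ignoring suffixes such as `.post1` when computing the upper bound), not Poetry's full version grammar.

```python
import re


def caret_range(spec: str) -> tuple[str, str]:
    """Return (lower, exclusive upper) bounds for a caret constraint like "^0.1.9"."""
    version = spec.lstrip("^")
    # Only the leading numeric release segment feeds the upper bound (".post1" is dropped).
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unsupported version: {spec!r}")
    parts = [int(p) for p in match.group(0).split(".")]
    # Bump the left-most non-zero component; if all are zero, bump the last one given.
    idx = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    upper = parts[: idx + 1]
    upper[idx] += 1
    upper += [0] * (3 - len(upper))  # pad to major.minor.patch
    return version, ".".join(str(p) for p in upper)


deps = {"python": "^3.10", "langchain": "^0.1.9", "scikit-learn": "^1.4.1.post1"}
for name, spec in deps.items():
    low, high = caret_range(spec)
    print(f"{name}: >={low},<{high}")
# python: >=3.10,<4.0.0
# langchain: >=0.1.9,<0.2.0
# scikit-learn: >=1.4.1.post1,<2.0.0
```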
src/__init__py ADDED
File without changes
src/analysis/__init__.py ADDED
File without changes
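The v1 app that follows imports `build_filter` from `src/utils/data.py`, which is not included in this diff. The app passes its result to `pipeline.document_store.get_all_documents(filters=...)` and treats an empty dict as "no filters selected". The sketch below is a hypothetical stand-in under one assumption: that the filter is a Haystack 1.x-style metadata filter over the `author`, `draft_labs` and `round` fields the app reads from `document.meta`. The real helper also receives `meta_data` and may resolve selections differently; since the app strips brackets and quotes from `author`, that field looks like a list-like string, which a plain `$in` match on exact names might not hit.

```python
from typing import Any


def build_filter_sketch(
    authors: list[str], draft_cats: list[str], rounds: list[int]
) -> dict[str, Any]:
    """Hypothetical stand-in for src.utils.data.build_filter (not the real helper).

    Each non-empty selection becomes an "$in" condition on the matching metadata
    field; Haystack 1.x combines top-level conditions with an implicit AND.
    No selection at all yields {}, which the app treats as "no filters selected".
    """
    selections = {"author": authors, "draft_labs": draft_cats, "round": rounds}
    return {field: {"$in": values} for field, values in selections.items() if values}


print(build_filter_sketch(["Greenpeace International"], [], [2, 3]))
# {'author': {'$in': ['Greenpeace International']}, 'round': {'$in': [2, 3]}}
print(build_filter_sketch([], [], []))  # {}
```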
src/app/v1/app.py ADDED
@@ -0,0 +1,329 @@
+ import os
+ import sys
+
+ # Put the repository root (three levels above this file) on sys.path so that
+ # the `src.*` imports below resolve regardless of the working directory.
+ sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..', '..', '..')))
+ import pandas as pd
+
+
+ from src.rag.pipeline import RAGPipeline
+ import streamlit as st
+ from src.utils.data import (
+     build_filter,
+     get_filter_values,
+     get_meta,
+     load_json,
+     load_css,
+ )
+ from src.utils.writer import typewriter
+
+ st.set_page_config(layout="wide")
+
+
+
+ EMBEDDING_MODEL = "sentence-transformers/distiluse-base-multilingual-cased-v1"
+ PROMPT_TEMPLATE = os.path.join("src", "rag", "prompt_template.yaml")
+
+
+ @st.cache_data
+ def load_css_style(path: str) -> None:
+     load_css(path)
+
+
+ @st.cache_data
+ def get_meta_data() -> pd.DataFrame:
+     return pd.read_csv(
+         os.path.join("database", "meta_data.csv"), dtype={"retriever_id": str}
+     )
+
+
+ @st.cache_data
+ def get_authors_taxonomy() -> dict[str, list[str]]:
+     return load_json(os.path.join("data", "authors_filter.json"))
+
+
+ @st.cache_data
+ def get_draft_cat_taxonomy() -> dict[str, list[str]]:
+     return load_json(os.path.join("data", "draftcat_taxonomy_filter.json"))
+
+
+ @st.cache_data
+ def get_example_prompts() -> list[str]:
+     return [
+         example["question"]
+         for example in load_json(os.path.join("data", "example_prompts.json"))
+     ]
+
+
+ @st.cache_resource
+ def load_pipeline() -> RAGPipeline:
+     return RAGPipeline(
+         embedding_model=EMBEDDING_MODEL,
+         prompt_template=PROMPT_TEMPLATE,
+     )
+
+
+ @st.cache_data
+ def load_app_init() -> None:
+     # Define the title of the app
+     st.title("INC Plastic Treaty - Q&A")
+
+     # Add warning emoji and style
+     st.markdown(
+         """
+ <p class="remark"> ⚠️ Remark:
+ The app is a beta version that serves as a basis for further development. We are aware that the performance is not yet sufficient and that the data basis is not yet complete. We are grateful for any feedback that contributes to the further development and improvement of the app!</p>
+ """,
+         unsafe_allow_html=True,
+     )
+
+     # Add an explanation of the app
+     st.markdown(
+         """
+ <p class="description">
+ The app aims to facilitate the search for information and documents related to the UN Plastics Treaty Negotiations. The database includes all relevant documents that are available <a href="https://www.unep.org/inc-plastic-pollution" target="_blank">here</a>. Users can query the data through a chatbot. Please note that, due to technical constraints, at most 10 documents can be used to generate an answer, so a comprehensive response cannot be guaranteed. However, all relevant documents can be accessed via a link using the filter functions.
+ Filter functions are available to narrow down the data by country/author, zero draft categories and negotiation rounds. Pre-selecting relevant data enhances the accuracy of generated answers.</p>
+ """,
+         unsafe_allow_html=True,
+     )
+
+
+ load_css_style("style/style.css")
+
+
+ # Load the data
+ metadata = get_meta_data()
+ authors_taxonomy = get_authors_taxonomy()
+ draft_cat_taxonomy = get_draft_cat_taxonomy()
+ example_prompts = get_example_prompts()
+
+ # Load pipeline
+ pipeline = load_pipeline()
+
+ # Load app init
+ load_app_init()
+
+
+ filter_col = st.columns(1)
+ # Filter column
+ with filter_col[0]:
+     st.markdown("## Select Filters")
+     author_col, round_col, draft_cat_col = st.columns([1, 1, 1])
+
+     with author_col:
+         st.markdown("### Authors")
+         selected_author_parent = st.multiselect(
+             "Entity Parent", list(authors_taxonomy.keys())
+         )
+
+         available_child_items = []
+         for category in selected_author_parent:
+             available_child_items.extend(authors_taxonomy[category])
+
+         selected_authors = st.multiselect("Entity", available_child_items)
+
+     with round_col:
+         st.markdown("### Round")
+         negotiation_rounds = get_filter_values(metadata, "round")
+         selected_rounds = st.multiselect("Round", negotiation_rounds)
+
+     with draft_cat_col:
+         st.markdown("### Draft Categories")
+         selected_draft_cats_parent = st.multiselect(
+             "Draft Categories Parent", list(draft_cat_taxonomy.keys())
+         )
+         available_draft_cats_child_items = []
+         for category in selected_draft_cats_parent:
+             available_draft_cats_child_items.extend(draft_cat_taxonomy[category])
+
+         selected_draft_cats = st.multiselect(
+             "Draft Categories", available_draft_cats_child_items
+         )
+
+
+ prompt_col, output_col = st.columns([1, 1.5])
+
+
+ # Prompt column: document filtering, question input and example prompts
+ with prompt_col:
+     st.markdown("## Filter documents")
+     st.markdown(
+         """
+ * The filter function shows all documents that match the selected filters.
+ * All documents selected via the filter function can also be accessed via a link.
+ * Alternatively, you can ask the model a question. The model will then answer based on the filtered documents.
+ """
+     )
+     trigger_filter = st.session_state.setdefault("trigger", False)
+     if st.button("Filter documents"):
+         filter_selection_transformed = build_filter(
+             meta_data=metadata,
+             authors_filter=selected_authors,
+             draft_cats_filter=selected_draft_cats,
+             round_filter=selected_rounds,
+         )
+         documents = pipeline.document_store.get_all_documents(
+             filters=filter_selection_transformed
+         )
+         trigger_filter = True
+
+     st.markdown("## Ask a question")
+     if "prompt" not in st.session_state:
+         prompt = st.text_area("Enter a question")
+     if (
+         "prompt" in st.session_state
+         and st.session_state.prompt in example_prompts
+     ):
+         # An example prompt was clicked: pre-fill the text area with it
+         prompt = st.text_area(
+             "Enter a question", value=st.session_state.prompt
+         )
+     if (
+         "prompt" in st.session_state
+         and st.session_state.prompt not in example_prompts
+     ):
+         del st.session_state["prompt"]
+         prompt = st.text_area("Enter a question")
+
+     trigger_ask = st.session_state.setdefault("trigger", False)
+     if st.button("Ask"):
+         with st.status("Filtering documents...", expanded=False) as status:
+             # Build the filter before checking it: if only "Ask" was clicked in this
+             # run, the "Filter documents" branch above never assigned it.
+             filter_selection_transformed = build_filter(
+                 meta_data=metadata,
+                 authors_filter=selected_authors,
+                 draft_cats_filter=selected_draft_cats,
+                 round_filter=selected_rounds,
+             )
+             if filter_selection_transformed == {}:
+                 st.warning(
+                     "No filters selected. We highly recommend using filters; otherwise, the answer might not be accurate. You might also experience performance issues, since the model has to analyze all available documents."
+                 )
+
+             documents = pipeline.document_store.get_all_documents(
+                 filters=filter_selection_transformed
+             )
+             status.update(
+                 label="Filtering documents completed!", state="complete", expanded=False
+             )
+         with st.status("Answering question...", expanded=True) as status:
+             result = pipeline(prompt=prompt, filters=filter_selection_transformed)
+             trigger_ask = True
+             status.update(
+                 label="Answering question completed!", state="complete", expanded=False
+             )
+
+     st.markdown("### Examples")
+     st.markdown(
+         """
+ * These are example prompts that can be used to ask the model questions.
+ * Click on a prompt to use it as your question. You can also type your own question in the text area above.
+ * For questions like "How do countries A, B and C [...]", please make sure to select those countries in the filter section; otherwise, the answer will not be accurate. In general, we highly recommend using the filter functions to narrow down the data.
+ """
+     )
+
+     # One button per example prompt; a click stores the prompt in the session
+     # state, and the text area above picks it up on the next rerun.
+     for example_prompt in example_prompts:
+         if st.button(example_prompt):
+             st.session_state["prompt"] = example_prompt
+
+
+ if trigger_ask:
+     with output_col:
+         meta_data = get_meta(result=result)
+         answer = result["answers"][0].answer
+
+         # Group the retrieved text passages by document (retriever_id)
+         meta_data_cleaned = []
+         seen_retriever_ids = set()
+
+         for data in meta_data:
+             retriever_id = data["retriever_id"]
+             content = data["content"]
+             if retriever_id not in seen_retriever_ids:
+                 meta_data_cleaned.append(
+                     {
+                         "retriever_id": retriever_id,
+                         "href": data["href"],
+                         "content": [content],
+                     }
+                 )
+                 seen_retriever_ids.add(retriever_id)
+             else:
+                 for item in meta_data_cleaned:
+                     if item["retriever_id"] == retriever_id:
+                         item["content"].append(content)
+
+         references = ["\n"]
+         for data in meta_data_cleaned:
+             retriever_id = data["retriever_id"]
+             href = data["href"]
+             references.append(f"-[{retriever_id}]: {href} \n")
+         st.write("#### 📌 Answer")
+         typewriter(
+             text=answer,
+             references=references,
+             speed=100,
+         )
+
+         with st.expander("Show more information about the documents"):
+             for data in meta_data_cleaned:
+                 markdown_text = f"- Document: {data['retriever_id']}\n"
+                 markdown_text += "  - Text passages\n"
+                 for content in data["content"]:
+                     # Strip list/quote characters and collapse whitespace
+                     content = content.replace("[", "").replace("]", "").replace("'", "")
+                     content = " ".join(content.split())
+                     markdown_text += f"    - {content}\n"
+                 st.write(markdown_text)
+
+     col4 = st.columns(1)
+     with col4[0]:
+         references = []
+         for document in documents:
+             authors = document.meta["author"]
+             authors = authors.replace("'", "").replace("[", "").replace("]", "")
+             href = document.meta["href"]
+             source = f"- {authors}: {href}"
+             references.append(source)
+         references = sorted(set(references))
+         st.markdown("### Overview of all filtered documents")
+         st.markdown(
+             f"<p class='description'> The answer above is based on the most similar text passages (top 7) from the documents listed under 'References' in the answer block. Below you will find an overview of all documents that match the filters you have selected. Please note that the answer is based only on the highlighted references above and does not include findings from all the filtered documents shown below. \n For your current filter selection, {len(references)} documents were found. </p>",
+             unsafe_allow_html=True,
+         )
+         for reference in references:
+             st.write(reference)
+
+
+ if trigger_filter:
+     with output_col:
+         references = []
+         for document in documents:
+             authors = document.meta["author"]
+             authors = authors.replace("'", "").replace("[", "").replace("]", "")
+             href = document.meta["href"]
+             round_ = document.meta["round"]
+             draft_labs = document.meta["draft_labs"]
+             references.append(
+                 {
+                     "author": authors,
+                     "href": href,
+                     "draft_labs": draft_labs,
+                     "round": round_,
+                 }
+             )
+         references = pd.DataFrame(references)
+         references = references.drop_duplicates()
+         st.markdown("### Overview of all filtered documents")
+         # Show the filtered documents as a table with readable column labels
+         st.dataframe(
+             references,
+             hide_index=True,
+             column_config={
+                 "author": st.column_config.ListColumn("Authors"),
+                 "href": st.column_config.LinkColumn("Link to Document"),
+                 "draft_labs": st.column_config.ListColumn("Draft Categories"),
+                 "round": st.column_config.NumberColumn("Round"),
+             },
+         )
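In the answer block of the v1 app, retrieved passages are grouped by `retriever_id` using a list plus a `seen_retriever_ids` set, re-scanning the list for every repeated ID. The same grouping (documents in first-seen order, passages in retrieval order, `href` taken from the first passage) can be done in one pass with a dict keyed by `retriever_id`. This is a sketch, assuming each item returned by `get_meta` has `retriever_id`, `href` and `content` keys as the app code implies; `group_passages` and the sample data are illustrative.

```python
from typing import Any


def group_passages(meta_data: list[dict[str, str]]) -> list[dict[str, Any]]:
    """Group passages by retriever_id, keeping documents in first-seen order."""
    grouped: dict[str, dict[str, Any]] = {}
    for item in meta_data:
        entry = grouped.setdefault(
            item["retriever_id"],
            {"retriever_id": item["retriever_id"], "href": item["href"], "content": []},
        )
        entry["content"].append(item["content"])
    # Dicts keep insertion order, so documents appear in the order first retrieved.
    return list(grouped.values())


# Made-up sample data in the shape the app code implies.
sample = [
    {"retriever_id": "12", "href": "https://example.org/a", "content": "first passage"},
    {"retriever_id": "7", "href": "https://example.org/b", "content": "second passage"},
    {"retriever_id": "12", "href": "https://example.org/a", "content": "third passage"},
]
for doc in group_passages(sample):
    print(doc["retriever_id"], doc["content"])
# 12 ['first passage', 'third passage']
# 7 ['second passage']
```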
src/app/v2/app.py ADDED
@@ -0,0 +1,385 @@
+ import os
+ import sys
+ sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..', '..', '..')))
+ import pandas as pd
+ import streamlit as st
+ import time
+
+
+ from src.rag.pipeline import RAGPipeline
+ from src.utils.data_v2 import (
+     build_filter,
+     get_meta,
+     load_json,
+     load_css,
+ )
+ from src.utils.writer import typewriter
+
+
+ st.set_page_config(layout="wide")
+
+ EMBEDDING_MODEL = "sentence-transformers/distiluse-base-multilingual-cased-v1"
+ PROMPT_TEMPLATE = os.path.join("src", "rag", "prompt_template.yaml")
+
+
+ @st.cache_data
+ def load_css_style(path: str) -> None:
+     load_css(path)
+
+
+ @st.cache_data
+ def get_meta_data() -> pd.DataFrame:
+     return pd.read_csv(
+         os.path.join("database", "meta_data.csv"), dtype={"retriever_id": str}
+     )
+
+
+ @st.cache_data
+ def get_df() -> pd.DataFrame:
+     return pd.read_csv(
+         os.path.join("data", "inc_df.csv"), dtype={"retriever_id": str}
+     )[["retriever_id", "draft_labs", "author", "href", "round"]]
+
+
+ @st.cache_data
+ def get_authors_taxonomy() -> list[str]:
+     taxonomy = load_json(os.path.join("data", "authors_taxonomy.json"))
+     countries = []
+     members = taxonomy["Members"]
+     for key, value in members.items():
+         if key == "Countries" or key == "International and Regional State Associations":
+             countries.extend(value)
+     return countries
+
+
+ @st.cache_data
+ def get_draft_cat_taxonomy() -> list[str]:
+     taxonomy = load_json(os.path.join("data", "draftcat_taxonomy_filter.json"))
+     draft_labels = []
+     for _, subpart in taxonomy.items():
+         for label in subpart:
+             draft_labels.append(label)
+     return draft_labels
+
+
+ @st.cache_data
+ def get_example_prompts() -> list[str]:
+     return [
+         example["question"]
+         for example in load_json(os.path.join("data", "example_prompts.json"))
+     ]
+
+
73
+ @st.cache_data
74
+ def set_trigger_state_values() -> tuple[bool, bool]:
75
+ trigger_filter = st.session_state.setdefault("trigger", False)
76
+ trigger_ask = st.session_state.setdefault("trigger", False)
77
+ return trigger_filter, trigger_ask
78
+
79
+
80
+ @st.cache_resource
81
+ def load_pipeline() -> RAGPipeline:
82
+ return RAGPipeline(
83
+ embedding_model=EMBEDDING_MODEL,
84
+ prompt_template=PROMPT_TEMPLATE,
85
+ )
86
+
87
+
88
+ @st.cache_data
89
+ def load_app_init() -> None:
90
+ # Define the title of the app
91
+ st.title("INC Plastic Pollution Country Profile Analysis")
92
+
93
+ # add warning emoji and style
94
+
95
+ st.markdown(
96
+ """
97
+ <div class="remark">
98
+ <div class="remark-content">
99
+ <p class="remark-text" style="font-size: 20px;"> ⚠️ The app is a beta version that serves as a basis for further development. We are aware that the performance is not yet sufficient and that the data basis is not yet complete. We are grateful for any feedback that contributes to the further development and improvement of the app!</p>
100
+ </div>
101
+ </div>
102
+ """,
103
+ unsafe_allow_html=True,
104
+ )
105
+
106
+ st.markdown(
107
+ """
108
+ <a href="mailto:[email protected]" class="feedback-link">Send feedback</a>
109
+ """,
110
+ unsafe_allow_html=True,
111
+ )
112
+
113
+ # add explanation to the app
114
+ st.markdown(
115
+ """
116
+ <p class="description">
117
+ The app is tailored to enhance the efficiency of finding and accessing information on the UN Plastics Treaty Negotiations. It hosts a comprehensive database of relevant documents submitted by the members available <a href="https://www.unep.org/inc-plastic-pollution"> here</a>, which users can explore through an intuitive chatbot interface as well as simple filtering options.
118
+ The app excels in querying specific information about countries and their positions in the negotiations, providing targeted and precise answers. However, it can process only up to 8 relevant documents at a time, which may limit responses to more complex inquiries. Filter options by authors and sections of the negotiation draft ensure the accuracy of the answers. Each document found via these filters is also directly accessible via a link, ensuring complete and easy access to the desired information.
119
+ </p>
120
+ """,
121
+ unsafe_allow_html=True,
122
+ )
123
+
124
+
125
+ load_css_style("style/style.css")
126
+
127
+
128
+ # Load the data
129
+ df = get_df()
130
+ df_transformed = get_meta_data()
131
+ countries = get_authors_taxonomy()
132
+ draft_labels = get_draft_cat_taxonomy()
133
+ example_prompts = get_example_prompts()
134
+ trigger_filter, trigger_ask = set_trigger_state_values()
135
+
136
+ # Load pipeline
137
+ pipeline = load_pipeline()
138
+
139
+ # Load app init
140
+ load_app_init()
141
+
142
+
143
+ application_col = st.columns(1)
144
+
145
+
146
+ with application_col[0]:
147
+ st.markdown("""<p class="header"> 1️⃣ Select countries""", unsafe_allow_html=True)
148
+ st.markdown(
149
+ """
150
+ <p class="description">
151
+ Please select the countries of interest. Your selection will refine the database to include documents submitted by these countries or recognized groupings such as Small Developing States, the African States Group, etc. </p>
152
+ """,
153
+ unsafe_allow_html=True,
154
+ )
155
+ selected_authors = st.multiselect(
156
+ label="country",
157
+ options=countries,
158
+ label_visibility="collapsed",
159
+ placeholder="Select country/countries",
160
+ )
161
+
162
+ st.write("\n")
163
+ st.write("\n")
164
+
165
+ st.markdown(
166
+ """<p class="header"> 2️⃣ Select parts of the negotiation draft""",
167
+ unsafe_allow_html=True,
168
+ )
169
+ st.markdown(
170
+ """
171
+ <p class="description">
172
+ Please select the parts of the negotiation draft of interest. The negotiation draft can be accessed <a href="https://www.unep.org/inc-plastic-pollution/session-4/documents"> here</a>. </p>
173
+ """,
174
+ unsafe_allow_html=True,
175
+ )
176
+ selected_draft_cats = st.multiselect(
177
+ label="Subpart",
178
+ options=draft_labels,
179
+ label_visibility="collapsed",
180
+ placeholder="Select draft category/draft categories",
181
+ )
182
+
183
+ st.write("\n")
184
+ st.write("\n")
185
+
186
+ st.markdown(
187
+ """<p class="header"> 3️⃣ Ask a question or show documents based on selected filters""",
188
+ unsafe_allow_html=True,
189
+ )
190
+
191
+ asking, filtering = st.tabs(["Ask a question", "Filter documents"])
192
+
193
+ with filtering:
194
+ application_col_filter, output_col_filter = st.columns([1, 1.5])
195
+ # make the buttons text smaller
196
+ with application_col_filter:
197
+ st.markdown(
198
+ """
199
+ <p class="description">
200
+ This filter function allows you to see all documents that match the selected filters. The documents can be accessed via a link. \n
201
+ """,
202
+ unsafe_allow_html=True,
203
+ )
204
+ if st.button("Filter documents"):
205
+ filters, status = build_filter(
206
+ meta_data=df_transformed,
207
+ authors_filter=selected_authors,
208
+ draft_cats_filter=selected_draft_cats,
209
+ )
210
+ if status == "no filters selected":
211
+ st.info("No filters selected. All documents will be shown.")
212
+ df_filtered = df[
213
+ ["author", "href", "draft_labs", "round"]
214
+ ].sort_values(by="author")
215
+ trigger_filter = True
216
+ if status == "no results found":
217
+ st.info(
218
+ "No documents found for the combination of filters you've chosen. All countries are represented at least once in the data. Remove the draft categories to see all documents for the countries selected or try other draft categories."
219
+ )
220
+ if status == "success":
221
+ df_filtered = df[df["retriever_id"].isin(filters["retriever_id"])][
222
+ ["author", "href", "draft_labs", "round"]
223
+ ].sort_values(by="author")
224
+ trigger_filter = True
225
+
226
+ with asking:
227
+ application_col_ask, output_col_ask = st.columns([1, 1.5])
228
+ with application_col_ask:
229
+ st.markdown(
230
+ """
231
+ <p class="description"> Ask a question, noting that the database has been restricted by filters and that your question should pertain to the selected data. \n
232
+ """,
233
+ unsafe_allow_html=True,
234
+ )
235
+ if "prompt" not in st.session_state:
236
+ prompt = st.text_area("Enter a question")
237
+ if (
238
+ "prompt" in st.session_state
239
+ and st.session_state.prompt in example_prompts # noqa: E501
240
+ ): # noqa: E501
241
+ prompt = st.text_area(
242
+ "Enter a question", value=st.session_state.prompt
243
+ ) # noqa: E501
244
+ if (
245
+ "prompt" in st.session_state
246
+ and st.session_state.prompt not in example_prompts # noqa: E501
247
+ ): # noqa: E501
248
+ del st.session_state["prompt"]
249
+ prompt = st.text_area("Enter a question")
250
+
251
+ trigger_ask = st.session_state.setdefault("trigger", False)
252
+
253
+ if st.button("Ask"):
254
+ if prompt == "":
255
+ st.error(
256
+ "Please enter a question. Reloading the app in few seconds"
257
+ )
258
+ time.sleep(3)
259
+ st.rerun()
260
+ with st.spinner("Filtering data...") as status:
261
+ filter_selection_transformed, status = build_filter(
262
+ meta_data=df_transformed,
263
+ authors_filter=selected_authors,
264
+ draft_cats_filter=selected_draft_cats,
265
+ )
266
+
267
+ if status == "no filters selected":
268
+ st.info(
269
+ "No filters selcted.This will increase the prcessing time significantly. Please select at least one filter."
270
+ )
271
+ # st.error(
272
+ # "Selecting a filter is mandatory. We especially recommend to select countries you are interested in. Selecting at least one filter is mandatory, because otherwise the model would have to analyze all available documents which results in inaccurate answers and long processing times. Please select at least one filter."
273
+ # )
274
+ # st.stop()
275
+
276
+ documents = pipeline.document_store.get_all_documents(
277
+ filters=filter_selection_transformed
278
+ )
279
+
280
+ st.success("Filtering data completed.")
281
+ with st.spinner("Answering question...") as status:
282
+ if filter_selection_transformed == {}:
283
+ st.warning(
284
+ "The combination of filters you've chosen does not match any documents. Giving answer based on all documents. Please note that the answer might not be accurate. We highly recommend to use a combination of filters that match the data. All countries are represented at least once in the data. Thus, for example, you could remove the draft categories to match the documents. Or you could check with the Filter documents function which documents are available for the selected countries by removing the draft categories and filter the documents."
285
+ )
286
+
287
+ result = pipeline.run(
288
+ prompt=prompt, filters=filter_selection_transformed
289
+ )
290
+ trigger_ask = True
291
+ st.success("Answering question completed.")
292
+
293
+ st.markdown("### Examples")
294
+ for i, prompt in enumerate(example_prompts):
295
+ # with col[i % 4]:
296
+ if st.button(prompt):
297
+ if "key" not in st.session_state:
298
+ st.session_state["prompt"] = prompt
299
+ st.markdown(
300
+ """
301
+ <ul class="description" style="font-size: 20px;">
302
+ <li style="font-size: 17px;">These are example prompts that can be used to ask questions to the model</li>
303
+ <li style="font-size: 17px;">Click on a prompt to use it as a question. You can also type your own question in the text area above.</li>
304
+ <li style="font-size: 17px;">For questions like "How do country a, b and c [...]" please make sure to select the countries in the filter section. Otherwise the answer will not be accurate. In general we highly recommend to use the filter functions to narrow down the data.</li>
305
+ </ul>
306
+ """,
307
+ unsafe_allow_html=True,
308
+ )
309
+
310
+ # for i, prompt in enumerate(example_prompts):
311
+ # # with col[i % 4]:
312
+ # if st.button(prompt):
313
+ # if "key" not in st.session_state:
314
+ # st.session_state["prompt"] = prompt
315
+ # Define the button
316
+
317
+ if trigger_ask:
318
+ with output_col_ask:
319
+ if result is None:
320
+ st.error(
321
+ "Open AI rate limit exceeded. Please try again in a few seconds."
322
+ )
323
+ st.stop()
324
+ meta_data = get_meta(result=result)
325
+ answer = result["answers"][0].answer
326
+
327
+ meta_data_cleaned = []
328
+ seen_retriever_ids = set()
329
+
330
+ for data in meta_data:
331
+ retriever_id = data["retriever_id"]
332
+ content = data["content"]
333
+ if retriever_id not in seen_retriever_ids:
334
+ meta_data_cleaned.append(
335
+ {
336
+ "retriever_id": retriever_id,
337
+ "href": data["href"],
338
+ "content": [content],
339
+ }
340
+ )
341
+ seen_retriever_ids.add(retriever_id)
342
+ else:
343
+ for i, item in enumerate(meta_data_cleaned):
344
+ if item["retriever_id"] == retriever_id:
345
+ meta_data_cleaned[i]["content"].append(content)
346
+
347
+ references = ["\n"]
348
+ for data in meta_data_cleaned:
349
+ retriever_id = data["retriever_id"]
350
+ href = data["href"]
351
+ references.append(f"-[{retriever_id}]: {href} \n")
352
+ st.write("#### 📌 Answer")
353
+ typewriter(
354
+ text=answer,
355
+ references=references,
356
+ speed=100,
357
+ )
358
+
359
+ with st.expander("Show more information to the documents"):
360
+ for data in meta_data_cleaned:
361
+ markdown_text = f"- Document: {data['retriever_id']}\n"
362
+ markdown_text += " - Text passages\n"
363
+ for content in data["content"]:
364
+ content = (
365
+ content.replace("[", "").replace("]", "").replace("'", "")
366
+ )
367
+ content = " ".join(content.split())
368
+ markdown_text += f" - {content}\n"
369
+ st.write(markdown_text)
370
+
371
+ trigger = 0
372
+
373
+ if trigger_filter:
374
+ with output_col_filter:
375
+ st.markdown("### Overview of all filtered documents")
376
+ st.dataframe(
377
+ df_filtered,
378
+ hide_index=True,
379
+ column_config={
380
+ "author": st.column_config.ListColumn("Authors"),
381
+ "href": st.column_config.LinkColumn("Link to Document"),
382
+ "draft_labs": st.column_config.ListColumn("Draft Categories"),
383
+ "round": st.column_config.NumberColumn("Round"),
384
+ },
385
+ )
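The loop in `app.py` that builds `meta_data_cleaned` groups the retrieved passages by `retriever_id` and rescans the whole list whenever an id repeats. The sketch below shows the same grouping with a dict keyed by `retriever_id`. It assumes, as the loop does, that `get_meta` returns dicts with `retriever_id`, `href` and `content` keys; the helper name `group_passages` and the sample records are hypothetical.

```python
from typing import Any


def group_passages(meta_data: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Group text passages by document id, keeping first-seen order (hypothetical helper)."""
    grouped: dict[str, dict[str, Any]] = {}
    for data in meta_data:
        retriever_id = data["retriever_id"]
        if retriever_id not in grouped:
            # First passage of this document: record its link and start a passage list.
            grouped[retriever_id] = {
                "retriever_id": retriever_id,
                "href": data["href"],
                "content": [],
            }
        grouped[retriever_id]["content"].append(data["content"])
    # Dicts preserve insertion order, so documents keep their retrieval order.
    return list(grouped.values())


if __name__ == "__main__":
    sample = [
        {"retriever_id": "12", "href": "https://example.org/a", "content": "first"},
        {"retriever_id": "7", "href": "https://example.org/b", "content": "second"},
        {"retriever_id": "12", "href": "https://example.org/a", "content": "third"},
    ]
    for doc in group_passages(sample):
        print(doc["retriever_id"], doc["content"])
    # 12 ['first', 'third']
    # 7 ['second']
```

Each lookup is a constant-time dict access instead of a scan over the documents collected so far, and the output has the same shape as `meta_data_cleaned`.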
src/data_processing/__init__.py ADDED
File without changes
src/data_processing/document_store_data.py ADDED
@@ -0,0 +1,93 @@
+ import ast
+ import json
+
+ import pandas as pd
+
+ DATASET = "data/inc_df_v6_small_4.csv"
+ DATASET_PROCESSED = "data/inc_df.csv"
+ MEMBERS = "data/authors_filter.json"
+
+
+ def main():
+     df = pd.read_csv(DATASET)
+     print(f"Length of dataset: {len(df)}")
+     df["retriever_id"] = df.index
+     columns = [
+         "retriever_id",
+         "description",
+         "href",
+         "draft_labs_list",
+         "authors_list",
+         "draft_allcats",
+         "doc_subtype",
+         "doc_type",
+         "text",
+         "round",
+     ]
+
+     # Select and rename in one step; renaming a slice in place raises SettingWithCopyWarning.
+     df = df[columns].rename(
+         columns={
+             "draft_labs_list": "draft_labs",
+             "draft_allcats": "draft_cats",
+             "authors_list": "author",
+         }
+     )
+
+     # Subselect countries and country groups
+     with open(MEMBERS, "r", encoding="utf-8") as f:
+         authors = json.load(f)
+     # Apostrophes in these names break ast.literal_eval on the stringified author
+     # lists, so the names are masked before parsing and restored afterwards.
+     special_character_words_mapper = {
+         "Côte D'Ivoire": "Cote DIvoire",
+         "Ligue Camerounaise Des Droits De L'Homme": "Ligue Camerounaise Des Droits De LHomme",
+         "Association Pour L'Integration Et La Developpement Durable Au Burundi": "Association Pour LIntegration Et La Developpement Durable Au Burundi",
+     }
+     special_character_words_reverse_mapper = {
+         masked: original
+         for original, masked in special_character_words_mapper.items()
+     }
+     members = [
+         authors[key]
+         for key in [
+             "Members - Countries",
+             "Members - International and Regional State Associations",
+         ]
+     ]
+     members = [item for sublist in members for item in sublist]
+     members = [special_character_words_mapper.get(member, member) for member in members]
+
+     nonmembers = [
+         authors[key]
+         for key in [
+             "Intergovernmental Negotiation Committee",
+             "Observers and Other Participants",
+         ]
+     ]
+     nonmembers = [item for sublist in nonmembers for item in sublist]
+
+     # Mask the names inside the raw strings (this also covers multi-author rows).
+     for original, masked in special_character_words_mapper.items():
+         df["author"] = df["author"].str.replace(original, masked, regex=False)
+
+     df["author"] = df["author"].apply(ast.literal_eval)
+     # Keep documents with at least one member author.
+     df = df[df["author"].apply(lambda x: any(item in members for item in x))].copy()
+     # Restore the original spelling, then drop non-member authors.
+     df["author"] = df["author"].apply(
+         lambda x: [special_character_words_reverse_mapper.get(item, item) for item in x]
+     )
+     df["author"] = df["author"].apply(
+         lambda x: [item for item in x if item not in nonmembers]
+     )
+     df["draft_labs"] = df["draft_labs"].fillna("[]")
+     # The authors are lists at this point, so the AOSIS name is normalised per item.
+     df["author"] = df["author"].apply(
+         lambda x: [
+             "Alliance Of Small Island States (AOSIS)"
+             if item == "The Alliance Of Small Island States (AOSIS)"
+             else item
+             for item in x
+         ]
+     )
+
+     print(f"Filtered dataset to {len(df)} entries")
+     df.to_csv(DATASET_PROCESSED, index=False)
+
+
+ if __name__ == "__main__":
+     main()
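`ast.literal_eval` cannot parse a stringified list such as `['Côte D'Ivoire']`, because the apostrophe ends the string literal early. This script and `transform_to_list` in `document_store.py` therefore swap known names for apostrophe-free spellings before parsing and restore them afterwards. A minimal sketch of that round trip, assuming the masked spellings never occur in the data on their own; `MASKS`, `UNMASKS` and `parse_author_list` are hypothetical names.

```python
import ast

# Hypothetical mask table: original name -> apostrophe-free placeholder.
MASKS = {
    "Côte D'Ivoire": "Cote DIvoire",
    "Ligue Camerounaise Des Droits De L'Homme": "Ligue Camerounaise Des Droits De LHomme",
}
# Inverse table used to restore the original spelling after parsing.
UNMASKS = {masked: original for original, masked in MASKS.items()}


def parse_author_list(raw: str) -> list[str]:
    """Parse "['A', 'B']" even when a known name contains an apostrophe."""
    if raw in ("", "[]", "nan"):
        return []
    for original, masked in MASKS.items():
        raw = raw.replace(original, masked)
    parsed = ast.literal_eval(raw)
    # Map the placeholders back to the original spelling.
    return [UNMASKS.get(item, item) for item in parsed]


print(parse_author_list("['Côte D'Ivoire', 'Chile']"))  # ["Côte D'Ivoire", 'Chile']
```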
src/data_processing/document_store_data_all.py ADDED
@@ -0,0 +1,93 @@
+ import ast
+ import json
+
+ import pandas as pd
+
+ DATASET = "data/inc_df_v6_small.csv"
+ DATASET_PROCESSED = "data/inc_df.csv"
+ MEMBERS = "data/authors_filter.json"
+
+
+ def main():
+     df = pd.read_csv(DATASET)
+     print(f"Length of dataset: {len(df)}")
+     df["retriever_id"] = df.index
+     columns = [
+         "retriever_id",
+         "description",
+         "href",
+         "draft_labs_list",
+         "authors_list",
+         "draft_allcats",
+         "doc_subtype",
+         "doc_type",
+         "text",
+         "round",
+     ]
+
+     # Select and rename in one step; renaming a slice in place raises SettingWithCopyWarning.
+     df = df[columns].rename(
+         columns={
+             "draft_labs_list": "draft_labs",
+             "draft_allcats": "draft_cats",
+             "authors_list": "author",
+         }
+     )
+
+     # Subselect countries and country groups
+     with open(MEMBERS, "r", encoding="utf-8") as f:
+         authors = json.load(f)
+     # Apostrophes in these names break ast.literal_eval on the stringified author
+     # lists, so the names are masked before parsing and restored afterwards.
+     special_character_words_mapper = {
+         "Côte D'Ivoire": "Côte DIvoire",
+         "Ligue Camerounaise Des Droits De L'Homme": "Ligue Camerounaise Des Droits De LHomme",
+         "Association Pour L'Integration Et La Developpement Durable Au Burundi": "Association Pour LIntegration Et La Developpement Durable Au Burundi",
+     }
+     special_character_words_reverse_mapper = {
+         masked: original
+         for original, masked in special_character_words_mapper.items()
+     }
+     members = [
+         authors[key]
+         for key in [
+             "Members - Countries",
+             "Members - International and Regional State Associations",
+         ]
+     ]
+     members = [item for sublist in members for item in sublist]
+     members = [special_character_words_mapper.get(member, member) for member in members]
+
+     nonmembers = [
+         authors[key]
+         for key in [
+             "Intergovernmental Negotiation Committee",
+             "Observers and Other Participants",
+         ]
+     ]
+     nonmembers = [item for sublist in nonmembers for item in sublist]
+
+     # Mask the names inside the raw strings (this also covers multi-author rows).
+     for original, masked in special_character_words_mapper.items():
+         df["author"] = df["author"].str.replace(original, masked, regex=False)
+
+     df["author"] = df["author"].apply(ast.literal_eval)
+     # Keep documents with at least one member author.
+     df = df[df["author"].apply(lambda x: any(item in members for item in x))].copy()
+     # Restore the original spelling (e.g. "Côte D'Ivoire"), then drop non-member authors.
+     df["author"] = df["author"].apply(
+         lambda x: [special_character_words_reverse_mapper.get(item, item) for item in x]
+     )
+     df["author"] = df["author"].apply(
+         lambda x: [item for item in x if item not in nonmembers]
+     )
+     df["draft_labs"] = df["draft_labs"].fillna("[]")
+     # The authors are lists at this point, so the AOSIS name is normalised per item.
+     df["author"] = df["author"].apply(
+         lambda x: [
+             "Alliance Of Small Island States (AOSIS)"
+             if item == "The Alliance Of Small Island States (AOSIS)"
+             else item
+             for item in x
+         ]
+     )
+
+     print(f"Filtered dataset to {len(df)} entries")
+     df.to_csv(DATASET_PROCESSED, index=False)
+
+
+ if __name__ == "__main__":
+     main()
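Both preprocessing scripts apply the same two-step author rule: a document is kept only if at least one of its authors is a member (a country or a state association), and non-member authors are then removed from the kept author lists. A stdlib sketch of that rule on plain lists; `select_member_documents` and the sample names are hypothetical.

```python
def select_member_documents(
    documents: list[list[str]], members: set[str], nonmembers: set[str]
) -> list[list[str]]:
    """Keep author lists with at least one member, then drop non-member authors."""
    selected = []
    for authors in documents:
        # Step 1: keep the document only if a member is among its authors.
        if any(author in members for author in authors):
            # Step 2: remove observers and other non-member authors.
            selected.append([a for a in authors if a not in nonmembers])
    return selected


docs = [["Country A", "Observer X"], ["Observer X"], ["Country B"]]
print(select_member_documents(docs, {"Country A", "Country B"}, {"Observer X"}))
# [['Country A'], ['Country B']]
```

Authors that appear in neither list survive step 2, which matches the behaviour of the scripts.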
src/data_processing/get_meta_data_filter.py ADDED
@@ -0,0 +1,21 @@
+ import os
+ import sys
+
+ import pandas as pd
+
+ sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '../../')))
+ from src.rag.pipeline import RAGPipeline  # noqa: E402
+
+ DATASET = os.path.join("data", "inc_df.csv")
+ META_DATA = os.path.join("database", "meta_data.csv")
+
+ rag_pipeline = RAGPipeline(
+     embedding_model="sentence-transformers/distiluse-base-multilingual-cased-v1",
+     prompt_template="src/rag/prompt_template.yaml",
+ )
+
+ meta_data = pd.DataFrame(
+     [document.meta for document in rag_pipeline.document_store.get_all_documents()]
+ )
+
+ meta_data = meta_data.drop_duplicates(subset=["retriever_id"], keep="first")
+
+ meta_data.to_csv(META_DATA, index=False)
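Every chunk in the document store carries the metadata of its source document, so this script keeps one metadata row per `retriever_id` with `drop_duplicates(subset=["retriever_id"], keep="first")`. The same rule in plain Python, as a sketch with a hypothetical helper name and made-up `start_index` values:

```python
from typing import Any


def first_meta_per_document(metas: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep the first metadata dict seen for each retriever_id, preserving order."""
    seen: set[str] = set()
    unique = []
    for meta in metas:
        key = meta["retriever_id"]
        if key not in seen:
            seen.add(key)
            unique.append(meta)
    return unique


metas = [
    {"retriever_id": "1", "start_index": 0},
    {"retriever_id": "1", "start_index": 400},
    {"retriever_id": "2", "start_index": 0},
]
print(first_meta_per_document(metas))
# [{'retriever_id': '1', 'start_index': 0}, {'retriever_id': '2', 'start_index': 0}]
```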
src/data_processing/taxonomy_processing.py ADDED
@@ -0,0 +1,47 @@
+ import os
+
+ from src.utils.data import load_json, save_json
+
+ AUTHORS_TAXONOMY = os.path.join("data", "authors_taxonomy.json")
+ AUTHORS_FILTER = os.path.join("data", "authors_filter.json")
+
+ DRAFT_CATEGORIES_TAXONOMY = os.path.join("data", "draftcat_taxonomy.json")
+ DRAFT_CATEGORIES_FILTER = os.path.join("data", "draftcat_taxonomy_filter.json")
+
+
+ def get_authors(taxonomy: dict) -> dict:
+     countries = taxonomy["Members"]["Countries"]
+     associations = taxonomy["Members"][
+         "International and Regional State Associations"
+     ]  # noqa: E501
+     intergovernmental_negotiations = taxonomy[
+         "Intergovernmental Negotiation Committee"
+     ]  # noqa: E501
+     observers = taxonomy["Observers and Other Participants"]  # noqa: E501
+     return {
+         "Members - Countries": countries,
+         "Members - International and Regional State Associations": associations,  # noqa: E501
+         "Intergovernmental Negotiation Committee": intergovernmental_negotiations,  # noqa: E501
+         "Observers and Other Participants": observers,
+     }
+
+
+ def get_draftcategories(taxonomy: dict) -> dict:
+     taxonomy_filter = {}
+     for draft_part, part_values in taxonomy.items():
+         taxonomy_filter[draft_part] = list(part_values.values())
+     return taxonomy_filter
+
+
+ if __name__ == "__main__":
+     authors_taxonomy = load_json(AUTHORS_TAXONOMY)
+     authors_filter = get_authors(authors_taxonomy)
+     save_json(file_path=AUTHORS_FILTER, data=authors_filter)
+
+     draft_categories_taxonomy = load_json(DRAFT_CATEGORIES_TAXONOMY)
+     draft_categories_filter = get_draftcategories(draft_categories_taxonomy)
+     save_json(file_path=DRAFT_CATEGORIES_FILTER, data=draft_categories_filter)
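`get_draftcategories` drops the inner keys of the draft-category taxonomy and keeps one label list per draft part; `get_draft_cat_taxonomy` in `app.py` then flattens those lists into the multiselect options. The sketch below reproduces both steps, assuming the taxonomy is a dict that maps each part to a dict of names and labels, as the loop implies. The function names and sample data are hypothetical.

```python
def to_filter(taxonomy: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Keep only the label values of each draft part."""
    return {part: list(values.values()) for part, values in taxonomy.items()}


def flatten_labels(taxonomy_filter: dict[str, list[str]]) -> list[str]:
    """Flatten the per-part label lists into one list of options."""
    return [label for labels in taxonomy_filter.values() for label in labels]


taxonomy = {"Part A": {"a1": "Label 1", "a2": "Label 2"}, "Part B": {"b1": "Label 3"}}
print(flatten_labels(to_filter(taxonomy)))  # ['Label 1', 'Label 2', 'Label 3']
```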
src/document_store/document_store.py ADDED
@@ -0,0 +1,180 @@
+ import ast
+ import os
+ import pathlib
+ from typing import Any
+
+ import pandas as pd
+ from haystack.document_stores import InMemoryDocumentStore
+ from langchain.text_splitter import RecursiveCharacterTextSplitter
+ from langchain_community.document_loaders import DataFrameLoader
+ from sklearn.preprocessing import MultiLabelBinarizer
+
+
+ INC_TEST_DATASET_PATH = os.path.join("data", "inc_df.csv")
+ EMBEDDING_DIMENSION = 512
+
+ # Apostrophes in these names break ast.literal_eval on the stringified lists.
+ special_character_words_mapper = {
+     "Côte D'Ivoire": "Côte DIvoire",
+     "Ligue Camerounaise Des Droits De L'Homme": "Ligue Camerounaise Des Droits De LHomme",
+     "Association Pour L'Integration Et La Developpement Durable Au Burundi": "Association Pour LIntegration Et La Developpement Durable Au Burundi",
+ }
+ special_character_words_reverse_mapper = {
+     value: key for key, value in special_character_words_mapper.items()
+ }
+
+
+ def transform_to_list(row):
+     """Parse a stringified list such as "['A', 'B']", tolerating known names with apostrophes."""
+     special_characters = False
+     if str(row) == "[]" or str(row) == "nan":
+         return []
+     # Mask names containing apostrophes so that literal_eval can parse the string.
+     for key, value in special_character_words_mapper.items():
+         if key in row:
+             row = row.replace(key, value)
+             special_characters = True
+     row = ast.literal_eval(row)
+     if special_characters:
+         # Replace each masked name with its original spelling.
+         for key, value in special_character_words_reverse_mapper.items():
+             if key in row:
+                 index = row.index(key)
+                 row[index] = value
+     return row
+
+
+ def transform_data(df: pd.DataFrame):
+     # Exclude document types that are not indexed.
+     excluded_subtypes = [
+         "Working documents",
+         "Contact Groups",
+         "Unsolicitated Submissions",
+         "Stakeholder Dialogue",
+     ]
+     df = df[~df["doc_subtype"].isin(excluded_subtypes)]
+     # .copy() avoids SettingWithCopyWarning on the column updates below.
+     df = df[df["doc_type"] != "official document"].copy()
+     # "_x000D_" is an escaped carriage return left over from Excel exports.
+     df["text"] = df["text"].astype(str).str.replace("_x000D_", " ")
+     df["text"] = df["text"].astype(str).str.replace("\n", " ")
+     df["author"] = df["author"].str.replace("\xa0", " ")
+     df["author"] = df["author"].str.replace("ü", "u")
+     df["author"] = df["author"].str.strip()
+     df["author"] = df["author"].astype(str).str.replace("\r", " ")
+
+     df = df[
+         [
+             "author",
+             "doc_type",
+             "round",
+             "text",
+             "href",
+             "draft_labs",
+             "draft_cats",
+             "retriever_id",
+         ]
+     ].copy()
+
+     df = df.rename(columns={"text": "page_content"})
+
+     # Keep the raw strings; the parsed list columns are consumed by the encoding below.
+     df["draft_labs2"] = df["draft_labs"]
+     df["author2"] = df["author"]
+
+     df["draft_labs"] = df["draft_labs"].apply(transform_to_list)
+     df["author"] = df["author"].apply(transform_to_list)
+
+     # One 0/1 column per draft label and per author, used for metadata filtering.
+     mlb = MultiLabelBinarizer()
+     df = df.join(
+         pd.DataFrame(
+             mlb.fit_transform(df.pop("draft_labs")),
+             columns=mlb.classes_,
+             index=df.index,
+         )
+     ).join(
+         pd.DataFrame(
+             mlb.fit_transform(df.pop("author")), columns=mlb.classes_, index=df.index
+         )
+     )
+
+     df["draft_labs"] = df["draft_labs2"]
+     df = df.drop(columns=["draft_labs2"])
+
+     df["author"] = df["author2"]
+     df = df.drop(columns=["author2"])
+
+     loader = DataFrameLoader(df, page_content_column="page_content")
+     docs = loader.load()
+     return docs
+
+
+ def process_data(docs):
+     chunk_size = 512
+     text_splitter = RecursiveCharacterTextSplitter(
+         chunk_size=chunk_size,
+         chunk_overlap=int(chunk_size / 10),
+         add_start_index=True,
+         strip_whitespace=True,
+         separators=["\n\n", "\n", " ", ""],
+     )
+
+     docs_chunked = text_splitter.transform_documents(docs)
+
+     # One row per chunk: its text plus the flattened metadata of its source document.
+     # Reading the attributes directly keeps parentheses and quotes in the text intact.
+     df = pd.DataFrame(
+         [{"page_content": doc.page_content, **doc.metadata} for doc in docs_chunked]
+     )
+
+     for c in ["author", "draft_labs"]:
+         df[c] = df[c].apply(
+             lambda x: ", ".join(x) if isinstance(x, (list, tuple)) else str(x)
+         )
+         for g in ["[", "]", "'"]:
+             df[c] = df[c].str.replace(g, "", regex=False)
+
+     df["page_content"] = df["page_content"].astype(str).str.replace("\n", " ")
+     df["page_content"] = df["page_content"].astype(str).str.replace("\r", " ")
+
+     # Prefix each chunk with its authors and draft labels so they are part of the embedded text.
+     cols = ["author", "draft_labs", "page_content"]
+     df["page_content"] = df[cols].apply(lambda row: " | ".join(row.astype(str)), axis=1)
+     df = df.rename(columns={"page_content": "content"})
+
+     documents = []
+     for _, row in df.iterrows():
+         row_meta: dict[str, Any] = {}
+         for column in df.columns:
+             if column != "content":
+                 if column == "retriever_id":
+                     row_meta[column] = str(row[column])
+                 else:
+                     row_meta[column] = row[column]
+         documents.append({"content": row["content"], "meta": row_meta})
+     return documents
+
+
+ def get_document_store():
+     df = pd.read_csv(INC_TEST_DATASET_PATH)
+     pathlib.Path("database").mkdir(parents=True, exist_ok=True)
+     document_store = InMemoryDocumentStore(
+         embedding_field="embedding", embedding_dim=EMBEDDING_DIMENSION, use_bm25=False
+     )
+     docs = transform_data(df=df)
+     document_store.write_documents(process_data(docs=docs))
+     return document_store
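`process_data` splits each document into chunks of up to 512 characters (measured with `len`, the splitter's default length function) with an overlap of `int(512 / 10) = 51` characters, and `add_start_index=True` records where each chunk starts. The sketch below is a deliberately simplified fixed-window splitter that illustrates only the size, overlap and start-index arithmetic. Unlike `RecursiveCharacterTextSplitter`, it ignores the separator list and may cut words in half. `chunk_text` is a hypothetical name.

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 51) -> list[tuple[int, str]]:
    """Split text into (start_index, chunk) windows that overlap by `overlap` characters.

    Simplified stand-in for illustration only: no separators, no whitespace handling.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    chunks = []
    # A new window is only needed while the previous one has not reached the end,
    # i.e. while start + overlap < len(text).
    for start in range(0, max(len(text) - overlap, 1), step):
        chunks.append((start, text[start : start + chunk_size]))
    return chunks


text = "x" * 1000
print([start for start, _ in chunk_text(text)])  # [0, 461, 922]
```

With the defaults, consecutive windows start 461 characters apart, so the last 51 characters of one chunk reappear at the start of the next.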
src/rag/__init__.py ADDED
File without changes
src/rag/pipeline.py ADDED
@@ -0,0 +1,93 @@
1
+ import os
2
+ import pickle
3
+ from typing import Any
4
+
5
+ from dotenv import load_dotenv
6
+ from haystack.nodes import ( # type: ignore
7
+ AnswerParser,
8
+ EmbeddingRetriever,
9
+ PromptNode,
10
+ PromptTemplate,
11
+ )
12
+ from haystack.pipelines import Pipeline
13
+
14
+ from src.document_store.document_store import get_document_store
15
+
16
+ load_dotenv()
17
+
18
+ OPENAI_API_KEY = os.environ.get("OPEN_API_KEY")
19
+
20
+
21
+ class RAGPipeline:
22
+ def __init__(
23
+ self,
24
+ embedding_model: str,
25
+ prompt_template: str,
26
+ ):
27
+ self.load_document_store()
28
+ self.embedding_model = embedding_model
29
+ self.prompt_template = prompt_template
30
+        self.retriever_node = self.generate_retriever_node()
+        self.prompt_node = self.generate_prompt_node()
+        self.update_embeddings()
+        self.pipe = self.build_pipeline()
+
+    def run(self, prompt: str, filters: dict) -> Any:
+        try:
+            result = self.pipe.run(query=prompt, params={"filters": filters})
+            return result
+        except Exception as e:
+            print(e)
+            return None
+
+    def build_pipeline(self):
+        pipe = Pipeline()
+        pipe.add_node(component=self.retriever_node, name="retriever", inputs=["Query"])
+        pipe.add_node(
+            component=self.prompt_node,
+            name="prompt_node",
+            inputs=["retriever"],
+        )
+        return pipe
+
+    def load_document_store(self):
+        if os.path.exists(os.path.join("database", "document_store.pkl")):
+            with open(
+                file=os.path.join("database", "document_store.pkl"), mode="rb"
+            ) as f:
+                self.document_store = pickle.load(f)
+        else:
+            self.document_store = get_document_store()
+
+    def generate_retriever_node(self):
+        retriever_node = EmbeddingRetriever(
+            document_store=self.document_store,
+            embedding_model=self.embedding_model,
+            top_k=7,
+        )
+        return retriever_node
+
+    def update_embeddings(self):
+        if not os.path.exists(os.path.join("database", "document_store.pkl")):
+            self.document_store.update_embeddings(
+                self.retriever_node, update_existing_embeddings=True
+            )
+            os.makedirs("database", exist_ok=True)  # first run: directory may not exist yet
+            with open(
+                file=os.path.join("database", "document_store.pkl"), mode="wb"
+            ) as f:
+                pickle.dump(self.document_store, f)
+
+    def generate_prompt_node(self):
+        rag_prompt = PromptTemplate(
+            prompt=self.prompt_template,
+            output_parser=AnswerParser(reference_pattern=r"Document\[(\d+)\]"),
+        )
+        prompt_node = PromptNode(
+            model_name_or_path="gpt-4",
+            default_prompt_template=rag_prompt,
+            api_key=os.environ["OPENAI_API_KEY"],  # keep the secret out of source control
+            max_length=3000,
+            model_kwargs={"temperature": 0.2, "max_tokens": 4096},
+        )
+        return prompt_node
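The class above computes embeddings only when `database/document_store.pkl` is missing and otherwise reloads the pickled store. A minimal, stdlib-only sketch of that load-or-build cache pattern follows; `load_or_build` and its `build` callback are hypothetical names, and the toy dict stands in for the Haystack document store the real code pickles.

```python
import os
import pickle
import tempfile
from typing import Any, Callable


def load_or_build(cache_path: str, build: Callable[[], Any]) -> Any:
    """Return the object pickled at cache_path, building and caching it if absent.

    Hypothetical helper: `build` stands in for the expensive step
    (creating the document store and computing its embeddings).
    """
    if os.path.exists(cache_path):
        # Cache hit: skip the expensive build entirely.
        with open(cache_path, "rb") as f:
            return pickle.load(f)

    obj = build()
    # Create the parent directory on first run so the write cannot fail.
    os.makedirs(os.path.dirname(cache_path) or ".", exist_ok=True)
    with open(cache_path, "wb") as f:
        pickle.dump(obj, f)
    return obj


if __name__ == "__main__":
    calls = []

    def build_store() -> dict:
        calls.append(1)
        return {"documents": ["doc a", "doc b"]}

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "database", "document_store.pkl")
        first = load_or_build(path, build_store)
        second = load_or_build(path, build_store)
        print(first == second, len(calls))  # True 1
```

Because the cache is keyed only on the file's existence, changing the embedding model requires deleting `database/document_store.pkl` to force a rebuild. Since `pickle.load` can execute arbitrary code, the cache should only ever be a file the application wrote itself.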
src/rag/prompt_template.yaml ADDED
@@ -0,0 +1,20 @@
+ name: deepset/question-answering-with-document-references
+ text: |
+   Answer the question '{query}' using only the provided documents and without repeating text.
+   Formulate your answer in the style of an academic report.
+   Provide example quotes and citations using text extracted from the documents.
+   Use facts and numbers from the documents in your answer.
+   ALWAYS include references to the documents used at the end of each applicable sentence, using the format [number].
+   If the answer isn't in the documents, say 'Answering is not possible given the available information'.
+   Documents: \n
+   {join(documents, delimiter=new_line, pattern=new_line+'Document($retriever_id): $content', str_replace={new_line: ' ', '[': '(', ']': ')'})} \n
+   Answer:
+ tags:
+   - question-answering
+ description: Perform question answering with references to documents.
+ meta:
+   authors:
+     - deepset-ai
+ version: '0.1.0'
+
+
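The `join(...)` call in the template flattens the retrieved documents into one block of text. As we understand Haystack 1.x's `join` helper, it fills the `$`-placeholders in `pattern` from each document's content and metadata and applies `str_replace` to those values; the stdlib sketch below approximates that behavior with `string.Template`. `render_documents` is a hypothetical name, and documents are modeled as plain dicts rather than Haystack `Document` objects.

```python
from string import Template

NEW_LINE = "\n"
# Mirrors str_replace={new_line: ' ', '[': '(', ']': ')'} in prompt_template.yaml.
STR_REPLACE = {NEW_LINE: " ", "[": "(", "]": ")"}
PATTERN = NEW_LINE + "Document($retriever_id): $content"


def render_documents(docs: list[dict], delimiter: str = NEW_LINE) -> str:
    """Approximate the template's join(...) call with the standard library.

    Assumption: each doc is a dict holding "content" plus metadata such as
    "retriever_id"; replacements apply to field values, not to the pattern.
    """
    table = str.maketrans(STR_REPLACE)
    rendered = []
    for doc in docs:
        # Translate only string fields; numeric ids are left untouched.
        fields = {k: v.translate(table) if isinstance(v, str) else v for k, v in doc.items()}
        rendered.append(Template(PATTERN).safe_substitute(fields))
    return delimiter.join(rendered)


if __name__ == "__main__":
    docs = [
        {"retriever_id": 3, "content": "Plastics [draft]\ntext"},
        {"retriever_id": 7, "content": "Second"},
    ]
    print(render_documents(docs))
```

Mapping `[` and `]` to parentheses inside document text means the only square brackets the model sees are the `[number]` citations it is asked to produce.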
src/utils/__init__.py ADDED
File without changes
src/utils/data.py ADDED
@@ -0,0 +1,76 @@
+ import json
+ from typing import Any
+
+ import pandas as pd
+ import streamlit as st
+ from functools import reduce
+
+
+ def get_filter_values(df: pd.DataFrame, column_name: str) -> list:
+     return df[column_name].unique().tolist()
+
+
+ def build_filter(
+     meta_data: pd.DataFrame,
+     authors_filter: list[str],
+     draft_cats_filter: list[str],
+     round_filter: list[int],
+ ) -> dict[str, list]:
+     authors = authors_filter
+     round_number = round_filter
+     draft_cats = draft_cats_filter
+
+     # each flag is True if the corresponding selection is not empty
+     authors_flag = len(authors) > 0
+     draft_cats_flag = len(draft_cats) > 0
+     round_number_flag = len(round_number) > 0
+
+     conditions = []
+
+     if authors_flag:
+         authors_condition = (meta_data[col] == 1 for col in authors)
+         authors_conditions_list = reduce(lambda a, b: a | b, authors_condition)
+         conditions.append(authors_conditions_list)
+
+     if draft_cats_flag:
+         draft_cat_condition = (meta_data[col] == 1 for col in draft_cats)
+         draft_cat_conditions_list = reduce(lambda a, b: a | b, draft_cat_condition)
+         conditions.append(draft_cat_conditions_list)
+
+     if round_number_flag:
+         round_condition = meta_data["round"].isin(round_number)
+         conditions.append(round_condition)
+
+     if len(conditions) == 0:
+         filtered_retriever_ids = []
+     else:
+         final_condition = reduce(lambda a, b: a & b, conditions)
+         filtered_retriever_ids = meta_data[final_condition]["retriever_id"].tolist()
+     if len(filtered_retriever_ids) == 0:
+         return {}
+     else:
+         return {"retriever_id": filtered_retriever_ids}
+
+
+ def load_json(file_path: str) -> dict:
+     with open(file_path, "r") as f:
+         return json.load(f)
+
+
+ def save_json(file_path: str, data: dict) -> None:
+     with open(file_path, "w") as f:
+         json.dump(data, f, indent=4)
+
+
+ def get_meta(result: dict[str, Any]) -> list[dict[str, Any]]:
+     meta_data = []
+     for doc in result["documents"]:
+         current_meta = dict(doc.meta)  # copy so the Document's own meta is not mutated
+         current_meta["content"] = doc.content
+         meta_data.append(current_meta)
+     return meta_data
+
+
+ def load_css(file_name: str) -> None:
+     with open(file_name) as f:
+         st.markdown(f"<style>{f.read()}</style>", unsafe_allow_html=True)
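`build_filter` ORs the selected columns within each group (authors, draft categories), ANDs the groups together, optionally restricts by `round`, and returns the matching `retriever_id` values as a metadata filter. The sketch below reproduces that logic on plain dict rows so it runs without pandas; `build_filter_rows` is a hypothetical name, and the row layout (0/1 flag columns plus `round` and `retriever_id`) is an assumption based on the columns the function reads.

```python
from typing import Any


def build_filter_rows(
    rows: list[dict[str, Any]],
    authors: list[str],
    draft_cats: list[str],
    rounds: list[int],
) -> dict[str, list[Any]]:
    """Stdlib sketch of build_filter: OR within each group, AND across groups."""
    groups = []
    if authors:
        # A row matches the author group if any selected author column is 1.
        groups.append(lambda row: any(row.get(col) == 1 for col in authors))
    if draft_cats:
        groups.append(lambda row: any(row.get(col) == 1 for col in draft_cats))
    if rounds:
        groups.append(lambda row: row.get("round") in rounds)

    if not groups:
        return {}  # nothing selected: the retriever searches every document
    ids = [row["retriever_id"] for row in rows if all(g(row) for g in groups)]
    return {"retriever_id": ids} if ids else {}


if __name__ == "__main__":
    # Hypothetical rows: 0/1 flag columns plus "round" and "retriever_id".
    rows = [
        {"retriever_id": 0, "author_a": 1, "author_b": 0, "cat_x": 1, "round": 3},
        {"retriever_id": 1, "author_a": 0, "author_b": 1, "cat_x": 0, "round": 3},
        {"retriever_id": 2, "author_a": 0, "author_b": 1, "cat_x": 1, "round": 4},
    ]
    print(build_filter_rows(rows, ["author_a", "author_b"], ["cat_x"], []))
    # {'retriever_id': [0, 2]}
```

An empty dict means "no filter" to the retriever, so in `data.py` a selection that matches nothing silently falls back to searching all documents; `data_v2.py` separates the two cases by also returning a status string.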
src/utils/data_v2.py ADDED
@@ -0,0 +1,80 @@
+ import json
+ from typing import Any
+
+ import pandas as pd
+ import streamlit as st
+ from functools import reduce
+
+
+ def get_filter_values(df: pd.DataFrame, column_name: str) -> list:
+     return df[column_name].unique().tolist()
+
+
+ def build_filter(
+     meta_data: pd.DataFrame,
+     authors_filter: list[str],
+     draft_cats_filter: list[str],
+     # round_filter: list[int],
+ ) -> tuple[dict[str, list], str]:
+     authors = authors_filter
+     # round_number = round_filter
+     draft_cats = draft_cats_filter
+
+     # each flag is True if the corresponding selection is not empty
+     authors_flag = len(authors) > 0
+     draft_cats_flag = len(draft_cats) > 0
+     # round_number_flag = len(round_number) > 0
+
+     if not authors_flag and not draft_cats_flag:
+         return {}, "no filters selected"
+
+     conditions = []
+
+     if authors_flag:
+         authors_condition = (meta_data[col] == 1 for col in authors)
+         authors_conditions_list = reduce(lambda a, b: a | b, authors_condition)
+         conditions.append(authors_conditions_list)
+
+     if draft_cats_flag:
+         draft_cat_condition = (meta_data[col] == 1 for col in draft_cats)
+         draft_cat_conditions_list = reduce(lambda a, b: a | b, draft_cat_condition)
+         conditions.append(draft_cat_conditions_list)
+
+     # if round_number_flag:
+     #     round_condition = meta_data["round"].isin(round_number)
+     #     conditions.append(round_condition)
+
+     if len(conditions) == 0:
+         filtered_retriever_ids = []
+     else:
+         final_condition = reduce(lambda a, b: a & b, conditions)
+         filtered_retriever_ids = meta_data[final_condition]["retriever_id"].tolist()
+
+     if len(filtered_retriever_ids) == 0:
+         return {}, "no results found"
+     else:
+         return {"retriever_id": filtered_retriever_ids}, "success"
+
+
+ def load_json(file_path: str) -> dict:
+     with open(file_path, "r") as f:
+         return json.load(f)
+
+
+ def save_json(file_path: str, data: dict) -> None:
+     with open(file_path, "w") as f:
+         json.dump(data, f, indent=4)
+
+
+ def get_meta(result: dict[str, Any]) -> list[dict[str, Any]]:
+     meta_data = []
+     for doc in result["documents"]:
+         current_meta = dict(doc.meta)  # copy so the Document's own meta is not mutated
+         current_meta["content"] = doc.content
+         meta_data.append(current_meta)
+     return meta_data
+
+
+ def load_css(file_name: str) -> None:
+     with open(file_name) as f:
+         st.markdown(f"<style>{f.read()}</style>", unsafe_allow_html=True)
src/utils/writer.py ADDED
@@ -0,0 +1,17 @@
+ import streamlit as st
+ import time
+
+
+ def typewriter(text: str, references: list, speed: int):
+     tokens = text.split()
+     container = st.empty()
+     for index in range(len(tokens) + 1):
+         curr_full_text = " ".join(tokens[:index])
+         container.markdown(curr_full_text)
+         time.sleep(1 / speed)
+     curr_full_text += "\n"
+     container.markdown(curr_full_text)
+     curr_full_text += "\n **References** \n"
+     container.markdown(curr_full_text)
+     curr_full_text += "\n".join(references)
+     container.markdown(curr_full_text)
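`typewriter` animates the answer by rewriting a single `st.empty()` placeholder with one more word each time, then appending a **References** heading and the reference list. The pure-Python sketch below returns the same sequence of strings so the logic can be checked without Streamlit; `typewriter_frames` is a hypothetical name, and it omits the `time.sleep(1 / speed)` pause between word frames.

```python
def typewriter_frames(text: str, references: list[str]) -> list[str]:
    """Return the successive strings typewriter() writes to its placeholder."""
    tokens = text.split()
    # One frame per prefix of the answer, starting with the empty string.
    frames = [" ".join(tokens[:i]) for i in range(len(tokens) + 1)]
    final = frames[-1] + "\n"
    frames.append(final)
    final += "\n **References** \n"
    frames.append(final)
    final += "\n".join(references)
    frames.append(final)
    return frames


if __name__ == "__main__":
    for frame in typewriter_frames("a b", ["[1] x", "[2] y"]):
        print(repr(frame))
```

In standard Markdown a single `\n` is a soft line break, so references joined with `"\n"` may render on one line; separating them with a blank line (`"\n\n"`) would put each on its own line.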
style/style.css ADDED
@@ -0,0 +1,103 @@
+ .remark {
+     background-color: #FFCDD2;
+     border: 1px solid #E57373;
+     border-radius: 4px;
+     box-shadow: 0 2px 4px rgba(0, 0, 0, 0.1);
+     margin-bottom: 1rem;
+     padding: 1rem;
+ }
+
+ .remark-content {
+     display: flex;
+     align-items: center;
+ }
+
+ .remark-text {
+     margin-bottom: 0.5rem;
+     font-size: 20px;
+ }
+
+ .feedback-link {
+     display: inline-block;
+     background-color: #b50d1c; /* Adjusted color */
+     color: white;
+     border: none;
+     border-radius: 4px;
+     padding: 0.5rem 1rem;
+     text-decoration: none;
+     font-size: 1rem;
+     cursor: pointer;
+     box-shadow: 0 2px 4px rgba(0, 0, 0, 0.1); /* Added */
+ }
+
+ .feedback-link:hover {
+     background-color: #8c0716; /* Adjusted hover color */
+ }
+
+ .feedback-link[href^="mailto"] {
+     color: white !important;
+     text-decoration: none !important;
+ }
+
+
+
+ /* Style streamlit general text */
+ .description {
+     font-size: 20px !important;
+ }
+
+
+ /* Style streamlit header with bold text */
+ .header {
+     font-size: 30px !important;
+     font-weight: bold;
+ }
+
+ .stMultiSelect > div > div > div {
+     width: 350px !important;
+     font-size: 20px;
+ }
+
+ /* Style columns */
+ [data-testid="column"] {
+     border-radius: 15px;
+     background-color: white;
+     box-shadow: 0 0 10px #eee;
+     border: 1px solid #ddd;
+     padding: 1rem;
+ }
+ /* Style containers */
+ [data-testid="stVerticalBlock"] > [style*="flex-direction: column;"] > [data-testid="stVerticalBlock"] {
+     border-radius: 15px;
+     background-color: white;
+     box-shadow: 0 0 10px #eee;
+     border: 1px solid #ddd;
+     padding: 1rem;
+ }
+
+ .stTabs [data-baseweb="tab-list"] button [data-testid="stMarkdownContainer"] p {
+     font-size: 20px;
+ }
+
+ /* Style streamlit button */
+ .stButton>button {
+     font-size: 12px;
+     padding: 8px 12px;
+     border: none;
+     text-align: center;
+     text-decoration: none;
+     display: inline-block;
+     cursor: pointer;
+     border-radius: 4px;
+     transition: background-color 0.3s, box-shadow 0.3s;
+     background-color: #f2f2f2;
+     color: #333;
+     box-shadow: 0 2px 4px rgba(0, 0, 0, 0.1);
+ }
+
+ .stButton>button:hover {
+     background-color: #e0e0e0;
+     box-shadow: 0 4px 8px rgba(0, 0, 0, 0.2);
+ }
+
+