initial model files

Files changed (7) hide show

LICENSE +21 -0
README.md +60 -0
config.json +28 -0
pytorch_model.bin +3 -0
special_tokens_map.json +1 -0
spiece.model +3 -0
tokenizer_config.json +1 -0

LICENSE ADDED Viewed

	@@ -0,0 +1,21 @@

+MIT License
+Copyright (c) 2021 Philip May, Deutsche Telekom AG
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

README.md ADDED Viewed

	@@ -0,0 +1,60 @@

+---
+language:
+- de
+license: mit
+tags:
+- summarization
+datasets:
+- swiss_text_2019
+---
+# mT5-small-sum-de-mit-v1
+This is a German summarization model. It is based on the multilingual T5 model [google/mt5-small](https://huggingface.co/google/mt5-small). The special characteristic of this model is that, unlike many other models, it is licensed under a permissive open source license (MIT). Among other things, this license allows commercial use.
+[![One Conversation](https://raw.githubusercontent.com/telekom/HPOflow/main/docs/source/imgs/1c-logo.png)](https://www.welove.ai/)
+This model is provided by the [One Conversation](https://www.welove.ai/)
+team of [Deutsche Telekom AG](https://www.telekom.com/).
+## Training
+The training was conducted with the following hyperparameters:
+- base model: [google/mt5-small](https://huggingface.co/google/mt5-small)
+- source_prefix: `"summarize: "`
+- batch size: xxx
+- max_source_length: 800
+- max_target_length: 96
+- warmup_ratio: 0.3
+- number of train epochs: xxx
+- gradient accumulation steps: xxx
+## Datasets and Preprocessing
+The datasets were preprocessed as follows:
+The summary was tokenized with the [google/mt5-small](https://huggingface.co/google/mt5-small) tokenizer. Then only the records with no more than 94 tokens were selected.
+This model is trained on the following dataset:
+| Name | Language | Size | License
+|------|----------|------|--------
+| [SwissText 2019 - Train](https://www.swisstext.org/2019/shared-task/german-text-summarization-challenge.html) | de | 84,564 | Concrete license is unclear. The data was published in the [German Text Summarization Challenge](https://www.swisstext.org/2019/shared-task/german-text-summarization-challenge.html). xxx
+## Evaluation on MLSUM German Test Set (no beams)
+| Model | rouge1 | rouge2 | rougeL | rougeLsum
+|-------|--------|--------|--------|----------
+| deutsche-telekom/mt5-small-sum-de-mit-v1 (this) | xxx | xxx | xxx | xxx
+| [ml6team/mt5-small-german-finetune-mlsum](https://huggingface.co/ml6team/mt5-small-german-finetune-mlsum) | 18.3607 | 5.3604 | 14.5456 | 16.1946
+| **[deutsche-telekom/mt5-small-sum-de-en-01](https://huggingface.co/deutsche-telekom/mt5-small-sum-de-en-v1)** | **21.7336** | **7.2614** | **17.1323** | **19.3977**
+## License
+Copyright (c) 2021 Philip May, Deutsche Telekom AG
+Licensed under the MIT License (the "License"); you may not use this work except in compliance with the License. You may obtain a copy of the License by reviewing the file [LICENSE](https://huggingface.co/deutsche-telekom/mt5-small-sum-de-mit-v1/blob/main/LICENSE) in the repository.

config.json ADDED Viewed

	@@ -0,0 +1,28 @@

+{
+  "_name_or_path": "google/mt5-small",
+  "architectures": [
+    "MT5ForConditionalGeneration"
+  ],
+  "d_ff": 1024,
+  "d_kv": 64,
+  "d_model": 512,
+  "decoder_start_token_id": 0,
+  "dropout_rate": 0.1,
+  "eos_token_id": 1,
+  "feed_forward_proj": "gated-gelu",
+  "initializer_factor": 1.0,
+  "is_encoder_decoder": true,
+  "layer_norm_epsilon": 1e-06,
+  "model_type": "mt5",
+  "num_decoder_layers": 8,
+  "num_heads": 6,
+  "num_layers": 8,
+  "pad_token_id": 0,
+  "relative_attention_num_buckets": 32,
+  "tie_word_embeddings": false,
+  "tokenizer_class": "T5Tokenizer",
+  "torch_dtype": "float32",
+  "transformers_version": "4.9.1",
+  "use_cache": true,
+  "vocab_size": 250100
+}

pytorch_model.bin ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:616ca42b48507a819df3399dcc5aff07f80e76c8a982f6cb3ab7b28208276130
+size 1200721733

special_tokens_map.json ADDED Viewed

	@@ -0,0 +1 @@


1	+ {"eos_token": "</s>", "unk_token": "<unk>", "pad_token": "<pad>"}

spiece.model ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ef78f86560d809067d12bac6c09f19a462cb3af3f54d2b8acbba26e1433125d6
+size 4309802

tokenizer_config.json ADDED Viewed

	@@ -0,0 +1 @@

+ {"eos_token": "</s>", "unk_token": "<unk>", "pad_token": "<pad>", "extra_ids": 0, "additional_special_tokens": null, "special_tokens_map_file": "/home/phmay/.cache/huggingface/transformers/685ac0ca8568ec593a48b61b0a3c272beee9bc194a3c7241d15dcadb5f875e53.f76030f3ec1b96a8199b2593390c610e76ca8028ef3d24680000619ffb646276", "name_or_path": "google/mt5-small", "sp_model_kwargs": {}, "tokenizer_class": "T5Tokenizer"}