Isotonic committed on
Commit 74bf3b5 · 1 Parent(s): c8af990

Update README.md

Files changed (1)
  1. README.md +95 -76
README.md CHANGED
@@ -6,6 +6,14 @@ tags:
6
  model-index:
7
  - name: deberta-v3-base_finetuned_ai4privacy_v2
8
  results: []
9
  ---
10
 
11
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -13,73 +21,20 @@ should probably proofread and complete it, then remove this comment. -->
13
 
14
  # deberta-v3-base_finetuned_ai4privacy_v2
15
 
16
- This model is a fine-tuned version of [microsoft/deberta-v3-base](https://huggingface.co/microsoft/deberta-v3-base) on the None dataset.
17
- It achieves the following results on the evaluation set:
18
- - Loss: 0.0693
19
- - Overall Precision: 0.9664
20
- - Overall Recall: 0.9732
21
- - Overall F1: 0.9698
22
- - Overall Accuracy: 0.9728
23
- - Accountname F1: 1.0
24
- - Accountnumber F1: 1.0
25
- - Age F1: 0.9760
26
- - Amount F1: 0.9897
27
- - Bic F1: 0.9978
28
- - Bitcoinaddress F1: 0.9907
29
- - Buildingnumber F1: 0.9906
30
- - City F1: 0.9930
31
- - Companyname F1: 0.9994
32
- - County F1: 0.9939
33
- - Creditcardcvv F1: 1.0
34
- - Creditcardissuer F1: 0.9891
35
- - Creditcardnumber F1: 0.9590
36
- - Currency F1: 0.9052
37
- - Currencycode F1: 0.9875
38
- - Currencyname F1: 0.7022
39
- - Currencysymbol F1: 0.9892
40
- - Date F1: 0.9126
41
- - Dob F1: 0.7438
42
- - Email F1: 1.0
43
- - Ethereumaddress F1: 1.0
44
- - Eyecolor F1: 1.0
45
- - Firstname F1: 0.9934
46
- - Gender F1: 0.9991
47
- - Height F1: 1.0
48
- - Iban F1: 1.0
49
- - Ip F1: 0.1551
50
- - Ipv4 F1: 0.8393
51
- - Ipv6 F1: 0.8034
52
- - Jobarea F1: 0.9942
53
- - Jobtitle F1: 0.9993
54
- - Jobtype F1: 0.9928
55
- - Lastname F1: 0.9877
56
- - Litecoinaddress F1: 0.9770
57
- - Mac F1: 1.0
58
- - Maskednumber F1: 0.9451
59
- - Middlename F1: 0.9773
60
- - Nearbygpscoordinate F1: 1.0
61
- - Ordinaldirection F1: 0.9924
62
- - Password F1: 1.0
63
- - Phoneimei F1: 1.0
64
- - Phonenumber F1: 1.0
65
- - Pin F1: 0.9929
66
- - Prefix F1: 0.9722
67
- - Secondaryaddress F1: 0.9974
68
- - Sex F1: 0.9949
69
- - Ssn F1: 0.9970
70
- - State F1: 0.9941
71
- - Street F1: 0.9972
72
- - Time F1: 0.9967
73
- - Url F1: 1.0
74
- - Useragent F1: 1.0
75
- - Username F1: 0.9991
76
- - Vehiclevin F1: 1.0
77
- - Vehiclevrm F1: 1.0
78
- - Zipcode F1: 0.9890
79
 
80
  ## Model description
81
 
82
- More information needed
83
 
84
  ## Intended uses & limitations
85
 
@@ -89,19 +44,84 @@ More information needed
89
 
90
  More information needed
91
 
92
- ## Training procedure
93
-
94
- ### Training hyperparameters
95
 
96
  The following hyperparameters were used during training:
97
- - learning_rate: 5e-05
98
- - train_batch_size: 4
99
- - eval_batch_size: 4
100
- - seed: 42
101
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
102
  - lr_scheduler_type: cosine_with_restarts
103
- - lr_scheduler_warmup_ratio: 0.2
104
  - num_epochs: 7
105
 
106
  ### Training results
107
 
@@ -115,10 +135,9 @@ The following hyperparameters were used during training:
115
  | 0.0808 | 6.0 | 14358 | 0.0693 | 0.9664 | 0.9732 | 0.9698 | 0.9728 | 1.0 | 1.0 | 0.9760 | 0.9897 | 0.9978 | 0.9907 | 0.9906 | 0.9930 | 0.9994 | 0.9939 | 1.0 | 0.9891 | 0.9590 | 0.9052 | 0.9875 | 0.7022 | 0.9892 | 0.9126 | 0.7438 | 1.0 | 1.0 | 1.0 | 0.9934 | 0.9991 | 1.0 | 1.0 | 0.1551 | 0.8393 | 0.8034 | 0.9942 | 0.9993 | 0.9928 | 0.9877 | 0.9770 | 1.0 | 0.9451 | 0.9773 | 1.0 | 0.9924 | 1.0 | 1.0 | 1.0 | 0.9929 | 0.9722 | 0.9974 | 0.9949 | 0.9970 | 0.9941 | 0.9972 | 0.9967 | 1.0 | 1.0 | 0.9991 | 1.0 | 1.0 | 0.9890 |
116
  | 0.0779 | 7.0 | 16751 | 0.0697 | 0.9698 | 0.9756 | 0.9727 | 0.9739 | 0.9983 | 1.0 | 0.9815 | 0.9904 | 1.0 | 0.9938 | 0.9935 | 0.9930 | 0.9994 | 0.9935 | 1.0 | 0.9903 | 0.9584 | 0.9206 | 0.9917 | 0.7753 | 0.9914 | 0.9315 | 0.8305 | 1.0 | 1.0 | 1.0 | 0.9939 | 1.0 | 1.0 | 1.0 | 0.1404 | 0.8382 | 0.8029 | 0.9958 | 1.0 | 0.9944 | 0.9910 | 0.9875 | 1.0 | 0.9480 | 0.9788 | 1.0 | 0.9924 | 1.0 | 1.0 | 1.0 | 0.9929 | 0.9747 | 0.9961 | 0.9949 | 0.9970 | 0.9925 | 0.9983 | 0.9967 | 1.0 | 1.0 | 0.9991 | 1.0 | 1.0 | 0.9953 |
117
 
118
-
119
  ### Framework versions
120
 
121
  - Transformers 4.35.2
122
- - Pytorch 2.1.0+cu121
123
  - Datasets 2.15.0
124
- - Tokenizers 0.15.0
 
6
  model-index:
7
  - name: deberta-v3-base_finetuned_ai4privacy_v2
8
  results: []
9
+ datasets:
10
+ - ai4privacy/pii-masking-200k
11
+ - Isotonic/pii-masking-200k
12
+ language:
13
+ - en
14
+ metrics:
15
+ - seqeval
16
+ pipeline_tag: token-classification
17
  ---
18
 
19
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 
21
 
22
  # deberta-v3-base_finetuned_ai4privacy_v2
23
 
24
+ This model is a fine-tuned version of [microsoft/deberta-v3-base](https://huggingface.co/microsoft/deberta-v3-base) on the [ai4privacy/pii-masking-200k](https://huggingface.co/ai4privacy/pii-masking-200k) dataset.
25
+
26
+ ## Usage
27
+ GitHub Implementation: [Ai4Privacy](https://github.com/Sripaad/ai4privacy)
28
 
29
  ## Model description
30
 
31
+ This model has been fine-tuned on ai4privacy/pii-masking-200k, which its authors describe as the world's largest open-source privacy dataset.
32
+
33
+ The purpose of the trained models is to remove personally identifiable information (PII) from text, especially in the context of AI assistants and LLMs.
34
+
35
+ The example texts cover 54 PII classes (types of sensitive data) and 229 discussion subjects / use cases split across business, education, psychology and legal fields, in 5 interaction styles (e.g. casual conversation, formal documents, emails).
36
+
37
+ Take a look at the GitHub implementation for specific research.
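The PII-removal purpose described above can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: the Hub id `Isotonic/deberta-v3-base_finetuned_ai4privacy_v2` is an assumption assembled from the author and model name shown in this commit, `mask_pii` and `detect_pii` are hypothetical helper names, and the sample spans are hand-written stand-ins for model output.

```python
from typing import Dict, List

# Assumed Hub id, built from the author and model name shown in this commit.
MODEL_ID = "Isotonic/deberta-v3-base_finetuned_ai4privacy_v2"


def mask_pii(text: str, entities: List[Dict]) -> str:
    """Replace each detected span with a [LABEL] placeholder.

    `entities` uses the shape a token-classification pipeline returns with
    aggregation_strategy="simple": dicts with "entity_group", "start", "end".
    Overlapping spans are not handled in this sketch.
    """
    masked = text
    # Work right to left so earlier character offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        label = f"[{ent['entity_group']}]"
        masked = masked[: ent["start"]] + label + masked[ent["end"]:]
    return masked


def detect_pii(text: str) -> List[Dict]:
    """Run the model; needs `transformers` installed and network access."""
    # Imported lazily so mask_pii stays usable without transformers.
    from transformers import pipeline

    tagger = pipeline(
        "token-classification",
        model=MODEL_ID,
        aggregation_strategy="simple",
    )
    return tagger(text)


# Hand-written spans standing in for model output (not real predictions).
sample = "Contact Jane at jane@example.com"
spans = [
    {"entity_group": "FIRSTNAME", "start": 8, "end": 12},
    {"entity_group": "EMAIL", "start": 16, "end": 32},
]
print(mask_pii(sample, spans))
```

With a working installation, `mask_pii(text, detect_pii(text))` would chain the two steps; the label names the model actually emits may differ from the ones used in the sample.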
38
 
39
  ## Intended uses & limitations
40
 
 
44
 
45
  More information needed
46
 
47
+ ## Training hyperparameters
 
 
48
 
49
  The following hyperparameters were used during training:
50
+ - learning_rate: 6e-04
51
+ - train_batch_size: 32
52
+ - eval_batch_size: 32
53
+ - seed: 412
54
+ - optimizer: Adam with betas=(0.96,0.996) and epsilon=1e-08
55
  - lr_scheduler_type: cosine_with_restarts
56
+ - lr_scheduler_warmup_ratio: 0.22
57
  - num_epochs: 7
58
+ - mixed_precision_training: N/A
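To make the scheduler settings above more concrete, here is a rough sketch of how a cosine-with-restarts schedule with linear warmup could shape the learning rate. It is a plain-Python approximation written for illustration, not the Trainer's code: the peak rate and warmup ratio come from the list above, while `total_steps=1000` and `num_cycles=1` are assumptions chosen only for the example.

```python
import math

# Values taken from the hyperparameter list in this commit.
PEAK_LR = 6e-4
WARMUP_RATIO = 0.22


def warmup_steps(total_steps: int, ratio: float = WARMUP_RATIO) -> int:
    """Number of warmup steps implied by a warmup ratio (rounded up)."""
    return math.ceil(total_steps * ratio)


def lr_at(step: int, total_steps: int, num_cycles: int = 1) -> float:
    """Learning rate at `step`: linear warmup, then cosine decay with hard restarts."""
    warmup = warmup_steps(total_steps)
    if step < warmup:
        # Linear ramp from 0 up to the peak rate.
        return PEAK_LR * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    if progress >= 1.0:
        return 0.0
    # Each cycle restarts the cosine curve at the peak rate.
    cosine = 0.5 * (1.0 + math.cos(math.pi * ((num_cycles * progress) % 1.0)))
    return PEAK_LR * max(0.0, cosine)


# Hypothetical run length, used only to print a few sample points.
total_steps = 1000
for step in (0, 100, warmup_steps(total_steps), 600, total_steps):
    print(step, lr_at(step, total_steps))
```

With a single cycle the curve never restarts; passing a larger `num_cycles` shows the restart behaviour the scheduler name refers to.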
59
+
60
+ ## Class-wise metrics
61
+ The model achieves the following results on the evaluation set:
62
+
63
+ - Loss: 0.0211
64
+ - Overall Precision: 0.9722
65
+ - Overall Recall: 0.9792
66
+ - Overall F1: 0.9757
67
+ - Overall Accuracy: 0.9915
68
+
69
+ - Accountname F1: 0.9993
70
+ - Accountnumber F1: 0.9986
71
+ - Age F1: 0.9884
72
+ - Amount F1: 0.9984
73
+ - Bic F1: 0.9942
74
+ - Bitcoinaddress F1: 0.9974
75
+ - Buildingnumber F1: 0.9898
76
+ - City F1: 1.0
77
+ - Companyname F1: 1.0
78
+ - County F1: 0.9976
79
+ - Creditcardcvv F1: 0.9541
80
+ - Creditcardissuer F1: 0.9970
81
+ - Creditcardnumber F1: 0.9754
82
+ - Currency F1: 0.8966
83
+ - Currencycode F1: 0.9946
84
+ - Currencyname F1: 0.7697
85
+ - Currencysymbol F1: 0.9958
86
+ - Date F1: 0.9778
87
+ - Dob F1: 0.9546
88
+ - Email F1: 1.0
89
+ - Ethereumaddress F1: 1.0
90
+ - Eyecolor F1: 0.9925
91
+ - Firstname F1: 0.9947
92
+ - Gender F1: 1.0
93
+ - Height F1: 1.0
94
+ - Iban F1: 0.9978
95
+ - Ip F1: 0.5404
96
+ - Ipv4 F1: 0.8455
97
+ - Ipv6 F1: 0.8855
98
+ - Jobarea F1: 0.9091
99
+ - Jobtitle F1: 1.0
100
+ - Jobtype F1: 0.9672
101
+ - Lastname F1: 0.9855
102
+ - Litecoinaddress F1: 0.9949
103
+ - Mac F1: 0.9965
104
+ - Maskednumber F1: 0.9836
105
+ - Middlename F1: 0.7385
106
+ - Nearbygpscoordinate F1: 1.0
107
+ - Ordinaldirection F1: 1.0
108
+ - Password F1: 1.0
109
+ - Phoneimei F1: 0.9978
110
+ - Phonenumber F1: 0.9975
111
+ - Pin F1: 0.9820
112
+ - Prefix F1: 0.9872
113
+ - Secondaryaddress F1: 1.0
114
+ - Sex F1: 0.9916
115
+ - Ssn F1: 0.9960
116
+ - State F1: 0.9967
117
+ - Street F1: 0.9991
118
+ - Time F1: 1.0
119
+ - Url F1: 1.0
120
+ - Useragent F1: 0.9981
121
+ - Username F1: 1.0
122
+ - Vehiclevin F1: 0.9950
123
+ - Vehiclevrm F1: 0.9870
124
+ - Zipcode F1: 0.9966
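As a quick sanity check on the figures above, the overall F1 can be recomputed as the harmonic mean of the reported precision and recall, and the weakest classes can be picked out with a simple filter. This is an illustrative sketch: `f1_score` and `weakest` are hypothetical helper names, the dictionary holds only a subset of the classes listed above, and the 0.9 threshold is an arbitrary choice for the example.

```python
from typing import Dict, List


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def weakest(scores: Dict[str, float], threshold: float = 0.9) -> List[str]:
    """Class names with F1 below `threshold`, lowest score first."""
    low = [name for name, score in scores.items() if score < threshold]
    return sorted(low, key=lambda name: scores[name])


# Overall precision and recall as reported in this commit.
overall_f1 = f1_score(0.9722, 0.9792)
print(round(overall_f1, 4))  # 0.9757, matching the reported Overall F1

# Subset of the per-class F1 values listed above.
class_f1 = {
    "Currency": 0.8966,
    "Currencyname": 0.7697,
    "Email": 1.0,
    "Ip": 0.5404,
    "Ipv4": 0.8455,
    "Ipv6": 0.8855,
    "Jobarea": 0.9091,
    "Middlename": 0.7385,
}
print(weakest(class_f1))
```

The generic `Ip` class stands out as the weakest, followed by `Middlename` and `Currencyname`.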
125
 
126
  ### Training results
127
 
 
135
  | 0.0808 | 6.0 | 14358 | 0.0693 | 0.9664 | 0.9732 | 0.9698 | 0.9728 | 1.0 | 1.0 | 0.9760 | 0.9897 | 0.9978 | 0.9907 | 0.9906 | 0.9930 | 0.9994 | 0.9939 | 1.0 | 0.9891 | 0.9590 | 0.9052 | 0.9875 | 0.7022 | 0.9892 | 0.9126 | 0.7438 | 1.0 | 1.0 | 1.0 | 0.9934 | 0.9991 | 1.0 | 1.0 | 0.1551 | 0.8393 | 0.8034 | 0.9942 | 0.9993 | 0.9928 | 0.9877 | 0.9770 | 1.0 | 0.9451 | 0.9773 | 1.0 | 0.9924 | 1.0 | 1.0 | 1.0 | 0.9929 | 0.9722 | 0.9974 | 0.9949 | 0.9970 | 0.9941 | 0.9972 | 0.9967 | 1.0 | 1.0 | 0.9991 | 1.0 | 1.0 | 0.9890 |
136
  | 0.0779 | 7.0 | 16751 | 0.0697 | 0.9698 | 0.9756 | 0.9727 | 0.9739 | 0.9983 | 1.0 | 0.9815 | 0.9904 | 1.0 | 0.9938 | 0.9935 | 0.9930 | 0.9994 | 0.9935 | 1.0 | 0.9903 | 0.9584 | 0.9206 | 0.9917 | 0.7753 | 0.9914 | 0.9315 | 0.8305 | 1.0 | 1.0 | 1.0 | 0.9939 | 1.0 | 1.0 | 1.0 | 0.1404 | 0.8382 | 0.8029 | 0.9958 | 1.0 | 0.9944 | 0.9910 | 0.9875 | 1.0 | 0.9480 | 0.9788 | 1.0 | 0.9924 | 1.0 | 1.0 | 1.0 | 0.9929 | 0.9747 | 0.9961 | 0.9949 | 0.9970 | 0.9925 | 0.9983 | 0.9967 | 1.0 | 1.0 | 0.9991 | 1.0 | 1.0 | 0.9953 |
137
 
 
138
  ### Framework versions
139
 
140
  - Transformers 4.35.2
141
+ - Pytorch 2.1.0+cu118
142
  - Datasets 2.15.0
143
+ - Tokenizers 0.15.0