zeroMN committed · Commit ab2c44d · verified · 1 Parent(s): b49926f

Upload 6 files

Files changed (6):
  1. AutoModel.pth +3 -0
  2. model_config.json +29 -0
  3. tokenizer_config.json +32 -0
  4. vocab.json +0 -0
  5. vocab.txt +0 -0
  6. 配置权重 +1309 -0
AutoModel.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:27227c5532d3027b044ad00c12c4aed1334e910459edf80b6b0f2bc83e673198
+ size 3237240570
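The three lines above are a Git LFS pointer file: the real 3 GB checkpoint lives in LFS storage, and the repository only tracks this small stand-in. As a minimal sketch (the parser below is illustrative, not part of any LFS tooling), the pointer can be read like so:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:27227c5532d3027b044ad00c12c4aed1334e910459edf80b6b0f2bc83e673198
size 3237240570
"""

info = parse_lfs_pointer(pointer)
algo, _, digest = info["oid"].partition(":")   # hash algorithm and hex digest
size_gib = int(info["size"]) / 1024**3         # checkpoint is roughly 3 GiB
```

The `size` field lets clients pre-allocate or sanity-check the download before fetching the object by its `oid`.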
model_config.json ADDED
@@ -0,0 +1,29 @@
+ {
+   "model_name": "AutoModel",
+   "model_type": "multimodal-transformer",
+   "hidden_size": 768,
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "intermediate_size": 2048,
+   "hidden_dropout_prob": 0.1,
+   "attention_probs_dropout_prob": 0.1,
+   "image_size": 224,
+   "image_channels": 3,
+   "patch_size": 16,
+   "max_position_embeddings": 512,
+   "vocab_size": 30522,
+   "type_vocab_size": 2,
+   "audio_sample_rate": 16000,
+   "audio_frame_size": 1024,
+   "audio_hop_size": 512,
+   "enable_vqa": true,
+   "enable_caption": true,
+   "enable_retrieval": true,
+   "enable_asr": true,
+   "enable_realtime_asr": true,
+   "batch_size": 32,
+   "learning_rate": 0.0001,
+   "weight_decay": 0.01,
+   "warmup_steps": 10000,
+   "max_steps": 100000
+ }
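A few of the fields above constrain each other: `hidden_size` must divide evenly by `num_attention_heads`, and `image_size` with `patch_size` fixes the patch grid a vision transformer would see. A minimal consistency check, using only values copied from the config above:

```python
import json

# Fields copied from model_config.json above.
config = json.loads("""
{
  "hidden_size": 768,
  "num_attention_heads": 12,
  "image_size": 224,
  "patch_size": 16
}
""")

# hidden_size must split evenly across attention heads
assert config["hidden_size"] % config["num_attention_heads"] == 0
head_dim = config["hidden_size"] // config["num_attention_heads"]  # 64 dims per head

# a 224x224 image cut into 16x16 patches gives a 14x14 grid
patches_per_side = config["image_size"] // config["patch_size"]
num_patches = patches_per_side ** 2  # 196 patch tokens
```

Checks like these are cheap to run at load time and catch a mis-edited config before any weights are touched.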
tokenizer_config.json ADDED
@@ -0,0 +1,32 @@
+ {
+   "tokenizer_name": "AutoTokenizer",
+   "pretrained_model_name": "AutoModel",
+   "vocab": {
+     "vocab_size": 30522,
+     "model_max_length": 512,
+     "padding_side": "right",
+     "truncation_side": "right",
+     "special_tokens": {
+       "pad_token": "[PAD]",
+       "unk_token": "[UNK]",
+       "cls_token": "[CLS]",
+       "sep_token": "[SEP]",
+       "mask_token": "[MASK]"
+     },
+     "tokenizer_type": "WordPiece",
+     "lowercase": true,
+     "pad_token_id": 0,
+     "unk_token_id": 100,
+     "cls_token_id": 101,
+     "sep_token_id": 102,
+     "mask_token_id": 103
+   },
+   "normalization": {
+     "lowercase": true,
+     "strip_accents": true
+   },
+   "preprocessing": {
+     "do_lower_case": true,
+     "handle_chinese_chars": true
+   }
+ }
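The special-token ids above follow the standard BERT WordPiece layout: every encoded sequence is framed by `[CLS]`/`[SEP]` and right-padded with `[PAD]`, matching `padding_side: "right"`. A small sketch of that framing step (the `pad_and_frame` helper and the sample token ids are hypothetical, written here for illustration):

```python
# Special-token ids copied from tokenizer_config.json above.
special_ids = {"[PAD]": 0, "[UNK]": 100, "[CLS]": 101, "[SEP]": 102, "[MASK]": 103}

def pad_and_frame(token_ids, max_len, ids=special_ids):
    """Wrap token ids with [CLS]/[SEP], then right-pad with [PAD] to max_len."""
    framed = [ids["[CLS]"]] + list(token_ids) + [ids["[SEP]"]]
    framed += [ids["[PAD]"]] * (max_len - len(framed))
    return framed

# Two arbitrary in-vocabulary token ids, padded out to length 8.
ids = pad_and_frame([2023, 2003], 8)
```

In a real pipeline the attention mask would mark the `[PAD]` positions as 0 so they are ignored by self-attention.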
vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
vocab.txt ADDED
The diff for this file is too large to render. See raw diff
 
配置权重 ADDED
@@ -0,0 +1,1309 @@
+ Config file generated: C:\Users\baby7\Desktop\fastAPI\model_config.json
+ {
+   "model_info": {
+     "total_layers": 176,
+     "layers": [
+       {"name": "image_encoder.encoder_layer.0.weight", "shape": [64, 3, 3, 3], "dtype": "torch.float32"},
+       {"name": "image_encoder.encoder_layer.0.bias", "shape": [64], "dtype": "torch.float32"},
+       {"name": "image_encoder.encoder_layer.4.weight", "shape": [768, 788544], "dtype": "torch.float32"},
+       {"name": "image_encoder.encoder_layer.4.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_layer.self_attn.in_proj_weight", "shape": [2304, 768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_layer.self_attn.in_proj_bias", "shape": [2304], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_layer.self_attn.out_proj.weight", "shape": [768, 768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_layer.self_attn.out_proj.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_layer.linear1.weight", "shape": [2048, 768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_layer.linear1.bias", "shape": [2048], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_layer.linear2.weight", "shape": [768, 2048], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_layer.linear2.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_layer.norm1.weight", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_layer.norm1.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_layer.norm2.weight", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_layer.norm2.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.0.self_attn.in_proj_weight", "shape": [2304, 768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.0.self_attn.in_proj_bias", "shape": [2304], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.0.self_attn.out_proj.weight", "shape": [768, 768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.0.self_attn.out_proj.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.0.linear1.weight", "shape": [2048, 768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.0.linear1.bias", "shape": [2048], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.0.linear2.weight", "shape": [768, 2048], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.0.linear2.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.0.norm1.weight", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.0.norm1.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.0.norm2.weight", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.0.norm2.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.1.self_attn.in_proj_weight", "shape": [2304, 768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.1.self_attn.in_proj_bias", "shape": [2304], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.1.self_attn.out_proj.weight", "shape": [768, 768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.1.self_attn.out_proj.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.1.linear1.weight", "shape": [2048, 768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.1.linear1.bias", "shape": [2048], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.1.linear2.weight", "shape": [768, 2048], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.1.linear2.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.1.norm1.weight", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.1.norm1.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.1.norm2.weight", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.1.norm2.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.2.self_attn.in_proj_weight", "shape": [2304, 768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.2.self_attn.in_proj_bias", "shape": [2304], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.2.self_attn.out_proj.weight", "shape": [768, 768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.2.self_attn.out_proj.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.2.linear1.weight", "shape": [2048, 768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.2.linear1.bias", "shape": [2048], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.2.linear2.weight", "shape": [768, 2048], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.2.linear2.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.2.norm1.weight", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.2.norm1.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.2.norm2.weight", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.2.norm2.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.3.self_attn.in_proj_weight", "shape": [2304, 768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.3.self_attn.in_proj_bias", "shape": [2304], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.3.self_attn.out_proj.weight", "shape": [768, 768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.3.self_attn.out_proj.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.3.linear1.weight", "shape": [2048, 768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.3.linear1.bias", "shape": [2048], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.3.linear2.weight", "shape": [768, 2048], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.3.linear2.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.3.norm1.weight", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.3.norm1.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.3.norm2.weight", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.3.norm2.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.4.self_attn.in_proj_weight", "shape": [2304, 768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.4.self_attn.in_proj_bias", "shape": [2304], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.4.self_attn.out_proj.weight", "shape": [768, 768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.4.self_attn.out_proj.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.4.linear1.weight", "shape": [2048, 768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.4.linear1.bias", "shape": [2048], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.4.linear2.weight", "shape": [768, 2048], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.4.linear2.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.4.norm1.weight", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.4.norm1.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.4.norm2.weight", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.4.norm2.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.5.self_attn.in_proj_weight", "shape": [2304, 768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.5.self_attn.in_proj_bias", "shape": [2304], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.5.self_attn.out_proj.weight", "shape": [768, 768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.5.self_attn.out_proj.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.5.linear1.weight", "shape": [2048, 768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.5.linear1.bias", "shape": [2048], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.5.linear2.weight", "shape": [768, 2048], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.5.linear2.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.5.norm1.weight", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.5.norm1.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.5.norm2.weight", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.5.norm2.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.6.self_attn.in_proj_weight", "shape": [2304, 768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.6.self_attn.in_proj_bias", "shape": [2304], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.6.self_attn.out_proj.weight", "shape": [768, 768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.6.self_attn.out_proj.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.6.linear1.weight", "shape": [2048, 768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.6.linear1.bias", "shape": [2048], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.6.linear2.weight", "shape": [768, 2048], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.6.linear2.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.6.norm1.weight", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.6.norm1.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.6.norm2.weight", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.6.norm2.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.7.self_attn.in_proj_weight", "shape": [2304, 768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.7.self_attn.in_proj_bias", "shape": [2304], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.7.self_attn.out_proj.weight", "shape": [768, 768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.7.self_attn.out_proj.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.7.linear1.weight", "shape": [2048, 768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.7.linear1.bias", "shape": [2048], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.7.linear2.weight", "shape": [768, 2048], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.7.linear2.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.7.norm1.weight", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.7.norm1.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.7.norm2.weight", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.7.norm2.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.8.self_attn.in_proj_weight", "shape": [2304, 768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.8.self_attn.in_proj_bias", "shape": [2304], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.8.self_attn.out_proj.weight", "shape": [768, 768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.8.self_attn.out_proj.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.8.linear1.weight", "shape": [2048, 768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.8.linear1.bias", "shape": [2048], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.8.linear2.weight", "shape": [768, 2048], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.8.linear2.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.8.norm1.weight", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.8.norm1.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.8.norm2.weight", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.8.norm2.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.9.self_attn.in_proj_weight", "shape": [2304, 768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.9.self_attn.in_proj_bias", "shape": [2304], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.9.self_attn.out_proj.weight", "shape": [768, 768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.9.self_attn.out_proj.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.9.linear1.weight", "shape": [2048, 768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.9.linear1.bias", "shape": [2048], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.9.linear2.weight", "shape": [768, 2048], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.9.linear2.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.9.norm1.weight", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.9.norm1.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.9.norm2.weight", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.9.norm2.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.10.self_attn.in_proj_weight", "shape": [2304, 768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.10.self_attn.in_proj_bias", "shape": [2304], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.10.self_attn.out_proj.weight", "shape": [768, 768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.10.self_attn.out_proj.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.10.linear1.weight", "shape": [2048, 768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.10.linear1.bias", "shape": [2048], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.10.linear2.weight", "shape": [768, 2048], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.10.linear2.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.10.norm1.weight", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.10.norm1.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.10.norm2.weight", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.10.norm2.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.11.self_attn.in_proj_weight", "shape": [2304, 768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.11.self_attn.in_proj_bias", "shape": [2304], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.11.self_attn.out_proj.weight", "shape": [768, 768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.11.self_attn.out_proj.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.11.linear1.weight", "shape": [2048, 768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.11.linear1.bias", "shape": [2048], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.11.linear2.weight", "shape": [768, 2048], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.11.linear2.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.11.norm1.weight", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.11.norm1.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.11.norm2.weight", "shape": [768], "dtype": "torch.float32"},
+       {"name": "text_encoder.transformer_encoder.layers.11.norm2.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "audio_encoder.encoder_layer.0.weight", "shape": [768, 16000], "dtype": "torch.float32"},
+       {"name": "audio_encoder.encoder_layer.0.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "audio_encoder.encoder_layer.2.weight", "shape": [768, 768], "dtype": "torch.float32"},
+       {"name": "audio_encoder.encoder_layer.2.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "fusion_layer.fusion_layer.weight", "shape": [768, 2304], "dtype": "torch.float32"},
+       {"name": "fusion_layer.fusion_layer.bias", "shape": [768], "dtype": "torch.float32"},
+       {"name": "vqa_layer.vqa_layer.weight", "shape": [30522, 768], "dtype": "torch.float32"},
+       {"name": "vqa_layer.vqa_layer.bias", "shape": [30522], "dtype": "torch.float32"},
+       {"name": "caption_layer.caption_layer.weight", "shape": [30522, 768], "dtype": "torch.float32"},
+       {"name": "caption_layer.caption_layer.bias", "shape": [30522], "dtype": "torch.float32"},
+       {"name": "retrieval_layer.retrieval_layer.weight", "shape": [30522, 768], "dtype": "torch.float32"},
+       {"name": "retrieval_layer.retrieval_layer.bias", "shape": [30522], "dtype": "torch.float32"},
+       {"name": "asr_layer.asr_layer.weight", "shape": [30522, 768], "dtype": "torch.float32"},
+       {"name": "asr_layer.asr_layer.bias", "shape": [30522], "dtype": "torch.float32"},
+       {"name": "realtime_asr_layer.realtime_asr_layer.weight", "shape": [30522, 768], "dtype": "torch.float32"},
+       {"name": "realtime_asr_layer.realtime_asr_layer.bias", "shape": [30522], "dtype": "torch.float32"}
+     ]
+   },
+   "file_info": {
+     "path": "C:\\Users\\baby7\\Desktop\\fastAPI\\AutoModel.pth",
+     "size": 3237240570,
+     "last_modified": 1735983514.6732724
+   }
+ }
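A manifest like the one above (tensor name, shape, dtype per entry) makes it easy to audit a checkpoint without loading it: summing the products of the shapes gives the parameter count. A minimal sketch over a small excerpt of the manifest (the `count_params` helper is illustrative, not part of any library):

```python
import json
from math import prod

# A three-entry excerpt of the layer manifest above.
manifest = json.loads("""
{
  "model_info": {
    "total_layers": 3,
    "layers": [
      {"name": "image_encoder.encoder_layer.0.weight", "shape": [64, 3, 3, 3], "dtype": "torch.float32"},
      {"name": "image_encoder.encoder_layer.0.bias",   "shape": [64],          "dtype": "torch.float32"},
      {"name": "fusion_layer.fusion_layer.weight",     "shape": [768, 2304],   "dtype": "torch.float32"}
    ]
  }
}
""")

def count_params(info: dict) -> int:
    """Sum element counts over every tensor shape listed in the manifest."""
    return sum(prod(layer["shape"]) for layer in info["model_info"]["layers"])

n = count_params(manifest)  # 64*3*3*3 + 64 + 768*2304 = 1_771_264
```

Multiplying the count by 4 bytes (float32) gives a lower bound on the `.pth` file size, which can be checked against the `size` field in `file_info`.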