# Running Jina Embedding V3 on Text-Embedding-Inference
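
Once the converted checkpoint loads, it is served through text-embeddings-inference's standard HTTP interface. As a quick sanity check against a local deployment (host and port assume TEI's default interface mapped to 8080; the input string is arbitrary):

```python
# Query a locally running text-embeddings-inference server via its
# POST /embed endpoint; host/port assume a default local deployment.
import requests

resp = requests.post(
    "http://localhost:8080/embed",
    json={"inputs": "sanity-check sentence for the converted model"},
)
resp.raise_for_status()
embedding = resp.json()[0]  # TEI returns one embedding per input
print(len(embedding), embedding[:5])  # dimensionality and a few values
```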

Changes made to match the GTE-style architecture (a sketch of the conversion pass follows this list):

1. Removed the "roberta" prefix from all tensor names.
2. Renamed "mixer" to "attention" in the encoder layers.
3. Converted "Wqkv" to "qkv_proj" for the combined query/key/value projection.
4. Renamed "mlp.fc1" to "mlp.up_proj" and "mlp.fc2" to "mlp.down_proj".
5. Created "mlp.up_gate_proj" by duplicating and expanding "mlp.up_proj".
6. Renamed "norm1" to "attn_ln" and "norm2" to "mlp_ln" in the encoder layers.
7. Changed "emb_ln" to "embeddings.LayerNorm".
8. Renamed "weight" to "gamma" and "bias" to "beta" in the layer-normalization layers.
9. Removed the LoRA-related tensors.
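
In effect, items 1-9 amount to a key-remapping pass over the checkpoint. Below is a minimal sketch of such a pass (not the original script): it assumes a single-file PyTorch safetensors checkpoint named model.safetensors, and the exact key patterns may need adjusting for the real checkpoint.

```python
import re

import torch
from safetensors.torch import load_file, save_file

tensors = load_file("model.safetensors")  # input path is illustrative
converted = {}

for name, tensor in tensors.items():
    # 9. Drop LoRA adapter tensors entirely.
    if "lora" in name.lower():
        continue
    new = name
    # 1. Strip the "roberta." prefix.
    if new.startswith("roberta."):
        new = new[len("roberta."):]
    # 2-4, 6-7. Straight substring renames.
    new = new.replace("mixer", "attention")
    new = new.replace("Wqkv", "qkv_proj")
    new = new.replace("mlp.fc1", "mlp.up_proj")
    new = new.replace("mlp.fc2", "mlp.down_proj")
    new = new.replace("norm1", "attn_ln")
    new = new.replace("norm2", "mlp_ln")
    new = new.replace("emb_ln", "embeddings.LayerNorm")
    # 8. LayerNorm parameters use gamma/beta naming (heuristic match).
    if "_ln" in new or "LayerNorm" in new:
        new = re.sub(r"\.weight$", ".gamma", new)
        new = re.sub(r"\.bias$", ".beta", new)
    converted[new] = tensor

# 5. Approximate the fused up_gate_proj by stacking two copies of up_proj
# along the output dimension (dim=0 covers 2-D weights and 1-D biases).
for name in [n for n in converted if "up_proj" in n]:
    converted[name.replace("up_proj", "up_gate_proj")] = torch.cat(
        [converted[name], converted[name]], dim=0
    )

save_file(converted, "model-gte.safetensors")
```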

Features:

1. Structural compatibility: the renamed model closely matches the expected GTE layout, so it loads without "tensor not found" errors.
2. Preservation of core weights: most of the original model's weights are carried over unchanged, retaining much of what the model learned.
3. Adaptability: the script handles a range of naming conventions and structures, leaving room for future adjustments.
4. Transparency: the script prints every tensor name and shape after conversion, which helps with debugging (see the inspection sketch after this list).
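
A sketch of that inspection step (item 4 above); the file name is illustrative:

```python
# Print every tensor name and shape in the converted checkpoint.
from safetensors import safe_open

with safe_open("model-gte.safetensors", framework="pt") as f:
    for key in sorted(f.keys()):
        print(key, tuple(f.get_tensor(key).shape))
```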

Limitations:

1. Approximated architecture: the conversion approximates the GTE architecture rather than matching it exactly, which may affect model performance.
2. Loss of LoRA adaptations: removing the LoRA-related tensors discards the fine-tuned adapters, potentially impacting the model's task-specific capabilities.
3. Up-gate projection approximation: "up_gate_proj" is built by duplicating the "up_proj" weights, whereas in a genuine GTE model the gate and up halves are trained separately, so this may not accurately represent the intended architecture.
4. Potential performance impact: the structural changes, especially in the MLP layers, may degrade the model's performance and output quality.
5. Positional embeddings not handled: positional embeddings are not specifically addressed, and they may differ between XLM-RoBERTa and GTE models.
6. Possibly missing specialized layers: the GTE architecture may contain layers or components this conversion does not account for.
7. No guarantee of functional equivalence: although the model now loads, there is no guarantee it behaves identically to a true GTE model.
8. Config file mismatch: potential mismatches in config.json are not addressed and might cause issues during model initialization or inference (a basic consistency check is sketched after this list).
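
A hypothetical consistency check for item 8, comparing a couple of config.json fields against tensor shapes. The embedding key name is an assumption about the converted checkpoint, not taken from the original script:

```python
import json

from safetensors import safe_open

with open("config.json") as fp:
    cfg = json.load(fp)

with safe_open("model-gte.safetensors", framework="pt") as f:
    # Assumed key name after the "roberta." prefix was stripped.
    emb = f.get_tensor("embeddings.word_embeddings.weight")

assert emb.shape[0] == cfg["vocab_size"], "vocab_size mismatch"
assert emb.shape[1] == cfg["hidden_size"], "hidden_size mismatch"
```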