hku-nlp committed on
Commit 20c37ba · 1 Parent(s): 0c8d9fc

Update README.md

Files changed (1)
  1. README.md +23 -5
README.md CHANGED
@@ -10,15 +10,19 @@ tags:
  ---

  # hku-nlp/instructor-xl
- This is a general embedding model: It maps sentences & paragraphs to a 768 dimensional dense vector space.
- The model was trained on diverse tasks.
- It takes customized instructions and text inputs, and generates task-specific embeddings for general purposes, e.g., information retrieval, classification, clustering, etc.
- ```
  git clone https://github.com/HKUNLP/instructor-embedding
  cd sentence-transformers
  pip install -e .
  ```
- Then you can use the model like this:
  ```python
  from sentence_transformers import SentenceTransformer
  sentence = "3D ActionSLAM: wearable person tracking in multi-floor environments"
@@ -26,4 +30,18 @@ instruction = "Represent the Science title; Input:"
  model = SentenceTransformer('hku-nlp/instructor-xl')
  embeddings = model.encode([[instruction,sentence,0]])
  print(embeddings)
  ```
 
  ---

  # hku-nlp/instructor-xl
+ This is a general embedding model: it maps **any** piece of text (e.g., a title, a sentence, or a document) to a fixed-length vector at test time **without further training**. With instructions, the embeddings are **domain-specific** (e.g., specialized for science or finance) and **task-aware** (e.g., customized for classification or information retrieval).
+
+ The model is easy to use with the `sentence-transformers` library (installed from the instructor-embedding repository).
+
+ ## Installation
+ ```bash
  git clone https://github.com/HKUNLP/instructor-embedding
  cd instructor-embedding/sentence-transformers
  pip install -e .
  ```
+
+ ## Compute your customized embeddings
+ Then you can use the model like this to calculate domain-specific and task-aware embeddings:
  ```python
  from sentence_transformers import SentenceTransformer
  sentence = "3D ActionSLAM: wearable person tracking in multi-floor environments"

  model = SentenceTransformer('hku-nlp/instructor-xl')
  embeddings = model.encode([[instruction,sentence,0]])
  print(embeddings)
+ ```
+
+ ## Calculate sentence similarities
+ You can also use the model to compute similarities between two groups of sentences with **customized embeddings**.
+ ```python
+ from sklearn.metrics.pairwise import cosine_similarity
+ # `model` is the SentenceTransformer loaded in the previous snippet.
+ # Each input is [instruction, text, 0], as in the single-sentence example.
+ sentences_a = [['Represent the Science sentence; Input: ','Parton energy loss in QCD matter',0],
+                ['Represent the Financial statement; Input: ','The Federal Reserve on Wednesday raised its benchmark interest rate.',0]]
+ sentences_b = [['Represent the Science sentence; Input: ','The Chiral Phase Transition in Dissipative Dynamics',0],
+                ['Represent the Financial statement; Input: ','The funds rose less than 0.5 per cent on Friday',0]]
+ embeddings_a = model.encode(sentences_a)
+ embeddings_b = model.encode(sentences_b)
+ # Rows correspond to sentences_a, columns to sentences_b.
+ similarities = cosine_similarity(embeddings_a,embeddings_b)
+ print(similarities)
  ```
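The README computes similarities with scikit-learn's `cosine_similarity`. As a rough sketch of what that call returns, the NumPy code below scales each embedding row to unit length and takes dot products, then reuses the scores to rank a small corpus against one query, which is the basic pattern behind embedding-based retrieval. The function names `cosine_similarity_matrix` and `rank_by_similarity` are hypothetical, and the toy 3-dimensional vectors stand in for real model outputs; in practice you would pass the `embeddings_a` and `embeddings_b` arrays from `model.encode`, assuming it returns NumPy arrays as standard sentence-transformers does by default.

```python
import numpy as np


def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity between every row of `a` and every row of `b`.

    Returns an array of shape (len(a), len(b)). Rows must be non-zero:
    this sketch does not guard against division by zero.
    """
    # Scale each row to unit L2 norm; cosine similarity is then a dot product.
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T


def rank_by_similarity(query: np.ndarray, corpus: np.ndarray) -> np.ndarray:
    """Indices of `corpus` rows sorted from most to least similar to `query`."""
    # Treat the single query vector as a one-row matrix and take its score row.
    scores = cosine_similarity_matrix(query[np.newaxis, :], corpus)[0]
    # Negate so that argsort (ascending) yields descending similarity.
    return np.argsort(-scores)


if __name__ == "__main__":
    # Toy vectors standing in for model outputs (hypothetical data).
    embeddings_a = np.array([[1.0, 0.0, 0.0],
                             [0.0, 1.0, 0.0]])
    embeddings_b = np.array([[2.0, 0.0, 0.0],
                             [1.0, 1.0, 0.0]])
    print(np.round(cosine_similarity_matrix(embeddings_a, embeddings_b), 3))
    print(rank_by_similarity(embeddings_a[1], embeddings_b))
```

Normalizing first means the vectors' magnitudes do not affect the score, only their directions. Unlike this sketch, scikit-learn's implementation handles all-zero rows without dividing by zero.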