---
library_name: peft
base_model: facebook/mcontriever-msmarco
language:
- ko
---
# smartPatent-mContriever-lora
This model is a LoRA adapter fine-tuned from `facebook/mcontriever-msmarco` for a custom Korean patent retrieval system.
### Training Data
Two types of data are used for training: queries generated automatically with GPT-4, and patent titles linked to existing patent abstracts.
### Usage
```python
import torch
from transformers import AutoModel, AutoTokenizer
from peft import PeftModel, PeftConfig


def get_model(peft_model_name):
    # Read the adapter config to find the base model it was trained on
    config = PeftConfig.from_pretrained(peft_model_name)
    base_model = AutoModel.from_pretrained(config.base_model_name_or_path)
    # Attach the LoRA adapter, then merge its weights into the base model
    model = PeftModel.from_pretrained(base_model, peft_model_name)
    model = model.merge_and_unload()
    model.eval()
    return model


# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained('facebook/mcontriever-msmarco')
model = get_model('hanseokOh/smartPatent-mContriever-lora')
```
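
The snippet above only loads the model. As a rough sketch of how embeddings might then be computed, the helpers below assume this adapter keeps the base mContriever convention of mean pooling over the last hidden states and scoring by dot product; that is not confirmed for this fine-tuned model. The names `mean_pooling`, `embed`, and `rank` are hypothetical helpers introduced here for illustration.

```python
import torch


def mean_pooling(token_embeddings, attention_mask):
    # Average token vectors per sequence, ignoring padded positions.
    mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)
    return summed / counts


def embed(texts, tokenizer, model):
    # Assumption: the merged model returns `last_hidden_state`, like the base encoder.
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return mean_pooling(outputs.last_hidden_state, inputs["attention_mask"])


def rank(query, passages, tokenizer, model):
    # Assumption: relevance is the dot product of query and passage embeddings.
    query_emb = embed([query], tokenizer, model)
    passage_embs = embed(passages, tokenizer, model)
    scores = (query_emb @ passage_embs.T).squeeze(0)
    order = torch.argsort(scores, descending=True).tolist()
    return [(passages[i], scores[i].item()) for i in order]
```

With `tokenizer` and `model` loaded as above, `rank(query, passages, tokenizer, model)` would return the passages sorted by descending score, where `query` is a Korean search string and `passages` is a list of patent abstracts.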
### Info
- **Developed by:** hanseokOh
- **Model type:** information retriever
- **Language(s) (NLP):** Korean
- **Finetuned from model:** facebook/mcontriever-msmarco
### Model Sources
- **Repository:** https://github.com/hanseokOh/PatentSearch