---
language:
- ja
license: cc-by-sa-3.0
library_name: transformers
tags:
- fastText
- embedding
pipeline_tag: feature-extraction
widget:
- text: "海賊王におれはなる"
  example_title: "ワンピース"
---

# fasttext-jp-embedding

**This model is experimental.**

Pretrained fastText word vectors for Japanese, usable through the Transformers `feature-extraction` pipeline.

## Usage

Example for Google Colaboratory. First install MeCab and the Python dependencies:

```
! apt install aptitude swig > /dev/null
! aptitude install mecab libmecab-dev mecab-ipadic-utf8 git make curl xz-utils file -y > /dev/null
! pip install transformers torch mecab-python3 torchtyping > /dev/null
! ln -s /etc/mecabrc /usr/local/etc/mecabrc
```

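Then load the pipeline and show the embedding of each token:

```
from transformers import pipeline
import pandas as pd
import numpy as np

text = "海賊王におれはなる"

# Use a separate name so the imported `pipeline` function is not overwritten.
# trust_remote_code=True runs the custom code shipped in the model repository.
extractor = pipeline("feature-extraction", model="paulhindemith/fasttext-jp-embedding", revision="2022.11.13", trust_remote_code=True)

# One column per token.
pd.DataFrame(np.array(extractor(text)).T, columns=extractor.tokenizer.tokenize(text))
```

The target parts of speech can be changed on the tokenizer:

```
# Set the target parts of speech (hinshi) to verbs, nouns, and adjectives.
extractor.tokenizer.target_hinshi = ["動詞", "名詞", "形容詞"]
pd.DataFrame(np.array(extractor(text)).T, columns=extractor.tokenizer.tokenize(text))
```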
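
A common next step with per-token vectors like the ones produced above is to pool them into one vector per text and compare texts by cosine similarity. The following is a minimal sketch of that idea, not part of this model's code. It assumes the feature-extraction output is one vector per token; the helper names `mean_pool` and `cosine_similarity` and the 3-dimensional stand-in values are hypothetical and only serve as illustration.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_pool(token_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average per-token vectors into a single fixed-size vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    n = len(token_vectors)
    # zip(*rows) iterates over dimensions, so each `column` holds one dimension.
    return [sum(column) / n for column in zip(*token_vectors)]


# Stand-in data (hypothetical values, assumed layout: one inner list per token).
sentence_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
sentence_b = [[2.0, 2.0, 0.0]]

similarity = cosine_similarity(mean_pool(sentence_a), mean_pool(sentence_b))
print(round(similarity, 6))
```

With real data, the stand-in lists would be replaced by the token vectors obtained from the pipeline for each text. Mean pooling ignores word order, which is usually acceptable for a rough similarity score.
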

## License

This model uses the following pretrained vectors.

- Name: fastText
- Credit: https://fasttext.cc/
- License: [Creative Commons Attribution-Share-Alike License 3.0](https://creativecommons.org/licenses/by-sa/3.0/)
- Link: https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ja.vec
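
For readers who want to inspect the source vectors directly: fastText's text format (`.vec`) starts with a header line holding the word count and the dimension, followed by one line per word with its values separated by spaces. The sketch below parses that layout from an in-memory sample. The function name `read_vec` and the sample values are hypothetical, and this is not necessarily how this model loads its vectors.

```python
import io
from typing import Dict, List, TextIO, Tuple


def read_vec(handle: TextIO) -> Tuple[int, Dict[str, List[float]]]:
    """Parse fastText text-format vectors: a 'count dim' header, then 'word v1 ... vN' lines."""
    header = handle.readline().split()
    dim = int(header[1])  # header[0] is the declared word count
    vectors: Dict[str, List[float]] = {}
    for line in handle:
        # Split from the right so the last `dim` fields are always the values.
        parts = line.rstrip().rsplit(" ", dim)
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(value) for value in parts[1:]]
    return dim, vectors


# Tiny made-up sample in the same layout (3 dimensions, 2 words).
SAMPLE = "2 3\n海賊 0.1 0.2 0.3\n王 -0.4 0.5 0.6 \n"

dim, vectors = read_vec(io.StringIO(SAMPLE))
print(dim, sorted(vectors))
```

For the real file, the same function could be given `open(path, encoding="utf-8")`, where `path` points to a local copy of `wiki.ja.vec`. The full file is large, so reading it into a plain dictionary may use a lot of memory.
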