[B! NLP] teddy-g's bookmarks

teddy-g id:teddy-g

teddy-g's bookmarks on NLP (32)

How vector similarity search works
teddy-g 2024/02/26
A summary of the various search methods used by vector stores. Note to self.
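The comment does not say which search methods the article covers, so as a baseline here is a minimal sketch of the simplest one: an exhaustive cosine-similarity scan. The `store` contents, the 3-dimensional vectors, and the function names are made up for illustration; real embeddings come from an embedding model, and real vector stores usually replace the full scan with an approximate index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query, store, k=2):
    """Return the k (doc_id, score) pairs most similar to the query, by full scan."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in store.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical 3-dimensional embeddings, for illustration only.
store = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
results = top_k([1.0, 0.0, 0.0], store, k=2)
print(results)
```

Here `doc_a` ranks first with a score of 1.0 because it points in exactly the same direction as the query, and `doc_b` comes second.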

AI

generativeAI

vectorstore

LLM

NLP

LangChain

searchengine

embeddings
Building LLM-Powered Web Apps with Client-Side Technology
teddy-g 2024/01/30
Running an LLM chat in client-side JavaScript, more or less. The demo site doesn't work unless Ollama is running locally. I need to look into Transformers.js and Voy a bit more.

NLP

LLM

LangChain

chatGPT

openAI

javascript

JavaScript

tips

hacks
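Installing MeCab and ipadic-NEologd on Google Colab - Qiita
1. Introduction: I tried to install MeCab and ipadic-NEologd on Google Colab and it took more effort than expected, so I am leaving this as a memo. 2. Code: After digging through various information on the web, I think the following code is the best way to install them.
# Install the morphological analysis library MeCab and the dictionary (mecab-ipadic-NEologd)
!apt-get -q -y install sudo file mecab libmecab-dev mecab-ipadic-utf8 git curl python-mecab > /dev/null
!git clone --depth 1 https://github.com/neologd/mecab-ipadic-neologd.git > /dev/null
!echo yes | mecab-ipadic-
teddy-g 2024/01/06
How to install MeCab and ipadic-NEologd on Google Colab... except it doesn't work for me. The default path appears to be empty in the first place. A symbolic link alone isn't enough, so I need to work out another approach.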

GoogleColab

python

python3

mecab

NLP

jupyter
openai-community/gpt2 · Hugging Face
GPT-2 Test the whole generation capabilities here: https://transformer.huggingface.co/doc/gpt2-large Pretrained model on English language using a causal language modeling (CLM) objective. It was introduced in this paper and first released at this page. Disclaimer: The team releasing GPT-2 also wrote a model card for their model. Content from this model card has been written by the Hugging Face tea
teddy-g 2021/10/31
Explains how to use GPT-2 on Hugging Face. Note to self. Hugging Face itself is also a note to self.

datascience

machine learning

machinelearning

NaturalLanguage

NLP

gpt

gpt-2

gpt-3

tips
GitHub - tanreinama/Japanese-BPEEncoder: Japanese-BPEEncoder
teddy-g 2021/10/31
Needed when using GPT-2 and the like. Note to self.

datascience

machine learning

machinelearning

NaturalLanguage

NLP

gpt

gpt-2

gpt-3

tips
How to use gpt2-japanese (2) - Fine-tuning GPT-2 | npaka
The "small model" and the "fine-tuning code" for "gpt2-japanese" have been released, so I tried fine-tuning GPT-2 on Japanese text. Previous post
(1) Open a Google Colab notebook.
(2) From the menu, choose "Edit → Notebook → Hardware accelerator" and select "GPU".
(3) Install "gpt2-japanese" with the following commands.
# Install gpt2-japanese
!git clone https://github.com/tanreinama/gpt2-japanese
%cd gpt2-japanese
!pip uninstall tensorflow -y
!pip install -r requirements.txt
2. Downloading the model: download the "small model" into the "gpt2-japanese" folder
teddy-g 2021/10/31
Tips and so on for generating Japanese text with GPT-2. Note to self.

datascience

machine learning

machinelearning

NaturalLanguage

NLP

gpt

gpt-2

gpt-3

tips
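Translating Japanese text with the DeepL API from Python - deepblue
Introduction: In this article I try translating Japanese with the DeepL API (DeepL Pro). DeepL's translation quality has been getting a lot of attention lately, but until now the API did not support Japanese. However, a DeepL press release dated June 16, 2020 announced Japanese support, so let's try translating Japanese through the DeepL API right away. About the DeepL API: The DeepL API is a paid service; you can sign up by choosing DeepL Pro from the menu at the top right of the official site. DeepL Pro comes in three types, "for individuals", "for teams", and "for developers", and the DeepL API is available only with the rightmost one, "for developers". As for DeepL API pricing, the base fee is currently ¥630 per month. Translated text is charged per 1,000,000 characters at ¥2,50
teddy-g 2021/08/30
DeepL is laughably easy to use from Python.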
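To give a rough idea of how little code is involved, here is a minimal sketch that uses only the standard library rather than the article's own code, which is not shown in the excerpt. It assumes the v2 `translate` endpoint on the Pro host (`api.deepl.com`; the free plan uses `api-free.deepl.com`), header-based authentication with `DeepL-Auth-Key`, and a placeholder key. The helper names `build_translate_request` and `translate` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Pro endpoint. The free plan uses https://api-free.deepl.com/v2/translate
API_URL = "https://api.deepl.com/v2/translate"


def build_translate_request(text, target_lang, auth_key):
    """Build (but do not send) a form-encoded POST request for DeepL."""
    body = urllib.parse.urlencode({"text": text, "target_lang": target_lang})
    return urllib.request.Request(
        API_URL,
        data=body.encode("utf-8"),
        headers={"Authorization": f"DeepL-Auth-Key {auth_key}"},
        method="POST",
    )


def translate(text, target_lang, auth_key):
    """Send the request and return the first translation (needs a real key and network)."""
    request = build_translate_request(text, target_lang, auth_key)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["translations"][0]["text"]


# Only the request is built here; "YOUR_AUTH_KEY" is a placeholder, not a real key.
request = build_translate_request("こんにちは、世界", "EN", "YOUR_AUTH_KEY")
print(request.get_method(), request.full_url)
```

Calling `translate(...)` with a valid key performs the actual translation; it is left uncalled so the sketch runs without credentials.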

python

python3

machine learning

machinelearning

DeepL

NaturalLanguage

NLP
pycld2
teddy-g 2021/02/22
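The go-to when you want to do language identification in Python. Accuracy is fairly good, but it occasionally identifies text as some mysterious language I have never heard of.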

python

NLP

NaturalLanguage
Rule-based Matcher Explorer · Explosion
teddy-g 2020/08/16
A tool that automatically builds patterns for the Matcher or EntityRuler and lets you test them.

python

spaCy

NLP

token

tokenize

tips
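Adding rules to a deep-learning-based NER with EntityRuler [sciSpacy] | VasteeLab
This article shows how to add rules to the standard NER (en_core_sci_sm) in Spacy. Being able to do this lets you fine-tune with rules when the NER results are a little lacking, so I think it is handy to remember. First, we do the preprocessing needed to apply NER. Here we go as far as loading the NER model under the name nlp. import spacy from spacy.pipeline import EntityRuler nlp = spacy.load("en_core_sci_sm") patterns = [{"label": "ORG", "pattern": "Jeffrey Hinton"}, {"label": "ORG", "pattern": "University of Toronto"}, {"label": "ORG", "pattern":
teddy-g 2020/08/16
When using spaCy's EntityRuler to handle proper nouns, entities already assigned by the model are not overwritten unless you pass overwrite_ents=True at initialization. Worth remembering when adding person names, company names, brand names, product names, and so on.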

python

spaCy

token

tokenize

tips

NLP
Setting up text preprocessing pipeline using scikit-learn and spaCy
teddy-g 2020/07/11
Tips on tokenization with NLTK and spaCy. Helpfully, it also covers how to handle stop words, emoticons, HTML tags, and punctuation.
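The four cleanup steps named above can be sketched with the standard library alone. This is a stand-in for illustration, not the NLTK or spaCy code from the article: the stop-word set is a tiny made-up list, the emoticon pattern only covers a few Western-style emoticons, and `preprocess` is a hypothetical helper name.

```python
import html
import re
import string

STOP_WORDS = {"a", "an", "and", "is", "the", "this", "to"}  # tiny illustrative list
EMOTICON_RE = re.compile(r"[:;=][\-']?[\)\(\]\[dDpP]")  # e.g. :-) ;) =D :P
TAG_RE = re.compile(r"<[^>]+>")


def preprocess(text):
    """Strip HTML tags, pull out emoticons, drop punctuation and stop words."""
    text = html.unescape(TAG_RE.sub(" ", text))
    # Emoticons are extracted first so punctuation removal does not destroy them.
    emoticons = EMOTICON_RE.findall(text)
    text = EMOTICON_RE.sub(" ", text)
    text = text.translate(str.maketrans("", "", string.punctuation))
    words = [w for w in text.lower().split() if w not in STOP_WORDS]
    return words + emoticons


tokens = preprocess("<p>This is the <b>best</b> movie, really! :-)</p>")
print(tokens)
```

The order matters: emoticons are made of punctuation, so they have to be captured before the punctuation step runs.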

NLP

python

spaCy

nltk

scikit-learn

datascience

machine learning

machinelearning
Linguistic Features · spaCy Usage Documentation
teddy-g 2020/07/05
The gist: install en_core_web_lg when you compute similarity.

spaCy

python

machine learning

machinelearning

NLP

NaturalLanguage

datascience
Classify Text Using spaCy – Dataquest
teddy-g 2020/07/05
A brief introduction to NLP with spaCy. I looked this up because I wanted to know how to configure stop words.

python

spaCy

datascience

machinelearning

machine learning

NLP

NaturalLanguage
GitHub - atefm/pDMM: Python implemetation for Dirichlet Multinomial Mixture (DMM) model
teddy-g 2020/06/02
A Python implementation of DMM, a method for topic analysis of short texts, like BTM.

datascience

NLP

NaturalLanguage

lda

shorttext
biterm
teddy-g 2020/06/02
A Python implementation of BTM, one of the methods for topic analysis of short texts.
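BTM works on "biterms", unordered pairs of words that co-occur in the same short text, and models them over the whole corpus instead of per document. The sketch below shows only that input step, treating each short text as a single context window. It does not use the `biterm` package's API, and the function names and sample texts are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """All unordered word pairs from one short text (the text is one context window)."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def corpus_biterm_counts(docs):
    """Pool biterms over the whole corpus, which is the data a biterm topic model is fit on."""
    counts = Counter()
    for tokens in docs:
        counts.update(extract_biterms(tokens))
    return counts


# Hypothetical pre-tokenized short texts.
short_texts = [
    ["apple", "iphone", "release"],
    ["apple", "iphone", "price"],
    ["banana", "price"],
]
counts = corpus_biterm_counts(short_texts)
print(counts.most_common(3))
```

Pooling pairs across the corpus is what helps with short texts: a single tweet-length document has too few words for per-document statistics, but the corpus-wide pair counts are dense enough to learn topics from.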

datascience

NLP

NaturalLanguage

lda

python

shorttext
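Pointwise Mutual Information (PMI) in Natural Language Processing
Pointwise mutual information is a measure of the degree of association between two events (it takes values from negative to positive). In natural language processing, pointwise mutual information is sometimes called simply mutual information. However, it is entirely different from the mutual information defined in information theory (described later), so it is wise to call it pointwise mutual information. Books and papers on natural language processing often use the abbreviation PMI. Definition of PMI: For a realization x of one random variable and a realization y of another random variable, the pointwise mutual information PMI(x, y) is defined as $PMI(x, y) = \log_2\frac{P(x, y)}{P(x)P(y)}$ ... (1), and the larger the value, the more strongly x and y are associated. When PMI is positive: $P(x, y) > P(x)P(y)$ ⇒ $PMI(x, y) > 0$, so x and y tend to appear together; they co-occur more readily than they would if independent.
teddy-g 2020/05/28
PMI is computed from the co-occurrence probabilities of words. It is also one of the ways to calculate Coherence, which measures the quality of an LDA model.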
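As a small worked example of equation (1), the sketch below estimates the probabilities as document frequencies, which is one common choice and an assumption here rather than something the article prescribes. The toy corpus and the name `pmi_table` are made up for illustration. Pairs that never co-occur are simply left out of the table, since their PMI would be negative infinity.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(documents):
    """PMI(x, y) = log2(P(x, y) / (P(x) P(y))), with P estimated as document frequency."""
    n_docs = len(documents)
    word_df = Counter()
    pair_df = Counter()
    for doc in documents:
        vocab = sorted(set(doc))  # count each word once per document
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))  # keys are alphabetically ordered pairs
    return {
        (x, y): math.log2((count / n_docs) / ((word_df[x] / n_docs) * (word_df[y] / n_docs)))
        for (x, y), count in pair_df.items()
    }


# Toy corpus of pre-tokenized documents.
docs = [
    ["new", "york", "city"],
    ["new", "york", "times"],
    ["big", "city"],
    ["big", "times"],
]
scores = pmi_table(docs)
print(scores[("new", "york")], scores[("city", "new")])
```

"new" and "york" each appear in half of the documents and always together, so P(x, y) = 0.5 and P(x)P(y) = 0.25, giving PMI = log2(2) = 1. "city" and "new" co-occur exactly as often as independence predicts, so their PMI is 0.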

datascience

machinelearning

NaturalLanguage

NLP
Tutorial: Quickstart — TextBlob 0.18.0.post0 documentation
teddy-g 2020/03/20
The TextBlob documentation.

NLP

python

TextBlob
TextBlob and Sentiment Analysis — Python
teddy-g 2020/03/20
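TextBlob lets you run sentiment analysis on text! The output comes on two axes, polarity (like/dislike) and subjectivity! I thought it could be used for things like categorizing search results, but that turned out to be fairly difficult.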

NLP

python

TextBlob
A Complete Exploratory Data Analysis and Visualization for Text Data
Visually representing the content of a text document is one of the most important tasks in the field of text mining. As a data scientist or NLP specialist, not only we explore the content of documents from different aspects and at different levels of details, but also we summarize a single document, show the words and topics, detect events, and create storylines. However, there are some gaps betwe
teddy-g 2020/03/01
Analyzes English text with TextBlob to detect positive/negative sentiment. Fairly interesting.

NLP

python

python3

sentimentanalysis

TextBlob
Knowledge Graph: Data Science Technique to Mine Information from Text (with Python code)
Introduction Examine doable tactics for reducing tension, increasing self-assurance, and cultivating wholesome relationships. Discover how to employ continuous learning, mindfulness, goal-setting, and knowledge graph python to help you reach your objectives. Whether your objective is greater purpose, job succe
teddy-g 2020/03/01
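Uses spaCy, a multilingual morphological analysis library, to extract the subject (S), object (O), and predicate (R) of sentences and draw them as a graph. Quite interesting, but the results often turn out pretty baffling.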

NLP

python

python3

spaCy

graph