[B! Python][python][NLP] xef's bookmarks

GitHub - digital-go-jp/kanjikana-model: Model for matching the kanji and kana forms of personal names

xef 2025/10/26

Python Natural Language Processing Techniques Collection [Basics]

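This is a collection of templates I often use for Japanese natural language processing. I put it together mainly for my own copy-and-paste use, but I hope it is useful to others too. The environment is Python 3, verified on Google Colaboratory (Ubuntu). It is limited to Python's standard features and libraries that can be easily installed with pip. No machine learning or deep learning appears here! The focus is on preprocessing text data. Preprocessing. Upper and lower case: English sometimes shows up in Japanese text as well. s = "Youmou" print(s.upper()) # YOUMOU print(s.lower()) # youmou Full-width and half-width: for Japanese, this matters more. There are several libraries for full-width/half-width conversion, but I prefer jaconv. It is available under the MIT License. import jaco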
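
The case and width normalization described above can be sketched roughly as follows. This is only an illustration under stated assumptions: it does not use jaconv, but swaps in `unicodedata.normalize` with the NFKC form from the Python standard library, and the helper name `normalize_text` is hypothetical.

```python
import unicodedata


def normalize_text(text: str) -> str:
    """Fold width variants with NFKC, then lowercase the result.

    Assumption: NFKC stands in here for a dedicated library such as jaconv.
    """
    # NFKC maps full-width alphanumerics to ASCII and
    # half-width katakana to full-width katakana.
    folded = unicodedata.normalize("NFKC", text)
    return folded.lower()


print(normalize_text("Ｙｏｕｍｏｕ"))  # youmou
print(normalize_text("ﾊﾟｲｿﾝ"))        # パイソン
```

Note that NFKC also rewrites other compatibility characters (for example, some symbols and ligatures), so a dedicated converter may be preferable when only width should change.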

xef 2021/03/30

Python
NLP

Introduction to Text Mining Annual Securities Reports - Hoxo-m Inc. Blog

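Introduction. Hello, this is KAZY, a Hoxo-m supporter. I recently made my cat café debut and found out that I am allergic to cats 🐈. Next I am thinking of trying an owl café 🦉. By the way, does everyone read annual securities reports? I don't. I can't. Looking at them makes me sleepy 💤. They make me sleepy, but annual securities reports are actually well suited to text mining. They describe each company's business and financial information in detail. They are structured in XBRL format. Text for several thousand companies is available. On top of that, it is free. How about it? Getting interested? This article introduces how to text-mine annual securities reports with Python. It starts from downloading the reports, so rest assured. Who might find this useful: people who want to do company analysis through programming but are not sure what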

xef 2020/10/10

Python
NLP

cutlet: a Japanese to Romaji Converter in Python

A few months ago I released cutlet, a Python library and application for converting arbitrary Japanese text to romaji. Katsu curry illustrated by Irasutoya. Update: Check out the online demo for cutlet! You can check the results in your browser. Compared to other libraries cutlet has several advantages: it uses fugashi, so you can re-use your existing dictionary; words of foreign origin optionally u

xef 2020/07/12

NLP
Python

[2003.07082] Stanza: A Python Natural Language Processing Toolkit for Many Human Languages

We introduce Stanza, an open-source Python natural language processing toolkit supporting 66 human languages. Compared to existing widely used toolkits, Stanza features a language-agnostic fully neural pipeline for text analysis, including tokenization, multi-word token expansion, lemmatization, part-of-speech and morphological feature tagging, dependency parsing, and named entity recognition. We

xef 2020/03/28

NLP
Python

I made a tool for looking up the source references behind Hasumi Nagatomi's lines - いはらいふ

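Introduction. "Oh my, the mischievous wind... hehe♪ Could it be jealous of the two of us?" SS Rare Hasumi Nagatomi has arrived! https://t.co/mIoEjCBQs4 #デレステ pic.twitter.com/loELdD6e6w (Starlight Stage (@imascg_stage), March 19, 2020) Happy birthday, Hasumi, and congratulations on the SSR. Hasumi is well known for lines based on the lyrics of Showa-era idol songs and the like, but unfortunately, having been born in the Heisei era, I lack knowledge of Showa idols and often cannot tell what the source is. "I have updated 'Hasumi Nagatomi lines + source references summary'! (1) Line updates: Good Luck Marines, 2019 Anniversary, 2019 Christmas, 2020 New Year's shrine visit, Derepo (10/01~01/14) (2) Tally updates: tallies, interaction summary, song data https://t.co/2jg5g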

xef 2020/03/24

Python
NLP

Keep rule-based descriptions simple with spaCy! - Qiita

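This article is day 12 of the Natural Language Processing Advent Calendar 2019. Deep learning-based methods, starting with BERT, have recently been drawing attention in the NLP community. However, these models are heavily constrained in terms of computational resources and inference speed, and there are many points to keep in mind when running them in production. (I was very surprised when I saw the news that Google had introduced BERT into search.) This article therefore considers simple, easy-to-operate ways of implementing NLP tasks. The implementation uses Python and two libraries explained below, spaCy and GiNZA. Environment: ubuntu18.04, python 3.6.8. Libraries are installed with pip. Tasks covered: the following two tasks, which seem to be in high demand in practical work. Named entity recognition. Phrase extraction. What is named entity recognition? Named entity recognition (NER), from Wikipedia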

xef 2019/12/16

Running the general-purpose language representation model BERT on Japanese (PyTorch) - Qiita

A model called BERT is currently a hot topic in the DL for NLP world. Since a PyTorch implementation had been published, I applied it to a Japanese Wikipedia corpus. The code is published here. 2018/11/27: Using the BERT model I built, I observed its internal behavior and examined the results. It showed impressive results in how it acquires latent representations of words. Take a look if you are interested ↓ https://qiita.com/Kosuke-Szk/items/d49e2127bf95a1a8e19f This article explains the key points of BERT and, for each point, the

xef 2018/11/08

I made "negima", a Python package that extracts phrases matching any part-of-speech combination you like - ぴよぴよ.py

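When doing work that involves Japanese natural language processing, there are often situations where, after morphological analysis, you want to join morphemes together and extract phrases. For example: you want to extract only nouns; you want to extract only nouns, but with prefixes such as "未" and "非" attached; you want to extract adjectives, but with the negating "ない" attached. I made "negima", a Python package that extracts such phrases by defining rules for specific part-of-speech combinations. Overview: for example, to extract compound nouns, you define a rule like this. id min max pos0 pos1 pos2 pos3 pos4 pos5 nouns 0 2 接頭詞 1 4 名詞一般|サ変接続|数 0 2 名詞接尾 Assuming this rule is defined in a file named noun.csv, it starts with 0 to 2 prefixes (接頭詞), followed by 1 to 4 nouns (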
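
The rule described above (0 to 2 prefixes, then 1 to 4 nouns, then 0 to 2 noun suffixes) can be sketched as a small matcher over tokens that are already tagged. This is a minimal sketch and not negima's actual API or implementation: the `(surface, pos)` token format, the flattened part-of-speech labels, and the names `RULE`, `match_at`, and `extract_phrases` are all assumptions made for illustration.

```python
from typing import List, Optional, Set, Tuple

Token = Tuple[str, str]            # (surface form, part-of-speech label)
Slot = Tuple[Set[str], int, int]   # (allowed labels, min count, max count)

# Hypothetical encoding of the compound-noun rule; the label strings are
# assumptions, not the exact format negima reads from its CSV file.
RULE: List[Slot] = [
    ({"接頭詞"}, 0, 2),
    ({"名詞一般", "名詞サ変接続", "名詞数"}, 1, 4),
    ({"名詞接尾"}, 0, 2),
]


def match_at(tokens: List[Token], start: int, rule: List[Slot]) -> Optional[int]:
    """Return the end index of a match starting at `start`, or None."""
    pos = start
    for allowed, lo, hi in rule:
        count = 0
        # Greedily consume up to `hi` tokens whose label is allowed.
        while pos < len(tokens) and count < hi and tokens[pos][1] in allowed:
            pos += 1
            count += 1
        if count < lo:
            return None
    return pos


def extract_phrases(tokens: List[Token], rule: List[Slot] = RULE) -> List[str]:
    """Scan left to right and join the surfaces of each matched span."""
    phrases = []
    i = 0
    while i < len(tokens):
        end = match_at(tokens, i, rule)
        if end is not None and end > i:
            phrases.append("".join(surface for surface, _ in tokens[i:end]))
            i = end
        else:
            i += 1
    return phrases


tokens = [
    ("未", "接頭詞"), ("確認", "名詞サ変接続"), ("飛行", "名詞サ変接続"),
    ("物体", "名詞一般"), ("を", "助詞"), ("見", "動詞"), ("た", "助動詞"),
]
print(extract_phrases(tokens))  # ['未確認飛行物体']
```

The greedy scan needs no backtracking here only because the three slots accept disjoint label sets; rules with overlapping slots would need a more careful matcher.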

xef 2018/08/20

Python
NLP

Understanding word vectors: A tutorial for "Reading and Writing Electronic Text," a class I teach at ITP. (Python 2.7) Code examples released under CC0 https://creativecommons.org/choose/zero/, other text released under CC BY 4.0 https://creativecommons.o

xef 2018/03/03

Python
NLP

GitHub - Kyubyong/neural_japanese_transliterator: Can neural networks transliterate Romaji into Japanese correctly?

xef 2017/06/12

GitHub - miso-belica/sumy: Module for automatic summarization of text documents and HTML pages.

xef 2016/02/27

NLP
Python

GitHub - ryankiros/skip-thoughts: Sent2Vec encoder and training code from the paper "Skip-Thought Vectors"

xef 2015/07/08

Monthly Challenge: Natural Language Processing - The Engine Room - TrackMaven

Our topic for this month's Monthly Challenge meetup is NLP! In this post, we'll get you started with one possibility: using pandas and Python's Natural Language Toolkit to analyze the contents of your own Gmail inbox. For those of you who are continuing projects from our last monthly challenge on Elasticsearch, we'll also include some code that makes use of Elasticsearch at the end of the post.

xef 2014/12/05

Python
NLP

NLUlite — A natural language database

NLUlite is a database that reads English texts and answers questions about the texts. A public alpha for developers: this framework is released as a public alpha for developers, useful for simple English texts or feeds. Licensing: NLUlite is a client/server application. The client is open source (BSD license), while the server is only available as a closed source option.

xef 2014/09/06

NLP
Python

Parsing English with 500 lines of Python

A syntactic parser describes a sentence’s grammatical structure, to help another application reason about it. Natural languages introduce many unexpected ambiguities, which our world-knowledge immediately filters out. A favourite example: “They ate the pizza with anchovies”. A correct parse links “with” to “pizza”, while an incorrect parse links “with” to “eat”. The Natural Language Processing (NLP)

xef 2014/04/28

Python
NLP

Jupyter Notebook Viewer

xef 2014/03/30

How to Write a Language Detector in 50 Lines of Python | Ebook Glue Blog

Ever wonder how Google Chrome knows the language of a web page and offers to translate it when the page is written in a foreign language? Or how Facebook offers to translate your friends’ posts in a foreign language? Detecting languages is surprisingly easy, and it can be used to improve user interfaces without having the user do any work. I stumbled across this ActiveState recipe for a language d

xef 2014/01/02

Python
NLP

word2vec in yhat: Word vector similarity | Daniel Rodriguez

A few weeks ago Google released some code to convert words to vectors called word2vec. The company I currently work at does something similar, and I was quite amazed by the performance and accuracy of Google's algorithm, so I created a simple Python wrapper to call the C code for training and read the training vectors into NumPy arrays; you can check it out on PyPI (word2vec). At the same time
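
To make "word vector similarity" concrete, here is a minimal sketch that ranks words by cosine similarity, a common measure for comparing word vectors. It is not the word2vec wrapper or the yhat code from the post: the excerpt does not say which measure is used, so cosine similarity is an assumption, and the toy vectors and the names `cosine_similarity`, `most_similar`, and `TOY_VECTORS` are made up for illustration.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # Define similarity with a zero vector as 0.
    return dot / (norm_u * norm_v)


def most_similar(
    word: str, vectors: Dict[str, List[float]], topn: int = 2
) -> List[Tuple[str, float]]:
    """Rank every other word by cosine similarity to `word`."""
    target = vectors[word]
    scored = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Toy 3-dimensional vectors, invented for this example only.
TOY_VECTORS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.9, 0.15],
    "apple": [0.1, 0.2, 0.95],
}

for other, score in most_similar("king", TOY_VECTORS):
    print(other, round(score, 3))
```

Real word2vec vectors typically have hundreds of dimensions and would be held in NumPy arrays, where the same computation is a normalized dot product.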

xef 2013/10/01

Python
NLP

Tutorial: What is WordNet? A Conceptual Introduction Using Python | stevenloria.com

In short, WordNet is a database of English words that are linked together by their semantic relationships. It is like a supercharged dictionary/thesaurus with a graph structure. TextBlob 0.7 (changelog) now integrates NLTK's WordNet interface, making it very simple to interact with WordNet. This tutorial is a gentle introduction to

xef 2013/10/01

Python
NLP

xef's bookmarks on Python, python, and NLP (28)
