somemo's bookmarks / November 2, 2020

Google Colab

somemo 2020/11/02

Common Crawl - Open Repository of Web Crawl Data

Common Crawl maintains a free, open repository of web crawl data that can be used by anyone. Common Crawl is a 501(c)(3) non-profit founded in 2007. We make wholesale extraction, transformation and analysis of open web data accessible to researchers. Overview: Over 250 billion pages spanning 15 years. Free and open corpus since 2007. Cited in over 10,000 research papers. 3–5 billion new pages added ea

somemo 2020/11/02

https://commoncrawl.org/terms-of-use/

Reading Through the Draft on the Latest JWT Best Practices

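The IETF OAuth Working Group is actively working on creating and refining standards in the identity space. This article covers the most recent draft describing current best practices for JSON Web Tokens (JWT). The draft presents common pitfalls in using JWTs and frequently seen attacks, along with how to implement mitigations for those problems, so please give it a read. "It covers particularly common attacks targeting JWTs and specific protections against them." Introduction: The JSON Web Token (JWT) specification is an open standard (RFC 7519) that defines a JSON-based format for transmitting claims (attribute information) between two parties. As standards that complement JWT, JSON Web Key (RFC 7517), JSON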

somemo 2020/11/02
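As a rough illustration of the "JSON-based format for transmitting claims" described above, the following is a minimal, stdlib-only sketch (not a production JWT library) of a signed JWT's compact form: three base64url segments, header.payload.signature, where the signature here is assumed to be HMAC-SHA256 ("HS256") over the first two segments. The secret and claim values are made-up examples.

```python
import base64
import hashlib
import hmac
import json
from typing import Dict, Tuple


def b64url_encode(data: bytes) -> str:
    """Base64url-encode and strip '=' padding, as JWS compact serialization does."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    """Decode base64url after restoring the stripped '=' padding."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_hs256(claims: Dict, secret: bytes) -> str:
    """Build header.payload.signature, signed with HMAC-SHA256 ("HS256")."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url_encode(signature)


def decode_unverified(token: str) -> Tuple[Dict, Dict]:
    """Decode header and claims WITHOUT checking the signature (inspection only)."""
    header_b64, payload_b64, _signature_b64 = token.split(".")
    return json.loads(b64url_decode(header_b64)), json.loads(b64url_decode(payload_b64))


if __name__ == "__main__":
    token = sign_hs256({"sub": "user-123", "name": "Alice"}, b"example-secret")
    print(token.count("."))  # → 2, i.e. three dot-separated segments
    print(decode_unverified(token))
    # → ({'alg': 'HS256', 'typ': 'JWT'}, {'sub': 'user-123', 'name': 'Alice'})
```

`decode_unverified` is only for inspecting a token; a recipient should verify the signature before trusting any claim in it.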

RFC 8725: JSON Web Token Best Current Practices

RFC 8725 JSON Web Token Best Current Practices Abstract JSON Web Tokens, also known as JWTs, are URL-safe JSON-based security tokens that contain a set of claims that can be signed and/or encrypted. JWTs are being widely used and deployed as a simple security token format in numerous protocols and applications, both in the area of digital identity and in other application areas. This Best Current

somemo 2020/11/02
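One of RFC 8725's recommendations is to perform algorithm verification: the recipient should accept only algorithms it has explicitly chosen, rather than letting the token's own "alg" header decide (which is how "alg": "none" tokens slip through). A hedged sketch of that check for HMAC-SHA256 tokens, using only the Python standard library, could look like this; the allowlist contents and the helper names are assumptions for illustration.

```python
import base64
import hashlib
import hmac
import json
from typing import Dict

# Algorithms this verifier accepts. The token's own "alg" header is checked
# against this allowlist instead of being trusted to pick the algorithm.
ALLOWED_ALGS = {"HS256"}


class InvalidToken(Exception):
    """Raised when a token fails any verification step."""


def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def make_token(header: Dict, claims: Dict, secret: bytes) -> str:
    """Test helper: HMAC-SHA256-sign whatever header and claims it is given."""
    signing_input = ".".join(
        b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    sig = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url_encode(sig)


def verify(token: str, secret: bytes) -> Dict:
    """Return the claims only if alg is allowlisted and the HS256 signature matches."""
    parts = token.split(".")
    if len(parts) != 3:
        raise InvalidToken("expected three dot-separated segments")
    header_b64, payload_b64, sig_b64 = parts
    header = json.loads(b64url_decode(header_b64))
    if not isinstance(header, dict):
        raise InvalidToken("header is not a JSON object")
    alg = header.get("alg")
    if not isinstance(alg, str) or alg not in ALLOWED_ALGS:
        # Rejects "none", unexpected algorithms, and a missing "alg".
        raise InvalidToken(f"algorithm {alg!r} is not allowed")
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):  # constant-time compare
        raise InvalidToken("signature mismatch")
    return json.loads(b64url_decode(payload_b64))
```

With `secret = b"example-secret"`, a token from `make_token({"alg": "HS256"}, {"sub": "user-123"}, secret)` verifies, while an unsigned token whose header is `{"alg":"none"}` raises `InvalidToken` before any signature is compared.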

A Rough Explanation of RFC 8725 JSON Web Token Best Current Practices - Qiita

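ritou here. This time I'm introducing RFC 8725 JSON Web Token Best Current Practices. When it's a BCP for everyone's favorite JWT (JSON Web Token), you can't help but check it out. Overview: JWTs are URL-safe, JSON-based security tokens that contain a set of claims that can be signed and/or encrypted. JWTs are widely used and deployed as a simple security token format in numerous protocols and applications, both in the area of digital identity and in other application areas. The purpose of this BCP is to provide actionable guidance leading to secure implementation and deployment of JWTs. In other words, it is not a framework or a protocol, nor a write-up of some use case someone came up with for JWTs; rather, when it comes to adopting JWTs, the basic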

somemo 2020/11/02
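Beyond the signature, RFC 8725 recommends that recipients validate the issuer and the audience of a token (its "Validate Issuer and Subject" and "Use and Validate Audience" recommendations), and RFC 7519 defines "exp" as the expiration time. A minimal sketch of that claim check, assuming the claims have already passed signature verification, might look like this; the 30-second clock-skew leeway and the function name are illustrative assumptions.

```python
import time
from typing import Dict, Optional


class InvalidClaims(Exception):
    """Raised when verified claims do not match what this recipient expects."""


def validate_claims(
    claims: Dict,
    expected_issuer: str,
    expected_audience: str,
    now: Optional[float] = None,
    leeway_seconds: int = 30,
) -> Dict:
    """Check iss, aud and exp on claims whose signature was ALREADY verified."""
    now = time.time() if now is None else now

    if claims.get("iss") != expected_issuer:
        raise InvalidClaims(f"unexpected issuer {claims.get('iss')!r}")

    # RFC 7519 allows "aud" to be a single string or an array of strings.
    aud = claims.get("aud")
    audiences = [aud] if isinstance(aud, str) else aud if isinstance(aud, list) else []
    if expected_audience not in audiences:
        raise InvalidClaims("token was not issued for this audience")

    # "exp" is a NumericDate (seconds since the epoch); the leeway absorbs clock skew.
    exp = claims.get("exp")
    if isinstance(exp, bool) or not isinstance(exp, (int, float)):
        raise InvalidClaims("missing or malformed exp")
    if now > exp + leeway_seconds:
        raise InvalidClaims("token has expired")

    return claims
```

Passing `now` explicitly keeps the check deterministic in tests; in normal use it defaults to the current time.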

Summary of the tokenizers

On this page, we will have a closer look at tokenization. As we saw in the preprocessing tutorial, tokenizing a text is splitting it into words or subwords, which then are converted to ids through a look-up table. Converting words or subwords to ids is straightforward, so in this summary, we will focus on splitting a text into words or subwords (i.e. tokenizing a text). More specifically, we will

somemo 2020/11/02
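The two steps the excerpt names, splitting text into tokens and converting tokens to ids through a look-up table, can be sketched with a toy word-level tokenizer. The lowercase-and-whitespace rule, the "[UNK]" symbol reserved as id 0, and the tiny corpus are assumptions for illustration; real tokenizers in the Transformers library use trained subword vocabularies.

```python
from typing import Dict, List

UNK = "[UNK]"  # hypothetical unknown-token symbol, reserved as id 0


def tokenize(text: str) -> List[str]:
    """Word-level tokenization: lowercase and split on whitespace."""
    return text.lower().split()


def build_vocab(corpus: List[str]) -> Dict[str, int]:
    """Build the look-up table, assigning ids in order of first appearance."""
    vocab = {UNK: 0}
    for sentence in corpus:
        for token in tokenize(sentence):
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    """Tokenize, then map each token to its id (unknown tokens map to id 0)."""
    return [vocab.get(token, vocab[UNK]) for token in tokenize(text)]


if __name__ == "__main__":
    vocab = build_vocab(["I love NLP", "We love tokenizers"])
    print(vocab)
    # → {'[UNK]': 0, 'i': 1, 'love': 2, 'nlp': 3, 'we': 4, 'tokenizers': 5}
    print(encode("We love NLP", vocab))    # → [4, 2, 3]
    print(encode("I love pandas", vocab))  # → [1, 2, 0]
```

As the excerpt says, the id look-up is the easy part; every word outside the table collapses to the same unknown id, which is one motivation for the subword methods covered in the entries that follow.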

Introduction to Huggingface Transformers (8) - Tokenizers | npaka

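1. Tokenizers: A tokenizer splits text into tokens and converts those tokens into IDs. Text cannot be processed by a neural network as-is, so it has to be converted into IDs. 2. Tokenization methods: Tokenizing text is harder than it looks, and there are several ways to do it: by words, by characters, or by subwords. 2-1. Word-based tokenization. Tokenizing on spaces: The simplest tokenization method is splitting on spaces. For example, "Don't you love 🤗 Transformers? We sure do." becomes ["Don't", "you", "love", "🤗", "Transformers?", "We", "sure", "do."]. This is a good first step, but tokens such as "Transformers?" and "do."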

somemo 2020/11/02
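The space-based example above can be reproduced directly, and splitting punctuation off as separate tokens shows why "Transformers?" and "do." are a problem for plain space splitting. The regex rule (a run of word characters, or any single non-space symbol) is one simple choice assumed for illustration, not the rule of any particular library.

```python
import re
from typing import List


def space_tokenize(text: str) -> List[str]:
    """Split on whitespace only, as in the example above."""
    return text.split()


def punct_tokenize(text: str) -> List[str]:
    """Split on whitespace AND emit each punctuation mark or symbol as its own token."""
    return re.findall(r"\w+|[^\w\s]", text)


if __name__ == "__main__":
    text = "Don't you love 🤗 Transformers? We sure do."
    print(space_tokenize(text))
    # → ["Don't", 'you', 'love', '🤗', 'Transformers?', 'We', 'sure', 'do.']
    print(punct_tokenize(text))
    # → ['Don', "'", 't', 'you', 'love', '🤗', 'Transformers', '?', 'We', 'sure', 'do', '.']
```

The second rule fixes "Transformers?" but splits "Don't" into three pieces, which illustrates why tokenization is harder than it looks.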

SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing

AbstractThis paper describes SentencePiece, a language-independent subword tokenizer and detokenizer designed for Neural-based text processing, including Neural Machine Translation. It provides open-source C++ and Python implementations for subword units. While existing subword segmentation tools assume that the input is pre-tokenized into word sequences, SentencePiece can train subword models dir

somemo 2020/11/02
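The abstract notes that SentencePiece trains on raw sentences instead of pre-tokenized words. The paper achieves this by escaping whitespace as the meta symbol "▁" (U+2581), so detokenization becomes simple concatenation and is lossless. The sketch below illustrates only that escaping idea; the greedy longest-match segmentation and the four-piece vocabulary are stand-ins, not SentencePiece's trained unigram or BPE model.

```python
from typing import List, Set

META = "\u2581"  # "▁", the symbol SentencePiece uses to represent whitespace


def encode(text: str, vocab: Set[str]) -> List[str]:
    """Toy segmentation: escape spaces as ▁, then greedy longest match over vocab.

    NOT SentencePiece's real algorithm; it only shows whitespace being treated
    as an ordinary symbol. Characters not covered by vocab become single pieces.
    """
    s = text.replace(" ", META)
    max_len = max(map(len, vocab), default=1)
    pieces, i = [], 0
    while i < len(s):
        for j in range(min(len(s), i + max_len), i, -1):
            if s[i:j] in vocab:
                pieces.append(s[i:j])
                i = j
                break
        else:
            pieces.append(s[i])  # no vocabulary match: fall back to one character
            i += 1
    return pieces


def decode(pieces: List[str]) -> str:
    """Detokenization is concatenation plus turning ▁ back into spaces."""
    return "".join(pieces).replace(META, " ")


if __name__ == "__main__":
    vocab = {"Hello", "\u2581Wor", "ld", "."}
    pieces = encode("Hello World.", vocab)
    print(pieces)          # → ['Hello', '▁Wor', 'ld', '.']
    print(decode(pieces))  # → Hello World.
```

Because the space survives inside "▁Wor", no language-specific rule is needed to put it back, which is what makes the approach language independent.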

What's the difference between wordpiece and sentencepiece? · Issue #339 · google/sentencepiece

somemo 2020/11/02

“WordPiece is the closed source version (Google internal) used for training BERT. You can find the exact comparison between SentencePiece, WordPiece, and subword-nmt in the Comparisons with other implementations ”
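For contrast with SentencePiece, the inference side of WordPiece as released in BERT's open-source tokenizer is a greedy longest-match-first search inside each already-split word, with word-internal pieces marked by a "##" prefix. A minimal sketch of that search, using a toy vocabulary assumed for illustration, might look like this; it does not cover how the vocabulary itself is trained, which is the Google-internal part the quoted comment refers to.

```python
from typing import List, Set


def wordpiece_tokenize(word: str, vocab: Set[str], unk: str = "[UNK]") -> List[str]:
    """Greedy longest-match-first segmentation of one word, BERT-style.

    Non-initial pieces carry the "##" continuation prefix. If any position
    cannot be matched, the whole word becomes the unknown token.
    """
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        while start < end:  # try the longest remaining substring first
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]
        pieces.append(match)
        start = end
    return pieces


if __name__ == "__main__":
    vocab = {"un", "##aff", "##able"}
    print(wordpiece_tokenize("unaffable", vocab))  # → ['un', '##aff', '##able']
    print(wordpiece_tokenize("xyz", vocab))        # → ['[UNK]']
```

Unlike the SentencePiece approach, this search assumes the input was already split into words on whitespace, so spaces are not part of any piece.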

Google Colab

somemo 2020/11/02

GitHub - arXivTimes/arXivTimes: repository to research & share the machine learning articles

somemo 2020/11/02

Leading NLP Ninja • A podcast on Spotify for Podcasters

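In Leading NLP Ninja, jojonki briefly introduces recent papers related to NLP (Natural Language Processing). Feedback on anything you noticed, questions, or mistakes is very welcome. The papers introduced will generally be drawn from the paper summaries below: github.com/jojonki/arXivNotes/issues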

somemo 2020/11/02

Natural Language Processing for Beginners: Testing Text Generation with T5 | オブジェクトの広場

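In the previous installment, we covered text mining techniques and hands-on practice with OSS. This time, we introduce text generation with Google's T5 (Text-to-Text Transfer Transformer), with code examples for training and inference along with experimental results. 1. Introduction: This article introduces text generation with Google's T5 (Text-to-Text Transfer Transformer)[1], with code examples for training and inference along with experimental results. For the experiments, we performed document classification on the livedoor News Corpus[2], and conversion into Easy Japanese (やさしい日本語) using the Easy Japanese Corpus[3] and the Easy Japanese Extended Corpus[4]. As before, the code snippets are written to run on Google Colaboratory, so we encourage you to actually run them and try changing the targets.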

somemo 2020/11/02
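The article frames document classification on the livedoor News Corpus as text generation, which is T5's central idea: every task is cast as input text to output text. A library-free sketch of the data-side recasting might look as follows; the task prefix, label names, and sentences are hypothetical, and the actual model training and inference that the article runs on Google Colaboratory are not shown.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical task prefix; the article's actual prefix is not shown in this excerpt.
TASK_PREFIX = "classify news: "


def to_text_to_text(examples: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Recast (article text, label) pairs as T5-style input/target strings."""
    return [{"input": TASK_PREFIX + text, "target": label} for text, label in examples]


def parse_prediction(generated: str, labels: List[str]) -> Optional[str]:
    """Map the model's generated text back to a known label, or None if it is not one."""
    candidate = generated.strip()
    return candidate if candidate in labels else None


if __name__ == "__main__":
    pairs = [("Team wins the final match", "sports"), ("New smartphone released", "tech")]
    print(to_text_to_text(pairs)[0])
    # → {'input': 'classify news: Team wins the final match', 'target': 'sports'}
    print(parse_prediction(" tech ", ["sports", "tech"]))   # → tech
    print(parse_prediction("weather", ["sports", "tech"]))  # → None
```

Because the model generates free text, the output has to be mapped back onto the label set; returning `None` for anything outside it makes such generations easy to count as errors.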

What’s new in h5py 3.0 — h5py 3.11.0 documentation

somemo 2020/11/02

What’s new in 1.1.4 (October 30, 2020) — pandas 2.2.2 documentation

somemo 2020/11/02

