[B! unicode] milk1000ccのブックマーク

Regexp

You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session. You switched accounts on another tab or window. Reload to refresh your session. Dismiss alert

milk1000cc 2017/10/19

リンク

Unicode Normalization in Ruby

If you want Ruby's string methods to play nicely with Unicode, it's a good idea to normalize them. This article is a brief introduction to Unicode normalization for Rubyists. I recently published an article in which I tested most of Ruby's string methods with certain Unicode characters to see if they would behave unexpectedly. Many of them did. One criticism that a few people had of the article wa

milk1000cc 2017/04/17

リンク

「ユニコード」で予期せぬ目に遭った話 - moriyoshiの日記

自分の知らないCJK Ideographのバリエーションがまだあったことに戦慄している pic.twitter.com/kUlyRLDDTM— moriyoshit (@moriyoshit) March 9, 2017 などというツイートをしたところ、思ったより反響があったのでまとめておく。上記ではあいまいに「バリエーション」などと書いたが、Unicodeとそれを扱う環境においては、バリエーションと一口に言っても次のような状況がある。意味論的に等価な異なる字形の集合同じ字形で異なるコードポイントの集合 aは結構なじみ深いと思う。 a-1. 異なるコードポイントにそれぞれ異なる字形が割り当てられているもの例: 「東」(U+6771) ⇔「东」(U+4E1C) 「斉」(U+6589) ⇔「齊」(U+9F4A) 「高」(U+9AD8) ⇔「髙」(U+9AD9) a-2. 同じコードポイ

milk1000cc 2017/03/13

unicode

リンク

Regex Tutorial - Unicode Characters and Properties

Unicode is a character set that aims to define all characters and glyphs from all human languages, living and dead. With more and more software being required to support multiple languages, or even just any language, Unicode has been strongly gaining popularity in recent years. Using different character sets for different languages is simply too cumbersome for programmers and users. Unfortunately,

milk1000cc 2015/03/16

unicode

リンク

東アジアの文字幅 (East Asian Width) の判定 - 中途

Unicodeの文字が全角で表示されるか半角で表示されるかは東アジアの文字幅特性値がヒントを与えてくれるそうです。（日本語の場合は）この値がNa（狭）、N（中立）、H（半角）だと半角、W（広）、F（全角）、A（曖昧）だと全角として扱うことが推奨されているようです。 Pythonではunicodedataモジュールを使うとこの特性値を取得できますが、JavaScriptにはそのような関数は見当たりません。ですが、Unicode Consortiumが、どの文字がどの東アジアの文字幅を持つかのデータファイルを公開しているので、そこから判定用のコードを機械的に生成できるはずです。で、以下が実際に生成したコードです。データファイルに、データファイルに出現しない文字はNとなるとあるので、以下ではN以外（F、H、W、Na、A）についてのみチェックを行い、それ以外をNと判定するようにしています。コメ

milk1000cc 2011/11/28

リンク

はてなブックマーク

タグ

関連タグで絞り込む (6)

unicodeに関するmilk1000ccのブックマーク (5)

お知らせ

今週のはてなブックマーク数ランキング（2024年7月第2週）

はてなブックマーク透明性レポート（2024年 2月-2024年4月）

今週のはてなブックマーク数ランキング（2024年7月第1週）

公式Twitter

キーボードショートカット一覧

はてなブックマーク

公式Twitter

はてなのサービス