[B! OCR] kzfm's bookmarks

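MOONGIFT: » The long-awaited open-source Japanese OCR "NHocr": Introducing open source every day

OCR is an indispensable technology for digitizing analog data. However, because various patents are involved, it is also a field in which open-source software and freeware have had difficulty developing. A technology that may break through that barrier now appears to be emerging. You can try it with the demo service. The open-source software introduced this time is NHocr, a Japanese OCR system. It is hosted on Google Code, and although only part of the source code has been released so far, a demo service is available. In the demo service, when you upload a BMP/JPEG/PBM/PGM/PPM file (each of which may also be GZip-compressed), the result of analyzing it is displayed in Japanese. Being a Japanese OCR, it can distinguish kanji, hiragana, katakana, English, and so on. The scanned image. Even for handwritten characters, the recognition rate is reasonably high. It is still at a stage before an official release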

kzfm 2008/09/12

OCR
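The input formats listed in the excerpt above (BMP/JPEG/PBM/PGM/PPM, optionally GZip-compressed) can be told apart by their leading magic bytes. The following is a minimal sketch of that idea, not NHocr's actual code: the function name `sniff_image_format` is hypothetical, and it assumes the whole upload fits in memory.

```python
import gzip

# Magic bytes that open every gzip stream (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"

# Netpbm magic numbers: P1/P4 = PBM, P2/P5 = PGM, P3/P6 = PPM
# (the lower number of each pair is the ASCII variant, the higher is binary).
PNM_MAGIC = {
    b"P1": "PBM", b"P4": "PBM",
    b"P2": "PGM", b"P5": "PGM",
    b"P3": "PPM", b"P6": "PPM",
}


def sniff_image_format(data: bytes) -> str:
    """Guess the image format from leading bytes (hypothetical helper).

    Gzip-compressed input is decompressed first, then inspected.
    Returns "BMP", "JPEG", "PBM", "PGM", "PPM", or "unknown".
    """
    if data[:2] == GZIP_MAGIC:
        data = gzip.decompress(data)
    if data[:2] == b"BM":             # BMP file header signature
        return "BMP"
    if data[:3] == b"\xff\xd8\xff":   # JPEG start-of-image plus next marker
        return "JPEG"
    return PNM_MAGIC.get(data[:2], "unknown")
```

For example, `sniff_image_format(gzip.compress(b"P1\n2 2\n0 1\n1 0\n"))` reports a PBM image even though the payload arrived compressed.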


GitHub - ocropus-archive/DUP-ocropy: Python-based tools for document analysis and OCR

OCRopus is a collection of document analysis programs, not a turn-key OCR system. In order to apply it to your documents, you may need to do some image preprocessing, and possibly also train new models. In addition to the recognition scripts themselves, there are a number of scripts for ground truth editing and correction, measuring error rates, determining confusion matrices, etc. OCRopus command

kzfm 2008/08/20

OCR
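The excerpt above mentions scripts for measuring error rates. As a rough illustration of what such a measurement involves, here is a minimal sketch of a character error rate computed from Levenshtein edit distance. It is not OCRopus's own script; the function names are hypothetical, and normalizing by the length of the ground truth is one common convention, assumed here.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between ground truth `ref` and OCR output `hyp`."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Edit distance divided by the ground-truth length (assumed convention)."""
    if not ref:
        raise ValueError("ground truth must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

With this definition, a single substituted character in a four-character ground truth gives a rate of 0.25.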


tesseract-ocr - Google Code


kzfm 2008/07/09

OCR


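Let's use OCR on Linux too!

Intro: OCR stands for Optical Character Recognition. Data read by a scanner is basically an image, so even with a scanner that comes with software offering optional PDF conversion, the resulting data may be nothing more than an image wrapped in a PDF. Recently, some such software includes OCR functionality, which recognizes characters in the image and converts them to character codes. The result is then a document made up of text, and it can be a target of full-text search. Character recognition is originally a kind of pattern recognition and, as computer processing goes, is fairly advanced. It used to presuppose excellent but very expensive software such as Omnipage or Recognita, together with a high-resolution scanner. However, in recent years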

kzfm 2008/06/24

OCR


Ocrad - GNU Project - Free Software Foundation (FSF)

Ocrad - The GNU OCR. Introduction: GNU Ocrad is an OCR (Optical Character Recognition) program and library based on a feature extraction method. It reads images in png or pnm formats and produces text in byte (8-bit) or UTF-8 formats. The formats pbm (bit map), pgm (greyscale), and ppm (color) are collectively known as pnm. Ocrad includes a layout analyser able to separate the c

kzfm 2008/06/24

OCR


kzfm's bookmarks on OCR (5)
