asaokitan's bookmarks / March 11, 2008

asaokitan id:asaokitan

Bookmarks for March 11, 2008 (6 items)

@IT: How to extract text from a PDF file
To extract text from a PDF file, use the pdftotext command. The pdftotext command is included in Xpdf (http://www.foolabs.com/xpdf/). Fedora Core 3 (FC3) provides an Xpdf package, and
asaokitan 2008/03/11
The pdftotext command from Xpdf
Link
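As a minimal sketch of the approach in the entry above, pdftotext can be driven from Python through `subprocess`. It assumes a `pdftotext` binary is on the PATH; the function names `pdftotext_command` and `extract_text` are hypothetical and do not come from the bookmarked article. Passing `-` as the output file sends the text to standard output, and `-enc UTF-8` requests UTF-8 output.

```python
import shutil
import subprocess


def pdftotext_command(pdf_path, layout=False):
    """Build the argument list for pdftotext (hypothetical helper)."""
    cmd = ["pdftotext", "-enc", "UTF-8"]
    if layout:
        # -layout asks pdftotext to preserve the physical page layout.
        cmd.append("-layout")
    # "-" as the output file name writes the text to standard output.
    cmd += [pdf_path, "-"]
    return cmd


def extract_text(pdf_path, layout=False):
    """Run pdftotext and return the extracted text as a str."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on PATH")
    result = subprocess.run(
        pdftotext_command(pdf_path, layout),
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Calling `extract_text("test.pdf")` would return the document text, provided the file exists and pdftotext is installed.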
Converting Word documents to text « Python recipes « ActiveState Code
import fnmatch, os, pythoncom, sys, win32com.client

wordapp = win32com.client.gencache.EnsureDispatch("Word.Application")
try:
    for path, dirs, files in os.walk(sys.argv[1]):
        for doc in [os.path.abspath(os.path.join(path, filename)) for filename in files if fnmatch.fnmatch(filename, '*.doc')]:
            print "processing %s" % doc
            wordapp.Documents.Open(doc)
            docastxt = doc.rstrip('doc') + 'txt'
            wordapp.Activ
asaokitan 2008/03/11
For Windows. Requires MS Word.
Link
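The recipe excerpt above is cut off, so the following only sketches its file-discovery step in Python 3 and never calls Word. The helper names `find_word_documents` and `text_path_for` are hypothetical. It uses `os.path.splitext` to build the output name, which is an assumption about the intent: `rstrip('doc')` strips trailing characters rather than a suffix, so a name like `FOO.DOC` would otherwise keep its extension.

```python
import fnmatch
import os


def find_word_documents(root):
    """Yield absolute paths of *.doc files found under root."""
    for path, dirs, files in os.walk(root):
        for filename in files:
            if fnmatch.fnmatch(filename, "*.doc"):
                yield os.path.abspath(os.path.join(path, filename))


def text_path_for(doc_path):
    """Return the .txt path that corresponds to a document path."""
    base, _ext = os.path.splitext(doc_path)
    return base + ".txt"
```

Each path from `find_word_documents` could then be opened in Word and saved to `text_path_for(path)`, as the original recipe does through `win32com`.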
Pure Python PDF to text converter « Python recipes « ActiveState Code
import pyPdf

def getPDFContent(path):
    content = ""
    # Load PDF into pyPDF
    pdf = pyPdf.PdfFileReader(file(path, "rb"))
    # Iterate pages
    for i in range(0, pdf.getNumPages()):
        # Extract text from page and add to content
        content += pdf.getPage(i).extractText() + "\n"
    # Collapse whitespace
    content = " ".join(content.replace("\xa0", " ").strip().split())
    return content

print getPDFContent("test.pdf")
asaokitan 2008/03/11
Link
pyPdf
Out of date! This page is no longer updated. Development and maintenance of this project has continued and you can find the most recent information here: https://pypi.org/project/pypdf/. About A Pure-Python library built as a PDF toolkit. It is capable of: extracting document information (title, author, ...), splitting documents page by page, merging documents page by page, cropping pages, merging
asaokitan 2008/03/11
I want to tame PDFs with Python. As for Japanese
Link
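The pyPdf recipe bookmarked above targets Python 2, and the pyPdf page says development continues at pypdf. A rough Python 3 equivalent might look like the sketch below. It assumes the third-party `pypdf` package is installed; the names `collapse_whitespace` and `get_pdf_content` are hypothetical. The import sits inside the function so that the whitespace helper can be used without pypdf.

```python
def collapse_whitespace(text):
    """Replace non-breaking spaces and squeeze whitespace runs to one space."""
    return " ".join(text.replace("\xa0", " ").split())


def get_pdf_content(path):
    """Extract the text of every page of a PDF and collapse whitespace."""
    # Assumption: pypdf is installed (for example with `pip install pypdf`).
    from pypdf import PdfReader

    reader = PdfReader(path)
    # extract_text() may return an empty result for pages without text.
    pages = [page.extract_text() or "" for page in reader.pages]
    return collapse_whitespace("\n".join(pages))
```

How well this handles Japanese text depends on the fonts and encodings embedded in the PDF, so results should be checked on real documents.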
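How to easily create the parts common to every page (header, menu, etc.) and update them in one go --- ネットビジネス便利ツール (Net Business Handy Tools)
Sharing the identical parts of each web page. Displaying the common parts with PHP. Take the menu shared by every page of a website (on this page, the menu on the left): isn't it a chore to put it on each page? ↑ This is what I mean. With 2 or 3 pages I might just copy and paste, but with many pages it gets tiresome. While you are creating pages one at a time it may not feel that burdensome, but by the time you want to change them all at once a fair number of pages already exist, so changing every page is really hard. I have been through it, and I thought I would get tendonitis. For those who, like me, have wondered whether this could somehow be made easier, I recommend the following method. The same method applies not only to menus but also to other shared parts such as header tags and copyright notices. When you change the menu, editing a single file is immediately reflected on every page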
asaokitan 2008/03/11
Link
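The bookmarked article does this with PHP, and its code is not shown in the excerpt. The same idea, one shared fragment reused by every page, can be sketched in Python as follows. This is a swapped-in analog and not the article's method; `MENU_HTML`, `PAGE_TEMPLATE`, and `render_page` are hypothetical names, and the menu markup is made up for illustration.

```python
from string import Template

# The shared menu lives in one place. In a real site it might be read
# from a single file; the markup here is illustrative only.
MENU_HTML = '<ul><li><a href="/">Home</a></li></ul>'

# Every page is rendered from the same skeleton with a slot for the menu.
PAGE_TEMPLATE = Template("<html><body>$menu<main>$body</main></body></html>")


def render_page(body, menu=MENU_HTML):
    """Return a full page with the shared menu inserted before the body."""
    return PAGE_TEMPLATE.substitute(menu=menu, body=body)
```

Because every call to `render_page` pulls in the same `MENU_HTML`, changing that one value changes the menu on every page rendered afterwards.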
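いやなブログ - String operation comparison table: Ruby, Python, JavaScript, P...
String operation comparison table: Ruby, Python, JavaScript, Perl, C++. I made a comparison table of string operations in Ruby, Python, JavaScript, Perl, and C++. It is a sequel to the array operation comparison table. I would appreciate it if you pointed out any mistakes. Ruby (String) Python (str) JavaScript (String) Perl C++ (std::string)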
asaokitan 2008/03/11
great
Link