Bookmarks / tika.apache.org (1)

  • Apache Tika – Apache Tika

    Apache Tika - a content analysis toolkit The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more. You can find the latest release on the download page. Please see

    nilab 2010/11/15
    Apache Tika : a content analysis toolkit : Apache Tika™ is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries. You can find the latest release on the download page. See the Getting Started guide for instructions on how