Written March 10, 2012; updated April 16, 2012

Introduction

Let's make a concurrent web scraper! We will use Haskell, because it makes concurrency easy, and the HXT library to do the scraping. To follow the HXT bits, you should be comfortable with Arrows in Haskell; if you're not, take a moment to read up on them first. If you don't care about the scraping bits, jump straight to t
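As a quick refresher before the HXT material, here is a minimal sketch of the arrow combinators we will lean on, using plain function arrows from Control.Arrow (the function names below are my own illustrations, not part of the scraper):

```haskell
import Control.Arrow

-- (***) applies two arrows to the two components of a pair.
halveAndDouble :: (Int, Int) -> (Int, Int)
halveAndDouble = (`div` 2) *** (* 2)

-- (&&&) fans one input out to two arrows; (>>>) composes left to right.
splitAndRecombine :: Int -> Int
splitAndRecombine = (+ 1) &&& (* 2) >>> uncurry (+)

main :: IO ()
main = do
  print (halveAndDouble (10, 10))  -- prints (5,20)
  print (splitAndRecombine 4)      -- (4+1) + (4*2) = 13
```

HXT arrows work the same way, except the arrow type is a list-valued tree transformer rather than a plain function.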
{-# LANGUAGE OverloadedStrings, QuasiQuotes #-}
import Control.Applicative
import Control.Monad
import qualified Data.ByteString.Lazy.Char8 as B
import Network.HTTP.Conduit
import System.Cmd
import System.Environment
import System.Process.QQ
import Text.HTML.TagSoup
import Text.HTML.TagSoup.Tree
import Text.Printf
import Text.Regex.TDFA

baseUrl = "http://dumps.wikimedia.org/"

extractLinks url rege
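The listing above cuts off at the link-extraction function. As a hedged sketch of what a TagSoup-based link extractor with a regex filter might look like (the name extractLinksFrom and the sample inputs are my assumptions, not the original code):

```haskell
import Text.HTML.TagSoup
import Text.Regex.TDFA

-- Hypothetical helper: collect every href on the page whose value
-- matches the given regex. parseTags turns raw HTML into a flat tag
-- stream; we pattern-match on opening <a> tags and filter with (=~).
extractLinksFrom :: String -> String -> [String]
extractLinksFrom regex html =
  [ href
  | TagOpen "a" attrs <- parseTags html
  , ("href", href)    <- attrs
  , href =~ regex
  ]

main :: IO ()
main = print (extractLinksFrom "^http" page)
  where
    page = "<a href=\"http://dumps.wikimedia.org/enwiki/\">dump</a>\
           \<a href=\"mailto:someone\">mail</a>"
```

Running this keeps only the http link, dropping the mailto one; the real code presumably does something similar before handing the URLs to the downloader.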
Shpider is a web automation library for Haskell. It lets you write crawlers quickly, and for simple cases (like following links) even without reading the page source. It has useful features such as turning relative links on a page into absolute links, options to authorize transactions only on a given domain, and the option to download only HTML documents. It also provides a nice syntax fo