まえに作ったWeb::Scraperのjavascriptバージョンwebscraper.jsとXPathをてきとうに作ってくれる機能を追加したwebscraperp.jsにHTMLのドキュメントから繰り返し部分をみつけてSITEINFOをつくるAutoPagerize Iteration Detectorみたいなみためをくっつけて、取... 続きを読む
The sbox program encountered an error while processing this request. Please note the time of the error, anything you might have been doing at the time to trigger the problem, and forward the information to this site's Webmaster (webmaster@www... 続きを読む
This is inspired by an email from Renée Bäcker asking how to get content inside javascript tag. Because Web::Scraper's 'TEXT' mapping calls as_text method of HTML::Element, it doesn't get the content inside script and style tag. Here's the co... 続きを読む
The sbox program encountered an error while processing this request. Please note the time of the error, anything you might have been doing at the time to trigger the problem, and forward the information to this site's Webmaster (webmaster@www... 続きを読む
Yet another non-informative, useless blog As seen on TV! Scraping websites is usually pretty boring and annoying, but for some reason it always comes back. Tatsuhiko Miyagawa comes to the rescue! His Web::Scraper makes scraping the web easy a... 続きを読む
Web::Scraper プレゼン@YAPC::EU YAPC::Europe でウィーンにきています。1日目の夕方に Web::Scraper のプレゼンをしました。 時間が20分なのに前半に時間をかけすぎて尻きれトンボになってしまいましたが、いろいろフィードバックをもらえたのでよかったです... 続きを読む
Web::Scraper はいたれりつくせりの仕掛けが仕込んであって、便利ですね。私が、割と良く使っている機能は以下 2 つです。process の第一引数に、CSS セレクタだけでなく、XPath も指定できます。ただし、XPath を指定するときは先頭を必ずスラッシュ(/)で始め... 続きを読む
Today I've been thinking about what to talk in YAPC::EU (and OSCON if they're short of Perl talks, I'm not sure), and came up with a few hours of hacking with web-content scraping module using Domain Specific Languages.Journal of miyagawa (16... 続きを読む