[B! ruby][scraping] manabou's bookmarks

manabou id:manabou

manabou's bookmarks about ruby and scraping (6)

Scraping with Selenium in Ruby - Qiita
manabou 2015/12/10
ruby, selenium, scrape, scraping

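Scraping with selenium [Ruby] - 酒と泪とRubyとRailsと
I found that "Selenium" can be used to drive Chrome/Safari/IE/Firefox from Ruby for scraping, so I gave it a quick try. These are my notes from that experiment. Another way to drive a browser from Ruby is the gem "Watir", so please consider that option as well!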
manabou 2015/12/10
selenium, ruby, nokogiri, webdriver, scrape, scraping

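How to use Nokogiri, a parsing tool written in Ruby, with Xpath - プログラマでありたい
Nokogiri is the standard tool for parsing HTML and XML in Ruby. It is a must-have for scraping and one of the modules you cannot do without. However, because it can do so many things, it can be hard to know where to start. Partly for my own study, I will introduce an overview of Nokogiri and its main features. What is Nokogiri? According to the README, Nokogiri is "an HTML, XML, SAX, XSLT, and Reader parser", and its distinguishing feature appears to be the ability to search via XPath and CSS3 selectors. It also has HTML and XML builder features, but it is enough to remember it as an HTML and XML parser. Nokogiri's class structure: Nokogiri is quite a large library. It consists of more than 10 modules and more than 70 classes, and with yard a diagr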
manabou 2014/04/15
nokogiri, xpath, scraping, ruby, xml

Ruby, Rails, Web2.0 » Blog Archive » Data Extraction for Web 2.0: Screen Scraping in Ruby/Rails, Episode 1
This article is a follow-up to the quite popular first part on web scraping - well, sort of. The relation is closer to that between Star Wars I and IV - i.e., in chronological order, the 4th comes first. To continue the analogy, probably I am in the same shoes as George Lucas was after creating the original trilogy : the series became immensely popular and there was demand for more - in both quant
manabou 2007/02/05
ruby, scrape, scraping

Ruby Screen-Scraper in 60 Seconds - igvita.com
By Ilya Grigorik on February 04, 2007 I often find myself trying to automate content extraction from a saved HTML file or a remote server. I've tried a number of approaches over the years, but the dynamic duo of Hpricot and Firebug blew me away - this is by far the fastest way to get what you want without compromising flexibility. Hpricot is an extremely powerful ruby-based HTML parser, and Firebu
manabou 2007/02/05
ruby, hpricot, scrape, scraping, firebug

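Extracting text from Hpricot - nazokingのブログ
Hpricot feels easier to use than scrAPI, but "innerText" does not properly restore HTML entities, so I added a different method.
    require "rubygems"
    require 'cgi'      # needed for CGI.unescapeHTML
    require 'hpricot'
    class Hpricot::Elem
      # Attribute lookup that decodes HTML entities in the value.
      def [](a)
        CGI.unescapeHTML(get_attribute(a))
      end
      # Joins all text nodes, decoding entities and collapsing whitespace.
      def to_text
        r = []
        traverse_text{|text|
          case text
          when Hpricot::CData
            r << text.content
          else
            r << CGI.unescapeHTML(text.inner_text.gsub("\n"," ").gsub(/ +/," ").strip)
          end
        }
        r.join
      end
    end
    hp = Hpricot('<html><bog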
manabou 2007/02/05
ruby, scraping, scrape

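The text clean-up step in the Hpricot entry above (collapse whitespace, then decode HTML entities) can be tried without Hpricot, using only Ruby's standard library. This is a rough sketch under that assumption; the helper name `clean_text` and the sample string are hypothetical and do not come from the bookmarked post.

```ruby
require 'cgi'

# Replaces newlines with spaces, collapses runs of spaces, trims the ends,
# and then decodes HTML entities such as &amp; and &lt;.
def clean_text(raw)
  CGI.unescapeHTML(raw.gsub("\n", " ").gsub(/ +/, " ").strip)
end

puts clean_text("  Tom &amp; Jerry\n  &lt;2007&gt;  ")
# prints: Tom & Jerry <2007>
```

Decoding after the whitespace pass keeps the same order as the excerpt's `to_text` method.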