[B! python][xml] edvakf's bookmarks

Getting the title of a Japanese web page with lxml - Pyro Memo

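A memo, since garbled Japanese text gave me a lot of trouble. My conclusion is that you should pass the data through the unicode function before parsing the XML (or HTML). Is that right? Character encodings still confuse me as much as ever.

    from urllib import urlopen
    from lxml import etree
    html = urlopen("http://b.hatena.ne.jp")
    charset = html.headers.getparam('charset')
    # Python 2: decode the bytes with the charset from the Content-Type header before parsing
    html_data = unicode(html.read(), charset)
    et = etree.fromstring(html_data, parser=etree.HTMLParser())
    title_element = et.xpath("./head/title")[0]
    title = title_element.text.e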

edvakf 2009/09/14

python
xml

Link
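The Pyro Memo snippet in this entry is Python 2. It relies on `urllib.urlopen`, `unicode()` and `headers.getparam`, none of which exist in Python 3. The sketch below applies the same "decode first, then parse" idea in Python 3 and assumes lxml is installed. Several details are hypothetical and are not taken from the original post:

- The helper names `title_from_bytes` and `fetch_title`.
- The UTF-8 fallback when the server sends no charset.

```python
from typing import Optional
from urllib.request import urlopen

from lxml import etree


def title_from_bytes(raw: bytes, charset: Optional[str]) -> Optional[str]:
    """Decode the raw bytes to str first, then parse the str as HTML."""
    # Assumption: fall back to UTF-8 when no charset was supplied
    # (the original passes the header value straight to unicode()).
    text = raw.decode(charset or "utf-8", errors="replace")
    root = etree.fromstring(text, parser=etree.HTMLParser())
    titles = root.xpath("./head/title")
    return titles[0].text if titles else None


def fetch_title(url: str) -> Optional[str]:
    """Hypothetical wrapper: fetch a page and return its <title> text."""
    with urlopen(url) as resp:
        # Python 3 replacement for headers.getparam('charset')
        charset = resp.headers.get_content_charset()
        return title_from_bytes(resp.read(), charset)
```

Keeping the decode step separate from the network call means it can be checked offline. For example, pass `"<html><head><title>日本語</title></head></html>".encode("shift_jis")` with the charset `"shift_jis"`. lxml can also do the decoding itself if you pass the raw bytes together with `etree.HTMLParser(encoding=charset)`; that variant is not shown here. Note one limitation of the str route: lxml refuses str input that carries an XML encoding declaration and raises `ValueError`.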

IBM Developer

IBM Developer is your one-stop location for getting hands-on training and learning in-demand skills on relevant technologies such as generative AI, data science, AI, and open source.

edvakf 2009/09/14

python
xml

Link

lxml

Introduction lxml is a Pythonic binding for the libxml2 and libxslt libraries. It is unique in that it combines the speed and feature completeness of these libraries with the simplicity of a native Python API, mostly compatible but superior to the well-known ElementTree API. See the introduction for more information about background and goals. Some common questions are answered in the FAQ. This pa
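The lxml introduction describes its API as mostly compatible with ElementTree. The sketch below shows what that means in practice: the same parsing code runs against `lxml.etree` or against the standard library's `xml.etree.ElementTree`, chosen by a common try/except import. It sticks to calls both modules share: `fromstring`, `findall`, `iter` and `get`.

The sample document and the `summarize` helper are hypothetical, loosely modeled on the bookmark entries on this page.

```python
try:
    from lxml import etree  # libxml2-backed implementation
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback with the same core API

# Hypothetical sample document, loosely modeled on the bookmark entries above.
SAMPLE = (
    "<bookmarks>"
    "<bookmark user='edvakf' date='2009/09/14'>"
    "<title>lxml</title>"
    "<tag>python</tag><tag>xml</tag>"
    "</bookmark>"
    "</bookmarks>"
)


def summarize(xml_text):
    """Return (user, date, tags) for each <bookmark>, using only calls
    available in both lxml.etree and xml.etree.ElementTree."""
    root = etree.fromstring(xml_text)
    result = []
    for bm in root.findall("bookmark"):
        tags = [t.text for t in bm.iter("tag")]
        result.append((bm.get("user"), bm.get("date"), tags))
    return result


print(summarize(SAMPLE))
```

Whichever module is imported, this prints `[('edvakf', '2009/09/14', ['python', 'xml'])]`. lxml's extras, such as full XPath via `xpath()` and `HTMLParser`, are deliberately avoided here because the stdlib module does not provide them.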