In preparation for my PyCon talk on HTML, I thought I’d do a performance comparison of several parsers and document models. The situation is a little complex because there are several distinct steps in handling HTML:

1. Parse the HTML
2. Parse it into something (a document object)
3. Serialize it

Some libraries handle step 1, some handle step 2, some handle all three, and so on. For instance, ElementSoup uses ElementTree as a documen
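As a rough illustration of how the three steps fit together (this is not one of the libraries being compared), the standard library can cover each one. `html.parser.HTMLParser` tokenizes the HTML (step 1) and feeds an `xml.etree.ElementTree.TreeBuilder` to build a document object (step 2), and `ElementTree.tostring` writes it back out (step 3). The names `TreeBuildingParser`, `parse_html` and `serialize` are hypothetical. This is a minimal sketch that assumes reasonably well-formed input: it has no recovery for unclosed or misnested tags, which is exactly the hard part that real HTML parsers have to handle.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never take a closing tag (assumed list, per HTML5).
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class TreeBuildingParser(HTMLParser):
    """Hypothetical glue: step 1 (tokenizing) drives step 2 (tree building)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.builder = ET.TreeBuilder()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes attributes as (name, value) pairs; value is None
        # for bare attributes such as <input disabled>.
        self.builder.start(tag, {k: ("" if v is None else v) for k, v in attrs})
        if tag in VOID_ELEMENTS:
            self.builder.end(tag)  # void elements are closed immediately

    def handle_endtag(self, tag):
        # Void elements were already closed in handle_starttag (this also
        # covers <br/>, which HTMLParser reports as a start plus an end tag).
        if tag not in VOID_ELEMENTS:
            self.builder.end(tag)

    def handle_data(self, data):
        self.builder.data(data)


def parse_html(text):
    """Steps 1 and 2: return the root ElementTree Element for `text`."""
    parser = TreeBuildingParser()
    parser.feed(text)
    parser.close()  # flush any buffered text
    return parser.builder.close()


def serialize(root):
    """Step 3: ElementTree's "html" method writes void elements as <br>."""
    return ET.tostring(root, encoding="unicode", method="html")


root = parse_html('<div class="x">Hello<br>world</div>')
print(serialize(root))
```

Each step could be swapped independently here (a different tokenizer, a different tree, a different serializer), which is one reason a fair comparison has to be clear about which steps a given library actually performs.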