Introduction

HtmlCleaner is an open source HTML parser written in Java. HTML found on the Web is usually dirty, ill-formed and unsuitable for further processing. For any serious consumption of such documents, it is necessary to first clean up the mess and bring some order to the tags, attributes and ordinary text. For any given HTML document, HtmlCleaner reorders individual elements and produces w