Converting HTML text into a data object

A webpage is just a text file in HTML format, and HTML-formatted text is ultimately just text. So let's write our own HTML from scratch, without worrying yet about "the Web":

    htmltxt = "<p>Hello World</p>"

The point of HTML parsing is to efficiently extract the text values in an HTML document – e.g. Hello World – apart from the HTML markup – e.g.