You can't parse [X]HTML with regex. Because HTML can't be parsed by regex. Regex is not a tool that can be used to correctly parse HTML. As I have answered in HTML-and-regex questions here so many times before, the use of regex will not allow you to consume HTML. Regular expressions are a tool that is insufficiently sophisticated to understand the constructs employed by HTML. HTML is not a regular
![RegEx match open tags except XHTML self-contained tags](https://cdn-ak-scissors.b.st-hatena.com/image/square/98d6f053a97a87156775f60757c60865d0f2c47d/height=288;version=1;width=512/https%3A%2F%2Fcdn.sstatic.net%2FSites%2Fstackoverflow%2FImg%2Fapple-touch-icon%402.png%3Fv%3D73d79a89bded)