While working on a side project, I needed to parse HTML, and to save time I tried the fastest HTML parsers I could find. After fighting with them and trying to hack around their limits, I realized I needed a custom or highly customizable parser to fit all of the project's needs. Unfortunately, I had no luck finding one. So, I created one.
I thought it was a simple enough thing to do…
For my specific project, I needed something fast, which was easy to find, but I also needed it to be customizable enough. Everything I found fell short in two areas:
- They offered no way to tap into nodes while they were being parsed, which is something I desperately needed.
- They offered no way to specify a custom API for the parsed result, forcing me either to learn whatever API they came up with or to stay stuck with poorly performing ones. This ability would have let me adapt the parser to the project, not vice versa.
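To make those two gaps concrete, here is a minimal sketch in TypeScript of what such hooks could look like. It is not the API of any existing parser or of the one this post is about: `parse`, `ParseOptions`, `createNode`, `createText`, and `onNode` are hypothetical names invented for illustration, and the tokenizer is deliberately naive (no attributes, void elements, comments, or error recovery).

```typescript
// Hypothetical sketch only: every name here is made up for illustration.
interface ParseOptions<T> {
  // Custom result API: the caller decides what an element looks like.
  createNode(tag: string, children: T[]): T;
  // Custom result API: the caller decides what a text node looks like.
  createText(text: string): T;
  // Tap into parsing: called when an element closes, so children fire before parents.
  onNode?(node: T, tag: string): void;
}

function parse<T>(html: string, opts: ParseOptions<T>): T[] {
  const root: T[] = [];
  const stack: { tag: string; children: T[] }[] = [];
  // Matches an opening or closing tag, or a run of text between tags.
  const tokens = /<\/?([a-zA-Z][\w-]*)[^>]*>|[^<]+/g;
  let m: RegExpExecArray | null;

  while ((m = tokens.exec(html)) !== null) {
    const token = m[0];
    if (token[0] !== "<") {
      // Text goes to the innermost open element, or to the root.
      const siblings = stack.length ? stack[stack.length - 1].children : root;
      siblings.push(opts.createText(token));
    } else if (token[1] === "/") {
      // Naive: a closing tag closes whatever element is currently open.
      const open = stack.pop();
      if (!open) continue; // stray closing tag, ignored
      const node = opts.createNode(open.tag, open.children);
      if (opts.onNode) opts.onNode(node, open.tag);
      const parent = stack.length ? stack[stack.length - 1].children : root;
      parent.push(node);
    } else {
      stack.push({ tag: m[1].toLowerCase(), children: [] });
    }
  }
  // Elements left unclosed at the end of input are dropped in this sketch.
  return root;
}

// Usage: the project defines its own node shape instead of adopting the parser's.
type Mini = string | { name: string; kids: Mini[] };

const seen: string[] = [];
const tree = parse<Mini>("<ul><li>a</li><li>b</li></ul>", {
  createNode: (tag, children) => ({ name: tag, kids: children }),
  createText: (text) => text,
  onNode: (_node, tag) => {
    seen.push(tag);
  },
});
```

Because the parser is generic over `T`, it never needs to know the shape of the result, and the `onNode` hook sees each element as soon as it is complete rather than after the whole document has been parsed.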
Here is a list of the best parsers I tried: html-parser, htmljs-parser (good callback options), html-dom-parser, html5parser, cheerio (really good, and offers more than just parsing), parse5 (as good as cheerio), htmlparser2, htmlparser, and node-html-parser (really fast).
Is my parser better? (disclaimer)
I am not claiming my parser is better than any of the above or that everyone should use it instead. I created a parser specifically to solve a problem I was having. I don’t believe in one-size-fits-all solutions, and one should always try to find the best tool for the job, or create one if necessary.