
How to Build a Fast HTML Parser Using Regex and TypeScript

Elson TC
10 min read · Nov 13, 2023

Photo by Jay Zhang on Unsplash

While working on a side project, I needed to parse HTML. To save time, I tried the fastest HTML parsers I could find. After fighting with them and trying to hack them into shape, I realized I needed a custom or highly customizable parser that fit all of the project's needs. Unfortunately, I could not find one, so I created my own.

I thought it was a simple enough thing to do…

The Motivation

For my specific project, I needed something fast, which was easy to find, but I also needed it to be customizable enough. Everything I found fell short in two main areas:

  • They offered no way to tap into nodes while they were being parsed, which is something I desperately needed.
  • They offered no way to specify a custom API for the parsed result, forcing me either to learn whatever new API they came up with or to remain stuck with poorly performing APIs. This ability would let me adapt the parser to the project, not vice versa.
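
To make these two requirements concrete, here is a minimal sketch of what such a parser could look like. It is an illustration only, not the parser this story goes on to build: the names `NodeHandler`, `createElement`, `appendChild`, `onNode`, and `parse` are all hypothetical, and the regex assumes simple input (plain tags, double-quoted attributes, text) with no comments, scripts, or malformed markup.

```typescript
// Hypothetical sketch: every name here is an assumption made for illustration.
// The caller decides what a parsed node looks like (custom result API)
// and can observe each node as soon as it is created (tapping in).
interface NodeHandler<T> {
  createElement(tag: string, attributes: Record<string, string>): T;
  appendChild(parent: T, child: T | string): void;
  onNode?(node: T | string): void;
}

function parse<T>(html: string, handler: NodeHandler<T>): T {
  // One alternative each for: closing tag, opening/self-closing tag, text.
  // Created per call so the global regex's lastIndex is never shared.
  const tokenPattern =
    /<\/([a-zA-Z][\w-]*)\s*>|<([a-zA-Z][\w-]*)((?:\s+[\w-]+(?:="[^"]*")?)*)\s*(\/?)>|([^<]+)/g;
  const root = handler.createElement("#root", {});
  const stack: T[] = [root];

  let match: RegExpExecArray | null;
  while ((match = tokenPattern.exec(html)) !== null) {
    const [, closeTag, openTag, rawAttrs, selfClosing, text] = match;
    const parent = stack[stack.length - 1];

    if (closeTag) {
      // Simplification: pop without checking that the tag names match.
      if (stack.length > 1) stack.pop();
    } else if (openTag) {
      const attributes: Record<string, string> = {};
      const attrPattern = /([\w-]+)(?:="([^"]*)")?/g;
      let attr: RegExpExecArray | null;
      while ((attr = attrPattern.exec(rawAttrs ?? "")) !== null) {
        attributes[attr[1]] = attr[2] ?? ""; // valueless attributes become ""
      }
      const element = handler.createElement(openTag, attributes);
      handler.appendChild(parent, element);
      handler.onNode?.(element); // fires before the element's children are parsed
      if (!selfClosing) stack.push(element);
    } else if (text && text.trim()) {
      // Whitespace-only text is skipped to keep the sketch small.
      handler.appendChild(parent, text);
      handler.onNode?.(text);
    }
  }
  return root;
}

// Usage: the result shape below is chosen by the caller, not by the parser.
interface SimpleNode {
  tag: string;
  attributes: Record<string, string>;
  children: (SimpleNode | string)[];
}

const seen: string[] = [];
const tree = parse<SimpleNode>('<div id="a"><p>Hello</p><br/></div>', {
  createElement: (tag, attributes) => ({ tag, attributes, children: [] }),
  appendChild: (parent, child) => {
    parent.children.push(child);
  },
  onNode: (node) => {
    seen.push(typeof node === "string" ? "#text" : node.tag);
  },
});
// seen is now ["div", "p", "#text", "br"]
```

Because the parser only ever calls the handler, swapping `SimpleNode` for a different structure requires no change to `parse` itself, which is the kind of adaptability the two points above ask for.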

Some parsers do offer customization, but it often comes with a performance loss, and I wanted both performance and customization. Additionally, I needed it to work in any JavaScript runtime environment, and because I was going to use it in a client library, it needed to be lightweight.


Written by Elson TC

Software Engineer sharing knowledge, experience, and perspective from an employee and personal point of view.
