It’s my pleasure to announce that today, Postlight is open-sourcing the Mercury Web Parser.
By making semantic sense out of any article, Mercury Parser allows for better reading experiences, easier content migration, and endless opportunities for remixing the web. Mercury Parser sees web pages the same way you do: it sees titles, content, authors, and lead images. It then makes all of that extracted data easily available to your software, which, on its own, sees only a sea of HTML markup where page navigation, advertising, and the like are indistinguishable from content.
Get Mercury Parser for use in your projects on GitHub: postlight/mercury-parser.
Try Mercury Parser
Wanna see Mercury Parser in action in your own command line? First install it:
$ yarn global add @postlight/mercury-parser
Then parse an article and check out the results:
$ mercury-parser https://postlight.com/trackchanges/mercury-goes-open-source
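The CLI prints its results as JSON. If you only care about a few fields, a small shell filter can pick them out. This is a minimal sketch, assuming `jq` is installed separately and that the JSON uses `title`, `author`, and `lead_image_url` keys. The `summarize` function is a hypothetical helper of our own, not part of Mercury Parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Mercury Parser): summarize its JSON output.
# Assumes jq is installed and the JSON has title, author, and lead_image_url
# keys; missing or null values fall back to placeholder text.

summarize() {
  # Read JSON from stdin and print one labeled field per line.
  jq -r '"Title: \(.title // "untitled")",
         "Author: \(.author // "unknown")",
         "Lead image: \(.lead_image_url // "none")"'
}
```

Once the function is loaded into your shell (paste it in, or source the file), pipe the CLI output into it:
$ mercury-parser https://postlight.com/trackchanges/mercury-goes-open-source | summarize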
Now, as an open-source project, and with your help, we hope to make Mercury Parser even better. Say, for example, Mercury’s done a less-than-perfect job parsing an article from your favorite website. You can write and submit a custom parser for that site, so Mercury gets it right every time. We’re excited about all the ways the Mercury community will contribute to this project.
What about the API?
Over time, we will deprecate the Mercury Parser API. We’ll do it slowly, with plenty of warning, advance email notifications, and drop-in replacement code. We’ve committed to creating an easy path for people who want to use Mercury in any way they see fit, using open-source, well-documented code that can be easily rolled into any other service or API. We want to put our energy there, making a more tractable web together, not behind a private, hosted API.
Indeed, one of the main drivers for this choice was API users asking us to open-source Mercury and asking how they could help improve it.