Reading on the internet can be tiring. There are pop-ups and ads — everything is trying to get your attention. Enter Postlight Reader, a browser extension that removes distractions from any article. This week Chris LoSacco and Michael Shane sit down with one of Postlight’s most senior engineers, John Holdun, to talk about how they developed Postlight Reader and the intricacies of developing a modern open-source browser extension.
Chris LoSacco: This podcast is not sponsored by, you know, libraries of America [laughs]
[POSTLIGHT INTRO IN]
Chris: Welcome to the Postlight podcast. I’m Chris LoSacco, the president of Postlight, and today I am joined by two of my Postlight colleagues. And I’m very excited to talk about a very fun project that we recently released—re-released out into the world. First, let me introduce our head of digital strategy, Michael Shane. Michael, welcome back to the show.
Michael Shane: I’m so glad to be back and I love reading things on the internet, so I—I’m just thrilled about this whole episode and everything we’re gonna talk about.
Chris: You’re well positioned to do this discussion today. And we’re also joined by one of our senior-most engineers here at Postlight, John Holdun. Welcome John.
John Holdun: Thank you. I’m also glad to be here. I also love reading things and reading fewer things than I would’ve if I didn’t use what we’re gonna talk about today. So I’m also…
Chris: That’s right!
John: …happy to be here. [laughs]
Michael: The suspense is killing me.
Chris: Yeah. So let’s lead with the headline. We are here today to talk about Postlight Reader, which is an app that we designed and built and released, which we’re gonna talk about in detail in a moment. But before we do that, John, tell the world, why are you here to talk about Postlight Reader?
John: Yeah. I am here because I’ve sort of taken up the mantle of overseeing the development and maintenance of Postlight Reader, which is a product that’s been around for a while as we’ll get into. But we have this sort of renewed effort in it that I am, yeah, sort of shepherding or stewarding or what have you. So I currently have some of the most institutional knowledge, but this has been a project that’s been spread out over many, many people. And I’m—I’m here representing all of them.
Chris: So let’s dive into it. Assume I know nothing about Postlight or its products, its various projects. What is Postlight Reader?
John: Yeah. Postlight Reader is a browser extension. It works in just about any browser: Chrome, Firefox, Edge. You use it when you’re looking at an article or any cluttered page full of text, or even not necessarily a cluttered page, but just a page that’s harder to read than it could be. You activate the extension and it strips away all of the unnecessary stuff on the page, any ads or UI or anything that isn’t just the content you’re there to read, the article or the blog post, and shows it to you in a really clean and controllable way.
Chris: So I’m reading an article on, let’s say Buzzfeed and I invoke Postlight Reader, and the nav goes away. The sidebar goes away. Basically, anything that’s not the article that I’m reading is wiped from the page.
John: Exactly. The text is all that stays. You can change the size and color and typeface in ways that are more sophisticated than just the sort of standard browser tools to make sure that is exactly what you need it to be, to read it really easily.
Michael: You almost read my mind, John, because I was gonna say, “but John, there’s Instapaper” and “there’s, you know, now native sort of quote unquote readability capabilities….”
Michael: “….in some browsers.” What is special about Postlight Reader?
John: Yeah, there are a lot of similar products like this. I think this one in particular is immediate. You know, with Instapaper you have to send it to Instapaper. This is like Safari Reader, which works very similarly. But something that’s special about Postlight Reader is what we call extractors, meaning the custom logic for particular websites so that they pull out the stuff that needs to be pulled out and leave the stuff that needs to stay. It’s all open source, meaning that if you maintain a website or you just really like a website and you’re trying to read it in this extension and it’s pulling out too much stuff or not pulling out enough stuff, you can submit a pull request to this project and tweak it to work perfectly for that website based on however it was built or whatever markup is there, and make it show up exactly the way it needs to show up. And that then works for you, but also works for everybody else that’s using the extension. So I think that’s something particular.
Michael: So this is really interesting because a lot of these types of platforms, you know, readability platforms are sort of unilateral in their approach, in terms of they decide what a better reading experience is for all content. And it’s generally kind of a one size fits all approach. And the internet is not one size fits all. And so Postlight Reader to me sounds a bit more thoughtful and considered, and like you said, open source so that qualified folks anywhere in the world can interact with it and make it better and more effective for their audiences.
Chris: How many custom extractors do we have today?
John: Yeah, there are, I think it’s like 140. 150.
Chris: Dang, okay.
John: Yeah. So each one is for a specific domain. So for example, on the postlight.com website, we have a specific one that knows that when you’re on one of our Insights articles, these are the things that should be preserved. These are the things that should be removed. And that is true for every single one of these: each is for a different website, tweaked just right for that content.
Michael: So John, why is it worth making the effort to create a custom extractor? How many people out there are using Postlight Reader? What’s the addressable audience here of Postlight Reader Connoisseurs?
John: Oh gosh. We have over a million installs in the Chrome Web store. It’s also on Edge and Firefox. I haven’t looked at those numbers recently, but it’s a lot of people.
Michael: It’s a lot of people. A lot of people who care about their reading experience.
John: Yeah. In addition to the custom extractors, it does work, it works anywhere. It makes its best guess if you don’t have a custom extractor. Just to make that clear, it doesn’t only work on those 150 websites.
Chris: Yeah. And anecdotally using it, the guess out of the box is pretty darn good. Most of the time it’s gonna get you the right content just using default heuristics to try and pick out what’s important. But for those special cases, you know, having the hundred-plus custom extractors really helps. Is this an ad blocker? Is that what we’re talking about here?
John: In a sense, it can be used that way. I think it’s a similar purpose. I think a lot of people use ad blockers to remove clutter from their browsing. I think, you know, ads are one of the things that are removed here, but it’s also things like a related post link or a read more link. Not actually more of the article, but more of some other article. All of those things, all of the cruft and clutter that is just part of the web, is removed here. I think this is not equivalent to, you know, your standard ad blocker that is just sort of working all the time. This is something where you are ready to read the thing that is loaded on your page right now, and that’s the only thing you wanna do. And so you take this action to enable Reader to do that. They’re cousins.
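The lookup John describes, a custom extractor when the domain has one and a best guess otherwise, can be sketched roughly as follows. This is a minimal illustration in Python rather than the parser's own JavaScript; the names `CUSTOM_EXTRACTORS`, `GENERIC_EXTRACTOR`, and `pick_extractor`, and every selector shown, are hypothetical and not taken from the project.

```python
from urllib.parse import urlparse

# Hypothetical per-domain rules; the selectors are illustrative only and are
# not the project's real extractor definitions.
CUSTOM_EXTRACTORS = {
    "postlight.com": {
        "title": ["h1.article-title"],
        "content": ["div.article-body"],
        "remove": [".newsletter-signup", ".related-posts"],
    },
}

# Generic rules used when no custom extractor matches the domain.
GENERIC_EXTRACTOR = {
    "title": ["h1", "title"],
    "content": ["article", "main", "body"],
    "remove": ["nav", "aside", "footer", "script"],
}


def pick_extractor(url: str) -> dict:
    """Return the custom rules for the URL's domain, else the generic rules."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return CUSTOM_EXTRACTORS.get(host, GENERIC_EXTRACTOR)
```

Keeping the per-site rules as plain data is what makes a pull request for a new site small: a contributor adds one entry and the lookup code stays untouched.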
Michael: So for all of our—our ad ops audience members, everyone working at ad ops. Postlight Reader works after ads have been fired and the page has been…
John: That is true. You still get the impression.
Michael: That’s right. Yeah. It’s not really an ad blocker, I would say, but it’s…
Chris: I totally agree. And I think, John, you said something really great, which is the web of today just has a lot of clutter. I mean, when I think of what’s annoying when I’m trying to read nowadays, it’s not even so much the ads. It’s like, sign up for my newsletter, and here’s this popup to subscribe and blah, blah, blah. And there’s so many publishers now that are just, you know, throwing every single thing they can on the page. And it’s to the detriment of just trying to read the darn thing. And that’s where I think Postlight Reader really shines and why it’s worth it to have a browser extension like this. Because it’s like, “no, I don’t want any of that other stuff. I just wanna, you know, read the piece that this headline is drawing me into.” And it’s not about, you know, just blocking an ad or something. It is, “let me get to the content.” Which is really at the core of why this thing exists.
Michael: John, I wanna do a little “explain like I’m five” segment here. But not literally five, because we’re talking about Postlight Reader. You click a button, the reading experience is much cleaner, it’s much more enjoyable, but that’s actually glossing over a lot of somewhat magical stuff that the technology is doing. And so I would love for you, if you would, here comes a mini challenge: pretend that Chris and I are fourth graders. Or perhaps 10th graders. Tell us, how does Postlight Reader do what it does? How does it work?
John: So on a webpage, there’s a lot of code behind the scenes. It’s called HTML, and it describes the text that you see and the images and all of the things that show up on your screen. And it describes those things in different ways. Like this is a headline, this is some bolded text, this is a paragraph. And when you activate Postlight Reader, it’s looking at all of that information and determining, sometimes a headline is not actually a headline. Sometimes it’s a title for a sidebar or something else, and Postlight Reader is smart enough to know that this particular thing that the page says is a headline is actually the title of your article. This is the content of your article. But then this part inside of it, based on different kinds of code that are behind the scenes, is a link somewhere else. And it’s not important. And so it goes through every single thing that’s loaded in your browser one at a time and decides whether to remove them or keep them. And then with all the stuff that it’s kept, it wraps it all up in a new container to make it look a little bit nicer for reading and then shows you just that stuff and it gets rid of all of the other stuff.
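The keep-or-remove pass John describes can be illustrated with a deliberately simplified sketch. It assumes a fixed list of clutter tags and keeps only headline and paragraph text; the real parser's heuristics are more involved and are not reproduced here. `SKIP_TAGS`, `KEEP_TAGS`, `ArticleText`, and `extract_text` are hypothetical names.

```python
from html.parser import HTMLParser

# Tags whose whole subtree is treated as clutter in this sketch (an assumption).
SKIP_TAGS = {"nav", "aside", "footer", "script", "style", "form"}
# Tags whose text is kept as readable content.
KEEP_TAGS = {"h1", "p"}


class ArticleText(HTMLParser):
    """Collect text from headline and paragraph tags outside clutter regions."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a clutter subtree
        self.keep_depth = 0   # > 0 while inside a kept tag
        self.blocks = []      # finished text blocks, in document order
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in KEEP_TAGS and self.skip_depth == 0:
            self.keep_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in KEEP_TAGS and self.keep_depth > 0:
            self.keep_depth -= 1
            if self.keep_depth == 0:
                # Collapse whitespace and store the finished block.
                text = " ".join("".join(self._buffer).split())
                if text:
                    self.blocks.append(text)
                self._buffer = []

    def handle_data(self, data):
        if self.keep_depth > 0 and self.skip_depth == 0:
            self._buffer.append(data)


def extract_text(html: str) -> list[str]:
    """Return the kept text blocks from an HTML string."""
    parser = ArticleText()
    parser.feed(html)
    parser.close()
    return parser.blocks
```

A paragraph inside a `nav` or `aside` is dropped even though it is marked up as a paragraph, which is the point John makes about markup not always meaning what it says.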
Michael: What else goes into a better reading experience?
John: So there’s font choices. You know, sometimes a font is chosen because it looks nice and not necessarily because it’s nice to read. Or a font can be nice to read in small doses, a sentence at a time, but not so much for pages and pages. Things like line height, tracking, and leading, and all of the typography stuff that, we know it when we see it, when it’s working, it’s nice. And then sometimes things look nice, but suddenly 20 minutes later you finish reading a really long article and your eyes hurt.
Chris: You need to go take a nap.
John: Yeah. You need to go look at a pasture or something that’s somewhere away from a screen, which, you know, we can’t fix that for everything, but we can make it easier to read the text that’s on the screen. And even just, you know, making things bigger sometimes. The decisions that go into making a news article show up on a news website are not necessarily the decisions that would be made to publish that same piece in a book or on a page, in isolation from all the other requirements that need to show up in your browser.
Chris: What if I wanna take this article outside of my browser? Can I do it?
John: You can. Yeah. So there is a “Send to Kindle” feature as part of Postlight Reader. Once you’ve activated it, whatever is showing up on your page, you click Send to Kindle. Just the stuff that was preserved is gonna be wrapped up into an ePub file and delivered via email to your Kindle. And then it’s there. And then if you don’t wanna read it on your screen, you can take it one step further and read it on your nice e-ink screen that won’t beam light into your eyes.
Chris: This is so great. I mean, cuz sometimes you get a 10 or 20,000 word article and you’re like, I wanna sit on my couch and I want to read this like I’m reading, you know, the short book that it is.
Chris: And being able to just shoot it over to your Kindle is huge. And it’s pretty easy to set up, actually, which is really cool.
John: It’s lucky that this works the way it does and we—we can’t take a ton of credit. I mean, you know, we made it work, but yeah, every Kindle just has an email address and you can just send anything to it and it just shows up there. It’s pretty, pretty simple.
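Since delivery is just an email with an ePub attached, the message itself can be sketched with Python's standard library. This is an assumption-laden illustration, not the extension's code: `build_kindle_email` is a hypothetical helper, the addresses are placeholders, and the step of actually producing the ePub bytes and sending the message (for example over SMTP) is left out.

```python
from email.message import EmailMessage


def build_kindle_email(sender: str, kindle_address: str,
                       title: str, epub_bytes: bytes) -> EmailMessage:
    """Build an email carrying an ePub attachment for a Kindle address."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = kindle_address
    msg["Subject"] = title
    msg.set_content("Article attached as an ePub file.")
    # Attach the ePub using its registered media type.
    msg.add_attachment(
        epub_bytes,
        maintype="application",
        subtype="epub+zip",
        filename=f"{title}.epub",
    )
    return msg
```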
Michael: It’s amazing that—that—that’s been part of the Kindle experience for so long. If we could just take a—a moment—a momentary aside to praise the—some of the more adorable features of the Kindle. That that would definitely be one of them that I think was kind of ahead of its time and it endures. You know, I think I just got an email from Amazon telling me that my free, I don’t know what antenna it was, it was the second generation Kindle. It can—it had a cellular antenna and it had free cellular forever.
John: Oh yeah.
Michael: You know, it was like a Sprint or something. I finally got an email, yeah, like last year or something saying that for your second generation Kindle—which, you know, died long ago—the cellular service will be going offline, you know, soon. So pour one out for the free cellular service on the old school Kindle. But yeah.
Michael: It’s a great feature. And certainly in New York we see a ton of people reading Kindles on their commutes.
Chris: This is a digression, but one other fun little feature of the Kindle that I love that I feel like not enough people know about is you can rent library books on your Kindle.
Chris: Like, if you have a library card, you can rent an ebook and send it to your Kindle for the same length of time that you could take a physical book out of the library, right? Two or three weeks or whatever it is. You send it to your Kindle, you read it, you return it to the library. It’s great. There are apps. The one I use is called Libby. I think there’s another one called OverDrive. You put your library card in there, and it is super easy and it’s built into the Amazon ecosystem, and I think a few other e-readers too.
John: I have a Kobo. I read a lot of library books.
Michael: If this sounds interesting to you, but you’re not sure where to get access to a library that supports this, I think that the New York Public Library offers digital library cards. It may only be for students throughout the United States, but Google it. Check into it. Libraries are more accessible than you may think.
Chris: That’s great.
John: Support your library.
Michael: A little, a little digression.
Chris: Support your library.
Michael: A little…
Chris: I wanna come back to the—the underlying tech that you were describing, John, about—about how these articles get parsed. And I—and I wanna talk a little bit about how we build and maintain and distribute the core parsing engine and these extensions. Because for a minute, they were separate, right? The extension and the parsing engine. The parsing engine was hosted on servers that got called remotely. So talk a little bit about that and how that’s evolved over the, you know, the past couple years.
John: Yeah, it’s been pretty interesting. So it used to be there was an API endpoint that you send, I believe you just send it a URL, and it sends you back a JSON payload of the parsed content. And that worked really well. I think it’s the first thing that I would think to do if I was starting this from scratch. Put this behind an API. That ended up being extremely expensive. [laughs]
Chris: [laughs] I wanna go back. You said it’s open source. Are both the reader and the parser open source?
John: The parser is open source. So, yeah, the parser is all of the parts that are actually processing the website. So you give it either a URL to fetch or a full, already-fetched DOM, which is handy for websites that are changing after they load versus just an HTML payload. And then it turns it into a JSON object of title, article, author, published date, et cetera. It pulls out all the metadata. That is the parser, and that is open source. The reader is the browser extension that uses the parser, which is not open source, but is freely available in all of the browser extension stores for use.
Chris: Got it. So if I’m a developer and I wanted to use the parser, I can go grab it and fork it and, you know, set up my own instance of it to call it behind an HTTP API?
John: Yeah. The old hosted endpoint that we used to serve is also open source, so if you wanted to, you could just take that and get it up and running on Lambda or whatever serverless choice you want. I believe we also have a version of it that runs in Express.js if you wanted to do it that way. You could also just, yeah, you could take the library itself and integrate it into whatever you wanted to use. Anything that needed to remove clutter. Also, a lot of people just use it for identifying content on websites.
John: Like, you think about an RSS feed, for example. An RSS item for an article is similar to the page that the article is on, except that RSS is much more structured. It has title and author and content fields, all of the things that parser is doing. If you needed to, for example, generate an RSS feed from a site that doesn’t have one, parser could be a really good tool to start with because it does most of that work for you, and then you just have to expose that as an RSS file somewhere.
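The shape of the result John lists (title, article content, author, published date) could be modeled along these lines. The field names and the `ParsedArticle` and `to_payload` helpers are assumptions made for illustration; the parser's actual output keys may differ.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ParsedArticle:
    """Illustrative result shape; the field names are assumptions."""
    url: str
    title: Optional[str]
    author: Optional[str]
    date_published: Optional[str]  # ISO 8601 string when known
    content: str                   # cleaned article markup or text


def to_payload(article: ParsedArticle) -> str:
    """Serialize the parsed article to a JSON string."""
    return json.dumps(asdict(article), ensure_ascii=False)
```

Metadata fields are optional here because a page may simply not state an author or a date, in which case the payload carries `null` rather than a guess.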
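The RSS idea can be sketched as a small function that turns already-parsed articles into a feed. It assumes each article is a dictionary with `title`, `url`, and `content` keys (a hypothetical shape), and it emits only the title, link, and description elements of RSS 2.0; `build_rss` is an illustrative name.

```python
import xml.etree.ElementTree as ET


def build_rss(feed_title: str, feed_link: str, items: list[dict]) -> str:
    """Render parsed articles (dicts with title, url, content) as RSS 2.0."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = f"Generated feed for {feed_title}"
    for article in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = article["title"]
        ET.SubElement(item, "link").text = article["url"]
        # ElementTree escapes the markup, so the content travels as text.
        ET.SubElement(item, "description").text = article["content"]
    return ET.tostring(rss, encoding="unicode")
```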
Chris: I love that. That’s a great use case. I feel like another good use case is migration.
Chris: If you’re moving a bunch of content, especially from like a custom CMS that doesn’t have a good export function and you want to grab all that stuff and then put it into WordPress or Contentful, or wherever you’re going next, using the parser to get at the core content as opposed to the rendered pages for all of these things could really make your migration easier. So I actually think that there are a handful of use cases where the parser is really, really valuable, and it’s available. You could just go on GitHub and grab it.
John: In fact, we—we did release a WordPress plugin that uses parser. It’s called Calisto.
Chris: I didn’t even know that. That’s great.
John: [laughs] Does exactly that. Yeah. It’s—you just give it a URL and it creates a—a WordPress post format.
Chris: There you go. Two steps. I forgot we even did that. Two steps ahead. Wonderful.
John: Yeah, it’s just in the—the toolbox.
Chris: Talk to me about what it’s like to build a browser extension in modern times. Here we are in 2022. All of these browsers, Chrome, Firefox, and Edge have, you know, they’re very mature at this point. They’ve been around for years. They have well established paradigms around building extensions and what you can and can’t access. And I’m curious, you know, for those who are thinking about getting into this market or getting in, you know, building something that extends the web that we all, you know, interact with every day in our browsers. How is it? What is it like out there? What are the sort of best practices, what works? What doesn’t? That kind of thing.
John: You know, it’s easier than it used to be. There was a time when browser extensions or plugins or add-ons or whatever, you know, they’ve been called many things over time. It used to be really complicated and really fragmented, and some browsers supported them, some didn’t. It’s gotten, yeah, a lot more streamlined and consistent. The APIs for different things that you can do, like accessing tabs or opening new tabs or exposing a settings screen in your browser settings, all that is pretty consistent now. Getting your development environment up and running is a lot easier than it used to be. You can go into your extension settings in your browser, and this is true for all of them. By all, I mean Chrome, Firefox and Edge. Safari used to support add-ons. I’m pretty sure they don’t anymore. We don’t have an add-on for Safari because I think it’s not possible anymore, but for the others, you go into your extension settings, you basically sideload your extension from just a folder on your computer and it just runs from there. And so, like, the reader uses webpack, so you have the webpack development mode just running somewhere and it’s just generating a file and the browser is reading that file in real time. And it’s really nice to develop for. It refreshes, you know, automatically to some extent.
Chris: That’s really nice.
John: Yeah, it just kind of works. One tricky thing is, so there’s this manifest file that is required that sort of describes how your piece of code is exposed as an extension. There is Manifest V2 and Manifest V3. As of literally this moment, or I guess a couple days ago, the last time I checked this, cuz it is constantly changing, Chrome accepts V2 or V3. I think Edge requires V3 to publish a new extension in the store, but will accept V2 for legacy reasons. Firefox has talked about supporting V3 but does not yet. They’re almost the same, but there are small differences between them. So a recent thing we had to do was just make it so you could publish V2 and V3 versions of the extension with different commands so we can publish to all the stores at the same time. That is annoying. Hopefully that clears up in the future. If you don’t need to support Firefox, then just go with V3 and you’re good. There are also some polyfills that I think, again, as of the last couple weeks, are not really necessary anymore because the APIs are getting closer and closer together. So that is something to think about: this new functionality you get with an extension is fairly consistent but not perfectly consistent, and so you’re gonna have to deal with some platform issues, but overall, pretty straightforward.
Michael: Safari does support extensions, by the way. We might have to take a look at the roadmap. I don’t know. Audience.
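One way to produce V2 and V3 builds from a single source is to keep the shared manifest fields in one place and add the version-specific keys at build time. The sketch below is an assumption about how that might look, not the project's build script; `BASE` and `build_manifest` are hypothetical, and the extension name, file names, and permissions are placeholders. The key differences it shows (`browser_action` versus `action`, background `scripts` versus `service_worker`, and the separate `host_permissions` key in V3) are part of the published manifest formats.

```python
import json

# Fields shared by both manifest versions; the values are placeholders.
BASE = {
    "name": "Example Reader",
    "version": "1.0.0",
    "description": "Illustrative manifest, not the real extension's.",
}


def build_manifest(manifest_version: int) -> dict:
    """Return a manifest dict for Manifest V2 or V3."""
    if manifest_version == 2:
        specific = {
            "browser_action": {"default_title": "Read"},
            "background": {"scripts": ["background.js"]},
            # V2 lists host patterns alongside API permissions.
            "permissions": ["activeTab", "storage", "https://*/*"],
        }
    elif manifest_version == 3:
        specific = {
            "action": {"default_title": "Read"},
            "background": {"service_worker": "background.js"},
            "permissions": ["activeTab", "storage"],
            # V3 moves host patterns into their own key.
            "host_permissions": ["https://*/*"],
        }
    else:
        raise ValueError("manifest_version must be 2 or 3")
    return {**BASE, "manifest_version": manifest_version, **specific}
```

Calling `json.dumps(build_manifest(3), indent=2)` would give the text of a `manifest.json` for a V3 store, and the same call with `2` gives the V2 variant.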
John: Oh, okay.
Michael: Hello@postlight.com. Should there be a Safari extension for Postlight Reader?
John: Yeah, we could probably do that. Hopefully they use all of the same APIs, but if they don’t, you know, we can, we can put the work in.
Chris: You said submitting to the stores. Does that happen automatically? Or do we do that manually? Like once you’ve got a build, how do you actually publish it? For the Chrome Extension Store?
John: Yeah. We do that manually. You package up your file, which is just a zip file with all of your code and your manifest file in it, and you upload it. They have an interface for publishing. It’s similar to, you know, the Apple App Store or the iOS App Store, or you know, any of those sorts of things. It’s a marketplace. You just upload your code. There’s a review process that is opaque, but in my experience, pretty quick. It’s, you know, a couple business days to get a new version up. The nice thing is, I think all the browsers automatically update your extensions for you when you’re just using them. So users don’t have to worry about getting the new versions. They just sort of push automatically. But yeah, it is a manual process that is tedious because, you know, there are three places we need to publish to, and they all have, you know, separate logins and everything, but it’s not so complicated.
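The packaging step, a zip file containing the code and the manifest, can be sketched with the standard library. `package_extension` is a hypothetical helper and the file names are placeholders; the upload itself happens through each store's own publishing interface and is not shown.

```python
import io
import json
import zipfile


def package_extension(manifest: dict, files: dict[str, bytes]) -> bytes:
    """Zip a manifest and extension files into an in-memory archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # The manifest is written at the root of the archive.
        archive.writestr("manifest.json", json.dumps(manifest, indent=2))
        for path, data in files.items():
            archive.writestr(path, data)
    return buffer.getvalue()
```

Writing the returned bytes to a file such as `extension-v3.zip` gives the artifact to upload, and running the function once per manifest version produces one archive per store.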
Chris: If people are listening and they want to try it out, how do they get it?
John: Go to reader.postlight.com. You’ll see some screenshots. There’s more information, but you’ll also see download links to the three respective stores, and you can get it for whatever browser you’re using.
Chris: Well, except Safari.
John: Except Safari. Yes.
Chris: For now.
John: And it’s, like, Opera. Sorry, to the Opera users.
Chris: Oh, I forgot about Opera. Is Opera still around?
Michael: Oooh! You really had to, you had to bring that up, huh? You had to make those folks feel left out. Don’t they have enough challenges?
John: I don’t know. I bet they’re doing great. They’re happy with their choices. Shout out to Opera.
Chris: There are also a bunch of Chrome-derivative browsers that I think still support Chrome extensions.
John: Yeah. Yeah. Like with Edge, you can actually load from the Chrome Web store, so you have two options. You can get the Native Edge version or just load from Chrome. Yeah, so that’s, that’s pretty cool because they’re, they’re basically the same thing under the hood, just different logo.
Chris: It’s fascinating to me. I mean, it’s interesting to hear you describe, the way you were talking about the manifest files, how it’s still changing. So it’s like every minute, every day…
Chris: …is a different story. So it’s—it’s very much an active ecosystem to think about developing for the browser. Which makes sense because, so, I mean, so much of our lives are lived online and the window to the—the internet is, you know, Google Chrome, or Safari or Edge or whatever you’re using. And so being able to supercharge these, these windows to the internet, it’s really fascinating. And, you know, hearing firsthand what it’s like to develop for these platforms and figure out how to, you know, do it in a—in a sort of clean and efficient way is interesting. And I think, I think, you know, it’s gonna continue to be something people think about, especially as they have services that you know, naturally fit alongside the browsing people are already doing.
Michael: We’ve talked in other podcast episodes about how the number of native mobile apps people use on their devices is not that great. And a small number of apps dominate the amount of time spent within native mobile apps. Which is to say that the web browser is still the internet. For most of us.
Michael: A lot of the time when we’re not doing native appy things and it’s very much worth investing in, making the open internet running on top of HTTP or HTTPS enjoyable and fast and safe and stable and beautiful and all those things.
John: Yeah. It’s interesting you mentioned mobile because this is something that is desktop only. And I think, I’m not an expert on this, but I believe that it is not really possible to do this kind of thing in a mobile browser, at least on iOS. I’m sure on Android it probably is, cuz that’s just a more permissive platform. But yeah, this is for desktop, but also mobile views of most articles are a little less chaotic than I think the average desktop website. So this becomes…
Chris: It’s true.
John: …maybe less critical on your mobile device.
Michael: I would agree with that. And certainly on—on iOS, you’ve got Safari Reader built in natively, so you’re, you’re covered there. And that, and that’s a good experience. But it’s—it’s definitely true. I mean the desktop, you’ve got more real estate, which means more opportunities to make bad decisions.
Chris: And boy, do they.
[POSTLIGHT OUTRO MUSIC IN]
Chris: Well, this was great. John, thanks for coming on and walking people through it. If you’re listening and you want to try this out for yourself, please go to reader.postlight.com, or you can also check out our GitHub to fork the parser and go do something great with it yourself. We encourage that, and if you run across a site that doesn’t parse cleanly in Reader, please open a pull request for a custom extractor so we can fix it. Or just shoot us a note at hello@postlight.com and we will put some energy into fixing it up. John, thank you so much. Michael, thanks for joining.
Chris: And as always, as we close the show, you know, if you’re listening to this and something resonated or you want to give us feedback, or you just have a big technology project that you’d love to work on with us, we wanna hear about it. Hit us up at hello@postlight.com and we will be glad to chat. Thanks very much and we will see you next week. Bye!
[POSTLIGHT OUTRO MUSIC]