Rich Ziade I–I was at a barbecue once with a higher level executive at Pepsi, and Pepsi—I mean for those that don’t know also own—
Paul Ford It’s a beverage company.
RZ It’s a beverage company [yeah] but they also own like Doritos and Cheetos—
PF Yum! Brands!
RZ Yum! Brands, right? And I—I—wanna—
PF When you’re at the combination Pizza Hut and KFC and Taco Bell?
PF You’re inside a Pepsi planet.
RZ Yeah. I mean [yeah] it’s—it’s something and I—this executive, I think he was in marketing and I’m trying—I, you know, you’re at a barbecue and you just met someone, you gotta small talk, Paul [right]. You gotta just talk about stuff and he said—so I said to him, I said, “You know, do you worry?”
RZ And he said, “About what?” And he—I said, “Do you worry about the sort of trends around food and—and being healthy and eating healthy and you guys sell Doritos and Cheetos?” And I’m thinking, you know, this is a good conver—[chuckles] I’m holding, you know, mashed potatoes and a—and a hot dog and I was like, this is good—good [yeah] conversation. And he looked at me and he wasn’t—didn’t feel burned by it or offended and he just said to me, “We just keep eating chips.” [Laughter] [Music fades in, plays alone for 18 seconds, ramps down].
PF Rich, let’s give people a 30 second history lesson.
RZ Years ago I hacked together a JavaScript tool called Readability. It was put up as a bookmarklet.
PF Mm hmm.
RZ [Chuckles] Which—which is great and . . . it exploded very quickly, everybody was using it.
PF Meaning it didn’t—yeah, it didn’t explode in a breaking way, it exploded in popularity.
RZ Yeah, I’d never seen anything like it.
PF Tell—tell people what it did, just for clarity.
PF It just became this piece of web infrastructure.
RZ You couldn’t query the service—that’s probably terrible but it—it was running so hot.
PF It was so big it was hard to handle.
RZ It was tens of thousands of dollars in Amazon, AWS uh usage.
PF We started Postlight and Readability came with that and then [correct] suddenly we were like, “Oh! This is really expensive.” [Chuckles] And—
RZ It was also a little brittle, and it needed love.
PF It was brittle and—and look as a—as a startup it was in that stage where it’s like, is there gonna be some big giant path for this? [Correct] Or should we figure out something else?
RZ Correct. And one of the first things we did when we took it was shut down the consumer service which was hard. It’s not—I mean [stammers] it really—
PF It powered all of these sort of apps and—
RZ Apps and services, there were some very popular reading apps that were using it.
PF You built all kinds of things.
RZ Yeah and we said, “Look: we’re gonna drop the app,” and then created a little project—
PF Well now we had Postlight and we said, “Let’s do this but let’s do it really light and really easy to manage . . . and—
RZ Really well. It was a service; it was also still a service. So we got a key.
PF And that was called—that was called Mercury. Is called Mercury and the person [right] who created that is a Director of Engineering here at Postlight named Adam Pash.
RZ And Adam . . . is actually on this podcast right now.
PF From Los Angeles!!
RZ Los Angeles, California.
PF Adam, welcome to Track Changes.
Adam Pash Thank you.
PF Adam’s fun because you give him a task and then it’s quiet for a minute, and then there is a very rapid pace of GitHub commits.
RZ Yeah! And I have to say, what’s also impressive is he just became very committed to the thing he owned. He cared deeply about it and, you saw that, and you know for me it’s like your daughter is now 15 . . . and she brought home a dude. And it’s really—I mean it’s kinda odd, it was out of my hands—
PF No, you had an emotional connection to Readability but you gave up some control. Adam, tell us what you built. Like what is the thing that you—when we said, “Go, build this.” What did you build?
PF That’s the readability curse. I mean and we should point out: we love Python. Python’s a great language but the infra—
RZ I’m a big fan.
PF That’s right. I mean, Adam, what is the advantage? Why—why this stack? Why—why did you make the choices you made?
PF So how many people ended up using this thing in this incarnation?
AP When it was sort of running at its hottest, I think we had roughly five million API requests a day. So, you know, in a month that’s like 150 million or so. So it really got to be a very heavily used thing pretty quickly.
PF Well, and what people need to understand is that’s like 25,000 developers and those API requests might be powering an individual reading view or they might be powering something that another thousand people would look at. So—
RZ It speaks to its versatility, right? It could be leveraged by an app, it could also be a wonderful tool for porting information from one place to another. So if your stuff has been living in an old website and you need to just get it all over, you might use it for two months, but it’s hugely—
PF This is the thing, I mean we should emphasize: the web’s a mess!
RZ Web’s a mess.
PF Actually, since Adam spent an enormous amount of time in the messy web like talk about that. What did you find when you went to clean up the internet?
AP I mean it’s funny, you know, Rich’s original bookmarklet is a tool that I’ve used before—and I used to work at this website Lifehacker. I remember when we first posted about that bookmarklet and at that time the web was less messy than it is now but it was still very messy. Now, even when you are cleaning up, when you’re like selecting articles from the page, even within that it’s full of—of embeds and ads and messy pieces of data.
PF It’s bad. Is what you’re saying.
AP It’s a real mess out there.
RZ It’s because uh I think the web transitioned from, “Let’s get visual ads in front of people that we can sell,” to uh, “Let’s use this as a platform to understand people’s behavior.”
RZ I think this is a fascinating puzzle, a puzzle that will never be solved but be the program for a second, talk about how you do it.
AP What Mercury does is it looks at a webpage and it tries to see the webpage sort of the same way that we, as humans, see the webpage. So, it tries to pull out from that like the title; and the image; the content of it; and not the ads and sidebars and whatnot. And some of that stuff is relatively easy to do. Like, there’s a lot of metadata in the top of a—of a page that will tell you like what the title is for that and things like Twitter and Facebook, they have sort of guidelines that people follow in order to have their embeds show nicely when you link to them. And so those things have made a lot of those pieces really easy. The particularly difficult part of parsing a page and—and getting the content out of it is the actual like body, like the article content. And, the way that you do that is really through just a lot of heuristics that sort of look at the page, and the page is made up of DOM elements and all of these elements are sort of like in a tree and so on, but you basically go through all of those elements and start running heuristics on each individual element to get a sense of are there a lot of words in this? Are there commas in those—in those words? Does it look like sentences? Does this look like actual content or does this look like, you know, uh a sidebar? Or does it look like navigation? So it’s really a matter of applying a lot of sort of well thought out heuristics to a bunch of like gobbledygook.
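The kind of per-element heuristics Adam describes can be sketched roughly as follows. This is a minimal illustration in Python, not Mercury's actual code: the function names, the weights, and the set of negative class-name hints are all assumptions made for the example, and it scores pre-extracted text blocks rather than walking a real DOM tree.

```python
import re

# Hypothetical class-name hints for page chrome; a real extractor would use
# longer, carefully tuned lists.
NEGATIVE_HINTS = frozenset(
    {"sidebar", "nav", "footer", "ad", "ads", "comments", "menu"}
)


def score_block(text, class_name=""):
    """Score how much a block of text looks like article content."""
    words = text.split()
    score = 0.0
    # Lots of words suggests prose rather than a menu or a label (capped).
    score += min(len(words) / 20.0, 5.0)
    # Commas are a cheap signal that the words form real sentences.
    score += text.count(",")
    # Count sentence-ending punctuation followed by whitespace or end of text.
    score += len(re.findall(r"[.!?](?:\s|$)", text))
    # Penalize class names whose tokens hint at sidebars, navigation, or ads.
    tokens = set(re.split(r"[^a-z0-9]+", class_name.lower()))
    if tokens & NEGATIVE_HINTS:
        score -= 10.0
    return score


def pick_best(blocks):
    """Return the (class_name, text) pair with the highest score."""
    return max(blocks, key=lambda block: score_block(block[1], block[0]))


# Made-up blocks standing in for elements pulled from a page.
blocks = [
    ("nav", "Home News Sports Contact"),
    ("article-body", "When the storm arrived, the town was quiet. "
                     "Nobody expected it, least of all the mayor, "
                     "who had gone home early."),
    ("sidebar", "Related stories, trending now, and more. "
                "Click here. Read this. See that."),
]
best = pick_best(blocks)
print(best[0])  # article-body
```

The sidebar block actually has more sentence-like punctuation than the article here, which is why the sketch combines several signals instead of trusting any single one.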
PF How close does that get you? What’s the best you can do with out-of-the-box Mercury?
AP I mean that gets you really close on 90 percent of the websites out there. What you often will end up with is maybe like an imperfect parse, where like it’ll get the content but it might include some pieces that you didn’t want or it’ll get the content but it might miss just like the last piece of it or the first piece of it because some weird way that they’ve done the markup on the site. So we actually have tooling within Mercury which is, this is a plug for any developers listening who wanna contribute to a really great open source project where you can write custom parsers for any website that will essentially take the guesswork out of it and just say, “Here is where the content is. Here’s how to clean that content. Here’s how to transform pieces of it like lazy loaded images so that they’re actual images that you can see and not just like scripts.” So that’s—that’s sort of how it works and how we kind of make [music fades in] sure we get the best parse possible when we can [music ramps up, plays alone for six seconds].
PF Let’s interrupt this podcast—
RZ To talk about [music fades out] ourselves more!
PF That’s right. Listen: we’re your partner.
RZ We are your partner.
PF You come to us, it’s not just that you need to build something. You come to us because you just—you need—you need a plan.
RZ Yeah, you need a plan and you need to—you need to boil that vision down.
RZ To something meaningful and tangible that you can chase and go after and then if you want us to continue to help you, design and build and ship something great, whether it be a platform or an app, you should talk to Postlight.
PF You know people walk in and they say, “We know we have a lot to do in the next couple of years. And we’re just lookin’. We just wanna understand how people think.”
RZ Yeah. And we love those conversations.
PF Yeah because it’s—fine. Let it take six months.
PF But that’s it. Like we are here for the conversation—it’s not just like, “Come in and we’ll build your software! [In aggro tone] Right away! Let’s go!” We are here to learn what you need and to figure it out with you and to partner with you and [yes] to help you to like when we give it to you, it’s not like we go away, we’re still there to stand by the work.
PF That’s Postlight. Let’s keep talking to Adam about Mercury [music plays alone for seven seconds, ramps down].
RZ It’s worth mentioning [music fades out] the appeal of something like this and why it’s appealing and why there’s this sort of battle, right? You know, usability is a funny thing because usability can actually be very disruptive. Good usability can actually run counter to business interests and business motivations. People obviously when they wanna read something, you know, they want a nice, comfortable experience but people have to make a living as well.
PF Well don’t—don’t forget ten years ago, any long article was split into like 30 separate pages so that you could put an ad on every one of those 30.
RZ That was a huge update for us, actually. Um was the pagination update where we stitched together multiple pages. They’ve stopped that, more or less, articles [you know] don’t do it much anymore.
PF You know where you see it now is like “The Ten Things You Never Knew About Angelina Jolie’s Thyroid”.
RZ “What Does Andrew Dice Clay Look Like Today”
PF And then they get—you get there and they’re like, “Lots of people get older,” [laughter] and they just show you a pic—you know and then like click. And then [yeah] like you’re 45 slides in and you’re like, “Where’s Andrew Dice Clay?”
AP I don’t understand. How—how is it the chum of all of the content on the internet has—has been able to sustain that sort of advertising model because the only reason everyone else got rid of it was because it became no longer useful.
RZ I—I think there’s always gonna be that bit like—
PF I think, also, incredibly cheap labor overseas. Like you’re not hiring editors or writers anymore. You’re saying, “Here’s Wikipedia [yeah] and go steal some content and we’re gonna just—let’s see what happens.”
RZ I think people like to just eat garbage on—sometimes.
RZ You’re on the toilet and—and I don’t wanna get crass but you just don’t feel like really going into—
PF Well sometimes they getcha. You’re reading the article and you get to the bottom and they’re like [yeah], you know, “These Child Stars Have Skin Conditions Now.” And you’re like, “Oh my God!”
RZ “What’s going on?!”
RZ Yeah. And, you know, there’s the weight loss stuff and then there’s, you know, that this new study came out. It just it goes on and on and on, right? So, anyway, people like junk. I do wanna say: I am proud of one last thing, since this is a farewell to Readability, I really think it actually changed how designers thought about reading on the web.
PF Well I don’t think it’s—it’s—you didn’t think that. It happened.
RZ It did.
PF We heard from prominent designers who built really big reading experiences [yes, yes] that you all know in your household that it was very influential and that it made them rethink how they were putting text online.
RZ The columns. I remember a designer at a prominent New York City newspaper said, it became a mandate, “Get the crap off the columns.”
PF That’s right.
RZ And that was the beginning of it and then, you know, the redesign and we should have an entire podcast on Snowfall and how it will be looked back upon as the Citizen Kane of longform web reading [laughs].
PF No it won’t [laughter]. It’ll be—it’ll be like, “Do you remember they did that thing where like the skiers all died?”
RZ I just—I just know they spent 150 grand on that guy.
PF Oh no, no, no, [God bless] they—they—
AP I mean does—does Snowfall still work?
RZ Well let’s not get into that, Adam! [Crosstalk]
PF No, probably not. First of all, they spent way more than 150 grand.
RZ That’s—I don’t even wanna—it’s just, I think it’s a wonderful thing.
PF I don’t think they sold any ads on it. For the pe—[crosstalk from Rich] for the people who don’t care about New York City media which is literally everyone [Rich laughs], um the New York Times one day just out of the blue published this huge rich media exciting, wonderful thing—
RZ It snowed at the headline.
PF Oh yeah—
RZ Let’s just talk about that—
PF Yeah. And it had like videos. Oddly, it was about a bunch of people like dying in avalanches which was just—it was just—
RZ They were trapped.
PF Yeah, and like I wanted it to be about like, you know, how to pick a better car but, no, avalanches.
RZ I thought it was really good.
PF Ok. Did you read it?
RZ I did read it.
PF You read all the words?
RZ Yeah, because it was terrible. There were people stuck and they weren’t gonna get out, and you’re reading more and more and it’s weird because it’s this sort of life and death story but then you’re like, “Oh, what a cool infographic” [laughs].
PF Adam, what did you think about Snowfall?
AP I, you know, it’s funny, I was working in media at the time and I think like it influenced everyone like we were like, “What’s our Snowfall?” You know? Like, [laughter] “What can we do?”
RZ I thought it was great. I don’t know.
AP No, yeah, I mean I was very impressed. You know like you’d scroll—part of the—part of the joy of reading it was like you keep scrolling and you’d be like, “Oh what’s—what’s this now?”
RZ They’re still doing it today. It’s still being done. Some people do it poorly but it’s still being done.
PF You know what was cool about it was they went, “Ok. Everyone’s been talking about sort of this rich media like, you know, [yeah] amazing experiences that you can create online,” and [mm hmm] they went, “Let’s do it.”
PF Everybody was waiting for someone to do it and they did it.
RZ And they did it well.
PF And let’s be clear: Mercury is the opposite of Snowfall.
RZ It just felt like a clean experience [right, right]. I think is—was the point we were getting at but we went really inside baseball on this but Snowfall is a fascinating story [it is]. I think it’s an interesting, interesting experiment.
PF But this is the other direction. This is: create a piece of infrastructure that can pull the simple, clean stuff out [yeah], and make sense of it.
RZ Well it’s treating the web like data which is really kind of, you know, back to the essence of the web.
PF Still that was going to be the whole point and then advertising showed up—
RZ And then advertising showed up and—
PF And social and so on. So now, Adam, alright, you did this thing. It kind of quietly explodes which is always a weird thing because you have this giant platform but it’s not—it can’t be a Postlight priority because we’re doing all this work for clients and it kind of—it rolled along working for I guess probably about 18 months. Two years.
RZ Mm hmm. A couple years and—and—
AP Yeah, over two years.
PF And then we just looked at it and we’re like, “Why maintain this when a community really wants it?” And people kept coming to us and saying, “Hey! Do you need any help? Are you gonna open source that? Don’t take that away. We like that. That’s important to our entire business.” Hey? Like it was a very—I would get these calls and people would be like, “Yeah, no, Mercury that’s—that’s gonna be ok, right?”
PF Because people tried to build their own but they just don’t put the energy in and they don’t have the background and they don’t [yeah]—they don’t get their own Mercury. So—
RZ And—and it’s a weird conversation. And this is—whether to open source it or not is a funny thing. Open sourcing—Postlight is a commercial business and it is a piece of intellectual property, right? [Mm hmm] And—and you end up asking the question even though you feel like it’s the right thing to do is like, “Are we now giving away a lot of value that we could hold onto and keep?”
PF You know what I see is Adam’s worked on things like starter kits and we have one for WordPress as well, like a headless WordPress starter kit on GitHub. And . . . that brings an enormous amount of attention and credibility towards the firm and we’re using so many open tools anyway that it’s kind of understood like, “Oh, you’re gonna contribute at this level? That’s cool.” [Yeah] “Ok. I respect that.” So, to me, putting this out open was going to be a way to say to larger organizations, “We’re comfortable playing in this altitude.”
RZ Yeah. Exactly. And—and so there is a net benefit is what we’re saying. So from a purely like business perspective, the halo effect of going out there and really introducing ourselves to different communities out there that we otherwise wouldn’t meet is really powerful. Beyond that, it’s a good feeling. It’s nice [it is]. I mean it’s—not many private businesses get to actually put energy towards something that goes open source.
PF Well and release something big that has impact.
PF So, talk people through the process of a piece of intellectual property is inside the firm and now it’s going to go out into the world, what work needed to happen?
RZ I was like, “What’s the big deal?”
RZ “Just co—move the code over to GitHub and call it a day.”
PF It was already in GitHub. Can we just say “public”?
RZ Yeah [laughs] like I didn’t understand why there was all of this scrambling and discussion. Tell us what’s behind taking something open source.
AP A big thing about open source software, especially if you’re a developer, you can get a sense of this really quickly is if you’re using something and you install something and you can tell from like the GitHub repo that this is dead software; that no one’s maintaining it, that it’s got like a thousand issues that are unresolved, and a bunch of pull requests that no one’s tending to, you know that it’s not a reliable piece of software. You don’t have a lot of faith in it. So, a lot of the goal of open sourcing Mercury was organizing a little infrastructure and also making sure the code was in such a state that we could manage it, manage the open sourcing, you know, manage the community in such a way that it wasn’t going to require all of our time, and we could be like a good steward of this code and ensure that . . . the community who uses it can trust it; can contribute to it; and can feel just really good about it.
PF Ok so I mean you start with like a big lump of code.
PF What—what do you do next?
AP So one thing you do is you go clean out all of the unnecessary comments that you had in it, but then there’s also just, there—there are pieces of code—anyone knows this who has a project that is sort of like a more personal project, there are pieces that are sort of unresolved or like don’t make sense that you would need to update. Another thing you wanna do is like update the dependencies of a project. So like, you know, almost every project that anyone spins up on the internet uses other tools, such as Mercury, in their project but like those go out of date and there’s security updates and so on, and it’s easy to sort of deal with that stuff on your own, sort of quietly, but like once it’s public you wanna make sure it’s like got the latest dependencies of everything and everyone can feel really confident about the quality of the code. So, you know, you start doing that.
PF The dependencies have dependencies. Like Mercury probably [yes] interacts with, you know, thousands of other pieces of code.
AP Yeah and when you—
RZ Oh yeah.
AP When you’re upgrading those dependencies, then like things break because dependencies change, their APIs adjust. So you have to then go into your code and like after you’ve updated then make sure everything actually still works. You run your tests again. And so it was a lot of process around just sort of doing all of those things and preparing the code so that we could feel really good about it when we released it to everyone.
PF How do you know when you’re ready to go open source?
AP [Chuckles] Well I think what I learned also is that you don’t. I would have certainly been happier with like two more weeks just to like sit around with the code and feel good about it—
RZ [Through laughter] Sit around with it.
AP [Laughs] Just get comfortable—
RZ Just print it out and—
AP Cuz you know—you know cuz every time you make changes to—to code you sort of feel like you need time to feel a hundred percent sure that you didn’t totally mess it up.
PF Mm hmm.
AP It would—it feels nice to have that time. It really—
PF But you felt the firm thumb of Postlight leadership sort of—
AP The firm thumb and, you know, it really helps get things out the door [Rich chuckles]. I should mention another thing that we did in the process of open sourcing it was we were in the process of sunsetting the Mercury parser API which was that API that was doing, you know, millions and millions of requests a day. So, one of the things that we also needed to do in the process of open sourcing it was open source that API code so that people who wanted a drop in replacement for the API that we’re currently hosting, could really easily download that code, deploy it to AWS and have like literally the same code, running on the same infrastructure as it currently is just on their AWS account rather than ours.
PF Yeah we tried to set them up for success. You know actually there’s a big question that comes up with all these projects which is licensing, how you pick a license.
AP We have a dual license of Apache and an MIT license. The purpose of that being, as not a lawyer, to make the code just as accessible to whoever wants to use it as possible.
RZ Got it.
PF So, it’s—it’s worth noting, too, like you spent an enormous amount of time on this but you had support from the team here in the US and also the Lebanon team.
AP Yeah, definitely, like the open sourcing of Mercury was a pretty big undertaking and, you know, we had a lot of help from the team both here and in Lebanon and, you know, it certainly couldn’t have gotten done without them.
PF Well you can actually see everybody who contributed in the contributors on GitHub.
RZ Yeah. Correct.
PF It’s cool. A lot of people in the company have touched this thing.
RZ Do you—what could we tell the community? What should people do with Mercury? Like how can they contribute?
AP As an open source project, we are extremely excited about getting contributors. We already have gotten some good contributions on GitHub. One of the easiest ways that anyone could start contributing would actually be to create a custom parser. I referenced this briefly earlier but a custom parser allows Mercury to—to know exactly where to find all of the relevant information for any website but you just define a specific website. So if I wanted, you know, to write a parser for the New York Times, which, you know, spoiler alert: that one already exists but I could—I could do one that does exactly what the New York Times needs to get the best output possible. We have a really good README about how to create custom parsers and it’s—it’s honestly one of the best ways that—that anyone could start contributing to Mercury.
RZ You don’t really even have to—you’re not really programming, per se, that’s what’s nice about it.
AP It’s really—if you—if you are comfortable like even writing CSS like the way that CSS selectors work, you can write a custom parser pretty easily.
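A custom parser of the kind Adam describes might look roughly like this. The sketch is in Python and does not reproduce Mercury's real format or API: the `EXAMPLE_PARSER` dictionary, the `news.example.com` domain, the selectors, and the helper functions are all hypothetical. The matcher only understands `tag`, `.class`, and `tag.class`, and it assumes well-formed markup with explicit closing tags.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


def matches(selector, tag, classes):
    """Match a simple 'tag', '.class', or 'tag.class' selector."""
    want_tag, _, want_class = selector.partition(".")
    if want_tag and want_tag != tag:
        return False
    if want_class and want_class not in classes:
        return False
    return True


class FirstMatchText(HTMLParser):
    """Collect the text inside the first element matching a selector."""

    def __init__(self, selector):
        super().__init__()
        self.selector = selector
        self.depth = 0      # greater than 0 while inside the matched element
        self.done = False   # set once the matched element has closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        else:
            classes = set((dict(attrs).get("class") or "").split())
            if matches(self.selector, tag, classes):
                self.depth = 1

    def handle_endtag(self, tag):
        if self.done or tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def select_text(html, selectors):
    """Try each selector in order; return the first non-empty match."""
    for selector in selectors:
        parser = FirstMatchText(selector)
        parser.feed(html)
        parser.close()
        text = " ".join(" ".join(parser.chunks).split())
        if text:
            return text
    return None


# Hypothetical custom parser: tells the extractor exactly where to look
# on one made-up site, with a fallback selector for the content.
EXAMPLE_PARSER = {
    "domain": "news.example.com",
    "title": ["h1.headline"],
    "content": ["div.story-body", "article"],
}


def parse(html, custom_parser):
    """Apply every field of a custom parser except its domain."""
    return {
        field: select_text(html, selectors)
        for field, selectors in custom_parser.items()
        if field != "domain"
    }


SAMPLE = """
<html><body>
<div class="nav"><a href="/">Home</a></div>
<h1 class="headline big">Storm hits town</h1>
<div class="story-body"><p>The storm arrived<br>at noon.</p><p>Nobody was hurt.</p></div>
</body></html>
"""
result = parse(SAMPLE, EXAMPLE_PARSER)
print(result["title"])  # Storm hits town
```

The point of the design is the one Rich makes: the per-site part is just a small table of selectors, so contributing a parser is closer to writing CSS than to programming.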
PF That’s right. Well, Adam, it was great work.
PF And now you have something that you’ll have to take care of for the rest of your life.
AP [Sounding nonplussed] Yeah, that’s very exciting. I appreciate the opportunity [all laugh] [music fades in].
PF Well, Rich, open source Mercury.
RZ I’m very proud of this project.
PF You should be. It’s the end of the road but it’s a new beginning. Dun dun dun!
RZ I’m very proud of what the team did. You know, I, you know, it’s really been in their hands and it’s their baby for the last couple of years and it’s quality work and it really shows off what we can do.
PF Lemme tell you as a nerd. As a nerd, I love this thing. It’s got a command line tool; I can get the content out of—out of any webpage [yeah]; and I can do whatever the hell I need.
RZ It’s pretty hot. Yeah.
PF Like, if you told me, “Paul, I need to migrate a thousand sites into one WordPress instance in the next two weeks,” now I have—I could do something crazy like that.
RZ You also love binging on data.
PF I like a lot of data! I like big data. I like big, messy, textual, semantic, sloppy nightmares. It’s fun for me.
RZ It is fun. That’s the web.
PF And then can you put it all together and make something new and exciting? So this is what we really did here. It’s not just about the reading view, we put out this power tool for pulling millions of data points together. Billions, if you want . . . that anyone can use, totally for free, for as long as the web and GitHub is around. And longer.
RZ Power tool is a great way to put it.
PF Yeah, it’s not—it’s—you built a good experience with Readability and so that became a big part of it but the infrastructure is what—experiences do come and go. Infrastructure [yes] lasts for decades [mm hmm] and this thing is edging up on a decade.
RZ It sure is!
PF And it probably will have two or three more to go.
RZ You know as long as the web is around.
PF The web’s not going anywhere.
RZ [Laughs] The web’s not going anywhere.
PF Everybody—everybody’s trying to kill it everyday [Rich laughing]. It won’t die.
RZ It won’t die. It does put on some weird outfits sometimes [it does but you just—] but it won’t—it won’t die.
PF —make your peace with it and if you need to get content out of a big, messy, sticky webpage produced somewhere . . .
RZ Go get Mercury.
PF That’s right. Ok. Let’s get back to work.
RZ Have a great week!
PF Bye [music ramps up, plays alone for five seconds, fades out to end].