The history and the future of geotagging: this week Paul Ford and Rich Ziade talk to Aaron Straup Cope, a programmer who works with maps and geographical datasets. The conversation covers his time as one of Flickr’s earliest employees, data visualization, gazetteers, the evils of Wal-Mart, geocoding (and reverse geocoding), and one of the most controversial decisions in online mapping — Google’s decision to cut off the poles and make the world a square.
Paul Ford: Rich Ziade!
Rich Ziade: Paul Ford!
Paul: It’s Track Changes, the official podcast of Postlight agency. Rich, tell me what Postlight does.
Rich: I love that we say “official,” because there are all those —
Rich: Bootleg podcasts.
Paul: Bootleg podcasts.
Rich: That claim to be for Postlight.
Paul: Don’t interrupt me with your, with your…
Rich: All right.
Rich: Postlight is a digital studio that designs and builds amazing apps and platforms.
Paul: Check us out on the global information superhighway at postlight.com.
Paul: And we are very, very lucky today.
Paul: Because we have a man in the studio who has three names.
Rich: Those are the best, man. Go.
Paul: Aaron Straup Cope.
Rich: Not only that, but they’re like, sharp names.
Paul: Yeah, you get out there with — Aaron, hi, how are you?
Aaron Straup Cope: Hello.
Paul: OK, let me tell you about your name. You get in there with the Aaron, and you’re like, oh, yeah, I’ve been here before. And then, STRAUP. Which actually sounds like — you sharpen a razor on a Straup. You know, you Straup a razor. Is that right?
Aaron: I don’t actually know what the etymology of the name is.
Paul: Somebody’s gonna tell us, at email@example.com. And then Cope! Which is just, like, you’d better, you’d better get ready. You’d better actually prepare yourself. Because you’re going to need to Cope with this individual. Aaron Straup Cope. Hi, thank you for being here.
Aaron: It’s a pleasure.
Paul: You’re hard to explain. Because you’ve been around for a while, and you do a lot of different stuff. Let me see if I can tell you what you are and then you can tell me if I’m wrong.
Rich: Jump in there anytime.
Paul: Yeah, feel free. I do this a lot. I would say your baseline is that you are a technologist. You’re a programmer and you think, you work with computers and have worked with them from your upbringing in Canada, and your life in Vancouver. You, you were…what was your first computer?
Aaron: Timex Sinclair.
Paul: Oh, like a 1000 with the membrane keyboard?
Aaron: I think so, yeah. My father tried to get me into computers very young, like around the time that they were out, and it didn’t stick.
Paul: No, those were terrible, terrible computers. If everyone ever — they actually had the programming language built into the keyboard, and so print, you’d had to like, jam this membrane with your little tiny boy thumb, or girl thumb.
Rich: There were these weird non-standard keys at the top…
Paul: It also had 2 —
Rich: Chicklet-y keys —
Paul: 2K of memory, which represents about 148,000th of a second of this broadcast.
Rich: I’m a fan of constraints.
Paul: That’s…that…that’s not a constraint. That’s like one page.
Rich: Oh man.
Paul: So I think I first met you at an event like, at The New York Times. They had an open hack day for the first time ever.
Paul: I met you there. I don’t know if you started with this, but like, one of your, one of your jobs early days was at a little organization that was creating a game called Game Neverending.
Aaron: Yeah, I arrived after GNE, and before Flickr. I arrived in month 10 of Flickr.
Paul: OK, so Game Neverending was a game that people played online in Flash, and then it pivoted, suddenly — It was created by…what was the company?
Paul: And it pivoted and it became Flickr, and then you came along.
Paul: So what was Flickr like early days? How many people were there?
Aaron: I was employee nine or ten.
Paul: Oh my.
Aaron: I think.
Rich: Very early.
Aaron: Yeah. It was…small, heads down, everybody worked super hard. It’s a weird thing to explain to people, because the expression that I’ve always used is a terrible expression, because it comes across as all the worst habits of the way people talk about technology now, which is the sort of macho “killing it,” “crushing it” rhetoric which is pretty tiresome.
Paul: Bonin’ it.
Rich: I have never heard —
Paul: Kickin’ it. Throwin’ up on it.
Aaron: I mean the way that I described it was, and again it’s a terrible description, but it was that there was a culture of shame, which meant that you felt bad if you weren’t working as hard as everyone else, like…the people that I worked with? You were just sort of in awe of what they had accomplished.
Paul: I mean, this sounds amazing and it sounds very productive but also sounds like a terrifying cult. [laughter]
Aaron: I suppose. [laughter]
Rich: You’re all kind of working on the same thing, and you’re trying to…you’re seeing everybody work hard, and you’ve got to push up your side of the house, right?
Paul: You’re fairly young, too, right? Like, you’re in your twenties, or…
Aaron: No, I was in my thirties.
Paul: You were in your thirties?
Paul: So you, at this point in your life, I’m assuming, like, occasionally you wanna like, go to bed. See a movie.
Aaron: Yeah, and I didn’t.
Paul: You didn’t —
Aaron: I mean it wasn’t…you know, everybody had lives, but we also had this thing that was growing faster than anyone quite knew what to do with.
Paul: Oh, so that’s fascinating, right? I mean, we’re talking about all the hard work. But if you’re seeing something scale up, that is, like, a once-in-a-lifetime experience.
Aaron: And it was also, I mean, the thing that I used to say to people was the great advantage that Flickr, or now, like, any other photo-sharing website has, is that we dealt in photos.
Aaron: Everybody loves photos. Everybody does photos. So, you know, at the end of the day, there would be all the technical challenges and then, you know, some days there would be issues around moderation and, just, content, but mostly we just had this enormous, beautiful monster at the end of every day, and you would stop and think, well I’ve seen all the photos, and then you’d turn around and be like, oh look at this. And it was amazing.
Paul: There was always another picture to see, right?
Aaron: Yeah. I mean, the first experience I had of that was, you know, I worked on this, what we joked about as Flickr with no ambition, I worked on a project called The Mirror Project with Heather Champ.
Paul: Oh, I’m in The Mirror Project.
Paul: I submitted a picture. OK.
Aaron: And for, you know, for people who don’t know, The Mirror Project was, the byline was “Adventures in Reflective Surfaces.” And it was pictures —
Paul: It made everybody really angry, too.
Paul: Yeah, I remember people in my world just being, oh, the narcissism project. Like, the idea that people would take pictures of themselves and put them on the internet was seen as the basest, narcissistic awfulness.
Aaron: Well so I don’t wanna —
Paul: I did it. I was in there.
Aaron: I mean, the story that Heather tells is she used to go on vacation and she would have pictures of everybody else that she was with except herself. So she had no memories of her presence on that trip. So she started taking self-portraits in the bathroom mirror, and the first leap of faith is that the next logical step is to take those pictures and to put them online.
Paul: That was a big leap of faith back then.
Aaron: And then the second leap of faith that you just have to accept is that random strangers will start sending you pictures of themselves in bathroom mirrors.
Paul: But we were so naive, right, because we didn’t think they would just constantly send pictures of their penises, which is what they do now.
Aaron: Right. Well, we didn’t let people just upload stuff. Everything went through an editorial filter. I mean, Heather did —
Aaron: Pick and choose.
Paul: And she later became sort of the moderator of moderators for Flickr.
Aaron: For Flickr.
Paul: So you rode the tidal wave of both good and bad internet behavior. You were there in this culture as thousands and thousands of images are suddenly coming in and going out into the world at once…
Rich: I’m sure there’s some horrible stuff coming in always. Good and bad is coming in always.
Paul: But you’re also seeing that at scale. Like, I mean, that was one of the first times people had probably seen people misbehaving visually at scale.
Rich: Aaron, what was your role, specifically, at Flickr?
Rich: What corner of the world were you thinking about or working on?
Aaron: Engineering. So I was engineer #2.
Aaron: For the, for the application. There were other engineers. But Cal Henderson, who was the architect of the site, did everything.
Rich: Mmm hmmm.
Aaron: And then I came along to relieve a lot of the burden, and so I did —
Rich: You were touching — you weren’t, like, focused on metadata or some corner of it. You’re so early on that you’re touching everything.
Aaron: And so I did a lot of just grunt work and leg work, to keep the site up and to give Cal the time to work on other things, and then as the team grew and there was more space, I was working on all of the backend for the geotagging, and then sort of side projects like machine tags at Flickr.
Paul: You’d had a website for a long time, aaronland.net. There was always interesting stuff. It always felt very close to my world. You’d be, like, messing around with XML one day, or just document stuff. And I was like, oh, I kind of get that guy, that’s cool, that’s impressive. And then you just started to deal with ever larger sets of data, like, and get, do stuff like geo early. This is like 2008, you were doing…?
Aaron: Geotagging launched…it will be the ten-year anniversary on August 28th of this year.
Paul: So 2006?
Aaron: Flickr did geo in 2006. I was doing geo when I was scraping The New York Times…
Paul: So like, Google Maps is out, and like, Flickr is out.
Paul: And then those were —
Rich: The game’s changing at that point.
Paul: The game — that was intense stuff, and it was big, and I remember you were doing things like…I don’t know, just sort of drawing maps of the world using Flickr’s data, right? You were doing…
Aaron: That was called the shape files, or the alpha shapes.
Aaron: So what we had was Yahoo had this enormous gazetteer.
Paul: Tell the listeners what a gazetteer is.
Aaron: A gazetteer is a big list of all the places on earth with stable identifiers and pointers to other places that have a relationship with that place.
Paul: Like what would a record be in a gazetteer?
Aaron: So New York City might be ID12345.
Aaron: And if we imagine that it’s a JSON file, then it’s just structured data, so we’ll have a name, a stable ID, hopefully a geometry, so you can see the shape of New York, and then it will have pointers to New York City as part of the region of New York, which in turn has its own ID and its own JSON file. And then it’s part of the United States…
Paul: So it’s not just like the shapes, it’s not just the names, but it’s those things in relationship, because New York City is part of a state called New York, which is part of a country called the United States.
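As a rough illustration of the kind of record described here, the following Python sketch models three gazetteer entries. The IDs other than Aaron's example 12345, the field names (`id`, `name`, `geometry`, `parent_id`) and the `ancestors` helper are all hypothetical, chosen only to show a stable ID plus a pointer to a parent place. No real gazetteer schema is implied.

```python
import json

# Hypothetical records: IDs and field names are made up for illustration.
# "geometry" is left empty here; a real record would hopefully carry a shape.
RECORDS = {
    12345: {"id": 12345, "name": "New York City", "geometry": None, "parent_id": 2345},
    2345: {"id": 2345, "name": "New York", "geometry": None, "parent_id": 345},
    345: {"id": 345, "name": "United States", "geometry": None, "parent_id": None},
}


def ancestors(place_id, records=RECORDS):
    """Follow parent pointers upward and return the names along the way."""
    names = []
    current = records[place_id]["parent_id"]
    while current is not None:
        names.append(records[current]["name"])
        current = records[current]["parent_id"]
    return names


if __name__ == "__main__":
    # Each record could live in its own JSON file, as described above.
    print(json.dumps(RECORDS[12345]))
    print(ancestors(12345))
```

The point of the parent pointer is that the city record never has to repeat the state or country data; it only stores an ID that leads to them.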
Aaron: Yeah, I mean, so part of the reason that we glommed on to gazetteers at Flickr was just purely practical reasons, which is in 2006, there was not globally available geographic data at the scale we needed.
Aaron: So it was just a question of coverage.
Paul: I mean, you couldn’t just go out and get a bunch of pictures of the world, like you can now with OpenStreetMap and things like that.
Aaron: Right. I mean…
Aaron: It just didn’t exist.
Aaron: People didn’t have it. I mean, when we launched geotagging, the map that was done by Yahoo, it was pretty embarrassing for everybody. They fixed it quickly enough, but when we launched, we didn’t have street-level data for London.
Aaron: So what people saw when they tried to go to geotag their photos was a giant grey blob.
Paul: Well that’s —
Rich: Oh boy.
Paul: That’s not unlike London sometimes. [laughter]
Aaron: Yeah. But so if you fast-forward to 2009, 2010, when Yahoo had decided to give all of the mapping services to Nokia, or Navteq, what you discovered was Navteq didn’t have street-level data in 2009, 2010 for Tokyo, because they had never driven it, because they weren’t able to create a deal. So there was always this question of this sort of toxic relationship of coverage and licensing and quality. And so because we wanted to be able to tell people where all their photos in, you know, Kips Bay or the Upper East Side were taken, we needed to just, as a practical matter, store a fixed number of IDs per photo.
Paul: So you had to work backwards to a map of the world in order to get your job done.
Aaron: Kind of.
Paul: Yeah. That’s a nice, nice sort of constraint. First create a map of the world. Second…
Aaron: Well so one of the things we were able to do was, you know, as people geotagged their photos and as we were able to tell them where they were, the service that we had for asking where someone was would return bounding boxes, so it would return rectangles. And the problem is that if you have multiple overlapping rectangles (for neighborhoods, for instance) the question then is, which one did you mean?
Paul: OK, so I live in Ditmas and I take a picture, but that could… it kind of overlaps a little bit with Kensington in people’s imagination, and then…
Paul: I get them both back and — you had the responsibility as an application person to give me an answer.
Paul: Not to be like, we don’t really know. Computers aren’t allowed to say that.
Paul: We don’t want this ambiguous stuff back.
Aaron: The problem was that we had bounding boxes, and so if you take the state of California, and you draw a box around it, what you end up with is a quarter of the possible results are just wrong, because they’re in Utah or Nevada, and the other quarter are, for the purposes of photo sharing, impossible because they’re in the water.
Paul: Oh, so you’re talking, literally, a big rectangle around California.
Paul: That’s what you’re dealing with, because that, the computer is good at that kind of math.
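The bounding-box problem is easy to reproduce. The Python sketch below uses a rounded, approximate rectangle for California (the numbers are assumptions for illustration, not data from Flickr or Yahoo) and checks two points that are clearly not in the state: Las Vegas, Nevada, and a spot in the open Pacific.

```python
# Approximate bounding box for California in degrees. The values are
# rounded and assumed for illustration only.
CALIFORNIA_BBOX = {
    "min_lat": 32.5,
    "max_lat": 42.0,
    "min_lon": -124.5,
    "max_lon": -114.1,
}


def in_bbox(lat, lon, bbox):
    """Return True if the point falls inside the rectangle."""
    return (bbox["min_lat"] <= lat <= bbox["max_lat"]
            and bbox["min_lon"] <= lon <= bbox["max_lon"])


LAS_VEGAS = (36.17, -115.14)    # in Nevada, not California
OPEN_PACIFIC = (33.0, -123.0)   # water, well off the coast

if __name__ == "__main__":
    # Both checks succeed even though neither point is in California.
    print(in_bbox(*LAS_VEGAS, CALIFORNIA_BBOX))
    print(in_bbox(*OPEN_PACIFIC, CALIFORNIA_BBOX))
```

A rectangle test is cheap, which is why services return it, but it cannot tell a diagonal state from the desert and ocean that fill the corners of its box.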
Aaron: That was all the data that, that we were able to return out of the service and so what we started to do was we thought, well, we have all of these geotagged photos, we have all of these photos with coordinates that have IDs associated with them, maybe we can draw something better than a bounding box.
Aaron: Maybe we can start to draw the shape of them.
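To give a feel for drawing a shape from photo coordinates, here is a small Python sketch. It computes a convex hull, which is a deliberately simpler stand-in and not the alpha shapes mentioned earlier: a convex hull cannot follow concave coastlines or borders the way an alpha shape can. The points are treated as flat (lon, lat) pairs and the function names are hypothetical.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    # Build the lower edge left to right, dropping points that turn clockwise.
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    # Build the upper edge right to left in the same way.
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each half is the first point of the other half.
    return lower[:-1] + upper[:-1]


if __name__ == "__main__":
    # Five made-up photo locations; the middle one is interior and is dropped.
    photos = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
    print(convex_hull(photos))
```

Even this crude outline hugs the photos more closely than a bounding box does for most point sets, which is the intuition behind tracing shapes from where people actually took pictures.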
Paul: Oh, so, because you know where people are?
Aaron: Country, we could do it well. City, we could do it most of the time. Neighborhoods, we had to turn off neighborhood display two weeks after geotags launched, because the best way to pick a fight with someone is to tell them they’re in the wrong neighborhood.
Aaron: And so just as a function of priorities and time and all of the other things we were doing, it took us about 18 months to finally add a feature to the photo page where you could be like, this is in Ditmas, not in Park Slope, or wherever.
Aaron: And once we could do that, we had some sense that we weren’t just gonna provide an echo chamber by tracing the shape of where people said they were.
Paul: How did the users react to all this geography?
Aaron: Unless we told them they were in the wrong neighborhood, they loved it. I mean —
Paul: What did they do with it? Like, what’s that — what does that data do for them?
Aaron: Well, the short, truthful answer is I don’t know, because the Flickr audience contained multitudes.
Aaron: The slightly longer answer is in the first 24 hours, people geotagged a million photos.
Paul: See that’s right — I just remember it became part of the product in, like, five seconds. It was just like, one day, Flickr got geo.
Aaron: Well, so the backstory is that Dan Catt, who was living in the UK at the time, set up a website called geobloggers.com, and what he said to people was, you need to add three tags to your photos. One of them was geo:lat= a latitude. Same thing for longitude. And then there was what I always refer to as an anchor tag, which was just geotagged. And what Dan said was, I will harvest the Flickr API every night for new photos tagged “geotagged,” pull out the latitude and longitude, and plot your photos on a map. And we saw that, and we thought, we should hire this guy.
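The three-tag convention can be sketched in a few lines of Python. This is an illustration of the idea only, not Dan Catt's code. The transcript gives `geo:lat=` and says "same thing for longitude", so the spelling `geo:lon` used here is an assumption, as is the `parse_geo_tags` name.

```python
def parse_geo_tags(tags):
    """Return (lat, lon) if the anchor tag and both coordinates are present.

    Assumes the longitude tag is spelled "geo:lon"; that spelling is a guess.
    """
    # The anchor tag is what a nightly harvest would search for.
    if "geotagged" not in tags:
        return None
    coords = {}
    for tag in tags:
        key, sep, value = tag.partition("=")
        if sep and key in ("geo:lat", "geo:lon"):
            try:
                coords[key] = float(value)
            except ValueError:
                return None  # malformed number, so skip this photo
    if "geo:lat" in coords and "geo:lon" in coords:
        return coords["geo:lat"], coords["geo:lon"]
    return None


if __name__ == "__main__":
    print(parse_geo_tags(["london", "geotagged", "geo:lat=51.5", "geo:lon=-0.12"]))
```

The anchor tag matters because a tag search can find every photo tagged "geotagged" in one query, whereas the coordinate tags are different on every photo.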
Paul: So you have this culture where somebody’s like, hey, you guys know there’s all these places in here. Let’s do something with it. And the culture of Flickr was like, yeah! Great!
Paul: Which, to me I always, I associate you with organizations, because after, you went to Flickr, that got bought by Yahoo, which as everyone knows is one of the great success stories of the last millennium, and you left Yahoo, which I still don’t know why anyone would ever do that… [laughter] And you went to, I believe, Stamen Design.
Paul: There’s a thing that’s interesting which is you, you think really big thoughts, and you think in a very abstract way, but it always translates back to you writing code that then ends up on GitHub or ends up somewhere. Like, or giving a talk or whatever. But you’re an unusual combination of very, very abstract thinking and very, very specific shipping.
Aaron: Hopefully. I mean, on good days…
Paul: No, I’m giving you a compliment, but it’s also real. Like, if I go look at GitHub associated with — what’s your GitHub handle?
Aaron: These days, it’s thisisaaronland.
Paul: Right. There’s a lot of stuff there, right? And there has been, and it’s, it’s, for years. You’ve been giving stuff and then it comes — so you go to Stamen.
Rich: It’s worth describing what Stamen is.
Paul: Yeah, what is Stamen?
Aaron: Stamen is a data-visualization studio. It’s a design studio.
Paul: In San Francisco…
Aaron: In San Francisco that has been doing, I mean, in many ways, and this predates my arrival, Stamen sort of singlehandedly created the notion of data visualization as a thing that people wanted.
Paul: Yeah, if you’re gonna have a livestream of tweets visualized on the background of a giant screen at, like, a sporting event on national TV? That was a Stamen product for a while, like that was — it would be like, whoa! That was really awesome and real and it would be by Stamen.
Aaron: Yeah. And you know, Eric Rodenbeck, who’s the founder, is always fond of saying data visualization is a medium, that the volume of data would tell you something, and that was Stamen’s great strength, of being able to show you what was there, and to make it compelling. That it wasn’t just a number-crunching exercise. That there was quite a lot of fashion and spectacle and design in it.
Rich: It was art. I mean, it bordered on art. There was a definite care to the impactfulness of what was being produced, which was really cool. I mean, you could tell, technically, there was some cool stuff happening underneath the hood, but there was definitely a care to the aesthetic of the thing.
Aaron: Yeah. Mike Migurski, who was one of the other partners, he did a slide in 2007 where he was, basically the argument was he was saying, the design and math are no longer on parallel tracks, they are beginning to converge. And that was the space that Stamen was able to occupy.
Paul: And I noticed, I remember there, all of a sudden, very pretty maps started to come out of Stamen, and out of you! Not physically out of you, but out of the work that you…you didn’t…
Rich: Well we don’t know that for sure. [laughter]
Paul: No. I don’t — I honestly don’t think that Aaron ever produced a map physically. Um… So the geo-bug had bitten you hard?
Paul: At that point. Like I would say that, if there is a theme, like, place is a theme in your career.
Paul: But then you kind of took a turn. You were — I was like, oh, Aaron’s a map guy, and then you went and worked at the Cooper Hewitt Museum of Design.
Paul: It’s part of the Smithsonian, it’s here in New York City.
Paul: And you did something really weird, which is you put their whole collection on GitHub.
Aaron: That also predates my arrival, but, yeah.
Paul: WHAT! Someone else did that?
Aaron: That was Seb.
Paul: Oh, OK. Seb…?
Aaron: Seb Chan was the director of…the original, the director of digital and emerging media, at the museum. He was hired by the then-director, Bill Moggridge, to come in and, as part of the renovation, because the museum was closed for three years, to try and imagine, in meaningful and practical terms, what it means to have a museum that is genuinely part of the internet and vice-versa.
Paul: Watching you guys work over a while, because you did things like accession an iPad app, and try to figure out how to bring that into a design museum as code.
Aaron: As a design object, actually.
Paul: Right, but you acquired the code as well.
Paul: Right, sorry, yeah, I should be clear. But it was, I learned something, which I don’t think most other people apply, but people talk a lot about authentic engagement and connecting, and what they’re often thinking about is broadcast, and what really seems to work online is not this, like, oh we’re gonna have a good Twitter account and a good Facebook account, but we’re going to engage with people, kind of in the commons, in a meaningful way. And even, I’m not a fan of Walmart, but Walmart Labs is a good participant in the open-source world, and if you go and you look at their GitHub page, they have lots of stuff and they give stuff back and it’s pretty well documented, and it actually, it’s like, I really have very few positive associations with the Walmart brand, but that’s one, I’m just like, OK, well, there’s a thing they’re doing right, that I respect, like I have to take that into account when I think about…
Rich: I…I like Walmart.
Paul: Great. Good for you. …. Get out — get out of here. [laughter]
Rich: Just wanted to throw that in there.
Paul: What do you like about Walmart?
Rich: I think it’s a…it’s, you just go in there, you spend like $70 and your whole shopping cart’s full.
Paul: Sorry, I do like shopping at Walmart, but the company kind of sucks.
Rich: Do they?
Paul: Oh yeah, they’re bad. They’re bad. I mean it’s just…
Rich: Oh. This could…this could overwhelm the podcast…
Paul: Let’s not derail. It’s just a lot of, like —
Rich: We’ll come back to this one.
Paul: It’s a lot of, like, really bad minimum wage, or things where they were, at one point they were buying insurance, life insurance, on their employees without letting them know…because they were just sort of playing them, so when their people died, they’d get a payout.
Rich: OK. So about that museum you work at, Aaron.
Paul: Did. Did work at. Formerly worked at.
Rich: Oh, OK. You’re not there anymore.
Rich: Where are you today?
Aaron: I am doing maps again.
Paul: What is Mapzen?
Rich: Mapzen’s an interesting project, yeah. Describe the mission.
Aaron: The way that I’ve said to people is, Google’s great sleight-of-hand (and there’s no, it’s not a pejorative, Google’s earned it) is to convince people that when it comes to geo, that there isn’t an iceberg. That there is only a tasteful material design tip, right? It’s this beautiful little box that just, you type magic into, or like, a map that just works, that all the stuff is, it’s easy, it’s simple.
Paul: It’s all the stuff we say software should be. That we talk about as being usable and good.
Rich: Hiding the ugly.
Aaron: Well they’re also hiding the iceberg. And that’s not a criticism, they’ve done an incredible job of hiding the iceberg, but when it comes to geo, what you’re talking about is an existing dataset, and then you get into quality, coverage, licensing, then you get into features and functionality around that, so search is the big one, geocoding and then potentially reverse geocoding if you need to…
Paul: Wait, tell people in the world what geocoding is.
Aaron: Geocoding is typing in an address or a string or “pizza” and a zip code —
Paul: Oh, OK.
Aaron: And having the computer —
Paul: Pizza 10010.
Aaron: People do it.
Aaron: And having the computer return a useful result.
Paul: OK, so it’s like, OK, this person wants pizza, that’s a zip code, let’s give them a list of all the…
Paul: And what is reverse geocoding?
Aaron: Reverse geocoding is taking a latitude and longitude coordinate and telling somebody where it is.
Aaron: So if you’re standing on the corner of…
Paul: 26th and Broadway.
Aaron: Right. So are you in…Gramercy, are you in Flatiron, are you in…what do they call it now, NoMad?
Paul: Yeah, but so we’re back to the bounding box.
Aaron: So that’s reverse geocoding.
Aaron: It’s trying to figure out what somebody means.
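The ambiguity Aaron describes shows up as soon as reverse geocoding is done with rectangles. In the Python sketch below, the three neighborhood boxes are hypothetical and deliberately overlapping; they are not real boundaries, and the test coordinate is only a rough stand-in for 26th and Broadway.

```python
# Hypothetical, overlapping rectangles: (min_lat, min_lon, max_lat, max_lon).
# These are NOT real neighborhood boundaries.
NEIGHBORHOODS = {
    "Flatiron": (40.738, -73.995, 40.745, -73.985),
    "NoMad": (40.742, -73.992, 40.749, -73.982),
    "Gramercy": (40.733, -73.990, 40.741, -73.980),
}


def reverse_geocode(lat, lon, places=NEIGHBORHOODS):
    """Return every place whose rectangle contains the point, sorted by name."""
    hits = []
    for name, (min_lat, min_lon, max_lat, max_lon) in places.items():
        if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon:
            hits.append(name)
    return sorted(hits)


if __name__ == "__main__":
    # Roughly 26th and Broadway; the exact coordinate is an assumption.
    print(reverse_geocode(40.744, -73.989))
```

With these made-up boxes the point lands in two neighborhoods at once, so the application still has to decide which single answer to show.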
Paul: So there’s all these things — there’s all these things and services and data that is underneath something like Google Maps.
Aaron: Yeah. And then even beyond that, there’s routing engines for giving directions, and then there are map tiles, and then there’s producing the map tiles, and there’s storing the map tiles, and there’s delivering the map tiles. So it’s this, it’s actually a huge project, and so we’re doing it all.
Rich: I mean, hundreds, if not thousands of people, work on it, if I’m not mistaken.
Aaron: The story that I’ve heard, and I don’t know if these are accurate numbers anymore, but Google has 1,000 full-time people working on maps.
Rich: That’s incredible.
Aaron: And also about 6,000 contractors.
Paul: Well —
Rich: And 6,000 contractors?
Paul: Right, but think about the drivers, too, I mean, it’s, for the like, for the Google Maps cars, and all that stuff. Just to like —
Rich: 7,000 just working on…that’s incredible.
Aaron: So we’re doing the, we’re building the entire iceberg as an open data, open-source project. And…
Paul: So if I wanted to do something with geo, but I don’t want to use anything that, like, Google or anyone else has, I use Mapzen.
Aaron: You could use, we have, um, pre-baked services.
Aaron: So both for search and for routing, and we also host tiles, so you could use —
Paul: So tiles are pictures of the world. They’re rectangles.
Aaron: They are rectangles. What the great revolution, or the great insight that Google had in 2005, and it’s to the consternation of geographers everywhere, was they said, what if we just didn’t worry about the poles?
Aaron: They’re just like, whatever. And what if —
Paul: Screw Antarctica?
Aaron: Pretty much.
Paul: To hell with penguins?
Aaron: And they were like, what if we could just turn the world into a square?
Aaron: So if you zoom out and you could take the entire world and put it in a single square and lop off the top and the bottom, then it makes doing the math really, really easy to subdivide that.
Paul: Oh, so just screw McMurdo Station and just go for it?
Aaron: Pretty much.
Rich: Kind of OK with that.
Paul: I mean, you know, people pay way too much attention to Antarctica for what it actually does.
Aaron: It’s just a different way of doing things. I mean…
Paul: Well…no. Screw Antarctica. That’s where we gotta get with this.
Aaron: So basically what you can do at that point is you can represent the world as little 256-pixel square tiles.
Aaron: And so zoom level one is a single tile. Zoom level two is four tiles, and it just goes on.
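The square-world arithmetic can be sketched in Python. This follows the tile scheme commonly known as Web Mercator as I understand it, and it is not taken from Google's or Mapzen's code. One assumption to flag: the sketch numbers zoom levels from 0, as many tile schemes do, so the single-tile view is zoom 0 rather than the "zoom level one" of the conversation. The function names are hypothetical.

```python
import math

TILE_SIZE = 256  # pixels per tile edge, as mentioned above

# Latitude where the square world is cut off. The Mercator projection of
# this latitude equals pi, which makes the map exactly as tall as it is wide.
MAX_LAT = math.degrees(math.atan(math.sinh(math.pi)))


def tile_count(zoom):
    """Number of tiles covering the world at a zoom level (zoom 0 = 1 tile)."""
    return 4 ** zoom


def latlon_to_tile(lat, lon, zoom):
    """Return the (x, y) tile containing a point; y counts down from the top."""
    # Anything beyond the cutoff is clamped: this is the "lopping off".
    lat = max(min(lat, MAX_LAT), -MAX_LAT)
    n = 2 ** zoom  # tiles along each edge of the square
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so lon = 180 or lat = -MAX_LAT stays on the last tile.
    return min(x, n - 1), min(y, n - 1)


if __name__ == "__main__":
    print(round(MAX_LAT, 4))
    print([tile_count(z) for z in range(4)])
    print(latlon_to_tile(40.744, -73.989, 12))
```

Because each zoom level splits every tile into four, finding the tile for a point is just a scale and a truncation, which is the "really, really easy" math the square makes possible.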
Paul: So a given digital map is made up of all this different kind of data, all these different kinds of services, and literally millions and millions of things like tiles, like, all these little bits of data that then get put together for the user?
Aaron: Tiles become a very, very efficient way to transfer lots of data about a place over the web. So yeah.
Paul: And allowing zooming. So there’s different levels of access.
Aaron: And different amounts of data at different zoom levels.
Aaron: And so originally when Google started, they would send down 256-pixel PNG files.
Aaron: And now what people are doing is they’re actually encoding data in those tiles, so if you actually looked at a tile, you wouldn’t see a picture, you would see a blob of JSON, and there are client-side libraries that will do all of the rendering. And what that means is that you can start to take advantage of all those technologies that are in the browser for doing dynamic styling, swooshy-swooshy stuff…
Paul: So this stuff goes —
Rich: It’s also that quasi-3D effect is…
Aaron: Yeah. Yeah.
Rich: As you’re sort of panning, you can see the building, the sides of the building, and all that.
Aaron: Yeah, because as geo becomes baked into the web browsers, and as you have the information about elevation and size, you can do 3D modeling on the fly.
Rich: I mean, obviously Google’s achieved a lot here. How far along is Mapzen, like, side-by-side comparison, Mapzen, Google Maps.
Aaron: Google has been at geocoding for 10 years, and it’s a hard problem, it’s basically trying to read people’s minds.
Paul: And they have 7,000 people.
Aaron: And they have 7,000 people, but we have an open-source geocoder that just keeps getting better.
Rich: How many people is Mapzen?
Aaron: We are about 50 all in all right now.
Rich: All right, so it sounds like you’d probably need to hire a few people to compete.
Aaron: That’s always the great question. It’s 2016, it’s kind of remarkable what you can get done with a small team.
Aaron: Versus being realistic about expectations for people’s work-life balance.
Aaron: And burning out.
Rich: Let’s say I had an idea to start a new start-up where you can ask for a car to pick you up and take you to another location, and I need mapping technology. Can I use Mapzen?
Rich: And it would work, reliably?
Aaron: Most places, most of the time.
Rich: That’s great. I mean, that’s a lot. That’s directions, that’s…
Aaron: I mean, part of the issue is, this has always been the problem with geo, it’s quality, coverage, and then licensing.
Aaron: You know, our goal is to do that for the entire planet. Have we done it yet? Not quite. But we’re getting there.
Paul: But that’s the goal?
Aaron: That’s the goal, and we do a lot of work, we use OpenStreetMap for both the map tiles and the routing directions, so that’s a dataset that just keeps getting better —
Paul: That’s sort of like the Wikipedia of streets. Everyone contributes…
Rich: Another app I wanna create that lets you find restaurants. Mapzen: can I do it?
Aaron: That’s one of the things we’re working on.
Aaron: The short answer is today…mmm…maybe?
Aaron: Depends on where you are.
Aaron: The longer answer is if we do our job right, absolutely.
Rich: Well where would that data come from, anyway? Where are you getting, like, business and…
Aaron: So this is the thing about venue data, is there is no open venue data.
Aaron: And what I mean by that is, OpenStreetMap does have a lot of open venue data, but OpenStreetMap has essentially a viral license that says, if you use our stuff to make something else, you have to license it under the same terms as OpenStreetMap.
Aaron: Which is OpenStreetMap’s prerogative, but it’s not necessarily what we wanna do. We would much rather have a license where we’re like, we’re doing this as an open data source, because it’s the right thing to do, and if you need to make a commercial product, that’s your business.
Paul: So you can’t, like, go to the IRS and say, give me a list of all the, all the businesses?
Aaron: You can do it…uh…
Rich: You can do that, they may not open the door.
Paul: Yeah, no, sure.
Aaron: I mean, again, it depends on what your target is. If you’re just talking about the United States, then there are lots of data sources that you can pull from.
Aaron: That you can do it over time and harvesting. And we’re definitely looking at some of that. But you asked about a gazetteer, I mean, the project that I’m working on immediately at Mapzen is an open gazetteer. And so —
Paul: A list of every place in the world.
Paul: OK. How big is, how many places? How big is the world?
Aaron: Well, we have a fraction of it right now. There are ballpark 3,000,000 postal codes, probably 4 million when we factor in the things that we don’t have, probably 2, 2.5 million administrative places, so cities, neighborhoods, regions, and then venues, it will only ever get bigger. We have been importing an old open dataset from 2010 that a company called SimpleGeo released, which is business listings. And so right now, that’s 20,000,000 venues.
Paul: A lot of hair salons, a lot of, like…
Aaron: A lot of hair salons.
Aaron: A lot of plumbers. So yeah, we probably don’t have the hipster bar that opened up last week.
Aaron: But on the other hand, if you need a plumber in Kansas or Missouri or rural California, we probably have it.
Rich: Have you thought about the sort of getting an engine going that seems to power a Wikipedia? Meaning there’s sort of this motivation out there to put the right information in the right place?
Paul: It’s tricky, because OpenStreetMap has a lot of that. But it’s not…as open…
Rich: I don’t know, but does it have the tooling to say hey…because if I’m opening a little cafe somewhere, I’m pretty motivated to put it in somewhere, but I, I don’t know of that text box anywhere.
Aaron: We haven’t done that yet.
Aaron: It’s a thing we talk about.
Rich: Do you imagine the motivation is there? Is that…I think it would be there.
Rich: I mean…
Paul: Oh, people will list their businesses.
Rich: Commercial businesses want to be everywhere, so.
Rich: Why the heck not?
Aaron: Yeah. What we don’t have in place right now, and this is a really important thing, I mean, this is what, this is what Heather sort of pioneered at Flickr, is we don’t have a community management support infrastructure for that volume of…
Aaron: User-contributed data.
Rich: It’s gonna be bananas.
Aaron: And people are gonna argue about stuff. And so right now we’re working on the scaffolding, to just manage something that big.
Paul: Oh, so you’re gathering all this data together for the gazetteer, and programming, and doing some work to, like, get that data into a good place?
Paul: But you’re assuming that eventually, human beings are gonna show up and have opinions about the data that you’ve gathered.
Aaron: Yes. So the gazetteer is called Who’s On First.
Aaron: We’ve said two things. One is that it is not a gazetteer of geometries or geographies, per se, it is a gazetteer of consensual hallucinations. [laughter] We may disagree about where the Flatiron starts and stops, but we all know it’s there.
Paul: Yeah, no, that’s true. Everyone would agree there is a neighborhood called Flatiron in Manhattan.
Aaron: And so the other thing we want to do is that it is a gazetteer of signal fires, so as much as possible, we are trying to separate interpretation about the facts from the facts themselves. Where Who’s On First doesn’t try to solve the debate, but it would like to just be a place where the debate can be managed and reflected, and leaving decisions and interpretations about that to actual consumers of the data.
Paul: What do users of geo tools fight about the most?
Aaron: Uh…spherical mercator and Google’s decision to turn the world into a square.
Aaron: And whether it’s lat-long or long-lat.
Rich: Somebody’s cried about that. There are tears that have been shed over the square.
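For readers who have not met the square: spherical Mercator stretches toward infinity near the poles, so web maps clip the latitude at the point where the projected world is exactly as tall as it is wide. The Python sketch below illustrates that arithmetic under stated assumptions. The function and constant names are made up for this example, and the argument order (longitude first, as in GeoJSON) is one side of the lat-long versus long-lat argument, not a ruling on it.

```python
import math

# Sphere radius used by spherical ("web") Mercator, in metres.
EARTH_RADIUS_M = 6378137.0

# Latitude at which the projected world becomes a square: atan(sinh(pi)).
MAX_LATITUDE_DEG = math.degrees(math.atan(math.sinh(math.pi)))


def to_spherical_mercator(lon_deg, lat_deg):
    """Project (longitude, latitude) in degrees to (x, y) in metres."""
    # Clamp the latitude: this is the step that cuts off the poles.
    lat_deg = max(-MAX_LATITUDE_DEG, min(MAX_LATITUDE_DEG, lat_deg))
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# The top-right corner of the map: x and y come out equal, hence the square.
corner_x, corner_y = to_spherical_mercator(180.0, 90.0)

# An approximate point in the Flatiron district, longitude first.
flatiron_x, flatiron_y = to_spherical_mercator(-73.9897, 40.7411)
```

The clipping latitude works out to roughly 85.05 degrees north and south, so everything above that line simply does not appear on the map.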
Paul: What do you think the users are going to do when this data’s in front of them?
Aaron: Immediately, not much. But you’re not meant to have a gazetteer experience, you’re not meant to wake up in the morning and, “You know what I wanna do? I wanna, I wanna Who’s On First as a verb!”
Paul: Well…you’re lookin’ at me, saying that…I mean, I really do. I’m like, give me all the places. That sounds really fun. OK, but you’re right. Your average, your civilian —
Rich: It’s gonna slip into their workflow.
Aaron: Well the way to answer that question is to say that nobody understands why a gazetteer is important until they suddenly need one, and then they’re like, wait! Oh…what…how do we…
Paul: And then, no, that’s, that’s been the miracle of the web to me, right? Is that you would, you’d be like, I wanna build this thing, and then you very rapidly stumble into the need for a large set of data, with a lot of tasks, like, I need historical texts, or I need a list of places, or whatever. It’s, it’s just amazing how often you get back to that, and that whole part of our world is surprisingly untended, right? Like, you go, and you’re like, ah! Get this list of businesses — but it’s from 2010.
Aaron: Mmmm hmmm.
Paul: And no one has adopted it since. I’ve actually been thinking, like, there isn’t really, as far as I can tell, maybe, you’d know better than I would, but there’s an idea of like, I’m gonna adopt this open-source project, or I’m going to give this into the commons, or I’m gonna open this thing, but there’s no culture of adopting big datasets and taking care of them, in the same way that there is, like, putting stuff on GitHub and doing releases of open-source software. That I know about.
Rich: I’d jump in on that. I mean, Wikipedia is a big dataset.
Rich: But the geologist really cares about that rock article.
Paul: They do, but what I’m talking about is like the…the open list of fossil sites, is like always one person, right?
Rich: That’s what — that’s good.
Paul: It is, but then there’s kind of no culture where you go, like, hey, I don’t wanna maintain this list anymore. That’s it. Like, a big dataset goes out in 2010, and then it doesn’t get updated again.
Paul: And nobody — there’s no real culture where somebody goes and is like, I’m gonna take this over.
Paul: There’s no, like, heroic narrative, where it’s like, we’re gonna do it…
Aaron: Well —
Paul: I think it’s sad.
Aaron: I guess the…the example of people who are doing that is the New York Public Library.
Paul: They are. That’s true.
Aaron: That’s a good example of trying to deal with both just, processing the data, in whether it’s the menus project, or the theater bills, or building inspector…
Paul: Their labs is very strong, yeah, that’s very true.
Aaron: Yeah. And then providing tools for letting people work in little atomic units.
Aaron: But even then, I mean, some of it’s a question of scale. I mean, for all that the NYPL does amazing work, they’re pretty reluctant to offer those services outside of New York City.
Paul: No, of course. You know, it’s…what’s bugging me is that I think that everyone sees code as the infrastructure for creativity and doing new work online, and I think it’s also data, and we don’t really, that’s not a conversation that people have that much.
Aaron: Yeah, I mean, one of the things that I would say to friends that worked in the news industry, you know, five, six years ago, when the zeitgeist was the death of all news publications —
Paul: Well, and there was a news industry back then.
Aaron: Right. But, you know, the thing about, if you talk to people in the newspaper industry, they’re like, sure, maybe the economics of ink on paper aren’t what they used to be, but I don’t think news is going away, and by the way, you’re sitting on, some of you, like The New York Times or The Guardian, you’re sitting on 200 years of history.
Aaron: With people who know how to find stuff in there and understand the connections and the relationships. Maybe that’s your business.
Paul: No but they see the articles as the thing rather than the network that’s inside of there.
Aaron: Well then yeah, I mean, the market forces of distribution and publication will probably catch up with them.
Rich: Can someone acquire Mapzen? I guess that anything can happen.
Aaron: Anything can happen.
Paul: Well right now, it’s funded by Samsung, right?
Aaron: We are part of Samsung Research America, yes.
Paul: Which I, you don’t have to talk about this, I just see that as Samsung going, like, OK, there’s Apple…
Rich: Just in case.
Paul: There’s Google…[laughter] Do we have a global mapping solution? And someone is like, “Oh God, we gotta get one of those.” They just bought Joyent, so they could have a cloud.
Aaron: I think it’s part of a larger project, just to make sure that we have the things that we need in house. I mean, you know, as a company we make all the things. We make cargo ships.
Aaron: Which is —
Paul: Oh they have a — Samsung has a, an amusement park with skiing.
Paul: Yeah. Just in case. [laughter] No, but I mean, honestly, like, bless them for doing it this way, which is like, here, dump it all back into the commons in a nice, structured format that, like…
Rich: The motive, park the motive for a second, this is a really cool open project for now. It happens to be funded not by government money, but private money. So who knows how the movie ends, right?
Paul: Do you think that’s a good thing, though? Do you think that the corporate benefactor is a good kind of benefactor?
Aaron: Yes. Or rather, I’d like to live in a world where people can say that without…
Rich: I think…
Aaron: You know, with a straight face, I mean, I think that we’re better off if that’s possible.
Paul: No, you know, I’ve been looking at Mapzen on and off for a while, and, like, it’s exactly what you would want a global open mapping project to be like if its goal is to be of utility to as many people as possible in the future.
Aaron: Yeah, I mean —
Paul: Regardless of who they are, like, come in and get this data, come in and use these services.
Aaron: One of the things that we talk a lot about is we’re not the first company to try to do open geo. Other companies have tried and they failed for whatever reason. And unfortunately, the response has usually been, you know, I would look at you Paul, and be like, oh, OK, so this open geo company’s gone away, do I have the Google API key, or do you?
Paul: That’s right, see, you guys, you have adopted geo, like you’ve adopted this world.
Aaron: I mean, the goal is to succeed. The goal is to do all the things.
Aaron: And hopefully we will, and maybe we won’t, and if we don’t, then the goal is for it not to be a complete reset to zero, that there will be something that someone else can pick up and run with without having to start all over again.
Paul: So let’s, let’s talk about that, that’s actually a great way to conclude this: how are you preparing your gazetteer, which isn’t yours, it will belong to the world.
Aaron: Mmmm hmmm.
Paul: How are you preparing that for the future, where you may not be around, Mapzen may not be around, the United States of America may have become something very different. There’s going to be this thing, this data thing, that you made.
Paul: And you want other people to use and add and do things with it.
Aaron: Yeah. I mean, to start with, one of the early decisions we made was that we wanted a gazetteer that was portable and robust and durable across time. So beyond —
Paul: How do you do that?
Paul: Print it on brass?
Aaron: We’ve joked about that.
Aaron: In the interim, it’s text files. It’s a gigantic bag of text files, because it turns out —
Paul: God, those are the best.
Aaron: They’re the best.
Aaron: I mean, there’s lots of other super-efficient, very clever data formats, but they often require a whole bag of Google on your computer, or another company, you know, they’re proprietary, binary formats —
Aaron: That are optimized for something.
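To make the "bag of text files" idea concrete, here is a minimal Python sketch of one place stored as one plain-text file. It assumes JSON as the text format and uses invented field names and an invented ID. It is not the actual Who's On First layout, only an illustration of why such files stay readable with nothing beyond a standard library.

```python
import json
import tempfile
from pathlib import Path

# A hypothetical place record; the id and field names are illustrative only.
record = {
    "id": 1,
    "name": "Flatiron",
    "placetype": "neighbourhood",
    "parent": "Manhattan",
}


def write_record(directory, place):
    """Write one place as a pretty-printed, key-sorted JSON text file."""
    path = Path(directory) / f"{place['id']}.json"
    # Sorted keys and fixed indentation keep line-based diffs small and stable.
    text = json.dumps(place, indent=2, sort_keys=True, ensure_ascii=False) + "\n"
    path.write_text(text, encoding="utf-8")
    return path


def read_record(path):
    """Read a place record back using only the standard library."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Round trip through a temporary directory standing in for the repository.
with tempfile.TemporaryDirectory() as tmp:
    saved_path = write_record(tmp, record)
    saved_name = saved_path.name
    round_trip = read_record(saved_path)
```

Because each file is ordinary text, any editor, diff tool, or version control system can work with it, which is the durability argument being made here.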
Paul: There are two good ways to store data: text files and SQLite. [laughter] I’m just sayin’ that. I wanna, I wanna —
Rich: Now that’s a way to close a podcast.
Paul: Yeah. So flat text files.
Paul: That I can just go get a text file — it’s on GitHub?
Aaron: It is all on GitHub.
Paul: So you have built this thing to be shared, added to, and used. It’s in Git, and there’s a whole set of cultural understandings about collaborating, working together in Git, and taking things away, adding to them, bringing them back in.
Aaron: Yeah, but things that are in Git, or GitHub, is more the reflection of the kinds of features and functionality that we want, that Git provides right now, or does the best approximation of providing. It has some problems. Git doesn’t really do well with a million tiny files.
Paul: And there’s millions of tiny places.
Aaron: There are lots of places in the world.
Aaron: But we point to Git as a way to say, this is how it should be, and we’re gonna sort of use bubblegum and duct tape of this service, and either the technology will catch up and Git will make advances and suddenly it will be fast and smooth and easy, or we will figure out…
Paul: Figure something out.
Aaron: Figure something out.
Paul: All right, so…how long are you gonna be working on this gazetteer?
Aaron: As long as it takes.
Paul: You’re in. You’re down to work on this gazetteer for a while?
Aaron: You know, the gazetteer was the bane of my existence at Flickr. I mean, I knew the people who were working on it, and they were good people, but it was always an issue. And I don’t wanna ever have to do this again.
Paul: So a decade in, you are solving a cultural and personal problem.
Aaron: It turns out, you know, things have a way of coming back. I don’t think that it should take a decade to finish Who’s On First.
Aaron: I think that there’s a lot of grunt work to do right now, and just for practical purposes, it should be able to get to a point where it’s mostly, you know, doorknob polishing and incremental updates, because the world keeps changing.
Aaron: But it needs to be a thing that we can just do the first version and put it down and live with it for a while and get on with other things.
Paul: So right now you have to catch up with the world, and then you can sort of chill out a little bit.
Aaron: Yeah. We might deal with historical places after that.
Paul: Oh, that’d be fun. Like Carthage.
Aaron: Uh…yeah, or the example that I always use is there’s a project that came out of George Mason, George Washington, one of the universities on the East Coast, called Manifest Destiny, which was a snapshot of the United States at every moment that a state or a new bit of land joined the Union. And so that means —
Aaron: Between the years 1759 and 1879, or something like that, the US changed 141 times. And we will have Who’s On First records for each one of them, and it will just be a big linked list, pointing to the thing that came after it.
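The "big linked list" can be pictured as records that each carry a pointer to whatever replaced them. The Python sketch below is an illustration under stated assumptions: the class, the `superseded_by` field, the IDs, and the labels are all hypothetical and are not taken from the Who's On First data.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Optional


@dataclass
class PlaceRecord:
    record_id: int
    label: str
    # ID of the record that came after this one; None means it is current.
    superseded_by: Optional[int] = None


def walk_forward(records: Dict[int, PlaceRecord], start_id: int) -> Iterator[PlaceRecord]:
    """Yield a record and then every record that superseded it, in order."""
    current = records.get(start_id)
    while current is not None:
        yield current
        next_id = current.superseded_by
        current = records.get(next_id) if next_id is not None else None


# Three made-up snapshots of the same place, oldest first.
records = {
    1: PlaceRecord(1, "United States, snapshot 1", superseded_by=2),
    2: PlaceRecord(2, "United States, snapshot 2", superseded_by=3),
    3: PlaceRecord(3, "United States, snapshot 3"),
}

history = [r.label for r in walk_forward(records, 1)]
```

Starting from any snapshot, following the pointers forward ends at the current record, so old identifiers never have to be deleted or reused.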
Paul: That’s a worthy goal. So if I wanna get in touch with you, what do I do?
Paul: Oh, that sounds pretty nice. And if I want to go look at your gazetteer, what do I do?
Aaron: whosonfirst.mapzen.com. And all of the links are there.
Paul: And if I wanna learn anything else about Aaron Straup Cope in general, where should I hit?
Aaron: I guess…
Rich: You could say nowhere, that’s fine.
Aaron: No, I guess the weblog. That would be aaronland.info/weblog.
Paul: It’s a classic, classic site. Aaron Straup Cope, thank you.
Aaron: Thank you.
Rich: Thank you, Aaron.
Aaron: Thank you.
Paul: Well, Rich Ziade.
Rich: I feel like I know my place in the world now.
Paul: I do, too. I love the work that Mapzen’s doing.
Paul: I really do. I think this is exciting stuff. There’s something about data that you can use and that has real caretakers keeping an eye on it, that’s so foundational to the internet, and I’m glad we got to talk about it a little bit today.
Rich: Mmmm hmmm.
Paul: So if anyone wants to get in touch with us, here at Track Changes, the official email’s firstname.lastname@example.org.
Rich: It is.
Paul: Send us an email.
Rich: We love email.
Paul: We love hearing from our listeners, and we will be back really soon. Rich.
Rich: Have a lovely week. Thank you, Aaron, again.
Aaron: Thank you.
Paul: All right. Bye everybody.