Our dangerous reliance on big data: in an episode recorded before the election, Rich Ziade and Paul Ford talk to Cathy O’Neil, author of Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. They discuss Cathy’s origins in the math world, her years at a hedge fund on the brink of the 2008 financial crisis, the lack of transparency in the Department of Education’s data, and the various examples of “weapons of math destruction” in her book — all the ways that data is used to harm.
Paul Ford: Hello, and welcome to Track Changes, the official podcast of Postlight, the product design studio — wait, what are we now?
Rich Ziade: We’re a digital product studio, Paul. [laughter]
Paul: OK. We’re locking it down. We’re locking it down. We are a digital product studio.
Paul: At 101 Fifth Avenue. You have any problems, you just get in touch. Any kind at all, even if, like, your leg hurts.
Paul: We’ll figure out a digital solution.
Rich: We take meetings.
Paul: So my name is Paul Ford. I’m a co-founder of Postlight.
Rich: Rich Ziade, also a co-founder of Postlight.
Paul: And Rich, have you ever heard of a phrase, let me, let me see, this is an unusual phrase, you may never have heard of it before: “big data.”
Rich: I have heard of this phrase before.
Paul: OK, that’s popped up in your career so far?
Rich: It’s popped up in a few…articles, headlines, futurism essays.
Paul: I think we have in the studio today America’s leading critic of big data.
Rich: Critic. Interesting.
Rich: I have not —
Cathy O’Neil: America’s?
Paul: The world’s. [laughter]
Rich: I didn’t know big data had critics. I thought it was critic-immune, because it’s big data.
Paul: Well you know who’s gonna tell us something about that is probably not you or me, because we don’t know a damn thing about it. [laughter] But Cathy O’Neil, who is, Cathy, where do we even start? Hi, welcome to Track Changes.
Cathy: So glad to be here. Thanks for having me.
Paul: You’ve written several books. This is just the most recent.
Cathy: Well I wrote, I wrote another book. Let’s not exaggerate. [laughter] And just a, just, are we allowed to swear on this podcast?
Paul: Sure. They…yeah.
Cathy: I was just gonna say a buttload of, like —
Rich: A buttload! Green light on buttload. [laughter]
Rich: Don’t worry about buttload.
Cathy: That’s just where it’s startin’.
Paul: I always had this fantasy, if I had a New Yorker cartoon, where they’re doing surgery on someone and they’re holding up a mass from the person’s body and they go, “Wow, he had a book in him all along.” [laughter]
Rich: That’s strong.
Paul: It’s good, right?
Rich: You need, you probably know somebody twice-removed from The New Yorker. That needs to get in there.
Paul: I think somebody may have done it, too, like, I can’t tell.
Rich: In, like, the 40s.
Cathy: Is it a memory or an idea?
Paul: Yeah. But it’s a good, it’s really good, I’m gonna take credit until someone on the internet points it out. So the most recent book that you’ve written of your many, many two books, is Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. And that is brought out by Random House, by the Crown imprint in Random House.
Cathy: Yes. Exactly.
Paul: Good people at Random House. Good people at Crown.
Paul: Just making it work and creating a critical framework for big data. What is big data?
Cathy: And they allowed me to have a really long subtitle, which is always a good sign.
Rich: The title, could we talk about the title for just one second?
Cathy: Yeah, yeah.
Rich: What a killer title.
Cathy: Thank you.
Paul: Weapons of Math Destruction?
Rich: I mean…
Paul: Well also I gotta say, you go like, OK, weapons of math destruction, but then you go all the way. It’s WMDs all through the book.
Cathy: Yep. I’m not kidding.
Paul: No, you are in there for that — you, I appreciate that in a title. It wasn’t just, like, hey this is cool.
Rich: It was catchy.
Paul: No. We’re going in. We’re gonna, we’re gonna set up a pattern and a framework —
Cathy: I’m delivering. You know, it’s because I’m a math nerd. Because, like, I wanted to perform triage on algorithms, which are the things that I care about. Like everyone always has these vague, sort of unfocused discussions about the harms of data collection, and I’m like, which harms? Exactly how is it gonna happen? Can we talk about that?
Paul: You say math nerd, but math nerd is like, [mocking voice] “I like puzzles.” You’re like a math educator who, you got some degree in some sort of math. [laughter] That’s where my brain just stops.
Rich: Well let’s spend a minute and talk about how you got here.
Cathy: OK. All right.
Cathy: School, yeah. I went to UC Berkeley, um…
Paul: When did you first know that math was your love —
Cathy: Oh, oh —
Paul: Let’s go all the way back.
Cathy: When I was five.
Cathy: When I was five, my mom said —
Rich: Where are you, geographically?
Cathy: Oh, I’m in Watertown, Massachusetts at the time.
Cathy: And my mom set up spirographs, because, like, my mom was a computer science professor and was too busy to hang out with me, so she was like, let’s use these really sharp pins, you know spirographs really have, are pretty dangerous.
Cathy: And she set it up for me and I started playing with spirographs. Do you guys remember what that is?
Paul: Oh yeah.
Rich: I do remember.
Cathy: Like, little gears inside larger gears.
Cathy: I remember —
Paul: And they draw all these sort of beautiful patterns.
Cathy: It’s really cool —
Paul: And when you’re five it’s the most beautiful thing you’ve ever seen.
Cathy: But also like, and I love colors, so I was using different color pens, but they’re periodic.
Paul: Mmmm hmmm.
Cathy: And I remember understanding prime numbers. I remember thinking, like, here’s something that’s period six —
Rich: At five?
Cathy: Yeah. I just remember getting it. I was like, oh, period three, that happens twice in a period six, and I was like, these are the basic constituents of a number, are like the way they can have sub-periods of a period.
Paul: Rich, what were you doing when you were five?
Rich: Let me really actually think about it. I’d just got to the country. I was…trying to learn the English language.
Cathy: That’s a biggie.
Rich: Yeah. So that kind of threw me off.
Paul: I was —
Rich: I’m not the typical ‘what were you doing when you were five.’
Paul: Yeah. I was refusing to learn how to read. I was told that I could learn how to read and I was like, I have no interest in that, I’ll do it when I’m six, that’s when you do it. [laughter]
Rich: You don’t tell me when I’m gonna read.
Rich: I’ll tell you when I’m ready to read.
Rich: Fair enough.
Paul: That’s, that’s —
Cathy: Yeah, I get that. I’m a stubborn person myself.
Rich: Three really unique, interesting people right here in the room. [laughter]
Paul: Or at least one. Um…
Cathy: And then so like, skip ahead 10 years, I end up at math camp when I’m 15.
Paul: Mmmm hmmm.
Cathy: It’s because, like, not randomly, I chose to go to math camp because I liked math. But that’s when I really fell hard, because I learned how to solve the Rubik’s Cube using, like, group theory, and I was like, this is it, this is all I’m gonna do.
Paul: So this is math math. This is like, oh, hey, group theory, cool, good way to solve a Rubik’s Cube.
Cathy: And any other kind of cube puzzle, like, Rubik’s-type puzzle…
Paul: Whereas I would smash it against the wall and slowly put it back together.
Cathy: Well you can do that too.
Rich: Which also works.
Cathy: You can even just take the stickers off sometimes, yeah.
Paul: That’s more chaos theory.
Cathy: Yeah. So then I was like, I love math, and math is pure and beautiful and, like, at the same time, you know, I was in a, you know, good school system, but like, the teachers I had weren’t consistently very good, and I just remember very strongly this feeling, around the same time, where my social studies teacher was talking about Manifest Destiny as if it was like, true. Like, not a historical interesting thing we should talk about, like, should we really kill all the Native Americans. It was just like, and then we realized…
Paul: That we were, we needed to get off that city on a hill and head west…
Cathy: And it was just like, to me, it was so fraught, you know, like obviously you believe this, but you’re wrong, and I believe something else but I’m not gonna talk you into it, and it was like this, this kind of realization that there’s, like, the dirty world of reality and, like, history, and like, law, and all those things that are confusing, and politics, and then there was math, and math was, like, perfect, because even if I dislike someone desperately, like, we’d have to agree if something was true.
Paul: Right. That is very tricky in high school, right, because you’re not fully prepared to reject consensus reality. There are parts of it that still matter. You want friends, you want to participate, you don’t want to be rejected from the system, but at the same time you’re getting these clues that all is not as it’s supposed to be.
Cathy: And it’s a tough time when you’re 14, 15, 16, like, you are, like, someone who’s laser-sharp at noticing hypocrisy?
Cathy: And inconsistency, and you just, you want to correct everything in the world, and you have no power. So you have, like, this amazing power of observation, but no actual power to change anything.
Paul: See as you’re saying that, now I’m old and I’m on the other side, and I’m like, teens are exhausting.
Cathy: They are exhausting.
Paul: Oh my God.
Cathy: But I get them.
Paul: Yeah. Fair. Fair.
Cathy: I have two teenagers myself, and when they point out my hypocrisy, I’m like, yup.
Paul: [very long sigh]
Cathy: You’re right, you’re right.
Rich: Yeah, I think computers were like that for me in terms of, like, just shelter.
Cathy: Yeah, it’s a refuge.
Paul: Yeah, it was a ritual for me. I could go somewhere…
Cathy: And I could make something happen.
Cathy: I could program when I was that age. I would program many, many adventure games on my…
Paul: Mmmm hmmm.
Cathy: Apple IIe.
Paul: So you go to math camp.
Cathy: So I go to math camp, and I’m like, this is it, people.
Paul: How math campy was math camp?
Cathy: It was unbelievably math campy.
Paul: I would have to imagine.
Cathy: It happened at Hampshire College, which is like a hippie —
Paul: Oh yeah. You can get a degree in, like, world systems there.
Rich: I love that it’s in…you’re in the forest?
Cathy: You’re in the forest.
Rich: And I love that.
Cathy: And the cornfields.
Rich: I love the math camp aspect of it.
Paul: Yeah, there’s no grid in Hampshire.
Rich: But did you swing off tree, off branches and play in the woods and stuff like that as well, or are you just holed up doing math?
Cathy: Oh no, so we would do math in the mornings and then we’d have the afternoons free to, like, I played bridge all the time.
Rich: Bridge…with cards?
Cathy: With cards, yeah.
Cathy: And then we did, like —
Rich: So you’re not swimming and playing volleyball?
Cathy: Problem sets in the evening. Oh God, do I look athletic to you? [laughter]
Paul: As a former, as a former nerd who might still be a little nerdy…
Paul: Maybe. I’m imagining what the boys were like in math camp, and your, like, overall…you must’ve been the coolest person at math camp.
Cathy: I was really, really, really cool at math camp. By the way, I have a blog, which I started, like, four or five years ago, called Mathbabe, and there’s like an origin story for that name, Mathbabe, which was basically I went to math camp and I was the Mathbabe. I was a babe.
Rich: That’s funny.
Cathy: It was so inconsistent with my previous life. [laughter]
Paul: Oh yeah, no, I mean, it just…we don’t need to drill too deep in this…
Rich: We’re in college…
Paul: We go to college…
Rich: We’re deep. We wanna get to this book.
Cathy: OK, so went to college, majored in math, loved it. Went to Harvard, got a PhD in number theory. Then I went to MIT as a post-doc and ended up at Barnard College being a math professor, and loving teaching and loving number theory.
Rich: What year are we now?
Cathy: But by the time I, you know, I was, like, 20 years later, I was 35 or something, I was like, you know, I don’t know if this is actually what suits me. I’m a kind of person who wants feedback, I wanna know that I’m, like, having an effect on the world, and number theory, math, like academics is slow, but number theory is the slowest above everything.
Paul: This is the thing, like, I’m hearing you tell this and you’re about to leave, but when you’re in that world, you went and got that PhD, you’re not really supposed to leave that world.
Cathy: No. Especially if you have —
Paul: You got —
Cathy: What I got, which was amazing. A beautiful situation.
Paul: Yeah, you’re teaching at Barnard.
Paul: OK, this is like —
Cathy: I could take students at — it was part of the Columbia math department, so I was teaching, you know, wonderful students…
Paul: Total cultural pole position, like, as a mathematician who loves pure math, it’s hard to do better than that.
Cathy: It is, and I wasn’t trying to do better than that, in that sense, right?
Cathy: What I was realizing was that it wasn’t my tempo, right? So I was, like, I wanna be a businesswoman. Like, so anyway —
Paul: Wait, did you like teaching? Did you like undergrads?
Cathy: Loved, loved teaching.
Cathy: And I loved advising, to be honest, and I would have been happy to do my share.
Cathy: Yeah. But I just didn’t want to be a martyr. Again, I did actually want a different pace. So I was like, oh, I’ll just go work at, you know, D. E. Shaw, which is a hedge fund which a lot of my friends have already gone to work at, and like, it’s 2006 and everything looks great.
Rich: I mean, it’s like…value this hedge fund for me. Dollars. What are they managing?
Cathy: You mean their investments, like, something along the lines of $25 billion.
Rich: Right. Huge.
Cathy: It’s big, yeah. And it was a big player then. It’s probably just as big now.
Cathy: Yeah. So I went there and you know, then the world fell apart.
Rich: ’06, ’07.
Cathy: I mean, I got there, you’re an academic, you get a job in 2006, you take it in 2007. I joined in June of 2007.
Rich: Right —
Paul: Oh wow.
Rich: Oh wow.
Cathy: Within two months it was a situation.
Paul: You literally walked into a burning building and, like, sat at your desk.
Paul: OK. [laughter]
Cathy: I walked into a burning building where everybody was, like, really smug. Happy to see me and really smug, and were like, we’re so good at this. And then everything fell apart, and they were like —
Paul: It’s true, you found the one —
Cathy: We have no idea what we’re doing.
Paul: You found the one, the one cohort that’s more exhausting than academics [laughter] is hedge fund quant traders.
Rich: That must’ve been cool to watch, though.
Cathy: It was. It was interesting.
Paul: Because you weren’t in the game that much, so like, now you’re just, you’re like, flying the plane through the hurricane.
Cathy: I’m not flying the plane. I’m on the plane. [laughter]
Cathy: But I didn’t, yeah, it was like —
Rich: You’re still kind of an outsider.
Cathy: I’m an outsider, I’m an anthropologist, like…
Cathy: Observing the natives, essentially.
Paul: What did it feel like on the ground?
Cathy: I think it was mostly just bewildering.
Cathy: At some point relatively soon after I joined, after the 2007 kerfuffle, which was not really noticed by most of the people, but noticed within the street, we had a discussion about mortgage-backed securities, and how these AAA ratings of mortgage-backed securities in particular were given to certain kinds of packaged mortgages, like, pools of mortgages, once they were tranched, and then the lower tranches, like the really risky stuff, was then reconstituted, new kind of sausage was made with them, and the highest tranches of that shitty stuff was again given a AAA rating. I just remember that moment when the guy who was, like the managing director who was explaining this to us, said that, and I was like, well, why would that get a AAA rating if we know it’s bad?
Paul: I’m literally seeing the hmmm emoji in my head as you’re saying this.
Cathy: Yes! You should.
Cathy: And he sort of was like, yeah, that’s just how it works, and I was like, that sounds stupid. That sounds terrible.
Rich: Well, the analogy I always love to use is the breakfast buffet, when there’s like, still a little bit of oatmeal at the bottom of the pan, and you make new oatmeal for the next day.
Rich: You just keep that oatmeal. It’s all good. You just mix it right in.
Cathy: Well that’s a little bit different. I think…
Rich: I know…
Cathy: I think the better analogue would be, like, you have 15 restaurants that all make oatmeal, and they all have a one-inch thick old oatmeal at the bottom, and they scrape that old oatmeal together, and then re-serve it.
Cathy: Like they take all 15 different pots.
Rich: It just —
Paul: So the oatmeal —
Rich: The thinking is, it’s masked.
Paul: Is that an oatmeal tranche at that point?
Cathy: It’d be one thing if you were mixing it with good oatmeal, but you’re just mixing all the shitty oatmeal together and saying it’s good.
Rich: Yeah, yeah. Oh yeah, it’s even worse.
Cathy: It’s worse.
Paul: So you’re like, this is AAA oatmeal.
Paul: And you’re in there going, that oatmeal just tastes like shit.
Cathy: That oatmeal is hard. It’s hard oatmeal.
Rich: So your reaction to them saying, well, this is what we do —
Cathy: It’s not what we did, to be clear. It’s what “we” as a financial industry was doing.
Rich: Industry was doing, yeah.
Cathy: And we were, like, trying to hold it at arm’s length. We were, like, oh, we’re not investing in this stuff, it’s rotten stuff, and, you know, it was rotten, but what was bewildering was that even in spite of the fact that we had insulated ourselves, or so we thought, from that market, we hadn’t at all.
Cathy: At all. And so really what was happening was that we were realizing that all the other institutions that had a toe in the mortgage market and in all the other stuff that we did, they were unwinding their positions in the stuff we did, so our positions were looking worse and worse. There was no way for us to actually be independent.
Rich: So you’re two or three degrees of separation, but still connected.
Cathy: We were very well embedded in that whole thing. And everyone else was, too.
Rich: Well, everybody was.
Cathy: So we lost a lot of money in spite of the fact that we thought we were clean. And none of the previous historical data would have told us you’re gonna get screwed, because, you know, that’s what we followed. We believed in that like a religion.
Cathy: So basically two things from my experience there. Number one is that, you know, we were really very arrogant about understanding what the data is saying and what we think we can conclude from that, but also that we were sort of all stymied and blindsided by this sort of mathematical lie, this AAA rating thing, and when I say “we,” I mostly mean investors. [laughter]
Paul: Mmmm hmmm.
Cathy: That were outside of Wall Street, and, like, you know, Norwegian Fund or whatever, all the people that —
Paul: But now you’re in the world of actual big data.
Paul: Big piles of investment data that is being bundled up and people are going, yeah, the model says that we can classify that as AAA and get away with it.
Cathy: Yeah, yeah.
Paul: So this is like, this is a moment, right, when you’re going, like, all right, this is actually nonsense.
Cathy: So go back to my moment when I was like, I’m in math. Rubik’s Cube, you know, it’s beautiful because it’s true.
Paul: Mmmm hmmm.
Cathy: I was like, this isn’t true.
Paul: It was just fundamentally —
Rich: The alarms went off.
Cathy: This is a lie. This is in fact a weaponized mathematical construction.
Paul: Is it fair to say that when people talk about you, the words “go with the flow” don’t come up a lot?
Cathy: I’m like a professional quitter. [laughter]
Paul: OK. OK, yeah, I mean, this is like, mmmm, mmmm, mmmmm mmmmm [presumably making some gesture!].
Cathy: I was like, this is not… So I quit, and I went to —
Rich: You did quit?
Cathy: I did, yeah, I quit.
Cathy: Also people were like, why are you quitting that job? Like, you know, same thing.
Cathy: I was like, because I like quitting. And then I went to work in risk to try to, like, double down on math.
Cathy: You know, I still believed in math as a solution. And spending two years there, and then realizing no, math wasn’t the problem, actually. It wasn’t that we had the wrong models. It’s that we really don’t care about the truth. It’s that it’s become a political thing, so when you’re talking about risk, I spent a year sort of fixing the credit default swap risk model for, like, hedge funds and banks, you know? And the value-at-risk model…
Paul: Help me understand — when you say “fixing the model…”
Paul: What is…is the model a Microsoft Word document? What is the model?
Cathy: [deep breath] You mean, like, technically speaking?
Paul: Yeah, like, everyone talks about these models, and I’m like, what is it, just a page of instructions, is it a big spreadsheet — what is the model?
Cathy: OK, so I’ll talk about it theoretically and then I’ll tell you about what it actually, what happens, physically. So theoretically what the model is is something you put in data, and we actually train it on lots and lots of historical data, but the data you put in on given days, like your portfolio —
Cathy: If you’re a bank, and then the output is how much, it’s called the 95 VaR, how much, is like, if you have 100 days, what is the 95th-worst day of returns for that portfolio? What can you kind of expect for a bad day? So that’s kind of the idea of, like, the risk that you’re holding.
Paul: So there’s this set of functions, you’re pouring information in, and you’re aiming to get this one number?
Cathy: One number.
Paul: One number so you can assess that set of investments against other sets of investments.
Cathy: Yeah, so how much would this portfolio lose on a bad day?
Paul: OK, and by doing that with lots of different portfolios, I’m able to understand my risk by comparing.
Cathy: Yes, that’s the idea.
Paul: OK, OK.
Cathy: Using tons of historical data.
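For readers who want to see the "one number" idea in code, here is a minimal sketch. It assumes a plain historical-simulation approach: sort past daily returns and read off the loss that only the worst 5% of days exceed. The function name `historical_var` and the sample returns are hypothetical, made up for illustration, and this is not any firm's production model.

```python
def historical_var(daily_returns, confidence=0.95):
    """Loss (as a positive fraction) exceeded only on the worst
    (1 - confidence) share of the observed days.

    Assumption: simple historical simulation, no weighting or smoothing.
    """
    if not daily_returns:
        raise ValueError("need at least one daily return")
    ordered = sorted(daily_returns)  # most negative (worst) day first
    # Number of days that fall in the tail, at least one.
    tail_days = max(1, round(len(ordered) * (1 - confidence)))
    # The least-bad day inside the tail marks the threshold.
    threshold_return = ordered[tail_days - 1]
    return -threshold_return


# Hypothetical history: 95 quiet days and 5 bad ones.
sample_returns = [-0.10, -0.08, -0.06, -0.05, -0.04] + [0.01] * 95

print(historical_var(sample_returns))  # 0.04, i.e. a 4% loss on a bad day
```

Running the same function over several portfolios gives one comparable number per portfolio, which is the comparison Paul describes.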
Cathy: And that portfolio could be a simple, I own Apple stock and nothing else, right? So it starts there. It starts with individual instruments —
Rich: It could be a portfolio of anything.
Cathy: Yes, but you combine those to different, you sort of model each instrument individually, and how they combine together. Anyway, the point is that there’s lots of assumptions that go into, in particular, how stocks and instruments and bonds and credit default swaps, how they might lose or gain value in synchrony, right? These assumptions were, for the credit default swap case, like, just false. They were very simplistic and, and in particular, they assumed that the returns for a given credit default swap was sort of Gaussian, like normal bell curve kind of shaped as a distribution.
Cathy: Which is to say, like, sometimes things will go bad, but not that often.
Cathy: What I did was very simple: I just looked at the actual returns for credit instruments, like credit default swaps, and it was nothing like a normal distribution, at all.
Rich: So you looked at the historical…
Cathy: Historical returns…
Rich: Returns, yeah.
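The check Cathy describes can be sketched roughly like this: count how often the actual returns land more than three standard deviations from their mean, and compare that with what a bell curve would predict. The helper name `tail_exceedance` and the return series are hypothetical, invented for illustration, and are not real credit default swap data.

```python
from statistics import NormalDist, mean, pstdev


def tail_exceedance(returns, k=3.0):
    """Return (observed, gaussian) fractions of days that sit more than
    k standard deviations away from the mean return."""
    mu = mean(returns)
    sigma = pstdev(returns)
    extreme_days = sum(1 for r in returns if abs(r - mu) > k * sigma)
    observed = extreme_days / len(returns)
    # Two-sided tail probability under a normal distribution.
    gaussian = 2 * (1 - NormalDist().cdf(k))
    return observed, gaussian


# Hypothetical series: 980 quiet days plus 20 sharp losses.
made_up_returns = [0.001, -0.001] * 490 + [-0.05] * 20

observed, gaussian = tail_exceedance(made_up_returns)
print(observed)   # 0.02
print(gaussian)   # roughly 0.0027
```

In this made-up series the extreme days show up several times more often than the bell curve allows, which is the kind of gap that makes a Gaussian assumption understate risk.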
Paul: Wait, so people weren’t doing that? They were just, like, no it should look like this?
Cathy: Yeah. And not only that, but I think they still do it that way.
Cathy: And they do it that way because it simplifies the calculation.
Paul: Well that’s a bad reason!
Cathy: Yeah, it’s a bad reason. [laughter]
Paul: Yeah, you shouldn’t build the foundations of the entire global economy on convenience.
Cathy: Yeah, well, here’s, here’s what’s worse, is like, at first I thought, they’re doing this because they don’t know how to do better, so like, I worked it out, and I figured out how to do it better. And then, like, I was like, actually they’re doing it this way because the risk is underestimated this way, and they like it that way.
Paul: [long, resigned exhale]
Rich: Well, I mean, that’s optimism. [laughter] Let’s have a good attitude about things here, guys.
Paul: [deep, deep sigh]
Cathy: Anyway, but my point being, like, two years later, I’m like, no, it’s not a math problem.
Paul: OK, so you, you literally, it’s like Heart of Darkness and you’re going up the river towards financial…
Cathy: And I felt like, at that point I felt like, I’m part of this, and I’m, I’m one of those PhDs they keep pointing to in the corner saying, “We got this.” You know? And I just didn’t want to be that. I —
Rich: And where were you at the tail end of this? Where…
Cathy: I was at Risk Metrics, which was just in the Chase building on Wall Street.
Rich: OK. What is Risk Metrics?
Cathy: It was the company that built this risk model.
Paul: So it builds some risk, and is the risk model sort of delivered as software, I wanna go back to that.
Cathy: Yeah, yeah, yeah.
Cathy: So the portfolios are uploaded to Risk Metrics’ grid, and then we do these ridiculous calculations overnight in batch jobs…
Paul: OK, so it’s like a cloud service.
Paul: OK. I like to know how the world works or doesn’t — it is just terrifying. Because you’re describing this, and it’s just all very normal, everybody’s going to their job, there’s a lot of bankers going, like, well that curve looks fine to me! And then just one day… [explosion noise]
Cathy: Right. I don’t mean to say that there’s no use, no utility whatsoever in these calculations. There’s plenty of stuff that they do care about. It’s just that there’s plenty of ways for them to not care. And again, I ended up feeling like, oh my God this happened again, I’m basically fronting —
Rich: Wait, what happened again?
Cathy: I’m fronting for a mathematical lie.
Rich: Oh. You felt that…
Cathy: I felt implicated.
Rich: We’re back to square one.
Cathy: Yeah, we’re like, what? What?? I wanna —
Rich: And this is 2011?
Cathy: Yeah. And I just wanted to —
Paul: Now in all fairness, you had a couple years to contextualize that maybe something was a little bit off with Wall Street.
Cathy: Oh yeah, I mean, I left the hedge fund saying I wanna help fix this.
Paul: OK, so you’re like —
Cathy: And then I’m like, I can’t fix this.
Paul: I’m gonna fix this system from inside.
Paul: You did that actual thing.
Cathy: I did that thing.
Rich: So you leave.
Cathy: I left and I wanted to sort of improve the world, I wanted to feel, like, morally positive or at least neutral. So that’s when I started Mathbabe. I was like, I’m gonna expose this corruption to the world, at least to mathematicians.
Paul: With a blooooog.
Cathy: With a blog, of course.
Paul: Of course.
Cathy: It was 2011, man. That’s what we did. [laughter] But in the meantime, I also needed to be employed, you know, because I have three kids and stuff, so I got a job at a startup. I became a data scientist. It was really easy, I just changed my title to data scientist, and I got a job in a company doing stuff with online advertising.
Paul: OK. So what is a data scientist?
Cathy: It’s somebody who uses historical data to build algorithms that predict people, instead of predict markets, I was predicting people, but it was really not that different.
Paul: Well that is a great definition, though, actually. I mean, honestly, 99% of data scientists asked to define what they do would be like, “I….”
Rich: “Do you have 20 minutes?”
Cathy: I mean, that’s not the only thing we do, like I built an overnight report system, you know, the daily dashboard for the company, I did a lot of data visualization using Tableau, like I figured out how to make the data real for the company.
Paul: Do I come to you, then, with a big pile of data and go, hey, Cathy, I’ve got this data.
Cathy: Mmmm hmmm.
Paul: You’re a scientist.
Cathy: Yeah. Yeah.
Paul: That’s pretty much it?
Cathy: Oh yeah.
Cathy: But you have to tell me what the question is that you’re trying to answer.
Paul: What’s a good question?
Cathy: Um…how do I get more people to buy?
Cathy: And then I say, well, OK, what are the moments we’re collecting data about those people?
Rich: Now isn’t that evil?
Cathy: Well that’s the good question, thank you, Richard. Thanks for that question. Yeah, I mean, so, originally I was like, this isn’t evil. I mean, some people buy hotels on Expedia, some people don’t. You know, like…
Paul: I kind of see where this is going now.
Cathy: And I don’t, I still don’t think it was evil, but I’ll tell you what happened —
Paul: Wait, why is it in the past tense? [laughter] Sorry, it’s just there’s a certain trend here with you with these jobs.
Cathy: Oh my God, you’re right. God, you’re making me, like, reflect upon myself. It’s not pretty. So what happened was, like, there was this venture capitalist who came who was thinking of investing in our series B funding round, and he wanted to talk to all of us, so we came and listened to him, and he said something along the lines of, like, here’s what I imagine tailored advertising will look like in the future: it’s gonna be a beautiful place where I get what I want and other people get what they want and everyone’s happy, and then he said, in particular, like, I look forward to the day when all I see are trips to Aruba and jet skis and I’m never gonna have to see another University of Phoenix ad, because those aren’t for people like me.
Cathy: And everyone around me laughed, and I remember thinking, what?
Paul: Oh boy.
Cathy: Like, what happened to the democratizing force of the internet?
Paul: Yeah, that’s not their priority with those guys.
Paul: That’s not really what…he’s literally going like, let’s make it beautiful and show me, like, Sonoma vacations.
Cathy: But you know, it just struck me that, like, you’re saying that everybody wants what they get, but who wants University of Phoenix ads? What does that mean to want that, and what is University of Phoenix really doing? Right? And by the way, I had never seen a University of Phoenix ad. I was like, did this guy really see that? Because I never saw that. And then I looked it up, and I realized that Apollo Group, which is the parent company of University of Phoenix, had actually been the biggest advertiser on Google that quarter.
Paul: Sure. Sure.
Cathy: And I was like, this stuff is huge. But I never see it.
Paul: Mmmm hmmm.
Cathy: So what we’re talking about is, you know, instead of, like, everyone, nobody knows you’re a dog, it’s like, everybody is siloed and segmented and segregated, actually, on their internet experience, and it’s fine for technologists, and venture capitalists, and like, us very well-educated people who are creating this world, but at the far other end of the spectrum, we are actually building a predatory system. Which nobody should be happy about. And then I started researching into the sort of for-profit college industry, which was at its height — it was 2011 — and it was terrible.
Paul: Yeah, that’s bad stuff in there.
Cathy: And then the payday lending also extremely predatory.
Paul: Well that’s, that’s even worse stuff.
Cathy: And since then they’ve closed Corinthian College and ITT Tech most recently, but like, Corinthian was nailed for, like, specifically targeting single, poor mothers of color and, like, telling them that if they wanted to do well by their child —
Cathy: They were going to agree to this, because it’s gonna present their child with a better life. And it was, they were actually told to find the pain point for these people.
Paul: Mmmm hmmm.
Cathy: And like, focus on the pain. And then tell them, this pain is gonna be gone when you agree to this. Like, recruiting was just awful, and the point is that with the internet, the way we use demographics to pinpoint people on the internet, like, it is really easy to do this. Really easy to find those people.
Paul: So how did you quit this job?
Cathy: Well, I didn’t quit in disgust of this job, to be clear. What happened was simultan — it was pretty much simultaneously with that realization, I also had a friend who was a principal whose teachers were being scored by a, um, what’s called the value-added teacher model.
Paul: Mmmm hmmm.
Cathy: And I started looking into that scoring system, because, well I asked her, what is the formula, how are they being scored? And she was like, well I asked my DOE contact, but — my Department of Education contact — but she just said, “It’s math — you wouldn’t understand it.”
Paul: So she’s being judged by secret math she can’t see.
Cathy: Her teachers are. She’s a principal.
Paul: Oh, OK.
Cathy: Her teachers are all being judged by something she couldn’t see, and I was like, what? You know, math is not supposed to be…
Cathy: Yeah. Obfuscate truth. It’s supposed to clarify truth. Like, what the heck? So I was like, could you double down, ask the person again, and tell them that you do understand math, because I’m going to read this for you. You know, she was told that three times before she finally got this white paper, which was unreadable. She sent it to me. I was like, I’ve been, I have a math PhD, I’ve been doing modeling for 10 years or something.
Paul: You’ve seen some bad stuff.
Cathy: And I couldn’t understand this at all.
Paul: Yeah, you’ve been down in the…you’ve seen terrible things.
Rich: Because it’s bad, or because it’s…?
Cathy: It was written so that nobody could understand it.
Paul: To obscure, like, even the worst academic isn’t this bad.
Cathy: Yeah. So around that same time, the New York Post published all the teachers’ names and their scores.
Paul: Ah, that was grisly, I remember that.
Cathy: Active shaming.
Rich: Did they really?
Paul: Yeah, it was hideous.
Cathy: And the way they did that is they did a Freedom of Information Act request. And so I was like, if they can get the scores, I can get the formula, I thought. So I filed a Freedom of Information Act request to get the source code. And I specifically said in my request, do not send me the white paper, that’s not what I’m talking about. I’m talking about —
Paul: Trade secret.
Cathy: The…so yeah, so they didn’t give it to me.
Cathy: They eventually gave me the white paper and said, this case is closed, and I was like, I didn’t ask for the white paper, I specifically said no.
Paul: They’ll throw teachers under the bus, they’re never giving you that.
Cathy: Right. So I actually ended up talking to someone who worked at the place that built this model in Madison, Wisconsin.
Rich: Madison, Wisconsin, because this model is being used not just in New York City.
Cathy: Right, it’s like a…
Rich: This is a major thing.
Cathy: It’s a think-tanky type place.
Cathy: A value-added research center.
Paul: And what is their argument for not sharing the…
Cathy: So yeah, so I said, why didn’t I get this? I feel like you guys are assessing civil servants, I should be able to see this. And they’re like, oh, you’re not the only one who can’t see this. Nobody in the Department of Education can see this. Our contract with the Department of Education stipulates that this is a secret formula. The point being that, like, number one, nobody has access to this formula. But number two, nobody understands this formula. That means that nobody in the Department of Education could actually confirm these scores…
Rich: Or scrutinize…
Paul: It could literally be, like, one line of code that just says random, out of a hundred pick a number.
Cathy: Yeah. In fact there was an example of that, because this stuff is all over the country. There was an example where there was an actual computer error, and they only figured it out because some teachers were getting scores for classes they didn’t teach.
Rich: [laughing] Oh my God.
Paul: So you’re literally — so, in order to understand how you’re being scored, you have to reverse-engineer the test scores back to the algorithm that is determining how a teacher is performing.
Cathy: OK, so what happened was this intrepid civilian journalist, who’s actually a math teacher at Stuyvesant High School, his name is Gary Rubenstein, he did something really smart. He was like, well, you know, since the New York Post got these scores for teachers, I have access to that. They’re now publicly available. He got his copy of them, and he found that many teachers had two scores.
Cathy: Because they taught two classes. So they got a 7th grade math score and an 8th grade math score. The 5th grade English score and the 6th grade English score. He found more than 600 teachers got two scores for the same year. And remember, these scores were just supposed to — I didn’t tell you this, but the truth is these scores were supposed to just tell you whether you were a good teacher or not. So they weren’t supposed to depend on 5th or 6th or whatever.
Paul: Right, so if you’re above X, if you’re above 75, you’re a good teacher. If you’re below 75…
Cathy: Right. That was the idea. So they get two numbers between 0 and 100, right? And I already knew a guy named Tim Clifford who blogged about getting a 6 and a 96 the next year, so I was already hmmmm, like this doesn’t sound consistent.
Paul: That is a discrepancy.
Cathy: Yeah. A little bit.
Cathy: Because, especially because the six came with shame.
Paul: Yeah. You’re the worst teacher in the world.
Cathy: Yeah, you feel bad.
Paul: Yeah, you really only have five ways to fail left. [laughter]
Cathy: He was a veteran, he was a 26-year English middle school teacher, but he was like the…
Paul: In New York City?
Paul: Can you imagine?
Cathy: But the young teachers in his school who got bad numbers were really shamed.
Paul: Yeah, because he was just like [very low angry muttering in a good approximation of a veteran teacher].
Cathy: Yeah, he’s just like, oh God, another system, are you kidding me?
Paul: We’re gonna get someone else to teach these little bastards. [laughter] But then, yeah, you’re there for like a year or two and I love my students, and they’re like, eh, you’re a seven.
Cathy: You’re a seven, yeah.
Cathy: So anyways, so Gary Rubenstein, going back to him, what he did is he plotted those teachers on a scatterplot, so X-axis is the first score, and Y-axis is the second score.
Cathy: And if they were consistent, if you got, like, a 72 and then a 74, you’d expect all these dots to line up along the diagonal.
Paul: Yeah, that’s a good — at least that’s a consistent test at that point, right?
Cathy: Yeah. It’s a test of consistency, right?
Cathy: What he saw was the scatterplot of this was almost uniform distribution.
Paul: Oh, so it’s like static on the TV.
Cathy: Yeah, it seemed, like, with your eyes, when you looked, you were just as likely to get 96,0, which is 96 for one score and 0 for the other, as to get a 50,50.
Paul: Oh my.
Cathy: There was no consistency.
Paul: Um, OK.
Cathy: So it just, I mean, that’s kind of enough for me.
Cathy: There’s lots of economists, Raj Chetty, who’s like a famous Harvard economist, writing these papers about how great the value-added model is, and how information that really matters is embedded in these scorings, and I’m like, OK but, they’re completely random on a given year for a given person.
Paul: So you might as well just put on a blindfold and throw a dart.
Cathy: So it was actually 24% correlated, so there’s not nothing.
Cathy: There’s not no information there, but it’s very, very much like a random number generator for a given person.
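The consistency check described here, pairing each teacher's two scores and measuring how well they agree, can be sketched in a few lines. This is only an illustration: the data is simulated, not the New York City scores, and the `simulate_pairs` helper, its `noise` parameter, and the idea of a single underlying "quality" per teacher are assumptions made up for the example. Simulated scores are also not clipped to the 0 to 100 range.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_pairs(n_teachers, noise, seed=0):
    """Hypothetical data: each teacher has one underlying quality, and each of
    their two class scores is that quality plus independent random noise."""
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n_teachers):
        quality = rng.gauss(50, 10)
        first.append(quality + rng.gauss(0, noise))
        second.append(quality + rng.gauss(0, noise))
    return first, second


# Little noise: the two scores mostly agree, so points hug the diagonal.
consistent = pearson(*simulate_pairs(600, noise=2))
# Heavy noise: the two scores for the same teacher barely agree.
noisy = pearson(*simulate_pairs(600, noise=18))

print(f"low-noise scores:  r = {consistent:.2f}")
print(f"high-noise scores: r = {noisy:.2f}")
```

In this toy setup the noise has to be nearly twice as large as the real spread between teachers before the correlation drops to the neighborhood of the 24% figure mentioned above.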
Paul: What is a good correlation? I don’t know.
Cathy: I would like to see 95% correlation.
Paul: OK, so we’re way off.
Cathy: At least 90. So in spite of this, there are people that are getting fired.
Cathy: For bad scores.
Rich: Oh wow.
Cathy: And I interviewed someone, Sarah Wysocki, who was fired in Washington D.C. for a bad score, and so it’s like high stakes combined with randomness.
Paul: So this is a weapon of math destruction.
Paul: What are we looking for when we’re looking at these things. What is — you have a framework in the book for kind of identifying a WMD.
Cathy: Right, because I’m a math nerd, so I like to characterize things. I like to make them well-defined. So there’s three characteristics for weapons of math destruction, an algorithm that’s very worrisome to me. The first is that it’s widespread. So I really care only about algorithms that matter to people. Like this has to make important decisions for a lot of people.
Paul: Right it’s not…it’s not like…
Rich: The mass part.
Cathy: Yeah, that’s the mass, exactly, the scalability of it. The scaledness of it. So weapons, the teacher value-added model, as I said, is being used all over the country in more than half the states, mostly urban school districts. And for high-stakes decisions. Not always firing or hiring, but often tenure. So it’s important.
Paul: And with no transparency, like, they’re not talking to you.
Cathy: So that’s the second characteristic: secret. It’s got to be mysterious. People don’t understand their score, half the time they don’t even understand that they are being scored.
Paul: OK, so you can’t sit there and write a paper about this, because you can’t — I mean, you could write a paper about what you’ve observed, but you can’t take their work and go, like, all right, I’m gonna submit this paper about what they’ve done. They’re out of the whole academic context —
Cathy: So the other, the flipside of secrecy is lack of accountability.
Cathy: There’s, like, no accountability for these algorithms. And there’s no way of auditing them, exactly as you said. There’s no, like, check on them.
Paul: It’s actually fair, in my opinion, in that case to kind of assume the worst motives. Like, because we don’t know, you can be like, well it’s profit or power, but we don’t know why they won’t share the information.
Paul: Or they’re covering something up or whatever. I think that it’s actually, like, with stuff like this, which determines people’s lives, you really do just get to ask.
Cathy: You get to ask.
Paul: Why the hell?
Cathy: Yeah, no, exactly. My feeling is that if it’s this important and this widespread, it’s kind of like a law.
Paul: So is it time to sue that organization. What is it time to do?
Rich: What can you do?
Paul: I’m looking at Rich because he’s a lawyer.
Cathy: So let me just finish my description —
Paul: Oh yeah. Sorry.
Cathy: Because I’ve got two out of three. The third one is that it’s destructive. That it’s destroying people’s lives unfairly, and moreover, that it’s actually undermining its original goal. It’s setting up a destructive feedback loop. So in the case of the value-added model for teachers, the original goal was to sort of get rid of bad teachers, so that we could fix education and all those things, and to sort of decrease the achievement gap, all sorts of things, but basically get rid of bad teachers. And what’s actually happened is that good teachers have left.
Paul: They just don’t want to deal with this anymore.
Cathy: Yeah, they retire early, they get better jobs in schools that don’t have this regime. So Sarah Wysocki got fired, she got hired the next week in an affluent suburb of Washington D.C. which doesn’t have these scoring systems.
Paul: Right. Because who wants to work for a robot?
Rich: Let’s, let’s dive in for a second, and talk about the motivations behind the secrecy part of it. It’s not in the best interests of whether this is for-profit or partly for-profit or non-profit or whatever to not be a successful tool. I mean, I don’t think there’s payoffs going on to hide this stuff, right? I think if you ask them, well why don’t you be more transparent about how this thing works, they’re gonna come back and say, well, the scrutiny and there’s just sort of constant pings of complaint would be endless.
Paul: Well the easiest argument —
Rich: And therefore, I mean, because we could easily just demonize —
Paul: The easiest argument to make —
Rich: And say there are bad guys in the room.
Paul: Right, right, right, and they’re gonna say, well no, then people will game it. That’s what they’re gonna say. But the same is true of voting machines. It just never ends.
Paul: You can’t fire someone and not tell them why.
Rich: Agreed. But they’re talking about, there are seven — let me be this person for a second. Let me represent, what’s the name of this company?
Rich: VARC. [laughter] I’m senior vice president of varcking at VARC. So I’m at VARC, and I say, look, if we do this, we would be constantly scrutinized, constantly, there would be complaints filed against us on a never-ending basis such that we couldn’t even function as an organization, because when someone gets a low score, they’re gonna wanna audit and they’re gonna wanna, you know, put the magnifying glass on us. We can’t function. So therefore there is an implicit trust, which I think is what they’re banking on here, that this system is going to work, because you trust us, that it will work. I think the problem is that trust is not founded on anything.
Paul: Well they haven’t earned it. It’s an opaque white paper.
Rich: They haven’t earned it.
Paul: And a 24% correlation.
Cathy: So I mean, let me say what I think is true. I think you’re probably right, but I don’t think that’s the primary reason for the secret. I think the primary reason is that they just think this is their secret sauce, and they need to protect it, because if they give it away, then other people could just literally copy it.
Rich: OK, so that sounds very commercial to me.
Rich: So they must be a commercial, for-profit in some way.
Cathy: They want the business. I’m not sure, and by the way, VARC is like, they’re not being used anymore. There’s other companies now being used that also defend their secrecy.
Rich: Do the same thing.
Cathy: Also, I want to say, Paul was right, that there’s a lot of that, oh, we don’t want people to game this. The thing people don’t understand about gaming, you know, is like, if it’s a good enough model that you should use it in high-impact and widespread situations, then you want them to game it.
Paul: Well there’s always, like, that’s just a big part of statistics, right, is you are accommodating for that behavior in your model.
Cathy: And if you can’t accommodate for it, you shouldn’t use it. Let me give you an example: credit scores. Credit scores are pretty good, and they’re not perfect, I do have complaints about the way they’re being used, but —
Paul: Oh we should — that would be another podcast.
Paul: I would — that would be amazing.
Cathy: No, let’s do that. But like, at the end of the day, what they’re doing is looking at your behavior paying your electricity bill and your rent. So gaming that would look like, I’ve got to pay my electricity bill on time this month.
Rich: It’s all good.
Cathy: It’s all good. That’s what we want.
Rich: That’s good gaming.
Cathy: And if we had a, if we had a bad proxy in there, like somebody suggested once, oh instead of looking at your electricity bill, because it’s expensive and time-consuming to look at people’s records, just count the number of books in their house and use that as a proxy for whether they’re credit-worthy. And the point is once that gets out, you would just buy a lot of books to make yourself look good, right?
Cathy: That means that it’s a bad proxy. Don’t use that, use the electricity bill. That’s the reason we use electricity bills. So I’m just saying that, like, in a situation that is this important, you can’t rely on bad proxies. You have to, you have to make sure that what you’re doing is setting up a system where if people game it, they’re better teachers.
Cathy: That’s not what we have. So actually, the woman, Sarah Wysocki, who got fired, she has reason to believe, and I should mention that Michelle Rhee, who was chancellor of schools at the time, fired people who had bad scores, but also gave bonuses to people with good scores. And Sarah has reason to believe that the teachers of some of her students the previous year had cheated on the test.
Paul: You talk about this in the book.
Cathy: There’s evidence for that. So like, and it totally makes sense.
Paul: So it created an incentive system where if you cheated, you as a teacher could make more money.
Cathy: Yeah, but you would be screwing the next year’s kids.
Paul: Oh, because you’re — she’s gonna inherit those kids but her scores are gonna be lower.
Cathy: Her scores are gonna be lower because the way the scores kind of work, we don’t really know, obviously, we don’t have the formulas, but is like, you compare the kids’ actual scores versus what they were expected to get, and you’re sort of on the hook for the difference. So if the scores of the kids coming in were inflated, then they’re expected to get much better scores than they actually will get.
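The mechanism Cathy describes, being "on the hook for the difference" between expected and actual scores, can be illustrated with a toy calculation. The real formula was never released, so everything here is an assumption made for the example: the `expected_from_prior` rule (last year's score plus a fixed gain), the `typical_gain` value, and all the numbers.

```python
def value_added(expected_scores, actual_scores):
    """Average of (actual - expected) across a teacher's students."""
    gaps = [actual - expected
            for expected, actual in zip(expected_scores, actual_scores)]
    return sum(gaps) / len(gaps)


def expected_from_prior(prior_scores, typical_gain=2.0):
    """Hypothetical prediction rule: last year's score plus a typical gain."""
    return [prior + typical_gain for prior in prior_scores]


# What the students genuinely scored last year (made-up numbers).
true_prior = [60.0, 70.0, 80.0]
# The same students' recorded scores if last year's tests were inflated.
inflated_prior = [score + 10 for score in true_prior]
# This year's results are identical in both scenarios.
actual_this_year = [63.0, 73.0, 83.0]

honest = value_added(expected_from_prior(true_prior), actual_this_year)
after_cheating = value_added(expected_from_prior(inflated_prior), actual_this_year)

print("score with honest prior results:  ", honest)
print("score with inflated prior results:", after_cheating)
```

The teacher's classroom results are the same in both runs. Only the incoming expectation changes, and that alone turns a small positive score into a large negative one.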
Paul: OK. So we’ve got an algorithm, we’ve got a way of identifying weapons of math destruction.
Paul: Hooo, it’s a rough one out there.
Cathy: And there’s a lot of them.
Cathy: There’s a lot of weapons of math destruction. Every chapter of my book is, like, a different one.
Paul: You talk about sports, you talk about finance, you talk about education. Yeah. I mean, it’s tough. It’s good, it’s well written and breezy and bright, because the actual news it imparts is that we are sort of sinners in the hand of an angry algorithmic god.
Cathy: Yeah. Actually, I should say my sports example is a counter-example, right?
Paul: That’s true. Because it works.
Cathy: Because sports, the point is it’s not a weapon of math destruction.
Paul: It’s totally transparent.
Cathy: It’s totally transparent, like, I mean, I like to say that, like, you know, because I listen, I watch baseball and I listen to a lot of baseball on the radio, the arguments we have about whether something should’ve been called an error, it’s like the entire public helping people clean data. You know, as a data scientist, you spend 80% of your time cleaning data, that’s, like, a really not very glamorous job to do. But you know, in baseball, like, the entire public helps, they’re like, oh, that shouldn’t have been called an error. That’s a hit.
Paul: What’s your team, who do you follow?
Cathy: Traditionally speaking, Red Sox fan, but now I’m a Mets fan. Because I live here now.
Rich: That’s a big leap.
Cathy: Well, it’s not, I didn’t become a Yankees fan.
Rich: That would be a bigger leap.
Rich: True. Fair enough.
Cathy: I also like the Nats. Even though it’s crazy to like the Nats, but I do it.
Paul: All right. My God.
Rich: The message has been sent.
Paul: I tell you. It’d be fun to be in your brain for a minute, but I’m also — I think it’d be, it’d be a lot. It’d be a lot to look around and see these things all the time. I like living in my world of lies.
Paul: No, I don’t, really. Now I’m actually — but that was, that’s a lot of signal. There’s a lot going on out there.
Cathy: There is. I mean, yeah, and people have told me that, like, ever since they read my book, they’re seeing the world through the lens of weapons of math destruction, which is, like, I don’t know whether to thank them or apologize.
Paul: So we have a lot of programmers and engineers and people in technology who listen to this show, with whom we work and so on. They’re dealing with large amounts of data, they’re dealing with, and we’re increasingly, Postlight as a company is doing more work around culture, around media, around, a little bit around government. So when someone’s giving us a big dataset with a big question, what’s an ethical path? And maybe that’s too big of a question, but like, where do you start to think ethically about, you know, a couple terabytes of data that gets dropped on your lap?
Cathy: Yeah, so I mean, at the end of the book, I’m not hopeless, or else I wouldn’t have written the book at all. I call for ethics to be part of data science curriculum, and like a regular part of conversation around algorithms. I mean, obviously I have a lot of opinions, political opinions, but I’m not, like, I’m not suggesting the way to think about ethics, you know, I’m suggesting that what we need to do is start acknowledging that we are embedding our values in every algorithm we build. Even by our objective function, the definition of success.
The example I like to give is when I make dinner for my kids. The data, that’s an algorithm, because I think about everything in terms of algorithms. [laughter] And like, the data going into that is, like, the food I have on hand, the time I have, the ambition I have. And the definition of success, for me, is when, if my kids eat their vegetables, right? My seven-year-old, if he were in charge, would be like, no, it’s whether I get a lot of Nutella at dinner.
And this matters because over time, we train our models to success. We optimize, right? That’s the point. So if I had a successful meal on Tuesday with lots of vegetable eating, then on Wednesday I’m much more likely to repeat that meal than to do something that my kid wanted instead.
Cathy: So that’s what we do when we build algorithms. We define success and then we, we don’t really spend enough time asking ourselves who’s benefitting from this definition of success and who’s suffering?
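The dinner example can be written down as a tiny optimization to show how the choice of objective function carries the values. This is a minimal sketch: the meals, the numbers, and the two scoring functions are invented for illustration, and only the vegetables-versus-Nutella contrast comes from the conversation.

```python
# Hypothetical record of how past dinners turned out.
meals = {
    "stir-fry": {"vegetables_eaten": 3, "nutella_servings": 0},
    "pasta with peas": {"vegetables_eaten": 2, "nutella_servings": 0},
    "nutella crepes": {"vegetables_eaten": 0, "nutella_servings": 2},
}


def parent_success(outcome):
    """The parent's definition of success: vegetables eaten."""
    return outcome["vegetables_eaten"]


def kid_success(outcome):
    """The seven-year-old's definition of success: Nutella at dinner."""
    return outcome["nutella_servings"]


def best_meal(meal_outcomes, success):
    """Pick the meal that scores highest under the given success definition."""
    return max(meal_outcomes, key=lambda name: success(meal_outcomes[name]))


print("parent optimizes for:", best_meal(meals, parent_success))
print("kid optimizes for:   ", best_meal(meals, kid_success))
```

The data and the selection procedure are identical in both calls. The only thing that changes is who got to define success, and that is enough to change the answer.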
Cathy: The other thing is, of course, our data itself is often extremely biased, so we have to be careful about that.
Paul: And I mean, really, aside from buying and reading a book, which is one big part of this ethics question about big data, the other thing is just to take a minute. Don’t just, like, think about some of the inherent biases, think about the sources, think about who put the data there, and as you’re creating your algorithms, they’re going to be interpreted and they’re going to result in other people reacting as you draw conclusions.
Cathy: I mean, just, example, just to be clear, like, if you’re thinking about, like, for-profit colleges and Google and the people who got targeted by them, right? So for-profit colleges made money because they got students. Google made money because they sold advertising for a lot. They get a lot of money for that stuff. The people who were, like, downwind of that, who actually got targeted, the question is, was this a success? You know, it might have been a success for the first two parties, but not for the third.
Cathy: And that’s often true. Like, because we are often defining success simply by profit, and it’s not clear that’s the only thing we should think about.
Paul: All right, so we’re gonna go away and talk about this for probably the next six to eight months. [laughter] Um….if people want to get in touch with you, Cathy, if they have questions, if they have ideas, if they have unique opportunities for online education…
Cathy: Yeah, I mean, beyond forming book groups to talk it over, I’m available. If you go to Mathbabe.org, which is my blog, my email is on the about page.
Paul: Great. And people should go read Mathbabe.org. I want to tell everyone that they should go and buy, either physically or via the Amazon Kindle store…
Cathy: And I also read the audiobook, by the way.
Paul: Oh! And…
Cathy: If they’re listeners.
Paul: Or the audiobook. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. It’s by Cathy O’Neil. Mathbabe.org. And it’s out from Random House Crown. It’s available now. Lucky, lucky us. All right.
Rich: This was cool, really cool.
Cathy: My pleasure.
Paul: That was awesome. Thank you so much for coming in.
Cathy: Thank you.
Paul: Uh….I think we should get back to the office.
Rich: I think we should, very carefully.
Paul: All right, so I’m Paul Ford. I’m the co-founder of Postlight.
Rich: Rich Ziade.
Paul: And if you want to check us out, hit postlight.com. You can also subscribe to our newsletter, Track Changes. You can get to that by going to trackchanges.postlight.com. If you need anything, you just send us an email. email@example.com. Feel free to rate us nicely on iTunes. We always like that, and we appreciate the contact and the support and the friendly counsel we receive from our many listeners.
Rich: Very soothing the way you wound that down.
Paul: That’s enough. Let’s get back to the office.