
Mar 23, 2022

In Episode 4 of Series 7 of The Rights Track, Todd is in conversation with Sam Gilbert, an entrepreneur and affiliated researcher at the Bennett Institute for Public Policy at the University of Cambridge. Sam works at the intersection of politics and technology. His recent book, Good Data: An Optimist's Guide to Our Future, explores the different ways data helps us, suggesting that "the data revolution could be the best thing that ever happened to us".

Transcript

Todd Landman  0:01 

Welcome to The Rights Track podcast, which gets the hard facts about the human rights challenges facing us today. In Series 7, we're discussing human rights in a digital world. I'm Todd Landman, and in the fourth episode of this series, I'm delighted to be joined by Sam Gilbert. Sam is an entrepreneur and affiliated researcher at the Bennett Institute for Public Policy at the University of Cambridge, working at the intersection of politics and technology. His recent book, Good Data: An Optimist's Guide to Our Future, explores the different ways data helps us, suggesting the data revolution could be the best thing that ever happened to us. And today, we're asking him: what makes data good? So Sam, welcome to this episode of The Rights Track.

Sam Gilbert  0:41 

Todd, thanks so much for having me on.

Todd Landman  0:44 

So I want to start really with the book, Good Data. And I'm going to start, I suppose, with the negative perception first, and then you can make the argument for a more optimistic assessment. And this is the opening set of passages you have in the book around surveillance capitalism. Could you explain to us what surveillance capitalism is and what it means?

Sam Gilbert  1:01 

Sure. So surveillance capitalism is a concept that's been popularised by the Harvard Business School professor Shoshana Zuboff. And essentially, it's a critique of the power that big tech companies like Google and Facebook have. And what it says is that that power is based on data about us that they accumulate as we live our lives online and, by doing so, produce data, which they collect, analyse, and then sell to advertisers. And for proponents of surveillance capitalism theory, there's something sort of fundamentally illegitimate about that, in terms of the way that it, as they would see it, appropriates data from individuals for private gain on the part of tech companies. I think they would also say that it infringes individuals' rights in a more fundamental way by subjecting them to surveillance. So that, I would say, is surveillance capitalism in a nutshell.

Todd Landman  2:07 

Okay. So to give you a concrete example: if I'm searching for a flannel shirt from Cotton Traders on Google, the next day I open up my Facebook and I start to see ads for Cotton Traders in my Facebook feed. Or if I go on to CNN, suddenly I see an ad for another product that I might have been searching for on Google. Is that the sort of thing she's talking about in this concept?

Sam Gilbert  2:29 

Yes, that's certainly one dimension of it. So the example that you just gave is an example of something that's called behavioural retargeting. This is when data about things you've searched for, or places you've visited on the internet, are used to remind you about products or services that you've browsed. So I guess this is probably the most straightforward type of what surveillance capitalism theorists would call surveillance advertising.

Todd Landman  2:57 

Yeah, I understand that, Sam, but, you know, when I'm searching within Amazon, they say, you bought this; other people who bought this might like that; have you thought about getting this as well? But this is actually between platforms. I might do a Google search one day, and then on Facebook or another platform I see that same product being suggested to me. So how does the data cross platforms? Are they selling data to each other? Is that how that works?

Sam Gilbert  3:22 

So there's a variety of different technical mechanisms. Without wanting to get too much into the jargon of the ad tech world, there are all kinds of platforms which put together data from different sources and then, in a programmatic or automated way, allow advertisers the opportunity to bid in an auction for the right to target people who the data suggests are interested in particular products. So it's quite a complex ecosystem. I think maybe one of the things that gets lost a little bit in the discussion is some of the differences between the ways in which big tech companies like Facebook and Google and Amazon use data inside their own platforms, and the ways in which data flows out from those platforms and into the wider digital ecosystem. I guess maybe just to add one more thing about that: I think probably many people wouldn't have a problem with something as straightforward as being retargeted with a product that they've already browsed for; they wouldn't necessarily see that as surveillance, or see that as being particularly problematic. I think where it gets a bit more controversial is where this enormous volume of data can have machine learning algorithms applied to it, in order to make predictions about products or services that people might be interested in as consumers, that they themselves haven't even really considered. I think that's where critics of what they would call surveillance capitalism have a bigger problem with what's going on.

Todd Landman  4:58 

No, I understand. That's a great explanation, thank you. And I guess just to round out this set of questions, it sounds to me like there's a tendency for accumulated value and expenditure here that is really creating monopolies and cartels. To what degree is the language of monopoly and cartel being used? Because, you know, we rattle off the main platforms we use, but we use those because they have become so very big. And how does a new platform cut into that ecosystem? Because it feels like it's dominated by some really big players.

Sam Gilbert  5:32 

Yes. So I think this is a very important and quite complicated area. It is certainly the case that a lot of Silicon Valley tech companies have deliberately pursued a strategy of trying to gain a monopoly. In fact, it might even be said that that's sort of inherent to the venture-capital-driven start-up business model: to try and dominate a particular market space. But I suppose the sense in which some of these companies, let's take Facebook as an example, are monopolies is really not so related to the way in which they monetise data or to their business model. So Facebook might reasonably be said to be a monopolist of encrypted messaging, because literally billions of people use Facebook's platform to communicate with each other. But it isn't really a monopolist of advertising space, because there are so many other alternatives available to advertisers who want to promote their products. I guess another dimension to this is the fact that although there are unquestionably concentrations of power with the big tech companies, they also provide something of a useful service to the wider market, in that they allow smaller businesses to acquire customers much more effectively. So that actually militates against monopoly, because now, in the current digital-advertising-powered world, not every business has to be so big and so rich in terms of capital that it can afford to do things like TV advertising. The platforms that Facebook and Google provide are also really helpful to small businesses that want to grow and compete with bigger players.

Todd Landman  7:15 

Yeah, now I hear you shifting into the positive turn here, so I'm going to push you on this. What is good data? And why are you an optimist about the good data elements of the work you've been doing?

Sam Gilbert  7:27 

Well, for me, when I talk about good data, what I'm really talking about is the positive public and social potential of data. And that really comes from my own professional experience. Because although at the moment I spend most of my time researching and writing about these issues of data and digital technology, my background is actually in the commercial sector. I spent 18 years working in product and strategy and marketing roles, particularly in financial services: at the data company Experian, and at a venture-backed fintech business called Bought By Many. And I learnt a lot about the ways in which data can be used to make businesses successful. And I learnt a lot of techniques that, in general, at the moment, are only really put to use to achieve quite banal goals, for example, to sell people more trainers, or to encourage them to buy more insurance products. And so one of the things that I'm really interested in is how some of those techniques and technologies can move across from the commercial sector into the public sector and the third sector, and be put to work in ways that are more socially beneficial. So maybe just to give one example: a type of data that I think contains huge potential for public good is search data. This is the data set that is produced by all of us using Google and Bing and other search engines on a daily basis. Now, ordinarily, when this data is used, it is to do banal things like target shoes more effectively. But there is also this emerging discipline called infodemiology, where academic researchers use search data in response to public health challenges. One great example of that at the moment has been work by Bill Lampos at University College London and his team, where they've built a predictive model around COVID symptoms using search data. And that model actually predicts new outbreaks 17 days faster than conventional modes of epidemiological surveillance.
So that's just one example of the sort of good I believe data can bring.

Todd Landman  9:50 

So it's a really interesting example of an early warning system, and it could work not only for public health emergencies, but for other emerging emergencies, whether they be conflict, or natural disasters, or any topic that people are searching for. Is that correct?

Sam Gilbert  10:05 

Yes, that's right. I mean, it's not just in the public health field that researchers have used this. You've just put me in mind, actually, Todd, of a really interesting paper written by some scholars in Japan who are looking at citizens' decision-making in response to natural disaster warnings, so floods and earthquakes. Migration patterns, I guess, would be the way of summarising it. Those are things that can also be detected using search data.

Todd Landman  10:31 

Well, that's absolutely fascinating. So if we go back to public health, then: I was just reading a new book out called Pandemocracy in Europe: Power, Parliaments and People in Times of COVID, edited by Matthias Kettemann and Konrad Lachmayer. And there's a really fascinating chapter in this book that transcends the nation state, if you will. It talks about platforms and pandemics. And one section of the chapter starts to analyse Facebook, Twitter, YouTube, and Telegram on the degree to which they were able to control and/or filter information versus disinformation or misinformation. And just the scale of some of this stuff is quite fascinating. So, you know, Facebook has 2.7 billion daily users, it's probably a bigger number now, and 22.3% of their investigated Facebook posts contained misinformation about COVID-19. And they found that the scale of misinformation was so large that they had to move to AI solutions, with some human supervision of those AI solutions. But what's your take on the role of these big companies, like Facebook, Twitter, YouTube, Telegram, and their ability to control the narrative and at least provide safe sources of information, let's say in times of COVID, though there may be other issues of public interest where they have a role to play?

Sam Gilbert  11:57 

Yes, I think this is such an important question. It's very interesting that you use the phrase "control the narrative", because of course that is something that big tech companies have traditionally been extremely reluctant to do. And one of the things I explore a bit in my book is the extent to which this can really be traced back to some unexamined normative assumptions on the part of tech company executives, who think that American norms of free speech, and the free speech protections of the First Amendment, are sort of universal laws that are applicable everywhere, rather than things which are culturally and historically contingent. And for that reason, they have been extremely reluctant to do any controlling of the narrative, and have tended to champion free speech over the alternative course of action that they might take, which is to be much more proactive in combating harms, including but not limited to misinformation. I think this probably also speaks to another problem that I'm very interested in in the book, which is what we are concerned about when we say we're concerned about big tech companies' power. Because I think ordinarily the discussion about big tech companies' power tends to focus on their concentrations of market power, or, in the case of surveillance capitalism theory, it concentrates on the theoretical power that algorithms have over individuals and their decision-making. And what gets lost a bit in that is the extent to which tech companies, by providing these platforms and these technologies, actually empower other people to do things that weren't possible before. So in some work I've been doing with Amanda Greene, who's a philosopher at University College London, we've been thinking about that concept of empowering power, as we call it. And as far as we're concerned, that's actually a much more morally concerning aspect of the power of big tech companies than their market position.

Todd Landman  14:11 

Yeah. So I like it that you cite the First Amendment of the American Constitution, but interestingly, the international framework for the protection and promotion of human rights also, you know, has very strong articles around protection of free speech, free assembly, and free association, which of course the tech companies will be interested in looking at and reviewing. But what it raises, I believe, is really a question around the public regulation of private actors. Because these are private actors; they're not subject to international human rights law in the way that states are. And yet they're having an impact on mass publics. They're having an impact on politics. They're having an impact on debate. So perhaps I misspoke by saying "control the narrative". What I'm really interested in is that we seem to have lost mediation. We have unmediated access to information. And it seems to me that it's incumbent upon these organisations to provide some kind of mediation of content, because not all things are true just because they're said. So it gets back to that question: where's the boundary for them? When will they step in and say this is actually causing harm? Is there some sort of big tech Hippocratic oath about doing no harm that needs to be developed, so that there is at least some kind of attempt to draw a boundary around what is shared and what is not shared?

Sam Gilbert  15:34 

Yes, so the idea of a Hippocratic oath for tech workers is definitely out there; the writer who has explored it more than I have is James Williams in his book Stand Out of Our Light. I think that is certainly something that would help. I also think it is beneficial that at the moment we're having more discussion about data ethics and the ethics of artificial intelligence, and that that is permeating some of the tech companies. So I think more ethical reflection on the part of tech executives and tech workers is to be welcomed. I don't think that's sufficient, however. And I do think it's important that we have stronger regulation of the tech sector. And I suppose, from my perspective, the thing that needs to be regulated, much more than anything to do with how data is collected or how data is used in advertising, is what's sometimes referred to as online safety, or other times as online harms. That is, anything that gives rise to individuals being at risk of harm as they live their lives online. There's actually legislation coming through in the UK at the moment called the Online Safety Bill, which is far from perfect legislation, but in my opinion it's directionally right, because it is more concerned with preventing harm, and giving tech companies a responsibility for playing their part in that, than it is with trying to regulate data or advertising.

Todd Landman  17:13 

Yeah, so it's really the results of the activity that it's trying to address, rather than the data that drives the activity, if I could put it that way. So if we think about this do-no-harm element, the mediating function that's required at least to get trusted information available to users, I wonder if we could pivot a little bit to the current crisis in Ukraine. Because I've noticed on social media platforms that a number of sites have popped up saying, we're a trusted source for reporting on the current conflict, and they get a sort of kite mark or a tick for that. I've also seen users saying, don't believe everything you see being tweeted out from Ukraine. So where does this take us, not only with COVID, but with something as real-time, active and horrific as conflict in a country? We can talk about Ukraine or other conflicts, but what about the sharing of information on social media platforms?

Sam Gilbert  18:08 

Yes, well, this is a very difficult question, and unfortunately I don't have the answer for you today. I guess what I would point to is something you touched on there, Todd, which is the idea of mediation. We have been through this period with social media where the organisations, the institutions that we traditionally relied on to tell us what was true and what was false and to sort fact from fiction, have been disintermediated. Or, in some cases, they have found themselves trying to compete in this very different information environment that is much more dynamic, in a way that actually ends up undermining the journalistic quality that we would otherwise expect from them. So this is not a very satisfactory answer, because I don't know what can be done about it, except to say that it is a very serious problem. I suppose just to make one final point, which I've been reminded of reading stories on this topic in relation to the Ukraine crisis: there is a duality to this power that tech companies and technology have given to ordinary users in the era of social media over the last 15 years or so. If we were to rewind the clock to 2010 or 2011, the role of Twitter and Facebook and other technology platforms in enabling protest and resistance against repressive regimes was being celebrated. If we then roll forwards a few years and look at a terrible case like the ethnic cleansing of the Rohingya people in Myanmar, we are at the complete opposite end of the spectrum, where the empowerment of users with technology has had disastrous consequences. And I guess if we then roll forward again to the Ukraine crisis, it's still not really clear whether the technology is having a beneficial or detrimental effect. So this is really just to say, once again, that when we think about the power of tech companies, these are the questions I think we need to be grappling with, rather than questions to do with data.

Todd Landman  20:31 

Sure. There was a great book years ago called The Logic of Connective Action, and it was really looking at the way in which these emerging platforms, because the book was published some years ago, were lowering collective action costs, whether for, you know, protest movements, or anti-authoritarian movements, etc. We did a piece of work years ago with someone from the German Development Institute on the role of Facebook in opposition to the Ben Ali regime in Tunisia, and Facebook allowed people to make a judgement as to whether they should go to a protest or not based on the number of people who said they were going, and so it lowered the cost of participation, or at least the calculated cost of participating in those things. But as you say, we're now seeing this technology being used on a daily basis. I watch drone footage every day of tanks being blown up, of buildings being destroyed. And part of my mind wonders, is what I'm watching real? And then part of my mind thinks about, what's the impact of this? Does it have an impact on the morale of the people involved in the conflict? Does it change the narrative, if you will, about the progress, or lack of progress, in the conflict? And then, of course, there's the multiple reporting of whether there are going to be peace talks, humanitarian corridors and all this other stuff. So it does raise very serious questions about authenticity, veracity and the ways in which technology could verify what we're seeing. And of course, you have time and date stamps, metadata and other things that tell you that something was definitely geolocated. So are these companies doing that kind of work? Are they going in and digging into the metadata? I noticed that Maxar Technologies, for example, is being used for its satellite data extensively, looking at the build-up of forces and the movement of troops and that sort of thing.
But again, that's a private company making things available in the public sphere for people to reach judgements, and for media companies to use. It's an incredible ecosystem of information, and it seems a bit like a Wild West to me, in terms of what we believe, what we don't believe, and the uses that can be made of this imagery and commentary.

Sam Gilbert  22:32 

Yes, so there is, as in all things, this super-proliferation of data, and what is still missing is the intermediation layer to both make sense of that and tell stories around it that have some kind of journalistic integrity. What you put me in mind of there, Todd, was the open source intelligence community, and some of the work that organisations, including human rights organisations, do to leverage these different data sources to validate and investigate human rights abuses taking place in different parts of the world. To me, this seems like very important work, but also work that is rather underfunded. I might make the same comment about fact-checking organisations, which seem to do very important work in the context of disinformation, but don't seem to be resourced in the way that perhaps they should be. Maybe just one final comment on this topic would relate to the media and social media literacy of individuals. And I wonder whether that is something that is maybe going to help us in trying to get out of this impasse, because I think over time people are becoming more aware that information they see on the internet may not be reliable. And while I think there's still a tendency for people to get caught up in the moment, and retweet or otherwise amplify these types of messages, I think that some of the small changes the technology companies have made to encourage people to be more mindful when they're engaging with and amplifying content might just help build on top of that increase in media literacy, and take us to a slightly better place in the future.

Todd Landman  24:26 

Yeah, I mean, the whole thing around media literacy is really important. And I also want to make a small plea for data literacy: just understanding and appreciating what data and statistics can tell us, without having to be, you know, an absolute epidemiologist, statistician or quantitative analyst. But I wanted to hark back to your idea around human rights investigations. We will have a future episode with a group that does just that; it's about maintaining the chain of evidence, corroborating evidence, and using digital evidence in ways that help human rights investigations. And, you know, if and when this conflict in Ukraine finishes, there will be some sort of human rights investigatory process. We're not sure which body is going to do that yet, because there have been calls for, you know, a Nuremberg-style trial, there have been calls for the ICC to be involved, as well as many other stakeholders, but that digital evidence is going to be very much part of the record. But I wonder just to... yeah, go ahead Sam.

Sam Gilbert  25:26 

Sorry, I'm just going to add one thing on that, which I touched on a little bit in my book: I think there's a real risk, actually, that open-source intelligence investigations become collateral damage in the tech companies' pivot towards privacy. What some investigators are finding is that material they rely on to be able to do their investigations is being unilaterally removed by tech companies, either because it's YouTube and they don't want to be accused of promoting terrorist content, or because it's Google or Facebook and they don't want to be accused of infringing individuals' privacy. So, while this is not straightforward, I just think it's worth bearing in mind that sometimes pushing very hard for values like data privacy can have these unintended consequences in terms of open-source intelligence.

Todd Landman  26:24 

Yes, it's an age-old chestnut about the unintended consequences of purposive social action; I think it was Robert Merton who said that at one point. But I guess, in closing, I have a final question for you, because you are an optimist. You're a data optimist, and you've written a book called Good Data. So what is there to be optimistic about for the future?

Sam Gilbert  26:42 

Well, I suppose I should say something about what type of optimist I am first. To do that, I'll probably reach for Paul Romer's distinction between blind optimism and conditional optimism. Blind optimism is the optimism of a child hoping that her parents are going to build her a tree house. Conditional optimism is the optimism of a child who thinks, well, if I can get the tools, and if I can get a few friends together, and if we can find the right tree, I think we can build a really incredible tree house together. So I'm very much in the second camp, the camp of conditional optimism. And I guess the basis for that probably goes back to some of the things we've touched on already, where I just see enormous amounts of untapped potential in using data in ways that are socially useful. Perhaps just to bring in one more example of that: Opportunity Insights, the group at Harvard run by Raj Chetty, has had some incredibly useful insights into social mobility and economic inequality in America by using de-identified tax record data to understand, over a long period of time, the differences in people's incomes. And I really think that that type of work is just the tip of the iceberg when it comes to this enormous proliferation of data that is out there. So I think if the data can be made available to researchers, and also to private organisations, in a way that, as far as possible, mitigates the risks that do exist to people's privacy, there's no knowing quite how many scientific breakthroughs or advances in human and social understanding we might be able to get to.

Todd Landman  28:52 

Amazing. And I guess, to your conditional optimism, I would add my own category, which is cautious optimism; that's what I am. But talking to you today really does provide us with deep insight into the many, many different and complex issues here. And that last point you made about, you know, the de-identified data used for good purposes, shining a light on things that are characterising our society with a view to being able to do something about them: you see things that you wouldn't see before, and that's one of the virtues of good data analysis, that you end up revealing macro patterns and inconsistencies and inequalities and other things that can then feed into the policymaking process to try to make the world a better place. And human rights are no exception to that agenda. So for now, Sam, I just want to thank you so much for coming on to this episode and sharing all these incredible insights and the work that you've done. Thank you.

Chris Garrington 29:49

Thanks for listening to this episode of The Rights Track, which was presented by Todd Landman and produced by Chris Garrington of Research Podcasts with funding from 3DI. You can find a detailed transcript on the website at www.RightsTrack.org. And don't forget to subscribe wherever you listen to your podcasts to access future and earlier episodes.

Further reading and resources: