The trend in tech has been to gather more and more data on everyone -- customers and employees alike -- even if there is no direct reason to collect so much data. This has led to a pushback by users and experts about data privacy and more conversations about standards of data science ethics.
At Black Hat USA 2018, Laura Norén, director of research at Obsidian Security, spoke about data science ethics, how companies can avoid "being creepy," and why privacy policies often leave out protections for employees.
Editor's note: This interview was edited for length and clarity.
How did you become interested in data science ethics?
Laura Norén: Data science isn't really a discipline, it's a set of methods being used across disciplines. One of the things that I realized fairly early on was people were doing the same thing with data science that we've done with so many technologies. We get so excited about the promise. People get so excited about being the first to do some new thing. But they're really using technologies before they fully understand what the consequences and the social impact would be. So that's when I got started on data science ethics and I lobbied fairly consistently to get a course that was just about ethics for data science.
But I ended up spending several years just working on, 'What is it that's unique about data science ethics?' We've had ethics forever. Most engineers take an ethics class. Do we really need to reinvent the wheel here? What's actually new about this?
I realized it is actually very difficult to ask those kinds of questions sitting solely within academia because we don't have business pressure and we don't have the data to really understand what's happening. I knew that I wanted to leave for a while so that I could be a better chief data science ethicist, but that it would be very difficult to find a company that would want to have such a person around. Frankly, no tech company wants to know what they can't do, they want to know what they can do. They want to build, they want to innovate, they want to do things differently.
Obsidian is a new company, but it's founded by three guys who have been around for a while. They have seen some things that I would say they would find creepy and they didn't ever want to be that kind of company. They were happy to have me around. [They said], 'If you see that we're being creepy, I want you to push back and to stop us. But also, how can we avoid that? Not just that we should stop, but what we should do differently so that we can continue to move forward and continue to build products. Because, frankly, if we don't put X, Y, Z product out in the world someone else will. And unless we have a product that's actually better than that, you're still going to have employee data, for instance, being treated in bizarre and troubling ways.'
Why was it important to study data science ethics from within a company?
Norén: I got lucky. I picked them because they care about ethics, and because I knew that I needed to see a little bit more about how data are actually being used, deployed, deleted or not, and combined in a real setting. These are all dangerous issues, but unless you actually see how they're being done, it's way too easy to be hypercritical all the time. And that's kind of where that field is going.
It's also very interesting that employee data is not yet in the spotlight. Right now, the spotlight in tech ethics is on how tech companies are treating their workers. Are they inclusive or not? Do they care if their workers don't want to develop weapons? Do they still have to do that anyways? And then it's also on user data. But it's not on employee data.
I feel like -- I don't know exactly how fast these cycles go -- in three to five years, the whole conversation will be about employee data. We will have somehow put some stop-gaps in place to deal with user data, but we will not have paid much attention to employee data. In three to five years, when regulation starts to come down the chain, we will have already built systems that are at least ethical. It's hard to comply with a regulation that doesn't exist, but at least you can imagine where those regulations are going to go and try to be in compliance with at least the principle of the effort.
What makes employee data different from a data science ethics perspective?
Norén: One of the major differences between user data and employee data, at least from a legal perspective, is that when someone starts to work for a company, that company usually has them consent to a bunch of procedures, one of those procedures being, 'And you consent to us surveilling what you are doing under the auspices of this company, using our physical equipment when you're out in the world representing us. Therefore we need to be able to monitor what you're up to, see that you're in line with what we think you're supposed to be doing for us.' This means that employees actually have far fewer privacy assumptions and rights than users do in a practical sense. They have those rights, but then they consent to give them up. And that's what most employees do.
That's why there's not a lot of attention here: employees have signed an employment agreement, so legally it's not a gray area. Employers can largely do what they wish.
Is there a way to push back on those types of policies, or is it more just a matter of trying to get companies to change those policies?
Norén: California has the California Consumer Privacy Act; it's very similar to the GDPR. They've changed a few things, and -- almost as a throwaway -- they stuck employees in there as potential users. It's going to be tried in a court of law -- someone's going to have to test exactly how this is written -- but it doesn't go into effect for a while. It is possible that regulators may explicitly -- or in a bumbling kind of almost accidental fashion -- write employees into some of the policies that are like GDPR copycats.
But not really, because in the court of public opinion, eventually people started to say, 'Hey, this isn't right, that's not right. I don't think that I really consented to have my elections meddled with. That's not in my imagination.' If you look at the letter of the law, I'm sure Facebook is probably in compliance, but ethically their business practice extended beyond what people turned out to be comfortable with. I have a feeling that that same kind of thing is going to happen with employees.
Probably we'll see this kind of objection happening among fairly sophisticated workers first, just like the Google Maven project was objected to by Google employees first. They're very sophisticated, intelligent, well-educated people who are used to being listened to.
The law will have to react to those kinds of things, which is typical, right? Laws always react.
What are good data science ethics policies that enterprises should adopt when handling both user data and employee data?
Norén: Well, one of the more creative things that we're trying to do is, instead of asking people at one point in time to confront a very dense legal document that says, 'OK, now I've signed -- I'm not even sure what -- but I'm going to sign here and then I'll just move on with my life,' is to kind of do transparency all throughout the process.
Let's say you're a typical employee and you emailed your wife, girlfriend, boyfriend, kid, whoever, some personal connection from your work account. Now you've consented to let your employer look at that email traffic. They may or may not be reading the contents of the email, but they can see subject lines and who you're contacting and that may be personal for you. Instead of just letting that happen, you could say, 'Hey, it looks like you're emailing someone who we think is a personal connection. Just wanted to remind you that we are able to see ...' and then whatever your agreement is.
Remind them of what you're able to see and then you say something like, 'You know, if you were to contact this person after hours or on another device or outside of this account then we wouldn't be able to see that.' To encourage them to take their own privacy a little bit more seriously on a daily basis right at the moment where it matters rather than assuming they're going to remember something that they signed three years ago. Even three minutes ago. Make it really accessible and then do that transparent kind of thing throughout.
Maybe they're still OK with it because it's just email. But maybe then you also use some of the information that you have about those emails. Like, 'OK, I can see that Jane is totally comfortable emailing her mom all the time.' But then if Jane leaves the company, maybe those emails are some of the first things you review and then delete. So not only do you make transparency kind of an ongoing process where you're obtaining consent all the way along for doing what you're doing, or at least providing your employees some strategy for not being surveilled, but then once they leave, you probably -- as an employer -- want to maintain some of the data that they have.
Certainly from the cybersecurity perspective, if you're trying to develop predictive algorithms about, 'What does a typical employee working in accounting do?' you don't just want to delete all their data the second they leave because it's still valuable to you in terms of creating a baseline model of a typical employee or, if you need one, a baseline model of that particular employee. But you probably don't need to know all the times that they were emailing their personal connections. Maybe that's something that you decide to decay by design. You decay out some of the most privacy-sensitive stuff so that you can keep what is valuable to you without exposing this person's private communications any more than they would need to be.
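To make the "decay by design" idea concrete, here is a minimal Python sketch of a retention rule that drops personal-contact records as soon as an employee leaves while keeping work-related metadata for a limited baseline window. Everything in it is an assumption made for illustration: the `EmailEvent` fields, the `is_personal` label (presumed to be assigned elsewhere), the `decay` function and both retention windows are hypothetical and do not describe Obsidian's actual system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

# Hypothetical retention windows; real values would be a policy decision.
PERSONAL_RETENTION = timedelta(days=0)    # personal-contact records go at departure
BASELINE_RETENTION = timedelta(days=365)  # work metadata kept for baseline modeling


@dataclass(frozen=True)
class EmailEvent:
    """Metadata about one sent email (no message contents)."""
    employee_id: str
    recipient: str
    sent_at: datetime
    is_personal: bool  # assumed to be labeled by some upstream process


def decay(events: List[EmailEvent], departed_at: datetime, now: datetime) -> List[EmailEvent]:
    """Return the events still worth retaining at `now` for an employee
    who left the company at `departed_at`."""
    elapsed = now - departed_at
    kept = []
    for event in events:
        # The most privacy-sensitive records get the shortest window.
        window = PERSONAL_RETENTION if event.is_personal else BASELINE_RETENTION
        if elapsed < window:
            kept.append(event)
    return kept
```

Run on a schedule, a rule like this applies the same deletion policy to every departed employee without anyone having to ask for it.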
What are the ethical issues with storing so much data?
Norén: One of the things about these contracts is the indefinite status of holding onto data. We're very skeptical about hoarding data. We're very picky about what we keep. And we try to find ways to take the stuff that might be not all that valuable to us but very sensitive to the individual, because it's personal or who knows what might make it sensitive, and delete that kind of thing first.
The right to be forgotten is that someone's going to come to you and say, 'Hey, would you please forget this?' As good social scientists, we know that only a very select group of people ever feel so empowered and so informed of the right to go do such a thing. So it's already kind of an unfair policy because most people won't know how to do that, won't know that they can do that. We feel like some of these broader policies are actually more fair because they will be applied to everyone, not just the privileged people who know they are entitled to their rights and are going to demand these things and figure out how to do it. So those are the kinds of policies that we're looking at.
We have decided never to store the contents of people's emails in the first place. That also falls in line with our "do not hoard" policy. We don't need to know contents of emails. It doesn't give us anything additional for what we need to do, so we're just not going to store it. And we're not going to get transfixed by what lots of data scientists get transfixed by, which is the idea that if we have all the data now, we'll do this magical thing in the future that we haven't figured out yet. No, that fairy tale's dead here.
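One way to picture a "do not hoard" rule for email is to extract a few header fields at ingestion time and never persist the body at all. The Python sketch below uses the standard library's `email` package to do that. The function name `extract_metadata` and the choice of which fields to keep (sender, recipient, date) are assumptions made for illustration, not a description of what Obsidian stores.

```python
from email import message_from_string
from email.utils import parseaddr


def extract_metadata(raw_message: str) -> dict:
    """Keep only a small set of header fields from a raw email.

    The message body is parsed but never read or returned, so it cannot
    end up in whatever store this record is written to.
    """
    msg = message_from_string(raw_message)
    return {
        # parseaddr returns (display_name, address); keep the address only.
        "sender": parseaddr(msg.get("From", ""))[1],
        "recipient": parseaddr(msg.get("To", ""))[1],
        "date": msg.get("Date"),
    }
```

Because the record is built from an explicit list of fields, anything not named there, including the subject line and body, is dropped by default rather than kept by default.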