A little knowledge is a dangerous thing. A lot of data is toxic.
Anthony Accardi, CTO at Boston online shopping company Rue La La, was paraphrasing author Nassim Nicholas Taleb to introduce a thoroughly modern business problem: being misled by a cacophony of data.
"It's very easy to follow that noise, to get lost and distracted and wind up in the wrong place," Accardi said at the Argyle 2018 CIO Leadership Forum in Boston earlier this month.
Being data-driven is the aim for organizations today -- using data to make smart decisions and move the business forward. Instead, though, many are afflicted with a debilitating phenomenon Accardi called data poisoning. "It's what happens when we react to a large volume of data in ways that are harmful to ourselves, to our company and to our culture."
Data poisoning manifests in different ways in different contexts, he said -- unconscious bias leading to bad business decisions, for example, or unproductive arguments over product features. The antidote for all its permutations is building a healthy relationship with data.
Clear the way to intelligence
One area where data poisoning can germinate is business intelligence, Accardi said. On a screen he showed a cartoon in which several businesspeople are sitting at a conference table. One employee has a laptop open, and the caption above him reads, "Give me a moment to find unbiased data that supports calling you and your idea stupid." Chuckles arose from the room of about 100 IT executives.
"It's satire," Accardi said. "But think about what it means -- what it's saying is that regardless of whatever your agenda is, there's so much data out there that you can selectively filter in a way so that the data supports the conclusion you want to arrive at."
There are some healthy things companies can do to unbend warped agendas, though. One is to define what's important to the business in metrics -- profit, revenue, expenses, brand behavior -- and then determine how to measure them and how they relate to one another.
"That in and of itself averts a whole bunch of arguments about, 'Is this thing, what we're talking about, good or bad for the company?'" Accardi said.
Another remedy he prescribed is data visualization that people can use to quickly come up with creative solutions to business problems. Collaboration is "critically important," Accardi said. The best way to root out unconscious biases that could lead to unsound decisions is getting people together in groups. And their makeup should be diverse; that way, it's easier to spot potentially damaging inclinations and make better business decisions.
He gave an example of an exercise done at Rue La La: Everyone who uses visualization software is put into a room and divided into teams, and each team has people from different areas of the business. They're all given a "secret ingredient" -- a new data source, for example -- and directed to create a compelling story using that data. It's a highly effective way of tackling a new data set and "defining the edges" -- identifying where people are effectively working toward business goals and where there are pitfalls, Accardi said.
"The more you can do that type of cross-functional collaboration in business intelligence ongoing, the more you can really start to make a cultural change and address the data poisoning in the business intelligence realm."
Modeling good data health
Product development is also vulnerable. This is where "the most unproductive arguments happen," Accardi said. That's because there are a lot of important decisions that need to get made -- what to build, for example, or whether the product, once built, works the way it's supposed to.
To quickly reach a clear conclusion, do A/B testing -- blind studies that compare two versions of a product -- but shorten the feedback loop, he said. This is done by eliminating bias in the test design and including users who use the product every day.
"If you're measuring a financial metric, for example, you don't want to inadvertently put all your high-value members in your control and skew your results," Accardi said. And keep it simple and to the point, zeroing in on the metrics that best assess the feature that's being tested.
Do these things and "you can get a testing loop that's tight, that gives clear results and can miraculously unite a lot of people and cut through the noise that the data would otherwise produce."
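Accardi's warning about high-value members piling up in the control group can be illustrated with stratified random assignment, one common way to balance test groups. The sketch below is not Rue La La's method; the member records, the `annual_spend` field and the $1,000 cutoff for "high value" are all assumptions made up for illustration.

```python
import random
from collections import defaultdict


def stratified_assign(members, stratum_of, seed=0):
    """Randomly split members into control/treatment within each stratum."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible

    # Group members by stratum (for example, "high" vs. "standard" value).
    by_stratum = defaultdict(list)
    for member in members:
        by_stratum[stratum_of(member)].append(member)

    assignment = {}
    for stratum in sorted(by_stratum):
        group = by_stratum[stratum]
        rng.shuffle(group)
        # Alternate arms after shuffling so each stratum splits as evenly
        # as possible; a random offset decides which arm gets any extra member.
        offset = rng.randrange(2)
        for i, member in enumerate(group):
            arm = "control" if (i + offset) % 2 == 0 else "treatment"
            assignment[member["id"]] = arm
    return assignment


# Hypothetical data: 10 big spenders and 90 typical members.
members = [{"id": i, "annual_spend": 5000 if i < 10 else 200} for i in range(100)]


def tier(member):
    # Assumed cutoff for "high value"; a real one would come from the business.
    return "high" if member["annual_spend"] >= 1000 else "standard"


assignment = stratified_assign(members, tier)
high_control = sum(
    1 for m in members if tier(m) == "high" and assignment[m["id"]] == "control"
)
print(f"High-value members in control: {high_control} of 10")
```

Because the split happens inside each tier, the big spenders are divided evenly between the two groups rather than left to chance, so a financial metric is less likely to be skewed by who happened to land where.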
Taking the agility concept even further is a serverless architecture -- a data model loaded into a database service and exposed through an API. Not only does it save companies the work of managing infrastructure, it also helps them get to "end-to-end testing much more quickly, all the way to the customer." With physical servers, Accardi said, the focus tends to linger on nitty-gritty data modeling and on a small group of data scientists who are removed from the business and customers.
"There's no easier way to settle some of these unproductive arguments than just having a voice of a customer weigh in in a very clear way," Accardi said.
That's especially important when working with machine learning and artificial intelligence, which Rue La La uses to personalize customers' shopping experience. These technologies are "opaque things where it's hard to understand how they actually work," Accardi said. So data scientists often work on separate projects and don't collaborate much, and there isn't a businessperson to be found. This results in a narrowed way of thinking.
A platform that facilitates collaborative model development can stanch this strain of data poisoning, he said, because code is "right next to data, right next to visuals, so you can piece together how the model is going and the data is driving a customer experience." Data scientists can work together on the same projects, and data engineers and businesspeople can also be involved.
"You start breaking down these siloed walls within your organization -- very powerful," Accardi said.
'There is a negative'
The problem of data poisoning was not lost on Argyle attendee Jeffrey Cunningham, director of enterprise architecture at Thomson Reuters in Boston. "There is so much overload. People are letting the data tell the story you want it to tell versus what you should tell."
Marc Schultz, senior data privacy manager at Staples, in Framingham, Mass., put it succinctly: "We're doing all this data collection -- and there is a negative."
Bill Gallagher, who's in senior IT at Boston University, said Accardi's presentation confirmed that the applications organizations rely on are studded with inaccuracies, often because people enter data incorrectly. That happens at BU, he said, where cleanup is done regularly.
"There's a lot of cleanup. I think that happens to all higher-ed departments," he said.
One way to keep data in order is to categorize it in finer detail and assign different people to different types of data. At BU, for example, lower-paid students cleanse more generic data, while employees with master's degrees who are trained in data procedures scrub the specific, complicated data sets.
"One of the key things is updating applications in which the data going in has some naming conventions, has some quality checks and other tests so that we don't have messy data," Gallagher said, adding, "to minimize our data poisoning."
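The kind of entry-time checks Gallagher described, naming conventions plus basic quality tests, might look something like the sketch below. He did not detail BU's actual rules, so the lowercase-with-underscores convention, the field names and the sample records are all hypothetical.

```python
import re

# Assumed convention: field names are lowercase words joined by underscores.
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9_]*$")


def validate_record(record, required_fields):
    """Return a list of problems found in one record; empty means clean."""
    problems = []

    # Naming-convention check on every field name in the record.
    for field in record:
        if not NAME_PATTERN.match(field):
            problems.append(f"field name '{field}' breaks naming convention")

    # Quality check: required fields must be present and non-blank.
    for field in required_fields:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            problems.append(f"missing required field '{field}'")

    return problems


REQUIRED = ["student_id", "department"]  # hypothetical required fields

clean = {"student_id": "U1234", "department": "Biology"}
messy = {"Student ID": "U1234", "department": " "}

for label, record in (("clean", clean), ("messy", messy)):
    issues = validate_record(record, REQUIRED)
    print(label, "->", issues if issues else "no problems")
```

Rejecting or flagging a record at the moment it is entered is usually cheaper than scrubbing it later, which is the point Gallagher was making about updating applications.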
Editorial director Margie Semilof contributed to this report.