Autumn was a busy season for tech conferences. Experts in development, testing and operations shared lessons that could help others in the industry push past their own DevOps transformation or testing obstacles.
Here are a few insights gleaned from O'Reilly's Velocity Conference in New York and DevOps Enterprise Summit (DOES) in Las Vegas.
Testing at a massive scale
Technology giants Microsoft and Intel separately explained how they improved internal QA strategy and test coverage.
Sam Guckenheimer, product owner of Azure DevOps at Microsoft in Redmond, Wash., explained at Velocity how Microsoft vastly improved its internal testing speed and reliability. Like other organizations, Microsoft made mistakes in its internal testing, such as code thrown over the wall to test automation engineers, testing outsourced to vendors and lots of test redundancy. "I want to be clear: We sucked [at testing], too," Guckenheimer said.
So, over two-plus years and 42 sprints, Microsoft implemented a number of changes. The company realigned test and dev engineers, rolled them into multidisciplinary feature teams and adjusted which metrics it tracks. But the big change was a new test taxonomy that shifted more tests to the left, so that they run before the CI stage and during pull request builds. As a result, Microsoft scaled to handle roughly a half-billion internal tests per day to support about 78,000 deployments.
Many organizations treat QA as the middle child of the DevOps pipeline, where it receives less appreciation and visibility than other stages, said Manish Aggarwal, software engineering manager at Intel in Austin, Texas, at the DOES event. Additionally, many QA cycles shrink when delays hit app dev teams, despite an ever-growing test matrix.
As part of its DevOps transformation and focus on continuous improvement, Intel adopted Electric Cloud to create a custom UI and automate many tests, as well as the subsequent approval gates, change requests and pipeline triggers. The company's daily test cases increased from hundreds to about 15,000, and code coverage flipped from more than 85% unknown code to more than 87% known code. It also shortened QA lead time per CI cycle from days to 10 minutes.
Get DevOps off the ground
Larger organizations have embraced DevOps dojos to quickly indoctrinate staff members in the principles and practices of the methodology. First, an organization identifies its culture and delivery challenges. Then, it procures the necessary resources and generates interest to educate cross-departmental IT staff in sprints.
Also at DOES, Keanen Wold, DevOps transformation and developer practice leader at Delta Air Lines in St. Paul, Minn., walked through the steps his organization took to create its DevOps dojo. Among its primary challenges, Delta needed to procure physical space, determine the duration of sprints and bring in the proper coaching. To get off the ground, the airline met with companies that run established enterprise dojos, such as Target, Ford and Honeywell.
A number of challenges ahead of Delta's DevOps dojo launch in June 2018, such as an incomplete facility, threatened to delay the project, but Wold said his team worked through them. Since the dojo launch, two groups -- nine total teams -- completed the program, and some of those workers have launched features into production thanks in part to the training, he said.
Delta's DevOps transformation was an emotional journey, Wold said, but he urged DevOps teams to embrace the uncomfortable. "If you're preparing to take this journey on your own, just recognize that you're going to get punched in the kidneys, just when you think you're off and running," he said.
Polish your apps, within reason
Code refactoring can help alleviate performance concerns and tidy up unwieldy code. However, it's not a task for enterprises -- or developers -- to undertake lightly.
App dev teams should attempt a refactor only when their organization shifts product requirements, adopts a new technology or requires improved app performance, said Maude Lemaire, senior software engineer at Slack, based in San Francisco. As with many developer tasks, a code refactor must positively impact the business's bottom line to be viable.
In her presentation at Velocity, Lemaire, tongue in cheek, defined refactoring as "the process by which we take a pile of poo and turn it into a shinier pile of poo" -- complete with the requisite emojis. A refactor can unearth more problems, and it's far from a cure-all. So, organizations must attempt it for the right reasons and with a careful approach, she said.
In two asides during her presentation on Slack's code refactoring efforts, Lemaire said that the Slack workspace with the most public and private channels, including direct messages, had more than 80,000 of them as of October. Also, as Slack documented code fixes and improvements, Lemaire's team discovered that Slackbot had engaged in millions of conversations … with itself.
AI obscurity stalls implementation
Many software vendors tout their AI capabilities, but the inability to debug neural networks limits enterprise implementation. At least 95% of AI projects fail to get off the ground because of this inability to test and troubleshoot, said Torsten Volk, analyst at Enterprise Management Associates based in Boulder, Colo., at DOES. "When you [hear] companies are fully embracing it, that's pure BS. It's experimental," he said.
Examples include IBM Watson for Oncology's failure to deliver accurate cancer care recommendations, according to a report by STAT News and The Wall Street Journal, as well as numerous image recognition failures across vendors. Worse still, it's not always apparent when an AI model delivers bad information, which can prove costly, Volk said.
Simply put, with no clear way to test AI, its risks still outweigh its rewards. "You push that button, and your GPU farm runs for a week, costs you $25,000 and achieves nothing," Volk said. "There's no test software that makes up the corner cases for you."