Data Management, Analytics & AI

  • Data integration is hard. Over the years, of all the technologies and processes that are part of an organization’s analytics stack/lifecycle, data integration has consistently been cited as a challenge. In fact, according to recent ESG research, more than 1 in 3 (36%) organizations say data integration is one of their top challenges with data analytics processes and technologies. The data silo problem is very real, but it’s about so much more than having data in a bunch of locations and needing to consolidate. It’s becoming more about the need to merge data of different types and change rates; the need to leverage metadata to understand where the data came from, who owns it, and how it’s relevant to the business; the need to properly govern data as more folks ask for access; and the need to ensure trust in data, because if there isn’t trust in the data, how can you trust the outcomes derived from it?

    Whether ETL or ELT, the underlying story is the same. At some point, you need to extract data from its source, transform it based on the destination tool and/or the merging data set, and then load it into the destination tool, whether that be something like a data warehouse or data lake for analysis. While we won’t get into the pros and cons of ETL versus ELT, the ETL process is still prevalent today. And this is due in part to the mature list of incumbents in the ETL space, like Oracle, IBM, SAP, SAS, Microsoft, and Informatica. These are proven vendors that have been in the market for multiple decades and continue to serve many of the largest businesses on the planet. There are also several new(ish) vendors looking to transform the data integration market. Companies like Google (via the Alooma acquisition), Salesforce (via MuleSoft), Qlik (via the Attunity acquisition), and Matillion all have growing customer bases that are embracing speed, simplicity, automation, and self-service.
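
    To make the pattern concrete, here’s a minimal ETL sketch in Python. The orders.csv file, the table and column names, and the local SQLite “warehouse” are stand-ins of my own rather than any vendor’s tooling; the point is simply the order of operations (an ELT flow would load the raw rows first and push the transform down into the warehouse engine).

    ```python
    import csv
    import sqlite3

    def extract(path):
        """Extract: read raw rows from the source system (here, a CSV export)."""
        with open(path, newline="") as f:
            return list(csv.DictReader(f))

    def transform(rows):
        """Transform: normalize types and tidy values before loading."""
        cleaned = []
        for row in rows:
            cleaned.append({
                "order_id": int(row["order_id"]),
                "amount": round(float(row["amount"]), 2),
                "region": row["region"].strip().upper(),
            })
        return cleaned

    def load(rows, db_path="warehouse.db"):
        """Load: write the transformed rows into the destination (SQLite as a stand-in)."""
        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, amount REAL, region TEXT)")
        conn.executemany("INSERT INTO orders VALUES (:order_id, :amount, :region)", rows)
        conn.commit()
        conn.close()

    if __name__ == "__main__":
        # ETL: transform before loading; ELT would load the raw rows and transform in the warehouse.
        load(transform(extract("orders.csv")))
    ```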

    Now, whatever your approach to addressing data integration, I keep hearing the same things from customers: “Vendor X is missing a feature” or “I wish I could…” or “I can’t get buy-in to try a new solution because the technology isn’t mature” or “that sounds great, but it’s a lot of work and we’re set in our ways” or “I’m just going to keep using Vendor Y because it’s too disruptive to change.” And every time I hear these common responses, I ask the same follow-up question: what’s your ideal tool? Everyone wants to ensure the technology is secure, reliable, scalable, performant, and cost-effective, but I wanted to understand the more pointed wants of the actual folks who are struggling with data integration challenges day in and day out.

    Without further ado, I present to you the top list of “wants” when it comes to an ideal data integration tool/product/solution/technology:

    1. Container-based architecture – Flexibility, portability, and agility are king. As organizations are transforming, becoming more data-driven, and evolving their operating environments, containers enable consistency in modern software environments as organizations embrace microservice-based application platforms.
    2. GUI and code – Embrace the diversity of personas that will want access to data. A common way I’ve seen organizations look at this is that (generally speaking) the GUI is for the generalists and the code behind it is for the experts/tinkerers. By the way, this mentality is evolving as modern tools are looking to help generalists and experts alike with more automation via no-code/low-code environments and drag-and-drop workflow interfaces.
    3. Mass working sets – Common logic or semantic layers are desired. The last thing an engineer or analyst wants to do is write unique code for each individual table. It doesn’t scale and becomes a nightmare to maintain (see the sketch after this list).
    4. Historic and streaming – Supporting batch and ad hoc processing on both historic and streaming data will ensure relevant outcomes. Organizations increasingly want hooks to better meet the real-time needs of the business, and that means real-time availability and access to relevant data without having to jump through hoops.
    5. Source control with branching and merging – Code changes over time. Ensure source control is in place to understand how and why code has changed. Going hand in hand with source control is the ability to support branching and/or merging of code to address new use cases, new data sources, or new APIs.
    6. Automatic operationalization – This is focused on the DevOps groups. Ensure new workflows can easily go from source control to dev/test or production. Deployment is the first priority, but do not lose sight of management and the iterative nature of data integration processes as users, third-party applications, and data change and evolve.
    7. Third-party integrations and APIs – The analytics space is massive and fragmented. The more integrations with processing engines, BI platforms, visualization tools, etc., the better. And ensure the future of the business is covered, too. That means incorporating more advanced technology that feeds data science teams, like AI and ML platforms and services.
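
    As a small illustration of item 3, here is a sketch (again Python, with hypothetical table names, a made-up watermark, and SQLite connections standing in for real systems, assuming the destination tables already exist) of one parameterized routine driving the same incremental copy logic across a whole working set of tables, instead of bespoke code per table.

    ```python
    import sqlite3

    # Hypothetical working set: one definition drives the same logic for every table.
    WORKING_SET = {
        "orders":    {"updated_col": "updated_at"},
        "customers": {"updated_col": "updated_at"},
        "invoices":  {"updated_col": "modified_at"},
    }

    def incremental_copy(src, dst, table, spec, since):
        """Copy rows changed since the watermark from source to destination."""
        rows = src.execute(
            f"SELECT * FROM {table} WHERE {spec['updated_col']} > ?", (since,)
        ).fetchall()
        if rows:
            placeholders = ",".join("?" * len(rows[0]))
            dst.executemany(f"INSERT OR REPLACE INTO {table} VALUES ({placeholders})", rows)
            dst.commit()
        return len(rows)

    if __name__ == "__main__":
        src = sqlite3.connect("source.db")      # hypothetical source extract
        dst = sqlite3.connect("warehouse.db")   # hypothetical destination
        for table, spec in WORKING_SET.items():
            copied = incremental_copy(src, dst, table, spec, since="2020-01-01")
            print(f"{table}: {copied} rows copied")
    ```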

    While this list is by no means complete or all-encompassing, it speaks to where the market is headed. Take it from the data engineers and data architects: they’re still primarily ETLing and ELTing their lives away, but they want change and recognize there are opportunities for vast improvement. And marginal improvements without massive disruption are the preferred approach. So a note for the vendors: it’s about meeting customers where they are today and minimizing risk as they continue on their data transformation journeys.

  • The recent announcement from IBM to withdraw from all research, development, and offerings of facial recognition will not stop facial recognition from being used by law enforcement or government entities. There. I said it. Facial recognition will continue on its gray-area trajectory with or without IBM. But what IBM, and specifically Arvind Krishna, has done is bring attention to a growing concern that needs far more national and global attention. The use of facial recognition needs to be scrutinized for bias and privacy concerns. It needs oversight. It needs guardrails. Usage, especially by law enforcement and governing entities, needs to be transparent. And frankly, the technology needs to be better for it to work the way people envision.

  • Over the last several months, automation has seen a jump in interest. Operational efficiency has been a top priority for years, but as of late, it’s an even greater priority. For businesses, tasks or processes that used to be viewed as manageable but inefficient are now being scrutinized. The inefficiency aspect is being amplified, and organizations have no choice but to act. And one of those actions is to look into a trendy buzzword that is proving to be so much more: robotic process automation (RPA).

    So first off, what is it? RPA uses software and advanced technology like AI and ML to automate repetitive processes that are traditionally performed by a human. A software bot is configured to mimic structured actions, rapidly interacting with systems and transmitting data based on established business workflows.
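
    As a loose illustration (a sketch of my own, not any vendor’s RPA product), the Python snippet below mimics the kind of structured, rule-driven data entry a human would otherwise repeat all day: it reads submitted form records from a hypothetical submitted_forms.csv, applies simple validation rules, and hands clean records to a stand-in enter_into_system function representing the target application’s interface.

    ```python
    import csv

    def enter_into_system(record):
        """Stand-in for the target application's interface (UI automation or an API in a real RPA tool)."""
        print(f"Entered order {record['order_id']} for {record['customer']}")

    def validate(record):
        """Simple business rules the bot applies consistently, avoiding fat-finger errors."""
        errors = []
        if not record["order_id"].isdigit():
            errors.append("order_id must be numeric")
        try:
            if float(record["amount"]) <= 0:
                errors.append("amount must be positive")
        except ValueError:
            errors.append("amount must be a number")
        return errors

    def run_bot(path="submitted_forms.csv"):
        """The bot loop: repeat the same structured steps for every submitted form."""
        with open(path, newline="") as f:
            for record in csv.DictReader(f):
                problems = validate(record)
                if problems:
                    print(f"Routed to a human for review: {record} ({'; '.join(problems)})")
                else:
                    enter_into_system(record)

    if __name__ == "__main__":
        run_bot()
    ```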

    We, as humans, require breaks. We have defined working hours. And whether we want to admit it or not, we’re predisposed to make a mistake here or there. This is especially true when executing a repetitive task over and over again. All it takes is a fat-finger typo entering information from a submitted form to create a ripple effect that could be catastrophic to a business or a customer.

    While RPA bots don’t sleep, don’t stop, and (when programmed properly) don’t make mistakes, it’s easy to get lost in the potential of RPA. It’s important to not lose sight of what RPA is and isn’t. RPA is not a physical robot. It doesn’t think freely. It doesn’t have cognitive abilities. RPA does enable predictability and reliability in the time it takes to complete a task or execute a workflow from beginning to end. RPA does save humans countless hours completing mundane tasks, enabling them to focus on more important tasks and projects. RPA does improve operational and business process efficiency.

    So where are organizations today in their adoption of RPA? Enterprise Strategy Group research shows that nearly one-third of respondents report their organization currently utilizes bots in production environments to help automate tasks. What is interesting is that when this data is viewed by level of digital transformation maturity, it shines a spotlight on the continued separation between the more digitally transformed businesses and their less digitally transformed peers. ESG recently published a research brief that highlights some of these key findings. It can be found here.

    For more RPA info, stay tuned over the coming weeks as I’ll be doing a double click on the RPA market, as well as highlighting best practices on how to get started.

  • Vaccines: Data’s Fight Against COVID-19

    I’ve talked through a couple of ways that data is helping fight COVID-19, from detecting and tracking an outbreak to detecting the virus within people. In this blog, I’ll be focused on research and elimination. While the world sets its sights on a hopeful slowdown similar to what is experienced with a seasonal flu, a true vaccine or cure is the ultimate goal. With more and more researchers throwing their collective hats in the ring, data sharing and collaboration are becoming key. What new information is available on the virus makeup? How is it evolving/mutating? Where are we making progress on vaccine development? What approaches have a higher likelihood of success? How can progress be shared across the globe to spur new ideas or rapid insight? All of these questions tie back to data, and using data science and AI to help answer them is rapidly becoming a go-to approach.

  • Robotic Process Automation Adoption Trends

    As organizations look for ways to streamline operations, improve efficiency, and reduce costs, they are increasingly embracing automation technology like robotic process automation (RPA). While some view RPA as an ultimate destination to achieve peak business and process efficiency, those organizations that view themselves as digitally transformed have already embraced it and have their eyes set on the next phase: intelligent automation, where RPA is paired with artificial intelligence (AI) and machine learning (ML) to not only interact with systems but also to predict future insights/outcomes based on trending data.


    For more information or to discuss these findings with an analyst, please contact us.

  • Detection: Data’s Fight Against COVID-19

    My first blog in this series focused on using data, AI, and visualization to help detect and track the continued outbreak of COVID-19. The next topic in this blog series is focused on a different form of detection: detecting the virus in people. And it starts with understanding symptoms and ends with techniques to identify infected individuals at scale, which eventually helps shape individual treatment plans and resource allocation based on established virus hotspots.

  • Data Protection Conversation with Veritas (Video)

    In this latest edition of Data Protection Conversations, I talk with Doug Matthews from Veritas.

  • Outbreak: Data’s Fight Against COVID-19

    While the world continues its fight against COVID-19, data is becoming one of the most prominent weapons for humans. Whether tracking an outbreak, detecting the virus on a case-by-case basis, preventing further spread, or eventually eliminating the virus altogether, data is fueling decision making from governing bodies to personal households. As we evolve to deal with the difficult norms of what today brings, the technology community is rising to answer a desperate call to arms. The value of trusted data has never been higher as actions are being taken based purely on statistical models and predictions. The need for collaboration on a global scale is paramount. Being able to rapidly respond to new information is now a matter of life or death. Over the next few weeks I’ll be revisiting and highlighting some of the ways data is being used in the ongoing fight against this historic and scary virus.

  • Veeam’s Cloud Program Celebrates 10 Years

    Veeam’s maniacal focus on the channel has paid off handsomely through the years. What is even more remarkable is how much has changed in that same 10-year timeframe in what used to be a very traditional two-tier distribution channel with VARs and integrators. The ability of Veeam to help partners evolve to successfully adopt and make money with cloud technologies while avoiding disintermediation has, in my opinion, been key.

  • The Advantages of a Data Science Team

    Enterprise Strategy Group research shows that using a formal data science team is tied to better business outcomes.

    Use this infographic to understand what that means for IT organizations in terms of total data use, the public cloud, serverless analytics, and more.


    For more information or to discuss these findings with an analyst, please contact us.

  • Covid-19 Will Change Backup and Recovery – Part 1

    There hasn’t been a conversation in the past 3 weeks in which a particular non-cyber (for once) virus hasn’t been discussed. Many of our clients are asking us what we believe will change from a product/service/solution strategy in the space of backup and recovery. I don’t have a crystal ball, but based on recent research, and as the year progresses, I expect we will be able to better understand what the significant changes might be in how organizations approach IT and the topic of backup and recovery, including disaster recovery.

    In the meantime, I would like to share what I expect will happen based on past, pre-Covid-19 trends and my broad perspective on the market. I will not name specific vendors; this is not about who’s better than the other at this or that. Look at this as a high-level, revisited checklist of what to look for in a solution.

    Before we start: I believe a lot of the changes that we will see were already in motion; what we will likely see is an acceleration of underlying trends or needs.   

    What will not change

    Good data stewardship is not going away – probably ever – and this means that all the best practices for defining service levels (RPO/RTO) still apply. How you get there is likely to change, but business and IT fundamentals in this case remain the same. Data is an asset that must be protected, business must continue in the face of planned and unplanned interruptions, and compliance requirements must be met. Data growth is not slowing down, meaning that organizations will need to keep planning accordingly, and archive a growing volume of data.  

    This means that organizations will still need to look for solutions that meet all their SLA requirements and can keep pace with their data growth, perform at scale, and fit into the environment with the proper set of integrations (hardware and hypervisor integration, for example).
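
    As a simple illustration of the RPO side of that service-level conversation, here is a small Python sketch (with made-up workloads, timestamps, and thresholds of my own) that checks whether the most recent recovery point for each workload still falls within its agreed RPO.

    ```python
    from datetime import datetime, timedelta

    # Hypothetical service levels: RPO per workload, in minutes.
    RPO_MINUTES = {"erp-db": 15, "file-shares": 240, "dev-vms": 1440}

    # Hypothetical last successful recovery points reported by the backup platform.
    LAST_RECOVERY_POINT = {
        "erp-db":      datetime(2020, 4, 20, 9, 50),
        "file-shares": datetime(2020, 4, 20, 4, 0),
        "dev-vms":     datetime(2020, 4, 19, 22, 0),
    }

    def rpo_report(now):
        """Flag any workload whose data-loss exposure exceeds its agreed RPO."""
        for workload, rpo in RPO_MINUTES.items():
            exposure = now - LAST_RECOVERY_POINT[workload]
            status = "OK" if exposure <= timedelta(minutes=rpo) else "RPO MISSED"
            print(f"{workload}: exposure {exposure}, RPO {rpo} min -> {status}")

    if __name__ == "__main__":
        rpo_report(now=datetime(2020, 4, 20, 10, 0))
    ```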

    What will change – or is changing already – Part 1 of many…

    Management capabilities: This is the obvious one, with staff now working remotely from home or having limited options to be in an office or data center. The winners of this phase will be solutions (whether on-prem, hybrid, or in the cloud) that reliably deliver advanced, secure remote management and deployment capabilities. For example, adding endpoint backup and recovery, or protecting new VMs created for the specific circumstances brought about by this crisis. Usability will be key against a backdrop of skills shortages (which began long before Covid-19) in data protection and adjacent IT areas (including cybersecurity). Further capabilities are covered below. In addition, organizations that have coherent, broad, and deep reporting and alerting capabilities will be in great shape. However, I suspect this is a hurdle in many organizations.

    DR testing, and testing in general: It’s a best practice everyone should be an expert at, and most solutions today offer many possibilities to conduct non-disruptive (to production) testing, often supplemented by AI/ML and automation in general. Our research shows that the more people practice recovery, the more satisfied they are with their solution. That’s great news for vendors who have put forward testing education programs. Based on our experience validating many solutions in the market, I would say there is no question that great solutions exist and can be leveraged. Not every solution is created equal, but overall they’re pretty good. So if the technology is available and in place, the real question revolves around practice and skill sets. It’s a combination of process and people. In a pandemic, you may not have the experts available to do it. It is therefore key that more IT generalists be trained (which ties back to the usability requirement mentioned above). This positions certain cloud-based solutions/services in a pretty good spot.
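
    To make “practice” concrete, here is a minimal sketch of the kind of scheduled restore verification an IT generalist could run. The restore_to_sandbox helper, the backup file path, and the expected checksum are all hypothetical stand-ins rather than any specific vendor’s API; real solutions automate far more of this.

    ```python
    import hashlib
    import shutil
    from pathlib import Path

    def restore_to_sandbox(backup_file, sandbox_dir):
        """Stand-in for a real restore job: bring the backup into an isolated sandbox."""
        sandbox_dir.mkdir(parents=True, exist_ok=True)
        restored = sandbox_dir / backup_file.name
        shutil.copy2(backup_file, restored)
        return restored

    def verify_restore(restored_file, expected_sha256):
        """Verify the restored data matches what was recorded at backup time."""
        digest = hashlib.sha256(restored_file.read_bytes()).hexdigest()
        return digest == expected_sha256

    if __name__ == "__main__":
        backup = Path("backups/orders-2020-04-19.bak")   # hypothetical backup artifact
        expected = "0" * 64                              # placeholder checksum captured at backup time
        restored = restore_to_sandbox(backup, Path("dr-sandbox"))
        print("Non-disruptive restore test:", "PASS" if verify_restore(restored, expected) else "FAIL")
    ```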

    I will stop for now, but stay tuned as I continue in my next blog…

  • The Advantages of a Data Science Team

    As organizations look to prioritize data-driven initiatives, the success of those initiatives will be directly tied to people, processes, and technology. While data science may seem aspirational or even foreign to some organizations, ESG research shows direct ties between organizations with a data science team and better use of data, better use of technology, and better business outcomes. For those organizations looking to drive greater business value through the use of data, a formal data science team can help.
