Databricks platform fuels analytics at State Department
The vendor's AI and machine learning capabilities have helped the government agency make its data analysis more effective and its operations more efficient.
When the U.S. Department of State implemented a plan to better turn data into insights, it chose Databricks as its primary data preparation platform and the fuel for the advanced analytics needed to effectively carry out the agency's responsibilities.
Given its mission of advising the president on all matters related to foreign policy and helping set the nation's foreign policy through treaties and agreements with other countries, the State Department collects massive amounts of data critical to the safety and security of the nation.
Like many organizations, the State Department collects data from workplace applications such as Salesforce and ServiceNow. Beyond those, it gathers data from emails, phone calls, inter-agency communications, messaging and social media platforms such as WhatsApp, and other sources.
And much of the data collected through those many channels is managed and stored in isolated repositories.
To better manage all that data, join it together and make key information easy to access when needed, sometimes in real time as world events unfold, the State Department launched the Center for Analytics in March 2020 to transform data into fuel for foreign policy decision-making.
As part of that transformation, the State Department deployed Databricks' platform about 18 months ago.
Databricks, founded in 2013 and based in San Francisco, is a data management vendor whose lakehouse platform combines the capabilities of traditional data warehouses with data lakes.
"Since we stood up Databricks, it's become that central data platform for us to source our data and clean the data and enable it for advanced analytics," Mark Lopez, specialist master at Deloitte, a consultant for the State Department, said recently during Data + AI Summit, a user conference hosted by Databricks.
When the State Department adopted Databricks for data preparation and management, it also wanted to improve its overall analytics operations, making it easier to find key information at the moment it's needed and derive insights that lead to action.
But within that overarching goal of improving the efficiency and effectiveness of its analytics were more specific objectives.
Among them was improving the handling of Freedom of Information Act (FOIA) requests with the Databricks platform's augmented intelligence and machine learning capabilities.
The U.S. government received nearly 800,000 FOIA requests in fiscal 2020, and though the departments of Homeland Security and Justice received the most, the State Department also received a high volume of requests.
Finding the exact information requested among the trillions of documents the State Department keeps is often difficult, but now a combination of machine learning and AI capabilities like natural language processing and text mining is making the process more efficient.
In addition, the State Department wanted to use machine learning and AI to uncover insights from mission-centric records, conduct investigations and enhance security, respond to information requests from Congress and provide evacuation assistance to people abroad who need to quickly leave a dangerous location.
By using the Databricks platform's AI and machine learning capabilities in concert with other analytics tools, the State Department was able to accomplish those goals, according to Alan Gersch, also a specialist master at Deloitte.
The State Department now uses Databricks to build machine learning models that feed the BI dashboards used to inform policy decisions, including dashboards from vendors such as Tableau. The agency also uses Databricks-fueled models and NLP to enrich archived data with metadata that accelerates searches, and it combines Databricks with Microsoft Azure Data Factory to bring disparate data sources together and automate the reports it delivers to the president and secretary of state.
As a result, in many instances, processes that previously took days now take less than an hour.
"Databricks acts as the force multiplier and the glue that integrates other systems together and enhances them and accelerates them," Gersch said.
Applying the technology
The U.S. first sent troops to Afghanistan in the wake of al-Qaeda's attacks on the World Trade Center and Pentagon on Sept. 11, 2001.
Twenty years later, on Aug. 30, 2021, the U.S. withdrew the last of its troops. But the departure of the last U.S. troops did not mean the U.S. was finished evacuating people from Afghanistan.
Some U.S. citizens remained in Afghanistan. So did many Afghans who had assisted the U.S. and others who were in mortal danger as a result of their actions during the 20 years of war between the U.S. and the Taliban.
Identifying who needed to be evacuated, however, was a complex undertaking, as was vetting the different groups of people who might want to leave Afghanistan with U.S. help.
To identify and assist the many people who needed to get out of Afghanistan, the State Department established a task force of data scientists, data engineers and data analysts, according to Lopez. Using tools from the Databricks platform along with Azure Data Factory, the task force spent several months identifying and sourcing the information needed throughout the vetting process.
"We needed to understand where these people are, do they intend to leave, who is part of their family," Lopez said.
Ultimately, the Databricks platform and Azure Data Factory enabled the State Department to ingest data from disparate sources, bring it together in one location, determine which data points were related to the same person or to one person and their family members, and get the people who needed to leave out of Afghanistan.
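How the task force matched records across sources isn't described in detail. One common approach to the problem of deciding which records refer to the same person is deterministic record linkage, sketched minimally below in Python. The field names (name, dob, passport) and the matching rule are hypothetical assumptions for illustration, not the State Department's actual method, which ran on Databricks and Azure Data Factory at far larger scale.

```python
from collections import defaultdict


def normalize_name(name: str) -> str:
    # Lowercase and collapse whitespace so "Ahmad  Khan" and "ahmad khan" match.
    return " ".join(name.lower().split())


def link_records(records):
    """Group records into clusters that likely refer to the same person.

    Hypothetical rule: two records are linked if they share a passport number,
    or share both a normalized name and a date of birth. Links are transitive,
    so a union-find structure merges chains of matches into one cluster.
    """
    parent = list(range(len(records)))

    def find(i):
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    # Remember the first record seen under each matching key; any later record
    # with the same key is merged into that record's cluster.
    first_seen = {}
    for idx, rec in enumerate(records):
        keys = []
        if rec.get("passport"):
            keys.append(("passport", rec["passport"].upper()))
        if rec.get("name") and rec.get("dob"):
            keys.append(("name_dob", normalize_name(rec["name"]), rec["dob"]))
        for key in keys:
            if key in first_seen:
                union(first_seen[key], idx)
            else:
                first_seen[key] = idx

    clusters = defaultdict(list)
    for idx in range(len(records)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())


# Hypothetical records from different sources.
records = [
    {"source": "email", "name": "Ahmad Khan", "dob": "1985-03-02"},
    {"source": "phone", "name": "ahmad  khan", "dob": "1985-03-02", "passport": "p123"},
    {"source": "form", "name": "A. Khan", "passport": "P123"},
    {"source": "form", "name": "Sara Noori", "dob": "1990-07-15"},
]
print(link_records(records))  # [[0, 1, 2], [3]]
```

Records 0 and 1 match on name and date of birth, and records 1 and 2 match on passport number, so all three collapse into one cluster even though records 0 and 2 share no field directly. Grouping a person with family members would need additional, equally hypothetical relationship keys.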
"The goal was to get people on flights out of Afghanistan, and at some points we had hundreds of flights going out each day," Lopez said. "A lot of this was enabled using Databricks and the Azure stack as well. Leveraging Databricks as our central data processing engine has really enabled us to integrate a lot of source systems and process and scale up fast."