July 22, 2022

The open source Apache Beam batch and stream data processing technology is finding a home in a growing number of large organizations.

At the recent Beam Summit hybrid conference, users from Google, Twitter, Spotify, Adobe, Intuit, LinkedIn and others outlined how and why they are using the Apache Beam technology.

Beam became a top-level project at the Apache Software Foundation in 2017. It provides a single programming model that organizations use to build and manage data pipeline workflows for both batch and stream processing.

Apache Beam is widely used at Google, according to Kerry Donny-Clark, engineering manager for Apache Beam at Google.

During a keynote on July 18, Donny-Clark noted that Google uses Beam to support data processing for YouTube, Waze, the Vertex AI machine learning platform and the Google Dataplex data fabric.

Google has no overarching mandate directing its service teams to use Beam; rather, each team adopted Beam because it found that the technology met its needs, Donny-Clark said. He highlighted that Beam supports multiple languages, including Java, Python and Go, which is helpful for developers who work with different programming tools.

"There's no command that Google developers need to use Beam," Donny-Clark said. "But they found Beam useful for a wide variety of use cases throughout the company and of course, that tells me that Beam can support things truly at Google scale."

Spotify Wrapped powered by Beam stream data processing

Streaming music service provider Spotify is also a big user of Apache Beam. In another keynote, Spotify data engineer Rickard Zwahlen said his organization has used Beam since it moved away from its own on-premises Hadoop cluster for data processing.

One of the largest data processing jobs that Spotify runs supports the Spotify Wrapped service, which gives users a year-end wrap-up of the music they listened to.

"At this point Beam pipelines are a large majority of all our scheduled jobs," Zwahlen said.

Using Beam has not been without problems for Spotify. One challenge the company faced when moving on from Hadoop was that much of its tooling was written in the Scala programming language, which Beam does not directly support. So Spotify built its own open source project, Scio, which provides a Scala API for Apache Beam.