Databricks fuels analytics for Spain's top soccer league
After prioritizing data-driven decision-making over the last decade, LaLiga is now using Databricks to inform its clubs and fans about play on the field.
Count Spain's premier soccer league among those organizations that are fully committed to analytics.
LaLiga, home to renowned clubs such as Real Madrid and Barcelona, and players like current leading scorer Karim Benzema and longtime star Luis Suárez, adopted the Databricks Lakehouse Platform in 2018. The league now uses it to inform both its 20 clubs about play on the pitch and its fans as they follow matches on their televisions and mobile devices.
Databricks, founded in 2013 and based in San Francisco, is a data lakehouse vendor whose cloud-based platform combines the benefits of data warehouses and data lakes. Within a lakehouse, users can query and analyze structured data with SQL, as they would in a data warehouse, and work with unstructured data in the flexible architecture of a data lake.
LaLiga, meanwhile, has been in operation since 1929 and is the top soccer league in Spain. Real Madrid has been its most dominant club, claiming 34 championships, while recent stars include Cristiano Ronaldo and Lionel Messi.
A shift toward analytics
A little less than a decade ago, the league felt it was behind when it came to analytics, according to Tom Woods, marketing and communications lead for LaLiga Tech.
Many of the world's largest enterprises have been using analytics to inform business decisions for years. And sports organizations including Major League Baseball in the United States and the governing bodies of tennis worldwide -- the Association of Tennis Professionals and the Women's Tennis Association -- have used analytics for a couple of decades to maximize player performance and inform their fan bases.
For example, as documented in the book -- and subsequent movie -- Moneyball, baseball's Oakland Athletics were early pioneers of analytics in sports, using data at the turn of the 21st century to find value in players overlooked by other teams and compete for championships despite having one of the smallest payrolls.
Into the second decade of the 21st century, however, Spain's premier soccer league had not yet begun using analytics.
That changed when Javier Tebas took over as league president in 2013 and made analytics a priority, according to Rafael Zambrano López, head of data science for LaLiga.
"Our president decided ... to make us more data-driven, and our department was started [at that time]," he said. "We started to build everything from zero, and have gone from there to now."
The impetus for the move to become more data-driven came from a sense of responsibility to the league's clubs, Woods added. He noted that while many sports organizations had started to recognize the value of analytics, European soccer as a whole had not yet become data-driven.
Now, however, among other major European soccer leagues, the German Bundesliga uses AWS for analytics and the English Premier League uses Oracle Cloud.
"We are responsible for guiding our clubs and saw a responsibility as a league to help these clubs to adapt to new ways of doing things," Woods said. "We saw it as necessary for keeping things running efficiently and engaging the fans."
That meant developing a decision-making ecosystem that could help the on-field product and fan experience, and even help the clubs with things like fraud detection and preventing match fixing, he continued.
At first, LaLiga developed its own data management and analytics systems. Eventually, however, the league decided to adopt a data and analytics platform and blend the soccer league's existing capabilities with those of a vendor.
According to Zambrano López, LaLiga was introduced to Databricks through its relationship with Microsoft and chose Databricks for its speed, ease of use and low cost.
Doing more with data
Now, having adopted Databricks a few years into its commitment to analytics, the Spanish soccer league captures more than 3 million rows of data per match. It puts that data into action by delivering reports to teams and serving up statistics to fans within seconds as games play out on the soccer pitch.
LaLiga strategically positions cameras in each of its stadiums, and it's through those cameras that the league captures all that data from every match. The cameras track each player's every move, taking 25 frames per second, and send that data into Databricks, where it's automatically fed into data models developed by Zambrano López and his team for real-time analysis.
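A back-of-envelope calculation shows how a 25-frames-per-second feed could add up to the more than 3 million rows per match the league cites. The sketch below assumes one row per tracked object per frame and counts only the 22 players and the ball over 90 minutes; that row layout is an assumption for illustration, not a description of LaLiga's actual schema.

```python
# Back-of-envelope estimate of tracking rows per match.
# Assumption (not from the league): one row per tracked object per frame.
FRAMES_PER_SECOND = 25      # frame rate reported in the article
TRACKED_OBJECTS = 22 + 1    # assumed: 22 players on the pitch plus the ball
MATCH_SECONDS = 90 * 60     # regulation time only, ignoring stoppage time


def rows_per_match(fps=FRAMES_PER_SECOND,
                   objects=TRACKED_OBJECTS,
                   seconds=MATCH_SECONDS):
    """Return the number of rows produced under the assumptions above."""
    return fps * objects * seconds


print(rows_per_match())  # 3105000
```

Under those assumptions the total is 3,105,000 rows, which is consistent with the figure the league reports.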
"The cameras collect the position of the players and the ball, and we combine that data with the event data -- the passes, goals, red cards, yellow cards, etc. -- and that allows us to create our metrics," Zambrano López said. "There are about 25 metrics, and we can share those with every club so they can improve."
One such metric is goal probability.
Within about 30 seconds of a player taking a shot -- whether or not it results in a goal -- the combination of Databricks and the data models LaLiga developed can determine the probability that the shot would result in a goal and share that information with teams, broadcasters and fans as the match is taking place.
Teams can immediately use such information to determine whether players are helping the team with smart play or hurting it -- for example, by taking shots that have little chance of going in when a pass might have resulted in a better scoring opportunity. Fans, meanwhile, are better informed throughout the course of a match.
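As a rough illustration of how a goal probability might be computed from a shot's location, the sketch below uses a toy logistic model with two features: the distance to goal and the angle the goal mouth subtends from the shooter's position. This is not LaLiga's model. The coefficients, the 105-by-68-metre pitch coordinates and the function names are all assumptions chosen for illustration; a real model would be fitted to historical shots and would likely use many more inputs.

```python
import math

# Assumed pitch geometry: 105 m x 68 m, attacking the goal centred at (105, 34).
GOAL_X, GOAL_Y = 105.0, 34.0
HALF_GOAL_WIDTH = 7.32 / 2  # a regulation goal is 7.32 m wide

# Illustrative coefficients only; a real model would be fitted to historical shots.
INTERCEPT, W_DISTANCE, W_ANGLE = -1.0, -0.1, 1.5


def shot_features(x, y):
    """Return (distance to goal centre in metres, goal-mouth angle in radians)."""
    dx = GOAL_X - x
    distance = math.hypot(dx, GOAL_Y - y)
    upper_post = math.atan2(GOAL_Y + HALF_GOAL_WIDTH - y, dx)
    lower_post = math.atan2(GOAL_Y - HALF_GOAL_WIDTH - y, dx)
    return distance, abs(upper_post - lower_post)


def goal_probability(x, y):
    """Map the shot features to a probability with a logistic function."""
    distance, angle = shot_features(x, y)
    z = INTERCEPT + W_DISTANCE * distance + W_ANGLE * angle
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical shots: one from 11 m out, one from 30 m out, both central.
close_range = goal_probability(94.0, 34.0)
long_range = goal_probability(75.0, 34.0)
print(f"close: {close_range:.2f}, long: {long_range:.2f}")
```

With these made-up coefficients the close-range shot scores higher than the long-range one, which is the kind of comparison a club could use to judge whether a shot or a pass was the better choice.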
But Databricks' analytics capabilities are enabling the league's teams to know far more than just goal probability during a soccer match.
By tracking the players' movement -- how much they run during a match, how their speed changes and how their gait may differ from one match to the next due to fatigue -- clubs can attempt to predict and prevent player injuries before they occur.
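The movement measures described above can be derived from the position data directly. The following sketch, which assumes each player's position arrives as a list of (x, y) coordinates in metres with one entry per frame, sums the distance covered and records the peak frame-to-frame speed. The data layout and the `workload` function name are hypothetical; the league's actual metrics and injury models are not described in detail.

```python
import math

FPS = 25  # frame rate reported in the article


def workload(positions, fps=FPS):
    """Return (total distance in metres, peak speed in m/s) for one player.

    positions: list of (x, y) tuples in metres, one per consecutive frame.
    """
    total = 0.0
    peak = 0.0
    for (x0, y0), (x1, y1) in zip(positions, positions[1:]):
        step = math.hypot(x1 - x0, y1 - y0)  # metres moved between frames
        total += step
        peak = max(peak, step * fps)         # metres per frame -> metres per second
    return total, peak


# Hypothetical one-second sample: a player running in a straight line at 5 m/s.
sample = [(i * 0.2, 0.0) for i in range(FPS + 1)]
distance, peak_speed = workload(sample)
print(f"{distance:.1f} m covered, peak {peak_speed:.1f} m/s")
```

Comparing these totals for the same player from one match to the next is one simple way a club might spot the drop-off in running that can signal fatigue.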
In all, using Databricks, LaLiga provides each team with a 150-page report after each match.
"They are handed a huge amount of data that they can then go analyze internally," Woods said. "We give them some assistance with how to analyze the information well, but many clubs are now investing in their own analytics teams. And many of them now attribute victories or particularly good seasons to better understanding of the competition."
He added that while all the clubs have been receptive to analytics, about five have invested most aggressively. Sevilla, currently tied for second place in the standings behind Real Madrid, is one example.
More to come
In September 2021, the league launched LaLiga Tech to supply other sports organizations -- not just in soccer -- with the analytics capabilities LaLiga developed in concert with Databricks and its other technology partners, including Microsoft.
"We're at the start of rolling out a whole new business from within LaLiga where everything we have created up to this point is being adopted by third parties across the [sports] industry," Woods said. "We now see a bit more awareness about going to a digital model, but it's piecemeal. We're in a position of providing these services to the rest of the industry, and Databricks is a central part of that."
Meanwhile, the league's centralized data team meets with the clubs about once every two weeks to get feedback on the data it provides, and it is working to add new tools to analyze play. According to Zambrano López, LaLiga is experimenting with Databricks tools such as MLflow and Delta Lake.
"We are always exploring new things to do with Databricks," he said.