Peloton rides, runs, rows with AWS for data management
The connected fitness company has long used AWS tools. When its data volume surged during COVID-19, Redshift was critical -- and still is as the company attempts a fiscal comeback.
As use of Peloton's platform has exploded, so has the company's reliance on Amazon Redshift.
Peloton has built its data operations on AWS since its start in 2012, when AWS provided some of Peloton's seed funding, according to Jerry Wang, Peloton's director of data engineering.
"So from day one, Peloton built its infrastructure around AWS," he said.
On day one, however, Peloton was a startup in an industry -- connected home fitness -- that didn't yet exist.
People had stationary bikes, treadmills and weight sets in their homes, but they weren't connected to anything. Those with home gyms worked out in isolation -- or perhaps with a personal trainer.
There were also spinning studios where people went in person to take guided cycling classes, studios specializing in disciplines such as yoga and Pilates, and gyms where people went to lift weights, work out on cardio equipment and take various types of group fitness classes.
But the worlds of home workouts and group fitness were separate.
Peloton set out to change that, first by offering a combination of hardware -- bikes -- and software that enabled customers to take spin classes in their homes, and later expanding into other exercise modalities such as yoga, running and strength training.
Just as it had few customers at its start, it had little data. That changed gradually at first, and then exponentially at the onset of the COVID-19 pandemic.
Throughout Peloton's growth into a company whose revenue totaled $3.5 billion in 2022 despite its recent fiscal problems and attempted turnaround, AWS has enabled Peloton to manage its data. The Redshift cloud data warehouse is the centerpiece, and Peloton uses it to gain the insights it needs to make decisions and take action.
Peloton wasn't always the formidable combination of technology and fitness it became.
The company didn't always have more than 50 instructors teaching classes in three languages, studios on two continents and classes in a dozen different exercise modalities, from high-intensity interval strength and cycling to yoga and meditation.
It didn't have commercials during the Super Bowl or boast celebrity customers such as Miley Cyrus and Usain Bolt, and it wasn't part of the pop culture zeitgeist.
Instead, when it emerged from stealth in 2014, the company had a studio in New York City's Chelsea neighborhood where a handful of instructors led classes that were streamed to a small clientele that had purchased Peloton bikes for more than $2,000 and subscribed to its service for an additional $39 per month.
Back then, it was generally viewed as a premium brand rather than one for a mass audience.
And it stayed that way even as it hired more instructors, developed a treadmill to sell in addition to its bike and added new class types -- running and yoga among them.
AWS, meanwhile, released Redshift the same year Peloton was founded. And it was only a year after Peloton streamed its first class in 2014 that it started using Redshift as its data warehouse, according to Wang.
At that time, in 2015, Peloton had just 10 instructors teaching only cycling classes. The company didn't generate much data compared with the large enterprises of the time. But it grew quickly to a user base of more than 100,000 by 2017, according to the S-1 that Peloton filed in 2019 before its initial public stock offering.
As Peloton grew, so did Redshift, which competes with data warehouses including Google BigQuery, Snowflake, Microsoft's Azure SQL Database and Azure Synapse, and Firebolt.
Beyond workout data, Peloton collects three primary types of data to fuel its data science and analytics initiatives: customer data, financial transaction data and social media data. And as Peloton's data volume increased during its early years with its customer base on the rise, Redshift met its needs.
"The most important thing that we need is a data warehouse that can support large amounts of data," Wang said. "On the market, there are only a handful of mature products that have the capability to support petabytes of data."
In addition, with Peloton's infrastructure built on AWS, Redshift fit right in, Wang continued.
"We needed something that worked seamlessly with AWS," he said.
Wang noted that while other tech giants also offer a suite of products that enable data management and analytics, Peloton has not been tempted to leave AWS given the evolution of Redshift, in particular.
Among other updates, in 2019, AWS boosted Redshift's automation capabilities. In 2021, AWS introduced a serverless option that automatically scales up and down depending on customers' needs, enabling them to better control cloud spending. And AWS is continually increasing both speed and capacity.
Those added capabilities kept Peloton with AWS through its early growth -- and keep Peloton with AWS now following the exponential growth that came in 2020 and 2021.
"The most important thing is the continuous improvement from Redshift," Wang said. "Over the years, as the business grew and we required more support from the underlying software -- in this case, Redshift -- it had the capability to meet our increasing demand."
Peloton never needed AWS' continuous advancements more than in 2020 and 2021.
Peloton ended its 2019 fiscal year in June 2019 with just over 500,000 subscribers and subscription revenue of $181.1 million for the year.
At the time, gyms and fitness studios were still open. No one knew COVID-19 was coming. A year later, however, the entire country was in lockdown. Gyms and fitness studios were closed, and the only way people could work out was with what they had at their personal disposal.
Connected fitness provided a way to get professional-quality instruction without having to be in the same physical space as another person. And by March 2020, when lockdowns began in earnest, Peloton offered a cheaper option than the $2,000 bike with its $39 monthly subscription.
In 2018, Peloton introduced an app-only subscription for less than $20 per month. Users had to supply their own hardware -- a non-Peloton bike or treadmill -- and they didn't have access to the performance metrics Peloton measures and provides with its equipment. But they had access to the classes.
So when the pandemic hit, Peloton's popularity exploded.
Only three months into the pandemic, Peloton reported that subscription revenue had more than doubled to $363.7 million for fiscal 2020. A year after that, subscription revenue was up to $872.2 million for fiscal 2021.
Meanwhile, Peloton's data volume also surged as a result of the exponential customer growth. While subscription revenue grew nearly fivefold over the course of two years, data volume grew tenfold each year, according to Wang.
Now, of all the data Peloton manages, 95% was generated after the onset of the pandemic.
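The growth figures above can be checked with a few lines of arithmetic. The following Python sketch uses only the subscription revenue totals quoted in this article. Treating "tenfold each year" as compounding over the same two fiscal years is an interpretation of Wang's description, not a figure Peloton reported.

```python
# Subscription revenue quoted in the article, in millions of U.S. dollars.
subscription_revenue = {2019: 181.1, 2020: 363.7, 2021: 872.2}


def growth_multiple(start: float, end: float) -> float:
    """Return how many times larger `end` is than `start`."""
    return end / start


one_year = growth_multiple(subscription_revenue[2019], subscription_revenue[2020])
two_year = growth_multiple(subscription_revenue[2019], subscription_revenue[2021])

# Assumption: "tenfold each year" compounds across two years.
data_multiple = 10 ** 2

print(f"FY2019 -> FY2020 subscription revenue: {one_year:.2f}x")
print(f"FY2019 -> FY2021 subscription revenue: {two_year:.2f}x")
print(f"Data volume over two years (assumed compounding): {data_multiple}x")
```

Under that assumption, data volume grew far faster than subscription revenue over the same period, which helps explain why cost control became a concern.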
At the same time that Peloton started experiencing substantial growth in 2019, AWS introduced concurrency scaling in Redshift.
The feature enables customers to run hundreds of concurrent queries. It automatically scales up Redshift's processing power to meet demand at a given moment and scales it down when there's less demand. That, in turn, enables organizations to pay only for the processing power they actually need and helps keep costs under control, which was a concern for Peloton with data volume increasing at such a rapid rate.
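The scale-up, scale-down behavior described above can be illustrated with a toy model. The sketch below is not Redshift's actual scheduling or pricing logic. The function name, the hourly query counts, the 50-queries-per-cluster capacity and the cap of five extra clusters are all hypothetical values chosen for illustration.

```python
import math


def clusters_needed(concurrent_queries: int, queries_per_cluster: int,
                    max_extra_clusters: int) -> int:
    """Toy model: total clusters (main plus transient) for a given load."""
    needed = max(1, math.ceil(concurrent_queries / queries_per_cluster))
    # Never exceed the main cluster plus the configured cap on extra clusters.
    return min(needed, 1 + max_extra_clusters)


# Hypothetical number of concurrent queries in each of six hours.
hourly_load = [20, 45, 180, 320, 90, 10]

extra_cluster_hours = 0
for queries in hourly_load:
    total = clusters_needed(queries, queries_per_cluster=50, max_extra_clusters=5)
    extra = total - 1  # the main cluster is always running
    extra_cluster_hours += extra
    print(f"{queries:>3} queries -> {total} cluster(s), {extra} transient")

# Comparison point: keeping all five extra clusters running every hour.
always_on_extra_hours = 5 * len(hourly_load)
print(f"Transient cluster-hours: {extra_cluster_hours} "
      f"vs. {always_on_extra_hours} if provisioned for peak")
```

In this toy scenario, extra capacity is consumed only during the busy hours, which is the cost-control argument the feature rests on.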
In addition, the storage capacity AWS continued to add to Redshift addressed Peloton's increasing needs. In particular, the launch of Redshift Serverless in 2021 was critical, according to Wang.
"Those addressed some of the pain points we had," he said. "It gave us a cheap solution to bump up our storage. Redshift Serverless was the perfect solution for our growth."
But while Redshift Serverless enabled Peloton to stay on top of its soaring data volume, it could not help with a few poor forecasting decisions that led the connected fitness company to suffer some economic hardship.
Like many other tech-related companies that experienced substantial growth during the pandemic, Peloton overextended itself. Since the first couple of years of the pandemic, its stock has tumbled, and it closed many of its retail stores in North America and laid off several thousand employees.
Ultimately, its founder and CEO at the time, John Foley, was forced to resign, and a new regime led by former Spotify CFO Barry McCarthy was brought in to execute an economic comeback.
As Peloton works with its data, it builds pipelines that extract the data from its sources, store it in the data warehouse and transform it to prepare it for data science and analytics.
Redshift serves as the hub for all that activity, with AWS Glue acting as Peloton's primary extract, load and transform tool. In addition to AWS products, Peloton uses Tableau and Looker to analyze data and dbt, from dbt Labs, to assist with data transformation.
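The extract, load and transform pattern described above can be sketched in a few lines. The example below is a minimal illustration only. It uses Python's built-in sqlite3 module as a stand-in for a data warehouse rather than AWS Glue or Redshift, and the table names, column names and sample rows are hypothetical.

```python
import sqlite3


def extract() -> list[tuple[str, str, int]]:
    """Return hypothetical raw workout events from a source system."""
    return [
        ("user_1", "cycling", 30),
        ("user_1", "yoga", 20),
        ("user_2", "cycling", 45),
    ]


def load(conn: sqlite3.Connection, rows: list[tuple[str, str, int]]) -> None:
    """Store the raw rows in the warehouse unchanged."""
    conn.execute(
        "CREATE TABLE raw_workouts (user_id TEXT, discipline TEXT, minutes INTEGER)"
    )
    conn.executemany("INSERT INTO raw_workouts VALUES (?, ?, ?)", rows)


def transform(conn: sqlite3.Connection) -> None:
    """Build an analytics-ready table inside the warehouse with SQL."""
    conn.execute(
        """
        CREATE TABLE minutes_by_discipline AS
        SELECT discipline, SUM(minutes) AS total_minutes
        FROM raw_workouts
        GROUP BY discipline
        """
    )


conn = sqlite3.connect(":memory:")  # in-memory stand-in for the warehouse
load(conn, extract())
transform(conn)
summary = conn.execute(
    "SELECT discipline, total_minutes FROM minutes_by_discipline ORDER BY discipline"
).fetchall()
print(summary)
```

The point of the ordering is that raw data lands in the warehouse first and is reshaped there, so the warehouse's own processing power does the transformation work.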
"Over the years, other vendors have approached me that are similar to Redshift," Wang said. "Each has their own strengths. But my concerns are whether they can effectively process the volume we have, whether they work well with the rest of our infrastructure and how much it would cost to migrate from Redshift."
The one scenario that might tempt Peloton to leave Redshift for another vendor's data warehouse is if that data warehouse offers needed features that AWS does not, Wang continued.
"But over the years, I haven't seen that kind of deal breaker," he said. "There might be a time that a new vendor has a feature which is attractive and AWS may lag behind, but Redshift has always added it to its roadmap and released it in time."
Concurrency scaling, in fact, is an example of a feature first introduced by another vendor that AWS added to Redshift in time to meet Peloton's needs.
Peloton's ride with AWS hasn't always been smooth.
Beyond concerns about cost control and capacity that were addressed by advances in Redshift, Wang noted that Redshift's support for JSON -- a format used to exchange data between web clients and web servers -- at one point didn't meet Peloton's requirements. In addition, there was a time when Redshift couldn't ingest streaming data.
But as with the additions of concurrency scaling and a serverless option to alleviate concerns about cost control and capacity, AWS improved its support for JSON and added streaming data ingestion capabilities.
Now, as Peloton works to regain its economic footing, the connected fitness company is expanding its use of AWS.
For years, the company has used only one instance of Redshift. Going forward, Peloton plans to add up to a dozen instances of Redshift Serverless tailored to meet the various needs of self-service business users across the organization, essentially setting up a data mesh architecture.
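The planned arrangement, with a centrally managed core and separate serverless instances for business teams, can be sketched as a simple data structure. The code below is an illustrative model only, not Peloton's design or an AWS API. The `Workgroup` class, the `provision` function, the dataset names and the team names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Workgroup:
    """Hypothetical per-team serverless instance that reads shared core data."""
    team: str
    core_datasets: set[str] = field(default_factory=set)


# Hypothetical core datasets owned by the centralized data team.
CORE_DATASETS = {"customers", "transactions", "workouts"}


def provision(team: str, requested: set[str]) -> Workgroup:
    """Create a team workgroup, allowing access only to existing core data."""
    unknown = requested - CORE_DATASETS
    if unknown:
        raise ValueError(f"not in core data: {sorted(unknown)}")
    return Workgroup(team=team, core_datasets=set(requested))


workgroups = [
    provision("finance", {"transactions", "customers"}),
    provision("marketing", {"customers"}),
    provision("product", {"workouts", "customers"}),
]
for wg in workgroups:
    print(wg.team, sorted(wg.core_datasets))
```

The design choice the sketch highlights is the split in ownership: one team maintains the baseline data, while each business team works in its own instance.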
"We want to build the core data, and then we want to deploy multiple serverless instances so the business teams can be more self-service," Wang said. "The centralized team will provide the baseline, and then serverless instances will allow the different business teams to become self-service."
As Peloton implements new instances of Redshift Serverless and expands its use of AWS products, it is now a collaborator with AWS, according to Wang.
Peloton serves as one of AWS' testers for tools in development, and the companies conduct biweekly meetings in which they discuss features currently in development as well as others that AWS might add to its roadmap.
"Both of us benefit from this collaboration," Wang said. "We've built a feedback loop between us and the Redshift team, and we work together to build new data products. They are eager to learn about what their customers are doing, and Peloton represents many of the typical business use cases given the size of the company and the volume of the data."
Eric Avidon is a senior news writer for TechTarget Editorial and a journalist with more than 25 years of experience. He covers analytics and data management.