ShopRunner uses Databricks for machine learning in retail
Databricks user ShopRunner talks about the tools shown at Spark + AI Summit 2019, such as MLflow and Databricks Delta Lake. And Datameer reveals a new Databricks integration.
ShopRunner sells a subscription service that helps retailers offer customers free two-day shipping. The nine-year-old e-commerce platform vendor sees itself as an alternative to Amazon, one fueled by a distinctly competitive corporate ethos.
It's sort of "an anti-Amazon collective," Michelangelo D'Agostino, vice president of data science and engineering at ShopRunner, said of ShopRunner and its retail partners. CEO Sam Yagan has called the company "the rebel army."
ShopRunner relies on machine learning in retail technology -- some is developed in-house, and some it makes available through technology vendor partnerships -- to power its technology platform and membership service.
Its connected network of retailers, as well as its promise of free two-day shipping to members, makes ShopRunner a notable competitor to Amazon's online shopping platform, even as the tech giant recently promised it will soon make one-day shipping a free option for Amazon Prime members.
One of ShopRunner's key partnerships is with Databricks, a unified analytics and machine learning platform vendor.
ShopRunner relies on some of the core Databricks platform features for machine learning in retail and has also started to use newer ones, such as MLflow and Delta Lake. Delta Lake, an open source tool for managing data lakes, was unveiled at the Spark + AI Summit 2019 in San Francisco.
Analytics lifecycle platform vendor Datameer, also at the April 2019 conference, showcased a new tool and a new partnership with Databricks.
The Databricks-ShopRunner partnership
ShopRunner signed a contract with Databricks in late 2017 and has been working with the larger vendor's technology since 2018.
Despite its name, ShopRunner does not provide logistics to enable the two-day shipping service. Instead, the company sells the paid two-day shipping membership service to retail customers, which number about 100. Retail partners bring in more customers through the partnership, as well as useful data captured from members.
The company uses Spark to process all that captured web-scale behavioral data, which D'Agostino said amounts to about one terabyte per day, for machine learning in retail. ShopRunner processes almost all of the data within the Databricks platform and then uses the results to power recommendation systems, marketing efforts and predictive models.
This has created a unified platform for ShopRunner's retail partners with tight integrations, making the platform a one-stop location to buy products from many retailers, much like Amazon's own platform.
Using the Databricks platform has "drastically simplified onboarding and let us put stuff into production way more quickly," D'Agostino said.
New open source tools
ShopRunner has been experimenting with MLflow, an open source machine learning management tool originally developed by Databricks, for its machine learning in retail technology.
The e-commerce vendor has been a beta tester of the Databricks-hosted version of MLflow, D'Agostino said, and "so far, we really like what we see."
"Experiment tracking and model versioning are super important and are rapidly evolving, and we like that Databricks makes it so easy to get started with MLflow," he said.
MLflow 1.0, revealed at Spark + AI Summit 2019, is set to be out sometime in May 2019.
ShopRunner also used Databricks Delta before Databricks open sourced it and renamed the product Delta Lake at the conference.
The company hasn't used it much, D'Agostino said, but he noted that features such as data set versioning and time travel are useful for reproducing models.
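To make the reproducibility point concrete, here is a minimal sketch of why data set versioning and time travel help when a model has to be rebuilt later. It is a toy in plain Python and is not Delta Lake's API or storage format; the `VersionedTable` class, the `train_mean_model` function and the `spend` field are hypothetical names invented for illustration.

```python
from copy import deepcopy


class VersionedTable:
    """Toy append-only table that keeps every committed snapshot.

    Illustration only: this is NOT Delta Lake's API or storage format.
    """

    def __init__(self):
        self._snapshots = [[]]  # version 0 is the empty table

    @property
    def latest_version(self):
        return len(self._snapshots) - 1

    def commit(self, rows):
        """Append rows as a new version and return its version number."""
        new_snapshot = deepcopy(self._snapshots[-1]) + deepcopy(rows)
        self._snapshots.append(new_snapshot)
        return self.latest_version

    def read(self, version=None):
        """Return a copy of the table as of `version` (default: latest)."""
        if version is None:
            version = self.latest_version
        if not 0 <= version <= self.latest_version:
            raise ValueError(f"unknown version {version}")
        return deepcopy(self._snapshots[version])


def train_mean_model(rows):
    """Stand-in for model training: the average of the hypothetical 'spend' field."""
    return sum(r["spend"] for r in rows) / len(rows)


table = VersionedTable()
v1 = table.commit([{"member": "a", "spend": 10.0}, {"member": "b", "spend": 30.0}])
model_v1 = train_mean_model(table.read(v1))  # trained on version 1

v2 = table.commit([{"member": "c", "spend": 80.0}])  # new data arrives later

# Reading version 1 again yields the original training input,
# even though the latest version of the table has changed.
reproduced = train_mean_model(table.read(version=v1))
latest = train_mean_model(table.read())
```

Here `reproduced` matches `model_v1` because both were computed from the same pinned version, while `latest` reflects the newer data. A real system would store changes far more efficiently than full snapshots; the sketch only shows the idea of reading a table as of an earlier version.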
However, D'Agostino noted that the tool provides some of the same capabilities that Snowflake, a cloud data warehousing vendor, offers, except Delta is now open source and runs on data stored in the Apache Parquet format.
"We look forward to trying it," he said.
Data on Delta Lake
Users, partners and analysts appeared to welcome the Databricks open sourcing news at the Spark + AI Summit, held April 23 to 25 at the Moscone Center.
Delta Lake fills a gap, but that gap is also filled by software from vendors like Cloudera, AWS, Microsoft and Google, Forrester analyst Mike Gualtieri said during the conference.
"Databricks is smart to open source it because its success will depend on other vendors adopting it and building enterprise tooling for it," he said.
Databricks, Gualtieri continued, is successfully negotiating the challenge of adding value, while not alienating vendors. But, he added, besides other vendors, Databricks faces competition from other open source platforms, like Apache Flink.
"The tech world doesn't like dominance in any one area, so it tends to allow competitors to rise," Gualtieri said.
Datameer at Spark + AI Summit
In other Spark + AI Summit news, Datameer, an analytics lifecycle platform vendor, showcased a new integration with Databricks.
The integration will help bring Spark into the platform, Frank Henze, vice president of innovation at Datameer, said in an interview.
"Using the horsepower of Spark computation means we have done something to replace our formal Hadoop with a Spark cluster," Henze said.
The Datameer platform offers tools to more easily and intuitively visualize and interact with data natively on Spark, the vendor said.
"Overall, what we feel is we can help Databricks customers to bring the ease-of-use data preparation to their customers," Henze added.