Banco Hipotecario, a commercial bank and mortgage lender in Argentina, struggled to deploy its machine learning models. The proprietary software it used to develop the models was outdated, and the bank could neither use some of the newer libraries in R or Python nor keep track of its models.
That changed, according to Matías J. Stanislavsky, head of BI and analytics at Banco Hipotecario, when the bank began using Databricks about a year ago.
Databricks, with MLflow, enabled Banco Hipotecario to modernize its technology and architecture, as well as let it deploy models more cheaply, efficiently and at scale.
Originally developed by Databricks, MLflow is an open source platform for managing machine learning lifecycles. The platform enables users to deploy, manage, track and reproduce machine learning models.
It's a popular tool. The platform gets more than two million monthly downloads in Python alone and has more than 200 code contributors, said Matei Zaharia, co-founder and CTO at Databricks, in a keynote session during Spark + AI Summit 2020.
During the annual conference, which is sponsored by Databricks and was held virtually this year, Zaharia revealed that Databricks has donated MLflow to the Linux Foundation, a nonprofit technology consortium dedicated to protecting and growing Linux. The group provides support for open source communities.
"Because the community has been growing so quickly, we also wanted to make sure that it can keep doing that," Zaharia said.
"There's now a large, nonprofit, vendor-neutral foundation that's managing the project, and that'll make it very easy for a wide range of organizations to continue collaborating on MLflow," he said.
Modernizing bank IT
Meanwhile, Banco Hipotecario has used Databricks and MLflow to deploy and manage models for, among other things, customer targeting. Those models aim to increase customer retention and cross-sells while lowering the cost of acquiring new customers.
The bank used Databricks to create the datasets for the model, Stanislavsky said. With more than a million active customers and one to two million transactions per day, Banco Hipotecario couldn't train the model on a single computer. With Databricks, it ran an elastic Spark cluster on the cloud.
Doing that on premises, Stanislavsky estimated, would have cost about $2 million. With Databricks, the cost was well under $1 million, he said.
Using MLflow, Banco Hipotecario compared model results to pick the best models for the job.
"After we operationalize the 'best model,' we were able to keep track of the new model versions and deploy them as soon as we verified that we were having some data drifting, for example," Stanislavsky said.
Data drift refers to unexpected or unannounced changes in a model's input data. The changes, if big enough, can lower the accuracy of a model.
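One common way to put a number on drift is the population stability index (PSI), which compares how a feature was distributed at training time with how it is distributed now. The sketch below is only an illustration and not the method Banco Hipotecario described. The function name, the equal-width binning and the synthetic transaction amounts are all assumptions made for the example.

```python
import math
import random


def population_stability_index(baseline, current, bins=10):
    """PSI between two numeric samples, using equal-width bins over the baseline range."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def bin_shares(values):
        counts = [0] * bins
        for v in values:
            # Values outside the baseline range are clamped into the first or last bin.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    expected = bin_shares(baseline)
    actual = bin_shares(current)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


# Synthetic data (assumption): transaction amounts seen at training time,
# a fresh sample from the same distribution, and a sample whose mean has shifted.
rng = random.Random(0)
training_amounts = [rng.gauss(100, 15) for _ in range(5000)]
same_distribution = [rng.gauss(100, 15) for _ in range(5000)]
shifted_amounts = [rng.gauss(130, 15) for _ in range(5000)]

psi_stable = population_stability_index(training_amounts, same_distribution)
psi_shifted = population_stability_index(training_amounts, shifted_amounts)
print(f"PSI, same distribution: {psi_stable:.4f}")
print(f"PSI, shifted mean:      {psi_shifted:.4f}")
```

A frequently cited rule of thumb treats a PSI above roughly 0.25 as a significant shift, although the right threshold depends on the model and the feature.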
The MLflow tracking feature enables users to log parameters, code versions, metrics and output files, as well as query their machine learning experiments. This can help users better account for data drift, debug issues or replicate successful models.
Still, Stanislavsky noted he would make at least one change to MLflow.
As a bank, Banco Hipotecario must comply with financial regulations, and must maintain separate development, integration, homologation and production environments for its data to adhere to those regulations.
The bank had to create its own routines to move its MLflow models through the different environments. While it wasn't "a big deal," Stanislavsky said, it required the bank to do some extra work. Still, he said, "I think they will solve this in the near future."