
Databricks adds new developer tool to lakehouse platform

The vendor now enables developers to build applications and models with familiar tools in their preferred environment before moving their work into its data lakehouse.

Databricks on Tuesday added a new tool to its lakehouse platform, enabling developers to author and test code in a familiar integrated development environment before connecting it to their Databricks cluster.

Founded in 2013 and based in San Francisco, Databricks is a data lakehouse vendor whose platform combines the structured data storage capabilities of data warehouses with the unstructured data capabilities of data lakes.

Previously, although the vendor enabled developers to work with their data in myriad ways, that work had to be done within the Databricks environment.

Now, with the public preview of Visual Code Extension for Databricks, the lakehouse vendor is enabling developers to build data, augmented intelligence and machine learning models and applications in Visual Studio Code (VS Code) before moving them into Databricks.

VS Code is an integrated development environment (IDE) that Microsoft launched in 2015. Developers commonly use it for a host of tasks, including editing, testing, debugging and managing continuous integration/continuous delivery (CI/CD) pipelines.

New capabilities

Visual Code Extension essentially enables developers to work with a familiar tool in their preferred environment before moving that work into the Databricks lakehouse. That is significant for developers, according to Donald Farmer, founder and principal of TreeHive Strategy.

He noted that many developers find it frustrating to have to work outside their preferred development environment.

"[Visual Code Extension] is useful for developers who use Visual Studio," he said. "It has been a real frustration for them to have to work in another IDE."

Farmer added that when vendors force data workers to work in a particular environment rather than a familiar one, it can be a barrier to adoption. By launching Visual Code Extension, Databricks is attempting to make it easier for potential customers to work with the vendor's lakehouse.

"This announcement shows that this has been a … barrier to adoption. So it's good to see Databricks listening to the community and delivering," Farmer said.

The community, in fact, played a substantial role in Databricks' decision to develop Visual Code Extension, according to Tarek Madkour, the vendor's director of product management.

As Databricks adds functionality to its lakehouse to both serve existing customers and attract new ones, its primary motivation is making data workers as successful as possible, Madkour said. To that end, the vendor communicates frequently with customers to learn what users want added to the Databricks platform.

Familiarity is a common theme, and providing developers with a familiar environment was the motivation for developing Visual Code Extension.

"We want to meet developers where they are," Madkour said. "We want to enable developers to use tools they are familiar with and make them productive with that. Visual Studio Code is one of the [most popular] in the industry."

More IDE extensions

VS Code is not the only IDE developers use to build models and applications. Another popular one is PyCharm, an IDE for programming in Python.

Databricks' roadmap, therefore, includes developing tools similar to Visual Code Extension that will enable developers to use PyCharm and other programming environments to build applications and models before moving them into a Databricks lakehouse.


In addition, Databricks plans to add more capabilities to Visual Code Extension. The extension was in private preview before moving to public preview, in which any Databricks customer can now access it at no added cost to their subscription.

No date for general availability has been set, according to Madkour.

"What's coming next is in two dimensions," he said. "There's deeper Visual Studio Code integration and also other tools similar to Visual Code Extension like PyCharm support."

Databricks is smart to focus part of its roadmap both on adding functionality to its first IDE extension and on building extensions for other IDEs, according to Farmer.

In particular, he noted that users of RStudio, an IDE for the R programming language, have had trouble working with Databricks. Some of the lakehouse vendor's customers have also had trouble running CI/CD workloads.

"RStudio users are frustrated with Databricks integration," Farmer said. "I [also] hear that people find it difficult to run CI/CD processes with Databricks. It is possible, but clunky."

Broader roadmap

Beyond IDE extensions, one part of Databricks' roadmap will focus on enhancing the data governance capabilities of its lakehouse platform, according to Madkour.

The vendor launched Unity Catalog, which enables organizations to more easily organize and govern their data, in June 2022 after unveiling it in preview a year earlier. It plans to continue adding functionality.

In addition, Madkour noted that Databricks sees opportunities to integrate with new technologies like ChatGPT and other AI tools that the vendor could use to build no-code/low-code features.

"There's a new world of generative AI -- we've all heard about ChatGPT and other large language models. And that's going to open up a whole new world of what we can enable our customers to do," he said. "Think about people who don't necessarily know code who would love access to data. That's a new opportunity for Databricks."

Eric Avidon is a senior news writer for TechTarget Editorial and is a journalist with more than 25 years of experience. He covers analytics and data management.
