
HashiCorp Nomad 1.0 expands monitoring polish, autoscaling

HashiCorp Nomad was used in production before its 1.0 release, but its rapid development this year makes it suitable for a wider variety of uses.

After five years of development that included a bevy of updates in 2020, HashiCorp Nomad reached 1.0 status this week.

HashiCorp Nomad is a resource scheduler that manages the deployment of applications on IT infrastructure. It's similar to container schedulers such as Kubernetes, but it can manage applications across a variety of compute types that also include virtual machines and bare-metal servers.

Production use of HashiCorp products is common before they reach version 1.0, as the company's philosophy on promoting products to 1.0 differs from most tech vendors. HashiCorp products typically don't reach that designation until the company anticipates no further breaking changes will be made to the code, and the process tends to take much longer.

Prior versions of Nomad, including the 0.11 and 0.12 releases from earlier this year, were already in production at scale in enterprises such as iJet, Trivago, Roblox, Cloudflare and Pandora, according to HashiCorp officials.

However, this year's updates, especially monitoring refinements and the broadened autoscaling support released with version 0.12 in June, addressed features that Nomad users had been awaiting.

"Service and cluster autoscaling ... and observability improvements ... will provide a lot more information about the services that are running on the cluster on Nomad, which is huge," said John Spencer, senior site reliability engineer at Bowery Farming, an indoor produce farming company in New York that has centralized automation for its facilities using Nomad and Consul service discovery.


"Nomad provided some monitoring information out of the box, but there were some pieces that were missing, like monitoring memory usage and when jobs are killed because of running out of memory," Spencer said. "We developed some [of our own] tooling, but I'm excited to have a native approach to that with Nomad 1.0."

HashiCorp first added detailed monitoring insights and alerts to Nomad's remote execution UI in March with version 0.11. In June, version 0.12 expanded on this with debug and audit logs, the ability to search and stream logs through the UI, and, in Nomad Enterprise, the ability to perform cross-namespace queries.

Nomad 1.0 also introduces support for event streaming and point-in-time topology visualizations of Nomad infrastructure, along with support for more granular monitoring alerts.

"I can actually watch what's happening to a deploy file as the job is going, and it allows me to get a little more granularity in terms of diagnosing problems if something happens," said Peter McCarron, a senior product marketing manager at HashiCorp, of the topology visualization feature. "It gives me a more visual way of looking at things, as opposed to trying to keep track of job deployments and looking through logs."

[Screenshot: HashiCorp Nomad Enterprise 1.0 UI showing Dynamic Application Sizing according to memory and CPU demand.]

Nomad 1.0's scaled-up feature set

HashiCorp Nomad features, especially autoscaling, have expanded rapidly in the last eight months. HashiCorp added horizontal application autoscaling support with the 0.11 release, which meant cluster resources could be made available to applications as needed. When it announced version 0.12, HashiCorp indicated that horizontal cluster autoscaling, in which infrastructure resources from AWS Auto Scaling groups can be added to clusters as needed, was coming soon.
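Horizontal application autoscaling boils down to changing how many instances of a job run in response to a metric. The Python sketch below assumes a simple proportional "target value" rule, similar in spirit to the target-value strategy HashiCorp describes for its Nomad Autoscaler. The function name, the min/max bounds and the exact formula are illustrative assumptions, not Nomad's actual implementation.

```python
import math


def target_value_count(current_count: int, metric: float, target: float,
                       min_count: int = 1, max_count: int = 10) -> int:
    """Toy horizontal-scaling rule (hypothetical, not Nomad's code).

    Scales the instance count in proportion to how far the observed metric
    sits from the target, then clamps the result to user-set bounds.
    """
    if target <= 0:
        raise ValueError("target must be positive")
    if current_count == 0:
        # Nothing is running to scale from, so start at the floor.
        return min_count
    desired = math.ceil(current_count * metric / target)
    return max(min_count, min(max_count, desired))


# Three instances at 90% CPU against a 60% target: ceil(3 * 90 / 60) = 5.
print(target_value_count(3, 90.0, 60.0))
```

Rounding up errs toward adding capacity, and the clamp keeps a noisy metric from scaling a job past limits the operator set.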

This version also included the ability to scale jobs vertically in response to demand through the Nomad UI. Users could invoke a new spread scheduling feature as well, which deployed apps over a wide area within a server cluster to balance workload distribution instead of packing workloads into fewer nodes for maximum utilization.
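The difference between packing workloads onto fewer nodes and spreading them out can be shown with a toy greedy placement loop. This Python sketch makes simplifying assumptions (identical tasks, one per-node capacity, one task placed at a time). The function and the strategy names "binpack" and "spread" are hypothetical labels chosen to mirror the article's wording, and the code does not reproduce Nomad's real scoring.

```python
from typing import Dict


def place(tasks: int, nodes: Dict[str, int], capacity: int,
          strategy: str) -> Dict[str, int]:
    """Toy placement of identical tasks onto nodes (hypothetical).

    `nodes` maps node name -> tasks already running there; each node holds
    at most `capacity` tasks. "binpack" picks the busiest node that still has
    room, "spread" picks the least-loaded node. Ties go to the first node in
    insertion order.
    """
    load = dict(nodes)
    for _ in range(tasks):
        candidates = [n for n, used in load.items() if used < capacity]
        if not candidates:
            raise RuntimeError("cluster is full")
        if strategy == "binpack":
            chosen = max(candidates, key=lambda n: load[n])
        elif strategy == "spread":
            chosen = min(candidates, key=lambda n: load[n])
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        load[chosen] += 1
    return load


empty = {"a": 0, "b": 0, "c": 0}
print(place(4, empty, 4, "binpack"))  # all four tasks land on node "a"
print(place(4, empty, 4, "spread"))   # tasks are distributed across a, b, c
```

Packing leaves whole nodes idle (and potentially removable), while spreading limits how many tasks are lost if a single node fails.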

Now version 1.0 also includes an Enterprise edition-only feature dubbed Dynamic Application Sizing, which deploys or rolls back application instances automatically in response to Nomad's real-time detection of their memory and CPU utilization, under user-set parameters.
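One way to picture sizing driven by "real-time detection of ... memory and CPU utilization, under user-set parameters" is a rule that turns observed usage into a resource recommendation. The Python sketch below is a hypothetical rule, not Nomad Enterprise's algorithm. The nearest-rank percentile, the headroom factor and the floor/ceiling limits are all assumptions chosen for illustration.

```python
import math
from typing import Sequence


def recommend(samples: Sequence[float], percentile: float = 95.0,
              headroom: float = 1.1, floor: float = 64.0,
              ceiling: float = 4096.0) -> float:
    """Hypothetical right-sizing rule (not Nomad Enterprise's algorithm).

    Takes observed usage samples (e.g. memory in MB), picks a high
    percentile with the nearest-rank method, adds headroom, and clamps the
    result to user-set limits.
    """
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest-rank percentile: 1-based rank, always between 1 and len(ordered).
    rank = math.ceil(percentile * len(ordered) / 100)
    observed = ordered[rank - 1]
    return max(floor, min(ceiling, observed * headroom))


usage_mb = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
print(recommend(usage_mb))  # p95 of the samples plus 10% headroom
```

Using a high percentile rather than the mean keeps occasional peaks from causing out-of-memory kills, and the user-set floor and ceiling bound how far an automated recommendation can move.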

HashiCorp officials acknowledged that 0.12 added many of the core features that made Nomad ready for version 1.0. That release also included production-level support for federated multicluster deployment; support for memory oversubscription using Docker task drivers; a Podman task driver, key to supporting OCI container images commonly used in Red Hat Enterprise Linux and CoreOS environments; and single-command support for Nomad cluster state snapshot backups.

Most of the new features in version 1.0 are focused on exposing those new utilities visually in the Nomad UI, easing management.

"Nomad was running at scale in 0.12, but that scale wasn't really visible," said Amith Nair, vice president of product marketing at HashiCorp. "It was a big steppingstone toward 1.0, and 1.0 took it to the next level in terms of making it more appealing."

Nomad vs. Kubernetes overlap increases

Beginning with version 0.11, Nomad updates brought its orchestration features closer to longstanding attributes of Kubernetes, including autoscaling. Version 1.0 brings namespace-based management, a core Kubernetes feature used for workload management and security isolation, to open source Nomad; previously, it had been available only in Nomad Enterprise.

The similarities between Nomad and Kubernetes are not lost on HashiCorp officials, but they emphasize that Nomad is more complementary to Kubernetes than it is competitive, because it supports VMs and bare-metal servers, which Kubernetes does not. Windows hosts are also equal citizens in Nomad clusters, while support for them in Kubernetes is still maturing.

HashiCorp further concedes that Kubernetes is stronger for data analytics workloads such as machine learning, as well as newer cloud-native architectures such as serverless computing. The vendor references multiple Nomad customers that use it alongside Kubernetes, including Cisco monitoring subsidiary AppDynamics. HashiCorp's Consul service mesh can form a bridged network between Nomad and Kubernetes environments as well.

However, some companies, including Bowery Farming, run Nomad instead of Kubernetes -- not alongside it -- for container-based and bare-metal workloads.

"We evaluated Kubernetes, but for our size and scale it seemed like it would be unnecessarily complex, and too burdensome to manage," Spencer said in a HashiConf Digital presentation this month. Bowery Farming expects rapid growth and already supplies produce to large grocery chains such as Whole Foods and Walmart in the Northeast. For now, it has a DevOps team of 12 software engineers and a one-man SRE team in Spencer.

Larger companies, such as Q2 Software, a financial services company in Austin, Texas, also use Nomad as their primary application orchestrator because of its support for Windows and VM-based applications alongside containers, according to a HashiConf presentation.
