A centralized IT monitoring tool helps a healthcare tech firm keep tabs on multiple public cloud infrastructure environments.
AthenaHealth, a network services and mobile app provider for medical groups and health systems, began in 2018 to add public cloud resources alongside a local container environment managed with Mesosphere's DC/OS. These environments include VMs on Amazon Web Services Elastic Compute Cloud (EC2), as well as Docker, Mesosphere and Amazon Elastic Container Service (ECS) container infrastructures. Stateful applications such as databases run in Docker containers on EC2 instances without a scheduler; stateless apps run in ECS; and streaming apps, such as Apache Spark, run on Mesosphere, which specializes in container integration for big data apps.
Integrations between SignalFx's container monitoring tool, AWS CloudWatch and Prometheus for time-series monitoring allow AthenaHealth's site reliability engineering (SRE) team to monitor all of these environments from a single interface.
Change is constant in the public cloud environments as AthenaHealth's deployments grow, said Shailan Lala, director of cloud engineering at the firm in Watertown, Mass. "This mirrors the industry in general when it comes to container schedulers, as the launch and evolution of each orchestration platform is occurring at a rapid pace," he said.
AthenaHealth typically prefers open source tools, but chose to buy SignalFx's software because its public cloud environment is split among more than 100 separate AWS accounts. SignalFx offered easy integration with AWS CloudWatch IT monitoring metrics for EC2 instances and containers, as well as integration with the open source Prometheus time-series IT monitoring tool the company started with. SignalFx's Prometheus integration meant the company didn't have to discard the work it had already done to set up time-series container monitoring on premises.
"We didn't [move] every team to Prometheus, and there are some teams that can deploy infrastructure across all of the accounts," said Nick Imbriglia, engineering manager for the cloud SRE team at AthenaHealth. "SignalFx let us build out centralized alerts that helped us simplify that, and detect abnormalities."
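As a rough illustration of what a centralized alert across many accounts might look like, the following minimal sketch flags accounts whose latest metric value sits far from the group average. It is not SignalFx's detection method; the function name `find_abnormal`, the account IDs and the z-score threshold are all hypothetical.

```python
from statistics import mean, pstdev


def find_abnormal(readings, threshold=2.0):
    """Return account IDs whose value is more than `threshold`
    population standard deviations from the mean of all accounts.

    `readings` is a hypothetical dict of account ID -> latest metric
    value (for example, CPU utilization in percent).
    """
    values = list(readings.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All accounts report the same value, so nothing stands out.
        return []
    return sorted(
        account
        for account, value in readings.items()
        if abs(value - mu) / sigma > threshold
    )


# Hypothetical CPU utilization readings from six accounts.
cpu_by_account = {
    "acct-1": 40, "acct-2": 42, "acct-3": 41,
    "acct-4": 39, "acct-5": 95, "acct-6": 40,
}
abnormal = find_abnormal(cpu_by_account)
```

A single check like this, run over metrics gathered from every account, is the kind of rule that would otherwise have to be duplicated account by account.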
Weighing IT monitoring tool options
AthenaHealth had used Cisco's AppDynamics APM tool in its DC/OS environment, and initially considered its use in early 2018 with Amazon ECS, along with Datadog and SignalFx. Datadog has since added Prometheus integration, while AppDynamics has it on the near-term roadmap, the company said.
But a year ago, SignalFx Prometheus integration was available and easy to use, Imbriglia said. SignalFx also has a "smart agent" that collects metrics from the Mesosphere DC/OS environment's hardware, software, application and DC/OS components in one package, and doesn't require the AthenaHealth IT team to specify integrations between the tool and each of those elements.
Similarly, AWS CloudWatch metrics on CPU and memory utilization were imported into SignalFx with one button click, Lala said.
"I liked the fact that within a period of thirty minutes, I could see my metrics in Signal[Fx], and get to a point where I could start answering questions, instead of just upfront setup," he said.
Still, AthenaHealth has some wish list items for SignalFx. Among them are role-based access control refinements, which require deeper integration between SignalFx and Microsoft Active Directory.
"I would like to see it integrated so [that] a member of a certain group in Active Directory would automatically be placed into the corresponding team in SignalFx [with the right] permissions," Imbriglia said.
The tool must also update role-based access control so that only the admins who create tokens to send monitoring metrics into SignalFx can see those tokens, he added.
These features are on SignalFx's roadmap, the company said.
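The two access control requests described above can be sketched in a few lines. This is only an illustration under stated assumptions, not SignalFx's or Active Directory's API: the group names, team names, roles and the `sync_memberships` and `visible_tokens` helpers are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical mapping from directory group to (monitoring team, role).
GROUP_TO_TEAM = {
    "cloud-sre": ("Cloud SRE", "admin"),
    "data-platform": ("Data Platform", "member"),
}


@dataclass(frozen=True)
class Membership:
    user: str
    team: str
    role: str


def sync_memberships(user, directory_groups):
    """Derive team memberships from a user's directory groups.

    Groups with no entry in GROUP_TO_TEAM are ignored.
    """
    return [
        Membership(user, *GROUP_TO_TEAM[group])
        for group in sorted(directory_groups)
        if group in GROUP_TO_TEAM
    ]


def visible_tokens(user, tokens):
    """Return the names of tokens that `user` created; all others stay hidden."""
    return [t["name"] for t in tokens if t["created_by"] == user]


# Hypothetical example data.
tokens = [
    {"name": "ecs-ingest", "created_by": "alice"},
    {"name": "dcos-ingest", "created_by": "bob"},
]
alice_teams = sync_memberships("alice", {"cloud-sre", "finance"})
alice_tokens = visible_tokens("alice", tokens)
```

Here group membership alone determines team and role, and token visibility is tied to the token's creator, which matches the behavior the team described wanting.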
Next on the SignalFx to-do list: AIOps, APM features
Like most of its competitors, SignalFx seeks to cover all aspects of IT from application to servers, and add AIOps features that automatically detect anomalies and inform root cause analysis. But while it offers such features -- the SignalFx APM tool launched in November 2018, and anomaly detection and directed troubleshooting were already available -- AthenaHealth does not have them in production.
"We haven't been quite ready for the AIOps stuff they're rolling out, but we'd like to be able to quickly identify services in trouble, along with their customer impact," Imbriglia said. "We also plan to evaluate the APM and dynamic tracing tools."
Also to come in 2019 are dependency-aware alerting and comparisons between baseline APM traces and those that emerge as the environment changes, according to SignalFx.
Editor's note: This article has been updated to correct inaccuracies about AthenaHealth's use of container services and Kubernetes monitoring in the cloud. We regret the error.