The 2020 portion of AWS re:Invent came to a close this week, ending with an emphasis on preparing for the unexpected and navigating the current realities of the world.
Amazon CTO and vice president Werner Vogels gave the closing keynote in what has become a sort of annual counterbalance to AWS CEO Andy Jassy's talk. Jassy delivers a more polished, rapid-fire sales pitch on the value of choosing AWS, while Vogels takes a more systematic and deliberate approach focused on best practices for those already on the platform.
There was also some news in week three, especially around IoT. And while it was supposed to be the last week of the conference, AWS announced plans to add another round of virtual sessions that will run Jan. 12 to Jan. 14. However, don't expect this addendum of content to include any major service announcements, keynotes or leadership sessions.
In this third and final re:Invent recap -- you can find the first two recaps here and here -- we'll go over Vogels' keynote and the news, and share some insights from two other company leaders on the latest from AWS.
While Vogels did make several service announcements, his keynote was largely a call to action. He advised customers to be more efficient with their cloud resources in order to reduce their carbon footprint. He also urged developers to be mindful of their users and the uncertainties and hardships they might be facing.
He used examples of children doing schoolwork in parking lots because they lack decent internet connections at home. Applications that are essential services should be designed so they still work on low-bandwidth, high-latency connections, he said.
"We as developers have a responsibility to our customers to build the best applications we can for them in ways that take the current reality very seriously," Vogels said.
He also spoke extensively about the need to design robust, dependable applications. IT teams must be able to deal with changes -- foreseeable and unforeseen, Vogels said.
Vogels reiterated past points about the importance of logs and metrics, and he used that as a springboard to discuss the growing focus on observability and the need to see the signals of a failure before it happens. He also discussed how AWS uses fault injections to find unreliable unknowns through a process known as fuzzing, and he highlighted a service coming in 2021, AWS Fault Injection Simulator (FIS), which will run controlled chaos engineering experiments on users' applications.
Experience with infrequent but critical events could also improve application performance, because IT teams will learn about blind spots that are missed in monitoring and alarms, Vogels said.
"Mean time to resolution isn't just about your architecture and automation, but also about the operational muscles you've built and exercise over time," he said.
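To make the idea of fault injection concrete, here is a minimal Python sketch. It is not the AWS Fault Injection Simulator API, which had not been released at the time of the keynote. It is a generic illustration under stated assumptions: `fetch_profile` is a hypothetical stand-in for a downstream call, and `with_fault_injection` and `run_experiment` are made-up helper names.

```python
import random


class InjectedFault(Exception):
    """Raised by the wrapper to simulate a dependency failure."""


def with_fault_injection(func, failure_rate, rng=None):
    """Wrap func so that a fraction of calls raise InjectedFault instead of running."""
    rng = rng or random.Random()

    def wrapper(*args, **kwargs):
        # Fail the call with probability failure_rate; otherwise run it normally.
        if rng.random() < failure_rate:
            raise InjectedFault(f"injected failure in {func.__name__}")
        return func(*args, **kwargs)

    return wrapper


def fetch_profile(user_id):
    # Hypothetical stand-in for a call to a downstream service.
    return {"user_id": user_id, "name": "example"}


def run_experiment(calls=1000, failure_rate=0.2, seed=42):
    """Count how many calls succeed and how many fail under injected faults."""
    flaky = with_fault_injection(fetch_profile, failure_rate, random.Random(seed))
    results = {"ok": 0, "failed": 0}
    for i in range(calls):
        try:
            flaky(i)
            results["ok"] += 1
        except InjectedFault:
            results["failed"] += 1
    return results
```

In a real experiment, the interesting part is what the calling code does when the injected failure arrives: whether alarms fire, retries behave and operators know how to respond.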
Additional AWS insights
Over the course of re:Invent, SearchCloudComputing interviewed several AWS leaders to talk about the news from the show. The following are insights on a spectrum of topics from Dave Brown, vice president, Amazon EC2, and Deepak Singh, vice president, compute services.
Amazon Elastic Container Service (ECS) Anywhere and Amazon Elastic Kubernetes Service (EKS) Anywhere extend AWS' managed container services on premises, to edge locations and, in theory, to other public clouds. And despite the bundled announcement, the two services are really targeting different audiences.
ECS Anywhere provides the same experience regardless of location, so it's mostly about adding capacity at locations beyond AWS' data centers. On the other hand, EKS Anywhere is for IT teams that prefer the Kubernetes control plane.
There's no shortage of ways to run Kubernetes on premises, so the goal here isn't to compete in that space, Singh said. Instead, the service is for IT teams that like the EKS operational model and want a way to set up their on-premises clusters to make it easier to migrate to AWS over time.
Amazon ECS and Fargate
AWS has evolved its roadmap for Amazon ECS to focus more on simplicity and the developer experience, Singh said. The changing focus is in response to customer usage patterns. And while Kubernetes gets much of the spotlight in the container world, Amazon ECS is apparently still going strong, especially with Fargate, which runs serverless containers on the platform.
"ECS has well over 100,000 customers and about half of every new container customer we have on AWS in 2020 starts on Fargate -- the vast majority of them on ECS," Singh said.
AWS added more Graviton2 instances at re:Invent. These instances rely on AWS' custom Arm-based processors. It also announced plans to add a custom machine learning processor called AWS Trainium, on top of its existing Inferentia machine learning chip.
When asked if he could see a time when it would be feasible for AWS to do most of its chips in-house, Brown said, "It's difficult to say," but that ultimately it comes down to price/performance.
Brown reiterated that Intel and AMD processors remain critical for many customers -- most instances still run on Intel -- and that AWS will need those two chip manufacturers to continue to innovate. However, he also acknowledged the advancements in Arm chips and how they have affected AWS' internal efforts on Graviton2-based instances.
"I don't know what the mix would look like longer-term, but 40% price/performance on Graviton2 is pretty enticing for our customers."
Amazon EC2 Mac instances for macOS were another announcement that generated a lot of buzz, but what might have flown under the radar is the engineering effort required to make them work.
These instances rely on Mac minis installed inside AWS data centers. The Mac mini fits perfectly in a 1U server sled, Brown said. From there, the Nitro card can emulate peripheral devices, making the Mac mini think a hard drive is plugged in via the Thunderbolt connection. But that didn't solve everything.
"The one thing we couldn't do was push the power button," Brown said.
To handle that, engineers added a solenoid with a small motor. They can send it an API call to press the button and turn on the Mac mini.
Also, the terms and conditions for Big Sur apparently had to be changed to accommodate the service, with new language added around leasing, Brown said.
Amazon Kinesis suffered a major outage in the Northern Virginia Region in the week leading up to re:Invent. And because of dependencies, it also impacted Amazon Cognito, CloudWatch and, to a lesser extent, AWS Lambda.
You can read the full summary of the incident, but Brown was forthright when asked about it.
"That's a tough learning [experience] and unfortunately we had to learn it the hard way, and now we have to make sure we don't have that problem anywhere else," he said.
Kinesis failed on a limit within the system, so AWS has audited the rest of its services to make sure they won't fail on a thread limit or any other kernel limit. Engineers will also take a closer look at service dependencies to ensure they degrade properly when systems fail.
The incident also gives some insight into the scale and complexity of AWS operations, since the system hit the 32,000-thread limit on a single machine -- something Brown said was unheard of before this event.
"We run at a scale that is just abnormal, and we do very well at doing that normally," Brown said. "One of the things I said is I've never seen a single service use that many threads on a single host before, and that was a learning experience for us."
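The two remediation themes here, staying clear of thread limits and degrading properly when a dependency fails, can be illustrated with a short Python sketch. This is a generic pattern, not AWS' actual fix. The `fetch_metrics` function, the host names and the `MAX_WORKERS` value are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 8  # hypothetical cap, chosen to stay far below any OS thread limit


def fetch_metrics(host):
    # Hypothetical dependency call; one host fails to simulate an outage.
    if host == "host-3":
        raise ConnectionError("dependency unavailable")
    return {"host": host, "status": "ok"}


def fetch_or_degrade(host):
    """Return live data if the dependency answers, else a degraded placeholder."""
    try:
        return fetch_metrics(host)
    except ConnectionError:
        return {"host": host, "status": "degraded"}


def collect(hosts):
    # A fixed-size pool keeps the thread count bounded however many hosts there are.
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        return list(pool.map(fetch_or_degrade, hosts))
```

Calling `collect` with 20 host names uses at most eight worker threads, and the failing host yields a `degraded` entry instead of an exception that would abort the whole batch.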
Here's a quick recap on the product news from week three. As a reminder, many of these updates are in preview and may not be immediately available.
- The AWS IoT suite of services incorporated custom metrics as well as alarms to monitor the health of remote devices, while AWS IoT Device Defender analyzes past device-level data to detect operational or security anomalies. In addition, Fleet Hub, a new feature in AWS IoT Device Management, can be used to build web apps to track and take remote actions on collections of devices.
- AWS IoT Device Defender ML Detect analyzes your device fleets for security anomalies.
- AWS IoT EduKit is intended to help IoT novices get started. It includes a reference hardware kit, guides and code examples.
- AWS IoT SiteWise Edge can be installed on local hardware or AWS on-premises devices to collect and analyze data. This is targeted at industrial settings with the goal of doing more processing and analysis on site rather than having to send it all back to AWS' data centers.
- SiteWise also added support for Grafana and Modbus TCP and EtherNet/IP protocols.
- AWS IoT Analytics improved error handling and added support for the Apache Parquet format.
- AWS IoT Core Device Advisor is a testing feature to validate IoT device connections to AWS IoT Core.
- AWS IoT Greengrass 2.0 includes an open source edge runtime, as well as new ways to develop local software and manage device fleets.
- AWS CloudShell is a browser-based shell for command-line access to Amazon Linux 2 environments. It comes with the AWS CLI as well as Bash, zsh and PowerShell.
- Amazon Location Service adds location data to applications, including maps, geocoding, points of interest, geofences and application tracking. Location Service is a bit of catch-up for AWS, since competitors Microsoft and Google have had similar mapping capabilities for some time.
- AWS Lambda now supports self-managed Apache Kafka clusters as an event source, as well as capabilities intended to simplify the use of streaming data in Amazon Kinesis and Amazon DynamoDB Streams.
- AWS Systems Manager Change Manager uses predefined workflows to automate approval steps in operational changes and notify required approvers across an organization. It also sends notifications to help avoid changes that might conflict with timeframes important to the business, and it integrates with Amazon CloudWatch alarms for automatic rollbacks.
- AWS Systems Manager has been designed to provide visibility into an organization's infrastructure, but the latest addition brings the tool to the application level. With AWS Systems Manager Application Manager, users have a consolidated view of metrics and logs from consoles, tools and sources that span a portfolio of applications.
- Amazon Managed Service for Grafana pulls in data from multiple sources so users can query, correlate and visualize that information to improve operations. It was built in conjunction with Grafana Labs.
- Amazon Managed Service for Prometheus is a tool for monitoring containerized applications at scale, based on the Cloud Native Computing Foundation's Prometheus project.
- AWS Single Sign-On now synchronizes with Microsoft Active Directory for group and user information consistency with AWS accounts and applications.
- AWS added a Wavelength Zone in Tokyo -- the first one outside the U.S. These scaled-back infrastructure zones are designed for ultra-low latency applications that run on 5G devices in urban settings.
- The AWS Well-Architected Tool now has APIs that extend the best practices analysis to third parties that integrate with the platform.
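For the Lambda item above on self-managed Apache Kafka, a handler might look roughly like the Python sketch below. It assumes the event carries a `records` mapping keyed by topic and partition, with each record's `value` base64-encoded. Treat that shape as an assumption to check against the current Lambda documentation. The topic name and payload in the usage note are made up.

```python
import base64


def handler(event, context):
    """Decode each Kafka record value in the batch and return the decoded messages."""
    messages = []
    # Assumed shape: event["records"] maps "topic-partition" keys to lists of records.
    for _topic_partition, records in event.get("records", {}).items():
        for record in records:
            # Record values are assumed to arrive base64-encoded.
            payload = base64.b64decode(record["value"]).decode("utf-8")
            messages.append(
                {
                    "topic": record["topic"],
                    "offset": record["offset"],
                    "payload": payload,
                }
            )
    return messages
```

Passing a hand-built event with one record on a hypothetical `orders` topic whose `value` is `"aGVsbG8="` returns a single message with the payload `hello`.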