Site reliability engineering is a DevOps practice wherein IT professionals codify and automate infrastructure and application maintenance and support. SRE is primarily concerned with service reliability, and commonly uses service-level agreements to set and meet expectations around metrics such as uptime.
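To make the uptime metric concrete, here is a minimal Python sketch of the arithmetic behind an uptime target: how many minutes of downtime a given percentage leaves room for. The function name, the 30-day period and the sample targets are illustrative assumptions, not figures from Wizeline or from any particular agreement.

```python
def allowed_downtime_minutes(uptime_target_percent: float, period_days: int = 30) -> float:
    """Return the minutes of downtime an uptime target permits over a period.

    Both the function name and the 30-day default are illustrative assumptions.
    """
    if not 0 <= uptime_target_percent <= 100:
        raise ValueError("uptime target must be between 0 and 100 percent")
    total_minutes = period_days * 24 * 60
    # The share of the period that may be spent down is (100% - target).
    return total_minutes * (1 - uptime_target_percent / 100)


if __name__ == "__main__":
    # Hypothetical targets, chosen only to show how quickly the allowance shrinks.
    for target in (99.0, 99.9, 99.99):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% uptime over 30 days allows about {minutes:.1f} minutes of downtime")
```

Under these assumptions, each extra "nine" in the target cuts the allowable downtime by a factor of 10, which is part of why reliability targets drive so much automation work.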
What does a site reliability engineer do?
Where DevOps organizations foster closer relationships between development and operations teams -- as well as other key stakeholders, such as networking, security and testing teams -- the site reliability engineering team acts as a leader of communication and, at times, even a project ambassador.
Saul Ortigoza, senior site reliability engineer at Wizeline, a global product development company based in San Francisco, wears an array of technical and cultural hats on any given day on the job. "My main goal is to make [customers] and engineers successful, and also to help scale the discipline and the DevOps practice," Ortigoza explained.
Ortigoza started out with a degree in mechatronics engineering, for which he worked on industrial automation, he said, with some low-level coding and low-level electrical engineering thrown into the mix. His career started in the automotive industry, testing embedded systems, a role that laid the foundation for a future SRE position. "I was using most of my time to improve processes and create tools to make the everyday work easier," he said. This led him to start using tools like Jenkins, an open source automation server, and GitLab, a Git-based DevOps lifecycle tool, to build a framework for the testing environment.
What is an SRE role like day to day?
When asked to describe a normal day in his role, Ortigoza began his list with meetings. As the senior SRE at Wizeline, he acts as a go-between for the different stakeholders -- customers, IT team leaders and business leaders -- to ensure that projects develop and evolve smoothly.
Each group of the SRE's constituents has different needs and goals, which must all align at the end of the day: Customers have requirements for their deliverables, and those requirements inform how Ortigoza requests projects of his developers -- and those developers have their own needs for success.
Like most DevOps practitioners, Ortigoza points out that DevOps isn't a job role, or something that a person performs. "It's more of a culture … that everybody has to [do] their part, everybody has to contribute. And the most important thing of this culture is ownership … everybody is responsible for whatever thing we were doing," he said.
A culture of ownership isn't a culture of assigning blame for failures or malfunctions. Rather, it's one of independence -- and collective responsibility. This encourages self-sufficiency and higher-quality work -- because if engineers break something, they also have to fix it.
"One of the most common [misconceptions] is that a site reliability engineer … does all of the things related to CI/CD, infrastructure and automation," Ortigoza said and laughed. And while the SRE definitely does touch on all of those things, the SRE's role is to provide guidance and support to constituent IT teams -- to enable engineers to do things for themselves.
Indeed, the SRE team is responsible for finished projects, which requires close supervision and regular communication -- but that doesn't mean wresting control from the engineers tasked with given projects. Instead, this supervision yields more opportunities for support and assistance: Each IT professional brings in unique knowledge and perspective, all of which is invaluable to the successful development of a project.
But Ortigoza doesn't just enable those around him to get their work done. He has his own project at Wizeline, with a large tech company he declined to name. For this project, he acts primarily as an infrastructure architect, and leads the charge to build a custom platform. And although the meetings and communication skills are key, Ortigoza said architecting is the most important skill for his SRE role.
"SREs are in early. They know a little bit of everything," he said, including coding, testing, security and project management -- and a lot of infrastructure and automation.
DevOps and IaC
"Ninety percent of the work [at Wizeline] is with external customers. And [infrastructure as code] is standard practice for the DevOps and reliability projects," Ortigoza explained. And just as IaC becomes standard practice in DevOps organizations, Terraform is swiftly becoming an essential part of IaC toolkits; Ortigoza estimates that 80% of IaC projects he sees rely on Terraform -- along with an array of other tools that integrate with it. CloudFormation is another popular option, as are various serverless frameworks. "Terraform is very good for, maybe, alias resources, but we are also using Kubernetes; we will also be using something like Helm, or tools built around Helm, to also implement infrastructure-as-code patterns," Ortigoza said.
Ortigoza also recommends taking IaC a step -- or a leap -- further and adopting an everything-as-code approach to the entire IT ecosystem, which he says includes things like policy, and even documentation. When documentation is recorded and stored in a variety of places, such as a configuration management database and a Google doc -- or, worse, locally on someone's computer -- it's very easy to lose track of information. It's also easy for documentation to branch into divergent versions, with different people updating different information at different times. All of that information should be codified and stored as close to its subject matter as possible; this makes it easier to update documentation when the code itself is updated, because there's no need to hunt it down.
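One way to picture documentation living next to its subject matter is a small check that flags infrastructure directories with no accompanying document. The Python sketch below is a hypothetical illustration, not a tool Ortigoza described: the function name, the README.md convention and the choice to look for Terraform's .tf files are all assumptions.

```python
from pathlib import Path


def modules_missing_docs(repo_root, doc_name="README.md", marker_suffix=".tf"):
    """Return directories that hold infrastructure files but no doc file.

    doc_name and marker_suffix are assumed conventions; adjust them to
    whatever a given repository actually uses.
    """
    root = Path(repo_root)
    # Every directory that contains at least one file with the marker suffix.
    module_dirs = sorted({path.parent for path in root.rglob(f"*{marker_suffix}")})
    # Keep only the directories where the documentation file is absent.
    return [d for d in module_dirs if not (d / doc_name).is_file()]


if __name__ == "__main__":
    # Scan the current working directory and report undocumented modules.
    for directory in modules_missing_docs("."):
        print(f"No README.md next to the code in: {directory}")
```

Run in a CI job, a check along these lines could fail a build when a new module arrives without its documentation, so the two are updated in the same change.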
Advice for DevOps newbies
"One of the main things that DevOps tries to solve is … cross-functional teams," Ortigoza said. While cross-functional teams are the ideal for DevOps, it's often difficult to arrange a tight integration of those crossovers -- and many organizations end up implementing DevOps silos. Even a DevOps silo is still a silo, and ultimately it harms the IT organization more than it helps. "Not having cross-functional teams will slow you down a lot because there will be many barriers, and … passing things around from one team to another team. This is very bad for feedback speed and for the communication of getting the things done."
Ortigoza's main advice is to view things holistically. "I always tell my engineers … that software is not code," he said. Software is, instead, a lot of moving parts supplied by many different teams, such as testing, documentation and security, and all of those pieces are as important as the developers' written code.
DevOps is supposed to ease this process and interaction, he said, because its goals include that cross-functional team, and senior leadership will consist of experts who will invest in those areas. Ortigoza's advice: Listen to those experts. But remain open to dialogue; in a DevOps organization, every IT professional must approach their work with the mindset of ownership and strive for self-sufficiency where possible. The SRE is there to guide you, not to do your work for you.
Finally: "Do not try to solve people problems with technology," Ortigoza said.
There are countless apps and tools purported to solve all your problems, but sometimes the introduction of a new tool can exacerbate a people problem, rather than solve it. Instead of reaching for the shelf of available tools, evaluate your organization and its culture, and find the source of the issue you want to solve. A shiny new application won't fix that for you.