site reliability engineer

Site reliability engineer (SRE) is a job title for a specialist who works with software developers to ensure that an organization's computing systems are scalable, stable and predictable. The position calls for someone who is comfortable with both software engineering and IT operations. The term SRE was coined at Google around 2003, when the company hired Ben Treynor Sloss to lead a team of software engineers in running a production environment. The company needed new paradigms for managing its large systems while continuously introducing new features and maintaining a high-quality end-user experience (UX).

Although site reliability engineering duties were historically handled by operations teams, today’s SREs use software, in some cases including machine learning (ML), to automate tasks previously performed manually. SREs are generally responsible for selecting infrastructure tools, managing production changes and determining emergency responses. They typically devote up to 50 percent of their time to operations responsibilities (including issues, on-call duty and manual intervention) and the rest of their time to coding and automation tasks. However, these percentages and SRE duties may vary, depending on the organization's business model and culture.
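The time split described above can be checked with simple arithmetic. The following is a minimal sketch, assuming a team records weekly hours in two buckets; the `WeekLog` record, the function names and the sample numbers are hypothetical, and the 0.50 cap is only the "up to 50 percent" figure from the text, which an organization may set differently.

```python
from dataclasses import dataclass

# The "up to 50 percent" figure from the text; an assumption, not a fixed rule.
OPS_CAP = 0.50


@dataclass
class WeekLog:
    """Hours one engineer logged in a week (hypothetical record format)."""
    ops_hours: float          # issues, on-call duty, manual intervention
    engineering_hours: float  # coding and automation


def ops_share(log: WeekLog) -> float:
    """Return the fraction of logged time spent on operations work."""
    total = log.ops_hours + log.engineering_hours
    if total <= 0:
        raise ValueError("no hours logged")
    return log.ops_hours / total


def over_cap(log: WeekLog, cap: float = OPS_CAP) -> bool:
    """Return True when operations work exceeds the cap."""
    return ops_share(log) > cap


if __name__ == "__main__":
    week = WeekLog(ops_hours=26, engineering_hours=14)
    print(f"operations share: {ops_share(week):.0%}, over cap: {over_cap(week)}")
```

A week that lands exactly on the cap is not flagged here, because the text describes 50 percent as an upper bound rather than a violation.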

Site reliability engineer skills

The job of the SRE has evolved beyond that of the system administrator (sysadmin). Typical requirements for SREs include a bachelor’s degree in computer science or a related field, as well as production-level experience in at least one compiled, high-level programming language (such as Java, C/C++ or Go) and at least one dynamic language (such as Ruby, Python or JavaScript on Node.js). Other requirements may include advanced experience in one or more of the following: networking, Linux/Unix administration, systems programming, distributed systems, databases or cloud engineering. Employers also look for SRE team members who have experience in data-driven analysis and infrastructure as code (IaC), as well as server clusters, load balancing and monitoring. Experience with at least one major cloud provider and one container technology is also desirable. Soft skills, such as strong communication, are a plus.

Site reliability engineering vs. DevOps

Site reliability engineering and DevOps have similar goals: keeping a diversely skilled team involved in software development, from design through operation; automating repetitive tasks; and using engineering tools in operations. They differ in scope: DevOps applies to positions both within and outside IT, whereas SRE focuses on supporting IT operations during software development and deployment in production. Additionally, business leaders are usually involved in DevOps but are rarely involved in SRE.

This was last updated in September 2020
