Site Reliability Engineering (SRE) is neither a new concept nor a new term or role within the IT domain; it was born at Google in 2003, within its engineering teams. However, its significance has become critical in the age of DevOps and continuous integration and continuous delivery (CI/CD).
Wikipedia defines Site Reliability Engineering as follows:
“Site Reliability Engineering (SRE) is a discipline that incorporates aspects of software engineering and applies them to infrastructure and operations problems. The main goals are to create scalable and highly reliable software systems.”
However, Ben Treynor, VP of Engineering at Google and founder of Google SRE, goes a step further:
“SRE is fundamentally doing work that has historically been done by an operations team, but using engineers with software expertise, and banking on the fact that these engineers are inherently both predisposed to, and have the ability to, substitute automation for human labour. In general, an SRE team is responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning.”
In other words, as software teams are pushed to cut code delivery from weeks to days, or even hours, there is simply no time in the process to configure code and environments manually, let alone to introduce human errors into the process on top of code bugs.
Add the need to manage the appropriate level of risk inherent in:
- Compliance and security,
All of this is possible, and it is not only a cost-reducing business case but also a systemic imperative, given the nature of online, consumer-facing, and customer-facing systems.
At Mammoth-AI, our engineers are skilled in SRE processes, and they are also experts in bringing automation and orchestration to operations, testing, and infrastructure.