Hi Consultant, we are currently hiring for an SRE role with one of our top clients. This position offers great work flexibility. If your skills align with the role, we'd love to hear from you. Apply today!
Title: Site Reliability Engineer (SRE)
Location: London
Job Type: Hybrid (2 days in-office)
...
Duration: 6-12 Months (extendable)
Role Overview:
As a Site Reliability Engineer (SRE), you will be responsible for improving the non-functional and operational characteristics of products and services, including availability, performance, efficiency, security, incident response, and capacity planning. You will play a key role in the adoption of SRE practices, collaborating with engineers and stakeholders to ensure a secure, efficient, and reliable change process.
Key Responsibilities:
- Lead the adoption of SRE practices and collaborate with teams to meet defined service level objectives (SLOs).
- Track and reduce toil, define service level indicators (SLIs) and SLOs, and manage error budgets.
- Improve release processes and implement automation to enhance reliability and velocity.
- Provide coaching and guidance to colleagues and teams, and lead where necessary.
- Balance risks and manage the day-to-day health of production and non-production environments.
- Respond to incidents and support internal and external customers with streamlined solutions.
- Contribute to Site Reliability Operations: production support, incident response, observability, security, and performance.
- Coordinate major incidents in a complex, multi-party environment.
- Implement proactive monitoring and alerting to ensure system availability and performance.
- Provide expertise on risk tolerance, performance, and system health.
- Communicate clearly with teams, customers, and stakeholders during incidents.
Skills and Experience:
- Strong knowledge of reliability systems and software engineering.
- Data-driven approach to problem solving and fact-finding.
- Experience establishing SRE functions in a 24/7 support environment.
- Coding experience in building, testing, scanning, and deploying .NET and JavaScript applications.
- Hands-on experience with cloud platforms (IaC, policies, DevOps tools, Kubernetes, monitoring).
- Expertise in DevOps, security, IT service management, and full-stack observability.
- Experience with Kubernetes, Prometheus, and GitOps practices.
- Familiarity with chaos engineering and "shift-right" approaches.
- Knowledge of automation via orchestration tools, cloud-native microservices, and API management.
- Effective communication and presentation skills.
Benefits:
- Collaborative and innovative team environment.
- Opportunities for professional development and growth.
- Chance to make a significant impact on the reliability and performance of products and services.
Randstad Technologies is acting as an Employment Business in relation to this vacancy.