Microsoft has been a leading company in computing for decades. We are a global company, relied on by companies, governments, utilities, stores, schools, universities and co-operatives to deliver the things they need to work, every day.
To make this work, we need to make it reliable. To make it reliable, we need you -- someone who already is, or is interested in becoming, a Site Reliability Engineer (also known as SRE). SREs are people who take engineering-based approaches to solving operations problems; we like infrastructure, we like seeing how the big complicated thing works, and most importantly, we gain great satisfaction from making it better.
Site Reliability Engineers build, monitor, and maintain the systems and infrastructure that ensure our customers can quickly access their data and run workloads whenever they need to. We identify service problems and areas for improvement, and we help implement solutions. Our work is key to the success of many of the Microsoft services you've heard of, and a number you haven't. There are very few bits of Microsoft which aren't touched by SREs in some way or other.
We value the input of people who aren't afraid to keep learning, and we celebrate mistakes as long as they show the way forward. Our backgrounds are diverse, from self-taught to Computer Science and Biology, and we strongly believe that diverse experiences and backgrounds, together with an environment where everyone can feel safe to contribute their own insights in a data-driven, objective, but supportive way, are the key to making the best workplace possible.
The Challenge Ahead
The latest chapter in our evolving story is our cloud -- Microsoft Azure -- which helps millions of people around the world get things done every day. Due to our rapid growth, we need to build an SRE community creating the next generation of infrastructure to operate one of the largest clouds in the world.
The scale of the challenge is enormous: we'll need people who enjoy analyzing complicated problems, coming up with creative solutions, working in focused teams to build things no one has thought of before, and doggedly chasing problems down to a single line of code, all in the service of production reliability. You'll do this across hundreds of services, hundreds of thousands of servers, and trillions of events that our monitoring systems generate every day.
On a day-to-day basis, Site Reliability Engineers build, monitor, and maintain the systems and infrastructure that ensure our customers can quickly access their data and run workloads whenever they need to. We identify service problems and areas for improvement and help implement solutions. Our work is key to the success of all Microsoft Azure cloud services.
Summary of responsibilities:
- Improve the operability of services by building a picture of what is going wrong, extending our auto-remediation system as a first step, and, where possible, refactoring systems so the problems do not occur in the first place
- Follow the chain of a problem through the entire stack, and fix the underlying causes, not just one symptom
- Participate in the incident management lifecycle, including escalation, communication, debugging, resolution, and problem management, with on-call duty on a rotational basis
- Build intelligent systems that digest and analyze massive amounts of telemetry to provide health alerts and to automatically troubleshoot and mitigate problems without human intervention
- Manage availability, latency, scalability and efficiency of services by engineering reliability into software and systems
We would like to talk with you if you:
- Are interested in distributed systems and working with high scale services
- Enjoy working in a fast-moving environment and are not afraid to change things to make them better
- Enjoy new technological challenges and fixing the problems that result
- Believe that a team working well together is truly smarter than the single smartest person on that team
- Aspire to grow as a person, as a teammate, and as an engineer
The ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include but are not limited to the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.
Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.