As a Senior Site Reliability Engineer, you will champion all things pertaining to reliability at Customer Identity Cloud (formerly Auth0). Working closely with the product engineers, quality engineers, platform engineers and architecture teams, your primary focus will be on ensuring production systems remain operational at all times, while continually setting and achieving long-term performance, reliability and scalability goals in a platform with an exponential growth plan for the coming years.
With Customer Identity Cloud's (formerly Auth0’s) increased dedication to ensuring customer availability expectations are exceeded in every way, you will play a key role as we evolve our system architecture to meet the demands of enormous growth and support the hundreds of millions of users who rely on us to provide uninterrupted access to business-critical enterprise and consumer applications.
- Exceptional communication skills, including technical writing in the English language
- Systematic problem-solving approach, coupled with a strong sense of ownership and drive
- Understanding of microservices, cloud infrastructure (AWS, Azure, GCP), databases (SQL, NoSQL, key/value), containers (Docker, Kubernetes), web technologies (WebSockets, HTTP) and networking (SSL/TLS, routing, VPN)
- Live and breathe SLIs, SLOs, error budgets and SLAs
- Strong belief in automating everything and reducing toil for yourself and teammates
- Fast learner who is not afraid to tackle multiple challenges at once
- Comfortable with the Agile software development methodology
- Enjoys working as part of a team, and able to work effectively in a remote environment where tasks may be self-driven
- Work with other teams to run, own and improve incident response processes
- Participate in regular on-call rotations to ensure 24/7 coverage of all critical systems
- Use existing monitoring tools to identify problems and resolve and/or escalate to service teams
- Implement changes to enable or improve infrastructure resilience, monitoring, and alerting
- 2+ years as a Site Reliability Engineer or in a Cloud Operations/DevOps role
- 1+ years using Go, shell scripting and Terraform
- 2+ years as a software developer in a SaaS environment
- 3+ years in a production environment supporting large-scale, mission-critical applications
Okta is an Equal Opportunity Employer.
Okta is rethinking the traditional work environment, providing our employees with the flexibility to be the most creative and successful versions of themselves, no matter where they are located. We enable a flexible approach to work, meaning that for roles where it makes sense, you can work from the office or from home, regardless of where you live. Okta invests in the best technologies and provides flexible benefits and collaborative work environments/experiences, empowering employees to work productively in a setting that best and uniquely suits their needs. Find your place at Okta: https://www.okta.com/company/careers/.
By submitting an application, you agree to the retention of your personal data for consideration for a future position at Okta. More details about Okta’s privacy practices can be found at: https://www.okta.com/privacy-policy.