Site Reliability Engineer, Microservices Senior/Staff

We are looking for an experienced Site Reliability Engineer to join our Technical Operations team. At Okta, we are "Always On." That commitment starts with this team, which ensures that customers never have to worry about the Okta service. The team strives to build the most reliable and performant systems on the planet.

As a member of the Microservices team at Okta, you’ll be pushing the limits of Docker and container-based infrastructure to build and manage complex, highly available services.  

What You'll Do:  

Be a collaborative member of a team that is responsible for Okta's production infrastructure, with a focus on scaling our impact and lowering our operational overhead.

  • Promote and apply best practices for building scalable and reliable tooling across engineering.
  • Design, build, run, and monitor Okta's production infrastructure.
  • Migrate and run workloads on container orchestration architectures and tooling, such as AWS ECS.
  • Drive initiatives to evolve our current platform, increasing efficiency and keeping it in line with current standards and best practices.
  • Respond to production incidents and determine how to prevent them in the future.
  • Identify and automate manual processes.
  • Support a 24x7 online environment as part of an on-call rotation.
  • Develop and maintain technical documentation, runbooks, and procedures.

Qualifications for the role:

  • 3+ years of experience managing large-scale AWS deployments.
  • Experience with Federal and DoD compliance requirements (FedRAMP, DoD Impact Levels) preferred.
  • Experience supporting Docker containers and web applications running on Java / Apache / Tomcat in a highly available production environment.   
  • Experience with service discovery and load balancing Docker containers running in AWS ECS and / or Kubernetes clusters.
  • Familiarity with messaging services such as SMS and email.
  • Background with Linux systems administration and strong scripting skills in Bash, Ruby, Python, Go, etc.
  • Previous experience with tooling for automating systems and infrastructure via Ansible, Chef or Terraform.
  • Excellent written and oral communication skills, with the ability to influence others.


Education and Training:

  • BS in Computer Science (a plus) or relevant experience

*This position requires the ability to access Impact Level 4 (IL4) data, as defined by the Department of Defense (DoD) Cloud Computing Security Requirements Guide. As a condition of employment for this position, the successful candidate must be able to submit documentation establishing U.S. Person status (e.g., a U.S. Citizen, National, Lawful Permanent Resident, Refugee, or Asylee; 22 CFR 120.15) upon hire.


Okta is an Equal Opportunity Employer.

Okta is rethinking the traditional work environment, providing our employees with the flexibility to be their most creative and successful versions of themselves, no matter where they are located.  We enable a flexible approach to work, meaning for roles where it makes sense, you can work from the office, or from home, regardless of where you live.  Okta invests in the best technologies and provides flexible benefits and collaborative work environments/experiences, empowering employees to work productively in a setting that best and uniquely suits their needs.  Find your place at Okta 

By submitting an application, you agree to the retention of your personal data for consideration for a future position at Okta.  More details about Okta’s privacy practices can be found at:

