Principal Reliability Engineer, AWS

We are looking for an experienced Principal Reliability Engineer to join our Business Technology team to maintain and continually improve our cloud-based applications. The Reliability Engineering team builds foundational back-end infrastructure services and tooling that allow our Business Technology teams to release, scale, and automate their software reliably and predictably. Reliability Engineers are team players who embed themselves within Business Technology to advance systems architecture and performance.

We are looking for a smart, innovative, and passionate engineer for this role, someone who loves designing complex cloud-based server infrastructure. The ideal candidate welcomes the challenge and enjoys seeing their designs run at scale through automation, testing, and tuning. If you live by the ethic "If you have to do something more than once, automate it," we want to hear from you!

What You'll Do:

  • Work with a team that designs and engineers Okta’s IT infrastructure 
  • Promote and apply best practices for building scalable and reliable cloud infrastructure
  • Serve as a subject matter expert on Amazon Web Services (AWS) and cloud services, partnering closely with our team
  • Develop and maintain technical documentation, network diagrams, runbooks, and procedures
  • Design, build, run, and monitor Okta's IT infrastructure and cloud services
  • Drive initiatives to evolve our current cloud platforms, increasing efficiency and keeping them in line with current standards and best practices
  • Recommend, develop, implement, and manage appropriate updates to policies, standards, processes, and procedures
  • Identify and automate manual processes
  • Develop backup and recovery strategies and contribute to business continuity planning

Qualifications for the role


  • 7+ years of experience architecting and running complex AWS or other cloud networking infrastructure, including VPCs and VPC peering, compute and load balancing, identity and access management, key management, VPN, NAT, firewalls, and DNS
  • Demonstrated ability to operate complex cloud infrastructure at scale and deliver projects on schedule
  • Experience with monitoring tools and the knowledge to develop a comprehensive observability strategy
  • Excellent written and oral communication skills, with the ability to influence others
  • Understanding of Atlassian on-premises technologies (Confluence/Jira)


  • 10+ years of experience as a Cloud Services Manager
  • In-depth knowledge of network design, firewalls, and cloud-based static and dynamic routing, including packet capture analysis, anycast/unicast, load balancers, VPN, and session management
  • Exposure to FedRAMP, SOC 2, FIPS, or other compliance programs
  • 5+ years of experience with SQL relational databases such as PostgreSQL or MySQL
  • 5+ years of experience with automating systems and infrastructure via Ansible, Chef, or Terraform 
  • Knowledge and experience with Agile principles, practices, and tooling 
  • Knowledge of and experience with reliability engineering concepts and security best practices in hybrid cloud environments
  • Experience developing tooling and automation in Bash, Ruby, Python, Go, or similar languages, using Git for version control
  • Strong Linux system administration skills

Education and Training

  • B.S. in Computer Science (a plus) or relevant work experience

Okta is an Equal Opportunity Employer.