At Okta, some of the common catchphrases are "Always On" and "No Mysteries", and nowhere do we embrace them more than in Site/Data Reliability Operations. We are looking for a Database Reliability Engineer who has several years of experience working in large-scale environments with zero downtime. The job is to deliver an extremely reliable, performant, and secure database infrastructure through the skillful use of automation and, when something inevitably breaks, to hunt down the root cause and fix it.
If you like to be challenged and have a passion for solving problems at scale with automation, testing, and tuning, then we would love to hear from you. The ideal candidate is someone who exemplifies the ethic of "If you have to do something more than once, automate it" and who can rapidly self-educate on new concepts and tools.
Job Duties and Responsibilities:
As a Database Reliability Engineer, you will have ownership of all technical aspects of our data services tier. Reporting to the Manager of Site Reliability Engineering, you will partner with our core product engineers, performance engineers, site reliability engineers, and growing DBRE team to scale, secure, and tune our MySQL clusters. Additionally, you will play a key role as we evolve our architecture to meet the demands of Okta's enormous growth and the hundreds of millions of users who rely on us to provide uninterrupted access to business-critical enterprise and consumer applications.
- Ensure effective performance and 24x7 availability of the production database systems
- Design, automate and document operational processes, tasks, and configuration management
- Lead efforts on performance tuning, scaling, and benchmarking the data services infrastructure
- Work closely with performance engineers and core product engineers on a myriad of topics
- Contribute to automation, such as configuration automation using Chef, launching infrastructure using Terraform and in-house tooling, and automating any other repetitive tasks
- Track resource usage trends and take preventive action to maintain full health
- Monitor alerts related to security and database operations, and take preventive or corrective action to resolve issues
- Participate in on-call rotation and occasional off-hour activities
Minimum Required Knowledge, Skills, Abilities, and Qualities:
- 5+ years of experience managing MySQL / Percona Server 5.7 / 8 and Aurora at scale
- 2+ years of experience using AWS/GCP or any other cloud provider
- 1+ years of experience with managing Vitess in production
- 2+ years of experience with automating systems and infrastructure using Terraform
- Proficient using and developing Chef cookbooks and recipes to manage configuration
- Proficient in a Linux environment including Linux internals and tuning
- Experience as a first responder for the data tier on a high-traffic site
- Experience working in AWS (EC2 / EBS / S3 Snapshots / Aurora / RDS)
- Security-conscious, self-motivated, accountable, collaborative, reliable, and a team player
- Proficiency in automating administrative tasks using Ruby, Python, Shell, Ansible, or Go
- Tech blogging or contributions to open source projects are a plus
Okta is an Equal Opportunity Employer