Summary
As a site reliability engineer supporting our container platforms, you’ll work with the internal Booz Allen core team to build more robust systems on resilient infrastructure. You’ll build in redundancy, implement monitoring tools, and automate wherever possible, reducing toil by scripting routine tasks and automating self-repair. This is your chance to apply your Kubernetes expertise in cloud environments while assisting junior engineers and broadening your knowledge base.
Qualifications
- 3+ years of experience in Site Reliability Engineering (SRE), DevOps, or Platform Engineering roles
- Experience running Kubernetes in production, including clusters, workloads, networking, and troubleshooting
- Experience with two or more Kubernetes-based platforms, including AKS, EKS, or OpenShift
- Experience with cloud platforms, including AWS or Azure, and core services, including IAM, networking, logging, or compute
- Experience with scripting or automation, including Python, Bash, or Go, on Linux platforms
- Experience with Infrastructure as Code tools, such as CloudFormation
- Experience with observability tools, such as Dynatrace, Prometheus, or Grafana
- Knowledge of incident management, RCA, and reliability engineering principles