Site Reliability Engineer

Experienced

STATS | Product & Technology | Chicago, IL

Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. The SRE team is responsible for the availability, performance, monitoring, and incident response of STATS’ business-critical internal systems and our customer-facing systems.

What You'll Do:

  • Engage in and improve the whole life cycle of services, from inception and design through deployment, operation and refinement
  • Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews
  • Maintain services once they are live by measuring and monitoring availability, latency and overall system health
  • Scale systems through sustainable mechanisms such as automation, and evolve systems by pushing for changes that improve reliability and velocity
  • Practice sustainable incident response and blameless postmortems
  • Establish a mindset and a set of engineering approaches for running better production systems, with a focus on optimizing existing systems, building infrastructure and eliminating manual work through automation
  • Establish a culture of diversity, intellectual curiosity, problem solving and openness to ensure team success
  • Create an environment that provides the support and mentorship needed to learn and grow
Skills & Requirements

What You'll Need:

  • B.S. in Computer Science or equivalent experience
  • Minimum of 3 years of experience with technical operations and software development
  • Solid understanding of and experience with containerization technologies such as Docker
  • Working knowledge of open source tools such as Prometheus, Grafana, Logstash and Elasticsearch
  • Solid understanding of and experience with web services, databases and their related infrastructure and architectures
  • Ability to manage systems using a scripting language of your choice
  • Solid understanding of IT infrastructure
  • Excellent troubleshooting skills
  • DevOps experience a plus
  • System administration experience a plus
  • AWS cloud experience a plus
  • Experience supporting enterprise-level SaaS environments a plus
  • Security experience a plus
  • Kubernetes experience a plus

STATS provides equal employment opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, sex, sexual orientation, national origin, age, disability or genetics.