Site Reliability Engineer
STATS | Product & Technology | Chicago, IL
Site Reliability Engineer:
Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. The SRE team is responsible for the availability, performance, monitoring, and incident response of STATS’ critical internal and customer-facing systems.
What You'll Do:
- Engage in and improve the whole life cycle of services, from inception and design through deployment, operation and refinement
- Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews
- Maintain services once they are live by measuring and monitoring availability, latency and overall system health
- Scale systems through sustainable mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity
- Practice sustainable incident response and blameless postmortems
- Establish a mindset and a set of engineering approaches to running better production systems, with a focus on optimizing existing systems, building infrastructure and eliminating work through automation
- Establish a culture of diversity, intellectual curiosity, problem solving and openness to ensure team success
- Create an environment that provides the support and mentorship needed to learn and grow
What You'll Need:
- B.S. in Computer Science or equivalent experience
- Minimum of 3 years of experience with technical operations and software development
- Solid understanding of and experience with containerization tools such as Docker
- Working knowledge of open source tools such as Prometheus, Grafana, Logstash, Elasticsearch
- Solid understanding of and experience with web services, databases and related infrastructure and architectures
- Ability to manage systems using a scripting language of your choice
- Solid understanding of IT infrastructure
- Excellent troubleshooting skills
- DevOps experience a plus
- System administration experience a plus
- AWS cloud experience a plus
- Experience supporting an enterprise-level SaaS environment a plus
- Security experience a plus
- Kubernetes experience a plus
Who We Are:
The values at STATS mirror those of foundationally great teams and franchises. Our objective is to win through effort, creativity, teamwork and positive energy. Specifically, we look for candidates who embody the following: Be All-In, Put the Fan at the Center, Get Stuff Done, It’s Your Team, Make an Impact, and Fearless Integrity. We want employees who crave responsibility and accountability and want to have fun working in a collaborative, Get Stuff Done environment.
- Put the Fan at the Center: You enjoy working with customers and have the strong communication skills to make those interactions a success.
- It’s Your Team: You embrace others' ideas (even if they conflict with your own) for the sake of the company and customer. You are a collaborator and relationship builder.
- Get Stuff Done: You are driven and your can-do attitude inspires others to elevate their performance in a fast-moving environment.
- Make an Impact: You thrive in a fast-paced, changing environment and you’re excited by the chance to play a large role.
- Be All-In: You are passionate about what you do and about our customers' success.
- Fearless Integrity: You are self-motivated and capable of holding both yourself and others accountable to deliver on multiple tasks.
STATS provides equal employment opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, sex, sexual orientation, national origin, age, disability or genetics.