The Production Reliability Engineer combines experience in traditional systems and network engineering with software engineering to automate processes and reduce toil. The engineer is responsible for ensuring that production systems run reliably and scale as needed, and for continually using observability data to demonstrate that reliability and scalability. The role covers network and systems automation, security, monitoring, and alerting. The engineer ensures reliable and timely delivery of network and systems operations for Internet, Core, and Data Center connectivity and for application delivery, and works across the enterprise to automate wherever possible. Candidates with an equivalent combination of experience and/or education are encouraged to apply.
Applied experience in network, systems, and process automation.
Applied experience in network, security, and systems operations.
Applied experience in network and systems security.
Knowledge of Linux server administration.
Ability to work with users who have varying levels of technical expertise.
Experience with programming and scripting languages (Python, Go, Java, Bash, etc.). Software engineering and database (relational and NoSQL) experience preferred.
Experience with observability, monitoring, and alerting tools (Prometheus, Grafana, etc.).
Must be able to apply experience, logic, and imagination to make sense of situations as they arise and to develop solutions to the problems they present.
Must have the ability to establish and maintain effective working relationships with other IT staff, business stakeholders, and vendors.
Proficient written and verbal communication skills.