Associate Customer Reliability Engineer

Dick's Sporting Goods is looking for Reliability Engineers (RE) with a passion for system reliability to join our Reliability Engineering organization. As part of this engineering team, you will build reliability into our systems, infrastructure, and applications. Our goals are ambitious and results-focused; they cover user-facing applications, observability, production excellence, reliability, error elimination, efficiency, and automation of manual and repetitive tasks. The RE role at Dick's Sporting Goods (DSG) provides an opportunity to blend system design and software engineering skills with a passion for troubleshooting and defect elimination to address ever-changing applications and environments with scalability and reliability challenges. This is an opportunity for you to join us on this journey and have a real impact on how we support our customers and build software.

The RE will work with other Reliability Engineers, Product Managers, and Developers to ensure the highest levels of availability and reliability for all our customer-facing websites, third-party interfaces, and legacy application services. The RE is expected to work with management, peers, and customers to define and implement the technical vision and to improve monitoring tools, error detection, and defect elimination while improving Mean Time to Detection/Resolution, overall service availability, and customer satisfaction.

Qualifications

Troubleshoot high-severity performance and availability issues in e-commerce, infrastructure, and legacy business applications and websites, and manage the incident lifecycle through resolution.
Lead root cause analyses and investigations by identifying, analyzing, and remediating service performance and availability issues to ensure maximum service uptime and availability.
Conduct blameless post-incident reviews.
Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
Be on call, have strong written communication skills, and be able to develop working relationships with coworkers.
Experience balancing service reliability, metrics, sustainability, technical debt, and operational toil for live services running at scale.
Work across multiple project teams simultaneously to support rapid development efforts.
Solve complex, business-critical issues that impact bottom-line financial numbers and customer loyalty and experience.
Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
Contribute positively to open source projects developed by DSG and join existing communities.
Navigate this broader ecosystem and structure projects with upstream/downstream opportunities in mind.
Identify and integrate with third-party solutions where it makes the most sense.
Use data to understand the availability, reliability, and sustainability of our software.
Bring experience, pragmatism, empathy, and composure to interactions with teams outside of the RE organization.
Work frequently with Product teams on shared goals and cross-team projects.
Balance planned and reactive work using basic project planning techniques and technical roadmaps.
Work and collaborate across teams such as Application Services, Capacity Planning, Hardware, Network, and Datacenter Operations.
Participate in building advanced tooling for testing, monitoring, administration, and operations of multiple clusters across multiple environments.
Experience negotiating SLIs, SLOs, and SLAs with product owners.

General/Minimum Qualifications

3-5+ years of applying reliability engineering principles to distributed services.
Understanding of and comfort with the GNU/Linux operating system.
Proficiency in high-level languages such as Ruby, Python, and Bash.
Exposure to system-level languages such as Go and C/C++.
Familiarity with configuration management software such as Puppet, Chef, Ansible, or Salt.
Source control, branching, and merging: git, svn, etc. (repository management).
Networking basics: TCP vs. UDP, basic troubleshooting, HTTP, load balancing, firewalls, private networks, multi-tier design, scale-out, persistent data.
Databases: at a minimum, an understanding of the basics (select/insert).
Familiarity with standard infrastructure concepts like load balancers, firewalls, and object storage, and where and when they might be used.
Service Management: Incident Response, Change, and Problem Management.
Experience with Kubernetes and Docker.
Cloud computing concepts (not necessarily provider-specific): VMs vs. Docker containers, block storage vs. object storage, infra automation vs. install automation.
Experience operating a platform, software as a service, or shipping software.
Experience as an open-source contributor.

18000JP2
Salary Range: NA
Minimum Qualification
Less than 5 years
