Associate Reliability Engineer

Description

The Associate Customer Reliability Engineer will work with other Reliability Engineers (RE), Product Managers, and Developers to produce mission-critical infrastructure, tools, performance improvements, actionable and meaningful performance measurements, and communication to stakeholders. The RE will maintain the solution definition and systems requirements of multiple application development projects and will set standards for requirements definitions, systems configurations, and data/systems integration definitions. The RE will work with the business and technical teams to close gaps between solution design and business processes and assess impacts to services/processes across the enterprise. The CRE role at Dick's Sporting Goods (DSG) provides an opportunity to blend system design and software engineering skills with a passion for troubleshooting and defect elimination to address ever-changing applications and environments with scalability and reliability challenges.

Qualifications

- Lead architecture and security reviews of systems architecture designs with development and infrastructure teams.
- Ensure systems monitoring and purge-and-archive strategies are implemented and enforced.
- Ensure appropriate documentation concerning project deliverables, system support, and ITS processes is created, distributed to affected parties, and maintained as necessary.
- Document and/or review systems management and operations manuals for handoff to production operations teams prior to production release or implementation.
- Mentor other systems analysts.
- Set standards for business use cases so they reflect the elements needed to build requirements against.
- Set standards for business requirements definition.
- Set standards for solution configurations definition.
- Troubleshoot performance and availability issues in e-commerce, infrastructure, and legacy business applications/websites, and manage the incident lifecycle to resolution.
- Drive root cause analysis and investigations by identifying, analyzing, and remediating service performance and availability issues to ensure maximum service uptime and availability. Conducting blameless post-incident reviews is expected.
- Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
- Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
- Be on call, have strong written communication skills, and be able to develop working relationships with coworkers.
- Supervise ITSD, Operations, and ESOC level 1 technicians in service reliability, metrics, sustainability, technical debt, and operational toil for live services running at scale.
- Work across multiple project teams simultaneously to support rapid development efforts.
- Solve complex, business-critical issues that impact bottom-line financial numbers and customer loyalty/experience.
- Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
- Contribute positively to open source projects developed by DSG and join existing communities. Navigate this broader ecosystem and structure projects with upstream/downstream opportunities in mind.
- Identify and integrate with third-party solutions where it makes the most sense.
- Use data to understand the availability, reliability, and sustainability of our software.
- Bring experience, pragmatism, empathy, and composure to interactions with teams outside of the RE organization.
- Work frequently with Product teams on shared goals and cross-team projects.
- Balance planned and reactive work using basic project planning techniques and technical roadmaps.
- Work and collaborate across teams such as Application Services, Capacity Planning, Hardware, Network, and Datacenter Operations.
- Participate in building advanced tooling for testing, monitoring, administration, and operations of multiple clusters across multiple environments.
- Experience negotiating SLIs, SLOs, and SLAs with product owners.
- Bachelor's degree.
- 1-3 years of experience.

Additional Technological Requirements

Valuable Technologies Like: WebSphere Commerce, WebSphere eXtreme Scale, WebSphere Application Server, WebSphere Message Broker, WebSphere MQ, Order Management, Web Services, Tomcat, Apache, TCP, UDP, Load Balancers, Repository Management (git/svn), Puppet, Chef, Ansible, Salt, VM, Docker Containers
Valuable Methodologies Like: ITIL, Agile, SCRUM, Reliability Engineering
Valuable Languages Like: Java, JavaScript, SQL, XML, HTML, CSS, Visual Basic, AJAX, C/C++, COBOL, JSTL, Ruby, Python, Bash, Go
Valuable Databases/OS Systems Like: Oracle, DB2, SQL Server, Windows, UNIX, Linux, SYSTEM
Valuable Monitoring Tools Like: IBM Monitoring, SCOM, CA Spectrum, AppDynamics, SOASTA, Foglight
Service Management Tools Like: Remedy, ServiceNow, Jira, Pivotal Tracker, xMatters

Knowledge, Skills, & Abilities

- Excellent written & verbal communication skills
- Execution skills
- Business acumen
- Project management knowledge
- Customer-service oriented
- Ability to drive projects & manage project teams
- Strong interpersonal & client consultation skills
- Ability to work effectively in a team environment
- Self-motivated & results oriented
- In-depth analytical skills
- Strong presentation skills
- Strong detail orientation
- Supervisory & leadership capabilities
- Superior organizational abilities
- Problem solving & troubleshooting capabilities
- Process & procedure oriented

Additional Knowledge, Skills, & Abilities

Intellectual curiosity, problem solving, and openness are key to success in this role.

- Mindset for solving production systems issues and understanding root cause, doing "detective work" and automating away toil; doesn't like boring, repetitive tasks.
- Enjoys digging into new problems.
- Capable of driving toward results even when given an ill-defined problem, such as "this is slow", by developing metrics and making measurable improvements.
- Can work on different tasks in different systems week to week.
- Knows when to ask for help and when to dig more on their own.
- Understanding of and comfort with the GNU/Linux operating system.
- Proficiency in high-level languages such as Ruby, Python, and Bash.
- Networking basics: TCP vs UDP, basic troubleshooting, HTTP, load balancing, firewalls, private networks, multi-tier design, scale-out, persistent data.
- Databases: at a minimum, understands the basics (select/insert).
- Familiarity with standard infrastructure concepts like load balancers, firewalls, and object storage, and where/when they might be used.
- Service management: incident response, change, and problem management.
- Exposure to system-level languages such as Go, C/C++.
- Familiarity with configuration management software such as Puppet, Chef, Ansible, or Salt.
- Source control, branching, and merging: git/svn/etc. (repository management).
- Experience with Kubernetes and Docker.
- Cloud computing concepts (not necessarily provider specific): VMs vs Docker containers, block storage vs object storage, infra automation vs install automation.
- Experience operating a platform, software as a service, or shipping software.
- First-hand experience with Prometheus and Istio.
- Experience as an open-source contributor.

18000HSV
Salary Range: NA
Minimum Qualification
Less than 5 years

