Site Reliability Engineer AWS
- Posted 11 August 2023
- Salary: Competitive Contract Rate
- Location: Wellington
- Job type: Contract
- Specialisation: Information & Communication Technology
- Reference: BH-1916
Job description
Job Opportunity: AWS Site Reliability Engineer (SRE) - 12 Month Contract
Are you ready for an exciting challenge? Do you thrive in a fast-paced, dynamic environment? We are seeking an AWS Site Reliability Engineer (SRE) to join our innovative team on a 12-month contract basis. If you're passionate about microservices, New Relic or similar technologies, and ensuring top-notch operational excellence, this opportunity is tailor-made for you!
Key Responsibilities: As an AWS SRE, you will play a pivotal role in ensuring the operational success of our application environment, focusing on meeting and exceeding expectations, SLOs, and SLAs. Your responsibilities will include:
- Defining and producing operational metrics for distributed systems and microservices.
- Establishing SLOs and SLAs and implementing an error budget framework.
- Enhancing proactive monitoring and alerting for services and systems.
- Swiftly responding to and resolving incidents, issues, requests, and problems.
- Crafting detailed postmortems and Post Incident Reports.
- Spearheading automation of incident response, change, and service request processes.
- Taking charge of initial Level 2 triage and escalation during business hours and after-hours on-call shifts.
- Optimizing on-call rotations and procedures for seamless operations.
- Evaluating the current system and presenting ideas for enhancements.
- Developing solutions to support DevOps, ITOps, and support teams.
- Leading and contributing to canary processes for releases.
- Building and maintaining services and systems using automation scripts.
- Participating in release planning and deployments.
- Leading planned Disaster Recovery testing.
- Documenting and updating Playbooks for enhanced knowledge sharing.
- Collaborating with and managing interactions with 3rd party vendors.
Skills and Experience:
- Thriving in a fast-paced environment with multiple priorities.
- Strong communication and interpersonal skills, and a commitment to quality outcomes.
- Exceptional problem-solving and diagnostic abilities with keen attention to detail.
- The capability to translate technical jargon into clear language for various stakeholders.
- Experience working within Agile software development teams.
- Proficiency in large-scale Planning Increment (PI) planning, iteration planning, release planning, and backlog management using tools like Jira.
- An adaptable and flexible working style, even under pressure.
- Strong planning, design, implementation, and documentation skills.
- Working with cloud platforms, particularly AWS.
- Familiarity with AWS services such as RDS, ECS, and ElastiCache (Redis), and with the AWS Cloud Development Kit (CDK) or Terraform.
- Expertise in the New Relic platform or similar Application Performance Monitoring (APM) tools.
- Proficiency in incident management tools like OpsGenie and APM.
- Experience with log aggregation and querying, e.g., CloudWatch.
- Development/scripting skills with TypeScript, Bash, Kotlin (Java), and GitHub Actions.
- Strong grasp of Infrastructure as Code and testing automation.
- Building and promoting operational dashboards.
Kia ora. Comspek and our clients fully support and encourage diverse hiring and inclusive recruitment processes. Don’t meet every single requirement of this job description? That’s OK. You do not need to tick every box or have expertise in the full JD. Comspek is dedicated to building diverse, inclusive and authentic workplaces based on different clients’ needs. So, if you’re excited about this role, we encourage you to apply.