About the Role:
We are seeking a skilled and motivated DevOps / Site Reliability Engineer (SRE) with 2+ years of experience to help us build, scale, and maintain robust, secure, and high-availability infrastructure. As a DevOps / SRE team member, you will work closely with development, QA, and operations teams to automate processes, monitor system health, and ensure the reliability of our services.
This is a hands-on role that requires strong technical skills, a deep understanding of modern DevOps tools and practices, and a problem-solving mindset.
Key Responsibilities:
- Design, implement, and maintain CI/CD pipelines for reliable code deployment
- Monitor application performance and system reliability using tools like Prometheus, Grafana, or Datadog
- Maintain and improve cloud infrastructure (e.g., AWS, GCP, Azure) following best practices
- Manage infrastructure as code using tools such as Terraform, Ansible, or CloudFormation
- Troubleshoot infrastructure and application issues, ensuring minimal downtime and fast resolution
- Automate repetitive operational tasks and improve development workflows
- Implement and enforce security, backup, and disaster recovery strategies
- Participate in the on-call rotation and respond to incidents with root cause analysis and postmortem reviews
- Work closely with development teams to ensure applications are designed for performance, availability, and scalability
- Optimize resource usage and costs across cloud environments
Qualifications:
Required:
- Bachelor's degree in Computer Science, Engineering, or a related field
- 2+ years of experience in a DevOps, SRE, or Systems Engineering role
- Hands-on experience with Linux/Unix system administration
- Experience with CI/CD tools such as Jenkins, GitHub Actions, CircleCI, or GitLab CI
- Working knowledge of cloud platforms (AWS, GCP, Azure)
- Familiarity with containerization and orchestration tools (e.g., Docker, Kubernetes)
- Experience with infrastructure as code tools like Terraform, Ansible, or similar
- Proficiency in at least one scripting or programming language (e.g., Bash, Python, Go)
- Strong understanding of monitoring, logging, and alerting systems
- Version control with Git
Preferred:
- Experience with Kubernetes administration in production environments
- Familiarity with security best practices and compliance standards
- Understanding of networking, load balancing, and DNS configurations
- Exposure to incident management and SLA/SLO/SLI concepts
- Experience working in Agile environments