Talent.com
Lead Platform / Site Reliability Engineer

IO TECH SOLUTIONS LIMITED • Hong Kong, Hong Kong
30 days ago
Job type
  • Quick Apply
Job description

What You'll Do:

As a Lead SRE, you'll be instrumental in shaping our systems' future. Your responsibilities will include:

  • System Reliability Leadership: Develop and execute strategies to achieve unparalleled service reliability and availability. You'll implement cutting-edge best practices, design resilient monitoring solutions, and conduct comprehensive failure injection and failover testing.
  • Advanced Automation: Spearhead automation initiatives to streamline complex operational tasks, enhancing efficiency and reducing manual interventions. You'll advocate for treating "operations as a software problem" throughout the organization.
  • Comprehensive Monitoring & Performance: Design and maintain advanced monitoring and alerting systems to assess system health, performance, and user experience. You'll conduct in-depth analysis of metrics and logs to proactively identify and resolve complex issues.
  • Incident Management & Prevention: Lead during critical incidents, ensuring rapid resolution and clear communication. You'll conduct thorough post-mortem analyses, implement sustainable solutions, and share insights to prevent recurrence. Expect to participate in on-call rotations as a primary escalation point.
  • Strategic Collaboration: Work closely with development and operations teams to embed reliability principles throughout the software development lifecycle. You'll provide expert guidance, promote SRE best practices, and foster a culture of shared ownership for system reliability.
  • Capacity Planning & Optimization: Monitor and analyze system capacity and performance data, forecast future demands, and lead efforts to scale infrastructure efficiently to meet growth.
  • Continuous Improvement & Innovation: Identify areas for systemic improvement in systems, tools, and processes. You'll lead the design and implementation of innovative solutions to enhance reliability, performance, and operational efficiency.
  • Mentorship & Leadership: Provide technical leadership and mentorship to SREs and other team members, fostering growth and skill development. You'll also contribute to hiring and onboarding processes for new team members.

What You'll Bring:

We're looking for a highly experienced and passionate SRE leader with:

  • 12+ years of experience in Site Reliability Engineering, DevOps, or a related critical operations role, with a proven track record of leading significant reliability initiatives.
  • A Bachelor's degree in Computer Science, Engineering, or a related technical field, or equivalent extensive practical experience.
  • Exceptional proficiency in scripting and programming languages (e.g., Python, Go, Java, Ruby, Bash) for developing advanced automation, tooling, and system integrations.
  • Extensive hands-on experience with major cloud platforms (e.g., AWS, Google Cloud Platform, Azure) and deep expertise in containerization technologies (Docker, Kubernetes).
  • Profound understanding of Linux/Unix systems internals, networking protocols, and distributed system architectures.
  • Expertise in designing and managing CI/CD pipelines and robust version control systems (e.g., Git), advocating for GitOps principles.
  • Mastery of monitoring, logging, and alerting tools (e.g., Datadog, Prometheus, Grafana, ELK stack, OpenTelemetry).
  • Superior problem-solving skills, critical thinking, and meticulous attention to detail, especially under pressure.
  • Outstanding communication, interpersonal, and collaboration skills, with the ability to influence and lead cross-functional teams.
  • Proven ability to thrive and lead in a fast-paced, highly dynamic, and complex technical environment.
  • Expert-level debugging and root cause analysis capabilities across complex distributed systems.

Bonus Points For:

  • Extensive experience with infrastructure as code (IaC) tools (e.g., Terraform, Ansible, Pulumi).
  • Deep knowledge of various database systems (relational and NoSQL) and advanced data management strategies.
  • Significant experience designing, implementing, and operating microservices architectures.
  • Contributions to open-source projects related to SRE, operations, or cloud-native technologies.

This role offers a unique opportunity to make a significant impact on our core services and directly influence our engineering culture around reliability.