Talent.com
Lead Platform / Site Reliability Engineer

IO TECH SOLUTIONS LIMITED, Hong Kong, Hong Kong
30 days ago
Job description

What You'll Do:

As a Lead SRE, you'll be instrumental in shaping our systems' future. Your responsibilities will include:

  • System Reliability Leadership: Develop and execute strategies to achieve unparalleled service reliability and availability. You'll implement cutting-edge best practices, design resilient monitoring solutions, and conduct comprehensive failure injection and failover testing.
  • Advanced Automation: Spearhead automation initiatives to streamline complex operational tasks, enhancing efficiency and reducing manual interventions. You'll advocate for treating "operations as a software problem" throughout the organization.
  • Comprehensive Monitoring & Performance: Design and maintain advanced monitoring and alerting systems to assess system health, performance, and user experience. You'll conduct in-depth analysis of metrics and logs to proactively identify and resolve complex issues.
  • Incident Management & Prevention: Take the lead during critical incidents, ensuring rapid resolution and clear communication. You'll conduct thorough post-mortem analyses, implement sustainable solutions, and share insights to prevent recurrence. Expect to participate in on-call rotations as a primary escalation point.
  • Strategic Collaboration: Work closely with development and operations teams to embed reliability principles throughout the software development lifecycle. You'll provide expert guidance, promote SRE best practices, and foster a culture of shared ownership for system reliability.
  • Capacity Planning & Optimization: Monitor and analyze system capacity and performance data, forecast future demand, and lead efforts to scale infrastructure efficiently to meet growth.
  • Continuous Improvement & Innovation: Identify opportunities for systemic improvement across systems, tools, and processes. You'll lead the design and implementation of innovative solutions to enhance reliability, performance, and operational efficiency.
  • Mentorship & Leadership: Provide technical leadership and mentorship to SREs and other team members, fostering growth and skill development. You'll also contribute to hiring and onboarding new team members.

What You'll Bring:

We're looking for a highly experienced and passionate SRE leader with:

  • 12+ years of experience in Site Reliability Engineering, DevOps, or a related critical operations role, with a proven track record of leading significant reliability initiatives.
  • A Bachelor's degree in Computer Science, Engineering, or a related technical field, or equivalent extensive practical experience.
  • Exceptional proficiency in scripting and programming languages (e.g., Python, Go, Java, Ruby, Bash) for developing advanced automation, tooling, and system integrations.
  • Extensive hands-on experience with major cloud platforms (e.g., AWS, Google Cloud Platform, Azure) and deep expertise in containerization technologies (Docker, Kubernetes).
  • Profound understanding of Linux/Unix systems internals, networking protocols, and distributed system architectures.
  • Expertise in designing and managing CI/CD pipelines and robust version control systems (e.g., Git), advocating for GitOps principles.
  • Mastery of monitoring, logging, and alerting tools (e.g., Datadog, Prometheus, Grafana, ELK stack, OpenTelemetry).
  • Superior problem-solving skills, critical thinking, and meticulous attention to detail, especially under pressure.
  • Outstanding communication, interpersonal, and collaboration skills, with the ability to influence and lead cross-functional teams.
  • Proven ability to thrive and lead in a fast-paced, highly dynamic, and complex technical environment.
  • Expert-level debugging and root cause analysis capabilities across complex distributed systems.

Bonus Points For:

  • Extensive experience with infrastructure as code (IaC) tools (e.g., Terraform, Ansible, Pulumi).
  • Deep knowledge of various database systems (relational and NoSQL) and advanced data management strategies.
  • Significant experience designing, implementing, and operating microservices architectures.
  • Contributions to open-source projects related to SRE, operations, or cloud-native technologies.

This role offers a unique opportunity to make a significant impact on our core services and directly influence our engineering culture around reliability.