The Department
The IT Operations and Systems Department provides the Club’s internal and external customers with the IT systems and services that enable business operations. The Department’s goal is to provide the Club’s IT customers with best-in-class IT service offerings and experience.
IT Operations and Systems serves as IT’s primary user engagement channel for help requests and service fulfilment. Support is available 24x7 via phone, email and on-site.
IT Operations and Systems is the service owner responsible for: IT data computing facilities; production infrastructure platforms; incident, change, problem, resilience, capacity, configuration and procurement functions; service assurance and quality management; and Level 1/2 system support functions.
The Job
Disaster Recovery Programme
- Lead the development and maintenance of the Club’s comprehensive disaster recovery strategy, aligned with the organisation’s objectives and risk profile
- Collaborate with IT peers to identify critical systems and data that require protection
- Partner with business system leaders to identify and align on business-critical systems, dependencies and data that require protection
- Integrate incident and problem learnings into DR and resilience programme improvements to protect service availability
- Lead the development and implementation of the IT operational programmes of work supporting disaster recovery capabilities, management and performance tracking (KPI reporting)
- Lead the development, maintenance and validation of recovery plans, ensuring standard operating procedures (SOPs) are maintained and current across all stakeholder systems
- Create and maintain detailed disaster recovery plans, including procedures, protocols and recovery time objectives (RTOs) aligned to meet the business’s maximum tolerable downtime (MTD)
- Document hardware and software configurations necessary for recovery
- Conduct regular reviews of risk assessments to identify potential threats, vulnerabilities and their impact on recovery and resilience capabilities for business operations
- Collaborate with stakeholders to prioritize risks and develop mitigation strategies
- Plan and execute disaster recovery tests, drills, and simulations to ensure the effectiveness of recovery plans
- Analyze test results and make necessary improvements to enhance preparedness
- Manage relationships with third-party vendors and service providers through standard service schedules, scopes of work and SLAs
- Establish non-functional requirements (NFRs) that are integrated into architecture design standards and build patterns, with standard operational acceptance testing criteria
Resourcing and cost management
- Identify the budget, personnel and technology resources required for the disaster recovery programme, including all improvement and remediation needs
Recovery incidents and remediation
- Lead the response efforts during actual disaster events, working closely with cross-functional teams to minimise downtime and data loss
- Lead unified communication with senior management and stakeholders on recovery status during execution, and facilitate the approval processes
- Design the programmes of work for incident response: invocation of DR plans, their execution, and post-incident investigation, tracking and review
- Lead complex and/or high-impact problem investigations, including those with IT system business impacts
- Lead incident root cause analysis and conduct post-mortem sessions with actionable recommendations linked to owners and delivery timelines
- Lead remediation of critical, high-risk technical, process or people resilience and quality issues, whether specific to an incident or across the broader IT estate
- Lead the alignment of the production operating environment with quality management system requirements and applicable governing regulations, ensuring the continued reduction of risk and technology debt
- Lead the identification of performance measurements for incident response
Automated infrastructure and application testing
- Provide cost-effective, high-quality, high-performance and robust testing, automation and quality assurance practices and services to meet the Club’s business needs and delivery pipeline demands
- Lead the design and implementation of the Club’s automated testing capabilities
- Clearly define and syndicate testing acceptance criteria for infrastructure and business functions
- Ensure the alignment and consistency of non-production SAT and UAT environments for all high-priority, DR-classified business function applications
Enterprise architecture (EA) and solution architecture (SA)
- Lead the definition and inclusion of resiliency requirements for hosting, platform and application high-availability architectures
- Develop the process for managing and documenting exemption requests, including the operational risk impact assessment, aligned with information security standards
- Partner in the development of continuous improvement and maintenance initiatives led by the Chief Architect (CA)
Information security and risk assessment
- Partner with the Information Security Department to ensure coverage and compliance of non-production test environments
- Establish automated validation of the defined information security standards and data privacy controls for non-production and production environments
- Provide continuous monitoring and reporting of data centre compliance performance
Continuous improvement
- Identify and review industry trends and technology innovations that scale resilience and recovery capabilities, for integration into the Club’s practices
- Design, implement and scale adoption of the IT Operations and Systems strategic vision for Agile and DevSecOps
- Introduce and build the Club’s SRE capabilities in partnership with business functions in the medium term
Operating charter
- Lead the documentation, cross-functional alignment with business functions, and publication of the operational parameters of the IT continuity and DR programme
- Clearly define “critical business operation” hours and timeframes, with published service level objectives
- Lead the direction and management of key resources during “critical” operation windows for betting/wagering systems
Capability, training and awareness
- Identify and develop supporting IT capability training for high availability, resilience and operational excellence
- Track and maintain expertise across the IT Operations and Services function to ensure continuity of human resources with the requisite skills and business knowledge, and an ongoing high-performance culture
- Develop and deliver cross-functional continuity and recovery awareness training
Compliance/performance tracking and reporting
- Develop and syndicate the OKRs for the DR programme, including measurements
- Identify and link KPIs for milestones, environment compliance to standards and ongoing programme health
- Design and implement digital dashboard capabilities for near real-time tracking of recovery incidents and root cause analysis
- Provide transparent senior leadership visibility by designing and implementing a programme-level KPI tracking digital dashboard
Inclusivity, diversity and culture
- Create and foster a culture that prioritises respect and accountability to build deep trust amongst peers
- Develop and attract diverse and inclusive talent to ensure the Club’s access to outstanding talent
- Serve as a role model to support cross-team, cross-division and cross-department efforts, and model collaborative behaviours
- Inspire the team to bring forward ideas and solutions, empowering people to accelerate business success
About You
- Tertiary qualifications in science, computer science or engineering, with at least 15 years of working experience in production IT environments
- 10 years of experience in “high-transaction” environments, with IT system testing, operations support, quality assurance and risk management responsibilities
- 5 years of experience in IT disaster recovery, IT continuity or IT resilience functions
- Proven leadership experience and background in the high availability and high performance of IT infrastructure services
- Demonstrated ability to drive technical and organisational change in IT teams
- Strong stakeholder management, business acumen, analytical and decision-making skills, with proven management experience in large-scale organisations
- Proven experience in supporting a 24x7, mission-critical environment
- Post-graduate qualifications in a business or management-related discipline are an advantage
- Solid understanding of IT infrastructure provisioning, cloud technology, application design, coding practices, mesh and messaging, and operations support practices
- Solid understanding of ITIL and the service management framework (incident, problem, change, asset, configuration and service level management)
- Solid understanding of, and hands-on experience in, large-scale project management and tools
- Ability to communicate at all levels within the organisation
- Proficiency in spoken and written English; knowledge of Cantonese and Putonghua is an advantage
Terms of Employment
The level of appointment will be commensurate with qualifications and experience.
Closing Date
Only shortlisted candidates will be notified.