Job Description:
Own build, deploy, and runtime reliability across BRAHMA AI’s hybrid estate. Deliver secure, scalable infrastructure for Gen AI-based workflows and products across hybrid environments. Partner with infrastructure and multidisciplinary product and research teams to help them innovate and ship fast.
We are hiring remotely across the EMEA region.
Key Responsibilities
Design, implement, and operate Slurm and Kubernetes-based platforms across cloud and on-prem GPU nodes, including autoscaling, rollout strategies, and multi-cluster operations.
Build CI/CD pipelines for services, model training, and model serving; standardise artifact/version management and environment promotion.
Implement Infrastructure as Code with Terraform/Terragrunt and configuration management; enforce drift detection and repeatable environments.
Design and implement observability stacks (metrics, logs, tracing); drive incident response and postmortems.
Secure the stack with least privilege, secrets management, network policy, and hardened baselines; support ISO/MPA controls with the security team.
Operate model-serving infrastructure for real-time and batch workloads; optimise GPU utilisation, concurrency, and latency.
Drive cost visibility and efficiency across compute, storage, and egress; forecast capacity and plan lifecycle of hardware and licenses.
Must Haves
6+ years in DevOps/SRE/Platform roles running production systems.
Expert with Kubernetes and containers (runtime, scheduling, networking, autoscaling).
Strong with Terraform and at least one configuration management tool (Ansible