AI Performance Engineer

Bosnia and Herzegovina, North Macedonia, Croatia, Hungary, Spain, Czech Republic, Guatemala, Peru, Colombia, Serbia (Hybrid)


We are looking for an AI Performance Engineer to work on the latest large AI models, deep learning performance optimization, and benchmarking on modern GPU-based systems, with a strong focus on MLPerf Training and Inference workloads.

The primary models we work on include Llama 2, Llama 3, DeepSeek, and open-source GPT-style models (GPT-OSS).

This is a hands-on engineering role involving performance profiling, PyTorch optimization, large-scale distributed training, and building reproducible benchmarking environments, in close collaboration with other performance- and systems-focused engineers.

This role requires US working hours; Europe-based candidates must start their shift no earlier than 2 PM CET, ideally after 4 PM CET to align with the West Coast.

What You Will Do

  • Optimize training and inference pipelines for large language models such as Llama 2, Llama 3, DeepSeek, and GPT-OSS
  • Work on MLPerf Training and/or Inference benchmarks for LLM workloads
  • Profile GPU workloads to identify compute, memory, and communication bottlenecks
  • Improve scaling efficiency across multi-GPU and multi-node setups
  • Tune distributed training strategies (DDP, FSDP, ZeRO, tensor/pipeline parallelism)
  • Build and maintain reproducible benchmark environments (Docker / Singularity)
  • Collaborate with engineers on performance, stability, and scalability improvements
  • Document findings and contribute to benchmark submissions and internal reports

What We Expect (Required)

  • 1-2 years of experience in AI engineering, deep learning, GPU, or HPC-related roles
  • Strong Python skills and solid experience with PyTorch
  • Hands-on experience with LLM training or inference (Llama, GPT-style models, or similar)
  • Experience with distributed training (DDP, FSDP, ZeRO, DeepSpeed, or equivalent)
  • Good understanding of GPU performance fundamentals (compute vs memory, profiling, optimization)
  • Experience working in Linux-based environments
  • Familiarity with container technologies (Docker or similar)
  • Good level of spoken and written English

Nice to Have (Strong Plus)

  • Experience working with MLPerf or other standardized benchmarking frameworks
  • Exposure to LLM optimization techniques (activation checkpointing, KV-cache optimization, sequence parallelism)
  • Experience with GPU profiling tools (torch.profiler, Nsight, or equivalent)
  • Knowledge of GPU kernel optimization (CUDA, HIP, Triton, or similar)
  • Experience working with job schedulers (Slurm or equivalent)
  • Familiarity with quantization or mixed precision (FP16, BF16, FP8)
