AI Performance Engineer

Bosnia and Herzegovina, North Macedonia, Croatia, Hungary, Spain, Czech Republic, Guatemala, Peru, Colombia, Serbia (Hybrid)

We are looking for an AI Performance Engineer to work on deep learning performance optimization and benchmarking for the latest large AI models on modern GPU-based systems, with a strong focus on MLPerf Training and Inference workloads.

The primary models we work on include Llama 2, Llama 3, DeepSeek, and open-source GPT-style models (GPT-OSS).

This is a hands-on engineering role involving performance profiling, PyTorch optimization, large-scale distributed training, and building reproducible benchmarking environments, in close collaboration with other performance- and systems-focused engineers.

This role requires US working hours; Europe-based candidates must start their shift no earlier than 2 PM CET, ideally after 4 PM CET to align with the West Coast.

What You Will Do

  • Optimize training and inference pipelines for large language models such as Llama 2, Llama 3, DeepSeek, and GPT-OSS
  • Work on MLPerf Training and/or Inference benchmarks for LLM workloads
  • Profile GPU workloads to identify compute, memory, and communication bottlenecks
  • Improve scaling efficiency across multi-GPU and multi-node setups
  • Tune distributed training strategies (DDP, FSDP, ZeRO, tensor/pipeline parallelism)
  • Build and maintain reproducible benchmark environments (Docker / Singularity)
  • Collaborate with engineers on performance, stability, and scalability improvements
  • Document findings and contribute to benchmark submissions and internal reports

What We Expect (Required)

  • 1-2 years of experience in AI engineering, deep learning, GPU, or HPC-related roles
  • Strong Python skills and solid experience with PyTorch
  • Hands-on experience with LLM training or inference (Llama, GPT-style models, or similar)
  • Experience with distributed training (DDP, FSDP, ZeRO, DeepSpeed, or equivalent)
  • Good understanding of GPU performance fundamentals (compute vs memory, profiling, optimization)
  • Experience working in Linux-based environments
  • Familiarity with container technologies (Docker or similar)
  • Good level of spoken and written English

Nice to Have (Strong Plus)

  • Experience working with MLPerf or other standardized benchmarking frameworks
  • Exposure to LLM optimization techniques (activation checkpointing, KV-cache optimization, sequence parallelism)
  • Experience with GPU profiling tools (torch.profiler, Nsight, or equivalent)
  • Knowledge of GPU kernel optimization (CUDA, HIP, Triton, or similar)
  • Experience working with job schedulers (Slurm or equivalent)
  • Familiarity with quantization or mixed precision (FP16, BF16, FP8)

 
