Senior Network Engineer, AI Infrastructure

ROLES AND RESPONSIBILITIES

Firmus Technologies is seeking a skilled Senior Network Engineer specialising in AI networks to join our Cloud Architecture and Software Defined Infrastructure team.

The ideal candidate will play a crucial role in network design, configuration, and deployment for AI infrastructure projects. This role offers an exciting opportunity to work at the forefront of AI networking technology and contribute to the growth of AI infrastructure.

  • Design and build bespoke AI infrastructure for new and existing customers.
  • Support operational and reliability aspects of large-scale AI clusters with a focus on performance at scale, real-time monitoring, logging, and alerting.
  • Provide specialist network engineering support to ensure optimal operation of network software and hardware.
  • Develop high-quality automation and scripts to operate network infrastructure at scale.
  • Engage in and improve the whole lifecycle of services – from inception and design through deployment, operation, and refinement.
  • Improve internal tooling by identifying automation opportunities to drive speed and scale in our capabilities.
  • Be the subject matter expert for networking-related escalations.
  • Provide feedback to internal teams by filing bugs, documenting workarounds, and suggesting improvements.

 

SKILLS AND EXPERIENCE

  • B.Sc. in Computer Science, Electrical Engineering, or Mechanical Engineering, or equivalent experience.
  • Hands-on experience in solving problems in large-scale RDMA over Converged Ethernet (RoCE) or InfiniBand network environments.
  • Strong hands-on experience in Linux-based platforms.
  • In-depth knowledge of network protocols and tools and management of security measures for network infrastructure.
  • Familiarity with data path hardware acceleration protocols and interfaces such as RDMA, RoCE, and InfiniBand.
  • Familiarity with Infrastructure as Code practices. Experience in developing IaC to support automation.
  • Experience in using network automation tools such as Terraform, Ansible, Puppet, and Python scripts.
  • Familiarity with Linux networking, device APIs, and firewall policy management.
  • Experience with switching and routing network protocols.
  • Fast and independent self-learner with outstanding technical skills.
  • Driven and focused on customer needs and satisfaction.
  • Self-motivated with excellent leadership skills.
  • Strong written, verbal, and listening skills are essential.

 

KEY COMPETENCIES

  • CCIE or equivalent networking certifications and certification in Linux systems.
  • 5+ years of experience with AI, HPC, or parallel network architectures.
  • Proficiency in Infrastructure as Code (IaC) tools (e.g. Ansible, NetBox, Python scripts).
  • Understanding of how MPI, RDMA, and NCCL work, as well as how job schedulers (Slurm, PBS) work.
  • Proven knowledge of Python or Bash.
  • Professional Services/Infrastructure Specialists delivery experience.

 

LOCATION

Singapore

 

EMPLOYMENT BASIS

Full-Time

 

How to apply for this position

If you think you’re the right fit for us, we’d love to hear from you!

Please click the link below to apply for this role.

Apply now