AI Software Engineer (Model Training)

Company: Maincode
Job Type: Full Time

Job Description - AI Software Engineer (Model Training)

About the role

Maincode is training Matilda, the first large language model built and trained from scratch in Australia. Our new compute cluster is live, and we are now scaling the next version.

This role sits directly inside that training stack. You will build the pipelines, infrastructure, and tooling that determine how efficiently Matilda trains, how stable long runs are, and how fast new experiments can be executed. Training runs last days or weeks. Small changes propagate through complex systems. The work requires precision and patience.

We build AI systems from first principles: designing the architectures, running the infrastructure, shaping the training process, and operating the models ourselves. Matilda is not a research prototype. It is a production system, trained at scale and served for open public access.

Maincode operates one of the largest private AI compute environments in Australia, built for a single purpose: training our own models. This is not a role that wraps external APIs or ships user-facing features. You will be working on the systems that train a large language model from scratch.

What you would actually do

You will build and maintain the systems that support large-scale model training.

This includes:

  • Designing and maintaining distributed training pipelines for large language models

  • Building data ingestion and preprocessing systems for large training datasets

  • Developing tooling for experiment management, checkpointing, and reproducibility

  • Monitoring and debugging long-running training jobs across clusters

  • Improving reliability and observability across the training stack

  • Optimising training throughput across compute, memory, and data pipelines

  • Working closely with researchers to translate experimental ideas into training runs

  • Diagnosing failures across infrastructure, training loops, and data pipelines

Training runs can last days or weeks. Small changes propagate through complex systems.

You will spend time inside code, logs, dashboards, and experiment outputs. The goal is simple: make large-scale training reliable.

The kind of person who does well here

We are looking for engineers early in their careers who want to understand how large models are actually trained.

You may have one or two years of experience building production software. What matters most is curiosity and the willingness to learn how these systems behave under load.

People who tend to do well here:

  • Care about how systems behave over long runtimes

  • Enjoy debugging complex distributed systems

  • Pay attention to logs, metrics, and system behaviour

  • Prefer understanding how a system works rather than relying on abstraction

  • Are comfortable working close to infrastructure

  • Have the patience to diagnose failures that appear hours into a run

  • Want to learn how large-scale AI training actually happens

You do not need prior experience training large language models. What matters is intellectual curiosity, persistence, and the ability to learn quickly.

How you would work

You will write production code that sits directly in the training stack.

You should be comfortable:

  • Working in Python

  • Using machine learning frameworks such as PyTorch or JAX

  • Writing reliable infrastructure for large compute workloads

  • Debugging distributed systems and long-running jobs

  • Collaborating closely with researchers and infrastructure engineers

Much of the work sits between research and infrastructure. Ideas move quickly, but the systems that support them must remain stable.

What this role is not

  • It is not primarily about building user-facing applications

  • It is not about prompt engineering

  • It is not about wrapping external APIs or third-party models

You will be working on the systems that train our own models from scratch.

Why Maincode

Maincode builds AI systems end-to-end. We prepare the data, design the training process, run the infrastructure, and operate the models ourselves.

You will work with a small team that:

  • Builds the full AI stack rather than outsourcing it

  • Treats infrastructure as part of the intelligence system itself

  • Values engineers who want to understand how things actually work

  • Is building long-term capability in training and operating large models

If you want to work directly on the systems that train large language models from scratch, this is the only role in Australia that will put you inside that work.

Note

This is a full-time role based in Melbourne, working closely with our in-person engineering and research team. At this time, we are not able to offer visa sponsorship, so applicants must have existing and unrestricted work rights in Australia.

Original job AI Software Engineer (Model Training) posted on GrabJobs.