Senior Machine Learning Engineer, Synthetic Data & Document Understanding

Company: ABBYY

Job Type: Full Time

Bengaluru, India

Job Description - Senior Machine Learning Engineer, Synthetic Data & Document Understanding

Join ABBYY and be part of a team that celebrates your unique work style. With flexible work options, a supportive team, and rewards that reflect your value, you can focus on what matters most – driving your growth, while fueling ours.

Our commitment to respect, transparency, and simplicity means you can trust us to always choose to do the right thing.

As a trusted partner for purpose-built AI and intelligent automation, we solve highly complex problems for our enterprise customers and put their information to work to transform the way they do business. Over 10,000 customers trust ABBYY, including many Fortune 500 companies. You will help grow a portfolio that already includes clients such as DHL, Johnson & Johnson, FDA, DMV, PwC, KeyBank, Spotify, and H&R BLOCK.

About the Role

We are seeking a Senior Machine Learning Engineer – Synthetic Data & Document Understanding to own the synthetic data generation track within ABBYY’s Document AI Data team.

This role focuses on building generative pipelines that produce high-quality, diverse, and realistic synthetic training data at scale. You will ensure synthetic data meaningfully improves downstream model performance by maintaining strong alignment with real-world document structures, formats, and statistical properties.

This is an ideal role for engineers who combine deep generative modeling expertise with rigorous data quality evaluation and production engineering skills.

Key Responsibilities

Technical Development & Innovation

Design and implement pipelines that analyze real documents to inform high-fidelity synthetic data generation

Build generative systems capable of producing documents across diverse formats, layouts, and domains

Develop evaluation frameworks to ensure synthetic data maintains distributional fidelity and diversity

Research and apply generative modeling techniques suited for document AI training

Identify and mitigate quality issues to ensure synthetic data is effective for downstream model training

Partner with Modeling teams to measure the impact of synthetic data on model performance

Project Ownership & Leadership

Own the synthetic data generation track end-to-end, from architecture to quality validation

Drive architectural decisions balancing quality, diversity, scale, and cost efficiency

Define and maintain data quality metrics and generation dashboards

Collaborate closely with annotation teams to ensure compatibility with downstream pipelines

Contribute to roadmap planning alongside Principal-level leadership

Infrastructure & Scale

Build scalable pipelines capable of generating millions of synthetic training examples

Implement post-processing, filtering, and validation mechanisms to remove low-quality outputs

Design cost-efficient workflows balancing compute, quality, and throughput

Develop monitoring systems to detect distribution shifts or quality degradation over time

Collaborate with Platform teams on compute orchestration, storage, and scheduling

Qualifications

Education & Experience

MS or PhD in Computer Science, Engineering, Mathematics, or a related field

5+ years of experience in Machine Learning / AI, with a focus on:

Generative models

Vision-Language Models (VLMs)

Synthetic data systems

Proven experience building and evaluating synthetic data pipelines for ML training

Strong background in data quality evaluation and statistical analysis

Technical Expertise

Deep expertise in Vision-Language Models and document understanding (layout, structure, semantics)

Strong knowledge of generative modeling for structured and semi-structured data

Understanding of what makes synthetic data valuable:

Distributional fidelity

Diversity

Realistic noise patterns

Domain coverage

Strong programming skills in Python with experience in PyTorch or similar frameworks

Experience evaluating data quality via automated metrics and downstream model impact

Familiarity with large-scale data pipelines, cloud environments, and experiment tracking

Leadership & Communication

Proven ability to independently own complex technical workstreams

Strong collaboration across data, modeling, and platform teams

Ability to clearly communicate data quality and generation trade-offs

Data-driven mindset with strong attention to coverage gaps and quality signals

Here are some of our local benefits:

Comprehensive medical, accidental, and life insurance

Weekly wellness sessions to support your physical and mental well-being

A generous paid time off policy

Join ABBYY, and you will:

Love how you work

We provide remote and hybrid working options to fit all lifestyles.

We use flexible hours across most of our teams to allow you to find your own definition of balance.

Encouraging a culture of giving, we provide two paid volunteering days off every year so you can take time to contribute to the causes you care about.

To ensure your family is cared for, we offer paid parental leave in all our locations.

Love whom you work with

We are a global team of 600+ colleagues, spread across 15 countries on four continents.

With colleagues representing 30+ nationalities, our workforce reflects the world.

Innovation and excellence run through our veins. Our teams bring together the expertise that has earned ABBYY more than 140 technology patents.

We are guided by the values of respect, transparency, and simplicity.

"Team Environment" is in the top three highest-scoring drivers of engagement across all of our departments.

Love what you work on

We are a company with more than 35 years of experience in the technology market.

Over 10,000 customers trust ABBYY, including many Fortune 500 companies, with names such as DHL, Johnson & Johnson, FDA, DMV, PwC, KeyBank, Spotify, and H&R BLOCK.

We have modernized the capture market by creating the first low-code/no-code IDP platform.

Our Machine Learning, Natural Language Processing, and Computer Vision technologies, together with a marketplace built with AI, can transform any document in any process.

Top analyst firms recognize ABBYY's market leadership, including Gartner, Everest PEAK Matrix® Assessment, ISG Intelligent Automation Lens, and NelsonHall, among others.

ABBYY is an Equal Employment Opportunity employer that values the strength that diversity brings to the workplace. To learn more about our commitment to Diversity and Inclusion, check out the careers section on our website.

Original job Senior Machine Learning Engineer, Synthetic Data & Document Understanding posted on GrabJobs.