The Impact of a Senior AI Engineer, NLP & Training Data at Coupa:
Coupa's AI platform uses a range of models, from classical ML classifiers to frontier LLM integrations, to power features across spend management. The Senior AI Engineer, NLP & Training Data will build the data factory that produces high-quality training datasets for our model development efforts. You will design pipelines that generate, label, validate, and curate the datasets used to improve model accuracy across Coupa's product suite.
What You’ll Do
- Design and implement training data generation pipelines, including synthetic data generation.
- Build data labeling and annotation workflows with quality validation loops.
- Convert enterprise data into formats suitable for model training, such as instruction-tuning pairs and embeddings.
- Implement active learning strategies to identify high-value training examples.
- Collaborate with domain experts to validate training data quality and relevance.
- Build automated data quality checks: coverage, balance, consistency.
- Design training data versioning and lineage tracking.
- Analyze model evaluation results to identify training data gaps.
What You Will Bring to Coupa
- 5+ years of software engineering experience, with 2+ years in NLP, data science, or ML data engineering.
- Experience with text processing, tokenization, and NLP pipelines.
- Hands-on experience with data labeling tools and annotation workflows.
- Experience generating synthetic training data using language model APIs.
- Understanding of instruction-tuning and training data quality metrics.
- Proficiency in Python (pandas, PySpark).
- Experience with data versioning tools is a plus.
- BS/MS in Computer Science, NLP, or equivalent experience.