Evaluation & Data
- Build and run evaluation pipelines for LLM-based systems
- Analyze metrics, identify edge cases, and track quality over time
- Create and maintain Golden and Silver datasets
- Work with evaluation frameworks and tools such as Langfuse
- Develop Python scripts for AI workflows (APIs, data processing, monitoring)
- Support maintenance and improvement of existing systems
- Work closely with engineering teams to improve system quality
- Contribute proactively to solving technical challenges