Project Manager - R01559594 at Brillio in Bangalore, Karnataka, India

Project Manager

Primary Skills

Project Delivery Management, Manage Fixed Price Delivery, Estimations and Metrics, Client Management, Communications Management, Agile, Agile Metrics and Reporting, Professional Scrum Master (CSM/PSM1), Manage Outcome Based Delivery, Requirements Creation/User Stories, Digital Acumen, Agile Coaching, Change Management (Project Management), Backlog Grooming

Job requirements

Job Title: AI Agent Evaluation Engineer

JD: We are seeking a highly motivated and technically proficient AI Agent Evaluation Engineer to join our growing AI team. This role is responsible for defining, developing, and executing robust agent evaluation frameworks and test strategies, with a significant focus on Responsible AI and Safety Evals, for our agents built using the Google Agent Development Kit (ADK). The ideal candidate will bridge the gap between AI development and reliable deployment, ensuring our agents are safe, ethical, effective, and meet high-quality performance standards. The role is 70% automation and 30% manual testing.

Key Responsibilities

● Evaluation (Evals) Development:
  ○ Develop synthetic testing environments and simulation strategies to stress-test agents under various real-world conditions.
  ○ Design, implement, and maintain scalable and repeatable evaluation datasets and metrics to test agent performance, robustness, safety, and alignment (e.g., faithfulness, hallucination, prompt injection).
  ○ Specifically focus on building Evals for agents utilizing the Google Agent Development Kit (ADK) and related Google AI/ML services (e.g., Vertex AI, Gemini models).
● Responsible AI and Safety Evals (New Focus):
  ○ Develop and execute adversarial testing, jailbreaking, and red-teaming methodologies to identify potential harm, bias, toxicity, and unauthorized behavior in agent responses.
  ○ Implement and measure adherence to established ethical guidelines, safety policies, and content filtering mechanisms.
  ○ Work with policy and legal teams to ensure agent evaluations cover regulatory compliance and fairness objectives.
● Test Strategy & Execution:
  ○ Define comprehensive QA strategies, including functional, integration, regression, and user acceptance testing (UAT), specifically for conversational and goal-oriented AI agents.
  ○ Develop and execute detailed test artefacts such as test plans, test cases, and test scenarios for agent features, tool use, memory, and reasoning capabilities.
● Bug Detection & Management:
  ○ Identify, document, prioritize, and track bugs, performance degradations, and alignment failures in agent behavior using Jira.
  ○ Collaborate closely with AI/ML Engineers and Researchers to analyze root causes and validate fixes.
● Automation & Tools:
  ○ Integrate evaluation pipelines into the CI/CD process to enable continuous quality assurance and fast iteration cycles.
● Reporting & Insights:
  ○ Analyze and interpret evaluation results, providing clear, actionable insights and quality reports to stakeholders and development teams, with a specific focus on safety metrics and risk mitigation.

Required Skills & Qualifications

● Experience: 6+ years in Software QA, with at least 2 years focused on testing or evaluating AI/ML systems, conversational agents, or Large Language Models (LLMs).
● Safety Evals Expertise (Mandatory): Direct experience in designing and executing safety evaluations (red teaming, adversarial testing), bias detection, and measuring toxicity/harmful content in generative AI models.
● Agent/LLM Evals: Proven experience developing and running general evaluations (Evals) for LLM-powered applications, including working knowledge of libraries such as PyTest (must-have).
● Google ADK Familiarity (Mandatory): Direct experience with, or a strong conceptual understanding of, the Google Agent Development Kit (ADK) and its components.
● Programming: Strong proficiency in Python is mandatory for script development, data processing, and automation.
● Cloud & MLOps: Familiarity with Google Cloud Platform (GCP) services relevant to AI/ML (e.g., Vertex AI) and with integrating testing into MLOps workflows.
● Tools and Libraries: LangSmith, DeepEval, Ragas, Giskard, Hugging Face.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.
