What You'll Do:
- Own end-to-end delivery of significant data science projects — from problem scoping and approach design through to production deployment, with a focus on knowledge graph and identity solutions
- Make sound, independently reasoned decisions on methodology, model selection, and evaluation; record them clearly in technical solution documents covering problem statement, approach, metrics, and timeline
- Lead solution design for your own initiatives and break complex epics into well-scoped user stories with clear acceptance criteria; apply DataOps and MLOps best practices throughout, including experiment tracking, pipeline orchestration, model monitoring, and reproducibility
- Build production-quality Python and PySpark code on Databricks — well-tested, documented, and reusable — and implement advanced ML and AI-powered workflows including entity resolution, probabilistic record linkage, embedding-based matching, semantic similarity, and LLM-augmented pipelines
- Develop and maintain reusable tools, libraries, and documentation that improve team efficiency and technical standards; conduct code reviews with constructive, specific feedback that raises the bar
- Mentor junior data scientists on technical execution, code quality, and career development; lead internal talks or workshops on knowledge graphs, identity, or ML topics
- Collaborate cross-functionally with product, engineering, and operations — translate business requirements into technical specifications, partner with data engineering on scalable pipeline design, and participate in cross-functional design reviews and working groups
Who You Are:
- Bachelor's degree in Statistics, Data Science, Computer Science, Mathematics, or a related quantitative field required; Master's strongly preferred
- 3–5 years of hands-on data science experience with demonstrated ability to own and deliver complex, multi-sprint projects independently
- Advanced Python skills, including production-quality code, testing, and documentation; strong SQL and PySpark skills for working with billion-row datasets
- Experience with Databricks workflows, Delta Lake, and job orchestration; working knowledge of cloud platforms (AWS or GCP)
- Solid command of core ML — regression, classification, clustering, model evaluation, and experimental design — applied to complex, high-volume data
- Proficiency with MLOps practices: experiment tracking, pipeline orchestration (Airflow), and reproducible model deployment
- Exposure to modern AI methodologies: RAG systems, LLM-augmented models, vector databases, and semantic search
- Strong communicator — able to translate technical work into clear documentation, user stories, and cross-functional conversations
- Demonstrated ability to mentor junior data scientists and contribute to team standards
Preferred Skills:
- Hands-on experience with knowledge graph construction, entity resolution, or semantic data modeling (RDF, OWL, SPARQL, or equivalent graph frameworks)
- Familiarity with probabilistic record linkage, identity graph approaches, or embedding-based entity matching at scale
- Experience with causal inference methods (A/B testing, synthetic control, uplift modeling)
- Experience with deduplication, enrichment, or web-to-TV linkage problems
- Background in media, ad tech, or measurement — TV viewership (ACR/STB data), digital audience modeling, cross-platform measurement (linear + CTV/OTT), or identity resolution in privacy-constrained environments
- Familiarity with the measurement and identity vendor landscape (Nielsen, Comscore, LiveRamp, The Trade Desk)