Shape the Future with Dun & Bradstreet

At Dun & Bradstreet, we believe data has the power to create a better tomorrow. As a global leader in business decisioning data and analytics, we help companies worldwide grow, manage risk, and innovate. For over 180 years, businesses have trusted us to turn uncertainty into opportunity. We’re a diverse, global team that values creativity, collaboration, and bold ideas. Are you ready to make an impact and help shape what’s next? Join us! Explore opportunities at dnb.com/careers.

We are seeking an experienced Principal Data Engineer to help build the next generation of our identity graph and data platform. This role is focused on designing, developing, and optimizing large-scale data pipelines and systems that ingest, process, and unify complex datasets from diverse sources (web, mobile, AdTech, government, and proprietary data). This is a highly hands-on, technical role for someone who can quickly understand existing systems, operate independently, and deliver high-quality solutions at scale. The ideal candidate is deeply analytical, detail-oriented, and experienced with building performant data pipelines and systems handling billions of records.

Key Responsibilities:

Design, build, and optimize scalable data pipelines and ETL/ELT workflows for large, complex datasets.

Design and implement foundational data architecture supporting identity resolution and ID graph systems.

Develop and enhance systems supporting identity resolution and ID graph construction (data ingestion, normalization, matching, and deduplication).

Process and unify multi-source datasets (cookies, device IDs, behavioral data, third-party and proprietary data).

Write efficient, testable, and maintainable code using Python and SQL for large-scale data processing.

Optimize data models, queries, and storage strategies for performance, scalability, and cost efficiency.

Build and maintain data validation, monitoring, and alerting systems to ensure data quality and reliability.

Troubleshoot, debug, and improve existing data pipelines and infrastructure.

Own and drive complex data problems end-to-end, from initial design through production deployment.

Make and influence key technical decisions related to data architecture, scalability, and system design.

Collaborate with data, platform, DevOps, and product teams to deliver scalable, production-ready solutions.

Translate business and product requirements into practical, performant data solutions.

Document data pipelines, systems, and workflows clearly.

Continuously improve system performance, data quality, and pipeline resilience.

Contribute to building new capabilities that improve how customers understand and leverage data insights.

Key Skills:

8-12+ years of hands-on experience in data engineering or large-scale data processing.

Proven experience building and maintaining production-grade data pipelines and distributed systems.

Demonstrated experience architecting and delivering large-scale data platforms or mission-critical data systems.

Strong expertise in SQL and relational databases (Postgres, BigQuery, Redshift, etc.) and in Python for data processing and analysis.

Experience with Google Cloud Platform (BigQuery, Dataflow, Pub/Sub, Cloud Storage, Cloud Functions) and/or AWS (S3, Redshift, EMR, RDS).

Experience working with large-scale datasets (hundreds of millions to billions of records).

Strong understanding of data modeling, partitioning, indexing, and query optimization.

Experience with distributed data processing and parallelization techniques.

Experience moving large volumes of data across systems and architectures.

Familiarity with CI/CD, containerization, and orchestration tools (Docker, Kubernetes, GitHub Actions, etc.).

Strong debugging and troubleshooting skills in complex data environments.

Experience with version control (Git) and Agile tools (Jira, Confluence, etc.).

Highly analytical with strong attention to detail and a data-driven mindset.

Ability to hit the ground running, quickly understand systems, and deliver independently.

Comfortable working in a remote, fast-paced, and collaborative environment.

Proven ability to drive system design and implementation.

Preferred:

Experience with identity graphs, entity resolution, or record linkage systems.

Background in AdTech, digital identity, cookies, or audience data platforms.

Experience with real-time or streaming data systems.

Familiarity with data quality, observability, and monitoring frameworks.

Experience with data visualization tools (Looker, Tableau, Power BI).

Knowledge of data privacy, compliance, and governance considerations.

Experience with modern data platforms such as Snowflake and Databricks.

Exposure to AI/ML technologies, including experience working with or integrating agentic frameworks.

All Dun & Bradstreet job postings can be found at https://jobs.lever.co/dnb. Official communication from Dun & Bradstreet will come from an email address ending in @dnb.com.

Notice to Applicants: Please be advised that this job posting page is hosted and powered by Lever, a subsidiary of Employ Inc. Your use of this page is subject to Employ's Privacy Notice and Cookie Policy, which governs the processing of visitor data on this platform.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please visit https://bit.ly/3LMn4CQ.

Principal Data Engineer (R-19440)
