The Senior / Principal Data Engineer will design, develop, and maintain scalable data platform and analytics solutions across data lakes and operational databases. This role requires hands-on expertise in Azure Databricks, Azure SQL, Python/PySpark, and notebooks, along with a strong understanding of data modeling, ETL/ELT best practices, and CI/CD automation in Azure DevOps. The ideal candidate will have a proven record of building robust, efficient, and secure data pipelines that enable analytics, reporting, and AI/ML solutions, preferably in the life sciences, clinical research, or healthcare domains.
Data Architecture & Engineering
- Design and implement end-to-end data pipelines using Azure Databricks, Azure Data Factory, and ADLS Gen2.
- Build scalable and performant data models for data lakes (Medallion architecture), data warehouses, and operational systems.
- Develop ETL/ELT frameworks for ingestion from APIs, relational sources, flat files, and third-party systems (e.g., Dynamics 365, Veeva, EDC).
- Optimize data transformations, partitioning, and Delta Lake performance for analytics workloads.
Data Integration & Automation
- Leverage Python and PySpark for data ingestion, cleansing, enrichment, and advanced transformations.
- Implement CI/CD pipelines for data workflows using Azure DevOps and Git, including automated testing, deployment, and monitoring.
- Develop and integrate RESTful APIs for cross-system data exchange and automation.
Analytics Enablement
- Collaborate with the BI team to ensure clean, high-quality, and accessible data for the Power BI platform.
- Support semantic modeling, metric layer design, and data governance best practices.
- Enable advanced analytics by provisioning data for ML/AI initiatives and predictive insights.
Cross-Functional Collaboration
- Collaborate with product/system owners, analysts, and business stakeholders to translate analytical requirements into technical data solutions.
- Drive best practices in Agile development, version control, and DevOps workflows.
Education Requirements and Qualifications
Qualifications
- Bachelor’s degree in Computer Science, Information Systems, Engineering, or related field (Master’s preferred).
- 5–8 years of relevant experience building and maintaining data solutions (data lakes, data warehouses, operational databases).
- Expert-level proficiency in Azure Databricks, PySpark, SQL, and Azure DevOps.
- Proven experience with Azure Data Factory, ADLS Gen2, and Azure SQL.
- Working knowledge of CI/CD automation, version control (Git), and infrastructure as code (ARM or Terraform).
- Experience with Power BI or similar analytics platforms (Tableau, Looker) required; experience with Snowflake, Redshift, or Synapse Analytics is a plus.
- Strong analytical, debugging, and performance-tuning skills.
- Experience in life sciences or healthcare industries is a strong plus.
Skills
Core Expertise: Expert-level proficiency in Databricks, PySpark, SQL, and Azure DevOps.
Data Engineering: Data modeling, Delta Lake optimization, ETL/ELT design, distributed processing.
Integration & Automation: Azure Data Factory, REST APIs, CI/CD pipelines, Git branching strategies.
Analytics & BI: Power BI (or Tableau), semantic layer design, DAX/SQL tuning.
Cloud & DevOps: Azure ecosystem (ADF, ADLS, Azure SQL, Synapse), Infrastructure as Code.
Data Governance & Quality: Metadata management, data validation frameworks, logging and monitoring.
Soft Skills: Good communication, mentoring, Agile teamwork, analytical thinking, collaboration.