Job Responsibilities:
- Architect, design, and optimize big data and AI/ML solutions on the Databricks platform.
- Develop and implement highly scalable ETL pipelines for processing large datasets.
- Lead the adoption of Apache Spark for distributed data processing and real-time analytics.
- Define and enforce data governance, security policies, and compliance standards.
- Optimize data lakehouse architectures for performance, scalability, and cost-efficiency.
- Collaborate with data scientists, analysts, and engineers to enable AI/ML-driven insights.
- Monitor Databricks clusters and jobs, and troubleshoot performance bottlenecks.
- Automate data workflows using CI/CD pipelines and infrastructure-as-code practices.
- Ensure data integrity, quality, and reliability across all data processes.
Basic Qualifications:
- Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field.
- 8+ years of hands-on experience in data engineering, including at least 5 years as a Databricks architect working with Apache Spark.
- Proficiency in SQL, Python, or Scala for data processing and analytics.
- Extensive experience with cloud platforms (AWS, Azure, or GCP) for data engineering.
- Strong knowledge of ETL frameworks, data lakes, and Delta Lake architecture.
- Hands-on experience with CI/CD tools and DevOps best practices.
- Familiarity with data security, compliance, and governance best practices.
- Strong problem-solving and analytical skills in a fast-paced environment.
Preferred Qualifications:
- Databricks certifications (e.g., Databricks Certified Data Engineer, Spark Developer).
- Hands-on experience with MLflow, Feature Store, or Databricks SQL.
- Exposure to Kubernetes, Docker, and Terraform.
- Experience with streaming data architectures (Kafka, Kinesis, etc.).
- Strong understanding of business intelligence and reporting tools (Power BI, Tableau, Looker).
- Prior experience working with retail, e-commerce, or ad-tech data platforms.