Senior PySpark Developer
Job Summary
We are looking for a Senior PySpark Developer with strong hands-on experience working on the Cloudera stack. The role involves building and maintaining large-scale data processing and ETL pipelines using PySpark. Experience in finance or banking environments is a plus.
Experience: 5+ years
Key Responsibilities
• Develop and maintain PySpark-based ETL pipelines on the Cloudera platform
• Write efficient PySpark and Spark SQL transformations
• Process large volumes of structured and semi-structured data
• Optimize Spark jobs for performance and scalability
• Create and manage Hive tables and partitions
• Implement data validation, reconciliation, and error handling
• Support production deployments and troubleshoot issues
• Work closely with data engineers and business teams
Primary Skill (Mandatory)
• Strong PySpark experience (5+ years)
o DataFrame API
o Spark SQL
o Performance tuning (joins, partitions, shuffles)
o Batch data processing
Required Skills
• Strong Python programming skills
• Good SQL knowledge
• Experience with large-scale data processing
• Experience with Git or a similar version control tool
Preferred / Nice-to-Have Skills (Not Mandatory)
• Kafka for streaming or data ingestion
• Starburst for distributed SQL querying
• Oracle database integration or data extraction
• Workflow tools such as Airflow or Oozie
• Experience working with the Cloudera stack (HDFS, Hive, YARN, Spark)
Domain Experience (Plus)
• Finance or Banking domain experience
• Exposure to transactional, risk, or regulatory data
Education
• Bachelor’s degree in Computer Science or related field (or equivalent professional experience)