Data Engineer (PySpark)

Company: Gsstech Group

Job Description - Data Engineer (PySpark)

We are seeking a highly skilled Data Engineer with strong expertise in PySpark and the Cloudera Data Platform (CDP). The ideal candidate will design, develop, and maintain scalable data pipelines while ensuring high data quality, performance, and availability across the organisation.

This role requires hands-on experience in big data ecosystems, cloud-native technologies, and advanced data processing frameworks. You will collaborate with cross-functional teams to build reliable and high-performance data solutions that drive business insights.

Key Responsibilities

1. Data Pipeline Development

Design, develop, and maintain scalable ETL/ELT pipelines using PySpark on CDP
Ensure data integrity and reliability, and optimise pipeline performance

2. Data Ingestion

Develop ingestion frameworks to collect data from relational databases, APIs, streaming sources, and file systems
Load structured and unstructured data into Data Lake/Data Warehouse environments

3. Data Transformation & Processing

Process, cleanse, and transform large-scale datasets using PySpark
Build reusable data processing components

4. Performance Optimisation

Tune Spark jobs and Cloudera components for optimal performance
Optimise memory, partitioning, and execution plans
Reduce ETL runtime and improve cluster efficiency

5. Data Quality & Validation

Implement data validation checks and monitoring mechanisms
Ensure end-to-end data quality and adherence to governance standards

6. Automation & Orchestration

Automate workflows using tools such as Apache Oozie, Apache Airflow, or similar orchestration frameworks
Maintain CI/CD integration for data pipelines

7. Monitoring & Support

Monitor pipeline health and troubleshoot failures
Provide production support and drive continuous improvement

Required Skills & Qualifications

5+ years of experience in Data Engineering
Strong hands-on experience in PySpark
Experience working on the Cloudera Data Platform (CDP)
Strong knowledge of the Hadoop ecosystem (HDFS, Hive, Impala, YARN)
Proficiency in SQL and data modelling concepts
Experience with workflow orchestration tools (Airflow, Oozie, etc.)
Good understanding of data warehousing concepts
Experience with performance tuning and optimisation

Good to Have

Experience with cloud platforms (AWS, Azure, GCP)
Knowledge of streaming tools (Kafka, Spark Streaming)
Exposure to DevOps practices and CI/CD pipelines
Banking/Financial Services domain experience

Original job Data Engineer (PySpark) posted on GrabJobs.
