Responsibilities
• Integrate data from multiple sources, such as databases, APIs, or streaming platforms, to provide a unified view of the data
• Implement data quality checks and validation processes to ensure the accuracy, completeness, and consistency of data
• Identify and resolve data quality issues, monitor data pipelines for errors, and implement data governance and data quality frameworks
• Enforce data security and compliance with relevant regulations and industry-specific standards
• Implement data access controls and encryption mechanisms, and monitor data privacy and security risks
• Optimize data processing and query performance by tuning database configurations, implementing indexing strategies, and leveraging distributed computing frameworks
• Optimize data structures for efficient querying and develop data dictionaries and metadata repositories
• Identify and resolve performance bottlenecks in data pipelines and systems
• Collaborate with cross-functional teams, including data scientists, analysts, and business stakeholders
• Document data pipelines, data schemas, and system configurations, making it easier for others to understand and work with the data infrastructure
• Monitor data pipelines, databases, and data infrastructure for errors, performance issues, and system failures
• Set up monitoring tools, alerts, and logging mechanisms to proactively identify and resolve issues to ensure the availability and reliability of data
• A software engineering background is a plus
Requirements
• Bachelor's or master's degree in computer science, information technology, data engineering, or a related field
• Strong knowledge of databases, data structures, and algorithms
• Proficiency in working with data engineering tools and technologies, including knowledge of data integration tools (e.g., Apache Kafka, Azure IoT Hub, Azure Event Hubs), ETL/ELT frameworks (e.g., Apache Spark, Azure Synapse), big data platforms (e.g., Apache Hadoop), and cloud platforms (e.g., Amazon Web Services, Google Cloud Platform, Microsoft Azure)
• Expertise in working with relational databases (e.g., MySQL, PostgreSQL, Azure SQL, Azure Data Explorer) and data warehousing concepts
• Familiarity with data modeling, schema design, indexing, and optimization techniques is valuable for building efficient and scalable data systems
• Proficiency in languages such as Python, SQL, KQL, Java, and Scala
• Experience with scripting languages like Bash or PowerShell for automation and system administration tasks
• Strong knowledge of data processing frameworks like Apache Spark, Apache Flink, or Apache Beam for efficiently handling large-scale data processing and transformation tasks
• Understanding of data serialization formats (e.g., JSON, Avro, Parquet) and data serialization libraries (e.g., Apache Avro, Apache Parquet) is valuable
• Experience with CI/CD and GitHub, demonstrating the ability to work in a collaborative and iterative development environment
• Experience with visualization tools (e.g., Power BI, Plotly, Grafana, Redash) is beneficial
Preferred Skills & Characteristics
A self-motivated, goal-oriented professional who consistently displays dynamic, independent work habits, is passionate about a growth mindset, and has a "can do" attitude. Self-driven and proactive in keeping up with new technologies and programming.