Designation: BIG DATA ENGINEER

Job Description:

Your Role and Responsibilities:
- Understand data warehousing solutions and be able to work independently in such an environment
- Take responsibility for project development and delivery; experience delivering several good-sized projects
- Design, build, optimize, and support new and existing data models and ETL processes based on our clients' business requirements
- Build, deploy, and manage data infrastructure that can adequately handle the needs of a rapidly growing, data-driven organization
- Coordinate data access and security so that data scientists and analysts can easily access data whenever they need to
- Experience developing scalable Big Data applications or solutions on distributed platforms
- Able to partner with others in solving complex problems by taking a broad perspective to identify innovative solutions
- Strong skills in building positive relationships across Product and Engineering
- Able to influence and communicate effectively, both verbally and in writing, with team members and business stakeholders
- Able to quickly pick up new programming languages, technologies, and frameworks
- Experience working in Agile and Scrum development processes
- Experience working in a fast-paced, results-oriented environment
- Experience with Amazon Web Services (AWS) or other cloud platforms
- Experience working with data warehousing tools, including DynamoDB, SQL, Amazon Redshift, and Snowflake
- Experience architecting data products on streaming, serverless, and microservices architectures and platforms
- Experience working with data platforms, including EMR, Databricks, etc.
- Experience working with distributed technology tools, including Spark, Presto, Scala, Python, Databricks, and Airflow
- Developed PySpark code for AWS Glue jobs and for EMR (a Glue job sketch appears at the end of this posting). Worked on scalable distributed data systems using the Hadoop ecosystem on AWS EMR and the MapR distribution.
- Developed Python and PySpark programs for data analysis. Good working experience using Python to develop a custom framework for generating rules (much like a rules engine).
- Developed Hadoop streaming jobs in Python to integrate applications with Python API support (a streaming sketch appears at the end of this posting).
- Developed Python code to gather data from HBase and designed the solution for implementation in PySpark. Used Apache Spark DataFrames/RDDs to apply business transformations and HiveContext objects to perform read/write operations.
- Rewrote some Hive queries in Spark SQL to reduce overall batch time (a Spark SQL sketch appears at the end of this posting)

Required Technical and Professional Expertise:
- First and most important: a sound understanding of data structures and SQL concepts, and experience writing complex SQL, especially around OLAP systems (an example query appears at the end of this posting)
- Sound knowledge of an ETL tool such as Informatica (5+ years of experience) and of Big Data technologies such as the Hadoop ecosystem and its various components, along with tools including Spark, Hive, Sqoop, etc.
- In-depth knowledge of MPP/distributed systems

Preferred Technical and Professional Expertise:
- The ability to write precise, scalable, and high-performance code
- Knowledge of/exposure to data modeling with OLAP (Optional)

(ref:hirist.tech)
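As a concrete illustration of the Glue/EMR work mentioned above, here is a minimal sketch of a PySpark-based AWS Glue job. The database "raw_db", table "sales", the "amount" column, and the S3 output path are hypothetical placeholders, not the employer's actual systems.

    import sys
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext

    # Standard Glue job bootstrap: resolve arguments, build contexts.
    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read from the Glue Data Catalog (hypothetical database/table names).
    source = glue_context.create_dynamic_frame.from_catalog(
        database="raw_db", table_name="sales"
    )

    # Convert to a Spark DataFrame to apply a business transformation.
    df = source.toDF().filter("amount > 0")

    # Write the curated output to S3 as Parquet (placeholder path).
    df.write.mode("overwrite").parquet("s3://example-bucket/curated/sales/")

    job.commit()

Essentially the same DataFrame logic runs unchanged on EMR; only the Glue-specific bootstrap and DynamicFrame read differ.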
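The Hadoop streaming item above follows the classic pattern: plain Python scripts that read stdin and write tab-separated key/value pairs to stdout, wired together by the streaming jar. A minimal word-count sketch, with placeholder HDFS paths (the streaming jar location also varies by distribution):

    #!/usr/bin/env python3
    # mapper.py -- emits one "word<TAB>1" pair per token read from stdin.
    import sys

    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

    #!/usr/bin/env python3
    # reducer.py -- streaming sorts mapper output by key, so equal words
    # arrive consecutively and a running total suffices.
    import sys

    current, count = None, 0
    for line in sys.stdin:
        word, value = line.rstrip("\n").split("\t", 1)
        if word == current:
            count += int(value)
        else:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = word, int(value)
    if current is not None:
        print(f"{current}\t{count}")

Launched with something like:

    hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
        -files mapper.py,reducer.py \
        -mapper mapper.py -reducer reducer.py \
        -input /data/raw -output /data/wordcount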
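For the Hive-to-Spark-SQL rewrite mentioned above, a minimal sketch follows; the table "warehouse.orders" and its columns are hypothetical. HiveContext (named in the posting) is the Spark 1.x API; in Spark 2+, a SparkSession built with enableHiveSupport() plays the same role.

    from pyspark.sql import SparkSession

    # Hive support lets Spark SQL read and write tables in the Hive metastore.
    spark = (
        SparkSession.builder
        .appName("hive-to-spark-sql")
        .enableHiveSupport()
        .getOrCreate()
    )

    # The same HiveQL that once ran on Hive's MapReduce engine now executes
    # on Spark executors, which is where the batch-time reduction comes from.
    daily_totals = spark.sql("""
        SELECT order_date, SUM(amount) AS total_amount
        FROM warehouse.orders
        WHERE order_date >= '2024-01-01'
        GROUP BY order_date
    """)

    # Persist the result back as a managed Hive table.
    daily_totals.write.mode("overwrite").saveAsTable("warehouse.daily_totals")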
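As an example of the "complex SQL around OLAP systems" requirement, here is one query shape that recurs constantly in analytical work: a CTE plus a window function for within-group ranking. It is shown through PySpark for consistency with the sketches above, reusing the Hive-enabled session from the previous sketch; the table and columns remain placeholders.

    # Rank each customer's orders by amount within each month, keep the top 3.
    top_orders = spark.sql("""
        WITH ranked AS (
            SELECT customer_id,
                   order_id,
                   amount,
                   ROW_NUMBER() OVER (
                       PARTITION BY customer_id, date_trunc('MONTH', order_date)
                       ORDER BY amount DESC
                   ) AS rn
            FROM warehouse.orders
        )
        SELECT customer_id, order_id, amount
        FROM ranked
        WHERE rn <= 3
    """)
    top_orders.show()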