You will work closely with engineers, data scientists, and researchers to ensure a reliable, scalable, and efficient data infrastructure that powers advanced analytics and modeling workflows. The team's primary focus is building the systems and services that enable Quantcast to scale its distributed storage and compute platform. The ideal candidate has a passion for large-scale distributed systems (e.g., HDFS and Spark).
What you'll do:
- Contribute to the development and optimization of large-scale data workflows using technologies such as Apache Spark or similar frameworks
- Debug and resolve issues in distributed environments, including data inconsistencies and job failures
- Maintain and enhance the services that support the distributed storage and compute platform
- Assist in deploying and maintaining production systems, including CI/CD workflows
- Work to make our platform more elastic and fault-tolerant
- Provide technical input into roadmaps for the team
- Write clean, maintainable, and well-tested code
Who you are:
- BS in computer science or equivalent experience
- 1-3 years of professional software engineering experience (internships included)
- You must be work-authorized in the United States without the need for employer sponsorship.
- This is a hybrid role based in our San Francisco office. To ensure a manageable commute for in-office days, candidates must reside within a 60-mile radius of San Francisco, CA. We are not considering relocation candidates at this time.
- Familiarity with data processing frameworks such as Apache Spark, Hadoop, or similar
- Familiarity with containerization tools (e.g., Docker and Kubernetes)
- Experience with workflow orchestration tools (e.g., Airflow) is a plus
- Proficiency in Java and/or Python
- Linux system administration/automation experience
- Strong problem-solving and debugging skills
- Organized and detail-oriented
What you'll learn in this role:
- Hands-on experience with large-scale distributed systems in production.
- Exposure to real-world data science and modeling workflows.
- Best practices in building scalable and reliable data infrastructure.
- Collaboration across engineering and modeling teams.