What you'll do
- Design and implement systems to collect, extract, and normalize external data from a variety of sources.
- Collaborate with researchers and analysts to identify new sources of valuable company data and define integration strategies.
- Build robust, scalable pipelines that ingest structured and semi-structured data into our database.
- Ensure high levels of accuracy, coverage, and freshness across incoming data streams.
- Contribute to the evolution of our data platform and internal tooling.
- Improve system reliability, observability, and performance over time.
Who you are
- 3+ years of experience as a backend or full-stack software engineer, ideally working with data ingestion or ETL systems.
- Intimate knowledge of how to crawl the internet at scale.
- Strong programming skills, especially in Python.
- Experience working with structured and unstructured data from diverse external systems.
- Comfortable debugging complex issues involving networking, content rendering, or inconsistent source data.
- Proficient with SQL and relational databases.
- A clear communicator who collaborates effectively with both technical and non-technical teammates.
- Passionate about turning raw data into meaningful insight, and eager to work on technically nuanced challenges.
Ideally you'll have
- Familiarity with headless browser automation or techniques for collecting data from dynamic content sources.
- Expertise in the architecture, technologies, and tools that run the modern internet, such as DNS, networking, CDNs, WAFs, proxies, and reverse proxies.
- Experience with event-driven architecture.
- Eagerness to incorporate new technologies and validate their usefulness using structured experiments and thorough testing.
- Experience building health monitoring and observability tools for consumption by automated systems, engineers, and non-technical stakeholders.