Research Crawling Engineer

Salary: $150,000 - $225,000 yearly

Job Type: Full Time
Remote / Work from Home

Open only for candidates based in the US


Job Description - Research Crawling Engineer

The employer is a decentralized, Solana-based web-scraping network that allows users to monetize their unused internet bandwidth. By installing a browser extension, users securely share bandwidth to help AI companies crawl the web for public data, receiving Points (convertible to crypto tokens) as compensation.
They also operate a massive distributed crawler, giving them unique access to high-quality public web data at global scale.

They are hiring a Research Crawling Engineer (fully remote, USA/EU, 6-hour overlap with EST).

You will join a company at the forefront of developing a web-scale crawler and knowledge graph that improves access to public web data and extends the value of AI to the people.

As a Research Crawling Engineer, you will design and operate large-scale web data acquisition systems for research and model development. Your work will span distributed systems, scraping infrastructure, and data pipelines.



This Role Involves:

- Operating at the boundary of scale and reliability
- Adapting to constantly changing web environments
- Balancing throughput, coverage, and data quality
- Owning end-to-end data acquisition pipelines


MISSIONS


  • Design high-throughput, fault-tolerant systems for data collection (millions to billions of URLs/day)
  • Handle anti-bot systems, rate limits, and dynamic/JS-heavy sites
  • Develop pipelines for cleaning, deduplication, filtering, and normalisation
  • Construct and maintain datasets for research and model training
  • Monitor crawl performance, coverage, and data quality; iterate quickly
  • Collaborate with research teams to align data collection with modeling needs
  • Optimize infrastructure for cost, latency, and reliability

Example projects you could work on:

- Build a distributed crawler for a continuously updated, high-quality web project
- Design a system to classify and filter billions of pages for pretraining
- Extract structured data from dynamic, JS-heavy sites at scale
- Improve deduplication and quality scoring across multimodal datasets

Requirements

  • Strong programming experience in one or more of: Go, Rust, Python, Java, or C++
  • Experience working for reputable companies
  • Experience building and maintaining large-scale web crawlers or large-scale data pipelines
  • Experience designing high-throughput, fault-tolerant systems for data collection (millions to billions of URLs/day)
  • Experience handling anti-bot systems, rate limits, and dynamic/JS-heavy sites
  • Experience constructing and maintaining datasets for research and model training
  • Solid understanding of HTTP, networking, and browser behavior
  • Familiarity with distributed systems and parallel processing
  • Experience working with large datasets (TB–PB scale preferred)
  • Ability to debug unstable or adversarial environments

Preferred / Bonus:

  • Experience with NLP pipelines or dataset curation for ML
  • Familiarity with LLM pretraining data or retrieval systems
  • Experience with headless browsers (e.g., Chrome DevTools Protocol, Playwright, Puppeteer)
  • Knowledge of proxy systems, IP rotation, and large-scale request orchestration
  • Background in data quality evaluation or benchmarking
  • Experience running workloads on cloud or bare-metal infrastructure

Main Evaluation Criteria:

  • Ability to design systems that scale without degrading quality
  • Practical problem-solving under real-world constraints
  • Speed of iteration and ownership
  • Measurable improvements in data coverage, quality, or efficiency


Benefits

  • Contract: Permanent role (fully remote, USA or 6-hour overlap with EST).
  • Salary: $150k to $225k, based on experience and demonstrated ability to operate at scale, plus equity package / tokens

Recruitment process:

  • Recruiter / HR Call
  • Technical Interview
  • CEO Interview
  • Final Interview


Original job Research Crawling Engineer posted on GrabJobs ©.

Copyright © 2026 Grabjobs Pte.Ltd. All Rights Reserved.