Architectural Leadership & Spec: Design and specify large-scale storage architectures for specialized HPC workloads. This includes evaluating and selecting between traditional parallel file systems (Lustre, GPFS/Spectrum Scale) and modern, high-concurrency platforms such as VAST Data or WEKA.
Hybrid Cloud Integration: Build and operate automated data pipelines between on-premises infrastructure and cloud platforms (AWS, Azure, or GCP).
NetApp & GIS Collaboration: Partner with the GIS Enterprise Storage team to leverage existing NetApp ecosystems. Implement NetApp FlexCache to provide globally distributed access to datasets, ensuring low-latency performance for remote HPC nodes.
Data Mobility & Automation: Design and implement automated replication, auto-migration, and tiered storage workflows so that data is available where it is needed without manual intervention.
Scalability & Performance Engineering: Perform rigorous benchmarking (IOR, FIO, MDTest) to identify and eliminate bottlenecks. Ensure the storage fabric (InfiniBand HDR/NDR and NVMe-over-Fabrics) is tuned for maximum throughput and IOPS.
Availability & Disaster Recovery: Design high-availability (HA) configurations that eliminate single points of failure, ensuring mission-critical research and engineering data is always accessible.
Experience: 10+ years in storage engineering, with at least 6 years dedicated to HPC environments. Experience with cluster computing; server, storage, and networking components; HPC schedulers such as SLURM; and tools such as xCAT and Warewulf.
Parallel File System Mastery: Expert-level experience in deploying and tuning Lustre or IBM Spectrum Scale (GPFS) at scale (petabyte+ environments).
Modern Storage Expertise: Proven track record with VAST Data or WEKA, specifically in high-performance NVMe/flash-first environments.
NetApp Advanced Skills: Proficiency in NetApp ONTAP, specifically for hybrid cloud workflows using FlexCache, SnapMirror, and FlexClone.
Infrastructure as Code (IaC): Ability to manage storage configurations using automation tools such as Ansible, Terraform, or SaltStack. Experience in programming and debugging using Python and YAML.
Networking Knowledge: Deep understanding of HPC interconnects, including InfiniBand, RoCE, and high-speed Ethernet (100GbE+).
Security Mindset: Demonstrated ability to implement storage security best practices, including Kerberos, LDAP integration, and SEC-compliant data immutability.
Experience with the Container Storage Interface (CSI) for running HPC workloads in Kubernetes.
Relevant certifications (e.g., NetApp Certified Data Administrator, VAST Data Specialist, or Red Hat Certified Architect).
Experience with S3 object storage integration for long-term archival and data tiering.