Engineer - AI Inference Performance

Company: Huawei Canada

Job Type: Full Time

Waterloo, Ontario

Job Description - Engineer - AI Inference Performance

About the team:

The Intelligent Complex Systems Team, currently part of the Waterloo Research Centre, examines recent advancements in artificial intelligence (AI) and robotics to determine their potential for broader applications. This innovative team researches AI challenges such as matching human capabilities and ensuring the safety of collaborative AI systems.

Responsibilities:

Develop and maintain real-time and historical performance monitoring tools for AI inference workloads, including profiling tools for various AI model types (small models, LLMs, VLMs, and multimodal systems) in applications like conversational AI, video processing, and real-time analytics.
Analyze and classify inference workloads based on characteristics such as prefill and decode phases, pre/post-processing overheads, and computational complexity to develop tailored optimization strategies.
Develop performance models that consider the systematic factors of AI inference, including model size, architecture (e.g., transformers, CNNs), application-specific constraints (e.g., latency for conversational AI), and compute resource characteristics (GPU, TPU, CPU, and specialized accelerators).
Optimize inference workloads across various hardware resources by reducing latency, minimizing memory overhead, and improving throughput. Techniques include quantization, pruning, fusion, and caching. Ensure that models can scale efficiently across diverse compute platforms, from edge devices to large-scale cloud infrastructure.
Lead efforts in creating benchmarks for different types of inference tasks. Utilize tools such as NVIDIA Nsight, PyTorch Profiler, and TensorBoard to gain insights into inference performance across diverse hardware platforms.
Conduct benchmarking and performance comparisons across various hardware platforms (e.g., GPUs, TPUs, edge accelerators) to identify bottlenecks and optimization opportunities. Provide recommendations for software and hardware improvements based on inference throughput, latency, and power consumption.
Work closely with AI research, software engineering, and DevOps teams to improve the end-to-end AI inference pipeline, ensuring optimized deployments across different production environments. Collaborate with system architects to incorporate resource-aware optimizations into design practices.
Develop strategies to ensure the scalability of inference workloads in production environments, considering both model performance and resource scaling, whether in on-premises environments, cloud infrastructure, or edge computing devices.

Original job posting: Engineer - AI Inference Performance, posted on GrabJobs.
