Fundamentals of GPU programming (2)


Parallelism and GPU Architecture

The CPU is optimized to process a single sequence of instructions as fast as possible, but serial performance has run into several walls: the memory wall, the power wall, and the limits of instruction-level parallelism.

Two ways to speed up
Given a process that requires time T, we can use P processors to reduce the processing time to, ideally, T/P.

  • Task parallelism: break the problem up into T ≥ P tasks and hand them off to the processors.
  • Data parallelism: break the input/output data into D ≥ P subsets and launch one thread for each piece of data.

Task Parallelism

Assign the first P tasks to the processors --> when any processor finishes its task T_n, it moves on to the next unassigned task T_{P+1} --> repeat until all tasks are completed

This has generally been the primary model for cluster computing and supercomputing.
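The refill loop above (hand out the first P tasks, then give each processor a new task as it finishes) is exactly what a worker pool does. A minimal sketch in Python; the task list and `do_task` function are hypothetical stand-ins for real work:

```python
from concurrent.futures import ThreadPoolExecutor

def do_task(n):
    # Stand-in for one unit of work; here it just squares its input.
    return n * n

P = 4                    # number of processors (worker threads)
tasks = list(range(12))  # T tasks, with T >= P

# The pool gives the first P tasks to the workers; whenever a worker
# finishes task T_n it receives the next unassigned task, until all
# T tasks are completed.
with ThreadPoolExecutor(max_workers=P) as pool:
    results = list(pool.map(do_task, tasks))

print(results)  # [0, 1, 4, 9, ..., 121], in task order
```

`pool.map` preserves task order in the results even though tasks may finish out of order.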

Data Parallelism

Launch the first P threads on different processors --> once any thread T_n completes, launch another thread --> repeat until all threads have completed
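The same scheduling idea applied to data: split the input into D ≥ P chunks and run one worker per chunk, all applying the same operation. A sketch under illustrative assumptions (the data, chunk count, and per-element operation are made up):

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk):
    # Every worker applies the same operation to its own slice of the data.
    return [x + 1 for x in chunk]

P = 4                    # number of processors
data = list(range(100))  # the full input
D = 10                   # number of data subsets, D >= P

# Contiguous split of the input into D chunks.
size = (len(data) + D - 1) // D
chunks = [data[i * size:(i + 1) * size] for i in range(D)]

# P workers process the D chunks; as each chunk finishes, the next is started.
with ThreadPoolExecutor(max_workers=P) as pool:
    partials = pool.map(process_chunk, chunks)

result = [y for part in partials for y in part]
print(result[:5])  # [1, 2, 3, 4, 5]
```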

SIMD -- Single Instruction, Multiple Data

  • All cores execute the same instruction, but each operates on different data.
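A CPU-side analogy for this execution model: in NumPy's elementwise operations, one instruction (a vectorized add) is applied across all elements, with each lane seeing different operands. This is only an illustration of the SIMD idea, not GPU code:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

# One "instruction" (elementwise add) applied to every element at once;
# each lane combines a different pair of operands.
c = a + b
print(c)  # [11 22 33 44]
```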

Guiding principles

The CPU is always faster for a serial process or for small data. On the GPU, every instruction counts: because cores execute in lockstep, if statements or loops with a variable number of iterations can cause stalls while some cores sit idle waiting for others.
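Why a variable-iteration loop stalls a SIMD group: lanes run in lockstep, so the whole group pays for the slowest lane. A toy cost model, assuming a hypothetical group of 8 lanes that each need a different iteration count:

```python
# Iteration counts needed by each of 8 SIMD lanes (hypothetical workload);
# one lane needs far more iterations than the rest.
iters = [1, 2, 1, 3, 1, 2, 16, 1]

# Independent processors: total work is just the sum of the iterations.
independent_work = sum(iters)            # 27 lane-cycles

# SIMD lockstep: every lane ticks until the slowest lane finishes, so the
# group occupies all 8 lanes for max(iters) cycles.
lockstep_work = max(iters) * len(iters)  # 16 * 8 = 128 lane-cycles

print(independent_work, lockstep_work)   # 27 128
```

Most of the lockstep cost is idle lanes waiting on the one slow lane, which is why divergent control flow is expensive on GPUs.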
