Multi-GPU Load Balancing
- htugrulbuyukisik
- 14 Apr 2017
- 1 minute read
Let your OpenCL kernel code run concurrently and efficiently on all devices, with the work repartitioned on each new compute-method call.
Whenever a compute method is called with the same compute-id parameter, it retains the work-partitioning percentages from earlier calls and refines them for the next call. This way, the work is iteratively load-balanced across all OpenCL-enabled devices, and the partitioning ratios converge to a latency-minimizing point, as sketched below. This is how a developer can speed up a repeated algorithm by 1000x.
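A minimal sketch of the iterative repartitioning idea, not the library's actual API: the names `updatePartitions`, `fractions`, and `elapsedMs` are hypothetical. After each call, each device's new share of the work is made proportional to its observed throughput (share handled divided by time taken), so over repeated calls the per-device latencies converge toward the same value.

```cpp
// Hypothetical illustration of latency-driven work partitioning across devices.
// All names here are illustrative; this is not the real API of any library.
#include <vector>
#include <numeric>
#include <cstddef>
#include <iostream>

// Given the fraction of work each device handled and how long it took,
// return new fractions proportional to each device's observed throughput.
std::vector<double> updatePartitions(const std::vector<double>& fractions,
                                     const std::vector<double>& elapsedMs)
{
    std::vector<double> throughput(fractions.size());
    for (std::size_t i = 0; i < fractions.size(); ++i)
        throughput[i] = fractions[i] / elapsedMs[i];   // work per millisecond

    const double total = std::accumulate(throughput.begin(), throughput.end(), 0.0);

    std::vector<double> next(fractions.size());
    for (std::size_t i = 0; i < fractions.size(); ++i)
        next[i] = throughput[i] / total;               // renormalize to sum to 1
    return next;
}

int main()
{
    // Start with an even split across three devices (e.g. two GPUs and a CPU).
    std::vector<double> fractions = {1.0 / 3, 1.0 / 3, 1.0 / 3};

    // Assumed measured kernel times for one call: device 0 is the fastest.
    std::vector<double> elapsedMs = {5.0, 20.0, 40.0};

    fractions = updatePartitions(fractions, elapsedMs);
    for (double f : fractions) std::cout << f << ' ';  // ~0.727 0.182 0.091
    std::cout << '\n';
}
```

Repeating this update on every call with the same compute-id lets the ratios settle wherever all devices finish at roughly the same time, which is the latency-minimizing point described above.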