

Multi-GPU Load Balancing
Run your OpenCL kernel code on all devices concurrently and efficiently, with the workload re-partitioned among the devices on each new compute method call.
Offload image resizing to all GPUs and FPGAs so the server has more capacity left for hosting websites.
Move compute-heavy SQL table joins to the C# side and let the SQL server handle the data-heavy parts.
Make particle physics programs performance-aware: even a mild overclock of one of the GPUs increases overall performance.
Write your own kernel code for multi-GPU computing easily, without low-level programming on the host side.
Device-to-device pipelining.
Built-in image resizer functions.
Built-in matrix-multiplication functions.
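The per-call work partitioning and the overclocking claim above can be illustrated with a minimal Python sketch. It assumes a simple scheme in which each device's share of the next compute call is proportional to the throughput it showed on the previous call. The function `partition_work` and its parameters are hypothetical and are not this library's API; the library's actual balancing algorithm may differ.

```python
from typing import List


def partition_work(total_items: int, prev_shares: List[int],
                   prev_times: List[float], step: int = 1) -> List[int]:
    """Split total_items across devices in proportion to measured throughput.

    Hypothetical helper for illustration only (not the library's API).
    prev_shares: work items each device processed on the previous call
    prev_times:  seconds each device needed for that share (must be > 0)
    step:        granularity; every returned share is a multiple of it
                 (for example an OpenCL local work-group size)
    """
    if step <= 0 or total_items % step != 0:
        raise ValueError("total_items must be a positive multiple of step")
    if len(prev_shares) != len(prev_times) or any(t <= 0 for t in prev_times):
        raise ValueError("need one positive timing per device")

    # Throughput of each device on the previous call, in items per second.
    throughput = [s / t for s, t in zip(prev_shares, prev_times)]
    total_tp = sum(throughput)
    if total_tp <= 0:
        raise ValueError("at least one device must have processed work")
    units = total_items // step

    # Ideal (fractional) number of step-sized units per device.
    ideal = [units * tp / total_tp for tp in throughput]
    shares = [int(x) for x in ideal]  # round down first

    # Give the remaining units to the devices with the largest remainders,
    # so the shares always add up to exactly total_items.
    leftover = units - sum(shares)
    by_remainder = sorted(range(len(ideal)),
                          key=lambda i: ideal[i] - shares[i], reverse=True)
    for i in by_remainder[:leftover]:
        shares[i] += 1
    return [u * step for u in shares]


if __name__ == "__main__":
    # Two GPUs that each handled 512 items; GPU 1 is mildly overclocked
    # and finished its half in 8 ms instead of 10 ms.
    print(partition_work(1024, [512, 512], [0.010, 0.008], step=64))
    # prints [448, 576]
```

Because the split is recomputed from the latest measurement, a device that speeds up (for example after an overclock) receives a larger share on the next call. A production balancer would likely also smooth timings over several calls and keep a minimum share per device, since in this sketch a device given zero items would never be measured again.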