Synchronization Functions

Each workitem in a kernel can freely use __private memory without any synchronization, since no workitem can see another workitem's __private memory.

Even though OpenCL abstracts the hardware away from the developer, it exposes four different memory levels for vendors to implement: __global, __local, __constant and __private.

The __constant data of a kernel does not change while the kernel runs, so it doesn't need any synchronization between the workitems of the kernel either.
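To make the four memory levels concrete, here is a minimal sketch of a kernel that touches each of them. The kernel name scale_by_table and its parameter names are made up for illustration. It assumes that data holds at least one element per workitem, that coeffs holds at least one element, and that scratch was sized by the host to one float per workitem of a workgroup.

```c
// Hypothetical kernel, for illustration only: one qualifier per memory level.
__kernel void scale_by_table(__global float *data,      // shared by all workitems of the kernel
                             __constant float *coeffs,  // read-only while the kernel runs
                             __local float *scratch)    // shared only inside one workgroup
{
    size_t gid = get_global_id(0);  // index within the whole kernel
    size_t lid = get_local_id(0);   // index within this workgroup

    // Variables declared inside the kernel body live in __private memory;
    // the qualifier is spelled out here only to make that visible.
    __private float x = data[gid];

    // Each workitem writes and reads only its own scratch slot,
    // so nothing here needs synchronization yet.
    scratch[lid] = x * coeffs[0];
    data[gid] = scratch[lid];
}
```

Nothing in this sketch needs synchronization because no workitem ever reads a location that another workitem writes. The "what if" questions start as soon as that stops being true.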

__global memory is readable and writable by all workitems of a kernel, and __local memory by all workitems of the same workgroup. This should make a developer start asking "what if" whenever data is written by one workitem and read by another in the same kernel, because the order of execution of workitems is not predictable. Synchronization functions exist not to make the order predictable but to make the data predictable. Similar to atomic functions, they have __global and __local versions (selected with the CLK_GLOBAL_MEM_FENCE and CLK_LOCAL_MEM_FENCE flags), but they synchronize only in-group (within a workitem's own workgroup). The kernel-side synchronization functions are barrier() and mem_fence(), the latter with read_mem_fence() and write_mem_fence() variants. From a workitem's point of view, barrier() is for its workgroup: every workitem of the group has to reach it before any of them continues. mem_fence() is for the workitem itself: it only orders that workitem's own loads and stores.

Note: Within the same kernel, there is no synchronization between workgroups or compute units.
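A minimal sketch of the barrier() case: each workgroup reverses its own slice of an array through __local memory. The kernel name reverse_in_group and the parameter names are made up for illustration. It assumes that data holds at least one element per workitem and that tile was sized by the host to one float per workitem of a workgroup.

```c
// Hypothetical kernel, for illustration only: in-group reversal using barrier().
__kernel void reverse_in_group(__global float *data, __local float *tile)
{
    size_t gid = get_global_id(0);   // index within the whole kernel
    size_t lid = get_local_id(0);    // index within this workgroup
    size_t n   = get_local_size(0);  // number of workitems in this workgroup

    // Step 1: every workitem stores its own element into the shared tile.
    tile[lid] = data[gid];

    // Step 2: wait until all workitems of this workgroup have written the tile
    // and make those __local writes visible to the whole group.
    barrier(CLK_LOCAL_MEM_FENCE);

    // Step 3: now it is safe to read a slot written by another workitem.
    data[gid] = tile[n - 1 - lid];
}
```

Without the barrier, step 3 could read a tile slot before its owner has written it. The reversal deliberately stays inside each workgroup, since a workitem has no way to wait for workitems of another workgroup within the same kernel.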

Apart from kernel-side synchronization, there are host-side synchronization methods too. The easiest but heaviest one is clFinish(). It blocks until everything on a command queue has completed, so that the global memory those commands touched is up to date when the host reads it. Another method is waiting on events with clWaitForEvents(). This makes it easier to wait for multiple command queues or even multiple devices (if they are in the same OpenCL context). A workitem's final frontier is its kernel: when a kernel ends, the next kernel in an in-order queue sees the most up-to-date data generated by it. More detailed info about memory visibility for command-queues can be found here:
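Separately from that reference, the two host-side waits described above can be sketched in a few lines of host code. This is only a sketch under assumptions: the helper name run_and_wait and its parameters are made up, both queues are assumed to belong to the same OpenCL context, and both kernels are assumed to already have their arguments set.

```c
#include <CL/cl.h>   /* on macOS the header is <OpenCL/opencl.h> */

/* Hypothetical helper: runs one kernel on each of two queues, then waits. */
cl_int run_and_wait(cl_command_queue queue_a, cl_kernel kernel_a,
                    cl_command_queue queue_b, cl_kernel kernel_b,
                    size_t global_size)
{
    cl_event done[2];
    cl_int err;

    /* Enqueue one kernel per queue and keep an event for each of them. */
    err = clEnqueueNDRangeKernel(queue_a, kernel_a, 1, NULL, &global_size,
                                 NULL, 0, NULL, &done[0]);
    if (err != CL_SUCCESS) return err;

    err = clEnqueueNDRangeKernel(queue_b, kernel_b, 1, NULL, &global_size,
                                 NULL, 0, NULL, &done[1]);
    if (err != CL_SUCCESS) { clReleaseEvent(done[0]); return err; }

    /* Ask both queues to submit their commands to the devices. */
    clFlush(queue_a);
    clFlush(queue_b);

    /* Lighter wait: block only until these two commands have completed,
       even though they sit on different queues. */
    err = clWaitForEvents(2, done);

    clReleaseEvent(done[0]);
    clReleaseEvent(done[1]);
    if (err != CL_SUCCESS) return err;

    /* Heavier wait: block until everything enqueued on queue_a so far is done. */
    return clFinish(queue_a);
}
```

Here the clFinish() call is redundant after the event wait and is shown only for comparison. In real code one of the two is usually enough: clWaitForEvents() when only specific commands matter, clFinish() when the whole queue has to drain.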

© 2017 by Huseyin Tugrul BUYUKISIK.
