{"id":6164,"date":"2014-04-13T04:36:09","date_gmt":"2014-04-13T04:36:09","guid":{"rendered":"https:\/\/unknownerror.org\/index.php\/2014\/04\/13\/opencl-image2d_t-writing-mostly-zeros-collection-of-common-programming-errors\/"},"modified":"2014-04-13T04:36:09","modified_gmt":"2014-04-13T04:36:09","slug":"opencl-image2d_t-writing-mostly-zeros-collection-of-common-programming-errors","status":"publish","type":"post","link":"https:\/\/unknownerror.org\/index.php\/2014\/04\/13\/opencl-image2d_t-writing-mostly-zeros-collection-of-common-programming-errors\/","title":{"rendered":"OpenCL image2d_t writing mostly zeros-Collection of common programming errors"},"content":{"rendered":"<ul>\n<li><img decoding=\"async\" src=\"http:\/\/www.gravatar.com\/avatar\/acdb1f686ed51b443f0c0ad4d6ca6850?s=32&amp;d=identicon&amp;r=PG\" \/><br \/>\nAflixion<\/p>\n<p>I am trying to use OpenCL and image2d_t objects to speed up image convolution. When I noticed that the output was a blank image of all zeros, I simplified the OpenCL kernel to a basic read from the input and write to the output (shown below). With a little bit of tweaking, I got it to write a few scattered pixels of the image into the output image.<\/p>\n<p>I have verified that the image is intact up until the call to read_imageui() in the OpenCL kernel. I wrote the image to GPU memory with CommandQueue::enqueueWriteImage() and immediately read it back into a brand new buffer in CPU memory with CommandQueue::enqueueReadImage(). The result of this call matched the original input image. However, when I retrieve the pixels with read_imageui() in the kernel, the vast majority of the pixels are set to 0.<\/p>\n<p>C++ source:<\/p>\n<pre><code>int height = 112;\nint width = 9216;\nunsigned int numPixels = height * width;\nunsigned int numInputBytes = numPixels * sizeof(uint16_t);\nunsigned int numDuplicatedInputBytes = numInputBytes * 4;\nunsigned int numOutputBytes = numPixels * sizeof(int32_t);\n\ncl::size_t origin;\norigin.push_back(0);\norigin.push_back(0);\norigin.push_back(0);\ncl::size_t region;\nregion.push_back(width);\nregion.push_back(height);\nregion.push_back(1);\n\nstd::ifstream imageFile(\"hri_vis_scan.dat\", std::ifstream::binary);\ncheckErr(imageFile.is_open() ? CL_SUCCESS : -1, \"hri_vis_scan.dat\");\nuint16_t *image = new uint16_t[numPixels];\nimageFile.read((char *) image, numInputBytes);\nimageFile.close();\n\n\/\/ duplicate our single channel image into all 4 channels for Image2D\ncl_ushort4 *imageDuplicated = new cl_ushort4[numPixels];\nfor (int i = 0; i &lt; numPixels; i++)\n    for (int j = 0; j &lt; 4; j++)\n        imageDuplicated[i].s[j] = image[i];\n\ncl::Buffer imageBufferOut(context, CL_MEM_WRITE_ONLY, numOutputBytes, NULL, &amp;err);\ncheckErr(err, \"Buffer::Buffer()\");\n\ncl::ImageFormat inFormat;\ninFormat.image_channel_data_type = CL_UNSIGNED_INT16;\ninFormat.image_channel_order = CL_RGBA;\ncl::Image2D bufferIn(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, inFormat, width, height, 0, imageDuplicated, &amp;err);\ncheckErr(err, \"Image2D::Image2D()\");\n\ncl::ImageFormat outFormat;\noutFormat.image_channel_data_type = CL_UNSIGNED_INT16;\noutFormat.image_channel_order = CL_RGBA;\ncl::Image2D bufferOut(context, CL_MEM_WRITE_ONLY, outFormat, width, height, 0, NULL, &amp;err);\ncheckErr(err, \"Image2D::Image2D()\");\n\nint32_t *imageResult = new int32_t[numPixels];\nmemset(imageResult, 0, numOutputBytes);\n\ncl_int4 *imageResultDuplicated = new cl_int4[numPixels];\nfor (int i = 0; i &lt; numPixels; i++)\n    for (int j = 0; j &lt; 4; j++)\n        imageResultDuplicated[i].s[j] = 0;\n\nstd::ifstream kernelFile(\"convolutionKernel.cl\");\ncheckErr(kernelFile.is_open() ? CL_SUCCESS : -1, \"convolutionKernel.cl\");\nstd::string imageProg(std::istreambuf_iterator(kernelFile), (std::istreambuf_iterator()));\ncl::Program::Sources imageSource(1, std::make_pair(imageProg.c_str(), imageProg.length() + 1));\ncl::Program imageProgram(context, imageSource);\nerr = imageProgram.build(devices, \"\");\ncheckErr(err, \"Program::build()\");\n\ncl::Kernel basic(imageProgram, \"basic\", &amp;err);\ncheckErr(err, \"Kernel::Kernel()\");\n\nbasic.setArg(0, bufferIn);\nbasic.setArg(1, bufferOut);\nbasic.setArg(2, imageBufferOut);\n\nqueue.finish();\n\ncl_ushort4 *imageDuplicatedTest = new cl_ushort4[numPixels];\nfor (int i = 0; i &lt; numPixels; i++)\n{\n    imageDuplicatedTest[i].s[0] = 0;\n    imageDuplicatedTest[i].s[1] = 0;\n    imageDuplicatedTest[i].s[2] = 0;\n    imageDuplicatedTest[i].s[3] = 0;\n}\ndouble gpuTimer = clock();\n\nerr = queue.enqueueReadImage(bufferIn, CL_FALSE, origin, region, 0, 0, imageDuplicatedTest, NULL, NULL);\ncheckErr(err, \"CommandQueue::enqueueReadImage()\");\n\n\/\/ Output from above matches input image\n\nerr = queue.enqueueNDRangeKernel(basic, cl::NullRange, cl::NDRange(height, width), cl::NDRange(1, 1), NULL, NULL);\ncheckErr(err, \"CommandQueue::enqueueNDRangeKernel()\");\n\nqueue.flush();\n\nerr = queue.enqueueReadImage(bufferOut, CL_TRUE, origin, region, 0, 0, imageResultDuplicated, NULL, NULL);\ncheckErr(err, \"CommandQueue::enqueueReadImage()\");\n\nqueue.flush();\n\nerr = queue.enqueueReadBuffer(imageBufferOut, CL_TRUE, 0, numOutputBytes, imageResult, NULL, NULL);\ncheckErr(err, \"CommandQueue::enqueueReadBuffer()\");\n\nqueue.finish();\n<\/code><\/pre>\n<p>OpenCL kernel:<\/p>\n<pre><code>__kernel void basic(__read_only image2d_t input, __write_only image2d_t output, __global int *result)\n{\nconst sampler_t smp = CLK_NORMALIZED_COORDS_TRUE | \/\/Natural coordinates\n     CLK_ADDRESS_NONE | \/\/Clamp to zeros\n     CLK_FILTER_NEAREST; \/\/Don't interpolate\n\nint2 coord = (get_global_id(1), get_global_id(0));\n\nuint4 pixel = read_imageui(input, smp, coord);\nresult[coord.s0 + coord.s1 * 9216] = pixel.s0;\nwrite_imageui(output, coord, pixel);\n}\n<\/code><\/pre>\n<p>The coordinates in the kernel are currently mapped to (x, y) = (width, height).<\/p>\n<p>The input image is a single channel greyscale image with 16 bits per pixel, which is why I had to duplicate the channels to fit into OpenCL&#8217;s Image2D. The output after convolution will be 32 bits per pixel, which is why numOutputBytes is set to that. Also, although the width and height appear weird, the input image&#8217;s dimensions are 9216&#215;7824, so I&#8217;m only taking a portion of it to test the code first, so it doesn&#8217;t take forever.<\/p>\n<p>I added in a write to global memory after reading from the image in the kernel to see if the issue was reading the image or writing the image. After the kernel executes, this section of global memory also contains mostly zeros.<\/p>\n<p>Any help would be greatly appreciated!<\/p>\n<\/li>\n<li><img decoding=\"async\" src=\"http:\/\/www.gravatar.com\/avatar\/82d780ae9f48d23834df77d3b0d5881c?s=32&amp;d=identicon&amp;r=PG\" \/><br \/>\nananthonline<\/p>\n<p>The documentation for read_imageui states that<\/p>\n<blockquote>\n<p>Furthermore, the read_imagei and read_imageui calls that take integer coordinates must use a sampler with normalized coordinates set to <strong>CLK_NORMALIZED_COORDS_FALSE<\/strong> and addressing mode set to <strong>CLK_ADDRESS_CLAMP_TO_EDGE, CLK_ADDRESS_CLAMP or CLK_ADDRESS_NONE<\/strong>; otherwise the values returned are undefined.<\/p>\n<\/blockquote>\n<p>But you&#8217;re creating a sampler with CLK_NORMALIZED_COORDS_TRUE (but seem to be passing in non-normalized coords :S ?).<\/p>\n<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Aflixion I am trying to use OpenCL and image2d_t objects to speed up image convolution. When I noticed that the output was a blank image of all zeros, I simplified the OpenCL kernel to a basic read from the input and write to the output (shown below). With a little bit of tweaking, I got [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-6164","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/posts\/6164","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/comments?post=6164"}],"version-history":[{"count":0,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/posts\/6164\/revisions"}],"wp:attachment":[{"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/media?parent=6164"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/categories?post=6164"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/tags?post=6164"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}