Running OpenCL kernel on multiple GPUs?


  • Vladimir

    Right now I have several algorithms running in parallel on one GPU, but all of them show the same problem when I try to execute them on several GPUs (for example, 3). The problem is that code executed on one GPU takes exactly the same amount of time on 3 GPUs (it is not faster). I tried executing with more data and tried different tasks; nothing helped. Finally, I ended up trying the simplest possible task, an element-wise sum, and still got the same awful result. That is why I don't believe it is a problem with a particular algorithm, and I feel there is a mistake in my code (or even in my approach to parallelizing code across several GPUs).
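
    To make the intended approach concrete, here is a simplified sketch of the pattern I am aiming for: one command queue per GPU, with work enqueued on every device before waiting on any of them. The names here (runOnAllDevices, queues, kernels) are placeholders, not my actual members:

    #include <CL/cl.h>
    #include <vector>
    
    // Sketch only: assumes one in-order command queue per GPU (all sharing one context)
    // and one kernel object per device with its arguments already set.
    void runOnAllDevices(cl_command_queue* queues, cl_kernel* kernels,
                         cl_uint num_devices, size_t n) {
        size_t chunk = n / num_devices;              // each GPU gets its own slice of the data
        std::vector<cl_event> events(num_devices);
        for (cl_uint d = 0; d < num_devices; ++d) {
            size_t offset = d * chunk;
            // Non-blocking enqueue; a clFinish inside this loop would serialize the GPUs
            clEnqueueNDRangeKernel(queues[d], kernels[d], 1, &offset, &chunk,
                                   NULL, 0, NULL, &events[d]);
            clFlush(queues[d]);                      // push the work to this device immediately
        }
        // Wait only once, after every device has been handed its share of the work
        clWaitForEvents(num_devices, events.data());
    }

    If the wait happened inside the loop instead, the devices would run one after another, which would match the timing I am seeing.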

    Here is the header file for my Parallel class:

    #ifndef PARALLEL_H
    #define PARALLEL_H
    
    #define __NO_STD_VECTOR // Use cl::vector and cl::string and
    #define __NO_STD_STRING // not STL versions, more on this later
    #include <CL/cl.hpp>
    
    class Parallel
    {
        public:
            Parallel();
            int executeAttachVectorsKernel(int*, int*, int*, int);
            static void getMaxWorkGroupSize(int*, int*, int*);
            virtual ~Parallel();
        protected:
        private:
            char* file_contents(const char*, int*);
            void getShortInfo(cl_device_id);
            int init(void);
            cl_platform_id platform;
            cl_device_id* devices;
            cl_uint num_devices;
            cl_command_queue* queues;
            int* WGSizes;
            int* WGNumbers;
            cl_context context;
            cl_program program;
            cl_kernel kernel;
            cl_mem input1;
            cl_mem input2;
            cl_mem output;
    };
    
    #endif // PARALLEL_H
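
    For completeness, this is roughly how I drive the class from the host side (the element count is arbitrary, and I am assuming the argument order is input1, input2, output, count):

    #include "Parallel.h"
    
    int main() {
        const int n = 1 << 20;                       // arbitrary element count
        int* a = new int[n];
        int* b = new int[n];
        int* result = new int[n];
        for (int i = 0; i < n; ++i) { a[i] = i; b[i] = 2 * i; }
    
        Parallel parallel;                           // the constructor is expected to run init() for all GPUs
        parallel.executeAttachVectorsKernel(a, b, result, n);
    
        delete[] a;
        delete[] b;
        delete[] result;
        return 0;
    }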
    

    Here is the initialization method init:

    int Parallel::init() {
        cl_int err;
    
        // Connect to the first platform
        err = clGetPlatformIDs(1, &platform, NULL);
        if (err != CL_SUCCESS) {
            cerr