Running OpenCL kernel on multiple GPUs?


  • Vladimir

    Right now I have several algorithms running in parallel on one GPU, but all of them show the same problem when I try to execute them on several GPUs (for example, 3). The problem is that code executed on one GPU takes exactly the same amount of time on 3 GPUs (it is not faster). I tried executing with more data and tried different tasks; nothing helped. Finally, I ended up trying the simplest possible task, an element-wise sum, and still got the same awful result. That is why I don't believe it is a problem with a particular algorithm, and I feel there is a mistake in my code (or even in my approach to parallelizing code across several GPUs).
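
    To make the intended approach concrete, here is a simplified sketch of the pattern I am aiming for: one command queue per GPU, with work enqueued on every device before waiting on any of them. The names here (runOnAllDevices, queues, kernels) are placeholders, not my actual members:

    #include <CL/cl.h>
    #include <vector>
    
    // Sketch only: assumes one in-order command queue per GPU (all sharing one context)
    // and one kernel object per device with its arguments already set.
    void runOnAllDevices(cl_command_queue* queues, cl_kernel* kernels,
                         cl_uint num_devices, size_t n) {
        size_t chunk = n / num_devices;              // each GPU gets its own slice of the data
        std::vector<cl_event> events(num_devices);
        for (cl_uint d = 0; d < num_devices; ++d) {
            size_t offset = d * chunk;
            // Non-blocking enqueue; a clFinish inside this loop would serialize the GPUs
            clEnqueueNDRangeKernel(queues[d], kernels[d], 1, &offset, &chunk,
                                   NULL, 0, NULL, &events[d]);
            clFlush(queues[d]);                      // push the work to this device immediately
        }
        // Wait only once, after every device has been handed its share of the work
        clWaitForEvents(num_devices, events.data());
    }

    If the wait happened inside the loop instead, the devices would run one after another, which would match the timing I am seeing.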

    Here is the header file for my Parallel class:

    #ifndef PARALLEL_H
    #define PARALLEL_H
    
    #define __NO_STD_VECTOR // Use cl::vector and cl::string and
    #define __NO_STD_STRING // not STL versions, more on this later
    #include <CL/cl.hpp>
    
    class Parallel
    {
        public:
            Parallel();
            int executeAttachVectorsKernel(int*, int*, int*, int);
            static void getMaxWorkGroupSize(int*, int*, int*);
            virtual ~Parallel();
        protected:
        private:
            char* file_contents(const char*, int*);
            void getShortInfo(cl_device_id);
            int init(void);
            cl_platform_id platform;
            cl_device_id* devices;
            cl_uint num_devices;
            cl_command_queue* queues;
            int* WGSizes;
            int* WGNumbers;
            cl_context context;
            cl_program program;
            cl_kernel kernel;
            cl_mem input1;
            cl_mem input2;
            cl_mem output;
    };
    
    #endif // PARALLEL_H
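
    For completeness, this is roughly how I drive the class from the host side (the element count is arbitrary, and I am assuming the argument order is input1, input2, output, count):

    #include "Parallel.h"
    
    int main() {
        const int n = 1 << 20;                       // arbitrary element count
        int* a = new int[n];
        int* b = new int[n];
        int* result = new int[n];
        for (int i = 0; i < n; ++i) { a[i] = i; b[i] = 2 * i; }
    
        Parallel parallel;                           // the constructor is expected to run init() for all GPUs
        parallel.executeAttachVectorsKernel(a, b, result, n);
    
        delete[] a;
        delete[] b;
        delete[] result;
        return 0;
    }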
    

    Here is the initialization method init:

    int Parallel::init() {
        cl_int err;
    
        // Connect to the first platform
        err = clGetPlatformIDs(1, &platform, NULL);
        if (err != CL_SUCCESS) {
            cerr