```python
from theano import function, config, shared, tensor
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], tensor.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
if numpy.any([isinstance(x.op, tensor.Elemwise) and
              ('Gpu' not in type(x.op).__name__)
              for x in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')
```
Performance:
CPU only (1.662902 s)
```
$ THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python check1.py
[Elemwise{exp,no_inplace}(<TensorType(float32, vector)>)]
Looping 1000 times took 1.662902 seconds
Result is [ 1.23178029 1.61879337 1.52278066 ..., 2.20771813 2.29967761 1.62323284]
```
Using the cpu
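For context, the same element-wise exp loop can be timed in plain NumPy, without Theano in the picture at all. This is a minimal sketch (not part of the original check1.py) that reuses the same vector size and seed, so its first result value should match the transcripts above; absolute timings will of course differ by machine:

```python
import time

import numpy

# Same problem size and seed as check1.py
vlen = 10 * 30 * 768
iters = 1000

rng = numpy.random.RandomState(22)
x = numpy.asarray(rng.rand(vlen), dtype="float32")

t0 = time.time()
for _ in range(iters):
    r = numpy.exp(x)  # element-wise exp, computed on the CPU
t1 = time.time()
print("NumPy: looping %d times took %f seconds" % (iters, t1 - t0))
print("First values: %s" % (r[:3],))
```

Comparing this number against the Theano CPU run gives a rough sense of how much overhead the Theano function-call machinery adds for such a small kernel.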
OpenCL over CPU (1.057008 s)
```
$ THEANO_FLAGS=mode=FAST_RUN,device=opencl0:0,floatX=float32 python check1.py
Mapped name None to device opencl0:0: Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
[GpuElemwise{exp,no_inplace}(<GpuArrayType<None>(float32, (False,))>), HostFromGpu(gpuarray)(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 1.057008 seconds
Result is [ 1.23178029 1.61879325 1.52278078 ..., 2.20771813 2.29967737 1.62323272]
```
Using the cpu
OpenCL over Intel GPU (0.554572 s)
```
$ THEANO_FLAGS=mode=FAST_RUN,device=opencl0:1,floatX=float32 python check1.py
Mapped name None to device opencl0:1: Iris Pro
[GpuElemwise{exp,no_inplace}(<GpuArrayType<None>(float32, (False,))>), HostFromGpu(gpuarray)(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 0.554572 seconds
Result is [ 1.23178029 1.61879337 1.52278066 ..., 2.20771813 2.29967761 1.62323284]
```
Using the Intel gpu
OpenCL over AMD GPU (0.470640 s)
```
$ THEANO_FLAGS=mode=FAST_RUN,device=opencl0:2,floatX=float32 python check1.py
Mapped name None to device opencl0:2: AMD Radeon R9 M370X Compute Engine
[GpuElemwise{exp,no_inplace}(<GpuArrayType<None>(float32, (False,))>), HostFromGpu(gpuarray)(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 0.470640 seconds
Result is [ 1.23178029 1.61879337 1.52278066 ..., 2.20771813 2.29967761 1.62323284]
```
Using the AMD gpu
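Instead of passing `THEANO_FLAGS` on every invocation, the same settings can be persisted in a `~/.theanorc` file. This is a sketch of one possible configuration; `opencl0:2` is the AMD GPU on this particular machine, so substitute whichever device index maps to the hardware you want:

```ini
[global]
mode = FAST_RUN
device = opencl0:2
floatX = float32
```

Command-line `THEANO_FLAGS` override `.theanorc`, which is convenient for one-off comparisons like the runs above.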
If you are using the stable release of Theano (0.8.2), you will hit this error when you try to use an OpenCL device:

```
RuntimeError: ('Wrong major API version for gpuarray:', -9998, 'Make sure Theano and libgpuarray/pygpu are in sync.')
```

To fix this, install the development version of Theano.
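One way to get the development version (an assumption on my part: installing straight from the GitHub repositories with pip; check each project's README for platform-specific build steps):

```shell
# Install libgpuarray/pygpu and Theano from their development branches,
# so the two stay in sync. libgpuarray contains a C library and may need
# a manual cmake build on some platforms -- see its README.
pip install --upgrade git+https://github.com/Theano/libgpuarray.git
pip install --upgrade git+https://github.com/Theano/Theano.git
```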