Make programs 5 times faster - the easy way. Now on every GPU!
Download demo
Download OpenCLlib.zip
Download source
Works on multiple GPUs and any other OpenCL device now. Please
comment with your result from the demo.
Introduction
This project is going to show you that a modern Core i7 is probably the slowest piece of
programmable hardware you have in your PC. Modern Quad Core CPUs have about 6 Gflops,
whereas modern GPUs have about 6 Tflops of computational power.
This project can dynamically execute simple programs written in a C dialect (OpenCL C) on
your GPU, CPU or both. They are compiled and executed at run time.
This is also going to show that GPU programming does not have to be hard. In fact, you
only need a little bit of basic programming skill for this project.
If you want to skip the introduction and dive into using it, feel free to download the
source code.
If you have a piece of code that can run concurrently and you want to speed it up, this is
the right project for you. Ideally, all your data fits into float or other numeric arrays.
Keep in mind that this project uses OpenCL. Unlike CUDA, it runs on any GPU (AMD, Nvidia,
Intel) and also on the CPU. So any program you write can be used on any device (even
phones).
OpenCL code usually runs faster than C# on large arrays and is really easy and quick to use
with this project.
(See example below) The overhead for you as a developer is practically zero. Just write a
function and you are done. Don't think about compute devices, P/Invoke, marshalling and
other stuff.
Imagine you wanted to know all prime numbers from 2 to 10^8. Here is a simple
implementation in C# (yes, I know there are much better algorithms to calculate primes).
C#
static void IsPrimeNet(int[] message)
{
    Parallel.ForEach(message, (number, state, index) =>
    {
        int upperlimit = (int)Math.Sqrt(number);
        for (int i = 2; i <= upperlimit; i++)
        {
            if (message[index] % i == 0) // no lock needed: every index is independent
            {
                message[index] = 0;
                break;
            }
        }
    });
}
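Here is the same check written as an OpenCL C kernel named GetIfPrime:
C++
kernel void GetIfPrime(global int* message)
{
    int index = get_global_id(0); // index of the element this work item handles

    int upperl = (int)sqrt((float)message[index]);
    for (int i = 2; i <= upperl; i++)
    {
        if (message[index] % i == 0)
        {
            //printf(" %d / %d\n", index, i);
            message[index] = 0;
            return;
        }
    }
    //printf(" %d", index);
}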
OpenCL wraps your kernel (the piece of code to run) in a loop. For simple 1D arrays, you
get the current index by calling get_global_id(0). The upper bound of the index is
passed when you invoke the kernel.
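To picture the "kernel in a loop" idea, here is a minimal plain-C sketch that emulates a 1D launch on the CPU. The names `get_if_prime_item` and `launch_1d` are hypothetical and exist only for this illustration; they are not part of OpenCL or of this project. The index is passed as a parameter instead of being read from get_global_id(0), and the divisor loop tests `i * i <= n` instead of calling sqrt so the sketch needs no math library. A real device runs the work items in parallel and in no fixed order.

```c
#include <assert.h>
#include <stddef.h>

/* Per-item work: same logic as the GetIfPrime kernel body, with the index
   passed in explicitly (assumption: stands in for get_global_id(0)). */
static void get_if_prime_item(int *message, size_t index)
{
    int n = message[index];
    for (int i = 2; (long long)i * i <= n; i++) {
        if (n % i == 0) {
            message[index] = 0; /* composite: overwrite with 0 */
            return;
        }
    }
}

/* Hypothetical stand-in for a 1D launch: call the item function once for
   every index in [start, start + count). */
static void launch_1d(int *message, size_t start, size_t count)
{
    assert(message != NULL);
    for (size_t index = start; index < start + count; index++) {
        get_if_prime_item(message, index);
    }
}
```

Calling `launch_1d(data, 0, length)` on an array holding 2, 3, 4, ... leaves the primes in place and zeroes the composites, which is what the kernel does to the array on the device.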
Instead of int[], you write int* and so on. You can also pass every other primitive type
(int, float...).
You have to pass arguments in the same order in which you declared them. You can also call
printf inside your kernel to debug later. You can define as many methods as you like
inside the kernel. You pick the entry point later by calling Invoke("Name Here").
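OpenCL C is close to C (it is based on C99), but it has some restrictions (for example, no
function pointers and no recursion) and it adds some special data types (such as vector
types like float4).
For in-depth information, check out this link.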
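C#
static void Main(string[] args)
{
    // ...
}

// The kernel source is kept as a string and compiled at run time.
static string IsPrime
{
    get
    {
        return @"
        kernel void GetIfPrime(global int* message)
        {
            int index = get_global_id(0);

            int upperl=(int)sqrt((float)message[index]);
            for(int i=2;i<=upperl;i++)
            {
                if(message[index]%i==0)
                {
                    //printf("" %d / %d\n"",index,i );
                    message[index]=0;
                    return;
                }
            }
            //printf("" % d"",index);
        }";
    }
}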
With this, you can dynamically compile and invoke OpenCL kernels. You can also change
your accelerator (CPU, GPU) after you have loaded the kernel.
If you want to use every bit of computational power of your PC, you can use the class
MultiCL. This class works by splitting your work into N parts. Every part is pushed onto the
GPU or CPU whenever possible. This way, you get the maximum performance from your PC.
You can also track how much work is already done, which is not possible with EasyCL.
C#
static void Main(string[] args)
{
    int[] Primes = Enumerable.Range(2, 1000000).ToArray();
    int N = 200;
    MultiCL cl = new MultiCL();
    cl.ProgressChangedEvent += Cl_ProgressChangedEvent1;
    cl.SetKernel(IsPrime, "GetIfPrime");
    cl.SetParameter(Primes);
    cl.Invoke(0, Primes.Length, N);
}
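The splitting into N parts can be pictured with a short plain-C sketch. This is only an illustration under stated assumptions, not the library's implementation: `invoke_in_parts`, the callback types and the `tally` helpers are hypothetical names, and the chunks run one after another here, whereas MultiCL is described as pushing each part to whichever device is free.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback types for this sketch (not the library's API). */
typedef void (*chunk_fn)(size_t start, size_t count, void *user);
typedef void (*progress_fn)(double fraction_done, void *user);

/* Split [start, start + count) into `parts` nearly equal chunks, run them one
   after another, and report the completed fraction after each chunk. */
static void invoke_in_parts(size_t start, size_t count, size_t parts,
                            chunk_fn run_chunk, progress_fn on_progress,
                            void *user)
{
    assert(parts > 0 && run_chunk != NULL);
    size_t base = count / parts;  /* every chunk gets at least this many items */
    size_t extra = count % parts; /* the first `extra` chunks get one more */
    size_t offset = start;
    size_t done = 0;

    for (size_t p = 0; p < parts; p++) {
        size_t len = base + (p < extra ? 1u : 0u);
        if (len > 0) {
            run_chunk(offset, len, user);
        }
        offset += len;
        done += len;
        if (on_progress != NULL && count > 0) {
            on_progress((double)done / (double)count, user);
        }
    }
}

/* Example callbacks that only record what happened. */
struct tally {
    size_t items;         /* total items handed to run_chunk */
    size_t chunks;        /* number of non-empty chunks */
    double last_progress; /* most recent fraction reported */
};

static void tally_chunk(size_t start, size_t count, void *user)
{
    struct tally *t = user;
    (void)start;
    t->items += count;
    t->chunks += 1;
}

static void tally_progress(double fraction_done, void *user)
{
    struct tally *t = user;
    t->last_progress = fraction_done;
}
```

With the numbers from the example above (1000000 items, N = 200), this scheme would issue 200 chunks of 5000 items each. Smaller chunks give finer progress reporting and keep each individual kernel call short.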
The library basically hides all the implementation details you would otherwise need to know
to use OpenCL and Cloo.
To get more information about your kernel or device, use the class OpenCL.
Internally, every call to Invoke calls the corresponding methods in the OpenCL API:
C#
void Setargument(ComputeKernel kernel, int index, object arg)
{
    if (arg == null) throw new ArgumentException("Argument " + index + " is null");
    Type argtype = arg.GetType();
    if (argtype.IsArray)
    {
        // Build ComputeBuffer<T> for the array's element type T.
        // For an int[] this is equivalent to:
        //   new ComputeBuffer<int>(context,
        //       ComputeMemoryFlags.ReadWrite | ComputeMemoryFlags.UseHostPointer, (int[])arg);
        Type elementtype = argtype.GetElementType();
        Type buffertype = typeof(ComputeBuffer<>).MakeGenericType(elementtype);
        ComputeMemory messageBuffer = (ComputeMemory)Activator.CreateInstance(
            buffertype,
            new object[]
            {
                context,
                ComputeMemoryFlags.ReadWrite | ComputeMemoryFlags.UseHostPointer,
                arg
            });
        kernel.SetMemoryArgument(index, messageBuffer); // set the array
    }
    else
    {
        // Scalar argument; for an int this is equivalent to
        // kernel.SetValueArgument(index, (int)arg).
        typeof(ComputeKernel).GetMethod("SetValueArgument").MakeGenericMethod(argtype)
            .Invoke(kernel, new object[] { index, arg });
    }
}
Every time you change the kernel or the accelerator, the program gets recompiled.
For a faster prototyping phase, this class also tells you why your kernel failed to compile:
C#
public void LoadKernel(string Kernel)
{
    this.kernel = Kernel;
    program = new ComputeProgram(context, Kernel);
    try
    {
        program.Build(null, null, null, IntPtr.Zero); // compile
    }
    catch (BuildProgramFailureComputeException)
    {
        // Surface the compiler's build log as the exception message.
        string message = program.GetBuildLog(platform.Devices[0]);
        throw new ArgumentException(message);
    }
}
It is very important to know that if your GPU driver crashes, or a kernel uses 100% of your
GPU for more than 3 seconds (on pre-Windows 10 machines), the kernel will get aborted. You
should dispose the EasyCL object after that.
For some reason I don't know, invoking an empty kernel first makes all subsequent calls
faster (possibly OpenCL initialization).
What is Missing?
You cannot choose the memory flags (host pointer, read/write access) for an int[] passed
to the kernel. I did not see any performance gain from setting an array to read-only, so
this seems to be a legacy feature.
This class is written for PCs. With Visual Studio/Xamarin, it should be easy to adapt it for
phones. (Modern Smartphones with 8 Cores do rival most Laptops in performance.)
http://www.nvidia.com/Download/index.aspx?lang=en-us
http://support.amd.com/en-us/download
https://software.intel.com/en-us/articles/opencl-drivers#latest_CPU_runtime
Download demo
My Results:
(06.10.2021 - Time flies - Hardware gets faster!)
(13.12.2016)
AMD RX480:
5527.46 GFlops (single precision)
239.78 GFlops (double precision)
(10.09.2016)
License
This article, along with any associated source code and files, is licensed under The MIT
License
Written By
D. Infuehr
Student
Austria