Applications of Artificial Intelligence for Smart Technology by Swarnalatha P

Author: Swarnalatha P.
Language: eng
Format: epub
Publisher: Engineering Science Reference


The GPU cannot access the host's main memory directly; likewise, the CPU cannot access GPU memory directly. All the data a kernel needs must therefore be copied to the device explicitly, which is done with the function cudaMemcpy. A CUDA kernel launch is organized as a grid; the grid is divided into blocks, and each block into threads. Each thread in a block executes the kernel code independently and stores its result, and each thread can be identified by indexing its block within the grid and its position within the block (Sanders et al., 2010).
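To make the copy-then-launch flow concrete, here is a minimal sketch. The kernel name scale, the array size, and the launch configuration are illustrative assumptions, not taken from the text; cudaMalloc, cudaMemcpy, and the block/grid indexing are the CUDA runtime features the paragraph describes.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Minimal kernel: each thread derives its global index from its
// block and thread coordinates and scales one array element.
__global__ void scale(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // block/grid indexing
    if (i < n)
        data[i] *= 2.0f;
}

int main(void)
{
    const int n = 1024;
    size_t bytes = n * sizeof(float);

    float h_data[n];
    for (int i = 0; i < n; ++i)
        h_data[i] = (float)i;

    float *d_data;
    cudaMalloc((void **)&d_data, bytes);                        // allocate device memory
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);  // explicit copy to device

    scale<<<(n + 255) / 256, 256>>>(d_data, n);                 // 4 blocks of 256 threads

    cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);  // copy results back
    cudaFree(d_data);

    printf("h_data[10] = %f\n", h_data[10]);  // expect 20.0
    return 0;
}
```

Note that nothing is computed on the host's copy of the array: the data is staged to the device, processed there, and copied back, which is exactly the explicit-transfer model described above.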

Programming in CUDA

CUDA is a parallel computing platform that consists of a novel parallel programming model and an instruction set architecture (Nasridinov et al., 2014; Nguyen, 2007).

Before starting to write programs in CUDA, we need a basic understanding of C or C++ programming. A few things to keep in mind while programming in CUDA are listed below; a short sketch after the list illustrates the qualifiers in use.

1. CUDA provides function type qualifiers that are not part of standard C/C++, allowing the programmer to specify where a function runs.

2. __host__: if the function declaration contains this qualifier, it specifies that the function must run on the host CPU; this is the default (Nickolls et al., 2010; NVIDIA, 2010).

3. __device__: if the function declaration contains this qualifier, it specifies that the function runs on the GPU and can only be called from code already running on the GPU.

4. __global__: if the function declaration contains this qualifier, it specifies that the function runs on the GPU but must be called from the host (CPU); this is the entry point for launching multi-threaded code on the GPU (NVIDIA, 2014).

5. Inside the <<< >>> syntax, at least two arguments must be present when calling any global function: one for the block grid and one for the thread blocks. A typical call looks like function_name<<<bg, tb>>>(), where bg specifies the dimensions of the block grid and tb specifies the dimensions of each thread block (Munshi, 2008).

6. The GPU device cannot execute code on the CPU host.

7. CUDA imposes a few limitations: for example, GPU code must be C (CPU code can be C++), and GPU code cannot be called recursively.

8. All calls to a global function must specify how many threaded copies are to be launched and in what configuration.

9. A call to a global function is made using the special <<< >>> syntax (Sanders et al., 2011).

10. __device__ (as a variable qualifier): if a variable declaration contains this qualifier, it specifies that the variable resides in the GPU's global memory and is defined while the code runs.

11. __constant__: if a variable declaration contains this qualifier, it specifies that the variable resides in the constant memory space of the device (GPU) and is defined while the code runs.

12. __shared__: if a variable declaration contains this qualifier, it specifies that the variable resides in the shared memory space of a thread block and is accessible to all threads in that block.
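To tie the qualifiers together, the sketch below exercises each of them in one program. The function and variable names (kernel, scaled, report, c_factor, tile) are illustrative assumptions; the qualifiers, the <<<bg, tb>>> launch syntax, and the runtime calls are the CUDA features listed above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// __constant__: read-only data in the device's constant memory space.
__constant__ float c_factor = 3.0f;

// __device__: callable only from code already running on the GPU.
__device__ float scaled(float x)
{
    return x * c_factor;
}

// __host__ (the default): runs on the CPU; written explicitly here.
__host__ void report(const float *out, int n)
{
    printf("out[%d] = %f\n", n - 1, out[n - 1]);
}

// __global__: the entry point, launched from the host with <<<bg, tb>>>.
__global__ void kernel(const float *in, float *out, int n)
{
    // __shared__: one copy per thread block, visible to all its threads.
    __shared__ float tile[256];

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;  // stage element in shared memory
    __syncthreads();                             // block-wide barrier
    if (i < n)
        out[i] = scaled(tile[threadIdx.x]);
}

int main(void)
{
    const int n = 512, tb = 256;
    size_t bytes = n * sizeof(float);

    float h_in[n], h_out[n];
    for (int i = 0; i < n; ++i)
        h_in[i] = 1.0f;

    float *d_in, *d_out;
    cudaMalloc((void **)&d_in, bytes);
    cudaMalloc((void **)&d_out, bytes);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    kernel<<<n / tb, tb>>>(d_in, d_out, n);  // bg = 2 blocks, tb = 256 threads

    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    report(h_out, n);  // expect 3.0
    return 0;
}
```

Note that only the __global__ function is launched with <<<bg, tb>>>; the __device__ function is called from inside the kernel, and the __host__ function runs on the CPU as usual.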


