January 16, 2011

Graphics Processing Unit Architecture

Abstract

As the demand for graphics solutions increases, more and more money is invested in developing better Graphics Processing Units (GPUs) for computers. The GPU plays an important role in gaming and in drafting for both industry and business, and this demand has pushed GPU development ahead of CPU development by a wide margin.

Nowadays, GPUs are designed not only for graphics calculations but also for other fields, including general-purpose programming and computing.

This project examines the architecture of the GPU and explores the principles behind how a GPU works. Possible directions for future GPU development are also discussed.





Introduction

A Graphics Processing Unit (GPU)[8] is a dedicated microprocessor designed for processing image data. It is used in personal computers for 2D/3D graphics, visual computing and display. Its highly parallel, highly multithreaded structure makes a GPU more efficient and more powerful at graphics operations than a general-purpose CPU.





Architecture

GPUs have to work with a huge amount of data when processing. Nowadays, either an AGP bus or a PCI Express (PCI-e) bus is used to connect the GPU to the CPU via the Northbridge. The improved PCI-e 16X bus raises the maximum bandwidth available to the GPU to 8000 MB/sec and makes data transfer more efficient. As part of the PC architecture, GPUs are very sensitive to memory bottlenecks. Unlike the von Neumann architecture used by conventional CPUs, GPUs use a technique called stream processing[2] to process data.
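
As a rough check on the 8000 MB/sec figure, the arithmetic can be sketched in Python. The per-lane rates used here (250 MB/s per direction for first-generation PCI Express, 500 MB/s for second-generation) are commonly quoted values and are assumptions, since the text does not say which generation it refers to. The function name is hypothetical.

```python
# Rough PCI Express bandwidth arithmetic (a sketch, not a measurement).
# Assumed per-lane, per-direction rates in MB/s, after encoding overhead.
PER_LANE_MB_S = {"gen1": 250, "gen2": 500}


def pcie_bandwidth_mb_s(generation: str, lanes: int, both_directions: bool = False) -> int:
    """Return the theoretical peak bandwidth of a PCI Express link in MB/s."""
    one_way = PER_LANE_MB_S[generation] * lanes
    return one_way * 2 if both_directions else one_way


print(pcie_bandwidth_mb_s("gen1", 16, both_directions=True))  # 8000
print(pcie_bandwidth_mb_s("gen2", 16))                        # 8000
```

Under these assumptions, 8000 MB/sec corresponds either to a first-generation 16X link counted in both directions or to a second-generation 16X link in one direction.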

A stream is an ordered set of data, and a kernel is a small program that operates on it. In stream processing, a kernel takes one or more streams as input, performs the same operations on every element, and produces one or more streams as output. Because the elements are processed independently, many of them can be handled simultaneously, and several levels of parallelism can be achieved, including instruction parallelism, data parallelism and task parallelism. The number of elements in a stream is usually large, so the degree of parallelism achieved depends on the number of processors used in the computation.

It is therefore important to understand how the graphics pipeline maps onto stream processing.
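
The kernel and stream idea can be illustrated with a minimal Python sketch. The names `scale_kernel` and `run_kernel` are hypothetical, and a thread pool only stands in for the GPU's processors: it shows the programming model (one small program applied independently to every element), not real GPU performance.

```python
from concurrent.futures import ThreadPoolExecutor


def scale_kernel(element: float) -> float:
    """A kernel: a small program applied independently to one stream element."""
    return 2.0 * element + 1.0


def run_kernel(kernel, stream, workers: int = 4):
    """Apply `kernel` to every element of `stream`, keeping the stream's order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map preserves input order, so the output is again an ordered stream.
        return list(pool.map(kernel, stream))


input_stream = [0.0, 1.0, 2.0, 3.0]
output_stream = run_kernel(scale_kernel, input_stream)
print(output_stream)  # [1.0, 3.0, 5.0, 7.0]
```

Because no element depends on another, the work can be split across as many workers as are available, which is the data parallelism described above.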

Copyright © 2007 Universiteit van Amsterdam. All rights reserved.
Figure 1.1 The Graphics Pipeline represented by the stream model.

As figure 1.1 shows, the stream formulation of the graphics pipeline expresses all data as streams (indicated by arrows) and all computation as kernels (indicated by blue boxes). The pipeline includes the following stages, of which the shaders are programmable:


Vertex shader - which allows the manipulation of image vertices;

Geometry shader - which transforms 3D positions into screen positions and computes attributes;

Rasterization - which expands the data into interpolated values (fragments);

Fragment shader - which performs operations with 16/32-bit floating-point precision, on several fragments at a time as SIMD (single instruction, multiple data) units.
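
The stages listed above can be sketched as a chain of kernels, each consuming the stream produced by the previous one. This is a toy Python illustration under simplifying assumptions: all function names are hypothetical, the geometry step is a bare perspective divide, and rasterization interpolates along a single line between two screen positions instead of filling triangles.

```python
def vertex_shader(vertex):
    """Manipulate one vertex: here, shift it one unit along x."""
    x, y, z = vertex
    return (x + 1.0, y, z)


def geometry_stage(vertex):
    """Map a 3D position to a 2D screen position with a simple perspective divide."""
    x, y, z = vertex
    return (x / z, y / z)


def rasterize(p0, p1, steps):
    """Expand two screen positions into `steps + 1` interpolated fragments."""
    return [
        (p0[0] + (p1[0] - p0[0]) * i / steps, p0[1] + (p1[1] - p0[1]) * i / steps)
        for i in range(steps + 1)
    ]


def fragment_shader(fragment):
    """Compute a grey level in [0, 1] for one fragment from its x position."""
    x, _ = fragment
    return min(max(x / 4.0, 0.0), 1.0)


vertices = [(0.0, 0.0, 1.0), (3.0, 2.0, 1.0)]
shaded = [vertex_shader(v) for v in vertices]
on_screen = [geometry_stage(v) for v in shaded]
fragments = rasterize(on_screen[0], on_screen[1], steps=3)
colors = [fragment_shader(f) for f in fragments]
print(colors)  # [0.25, 0.5, 0.75, 1.0]
```

Each list in the sketch plays the role of a stream (an arrow in figure 1.1) and each function plays the role of a kernel (a box).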


However, a design with dedicated stages has a problem: streams must be passed through the kernels one after another, which involves more steps and may lengthen the processing time. A combined (unified) shader core can instead merge the different stages into a single stage that handles all the shaders.


Copyright © 2008 University of Innsbruck. All rights reserved.

Figure 1.2 Workload balancing with both architectures.



As figure 1.2 shows, without the unified shader[4] the vertex shader and the pixel shader work separately, and each has idle hardware time, which makes the GPU less efficient. With a unified shader the workload is balanced: all shading is performed in one stage, and the units adapt dynamically to the workload, which further increases efficiency.
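
The balancing effect can be made concrete with a small calculation. The Python sketch below is an idealized model with made-up numbers: it assumes every unit completes one work item per cycle and ignores scheduling overhead. The function names and the workload are hypothetical.

```python
import math


def cycles_fixed(vertex_work, pixel_work, vertex_units, pixel_units):
    """Cycles needed when vertex and pixel units can only do their own kind of work."""
    return max(math.ceil(vertex_work / vertex_units), math.ceil(pixel_work / pixel_units))


def cycles_unified(vertex_work, pixel_work, total_units):
    """Cycles needed when every unit can take either kind of work."""
    return math.ceil((vertex_work + pixel_work) / total_units)


# Hypothetical frame: light vertex load, heavy pixel load, 8 units in total.
vertex_work, pixel_work = 20, 140
fixed = cycles_fixed(vertex_work, pixel_work, vertex_units=4, pixel_units=4)
unified = cycles_unified(vertex_work, pixel_work, total_units=8)
print(fixed, unified)  # 35 20
```

In the fixed split the four vertex units finish after 5 cycles and then sit idle for the remaining 30, while the unified pool keeps all eight units busy.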








Forms[1]

●Integrated graphics solution

As the word “integrated” suggests, the GPU is built into the computer system and uses part of the RAM of the PC. However, integrated solutions are gradually being replaced by dedicated ones because they are less capable.


Copyright © 2009 Elsevier, Inc. All rights reserved.
Figure 1.3 Historical PC. VGA controller drives graphics display from frame buffer memory.


As figure 1.3 shows, the integrated solution is part of the PC architecture. It is considered unfit for playing 3D games or running graphically intensive programs, apart from lighter ones such as Adobe Flash. In addition, if either the motherboard or the graphics hardware fails in an integrated solution, both components must be replaced. This is inconvenient because the graphics hardware cannot be modified or changed on its own.



●Dedicated graphics cards

The term “dedicated” refers to the fact that a dedicated graphics card has RAM reserved for the card's own use. This memory is typically built into the card, which increases processing speed and reduces the time taken by graphics operations. The more RAM the card has, the higher the quality of graphics it can process.


Copyright © 2009 Elsevier, Inc. All rights reserved
Figure 1.4 Intel and AMD CPUs


As figure 1.4 shows, with a 16X PCI-e link, the GPU on a dedicated card can perform high-speed graphics processing using its own RAM (the GPU memory). A high-quality graphics solution can thus be achieved, and a failed card can easily be replaced.






Application

General Purpose GPU (GPGPU) and Compute Unified Device Architecture (CUDA)


GPGPU[3] is the technique of using a GPU, which typically handles computation only for computer graphics, to perform computation in applications traditionally handled by the CPU. It is a modified form of stream processing that builds on the unified shaders discussed above, and it is an example of GPU computing[9].



CUDA[5] is NVIDIA’s parallel computing architecture, which allows specified functions from a normal C program to run on the GPU’s stream processors. With CUDA, GPUs become as accessible as CPUs for general calculation as well as for graphics operations.


Copyright © Wikipedia, the free encyclopedia.
Figure 1.5 Example of CUDA processing flow.



As figure 1.5 shows, the processing flow consists of the following steps:

- Copy data from main memory to GPU memory

- The CPU instructs the GPU to start processing

- The GPU executes in parallel on each core

- Copy the result from GPU memory to main memory
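
The four steps above can be mimicked in plain Python. This is only a simulation of the flow, not the CUDA API: `FakeDevice` and its methods are hypothetical stand-ins, and a thread pool plays the role of the GPU cores.

```python
from concurrent.futures import ThreadPoolExecutor


class FakeDevice:
    """Hypothetical stand-in for a GPU: separate memory plus a pool of 'cores'."""

    def __init__(self, cores: int = 4):
        self.memory = {}
        self.cores = cores

    def copy_to_device(self, name, host_data):
        # Step 1: copy data from main memory to GPU memory.
        self.memory[name] = list(host_data)

    def launch(self, kernel, src, dst):
        # Step 2: the CPU instructs the GPU to run `kernel`.
        # Step 3: the GPU applies the kernel to each element across its cores.
        with ThreadPoolExecutor(max_workers=self.cores) as pool:
            self.memory[dst] = list(pool.map(kernel, self.memory[src]))

    def copy_to_host(self, name):
        # Step 4: copy the result from GPU memory back to main memory.
        return list(self.memory[name])


def square(x):
    """Kernel executed once per element."""
    return x * x


device = FakeDevice(cores=4)
device.copy_to_device("input", [1, 2, 3, 4])
device.launch(square, src="input", dst="output")
result = device.copy_to_host("output")
print(result)  # [1, 4, 9, 16]
```

The explicit copies in steps 1 and 4 reflect the fact that the GPU works on its own memory, so the cost of moving data across the bus has to be weighed against the time saved by parallel execution.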







Conclusion

The need for faster GPUs is increasing day by day, and the market for GPUs is moving fast. With newer technologies being developed for GPUs, significant advances in processing can be expected in the future.









Exercise

Q1. What is the difference between a GPU and a CPU?
Both the GPU and the CPU are microprocessors. However, the GPU is dedicated hardware for graphics-related calculations, especially matrix and vertex operations, while the CPU is the part of a computer system that carries out the instructions of a computer program.


Q2. Is it possible to use the GPU on a mobile device to accelerate a particle physics engine?
It is certainly possible to run a particle simulation on the GPU using vertex shaders, but you may be very limited in the kinds of behaviors that you can simulate, depending on the capabilities of the GPU. 


Q3. What popular software takes advantage of General Purpose GPUs?
Of course almost all modern games use the GPU in some way or another, as do graphics editors and video editing software.


Q4. What do you think is the future of GPU-as-CPU initiatives like CUDA?
The GPU is a very interesting alternative whenever you do vector-based floating-point mathematics. However, this also means that it will not become mainstream: most mainstream (desktop) applications do very few floating-point calculations. It has already gained traction in games (physics engines) and in scientific calculations.






References

[1]. Graphics processing unit, Wikipedia, Web site: http://en.wikipedia.org/wiki/GPU

[2]. Minh Tri Do Dinh, GPUs - Graphics Processing Units [Electronic version], University of Innsbruck, Retrieved 7 July 2008, from http://informatik.uibk.ac.at/teaching/ss2008/seminar/gpu.pdf

[3]. Graphics Processing Unit [Electronic version], Kent State University, from http://www.cs.kent.edu/~wcheng/GAworks/GraphicsProcessingUnit.ppt

[4]. John Owens, UC Davis, GPU Architecture Overview [Electronic version], from http://gpgpu.org/static/s2007/slides/02-gpu-architecture-overview-s07.pdf

[5]. NVIDIA© CUDA™, CUDA Architecture Overview, Retrieved April 2009, Web site: http://developer.download.nvidia.com/compute/cuda/docs/CUDA_Architecture_Overview.pdf

[6]. CUDA, Wikipedia, Web site: http://en.wikipedia.org/wiki/CUDA

[7]. GPGPU.org, General-Purpose Computation on Graphics Hardware, Web site: http://gpgpu.org/

[8]. Joey Brakefield, What Is a Graphics Processing Unit?, eHow Contributor, from http://www.ehow.com/video_4767052_graphics-processing-unit_.html

[9]. NVIDIA, What is GPU Computing?, Web site: http://www.nvidia.com/object/GPU_Computing.html