CS593 - Ray Tracing Algorithms
Fall 2011
Week 5 - Oct 7, 2011
GPU Implementation of a Disk Drawing Program
Initial Coding - code
This week I started programming in OpenCL. I began development on a machine with a 2.66 GHz Core 2 Duo processor and a GeForce 8500 GT. One of my first OpenCL programs was a simple disk drawing program. In it I randomly generated 100 circles on a 1500x1500 grid and rendered the output to a PPM file. The program was divided into a two-dimensional workload: each pixel was assigned to a work-item, and every work-item had access to an array of disk structures that was passed to the global memory of the GPU. Each work-item was responsible for calculating its own color:
For each pixel {
    // set the initial color of the pixel to black
    For each disk {
        // calculate the distance from the current pixel to the disk's center
        // if the distance is less than the disk's radius, then
        //     calculate the disk's color at this pixel and add it to the current pixel color
    }
}
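The loop above can be sketched on the CPU in plain C, which compiles without an OpenCL runtime. This is only an illustration under stated assumptions: the `Disk` and `Color` structs, the function names, and the choice to add each covering disk's flat color are hypothetical, since the write-up does not show the original struct layout or color calculation. `shade_pixel` plays the role of one work-item, and the nested loops in `render_ppm` stand in for the 2D work-item grid.

```c
#include <stdio.h>

/* Hypothetical disk layout; the original struct is not shown in the write-up. */
typedef struct {
    float cx, cy;   /* center, in pixel coordinates */
    float radius;   /* radius, in pixels (assumed non-negative) */
    float r, g, b;  /* color added to every covered pixel, each in [0, 1] */
} Disk;

typedef struct { float r, g, b; } Color;

/* Work done by one work-item: shade pixel (x, y) against every disk. */
Color shade_pixel(int x, int y, const Disk *disks, int n_disks)
{
    Color c = {0.0f, 0.0f, 0.0f};  /* start from black */
    for (int i = 0; i < n_disks; ++i) {
        float dx = (float)x - disks[i].cx;
        float dy = (float)y - disks[i].cy;
        /* Comparing squared distances is equivalent to distance < radius
           and avoids one sqrt per disk. */
        if (dx * dx + dy * dy < disks[i].radius * disks[i].radius) {
            /* Assumption: a covered pixel simply accumulates the disk color. */
            c.r += disks[i].r;
            c.g += disks[i].g;
            c.b += disks[i].b;
        }
    }
    return c;
}

/* Clamp a channel to [0, 1] and convert it to an 8-bit value. */
static unsigned char to_byte(float v)
{
    if (v < 0.0f) v = 0.0f;
    if (v > 1.0f) v = 1.0f;
    return (unsigned char)(v * 255.0f + 0.5f);
}

/* CPU stand-in for the 2D launch: one shade_pixel call per (x, y).
   Writes a binary PPM (P6) file and returns 0 on success, -1 on failure. */
int render_ppm(const char *path, int width, int height,
               const Disk *disks, int n_disks)
{
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    fprintf(f, "P6\n%d %d\n255\n", width, height);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            Color c = shade_pixel(x, y, disks, n_disks);
            unsigned char rgb[3] = { to_byte(c.r), to_byte(c.g), to_byte(c.b) };
            fwrite(rgb, 1, 3, f);
        }
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

Calling `render_ppm("disks.ppm", 1500, 1500, disks, 100)` with an array of 100 disks would mirror the setup described above. In an OpenCL kernel the two loops disappear: each work-item would read its own x and y from `get_global_id(0)` and `get_global_id(1)` and run only the body of `shade_pixel`.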
I ran the executable without any optimization, and the run times were around 40 seconds for the CPU implementation and 13 seconds for the GPU implementation. The results were promising, but after compiling with optimization turned on, the CPU implementation sped up so that its execution time was only 9 seconds.
Summary of initial results
After some research I found out that the peak processing rate of the GPU I was using is about 43 GFLOPS, while the Core 2 runs at about 38 GFLOPS. That would explain why the run times of the two implementations were similar. With that in mind, I decided to run my executable on a computer with a faster GPU to see what kind of speedup I could get. I set up my main desktop, which is a faster computer: a 3.0 GHz Core 2 Duo with an ATI 4670 as the graphics card. I had to install Ubuntu alongside Windows and set up all of the libraries and programs that I normally use. Installing the programs and libraries took about a day in total.
Initial Runs on Main Desktop
The speedup from using the ATI 4670 was considerable. The peak processing rate of the ATI 4670 is around 443 GFLOPS. With 100 circles at 1500x1500 resolution, the GPU implementation ran in about 0.3 seconds, compared with about 40 seconds on the CPU. After bumping the circle count up from 100 to 500, the GPU ran in 1.5 seconds and the CPU ran in 1 minute and 38 seconds. Without debug mode, the CPU implementation ran in 32 seconds, which is about 20 times slower than the GPU's 1.5 seconds.
Summary of results on main desktop
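To make the comparisons easier to read, the timings reported for the disk drawing program can be turned into speedup factors (CPU time divided by GPU time). The following is a small sketch in C; the function names and row labels are made up for illustration, and the 32-second non-debug CPU run is assumed to belong to the 500-circle case, since that is what matches the "about 20 times" figure.

```c
#include <stdio.h>

/* Speedup of the GPU run over the CPU run: values above 1 mean the GPU
   finished sooner, values below 1 mean the CPU did. */
double speedup(double cpu_seconds, double gpu_seconds)
{
    return cpu_seconds / gpu_seconds;
}

/* Print one labeled row of the comparison. */
void report(const char *label, double cpu_seconds, double gpu_seconds)
{
    printf("%-42s %6.1fx\n", label, speedup(cpu_seconds, gpu_seconds));
}

/* Timings as reported in the text, in seconds. */
void print_disk_speedups(void)
{
    report("8500 GT, 100 circles, unoptimized CPU", 40.0, 13.0);
    report("8500 GT, 100 circles, optimized CPU",    9.0, 13.0);
    report("ATI 4670, 100 circles, debug CPU",      40.0,  0.3);
    report("ATI 4670, 500 circles, debug CPU",      98.0,  1.5);
    report("ATI 4670, 500 circles, non-debug CPU",  32.0,  1.5);
}
```

Calling `print_disk_speedups()` gives roughly 3.1x and 0.7x on the first machine, and roughly 133x, 65x and 21x on the main desktop. The second row is below 1 because the optimized CPU build was faster than the 8500 GT.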
GPU Implementation of a Raytracer (OpenCL) - code
| Scene | CPU | GPU |
| Grid of 5x5x5 spheres | 40 secs | 12 secs |
| 125 randomly generated spheres | 40 secs | 12 secs |
| 1000 randomly generated spheres | 2 mins | 40 secs |