CS593 - Ray Tracing Algorithms
Fall 2011

Week 5 - Oct 7, 2011

GPU Implementation of a disk drawing program

Initial Coding

This week I started programming in OpenCL. I began development on a machine with a 2.66 GHz Core 2 Duo processor and a GeForce 8500 GT graphics card. As one of my first OpenCL programs, I finished a simple disk drawing program.

In my program, I randomly generated 100 circles on a 1500x1500 grid and rendered the output to a PPM file. The work was divided into a two-dimensional workload: each pixel was assigned to a work-item, and every work-item had access to an array of disk structures that was passed to the GPU's global memory. Each work-item was responsible for calculating its own color.

For each pixel {
    // set the initial color of the pixel to black

    For each disk {
        // - calculate the distance from the current pixel to the disk's center
        // - if the distance is less than the disk's radius, then
        //   calculate the disk's color contribution and add it to the current pixel color
    }
}
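
The per-pixel loop above can be sketched in plain C. This is a minimal sketch, not the report's OpenCL code: the Disk field layout, the additive color model, and the helper names (shade_pixel, draw_disks, write_ppm) are assumptions. In an OpenCL kernel, x and y would come from get_global_id(0) and get_global_id(1); here a nested loop stands in for the 2D NDRange. The report does not say which PPM variant was written, so the sketch assumes binary P6.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical disk layout; the report does not list its fields. */
typedef struct {
    float cx, cy;   /* center, in pixel coordinates */
    float radius;
    float r, g, b;  /* color added to every pixel the disk covers */
} Disk;

/* Work done by one work-item: the color of pixel (x, y).
   An OpenCL kernel would read x and y from get_global_id(0) and get_global_id(1). */
static void shade_pixel(int x, int y, const Disk *disks, int num_disks, float *rgb)
{
    float r = 0.0f, g = 0.0f, b = 0.0f;   /* start black */
    for (int i = 0; i < num_disks; ++i) {
        float dx = (float)x - disks[i].cx;
        float dy = (float)y - disks[i].cy;
        /* distance < radius, tested on squared values to avoid a sqrt */
        if (dx * dx + dy * dy < disks[i].radius * disks[i].radius) {
            r += disks[i].r;
            g += disks[i].g;
            b += disks[i].b;
        }
    }
    rgb[0] = r;
    rgb[1] = g;
    rgb[2] = b;
}

/* CPU stand-in for the 2D NDRange: one shade_pixel call per (x, y). */
void draw_disks(int width, int height, const Disk *disks, int num_disks, float *image)
{
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            shade_pixel(x, y, disks, num_disks, &image[3 * ((size_t)y * width + x)]);
}

/* Clamp each channel to [0, 1], scale to 0..255, and write a binary PPM (P6). */
int write_ppm(FILE *out, int width, int height, const float *image)
{
    if (fprintf(out, "P6\n%d %d\n255\n", width, height) < 0)
        return -1;
    size_t count = (size_t)width * (size_t)height * 3;
    for (size_t i = 0; i < count; ++i) {
        float c = image[i];
        if (c < 0.0f) c = 0.0f;
        if (c > 1.0f) c = 1.0f;
        if (fputc((int)(c * 255.0f + 0.5f), out) == EOF)
            return -1;
    }
    return 0;
}
```

Because overlapping disks add their colors, channel values can exceed 1; this sketch simply clamps them when converting to bytes.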

I first ran the executable without any compiler optimization: the CPU implementation took around 40 seconds and the GPU implementation 13 seconds. The results were promising, but with optimization turned on, the CPU run time dropped to only 9 seconds.

Summary of initial results

                                 CPU             GPU
100 circles on 1500x1500 grid    40 secs         0.3 secs
500 circles on 1500x1500 grid    1 min 38 secs   1.5 secs

After some research, I found that the peak processing rate of the GPU I was using is about 43 GFLOPS, while the Core 2 runs at about 38 GFLOPS. That would explain why the run times of the two implementations were similar. With that in mind, I decided to run my executable on a computer with a faster GPU to see what kind of speedup I could get.
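
As a rough check on this reasoning, the ratio of the two quoted peak rates is close to one:

```latex
\frac{R_{\text{GeForce 8500 GT}}}{R_{\text{Core 2 Duo}}} \approx \frac{43\ \text{GFLOPS}}{38\ \text{GFLOPS}} \approx 1.13
```

Peak rates are only upper bounds, so under the simplifying assumption that both implementations were limited by arithmetic throughput, the GPU could be expected to be at most about 1.1 times faster than the CPU on this machine.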

I set up my main desktop, which is a faster computer: it has a 3.0 GHz Core 2 Duo and an ATI 4670 graphics card. I had to install Ubuntu alongside Windows and set up all of the libraries and programs that I normally use. Installing them took about a day in total.

Initial Runs on main desktop

The speedup from using the ATI 4670 was considerable. Its peak processing rate is around 443 GFLOPS. With 100 circles at 1500x1500 resolution, the GPU implementation ran in about 0.3 seconds and the CPU implementation in about 40 seconds. With the circle count increased from 100 to 500, the GPU ran in 1.5 seconds and the CPU in 1 minute 38 seconds. Built without debug mode, the CPU implementation ran in 32 seconds, which is still about 20 times slower than the GPU implementation.

Summary of results on main desktop

                                 CPU             GPU
100 circles on 1500x1500 grid    40 secs         0.3 secs
500 circles on 1500x1500 grid    1 min 38 secs   1.5 secs
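
The speedups implied by the desktop table are simple ratios of CPU to GPU wall-clock time. This small sketch recomputes them from the table's numbers, with 1 min 38 secs entered as 98 seconds; the Timing struct and the speedup and print_speedups names are illustrative only.

```c
#include <stdio.h>

/* Timings copied from the main-desktop table, in seconds (1 min 38 secs = 98 s). */
typedef struct {
    const char *scene;
    double cpu_seconds;
    double gpu_seconds;
} Timing;

static const Timing desktop_runs[] = {
    {"100 circles on 1500x1500 grid", 40.0, 0.3},
    {"500 circles on 1500x1500 grid", 98.0, 1.5},
};

/* Speedup = CPU wall-clock time / GPU wall-clock time. */
double speedup(double cpu_seconds, double gpu_seconds)
{
    return cpu_seconds / gpu_seconds;
}

/* Print the speedup for each row of the table. */
void print_speedups(void)
{
    for (size_t i = 0; i < sizeof desktop_runs / sizeof desktop_runs[0]; ++i)
        printf("%s: %.0fx\n", desktop_runs[i].scene,
               speedup(desktop_runs[i].cpu_seconds, desktop_runs[i].gpu_seconds));
}
```

speedup(40.0, 0.3) is about 133 and speedup(98.0, 1.5) about 65. Dividing the 32-second optimized CPU time by the 1.5-second GPU time gives about 21, in line with the "about 20 times" figure quoted in the text.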

GPU Implementation of a Ray Tracer (OpenCL)

I converted part of my ray tracer to OpenCL. Right now it is limited to spheres and light sources.

The code generates an array of view rays, which is passed to OpenCL along with the list of spheres and the list of light sources.

Basic Algorithm

    Generate rays for each pixel

    Copy the rays, the list of objects, and the light sources
    to GPU memory

    For each ray (each ray is handled by a GPU work-item) {

        Initialize the color of the ray to black

        For each object in the scene
            Find the closest intersection

        If there is an intersection {
            For each light source {
                Check whether an object lies between the hit point and the light
                If not, calculate the color contribution of the light
                and add it to the ray color
            }
        }

        Write the color of the ray to GPU global memory
    }

    Copy the color array of each pixel from the GPU to main memory

    Write the color array to a file

Execution Times

The GPU results were good, with an execution time about 3 times faster than the CPU. The speedup was not as large as in the disk drawing program; this is where the code would benefit from better memory allocation and management.

Summary of results

Figure: output for a grid of 5x5x5 spheres

Figure: output for 125 randomly generated spheres

Figure: output for 1000 randomly generated spheres

                                   CPU        GPU
Grid of 5x5x5 spheres              40 secs    12 secs
125 randomly generated spheres     40 secs    12 secs
1000 randomly generated spheres    2 mins     40 secs

Figure: output for 1000 disks at 1500x1500 resolution