DigiLib GPU extension

DigiLib is a graphics library written in C, developed by Adam Herout and Pavel Zemčík. The aim of this project is to create a set of functions that run on the GPU to accelerate image processing. The extension should be almost transparent to a program that uses DigiLib: "almost" means that a few initialization functions are required and that some additional optimizations are possible.

See digilib homepage.

status: complete; the CPU reference routines and FFT are yet to be debugged and the expression evaluator is yet to be improved (operation scheduling, temp resource allocation)
language: C / C++ (C interface and C++ engine)
os: os independent

First real results! All operations now work on the GPU; their CPU equivalents are being worked on, day and night. For now, you can see how fast this is going to be:

operation | time @ 256 x 256 8bit RGBA | fillrate
ImageImageAdd(p_dest, p_src1, p_src2) | 0.105581 ms | 620.720594 Mpix/s
ImageImageMult(p_dest, p_src1, p_src2) | 0.104784 ms | 625.440271 Mpix/s
ImageImageSub(p_dest, p_src1, p_src2) | 0.105940 ms | 618.613186 Mpix/s
ImageImageDiv(p_dest, p_src1, p_src2) | 0.118815 ms | 551.581884 Mpix/s
ImageImageBlend(p_dest, p_src1, p_src2) | 0.108930 ms | 601.635003 Mpix/s
ImageImageDecal(p_dest, p_src1, p_src3) | 0.106401 ms | 615.935221 Mpix/s
ImageInvert(p_dest, p_src1) | 0.105159 ms | 623.208359 Mpix/s
ImageAbs(p_dest, p_src1) | 0.104733 ms | 625.745922 Mpix/s
ImageImageMin(p_dest, p_src1, p_src2) | 0.106247 ms | 616.824894 Mpix/s
ImageImageMax(p_dest, p_src1, p_src2) | 0.106563 ms | 614.997094 Mpix/s
ImageThresh(p_dest, p_src1, .5f) | 0.106709 ms | 614.156536 Mpix/s
ImageLog(p_dest, p_src1) | 0.106350 ms | 616.229134 Mpix/s
ImageExp(p_dest, p_src1) | 0.106403 ms | 615.920339 Mpix/s
ImageSqrt(p_dest, p_src1) | 0.148221 ms | 442.150207 Mpix/s
ImagePow(p_dest, p_src1, 2.0f) | 0.149427 ms | 438.582068 Mpix/s
ImageScale(p_dest, p_src2, 1.5f, -.5f) | 0.108324 ms | 604.999379 Mpix/s
ImageColorMatrix(p_dest, p_src2, p_mat4x4, false) | 0.108516 ms | 603.927137 Mpix/s
ImageGreyscale(p_dest, p_src2) | 0.105775 ms | 619.577177 Mpix/s
ImageResize(p_dest, p_src2) | 1.182514 ms | 55.420917 Mpix/s
ImageRotate(p_dest, p_src2, 30deg, false) | 1.134724 ms | 57.755021 Mpix/s
ImageTransform(p_dest, p_src2, ...) | 1.073847 ms | 61.029151 Mpix/s
ImageSquareErode(p_dest, p_src1, 8, 8) | 8.859968 ms | 7.396867 Mpix/s
ImageSquareDilate(p_dest, p_src1, 8, 8) | 9.873257 ms | 6.637729 Mpix/s
ImageSquareOpen(p_dest, p_src1, 3, 3) | 3.776635 ms | 17.353014 Mpix/s
ImageSquareClose(p_dest, p_src1, 3, 3) | 3.771434 ms | 17.376946 Mpix/s
ImageDiamondErode(p_dest, p_src1, 8, 8) | 9.608934 ms | 6.820319 Mpix/s
ImageDiamondDilate(p_dest, p_src1, 8, 8) | 10.509161 ms | 6.236083 Mpix/s
ImageDiamondOpen(p_dest, p_src1, 3, 3) | 3.552302 ms | 18.448883 Mpix/s
ImageDiamondClose(p_dest, p_src1, 3, 3) | 3.551453 ms | 18.453292 Mpix/s
ImageImageErode(p_dest, p_src1, p_src4) | 10.521133 ms | 6.228987 Mpix/s
ImageImageDilate(p_dest, p_src1, p_src4) | 10.490948 ms | 6.246909 Mpix/s
ImageImageOpen(p_dest, p_src1, p_src4) | 20.963980 ms | 3.126124 Mpix/s
ImageImageClose(p_dest, p_src1, p_src4) | 20.952363 ms | 3.127857 Mpix/s
ImageImageHitMiss(p_dest, p_src1, p_src5) | 2.403412 ms | 27.267904 Mpix/s
ImageMean(p_dest, p_src2, 256, 256) | 53.493827 ms | 1.225113 Mpix/s
ImageMedian(p_dest, p_src1, 15) | 8.967916 ms | 7.307829 Mpix/s
ImageLocalMin(p_dest, p_src2, 8, 8) | 1.724755 ms | 37.997284 Mpix/s
ImageLocalMax(p_dest, p_src2, 8, 8) | 1.724741 ms | 37.997599 Mpix/s
ImageConvolveSeparable(p_dest, p_src2, G16, G16) | 4.186004 ms | 15.655981 Mpix/s
ImageConvolve2D(p_dest, p_src2, Gauss16) | 38.650736 ms | 1.695595 Mpix/s
ImageGetRGBAHistogram(p_src1, 256, ...) | 14.407298 ms | 4.548806 Mpix/s
ImageGetMinMax(p_src1, p_minmax) | 6.911876 ms | 9.481652 Mpix/s
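
The fillrate column appears to be the pixel count of the 256 x 256 test image divided by the measured time. The following is a minimal sketch of that arithmetic, not code from the library; the function names are hypothetical. Recomputing from the rounded times in the table gives values that agree with the listed ones to within a small rounding difference.

```c
#include <stdio.h>

/* Fillrate in megapixels per second for one operation on a
   width x height image that took time_ms milliseconds.
   (Hypothetical helper, not part of DigiLib.) */
double fillrate_mpix_per_s(int width, int height, double time_ms)
{
    double pixels = (double)width * (double)height;
    double seconds = time_ms / 1000.0;
    return pixels / seconds / 1.0e6;
}

/* Prints one row in the layout of the table above;
   name is only a label for the measured operation. */
void print_row(const char *name, int width, int height, double time_ms)
{
    printf("%s %f ms %f Mpix/s\n", name, time_ms,
        fillrate_mpix_per_s(width, height, time_ms));
}
```

For example, print_row("ImageImageAdd", 256, 256, 0.105581) divides 65536 pixels by 0.105581 ms, which comes out at roughly 620.7 Mpix/s.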

version history

version | changes from previous version
v0 | -
v1 | now it works
v1.01 | updated to work with lame r17 and under linux, fixed new (163.71) NV drivers bug

a few words on how it works

Every ImageStruct has its own assigned texture (assignments are keyed on the ImageStruct address). An assignment is created the first time an operation is performed on a given image. An operation may fail if a texture to hold the image cannot be created (maximum texture size limit, image format limit).
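
The address-keyed assignment can be pictured with a small lookup table. This is only a sketch under assumptions: the fixed-size array, the names TextureSlot and texture_for_image and the counter that stands in for real texture creation are all hypothetical and not taken from the library.

```c
#include <stddef.h>

#define MAX_TRACKED_IMAGES 64

/* Hypothetical bookkeeping entry: one texture per image address. */
typedef struct {
    const void *p_image;  /* key: address of the image structure */
    unsigned int texture; /* value: texture handle (0 means none) */
} TextureSlot;

static TextureSlot slots[MAX_TRACKED_IMAGES];
static size_t n_slots = 0;
static unsigned int next_texture = 1; /* stand-in for real texture creation */

/* Returns the texture assigned to p_image, creating the assignment on
   first use. Returns 0 when no texture can be assigned (in this sketch:
   the table is full; the text above names size and format limits). */
unsigned int texture_for_image(const void *p_image)
{
    for (size_t i = 0; i < n_slots; ++i) {
        if (slots[i].p_image == p_image)
            return slots[i].texture; /* already assigned */
    }
    if (n_slots == MAX_TRACKED_IMAGES)
        return 0;
    slots[n_slots].p_image = p_image;
    slots[n_slots].texture = next_texture++;
    return slots[n_slots++].texture;
}
```

Because the key is the address, two distinct structures always get distinct textures, and asking twice for the same structure returns the same handle.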

Almost every operation is represented by a shader (a few basic operations can be handled by OpenGL blending or the OpenGL image processing subset alone). A shader is loaded at the moment it is first needed, so the first call to an operation may take longer because its shader is being compiled.
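
Loading a shader on first use amounts to a small cache. The sketch below assumes one cache entry per operation and uses a counter in place of the real compile step; the enum values and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical operation ids, one cache entry each. */
enum { OP_ADD, OP_SUB, OP_MULT, OP_COUNT };

typedef struct {
    bool compiled; /* has the shader been built yet? */
    int program;   /* handle of the built shader */
} ShaderEntry;

static ShaderEntry cache[OP_COUNT];
static int n_compilations = 0;

/* Stand-in for the expensive compile and link step. */
static int compile_shader_for(int op)
{
    ++n_compilations;
    return 100 + op;
}

/* Returns the shader for an operation, compiling it on the first
   request only. Returns -1 for an unknown operation id. */
int shader_for_operation(int op)
{
    if (op < 0 || op >= OP_COUNT)
        return -1;
    if (!cache[op].compiled) {
        cache[op].program = compile_shader_for(op);
        cache[op].compiled = true;
    }
    return cache[op].program;
}

/* Number of compile steps performed so far. */
int compilation_count(void)
{
    return n_compilations;
}
```

Only the first call for a given operation pays the compile cost; later calls return the cached handle.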

By default, every operation uploads the source images from system memory to texture memory, renders to the destination image's texture and transfers the result back to system memory. This behavior can be disabled by calling:

void Disable_AutoDownloads();
void Disable_AutoUploads();

Here, download means a transfer from texture memory to system memory, and upload means the reverse. If a source image has no texture associated with it, it is transferred anyway. There are two complementary Enable_* functions as well. In some situations only certain images need to be uploaded. This can be done either manually by calling:

bool Upload_ImageStruct(const ImageStruct *p_image);
bool Download_ImageStruct(const ImageStruct *p_image);
bool Upload_Async_ImageStruct(const ImageStruct *p_image, int *p_fence_id);
bool Download_Async_ImageStruct(const ImageStruct *p_image, int *p_fence_id);

(Note that p_fence_id is an output parameter; it receives the id of a fence, i.e. an OpenGL object that can be used to query whether the transfer has completed.)

Or by setting the "dirty flag". Every image has a dirty flag: a single bit telling whether the image is up-to-date in system memory or in texture memory. Normally the flags are set automatically by image operations, uploads and downloads. If necessary, a flag can be set manually using the function:

void Set_DirtyFlag(const ImageStruct *p_image, bool b_current_on_server);

If the parameter b_current_on_server is true, the image is up-to-date on the server side (texture memory). If it is false, the image is up-to-date in system memory. An image upload can be triggered by invalidating the texture (calling with b_current_on_server = false). For now the flag doesn't affect image downloading, as the system memory version of the image is marked outdated (dirty flag bit is set high) by the image processing functions. The manual image transfer functions don't read the flag, they only set it.
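
The flag rules can be summarized in a small state model. This is a sketch of the behavior as described here, not the library's code: ImageState, needs_upload and manual_upload are hypothetical names, and it is an assumption that a manual upload marks the texture copy as current.

```c
#include <stdbool.h>

/* Hypothetical per-image state tracked next to the texture. */
typedef struct {
    bool has_texture;       /* a texture has been assigned */
    bool current_on_server; /* true: texture memory is up-to-date */
} ImageState;

/* Models the described Set_DirtyFlag semantics. */
void set_dirty_flag(ImageState *p_state, bool b_current_on_server)
{
    p_state->current_on_server = b_current_on_server;
}

/* Decides whether a source image is uploaded before an operation. */
bool needs_upload(const ImageState *p_state, bool b_auto_uploads)
{
    if (!p_state->has_texture)
        return true; /* no texture yet: transferred anyway */
    if (b_auto_uploads)
        return true; /* default: sources are always uploaded */
    return !p_state->current_on_server; /* texture was invalidated */
}

/* Manual upload: does not read the flag, only sets it
   (assumption: the texture copy counts as current afterwards). */
void manual_upload(ImageState *p_state)
{
    p_state->has_texture = true;
    p_state->current_on_server = true;
}
```

With automatic uploads disabled, an image that was uploaded once is skipped until set_dirty_flag(&state, false) invalidates its texture, which forces the next operation to upload it again.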

use case

For the simplest processing, it is only necessary to include the proper headers, call some initialization code at the beginning of the program and some cleanup code at its end. (Cleanup -should- be done automatically by the OS's OpenGL drivers, just as the OS frees a program's allocated memory when it ends.) Here is the simplest use case:
#include <stdio.h> // fprintf
#include <image.h> // ImageStruct
#include "DigiLib_Ext.h" // GPU functions

Init_OpenGL();
// you can use this function from ÜberLame or use GLUT

if(!b_FramebufferObjectSupported()) {
    fprintf(stderr, "error: framebuffer objects not supported, "
        "unable to process images\n");
    return -1;
}
if(!b_NPOTTexturesSupported())
    fprintf(stderr, "warning: non-power-of-two textures not supported\n");
if(!b_FloatTexturesSupported())
    fprintf(stderr, "warning: float textures not supported\n");
// check OpenGL capabilities

// from now on, GPU image processing is transparent

ImageStruct *a, *b, *c; // filled by some data

ImageImageAdd(a, a, b); // a = a + b
ImageImageAdd(a, a, c); // a = a + c
// calculate image sum into a (example image processing)

// image processing end

Free_OpenGL_Objects();
Shutdown_OpenGL();
// cleanup, shutdown opengl

This is rather wasteful: when calculating a = a + b, the image data of a and b are copied to textures, the addition shader is executed and the result is copied back to image a. Then, in the next step, images a and c are copied to textures (although the texture for image a actually already contains the up-to-date image), the images are added and the data of image a is downloaded back to a. We could save one image upload and one image download here.

The simplest solution would be to allocate images using

ImageStruct *p_Create_GL_Image(int n_width, int n_height, short n_format);

which creates an ImageStruct in graphics card memory, mapped to system memory, which should save most of the unnecessary copying on PCI Express systems. Note that the memory mapping has to be repeated after every operation on the image (no matter whether it is a source or destination image), so its internal data pointer may change. This will make no difference on AGP systems.

Another, more complicated way is enabling and disabling transfers as follows:

#include <stdio.h> // fprintf
#include <image.h> // ImageStruct
#include "DigiLib_Ext.h" // GPU functions

Init_OpenGL();
// you can use this function from ÜberLame or use GLUT

if(!b_FramebufferObjectSupported()) {
    fprintf(stderr, "error: framebuffer objects not supported, "
        "unable to process images\n");
    return -1;
}
if(!b_NPOTTexturesSupported())
    fprintf(stderr, "warning: non-power-of-two textures not supported\n");
if(!b_FloatTexturesSupported())
    fprintf(stderr, "warning: float textures not supported\n");
// check OpenGL capabilities

// from now on, GPU image processing is transparent

ImageStruct *a, *b, *c; // filled by some data

Disable_AutoDownloads(); // disables image downloading
Disable_AutoUploads(); // disables uploading of up-to-date images

ImageImageAdd(a, a, b); // a = a + b
// a and b weren't uploaded before so they will be uploaded now
// downloads are disabled so a will not be downloaded

Enable_AutoDownloads(); // re-enables image downloading

ImageImageAdd(a, a, c); // a = a + c
// a was uploaded before and its associated texture contains
// fresh data so it won't be uploaded now
// c wasn't uploaded before so it will be uploaded now
// downloads are enabled and a will be downloaded

Download_ImageStruct(a);
// just another way of getting image a to system memory

// image processing end

Free_OpenGL_Objects();
Shutdown_OpenGL();
// cleanup, shutdown opengl
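
To compare the two use cases, the transfers can be counted with a tiny simulation. This is a sketch under assumptions, not the library's implementation: SimImage, SimContext and the sim_* functions are hypothetical, and the model only tracks whether an image's texture holds fresh data.

```c
#include <stdbool.h>

/* Hypothetical image model: is the texture copy fresh? */
typedef struct {
    bool texture_fresh;
} SimImage;

/* Hypothetical library state with transfer counters. */
typedef struct {
    bool auto_uploads, auto_downloads;
    int n_uploads, n_downloads;
} SimContext;

/* Uploads a source unless automatic uploads are disabled
   and its texture already holds fresh data. */
static void sim_upload_source(SimContext *ctx, SimImage *img)
{
    if (ctx->auto_uploads || !img->texture_fresh) {
        ++ctx->n_uploads;
        img->texture_fresh = true;
    }
}

/* Models one two-source operation such as dest = src1 + src2. */
void sim_image_add(SimContext *ctx, SimImage *dest,
    SimImage *src1, SimImage *src2)
{
    sim_upload_source(ctx, src1);
    sim_upload_source(ctx, src2);
    dest->texture_fresh = true; /* result is rendered into dest's texture */
    if (ctx->auto_downloads)
        ++ctx->n_downloads;
}

/* Runs a = a + b, then a = a + c, either with default settings or
   with transfers disabled around the first addition. */
void sim_two_additions(bool optimized, int *p_uploads, int *p_downloads)
{
    SimContext ctx = { true, true, 0, 0 };
    SimImage a = { false }, b = { false }, c = { false };

    if (optimized) {
        ctx.auto_uploads = false;
        ctx.auto_downloads = false;
    }
    sim_image_add(&ctx, &a, &a, &b);
    if (optimized)
        ctx.auto_downloads = true;
    sim_image_add(&ctx, &a, &a, &c);

    *p_uploads = ctx.n_uploads;
    *p_downloads = ctx.n_downloads;
}
```

In this model the default settings cost four uploads and two downloads, while the second listing costs three uploads and one download, which is the single upload and single download saved.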

downloads

version | release date | file | release notes
v0 | 2006-02-02 | digilib_gpu_ext_v000.zip | incomplete. see 'DigiLib_Ext.h', you can send me your comments
v0 | 2006-02-05 | digilib_gpu_ext_v010.zip | vs2005 version (still not functional)
v0 | 2006-03-20 | digilib_gpu_ext_v020.zip | "cranes" version (ops on images from avi file)
v0 | 2006-05-22 | digilib_gpu_ext_v030.zip | first working release version
v1 | 2007-10-10 | digilib_gpu_ext_v100.zip | complete distribution with extensive documentation
v1.01 | 2007-11-26 | digilib_gpu_ext_v101.zip | source code update (lame r17, linux, new (163.71) NV drivers)