A fixed point arithmetic approach to floating point image processing algorithms


Image processing involves complicated mathematical operations, so as a natural consequence, almost all standard data structures for image handling store pixel values as floating point variables. The C++ OpenCV library implements the IplImage structure to store an image array (now replaced by the Mat structure). An important operation in image processing is the 2D convolution. Smoothing of images is achieved by convolving them with a Gaussian filter (a kernel whose values are generated by the Gaussian function). The kernel values generated by this function are floating point numbers, printed with six digits after the decimal point by default in C++.
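As a quick illustration of why these kernels are inherently floating point, here is a minimal sketch that generates a normalized 1-D Gaussian kernel (the function name and parameters are my own, not OpenCV's):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Illustrative sketch: generate a normalized 1-D Gaussian kernel.
// The resulting weights have long fractional parts, which is why a
// naive integer representation would throw information away.
std::vector<double> gaussian_kernel(int radius, double sigma) {
    std::vector<double> k(2 * radius + 1);
    double sum = 0.0;
    for (int i = -radius; i <= radius; ++i) {
        k[i + radius] = std::exp(-(i * i) / (2.0 * sigma * sigma));
        sum += k[i + radius];
    }
    for (double& v : k) v /= sum;  // normalize so the weights sum to 1
    return k;
}
```

A 2-D Gaussian kernel is separable, so in practice this 1-D kernel would be applied once along the rows and once along the columns.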

Floating point arithmetic is considerably slower than fixed point arithmetic. Fixed point arithmetic is also far more tractable on reconfigurable platforms, where designs built around floating point libraries can get complex very quickly. A few facts about fixed point arithmetic come in handy while carrying out basic mathematical operations. To add two fixed point numbers, just align the binary points and add. Multiplication can be implemented as a sequence of shift and add operations. To multiply or divide a fixed point number by a power of 2, you just shift its bits left or right respectively. A floating point number, on the other hand, consists of three basic parts: the sign, the exponent, and the mantissa. In C++, you can convert a float to a fixed representation in two major ways: (a) casting the floating point value to int, which simply truncates the fraction, and (b) scaling the floating point value by a power of two (multiplying by 2^n, i.e., shifting the binary point left by n bits) and then casting to int, which preserves n fractional bits.
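The basic operations above can be sketched in a few lines of C++. This is a minimal illustration in an assumed Q16.16 format (16 integer bits, 16 fractional bits); the function names are mine. Note that in method (b) the scaling must happen *before* the cast, otherwise the fractional part is already gone:

```cpp
#include <cassert>
#include <cstdint>

const int FRAC_BITS = 16;  // Q16.16: 16 integer bits, 16 fractional bits

// (a) plain cast: truncates everything after the decimal point.
int32_t to_int_truncate(float x) { return static_cast<int32_t>(x); }

// (b) scale, then cast: the fraction survives in the low 16 bits.
int32_t to_fixed(float x) {
    return static_cast<int32_t>(x * (1 << FRAC_BITS));
}

float to_float(int32_t f) {
    return static_cast<float>(f) / (1 << FRAC_BITS);
}

// Addition: operands in the same format simply add.
int32_t fixed_add(int32_t a, int32_t b) { return a + b; }

// Multiplying/dividing by a power of two is just a shift of the bits.
int32_t fixed_mul_pow2(int32_t a, int k) { return a << k; }
int32_t fixed_div_pow2(int32_t a, int k) { return a >> k; }
```

For example, 3.75 survives a round trip through `to_fixed`/`to_float` exactly (it is representable in 16 fractional bits), while `to_int_truncate` collapses it to 3.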

The two methods yield different results in further mathematical calculations. The second method provides the luxury of choosing a desired precision: if you want to keep n fractional bits, scale the float by 2^n before casting, carry out the mathematical operations in the integer domain, and shift the result right by n bits at the end. If these concepts seem foreign, consult any of the online resources on fixed point arithmetic for computers; Stack Overflow is a good starting point.
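The "shift back at the end" step matters most for multiplication: with n fractional bits per operand, the raw product carries 2n fractional bits and must be shifted right by n to return to the chosen format. A hedged sketch (again Q16.16, names mine):

```cpp
#include <cassert>
#include <cstdint>

const int N = 16;  // number of fractional bits; pick to match your precision needs

int32_t fx(float x)  { return static_cast<int32_t>(x * (1 << N)); }
float   fl(int32_t f) { return static_cast<float>(f) / (1 << N); }

// Multiply two Q16.16 numbers. The intermediate product has 2N
// fractional bits, so widen to 64 bits (to avoid overflow) and
// shift right by N to renormalize.
int32_t fixed_mul(int32_t a, int32_t b) {
    int64_t p = static_cast<int64_t>(a) * static_cast<int64_t>(b);
    return static_cast<int32_t>(p >> N);
}
```

Choosing N is the key trade-off: more fractional bits means more precision but fewer integer bits before overflow.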

A popular graph-based image segmentation algorithm, "Efficient Graph-Based Image Segmentation," was proposed by P. Felzenszwalb and D. Huttenlocher. The algorithm works on the principle of hierarchical clustering and region growing. It first smooths the image and then builds disjoint-set forests connected by weighted edges, where each weight is the Euclidean distance between the intensities of neighboring pixels. This distance is also a floating point quantity. The algorithm sorts the edges by weight and then invokes union-find over the sorted list to grow regions. This is a sample output when the algorithm is implemented with floating point data types...
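The edge-weight step is where floating point enters, and it illustrates one route to a fixed point version. The sketch below is my own illustration, not code from the paper: the float version computes the Euclidean distance directly, while the integer version compares *squared* distances, which preserves the sort order (sqrt is monotonic) and needs no floats at all:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Floating point edge weight, as in the original algorithm: the
// Euclidean distance between the (smoothed) colors of two neighbors.
struct Pixel { float r, g, b; };

float edge_weight(const Pixel& p, const Pixel& q) {
    float dr = p.r - q.r, dg = p.g - q.g, db = p.b - q.b;
    return std::sqrt(dr * dr + dg * dg + db * db);
}

// Integer variant: compare squared distances instead. Since sqrt is
// monotonic, sorting edges by squared distance gives the same order,
// so the union-find stage behaves identically.
struct FixedPixel { int32_t r, g, b; };  // e.g. intensities scaled by 2^n

int64_t edge_weight_sq(const FixedPixel& p, const FixedPixel& q) {
    int64_t dr = p.r - q.r, dg = p.g - q.g, db = p.b - q.b;
    return dr * dr + dg * dg + db * db;
}
```

One caveat under this substitution: the algorithm's merge threshold (the k/|C| term) would also have to be expressed in squared/fixed units for the comparison to stay consistent.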
When implemented in fixed point (details in the code provided), the results of the segmentation for a typical urban scene are:





The first image, on the left, goes through the algorithm implemented in fixed point arithmetic, using the C++ techniques explained above; the second goes through a floating point implementation. Visually, the two images are not very different. Especially when combined with a later-stage contextual analysis using Bayesian graphical models, it seems the two implementations would behave almost the same (more on this in a later post). Let's look at a different image.





As far as visual inspection goes, these are also similar. Let's dig a little deeper and see how the numbers compare across the two implementations. The floating point implementation yields .... clusters while the fixed point one yields .... clusters. The maximum cluster size in the former is .... and in the latter is ....
Memory-efficient algorithms can be designed when these facts are taken into consideration. The OpenCV library stores the Mat structure as, effectively, a three-dimensional array (rows x columns x channels). While this strategy works on a general purpose architecture without serious storage constraints, it has problems: the more memory a system uses, the slower it runs, even if the processor itself is very fast (an i7, for example).

Moving data to and from memory consumes energy and hence power. Efficient architectural techniques combined with memory-constrained algorithm design could address these problems, and a fixed point implementation could form part of that solution, especially in scenarios like urban scene segmentation where a certain region of the image is of utmost importance (the road, for traversability). Another trick that can be employed when designing hardware is a stream-based design: instead of storing intermediate results, stream them to the units that compute on them. This reduces both the storage and the compute power required.
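The streaming idea can be sketched in software too. The toy filter below (my own illustration, with an arbitrary 3-tap window) consumes one pixel at a time and keeps only a tiny sliding window in memory, instead of buffering a full intermediate image between stages; a hardware version would replace the deque with a small shift register or line buffer:

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Illustrative stream-based stage: a 3-tap running box filter that
// produces one smoothed output per input pixel once its window fills,
// without ever holding the whole image in memory.
class StreamBoxFilter3 {
    std::deque<int32_t> win;  // sliding window, never more than 3 values
public:
    // Feed one pixel; returns true and writes *out once the window is full.
    bool push(int32_t px, int32_t* out) {
        win.push_back(px);
        if (win.size() > 3) win.pop_front();
        if (win.size() < 3) return false;
        *out = (win[0] + win[1] + win[2]) / 3;
        return true;
    }
};
```

Chaining several such stages gives a pipeline where each intermediate result is computed on the fly and immediately consumed, which is exactly the storage saving argued for above.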

Traditional computer vision algorithms still consume fewer compute resources and less storage space than state-of-the-art deep CNNs, whose storage requirements run to several hundred megabytes. In such situations, to ensure relatively low power consumption, one has to attack the system design problem from all three major perspectives: algorithm, architecture, and circuit design. The techniques discussed in this post could be employed to cut down on resources and make the ever-shrinking CNNs even smaller, toward the holy grail of neuromorphic architecture design: on-chip learning.




Image segmentation, semantic segmentation, fixed point arithmetic, floating point, computer architecture