We want to improve an image so that it becomes easier to analyze in a computer vision (CV) pipeline.
We won't go over too many image processing techniques. Intensity transformations are image processing operators aimed at enhancing the quality (typically the contrast) of the image. As most such operators rely on the computation of the gray-level histogram of the input image, we start by defining this useful function. The gray-level histogram of an image is simply a function associating to each gray level the number of pixels in the image taking that level.
Computing the histogram is straightforward: we define a vector having as many elements as the number of grayscale levels, then scan the image to increment the element of the vector corresponding to the level of the pixel.
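The procedure just described can be sketched in a few lines. This is a minimal illustration, assuming an 8-bit image stored as a NumPy array; the function name `gray_histogram` and the tiny test image are made up for the example.

```python
import numpy as np

def gray_histogram(image, levels=256):
    """Count how many pixels take each gray level (0 .. levels-1)."""
    hist = np.zeros(levels, dtype=np.int64)   # one element per gray level
    for value in image.ravel():               # scan every pixel once
        hist[value] += 1                      # increment the matching element
    return hist

# Hypothetical 2x3 image, used only for illustration.
image = np.array([[0, 0, 255],
                  [10, 10, 10]], dtype=np.uint8)
hist = gray_histogram(image)
```

In practice `np.bincount(image.ravel(), minlength=256)` gives the same counts without the explicit loop.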
These are operators that modify the histogram to improve the quality of the image. The intensity of a pixel in the output is computed based only on its intensity in the input, so the operator is just a mapping function from one gray level to another.
An example is the thresholding operator: we choose a threshold gray level, then set to black all the intensities under that threshold and to white everything over it.
This yields very simple (binary) images, for example to separate objects from the background.
These operators can be implemented as a LookUp Table, which is often convenient.
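As a sketch of the lookup-table idea, thresholding can be written as a 256-entry table that is then applied to every pixel by indexing. The name `threshold_lut` is hypothetical, and mapping the threshold level itself to white is an assumption, since the notes do not say which side it belongs to.

```python
import numpy as np

def threshold_lut(threshold, levels=256):
    """Lookup table: levels below `threshold` -> 0, all the others -> levels-1."""
    lut = np.zeros(levels, dtype=np.uint8)
    lut[threshold:] = levels - 1      # assumption: the threshold level maps to white
    return lut

image = np.array([[12, 200],
                  [128, 127]], dtype=np.uint8)
lut = threshold_lut(128)
binary = lut[image]                   # indexing applies the table to every pixel
```

The table is computed once, so the cost per pixel is a single memory access regardless of how complicated the mapping function is.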
We can formalize these intensity transformations as follows:
So, apart from thresholding, we can cite linear contrast stretching. We don't know anything about the content of the image, but the histogram gives an approximate indication of its quality. For example, we can check whether the full range of gray levels is present: if it is not, the contrast is likely to be poor.
So, if we stretch the graylevel range, we should improve the contrast.
We are spreading noise too, but we'll not deal with this now.
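With input levels in $[r_{min}, r_{max}]$ and a desired output range $[s_{min}, s_{max}]$, linear contrast stretching computes

$$P_{out} = \frac{s_{max} - s_{min}}{r_{max} - r_{min}}\,(P_{in} - r_{min}) + s_{min}$$

The most common formulation remaps the range actually present in the image, $[P_{min}, P_{max}]$, onto the whole available range:

$$P_{out} = \frac{255}{P_{max} - P_{min}}\,(P_{in} - P_{min})$$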
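If the histogram already spans the whole range (minimum around 0, maximum around 255) only because of thin tails, while most pixels sit in a narrow peak, stretching by min and max is practically useless. It works well only when the histogram is a peak with no tails. To handle the other case, we can apply linear contrast stretching using percentiles (for example the 5th and 95th) instead of min and max.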
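A minimal sketch of linear contrast stretching, covering both the min/max and the percentile variant. The function name `stretch_contrast`, its parameters and the choice of clipping values that fall outside the selected range are assumptions made for the example.

```python
import numpy as np

def stretch_contrast(image, low_pct=0.0, high_pct=100.0, out_max=255):
    """Linearly remap [p_low, p_high] onto [0, out_max], clipping what falls outside."""
    lo, hi = np.percentile(image, [low_pct, high_pct])
    if hi <= lo:                                  # flat image: nothing to stretch
        return np.zeros_like(image, dtype=np.uint8)
    scaled = (image.astype(np.float64) - lo) / (hi - lo) * out_max
    return np.clip(np.rint(scaled), 0, out_max).astype(np.uint8)

image = np.array([[100, 110],
                  [120, 150]], dtype=np.uint8)
stretched = stretch_contrast(image)           # percentiles 0 and 100 are min and max
robust = stretch_contrast(image, 5, 95)       # percentile-based variant
```

With percentiles 0 and 100 this is the common min/max formulation; tighter percentiles ignore the tails of the histogram at the price of saturating a few pixels.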
Until now, we treated all the pixels the same way, but working on the interesting area only might be useful.
This can be done with non-linear operators, like the exponential operator (aka gamma correction). Given the input gray level $x$ and the output gray level $y$, both normalized to $[0, 1]$, we compute

$$y = x^{\gamma}$$

with $\gamma > 0$. We increase contrast in bright areas if $\gamma > 1$ and in dark areas if $\gamma < 1$.
For example, with levels in $[0, 1]$ and $\gamma < 1$, the darkest areas get a higher contrast, while the brighter ones are squeezed into a smaller range.
With typical 8-bit levels, the formula becomes $y = 255 \left( \frac{x}{255} \right)^{\gamma}$.
The operator is called gamma correction because CRT monitors had an intrinsically exponential response with exponent $\gamma$ (roughly 2.2), so the signal was pre-compensated with the inverse exponent $1/\gamma$ as a countermeasure.
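The exponential operator is a point operator, so it can be sketched as a lookup table as well. The example assumes 8-bit levels and the formula $y = 255\,(x/255)^{\gamma}$; the name `gamma_lut` is made up for the illustration.

```python
import numpy as np

def gamma_lut(gamma, levels=256):
    """Lookup table for y = (L-1) * (x / (L-1)) ** gamma, rounded to integers."""
    x = np.arange(levels, dtype=np.float64) / (levels - 1)   # normalize to [0, 1]
    return np.rint((levels - 1) * x ** gamma).astype(np.uint8)

image = np.array([[0, 64],
                  [128, 255]], dtype=np.uint8)
brighter = gamma_lut(0.5)[image]   # exponent < 1: more contrast in dark areas
darker = gamma_lut(2.0)[image]     # exponent > 1: more contrast in bright areas
```

Black and white are fixed points of the mapping for any exponent; only the levels in between are redistributed.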
Histogram equalization is the best-known operator. It spreads pixel intensities uniformly across the whole available range, which usually improves the contrast. The goal is the better contrast, not the flat histogram in itself. Unlike linear stretching, which requires manual intervention to set min and max, this one works automatically.
To find the mapping, consider a continuous variable $x$ and a strictly monotonically increasing function $T$:

$$y = T(x)$$
The goal is turning the histogram into a flat one, using the theory of random variables. We know that there's a link between the PDF and the graylevel histogram.
Now, denote by $p_x(x)$ the PDF of $x$ and by $p_y(y)$ the PDF of $y$. As $T$ is monotonically increasing:

$$x \le \xi \le x + dx \iff y \le T(\xi) \le y + dy$$

We take an infinitesimal interval $[x, x + dx]$ of $x$, which $T$ maps to $[y, y + dy]$. What if we pick any $\xi$ in the first interval? Since $T$ is monotonically increasing, we know that $T(\xi)$ falls in the second one, and vice versa. So the probability of $x$ being in its interval is the same as the probability of $y$ being in its own.
For an infinitesimal interval these probabilities are $p_x(x)\,dx$ and $p_y(y)\,dy$, which gives

$$p_y(y)\,dy = p_x(x)\,dx$$

and allows deriving the PDF of $y$ as a function of $T$ and the PDF of $x$:

$$p_y(y) = p_x(x)\,\frac{dx}{dy}$$

Note that $\frac{dx}{dy}$ is the derivative of the inverse function $x = T^{-1}(y)$:

$$\frac{dx}{dy} = \frac{dT^{-1}(y)}{dy} = \frac{1}{T'(x)}$$
Now, what kind of function $T$ should we use if we want a uniform result? We choose the Cumulative Distribution Function (CDF) of $x$, which is guaranteed to map into $[0, 1]$ and to be monotonically increasing (strictly so wherever $p_x(x) > 0$):

$$y = T(x) = \int_{0}^{x} p_x(\xi)\,d\xi$$

If $T$ is that function, its derivative is exactly the PDF, $T'(x) = p_x(x)$, so our previous relationship becomes

$$p_y(y) = p_x(x)\,\frac{1}{T'(x)} = \frac{p_x(x)}{p_x(x)} = 1$$

We got a RV whose PDF is always 1 on $[0, 1]$, therefore a uniform random variable.
We have thus found that by mapping any continuous random variable through its cdf we obtain a uniformly distributed random variable. Now, how can we use this to equalize an image?
We proceed by discretizing the previous result, i.e. by considering the cumulative PMF of the discrete RV associated with the graylevel of a pixel, whose PMF is given by the normalized histogram.
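The discretized result can be sketched as follows: the normalized histogram plays the role of the PMF, its running sum is the cumulative PMF, and scaling the latter by $L - 1$ gives the output level. The function name `equalize` and the rounding to the nearest integer are assumptions made for this illustration.

```python
import numpy as np

def equalize(image, levels=256):
    """Histogram equalization: map each level through the scaled cumulative PMF."""
    hist = np.bincount(image.ravel(), minlength=levels)
    pmf = hist / image.size                   # normalized histogram
    cdf = np.cumsum(pmf)                      # cumulative PMF, non-decreasing, ends at 1
    lut = np.rint((levels - 1) * cdf).astype(np.uint8)
    return lut[image]                         # apply the mapping as a lookup table

# Low-contrast toy image: all levels squeezed into 50..52.
image = np.array([[50, 50],
                  [51, 52]], dtype=np.uint8)
equalized = equalize(image)
```

Because the image is discrete, the resulting histogram is only approximately flat: pixels sharing an input level always share the output level, so bins can be moved apart but never split.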
Operators that also look at the neighbourhood of a pixel are sometimes called local operators. The output is computed based on the input pixel, as before, and on its neighbourhood (aka the support of the pixel).
We use these for denoising and sharpening.
An important subclass is called linear shift-invariant operators, which we'll consider first.
By a straightforward extension of 1D signal theory, applying them consists in a 2D convolution between the input image and the impulse response function of the LSI operator (also called point spread function or kernel), i.e. the output of the operator when the input is an impulse.
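First of all, considering a 2D signal $i(x, y)$, let's call $T\{\cdot\}$ an operator which generates the output signal $o(x, y) = T\{i(x, y)\}$. $T$ is linear iff

$$T\{a\,i_1(x, y) + b\,i_2(x, y)\} = a\,o_1(x, y) + b\,o_2(x, y)$$

with $o_1(x, y) = T\{i_1(x, y)\}$, $o_2(x, y) = T\{i_2(x, y)\}$ and $a$, $b$ two constants.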
If the input is a weighted sum of signals, the output is too.
Shift invariance is defined as

$$T\{i(x - x_0, y - y_0)\} = o(x - x_0, y - y_0)$$

so what we get is the same shift in the output as in the input.
Let us now assume $i(x, y) = \sum_k w_k\,e_k(x - x_k, y - y_k)$ and pose $h_k(x, y) = T\{e_k(x, y)\}$, which is the output of the operator when the input is $e_k(x, y)$.
Due to the two properties (linearity and shift invariance), the output is

$$o(x, y) = T\Big\{\sum_k w_k\,e_k(x - x_k, y - y_k)\Big\} = \sum_k w_k\,h_k(x - x_k, y - y_k)$$

In words: if the input of an LSI (linear shift-invariant) operator is a weighted sum of displaced elementary functions, the output is the same weighted sum (the weights $w_k$ don't change) of the responses to the elementary functions, displaced by the same amounts.
This is useful because every signal can be expressed as a weighted sum of elementary functions. More specifically, as a sum of displaced unit impulses (Dirac delta functions):

$$i(x, y) = \int_{-\infty}^{+\infty}\!\!\int_{-\infty}^{+\infty} i(\alpha, \beta)\,\delta(x - \alpha, y - \beta)\,d\alpha\,d\beta$$

Before we had a sum of a finite number of functions $e_k$, but now the sum is infinite: the double integral sums across the whole 2D plane. The weights are the values $i(\alpha, \beta)$ and not the $w_k$ anymore, and the amount of shift is given by $(\alpha, \beta)$. So, how can we read this formula? The function is seen as a collection of impulses which can be located anywhere, each of them multiplied by the value of the function at the position where the impulse is located.
This property (that a function can be seen as a weighted sum of shifted impulses) is known as the sifting property of the unit impulse.
The output we get if we feed the operator with the Dirac delta is known as $h(x, y)$, the impulse response or point spread function:

$$h(x, y) = T\{\delta(x, y)\}$$

So, for whatever input signal (every one of them can be expressed through the sifting property), the output of an LSI operator is

$$o(x, y) = T\{i(x, y)\} = \int_{-\infty}^{+\infty}\!\!\int_{-\infty}^{+\infty} i(\alpha, \beta)\,h(x - \alpha, y - \beta)\,d\alpha\,d\beta$$

This formula is a convolution. More precisely, since we're working with 2D continuous signals, it is known as a **2D continuous convolution**. We often denote the convolution operation by the symbol $*$:

$$o(x, y) = i(x, y) * h(x, y)$$

A practical interpretation of convolution: for every point $(x, y)$, we multiply the values of one function by the corresponding values of the (flipped and shifted) other function and add up all these products.
So, we have two functions $i(\alpha, \beta)$ and $h(\alpha, \beta)$ defined in the domain known as the $(\alpha, \beta)$ plane.
The first square represents the set in the domain in which $i$ is non-zero. Then we have $h$, another function which is non-zero in another region. Usually we consider $i$ as the input and $h$ as the filter: this is why $h$ has a smaller square. The labels inside that square represent the values $h$ takes in the four subregions. We said convolution is about multiplying corresponding values. If we look at $i$ in the definition, it appears unchanged as $i(\alpha, \beta)$, while $h$ does not appear as $h(\alpha, \beta)$ but is manipulated. We have $h(x - \alpha, y - \beta)$: the minus signs on $\alpha$ and $\beta$ mean that we're flipping $h$ around the origin. $h$ undergoes not only a flip, but also a shift by $(x, y)$. So, to compute the convolution at $(x, y)$ we leave $i$ unchanged, then pick $h$, reflect it around the origin and shift it to $(x, y)$. Finally, we multiply the two together and sum (integrate) the products.
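We can introduce the correlation of $i$ vs $h$:

$$i(x, y) \circ h(x, y) = \int\!\!\int i(\alpha, \beta)\,h(x + \alpha, y + \beta)\,d\alpha\,d\beta$$

Accordingly, the correlation of $h$ vs $i$ is

$$h(x, y) \circ i(x, y) = \int\!\!\int h(\alpha, \beta)\,i(x + \alpha, y + \beta)\,d\alpha\,d\beta = \int\!\!\int i(\xi, \eta)\,h(\xi - x, \eta - y)\,d\xi\,d\eta$$

Note that correlation, differently from convolution, **is not commutative**. Convolution is about flipping and shifting, correlation is about shifting only: in convolution you take $h$, flip it and shift it, while in the correlation of $h$ vs $i$ you take $h$ and shift it only. Because $h$ gets reflected in convolution and is left unchanged in correlation, there's a special case in which the two coincide.
If $h$ is symmetric around the origin, i.e. $h(x, y) = h(-x, -y)$ (which often happens), flipping doesn't change anything, so the convolution $i * h$ equals the correlation of $h$ vs $i$.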
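Let us now consider a discrete 2D LSI operator, $T\{\cdot\}$, whose response to the 2D discrete unit impulse (Kronecker delta function) is denoted as $H(i, j)$. By analogy with the continuous case, the output for an input image $I(i, j)$ is the discrete 2D convolution

$$O(i, j) = \sum_{m} \sum_{n} I(m, n)\,H(i - m, j - n)$$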
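To make the flip-and-shift reading concrete, here is a small sketch of discrete correlation and convolution. It assumes an odd-sized kernel centred on its middle element and zero padding at the borders; both function names are made up for the example.

```python
import numpy as np

def correlate2d(image, kernel):
    """Slide the kernel over the zero-padded image WITHOUT flipping it."""
    kh, kw = kernel.shape                      # assumed odd in both directions
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image.astype(np.float64), ((ph, ph), (pw, pw)))
    out = np.zeros(image.shape, dtype=np.float64)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def convolve2d(image, kernel):
    """Convolution is correlation with the kernel flipped around its origin."""
    return correlate2d(image, kernel[::-1, ::-1])

impulse = np.zeros((5, 5))
impulse[2, 2] = 1.0                            # discrete unit impulse
kernel = np.arange(1.0, 10.0).reshape(3, 3)    # deliberately asymmetric
symmetric = np.ones((3, 3)) / 9.0              # unchanged by the flip
```

Convolving the impulse returns a copy of the kernel around the impulse position (the impulse response), while correlating it returns the flipped kernel; with the symmetric kernel the two results coincide.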
Mean filtering is the simplest way to carry out image smoothing (i.e. low-pass filtering). Note that the notion of frequency applies to images too (Fourier theory). Recall that high frequencies are responsible for rapid changes in a signal, so a low-passed signal is smoother. Smoothing is often aimed at image denoising, though sometimes the purpose is just to cancel out small unwanted details that might hinder the image analysis task.
Note that noise is usually in the high frequencies!
Another reason to perform smoothing is to create a so-called scale-space, which is a representation made of multiple images, smoothed by larger and larger filters, used, for example, to recognize objects.
Scale is the term used to denote size in the image: a small scale object occupies a small portion of the image.
The mean filter just replaces each pixel intensity by the average intensity over a given neighbourhood.
Formally, a mean filter is an LSI operator, but in practice we can just compute the mean. We can use box filtering to efficiently compute the mean by incremental calculation.
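In box filtering, as the window slides by one pixel, we add the column that enters and remove the column that leaves. Call the sums over these columns $V^{+}$ and $V^{-}$. Given the sum $s(i, j)$ at position $(i, j)$, the sum at the next position is obtained by adding one column sum and subtracting the other:

$$s(i, j + 1) = s(i, j) + V^{+}(i, j + 1) - V^{-}(i, j + 1)$$

Computing each sum from scratch for a $(2k + 1) \times (2k + 1)$ window would need $(2k + 1)^2$ additions per pixel, resulting in a complexity of $O(k^2)$, while box filtering, with the column sums themselves updated incrementally from one row to the next, results in a complexity of $O(1)$ per pixel.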
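A sketch of the incremental scheme, under the assumption that only windows lying fully inside the image are computed (so the output is smaller than the input). The function name `box_filter_mean` and the variable names are made up; `col` holds the column sums from which the entering and leaving columns are taken.

```python
import numpy as np

def box_filter_mean(image, k):
    """Mean over (2k+1)x(2k+1) windows, using running column sums."""
    img = image.astype(np.float64)
    rows, cols = img.shape
    size = 2 * k + 1
    out = np.empty((rows - size + 1, cols - size + 1))
    col = img[:size].sum(axis=0)                  # column sums for the first band of rows
    for i in range(rows - size + 1):
        if i > 0:
            col += img[i + size - 1] - img[i - 1]     # move every column sum down one row
        s = col[:size].sum()                      # first window of the row, from scratch
        out[i, 0] = s
        for j in range(1, cols - size + 1):
            s += col[j + size - 1] - col[j - 1]   # add entering column, drop leaving one
            out[i, j] = s
    return out / size**2

image = np.arange(35, dtype=np.float64).reshape(5, 7)
smoothed = box_filter_mean(image, k=1)
```

Apart from the first window of each row, every output costs one addition and one subtraction, independently of the window size.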
Impulse noise is sometimes called salt-and-pepper noise, as it is built by adding outliers (isolated very bright or very dark pixels).
The Gaussian filter is generally regarded as the best smoothing filter among the linear operators, and it is the most widespread one.
Its kernel is a Gaussian function. Since we're dealing with 2D signals, this is a 2D Gaussian:

$$G(x, y) = \frac{1}{2\pi\sigma^2}\,e^{-\frac{x^2 + y^2}{2\sigma^2}}$$

Note that a 2D Gaussian is just the product of two 1D Gaussians, one along $x$ and one along $y$.
The larger the $\sigma$, the stronger the smoothing. This can be understood by observing that as $\sigma$ increases, the weights of closer points get smaller while those of farther points become larger.
Another way of seeing this is computing the Fourier transform of the Gaussian, which is another Gaussian with standard deviation proportional to $1/\sigma$, meaning that the higher the $\sigma$, the narrower the bandwidth of the filter.
If the Gaussian is wide in the signal domain, it is narrow in the Fourier domain, and vice versa.
The Gaussian is more effective than the mean filter, as the frequency response of the former is monotonically decreasing, while that of the latter exhibits significant ripple.
The discrete Gaussian kernel can be obtained by sampling the corresponding continuous function, which is however of infinite extent. A finite size must therefore be properly chosen.
The larger the kernel, the more accurate the approximation of the continuous filter, but the computational cost rises with the filter size. Since the Gaussian gets smaller as we move away from the origin, $\sigma$ decides how many coefficients we actually need. Therefore, we can use a rule of thumb: it is fine to use a kernel of size $2k + 1$ (squared for 2D) with $k = \lceil 3\sigma \rceil$, because the interval $[-3\sigma, 3\sigma]$ captures about 99.7% of the area under the Gaussian.
It may be convenient/mandatory to convolve the image by an integer rather than floating point kernel.
An integer Gaussian kernel can be obtained by dividing all coefficients by the smallest one, rounding to the nearest integer and finally normalizing by the sum of the integer coefficients.
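The recipe can be sketched as follows. The kernel half-size $k = \lceil 3\sigma \rceil$ is an assumption of this example, as are the function names; the code samples the Gaussian, divides by the smallest coefficient, rounds, and returns the sum to be used as the final normalization factor.

```python
import numpy as np

def gaussian_kernel_1d(sigma):
    """Sample a 1D Gaussian on [-k, k], k = ceil(3*sigma), normalized to sum 1."""
    k = int(np.ceil(3 * sigma))
    x = np.arange(-k, k + 1, dtype=np.float64)
    g = np.exp(-x**2 / (2 * sigma**2))
    return g / g.sum()

def integer_gaussian_kernel(sigma):
    """Integer 2D kernel plus the divisor that restores unit gain."""
    g = gaussian_kernel_1d(sigma)
    g2d = np.outer(g, g)                              # 2D Gaussian as a product of 1D ones
    ints = np.rint(g2d / g2d.min()).astype(np.int64)  # smallest coefficient becomes 1
    return ints, int(ints.sum())

kernel, total = integer_gaussian_kernel(1.0)
# Filtering then uses integer multiply-adds, followed by one division by `total`.
```

Since the corner coefficients become 1, the central ones can grow large for wide kernels, so the accumulator type has to be chosen with the image bit depth in mind.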
Another important point is how to make the filter faster. We can exploit the separability property: since the 2D Gaussian is the product of two 1D Gaussians, the original 2D convolution can be split into a chain of two 1D convolutions, i.e. along $x$ first and then along $y$, or vice versa. For a $(2k + 1) \times (2k + 1)$ kernel this cuts the cost per pixel from $(2k + 1)^2$ to $2(2k + 1)$ multiplications.
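The equivalence between the two 1D passes and the full 2D convolution can be checked numerically. The sketch below assumes zero padding at the borders and a kernel half-size of $\lceil 3\sigma \rceil$; the function names are hypothetical.

```python
import numpy as np

def gaussian_1d(sigma):
    k = int(np.ceil(3 * sigma))                   # assumed kernel half-size
    x = np.arange(-k, k + 1, dtype=np.float64)
    g = np.exp(-x**2 / (2 * sigma**2))
    return g / g.sum()

def smooth_separable(image, sigma):
    """Two chained 1D convolutions: along the rows, then along the columns."""
    g = gaussian_1d(sigma)
    img = image.astype(np.float64)
    rows = np.apply_along_axis(np.convolve, 1, img, g, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, g, mode="same")

def smooth_direct(image, sigma):
    """Reference: full 2D kernel applied with explicit loops and zero padding."""
    g = gaussian_1d(sigma)
    kernel = np.outer(g, g)
    k = len(g) // 2
    padded = np.pad(image.astype(np.float64), k)
    out = np.empty(image.shape, dtype=np.float64)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + len(g), j:j + len(g)] * kernel)
    return out

image = np.random.default_rng(0).uniform(0, 255, size=(12, 10))
fast = smooth_separable(image, 1.0)
slow = smooth_direct(image, 1.0)
```

Both functions give the same result up to floating-point error; the image sides must be at least as long as the kernel for `np.convolve` with `mode="same"` to keep the image shape.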
The median filter is a non-linear filter where each pixel intensity is replaced by the median over a neighbourhood, the median being the value falling half-way in the sorted set of intensities (50th percentile).
It is able to deal with impulse noise without introducing significant blur. Yet, it is not effective against Gaussian-like noise.
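A brute-force sketch of the median filter on a toy image with a single outlier. Replicating the border pixels is an assumption of this example, and `median_filter` is a hypothetical name.

```python
import numpy as np

def median_filter(image, k=1):
    """Replace each pixel by the median of its (2k+1)x(2k+1) neighbourhood."""
    size = 2 * k + 1
    padded = np.pad(image, k, mode="edge")     # assumption: replicate border pixels
    out = np.empty_like(image)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.median(padded[i:i + size, j:j + size])
    return out

image = np.full((5, 5), 100, dtype=np.uint8)
image[2, 2] = 255                              # one "salt" outlier
filtered = median_filter(image)
```

The outlier disappears completely because it never reaches the middle of the sorted neighbourhood, whereas a 3x3 mean filter would spread it over nine pixels.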
The bilateral filter is an advanced non-linear filter that denoises Gaussian-like noise without blurring the image across edges (edge-preserving smoothing). It basically averages only pixels that are both close in space and similar in intensity.
Let's start with an example: an image containing a noisy step edge.
A Gaussian filter would smooth the step together with the noise, whereas a bilateral filter removes the noise while keeping the step sharp.
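The function of the filter is the product of two Gaussian functions. For a pixel $p$ and a pixel $q$ in its neighbourhood $S$:

$$O(p) = \sum_{q \in S} H(p, q)\,I_q, \qquad H(p, q) = \frac{1}{W(p)}\,G_{\sigma_s}\big(d_s(p, q)\big)\,G_{\sigma_r}\big(d_r(I_p, I_q)\big)$$

where $d_s(p, q) = \lVert p - q \rVert_2$ is the spatial distance and $d_r(I_p, I_q) = |I_p - I_q|$ is the distance between intensities. The second Gaussian is large only for those pixels $q$ such that $I_q$ is near to $I_p$.
We want the filter to have unitary gain, so the product of the two Gaussians has to be normalized by the sum of all the coefficients:

$$W(p) = \sum_{q \in S} G_{\sigma_s}\big(d_s(p, q)\big)\,G_{\sigma_r}\big(d_r(I_p, I_q)\big)$$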
The bilateral filter is not super-fast: it is computationally pretty heavy.
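A brute-force sketch shows where the cost comes from: the range weights depend on the central pixel, so the kernel has to be rebuilt at every position. The names `bilateral_filter`, `sigma_s` and `sigma_r`, the window half-size $\lceil 3\sigma_s \rceil$ and the border replication are assumptions of this example.

```python
import numpy as np

def bilateral_filter(image, sigma_s, sigma_r):
    """Weighted mean with weights = spatial Gaussian * intensity (range) Gaussian."""
    img = image.astype(np.float64)
    k = int(np.ceil(3 * sigma_s))
    size = 2 * k + 1
    ax = np.arange(-k, k + 1)
    dy, dx = np.meshgrid(ax, ax, indexing="ij")
    spatial = np.exp(-(dx**2 + dy**2) / (2 * sigma_s**2))    # fixed for all pixels
    padded = np.pad(img, k, mode="edge")
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            window = padded[i:i + size, j:j + size]
            range_w = np.exp(-(window - img[i, j])**2 / (2 * sigma_r**2))
            weights = spatial * range_w                      # recomputed per pixel
            out[i, j] = np.sum(weights * window) / np.sum(weights)   # unit gain
    return out

rng = np.random.default_rng(0)
step = np.zeros((8, 8))
step[:, 4:] = 200.0                                          # ideal step edge
noisy = step + rng.uniform(-5.0, 5.0, size=step.shape)       # bounded toy noise
smoothed = bilateral_filter(noisy, sigma_s=1.0, sigma_r=20.0)
```

On this toy step, pixels across the edge differ by about 200 levels, so their range weight is practically zero and the two sides are averaged separately.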
Non-local means is a more recent edge-preserving smoothing filter, based on the key idea that the similarity among patches spread over the image can be exploited to achieve denoising.
How large is its computational complexity? If we have $N$ pixels, we get a complexity of $O(N^2)$: for each pixel, we have to take a weighted sum of all the other pixels in the image.
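The quadratic cost is visible in a simplified brute-force sketch, where each of the $N$ outputs loops over all $N$ patches. The weighting function (a Gaussian of the mean squared patch difference), the parameter `h`, the patch radius and the reflective padding are assumptions of this example rather than details given in the notes.

```python
import numpy as np

def non_local_means(image, patch_radius=1, h=10.0):
    """Brute force: every pixel becomes a weighted mean of ALL pixels in the image."""
    img = image.astype(np.float64)
    rows, cols = img.shape
    size = 2 * patch_radius + 1
    padded = np.pad(img, patch_radius, mode="reflect")
    # One flattened patch per pixel, shape (N, size * size).
    patches = np.array([padded[i:i + size, j:j + size].ravel()
                        for i in range(rows) for j in range(cols)])
    flat = img.ravel()
    out = np.empty(flat.size)
    for p in range(flat.size):                               # N iterations ...
        d2 = np.mean((patches - patches[p]) ** 2, axis=1)    # ... each over N patches
        weights = np.exp(-d2 / h**2)                         # similar patches weigh more
        out[p] = np.sum(weights * flat) / np.sum(weights)
    return out.reshape(rows, cols)

rng = np.random.default_rng(0)
step = np.zeros((8, 8))
step[:, 4:] = 200.0
noisy = step + rng.uniform(-5.0, 5.0, size=step.shape)
denoised = non_local_means(noisy)
```

Practical implementations usually restrict the search to a window around each pixel, trading some of the non-local behaviour for a much lower cost.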




