The Depth II: Block Matching

Ali Yasin Eser
Python in Plain English
5 min read · Dec 24, 2020


Hello everyone! Welcome to the second part of The Depth. I will briefly explain the block matching algorithm. I assume you have read the first post; if you haven’t, please do so. We will use the code from my stereo depth project.

In the first post, we took left and right images and calibrated the cameras, individually and together. After the rectification process, we had stereo images in which corresponding points lie on the same row: if we took a pair of images showing a pen, the tip of the pen would appear at (X1, Y) and (X2, Y), where Y is the row number and X1 and X2 are the column positions in the left and right images.

In this case, the difference between X1 and X2 gives us the disparity value. We mentioned that if we look at a close object first with the left eye closed and then with the right eye closed, its apparent location shifts depending on which eye is open. The shift grows more significant the closer the object is, which means a larger disparity value indicates a closer object (a small worked example follows the list below). This operation could be done for each point individually, but that’s not efficient because:

  • It’s a slow process, and time is valuable.
  • If the rectification is not good enough, working point by point is not ideal. We need to reduce the error, and pointwise processing won’t help in this case.
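
To make the disparity–distance relation concrete, here is a small worked example using the standard pinhole stereo formula Z = f * B / d, where f is the focal length in pixels, B is the baseline between the two cameras, and d is the disparity. The focal length and baseline values below are made up for illustration:

```python
# Pinhole stereo relation: depth Z = f * B / d, so depth is
# inversely proportional to disparity. f and B are hypothetical.
f = 700.0  # focal length in pixels (illustrative value)
B = 0.12   # baseline between the cameras in meters (illustrative value)

for d in (70.0, 35.0, 7.0):  # disparity in pixels
    Z = f * B / d            # depth in meters
    print(f"disparity {d:5.1f} px -> depth {Z:5.2f} m")

# Output:
# disparity  70.0 px -> depth  1.20 m   (large disparity: close)
# disparity  35.0 px -> depth  2.40 m
# disparity   7.0 px -> depth 12.00 m   (small disparity: far)
```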

That’s why we use block matching. Some approaches use optical flow (for image streams), while others reduce the image size to use less processing power. I will mention the second one briefly; if you want detailed information, please research the related papers.

We start with a block from the left image. Using the left image is not mandatory; it’s just how I do it. Blocks that are too big produce smoother disparity maps, while blocks that are too small produce noisy ones, so you should experiment to find the sweet spot. After choosing the block, we search from left to right, trying to find the best possible match in the right image. Since the images are rectified, searching along one axis is enough.

[Figure: the chosen block, shown as a square.]
[Figure: searching along one axis. We do it from left to right, but you can follow any approach you like.]

How do we match the blocks? There are various cost functions; most people use the Sum of Absolute Differences (SAD) or the Sum of Squared Differences (SSD). You can research the related papers to see how people have experimented with them. With these methods, a smaller result means a closer match, and the offset of the best-matching block becomes the disparity value for the chosen block.

[Figure: an example of the Sum of Absolute Differences.]
[Figure, left to right: the left rectified image, the right rectified image, and the disparity map after the block matching process.]
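
To make the matching step concrete, here is a minimal NumPy sketch of naive block matching along a single row, with SAD as the cost. The function name and parameters are hypothetical, it skips border checks and sub-pixel refinement, and it is far slower than OpenCV’s real implementations:

```python
import numpy as np

def match_row_block(left, right, y, x, block=7, max_disp=64):
    """Find the disparity of the block centered at (x, y) in the left
    image by scanning along the same row of the right image.
    Uses the Sum of Absolute Differences (SAD) as the matching cost."""
    half = block // 2
    ref = left[y - half:y + half + 1, x - half:x + half + 1].astype(np.int32)

    best_disp, best_cost = 0, np.inf
    for d in range(max_disp):
        xr = x - d                       # rectified pair: match lies to the left
        if xr - half < 0:                # ran off the image border
            break
        cand = right[y - half:y + half + 1,
                     xr - half:xr + half + 1].astype(np.int32)
        cost = np.abs(ref - cand).sum()  # SAD; ((ref - cand) ** 2).sum() is SSD
        if cost < best_cost:
            best_cost, best_disp = cost, d
    return best_disp
```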

Since we covered the basics, let’s check the code. After block matching, I used a WLS (Weighted Least Squares) filter to get smoother, more coherent disparity values that represent the image better. The function:
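
Below is a minimal sketch of such a function, assuming opencv-contrib-python is installed (the WLS filter lives in cv2.ximgproc). The function name, parameter values, and the lambda/sigma settings are illustrative, not necessarily the project’s exact ones:

```python
import cv2

def compute_disparity(left_gray, right_gray):
    """Compute a WLS-filtered disparity map from a rectified grayscale
    stereo pair. Parameter values are illustrative, not definitive."""
    block_size = 5
    left_matcher = cv2.StereoSGBM_create(
        minDisparity=0,
        numDisparities=64,           # must be positive and divisible by 16
        blockSize=block_size,
        P1=8 * block_size ** 2,      # smoothness penalties for 1-channel input,
        P2=32 * block_size ** 2,     # keeping the rule P2 > P1
        disp12MaxDiff=1,
        preFilterCap=63,
        uniquenessRatio=10,
        speckleWindowSize=100,
        speckleRange=2,
    )
    # The right matcher and the WLS filter come from the contrib module.
    right_matcher = cv2.ximgproc.createRightMatcher(left_matcher)
    wls = cv2.ximgproc.createDisparityWLSFilter(matcher_left=left_matcher)
    wls.setLambda(8000.0)     # how strongly smoothing respects image edges
    wls.setSigmaColor(1.5)    # sensitivity to color differences in the image

    left_disp = left_matcher.compute(left_gray, right_gray)
    right_disp = right_matcher.compute(right_gray, left_gray)
    return wls.filter(left_disp, left_gray, disparity_map_right=right_disp)
```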

You can check the project on GitHub. The code basically creates a matcher. OpenCV’s documentation is sparse (I contributed some, yet it’s only the tip of the iceberg), but the coding side is really easy. Let’s look at the parameters:

  • minDisparity: Minimum disparity value. Normally we expect 0 here, but a different value is sometimes required when the rectification algorithm shifts the image.
  • numDisparities: The disparity search range (maximum minus minimum disparity). Has to be greater than zero and, for OpenCV’s SGBM, divisible by 16. Together with minDisparity it defines the disparity boundary.
  • blockSize: Size of the matching blocks. The recommended range is [3–11], and an odd number is recommended since odd-sized blocks have a center pixel.
  • P1 and P2: Control the smoothness of the disparity map by penalizing disparity changes between neighboring pixels. The rule is P2 > P1.
  • disp12MaxDiff: Maximum allowed pixel difference in the left-right disparity consistency check.
  • preFilterCap: A value used for pre-filtering. Before block matching, the x-derivative of the image is computed and clipped to the interval [-preFilterCap, preFilterCap]. The resulting values are then passed to the Birchfield-Tomasi cost function.
  • uniquenessRatio: The margin, in percent, by which the best match’s cost must beat the second-best candidate to be accepted. The suggested value range is [5–15].
  • speckleWindowSize: Maximum size of a connected disparity region that will be treated as a noise speckle and invalidated, giving a smoother image. The suggested value range is [50–200].
  • speckleRange: Maximum disparity variation allowed between neighbours within a connected component, used to get a smooth image. If you decide to experiment with it, I suggest 1 or 2. Be careful: the value is implicitly multiplied by 16! OpenCV does it so you don’t have to.

In the code we use SGBM (Semi-Global Block Matching). Left and right disparity maps are created, and the WLS filter is used to produce a smooth, optimized image. I have not mastered this topic; I arrived at my values by experimenting, so I won’t go into more detail.
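
As a usage sketch, you could run a function like the one above on a rectified pair and normalize the fixed-point result for display (the file names here are placeholders):

```python
import cv2

left = cv2.imread("left_rectified.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right_rectified.png", cv2.IMREAD_GRAYSCALE)

disparity = compute_disparity(left, right)  # sketch from above

# Scale to 8-bit for viewing; raw SGBM output is fixed-point (16x).
vis = cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX, cv2.CV_8U)
cv2.imwrite("disparity.png", vis)
```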

The project can be used for single and stereo camera calibration and for disparity map calculation. After that, you can do whatever you want with the results. In my project, I use it for obstacle detection, checking whether there is an object ahead or too close. It is all about drones and, most importantly, human safety! You can use your imagination and create a lot of things with it. Please visit the project on GitHub and star it if you like.

That completes stereo vision at a basic level. You now have some idea of how to create projects, do educated research, and discuss the topic with people! I hope to document my knowledge about different topics and help people on their journey. Until then, goodbye!

