How to Create Synthetic Images Using OpenCV (Python)

Praveen Krishna Murthy
Python in Plain English
4 min read · Jun 19, 2021


Separating foreground and background images to produce different synthetic images, with a subjective analysis of the results to check realism.


Introduction

It is evident from the evolution of deep learning architectures that a foreground object or person can easily be segmented from the background. Once the foreground is segmented, it can be overlaid on another image. But this involves time and complexity. The time I'm talking about goes into preparing the dataset, the network architecture, and training; the complexity lies in data preparation and model tuning.

Before the breakthroughs in deep learning architectures and computational power, people worked with a minimal number of images, simply because the compute to handle more did not exist. From those few images, synthetic data was generated through foreground/background separation, and also from 3D CAD models. Let's go back in time and see whether we can find realism in such data. Along the way, let's learn a little OpenCV, which comes in handy for image-data processing.

Block Diagram:

Primarily, the steps followed to generate the synthetic images can be summarised in the block diagram below.

A sneak peek at the process

Description

Here is the process explained with words and reasoning:

So, first we take the foreground image and make it transparent. Transparency is important because it eases blending into the background image. Basically, this means converting the image from RGB to RGBA, adding an alpha channel so that each pixel is represented as a 4-value tuple. Then we read the background image and make sure it has 4 channels instead of 3. A quick way to see those tuples is shown below.
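For instance (a minimal sketch, assuming any local JPEG such as 'foreground.jpg'):

from PIL import Image

img = Image.open('foreground.jpg').convert("RGBA")  # add the alpha channel
print(img.getpixel((0, 0)))  # each pixel is now an (R, G, B, A) tuple, e.g. (255, 255, 255, 255)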

Once the foreground and background images are ready, let's plan on blending the two into one. Now I have to think about the size of the foreground image; obviously, it can't be the same size as the background. So we resize the foreground image and pick a position in the background image. Then we blend the image in at that x and y offset using a for loop and slicing. At its core, this is the standard alpha-compositing rule, out = alpha * fg + (1 - alpha) * bg, applied over the paste region, as illustrated below.
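A toy, self-contained illustration of that rule on a single channel (the 2x2 values here are made up purely for demonstration):

import numpy as np

fg = np.full((2, 2), 200.0)                 # one foreground channel
bg = np.full((2, 2), 50.0)                  # the matching background slice
alpha = np.array([[1.0, 0.0], [0.5, 0.5]])  # normalised alpha mask
out = alpha * fg + (1 - alpha) * bg
print(out)  # [[200.  50.] [125. 125.]] — opaque keeps fg, transparent keeps bg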

Now let's dive into the code and check the results obtained from generating synthetic images.

Code

Let’s import the required libraries

import cv2 
import math
import numpy as np
from PIL import Image

Give the paths, and convert the white background of the foreground image to transparent:

foregrnd_img = 'foreground.jpg'
backgrnd_img = 'background.jpg'

def convertTransparent(path):
    img = Image.open(path)
    img = img.convert("RGBA")
    datas = img.getdata()
    newData = []
    for item in datas:
        # make pure-white pixels fully transparent; keep all others opaque
        if item[0] == 255 and item[1] == 255 and item[2] == 255:
            newData.append((255, 255, 255, 0))
        else:
            newData.append(item)
    img.putdata(newData)
    return img

fore_img = np.array(convertTransparent(foregrnd_img))
# PIL arrays are RGBA while OpenCV works in BGRA, so swap the channel order
fore_img = cv2.cvtColor(fore_img, cv2.COLOR_RGBA2BGRA)
fore_height, fore_width, fore_channels = fore_img.shape
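A quick sanity check before blending, confirming the fourth (alpha) channel is in place:

print(fore_img.shape)  # expect (height, width, 4) after the RGBA conversion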

Resizing the foreground image relative to the background, setting its x-y offset, and blending the two together can be done as below.

# ensure the foreground has 4 channels
if fore_channels < 4:
    fore_img = cv2.cvtColor(fore_img, cv2.COLOR_BGR2BGRA)

try:
    # read the background image
    backgrnd_img = cv2.imread(backgrnd_img)
    backgrnd_height, backgrnd_width, backgrnd_channels = backgrnd_img.shape
    # check if backgrnd_img has the same channels as foregrnd_img
    if backgrnd_channels < 4:
        backgrnd_img = cv2.cvtColor(backgrnd_img, cv2.COLOR_BGR2BGRA)

    # resize the foreground to roughly 5% of the background's pixel area
    TARGET_PIXEL_AREA = (backgrnd_height * backgrnd_width) * 0.05
    ratio = float(fore_img.shape[1]) / float(fore_img.shape[0])
    fore_img_new_h = int(math.sqrt(TARGET_PIXEL_AREA / ratio) + 0.5)
    fore_img_new_w = int((fore_img_new_h * ratio) + 0.5)
    fore_img = cv2.resize(fore_img, (fore_img_new_w, fore_img_new_h))

    # set the x-y offset of the foreground within the background
    x_offset = 300
    y_offset = 135
    y1, y2 = y_offset, y_offset + fore_img.shape[0]
    x1, x2 = x_offset, x_offset + fore_img.shape[1]

    # alpha-blend the foreground onto the background, channel by channel
    alpha_fore_img = fore_img[:, :, 3] / 255.0
    alpha_l = 1.0 - alpha_fore_img
    for c in range(0, 3):
        backgrnd_img[y1:y2, x1:x2, c] = (alpha_fore_img * fore_img[:, :, c] +
                                         alpha_l * backgrnd_img[y1:y2, x1:x2, c])

    # save the composite image
    fResult = "synthetic_image.png"
    cv2.imwrite(fResult, backgrnd_img)
except (IndexError, ValueError):
    # the offset pushed the foreground outside the background; skip this composite
    pass
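To turn this into a small dataset rather than a single image, the paste can be repeated at random valid offsets. Below is a minimal sketch; random_composite is a hypothetical helper (not part of the code above), and it assumes bg is a freshly loaded BGRA background and fg the prepared BGRA foreground:

import random

def random_composite(bg, fg):
    # choose an offset that keeps fg fully inside bg
    h, w = fg.shape[:2]
    x = random.randint(0, bg.shape[1] - w)
    y = random.randint(0, bg.shape[0] - h)
    out = bg.copy()
    alpha = fg[:, :, 3] / 255.0
    # same per-channel alpha blend as above
    for c in range(3):
        out[y:y+h, x:x+w, c] = alpha * fg[:, :, c] + (1 - alpha) * out[y:y+h, x:x+w, c]
    # also return the paste rectangle, which can serve as a label
    return out, (x, y, x + w, y + h)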

The results of the above code and experiment are as below:

Foreground image (with a white background)

Background image

Synthetic image: the foreground and background fused

Conclusion

Clearly, the generated synthetic image looks unrealistic. But for an objective such as object detection, it can still be used, along with bounding box coordinates, to train a model. The bounding box coordinates help the model learn exactly which pattern we are looking for in the scene or image, and conveniently the paste rectangle from the code above is exactly that box, as sketched below.
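A minimal sketch, reusing the x1, y1, x2, y2 values from the blending code above:

# the paste rectangle doubles as the ground-truth bounding box
bbox = (x1, y1, x2, y2)  # (left, top, right, bottom) in background pixel coordinates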

But if we used this image data for classification tasks, it would not be good enough for the objective, since realism is not maintained. Maybe one can look into deep learning methods such as GANs to generate images with different backgrounds while maintaining realism. That can be the next project in the pipeline. 😉

More content at plainenglish.io
