Dinghow的个人博客

Current post : 《Parking-slot Dataset Generating with GAN》

1/11/2021 ——

# 1. Introduction

This is the Computer Vision course final project, School of Software Engineering, Tongji University.

Prof. Lin Zhang gave us a real scene dataset with different environments and parking-slot types, and the task is generating parking-slot images, which is a typical style migration problem.

Summary of Image2Image problem: https://github.com/lzhbrian/image-to-image-papers

## 1.1 Backgroud

Domain adaptation is a hot research topic in the fields of machine learning and computer vision. Roughly speaking, methods in this area can be categorized into two classes, pixel-level ones and feature-level ones. Image-level domain adaptation aims to learn the mapping between different visual domains. It is quite useful for augmenting training sets. In this project, your task is to complete a system which can create virtual parking-slot image samples that are realistic looking.

## 1.2 Requirements

1. Develop a virtual parking-slot image generation system using Unity. Taking a real road image as the texture. Then, feed the texture into Unity. Your system is expected to create various parking-slot image samples by adjusting the lighting conditions, the trees, the objects nearby, and the parking-lines.

1. Virtual image samples may be not as real as samples collected from the real world. So, you need to use GAN-based domain adaptation technologies to improve their reality. You may try the following suggested candidate solutions: 1) Learning from simulated and unsupervised images through adversarial training, CVPR 2017; 2) Unsupervised pixel-level domain adaptation with generative adversarial networks, CVPR 2017; 3) Diverse image-to-image translation via disentangled representations, ECCV 2018. To train GAN-based pixel-level domain adaptation networks, you may need to use real parking-slot image samples, which can be obtained here https://cslinzhang.github.io/deepps/.

# 2. Solution

## 2.1 Generate dataset

Unity Scenes are created by @Mei Luo & @Pomevak

We use unity to construct some virtual scenes.

According to the requirements, we generated parking space pictures for different scenes.

We mainly classify parking spaces as follows:

1. Lighting: We simulated the changes in lighting under one day, especially in the case of strong light during the day and sunset
2. Weather: sunny, rainy, cloudy, we have made a weather control system to change the weather in one scene.
4. Obstacles: cars, pedestrians, manhole covers on the ground, fire rafts, short bushes, street lights, etc.
5. Direction of the parking space: horizontal, lateral, oblique
6. Parking line: solid line, dotted line, different mottled degree
7. Ground texture: cement floor, masonry ground, shallow lawn, mooring road surface, etc.

We mainly set up the following scenarios.

### CycleGAN[^1]

CycleGAN is a creative network to solve the unpaired pix2pix domain adapation problem.

The innovation of CycleGAN is that it enables this migration between the source and target domains without the need for one-to-one mapping between training data.

The main part is Cycle-consistency loss:

$$L_{cyc} = E_{x\sim P_{data(x)}}[||G(F(x))-x||]+E_{y\sim P_{data(y)}}[||F(G(y))-y||]$$

$x, y$ are the images from $X$ and $Y$ domains , and $F, G$ are the transfer from $X\rightarrow Y$ and $Y\rightarrow X$

In this method, the image in the source domain is transformed by two steps:

• Map the source image to the target domain
• Returne the generated image in target domain to the source domain, get the generated image, thus eliminating the requirement of image pairing in the target domain.

Using the generator network to map the image to the target domain, and by matching the generator and discriminator, the quality of the generated image can be improved.

CyCADA is a new network structure based on CycleGAN, which adapts representations at both the pixel-level and feature-level. By adding feature loss & semantic loss, CyCADA can perform better on domain adaptation problem, especially on the dataset with semantic information. And the experiment designed by the authors is a transform from GTA game screenshots to real scene, similar to our project.

### Our network

Since our dataset doesn’t have the semantic information, so I just add feature loss to CycleGAN, use resnet as the encoder to extract feature map. Then Compare the features from A domain and regenerated B domain images, I set the initial feature loss weight as 0.2, you can change that to adjust the influence by feature loss. --feature_loss_weight

# module.py
def ResnetEncoder(input_shape=(256, 256, 3),
output_channels=3,
dim=64,
n_downsamplings=2,
n_blocks=9,
norm='instance_norm'):
Norm = _get_norm_layer(norm)

def _residual_block(x):
dim = x.shape[-1]
h = x

h = Pad([[0, 0], [1, 1], [1, 1], [0, 0]], mode='REFLECT')(h)
h = keras.layers.Conv2D(dim, 3, padding='valid', use_bias=False)(h)
h = Norm()(h)
h = keras.layers.ReLU()(h)

h = Pad([[0, 0], [1, 1], [1, 1], [0, 0]], mode='REFLECT')(h)
h = keras.layers.Conv2D(dim, 3, padding='valid', use_bias=False)(h)
h = Norm()(h)

# 0
h = inputs = keras.Input(shape=input_shape)

# 1
h = Pad([[0, 0], [3, 3], [3, 3], [0, 0]], mode='REFLECT')(h)
h = keras.layers.Conv2D(dim, 7, padding='valid', use_bias=False)(h)
h = Norm()(h)
h = keras.layers.ReLU()(h)

# 2
for _ in range(n_downsamplings):
dim *= 2
h = keras.layers.Conv2D(dim, 3, strides=2, padding='same', use_bias=False)(h)
h = Norm()(h)
h = keras.layers.ReLU()(h)

# 3
for _ in range(n_blocks):
h = _residual_block(h)

h = keras.layers.ReLU()(h)

return keras.Model(inputs=inputs, outputs=h)

# train.py
A2B = G_A2B(A, training=True)
B2A = G_B2A(B, training=True)
A2B2A = G_B2A(A2B, training=True)
B2A2B = G_A2B(B2A, training=True)

G_A_Feature = F_A(A2B, training=True) # Get the feature of generated image from A
B_Feature = F_B(B, training=True) # Get the feature of B image

A2B_d_logits = D_B(A2B, training=True)
B2A_d_logits = D_A(B2A, training=True)

A2B_g_loss = 2*g_loss_fn(A2B_d_logits)
B2A_g_loss = g_loss_fn(B2A_d_logits)
A2B2A_cycle_loss = cycle_loss_fn(A, A2B2A)
B2A2B_cycle_loss = cycle_loss_fn(B, B2A2B)
A2B_id_loss = identity_loss_fn(A, A2B)
B2A_id_loss = identity_loss_fn(B, B2A)
G_A_B_Feature_loss = identity_loss_fn(G_A_Feature, B_Feature) # Calculate the loss between two features

G_loss = (A2B_g_loss + B2A_g_loss) + (A2B2A_cycle_loss + B2A2B_cycle_loss) * args.cycle_loss_weight + (A2B_id_loss + B2A_id_loss) * args.identity_loss_weight + G_A_B_Feature_loss*args.feature_loss_weight


Result

Maybe using the real material patches for GTA5 can solve this problem in a totally different way :)

# 5. References

### CycleGAN

#### Paper:

[^1]: J. Zhu, et al, Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, 2017 International Conference on Computer Vision (ICCV 2017), IEEE, 2017.

#### Codes:

https://github.com/LynnHo/CycleGAN-Tensorflow-2