GENERATE FACES WITH SPECIFIC ATTRIBUTES

This article concerns the creation of a face generator and of a conditional face generator. In particular, the architectures used are Generative Adversarial Networks (GAN) and Conditional Generative Adversarial Networks (cGAN).

Alex Rossi
3 min read · Dec 17, 2020
Photo by Elio Santos on Unsplash

The data and the attributes used for training come from the CelebA dataset (64×64×3 images). The two face generators were assessed with the Fréchet Inception Distance (FID). The models were built using TensorFlow.
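As a reminder of how the FID score works, here is a minimal sketch of the Fréchet distance between two Gaussians fitted to Inception activations, which is the core of FID. The function name and the use of SciPy's matrix square root are illustrative choices, not taken from the article:

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2).

    In FID, the mus and sigmas are the mean and covariance of
    Inception-v3 activations over real and generated images.
    """
    diff = mu1 - mu2
    # Matrix square root of the product of the two covariances.
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny imaginary parts from numerics
    return diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean)
```

Lower is better: identical activation statistics give a distance of zero.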

GAN

The developed GAN was trained on 192,599 images, and its FID score was computed on the remaining 10,000 images. In particular, the discriminator was trained separately on fake and real images, with batches of size 64.
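A discriminator update of this kind can be sketched as follows. This assumes compiled Keras models `generator` and `discriminator` (the latter with a binary cross-entropy loss); the function name and latent dimension default are mine, not the article's:

```python
import numpy as np

def train_discriminator(generator, discriminator, real_batch, latent_dim=100):
    """One discriminator step: a real-only update, then a fake-only update."""
    batch_size = real_batch.shape[0]  # 64 in the article
    # Real images first, labelled 1...
    d_loss_real = discriminator.train_on_batch(
        real_batch, np.ones((batch_size, 1)))
    # ...then a separate update on generated images, labelled 0.
    noise = np.random.normal(size=(batch_size, latent_dim)).astype("float32")
    fake_batch = generator.predict(noise, verbose=0)
    d_loss_fake = discriminator.train_on_batch(
        fake_batch, np.zeros((batch_size, 1)))
    return d_loss_real, d_loss_fake
```

Splitting the real and fake batches into two updates, rather than mixing them, is a common stabilization practice for GAN discriminators.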

Model’s architecture

Following the tricks to improve the stability of GANs by S. Chintala [2], the following best practices were introduced in the model:

  • The input data was normalized to the range [-1, 1] before training.
  • The output layer of the generator uses the hyperbolic tangent as activation function.
  • For the remaining layers, in both the discriminator and the generator, the leaky version of the Rectified Linear Unit was used as activation function, with a negative slope coefficient (alpha) of 0.2.
  • A spherical latent space was used, with 100 random variables independently and identically distributed according to a standard normal distribution.
  • A dropout layer with a rate of 0.4 was added to the discriminator.
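The practices above can be combined into a model sketch like the one below. The layer counts and filter sizes are illustrative assumptions (the article does not list them); only the tanh output, LeakyReLU(0.2), 100-dimensional latent space, and 0.4 dropout come from the list:

```python
import tensorflow as tf
from tensorflow.keras import layers

LATENT_DIM = 100  # spherical latent space, z ~ N(0, I)

def build_generator():
    return tf.keras.Sequential([
        layers.Dense(8 * 8 * 128, input_shape=(LATENT_DIM,)),
        layers.LeakyReLU(0.2),
        layers.Reshape((8, 8, 128)),
        layers.Conv2DTranspose(128, 4, strides=2, padding="same"),  # 16x16
        layers.LeakyReLU(0.2),
        layers.Conv2DTranspose(64, 4, strides=2, padding="same"),   # 32x32
        layers.LeakyReLU(0.2),
        # tanh keeps outputs in [-1, 1], matching the input normalization.
        layers.Conv2DTranspose(3, 4, strides=2, padding="same",
                               activation="tanh"),                  # 64x64x3
    ])

def build_discriminator():
    return tf.keras.Sequential([
        layers.Conv2D(64, 4, strides=2, padding="same",
                      input_shape=(64, 64, 3)),
        layers.LeakyReLU(0.2),
        layers.Conv2D(128, 4, strides=2, padding="same"),
        layers.LeakyReLU(0.2),
        layers.Flatten(),
        layers.Dropout(0.4),  # dropout rate from the list above
        layers.Dense(1, activation="sigmoid"),
    ])
```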

The model was trained for 50 epochs with a resulting FID score of 23.

Sample of randomly generated faces
Results through the epochs

Conditional GAN (cGAN)

To generate images conditioned on given attributes, a conditional GAN was used, with the same features as the GAN described above.

https://it.mathworks.com/help/deeplearning/ug/train-conditional-generative-adversarial-network.html

In the discriminator, the labels given as input were handled by a Lambda layer, which applies a function that expands their dimensions so that they can be concatenated with the activations of the first hidden layer. That layer has a stride of (2, 2) and outputs a tensor of shape (nbatch, 32, 32, 64); after concatenation, the input to the second hidden layer has shape (nbatch, 32, 32, 104). In early attempts the labels were concatenated before the first hidden layer, but the GAN was very unstable and gave poor results.
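This label-injection scheme can be sketched as follows. The use of `tf.tile` inside the Lambda layer and the later layer sizes are my assumptions; the shapes (32, 32, 64) and (32, 32, 104) and the 40 CelebA attributes match the description above:

```python
import tensorflow as tf
from tensorflow.keras import layers

N_ATTR = 40  # CelebA provides 40 binary attributes

def build_conditional_discriminator():
    img = layers.Input(shape=(64, 64, 3))
    labels = layers.Input(shape=(N_ATTR,))
    # First hidden layer, stride (2, 2): output (nbatch, 32, 32, 64).
    x = layers.Conv2D(64, 4, strides=(2, 2), padding="same")(img)
    x = layers.LeakyReLU(0.2)(x)
    # Lambda layer: expand labels from (nbatch, 40) to (nbatch, 32, 32, 40).
    lbl = layers.Lambda(
        lambda t: tf.tile(t[:, None, None, :], [1, 32, 32, 1]))(labels)
    # Concatenate along channels: (nbatch, 32, 32, 64 + 40) = (nbatch, 32, 32, 104).
    x = layers.Concatenate()([x, lbl])
    x = layers.Conv2D(128, 4, strides=2, padding="same")(x)
    x = layers.LeakyReLU(0.2)(x)
    x = layers.Flatten()(x)
    x = layers.Dropout(0.4)(x)
    out = layers.Dense(1, activation="sigmoid")(x)
    return tf.keras.Model([img, labels], out)
```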

List of attributes provided by CelebA

However, for this model it turned out that applying random normal initializers, with a standard deviation of 0.02, to the layers of both the discriminator and the generator greatly reduced the training time.
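In Keras this amounts to passing a `RandomNormal` initializer to each layer; the convolutional layer shown is just an illustrative example:

```python
from tensorflow.keras import initializers, layers

# Random normal weight initializer with the stddev quoted above.
init = initializers.RandomNormal(mean=0.0, stddev=0.02)

# Example: applying it to one convolutional layer (sizes are illustrative);
# in the article it is applied to the layers of both networks.
conv = layers.Conv2D(64, 4, strides=2, padding="same",
                     kernel_initializer=init)
```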

Young attractive blonde woman
Young, attractive man with black hair, 5 o’ clock shadow and smiling with mouth slightly open
Old man with grey and receding hair
