A Gentle Introduction to the Generative Adversarial Network (GAN) Model

The Generative Adversarial Network, commonly known as the GAN model, has been getting more and more attention in the Artificial Intelligence (AI) field over the last 10 years. So how does a GAN work, and what is it good for?

Let’s get started with a quick summary of the GAN model in this blog post. The post is composed of 4 sections:

Overview of GAN Model
Chronology of GAN Model
GAN Model Applications
Wrap Up!

1. Overview of GAN Model

Before we go into the details of all the jargon around the GAN model, let’s have a quick test. Take a guess: do you think this person exists?

If you guessed that they all exist, well, actually none of them exist at all! Sounds creepy, right?! To our naked eyes they look real. The big question is: how did this happen? Well, let me tell you, the magic is hidden in the GAN model.

The GAN model was first introduced by Ian J. Goodfellow and his team in 2014. It is a generative model that can produce new synthetic content, for instance images, voices, or music, that resembles the dataset it was trained on. A GAN model is composed of 2 networks that are trained simultaneously:

The generator, which creates new images that look real
The discriminator, which tells real images apart from fake ones

(Photo credit: https://www.tensorflow.org/tutorials/generative/dcgan)

The generator takes random noise as its input and keeps turning it into new synthetic data that resembles our training data, while the discriminator tries not to be fooled by the generator and keeps judging whether each image it sees is real or fake.

This process continues until training reaches an equilibrium state where the discriminator can no longer distinguish whether a generated image is real or fake. I will share the mathematical details of both networks (generator and discriminator) in my next blog post.
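To make the back-and-forth between the two networks a bit more concrete, here is a minimal sketch of that alternating training loop on a toy problem. It is not the original GAN implementation: everything in it is an assumption chosen for illustration. The "images" are single numbers drawn from a normal distribution, the generator is a linear map of the noise, the discriminator is a one-variable logistic classifier, and the generator uses the common non-saturating loss. The function names (generate, discriminate, discriminator_step, generator_step, train) are hypothetical, and a toy this small is not guaranteed to settle into the equilibrium described above.

```python
import math
import random


def sigmoid(t):
    """Logistic function, written to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def generate(g, z):
    """Generator: maps random noise z to a synthetic sample."""
    a, b = g
    return a * z + b


def discriminate(d, x):
    """Discriminator: estimated probability that x is a real sample."""
    w, c = d
    return sigmoid(w * x + c)


def discriminator_step(d, real, fake, lr):
    """One gradient step on -[log D(real) + log(1 - D(fake))].

    Assumes real and fake hold the same number of samples.
    """
    w, c = d
    grad_w = grad_c = 0.0
    for x in real:
        p = discriminate(d, x)
        grad_w += -(1.0 - p) * x   # pushes D(real) toward 1
        grad_c += -(1.0 - p)
    for x in fake:
        p = discriminate(d, x)
        grad_w += p * x            # pushes D(fake) toward 0
        grad_c += p
    n = len(real)
    return (w - lr * grad_w / n, c - lr * grad_c / n)


def generator_step(g, d, noise, lr):
    """One gradient step on the non-saturating loss -log D(G(z))."""
    a, b = g
    w, _ = d
    grad_a = grad_b = 0.0
    for z in noise:
        p = discriminate(d, generate(g, z))
        grad_x = -(1.0 - p) * w    # derivative of -log D(x) w.r.t. x
        grad_a += grad_x * z
        grad_b += grad_x
    n = len(noise)
    return (a - lr * grad_a / n, b - lr * grad_b / n)


def train(steps=2000, batch=32, lr=0.05, seed=0):
    """Alternate discriminator and generator updates on toy 1-D data."""
    rng = random.Random(seed)
    g = (1.0, 0.0)   # generator parameters (a, b)
    d = (0.0, 0.0)   # discriminator parameters (w, c)
    for _ in range(steps):
        # Toy "training data": real samples from N(4, 0.5) (an assumption).
        real = [rng.gauss(4.0, 0.5) for _ in range(batch)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [generate(g, z) for z in noise]
        d = discriminator_step(d, real, fake, lr)
        # Fresh noise for the generator update.
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        g = generator_step(g, d, noise, lr)
    return g, d
```

Calling train() returns the final generator and discriminator parameters. Feeding a generated sample through discriminate then shows how convinced the discriminator is that the sample is real; a value near 0.5 means it can no longer tell the difference.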

2. Chronology of GAN Model

There are lots of GAN models out there to be tested and explored. To make it simple and easy for you to follow how the GAN model has evolved, I have summarized the most common GAN models that you can play around with, in chronological order:

Generative Adversarial Network by Goodfellow et al. (2014)
Conditional GANs by Mirza and Osindero (2014)
DCGANs by Radford et al. (2015)
Improved Techniques for Training GANs by Salimans et al. (2016)
Pix2Pix by Isola et al. (2016)
Progressive Growing of GANs for Improved Quality, Stability, and Variation (ProGAN) by Karras et al. (2017)
CycleGAN by Zhu et al. (2017)
StackGAN by Zhang et al. (2017)
StyleGAN by Karras et al. (2019)

3. GAN Model Applications

GAN models have been widely used for text-to-image generation, image-to-image translation, frontal face view generation, face aging, photo blending, video prediction, and fake image generation.

Since generating face images has caught many people’s attention, here are 3 GAN models commonly applied to that task that you can play around with. You can check out their repositories from the GitHub links below:

StyleGAN (https://github.com/NVlabs/stylegan2)
ProGAN (https://github.com/tkarras/progressive_growing_of_gans)
GAN

StyleGAN is a modified version of the ProGAN and GAN models. Of the three, StyleGAN performs best so far and is capable of producing more realistic, high-resolution human face images by controlling both coarse features, such as pose and face shape, and finer features, such as the eyes, color, and nose shape.

Sounds cool, right? This GAN thing can also work like an intelligent Photoshop for us.

4. Wrap Up!

There is so much about GANs that could be written here. However, to make it easy to understand and digest for newbies like us, I am putting the content out part by part. This post gave a brief overview of the GAN model, including its versions and applications.

For the next post, I am planning to go into slightly more detail (the mathematical framework) on how the generator and discriminator work to create fake face images, including the architecture involved in the training process.

Later, let’s do some hands-on coding with the GAN model, using TensorFlow and Streamlit to generate new face images.

Feel free to comment and share your ideas to help me improve my future work!