Pixel generative models

PixelRNN / PixelCNN

Each pixel is conditioned on the previously observed pixels.

Screenshot from 2017-07-21 15:26:46

http://kawahara.ca/conditional-image-generation-with-pixelcnn-decoders-slides/

Gated PixelCNN

PixelRNN is accurate, but it is slow to train since RNNs are hard to parallelize. PixelCNN was proposed to solve this problem: a mask ensures that the convolution operation only uses previous pixels. To avoid the “blind spot” problem, separate horizontal and vertical convolution stacks are used.
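As a rough illustration (my own numpy sketch, not the paper's code), here is how such a causal mask can be built:

```python
import numpy as np

def causal_mask(kernel_size, mask_type='A'):
    """Build a PixelCNN-style mask for a square convolution kernel.

    Pixels are generated in raster-scan order, so the mask keeps only
    weights covering rows above the centre and columns to its left.
    Type 'A' (first layer) also zeroes the centre weight; type 'B' keeps it.
    """
    k = kernel_size
    mask = np.ones((k, k), dtype=np.float32)
    centre = k // 2
    mask[centre, centre + (mask_type == 'B'):] = 0  # centre row: at/after centre
    mask[centre + 1:, :] = 0                        # every row below the centre
    return mask

# 3x3 type-A mask: only the pixels strictly before the centre survive
print(causal_mask(3, 'A'))
```

The mask is multiplied element-wise with the kernel weights before each convolution, so the output at a pixel never sees that pixel or any later one.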

Screenshot from 2017-07-21 15:03:57

The authors believe another reason for PixelRNN’s good performance is its “multiplicative units” in the form of LSTM gates, which may help model more complex interactions. “Gated convolutional layers” were proposed to bring this property to PixelCNN.

y = tanh(W_f ∗ x) ⊙ σ(W_g ∗ x), where ⊙ is the element-wise product and W_f, W_g are separate filter and gate weights
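A minimal numpy sketch of this gated unit, with plain matrix multiplies standing in for the masked convolutions:

```python
import numpy as np

def gated_activation(x, W_f, W_g):
    """Gated activation unit: y = tanh(W_f @ x) * sigmoid(W_g @ x).

    W_f and W_g are separate "filter" and "gate" weights; in the paper
    they are masked convolutions, here plain matrices for brevity.
    """
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    return np.tanh(W_f @ x) * sigmoid(W_g @ x)

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
y = gated_activation(x, rng.standard_normal((4, 4)), rng.standard_normal((4, 4)))
print(y.shape)  # (4,)
```

The sigmoid branch acts as a soft gate in (0, 1) that modulates the tanh branch, loosely mimicking an LSTM gate.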

Screenshot from 2017-07-21 14:28:56

When we stack these gated convolutional layers together, we obtain the Gated PixelCNN.

Screenshot from 2017-07-21 15:21:42

PixelCNN++

This paper makes several modifications to Gated PixelCNN. One interesting change is the long-range skip connections shown in the figure below.

Screenshot from 2017-07-21 15:02:25

Multiscale PixelCNN

This work aims at speeding up sampling from PixelCNN. By generating multiple pixels in parallel, the algorithm reduces the number of sequential steps from linear, O(N), to logarithmic, O(log N), in the number of pixels.
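A toy back-of-the-envelope comparison of the two step counts (the function is mine, purely for illustration):

```python
import math

def sequential_steps(n_pixels, multiscale=False):
    """Sequential sampling steps for an n_pixels image (toy accounting).

    Plain PixelCNN samples one pixel at a time: O(n) steps.
    The multiscale scheme starts from a small seed image and, at each
    step, generates a whole group of new pixels conditioned on the
    previous scale, roughly doubling the resolution: O(log n) steps.
    """
    if not multiscale:
        return n_pixels
    return int(math.log2(n_pixels))

print(sequential_steps(256 * 256))        # 65536 sequential steps
print(sequential_steps(256 * 256, True))  # 16 sequential steps
```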

PixelVAE

VAE can generate smooth images with good global structure; however, it is not good at recovering local details.

PixelCNN captures local features well, but is not good at modelling global structure.

Screenshot from 2017-07-21 15:07:13

PixelGAN

The coolest idea here is to decouple the categorical variable from the continuous variable, which is somewhat similar to InfoGAN. The difference is that here the categorical variables are learned by an adversarial autoencoder and the continuous variables by a PixelCNN.

Screenshot from 2017-07-21 15:10:35

GPU usage

TensorFlow and VTK cannot tolerate Theano accessing the GPU at the same time; errors similar to the ones below will appear.

F tensorflow/stream_executor/cuda/cuda_driver.cc:316] current context was not created by the StreamExecutor cuda_driver API: 0x40e0f10; a CUDA runtime call was likely performed without using a StreamExecutor context

 

The workarounds I used:

  1. Assign Theano to another GPU or to the CPU.

https://github.com/tensorflow/tensorflow/issues/916
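For example, a minimal sketch of pinning Theano to the CPU before it is imported (the flag string is one common choice, e.g. `device=cuda1` would pick a different GPU instead):

```python
import os

# Pin Theano to the CPU *before* importing theano, so it never touches
# the GPU that TensorFlow/VTK are using.
os.environ['THEANO_FLAGS'] = 'device=cpu,floatX=float32'

# import theano  # must come after the flag is set
print(os.environ['THEANO_FLAGS'])
```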

 

  2. Store the data first, then call VTK.
ERROR: In /export/doutriaux1/build/build/ParaView/VTK/Rendering/OpenGL/vtkXOpenGLRenderWindow.cxx, line 382
vtkXOpenGLRenderWindow (0x2a1d710): Could not find a decent visual

ERROR: In /export/doutriaux1/build/build/ParaView/VTK/Rendering/OpenGL/vtkXOpenGLRenderWindow.cxx, line 601
vtkXOpenGLRenderWindow (0x2a1d710): GLX not found.  Aborting.

 

Technical issues when implementing convolutional neural networks

This blog gives a very thorough introduction to general issues.

This blog explains why we should choose cross-entropy instead of error rate as the loss function.
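A toy numpy illustration of that argument (my own example, not taken from the blog):

```python
import numpy as np

def cross_entropy(p, y):
    """Cross-entropy loss for predicted probability p of the true class (y = 1)."""
    return -np.log(p)

# Error rate only changes when the hard prediction flips at p = 0.5, so it
# is flat almost everywhere and gives no gradient signal. Cross-entropy
# decreases smoothly as p approaches the target, so it can be optimized.
for p in (0.1, 0.4, 0.6, 0.9):
    error = int(p < 0.5)  # 0/1 loss: piecewise constant
    print(p, error, round(cross_entropy(p, 1), 3))
```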

This blog offers a very comprehensive list of tips for training GANs better.

softmax
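A standard numerically stable implementation (subtracting the max so the exponentials never overflow):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: subtract the max before exponentiating."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

# Naive exp(1000) would overflow to inf; the shifted version is fine.
s = softmax(np.array([1000.0, 1001.0, 1002.0]))
print(s)
```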

regularization
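As one common example of regularization, a minimal sketch of the L2 (weight-decay) penalty that is added to the loss (the lam value is illustrative):

```python
import numpy as np

def l2_penalty(weights, lam=1e-4):
    """L2 (weight-decay) regularization term added to the loss:
    lam * sum of squared weights, discouraging large weights."""
    return lam * sum(np.sum(W ** 2) for W in weights)

W1, W2 = np.ones((3, 3)), np.ones((2, 2))
print(l2_penalty([W1, W2]))  # 1e-4 * (9 + 4)
```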

Weight initialization

From the TensorFlow tutorial:

To create this model, we’re going to need to create a lot of weights and biases. One should generally initialize weights with a small amount of noise for symmetry breaking, and to prevent 0 gradients. Since we’re using ReLU neurons, it is also good practice to initialize them with a slightly positive initial bias to avoid “dead neurons”.
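A numpy sketch of that advice (shapes chosen arbitrarily; the actual tutorial uses `tf.truncated_normal` and `tf.constant`):

```python
import numpy as np

rng = np.random.default_rng(0)

def weight_variable(shape, stddev=0.1):
    """Small random weights break the symmetry between neurons."""
    return rng.normal(0.0, stddev, size=shape)

def bias_variable(shape, value=0.1):
    """A slightly positive bias keeps ReLU units initially active,
    avoiding 'dead neurons' as the tutorial recommends."""
    return np.full(shape, value)

W = weight_variable((5, 5, 1, 32))  # e.g. a 5x5 conv kernel with 32 filters
b = bias_variable((32,))
print(W.shape, b.min())
```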

How to choose a neural network’s hyper-parameters?

 

Moving averages, from the TensorFlow tutorial:

When training a model, it is often beneficial to maintain moving averages of the trained parameters. Evaluations that use averaged parameters sometimes produce significantly better results than the final trained values.
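A minimal sketch of one exponential-moving-average update over the parameters (the decay value is illustrative; TensorFlow provides this as `tf.train.ExponentialMovingAverage`):

```python
import numpy as np

def ema_update(shadow, value, decay=0.99):
    """One exponential-moving-average step over trained parameters:
    shadow <- decay * shadow + (1 - decay) * value."""
    return decay * shadow + (1 - decay) * value

shadow = np.zeros(3)
for step in range(1000):
    noisy_params = np.ones(3) + 0.1 * np.sin(step)  # parameters jitter during training
    shadow = ema_update(shadow, noisy_params)
print(shadow)  # smoothed estimate near [1, 1, 1]
```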

 

batch normalization, to mitigate vanishing gradients

strided convolution to replace pooling, for faster computation

This page gives a very good example of how to implement strided convolution.

dropout

maxout
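The strided-convolution-instead-of-pooling idea above can be sketched in plain numpy (my own example, not from the linked page):

```python
import numpy as np

def strided_conv2d(x, kernel, stride=2):
    """Valid 2-D convolution with stride; a stride-2 convolution halves
    the spatial size in one pass, playing the role of conv + pooling."""
    kh, kw = kernel.shape
    out_h = (x.shape[0] - kh) // stride + 1
    out_w = (x.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

x = np.arange(36, dtype=float).reshape(6, 6)
y = strided_conv2d(x, np.ones((2, 2)) / 4, stride=2)  # behaves like 2x2 average pooling
print(y.shape)  # (3, 3)
```

With a learned kernel instead of the fixed averaging one, the network decides for itself how to downsample, which is the point of replacing pooling.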

Autoencoder

An autoencoder has a lot of freedom, which usually means it can overfit the data because it has too many ways to represent it. To constrain this, we can use sparse autoencoders, where a sparsity penalty is added to the cost function. In general, when we talk about autoencoders, we are really talking about sparse autoencoders.
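A sketch of the usual KL-divergence sparsity penalty (the rho and beta values are illustrative):

```python
import numpy as np

def sparsity_penalty(activations, rho=0.05, beta=1.0):
    """KL-divergence sparsity penalty for a sparse autoencoder.

    rho is the desired average activation of each hidden unit; rho_hat
    is the measured average over the batch. The penalty grows as
    rho_hat moves away from rho, pushing most units towards inactivity.
    """
    rho_hat = np.clip(np.mean(activations, axis=0), 1e-8, 1 - 1e-8)
    kl = rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))
    return beta * np.sum(kl)

dense = np.full((10, 4), 0.5)    # hidden units fire half the time: penalized
sparse = np.full((10, 4), 0.05)  # hidden units match the target rate: no penalty
print(sparsity_penalty(dense), sparsity_penalty(sparse))
```

The penalty is added to the reconstruction loss, so the encoder is forced to explain each input with only a few active units.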