AC-GAN MNIST code: a detailed walkthrough
Created: 2019-06-18 07:45:51  Updated: 2025-02-22 08:35:08
Deep learning | Tags: deep learning, GAN


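Recently I wanted to augment the time series in the UCR Time Series datasets and considered doing it with a GAN. But there are many kinds of GANs, and most tutorials only sketch the theory and paste code, which is not enough to write the code yourself. So I decided to spend some time dissecting the AC-GAN code and mapping each part back to the theory.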



1. Map the latent vector to an initial 3x3 "image" (a 3x3 feature map with 384 channels)
```py
def build_generator(latent_size):
    # we will map a pair of (z, L), where z is a latent vector and L is a
    # label drawn from P_c, to image space (..., 28, 28, 1)
    cnn = Sequential()

    cnn.add(Dense(3 * 3 * 384, input_dim=latent_size, activation='relu'))
    cnn.add(Reshape((3, 3, 384)))
```
2. Upsample to 7x7

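Why does this step upsample to 7x7? Because Conv2DTranspose is a transposed convolution (often called "deconvolution") with 192 filters, kernel size 5, stride 1 and padding='valid', meaning no zero padding. It maps shapes in the opposite direction of an ordinary convolution. An ordinary 'valid' convolution with a 5x5 kernel turns a 7x7 input into a 3x3 output (7 - 5 + 1 = 3), so the transposed version turns 3x3 into 7x7.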
```py
    # upsample to (7, 7, ...)
    cnn.add(Conv2DTranspose(192, 5, strides=1, padding='valid',
                            activation='relu',
                            kernel_initializer='glorot_normal'))
    cnn.add(BatchNormalization())
```
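To make the "reverse shape mapping" concrete, here is a minimal NumPy sketch of a single-channel transposed convolution with stride 1 and no padding. It is a simplification: one input channel, one output channel, no bias and no activation. `conv_transpose2d` is a hypothetical helper, not a Keras function. Every input pixel adds a copy of the 5x5 kernel, scaled by the pixel value, into the output, so a 3x3 input spreads out to 7x7.

```py
import numpy as np

def conv_transpose2d(x, w, stride=1):
    """Single-channel transposed convolution with no padding ('valid').

    Each input pixel scatters a copy of the kernel, scaled by the pixel
    value, into the output; overlapping contributions are summed.
    """
    h, wd = x.shape
    k = w.shape[0]
    out = np.zeros(((h - 1) * stride + k, (wd - 1) * stride + k))
    for i in range(h):
        for j in range(wd):
            out[i * stride:i * stride + k, j * stride:j * stride + k] += x[i, j] * w
    return out

x = np.arange(9, dtype=float).reshape(3, 3)  # a 3x3 input map
w = np.ones((5, 5))                           # a 5x5 kernel of ones
y = conv_transpose2d(x, w, stride=1)
print(y.shape)  # (7, 7)
```

Keras's Conv2DTranspose does this for every pair of input and output channels and sums over the input channels. The 192 filters in the generator correspond to 192 such output channels.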

3. Upsample to 14x14

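Consider an ordinary convolution with kernel size 5, stride 2 and 'same' padding. It turns a 14x14 input into a 7x7 output (14 / 2 = 7). The transposed convolution with the same settings therefore maps 7x7 back to 14x14.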

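The general rule for transposed convolutions works like this. With padding='same', the output size is the input size multiplied by the stride (7 -> 14 with stride 2). With padding='valid' and stride 1, the output size is the input size plus kernel_size - 1 (3 -> 3 + 4 = 7). So 'valid' with stride 1 suits the non-proportional 3 -> 7 step, and 'same' with stride 2 suits the doubling steps.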

```py
    # upsample to (14, 14, ...)
    cnn.add(Conv2DTranspose(96, 5, strides=2, padding='same',
                            activation='relu',
                            kernel_initializer='glorot_normal'))
    cnn.add(BatchNormalization())
```
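The size rule can be written down explicitly. The name `deconv_output_length` in the sketch below is my own. To my knowledge, Keras uses an equivalent internal formula when `output_padding` is not set. The function computes the output length of a transposed convolution for both padding modes, then replays the generator's three upsampling steps.

```py
def deconv_output_length(n, kernel_size, stride, padding):
    """Spatial output size of a transposed convolution (no output_padding, no dilation)."""
    if padding == 'same':
        return n * stride
    if padding == 'valid':
        return n * stride + max(kernel_size - stride, 0)
    raise ValueError(f"unsupported padding: {padding!r}")

# The generator's three Conv2DTranspose layers, all with kernel_size=5
sizes = [3]
for stride, padding in [(1, 'valid'), (2, 'same'), (2, 'same')]:
    sizes.append(deconv_output_length(sizes[-1], 5, stride, padding))
print(sizes)  # [3, 7, 14, 28]
```

This also shows that the rule of thumb is only a rule of thumb. 'valid' padding with stride 2 would give 3 * 2 + (5 - 2) = 9, which is neither a multiple of the input nor the 7 needed here.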

4. Upsample again to 28x28 and output a single channel


```py
    # upsample to (28, 28, ...)
    cnn.add(Conv2DTranspose(1, 5, strides=2, padding='same',
                            activation='tanh',
                            kernel_initializer='glorot_normal'))

    # this is the z space commonly referred to in GAN papers
    latent = Input(shape=(latent_size, ))

    # this will be our label
    image_class = Input(shape=(1,), dtype='int32')
```
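Because the last layer uses tanh, generated pixels lie in [-1, 1]. Real MNIST pixels are stored as integers from 0 to 255, so before training they should be mapped into the same range. Otherwise the discriminator could tell real from fake by value range alone. Below is a minimal sketch of that mapping and its inverse; both helper names are hypothetical.

```py
import numpy as np

def to_tanh_range(pixels):
    """Map uint8 pixels in [0, 255] to float32 values in [-1, 1]."""
    return (pixels.astype(np.float32) - 127.5) / 127.5

def to_uint8(images):
    """Map generator outputs in [-1, 1] back to uint8 pixels for display."""
    return np.clip(np.rint(images * 127.5 + 127.5), 0, 255).astype(np.uint8)

raw = np.array([0, 127, 255], dtype=np.uint8)
scaled = to_tanh_range(raw)
restored = to_uint8(scaled)
print(scaled[0], scaled[2])  # -1.0 1.0
```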

5. What is the Embedding layer for?

According to the documentation, the input of an Embedding layer has shape (batch_size, sequence_length) and its output has shape (batch_size, sequence_length, output_dim).

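The official description is that the Embedding layer "Turns positive integers (indexes) into dense vectors of fixed size." Examples:
eg1. [[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]
eg2. an integer input of shape 32x10 (values below 1000) -> an output of shape 32x10x64, where 64 is output_dim.

In the generator, Embedding(num_classes, latent_size) turns each class label into a learned vector of length latent_size. That vector is then multiplied element-wise (a Hadamard product) with the noise vector z, so the class information enters the generator through its input.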
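In the forward pass, an Embedding layer is just a row lookup in a trainable matrix of shape (input_dim, output_dim). The NumPy sketch below reproduces the 32x10 -> 32x10x64 example. A random matrix stands in for the learned weights; this is an assumption for illustration, since in Keras the rows are trained by backpropagation.

```py
import numpy as np

rng = np.random.default_rng(0)
num_classes, output_dim = 1000, 64
# Stand-in for the Embedding weights: one row per integer index.
table = rng.normal(size=(num_classes, output_dim))

# Integer input of shape (batch_size, sequence_length) = (32, 10)
indices = rng.integers(0, num_classes, size=(32, 10))
vectors = table[indices]  # look up one row per index
print(vectors.shape)      # (32, 10, 64)
```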

```py
    cls = Embedding(num_classes, latent_size,
                    embeddings_initializer='glorot_normal')(image_class)

    # hadamard product between z-space and a class conditional embedding
    h = layers.multiply([latent, cls])

    fake_image = cnn(h)

    return Model([latent, image_class], fake_image)
```
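The Hadamard product is what makes the generator class-conditional. Multiply the same noise vector z element-wise with the embeddings of two different labels, and you get two different inputs to cnn. The small NumPy sketch below uses a random table in place of the learned embedding. One simplification: in the Keras code, image_class has shape (1,), so the Embedding output actually has shape (batch, 1, latent_size). The sketch drops that length-1 axis.

```py
import numpy as np

rng = np.random.default_rng(0)
num_classes, latent_size = 10, 100
# Stand-in for the learned Embedding(num_classes, latent_size) weights
class_table = rng.normal(size=(num_classes, latent_size))

z = rng.uniform(-1, 1, size=(1, latent_size))  # one noise vector
labels = np.array([3, 7])
# Same z, two labels: broadcasting (1, 100) * (2, 100) -> (2, 100)
h = z * class_table[labels]
print(h.shape)  # (2, 100)
```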

1. The transposed convolution layer (Conv2DTranspose)

```py
keras.layers.convolutional.Conv2DTranspose(
    filters, kernel_size, strides=(1, 1), padding='valid',
    data_format=None, activation=None, use_bias=True,
    kernel_initializer='glorot_uniform', bias_initializer='zeros',
    kernel_regularizer=None, bias_regularizer=None,
    activity_regularizer=None, kernel_constraint=None,
    bias_constraint=None)
```

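When this layer is the first layer of a model, pass the input_shape argument. For example, input_shape=(3, 128, 128) describes 128x128 RGB images with data_format='channels_first'. With the default 'channels_last', the same images would be (128, 128, 3).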

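padding: the zero-padding strategy, either "valid" or "same".

"valid" means no padding. The kernel is applied only where it fits entirely inside the input, so border positions produce no output and an ordinary convolution shrinks the image.

"same" pads the borders with zeros so that, with stride 1, the output keeps the input's spatial shape. With stride s, the output size is the input size divided by s, rounded up.

For Conv2DTranspose the effect on shape is reversed. "same" multiplies the size by the stride, while "valid" gives (input - 1) * stride + kernel_size.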
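For an ordinary (forward) convolution, the two padding modes give the sizes below. These are the mappings that the generator's transposed convolutions invert. The function name is hypothetical, and the formulas assume no dilation.

```py
def conv_output_length(n, kernel_size, stride, padding):
    """Spatial output size of an ordinary convolution (no dilation)."""
    if padding == 'valid':
        out = n - kernel_size + 1  # only positions where the kernel fits
    elif padding == 'same':
        out = n                    # zero padding keeps every position
    else:
        raise ValueError(f"unsupported padding: {padding!r}")
    return (out + stride - 1) // stride  # ceiling division by the stride

print(conv_output_length(7, 5, 1, 'valid'))  # 3
print(conv_output_length(14, 5, 2, 'same'))  # 7
print(conv_output_length(28, 5, 2, 'same'))  # 14
```

A 13x13 input also gives 7x7 with 'same' padding and stride 2, so the forward mapping is not one-to-one. A transposed convolution with padding='same' simply picks input size times stride, which is 14.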





1. Take a slice of the training set, i.e. a batch of real images
```py
image_batch = x_train[index * batch_size:(index + 1) * batch_size]
label_batch = y_train[index * batch_size:(index + 1) * batch_size]
```
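Here is a quick check of the slicing on a hypothetical 10-sample training set with batch_size = 4. Suppose the number of batches is computed by rounding up, for example with np.ceil. Then the last slice runs past the end of the array, NumPy silently truncates it, and the final batch comes out shorter. That is presumably why the code uses len(image_batch) instead of batch_size when sizing the noise and labels.

```py
import numpy as np

x_train = np.arange(10)  # hypothetical stand-in for 10 training images
batch_size = 4
num_batches = int(np.ceil(len(x_train) / float(batch_size)))  # 3

batch_lengths = []
for index in range(num_batches):
    image_batch = x_train[index * batch_size:(index + 1) * batch_size]
    batch_lengths.append(len(image_batch))

print(batch_lengths)  # [4, 4, 2]
```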
2. Generate noise: this is the latent vector z from which the generator produces the fake images
```py
noise = np.random.uniform(-1, 1, (len(image_batch), latent_size))
```
3. Randomly sample some class labels
```py
sampled_labels = np.random.randint(0, num_classes, len(image_batch))
```
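Both sampling calls draw from half-open ranges. np.random.uniform(-1, 1, ...) returns values in [-1, 1), and np.random.randint(0, num_classes, ...) returns integers from 0 to num_classes - 1. Here is a small check with hypothetical sizes:

```py
import numpy as np

latent_size, num_classes, n = 100, 10, 4  # hypothetical sizes
noise = np.random.uniform(-1, 1, (n, latent_size))
sampled_labels = np.random.randint(0, num_classes, n)
print(noise.shape, sampled_labels.shape)  # (4, 100) (4,)
```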
4. Generate fake images

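This step uses the noise and the sampled labels. The generator takes two inputs, the random noise and a class label, and produces an image of the class given by the label.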
```py
generated_images = generator.predict(
    [noise, sampled_labels.reshape((-1, 1))], verbose=0)
```
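The reshape((-1, 1)) is needed because the generator's label input was declared as Input(shape=(1,), dtype='int32'). Keras therefore expects label arrays of shape (batch, 1) rather than (batch,). The -1 lets NumPy infer the batch dimension:

```py
import numpy as np

sampled_labels = np.array([3, 0, 7, 7])
column = sampled_labels.reshape((-1, 1))  # (batch,) -> (batch, 1)
print(sampled_labels.shape, column.shape)  # (4,) (4, 1)
```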
5. Concatenate the real and fake images

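Here image_batch holds the real images and generated_images the fake ones. np.concatenate() joins arrays along the first axis by default. For example, concatenating [[1,2],[3,4]] with [[5,6]] gives [[1,2],[3,4],[5,6]].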
```python3
x = np.concatenate((image_batch, generated_images))
```
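This checks the example above and then runs the same call on image-shaped arrays. The (N, 28, 28, 1) shape is an assumption here, chosen to match the generator's output. The real batch comes first, which is the order the labels in the next step rely on.

```py
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])
joined = np.concatenate((a, b))  # stacks along axis 0 (rows) by default

real = np.zeros((3, 28, 28, 1))  # stand-in for image_batch
fake = np.ones((3, 28, 28, 1))   # stand-in for generated_images
x = np.concatenate((real, fake))
print(x.shape)  # (6, 28, 28, 1)
```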

6. Build the labels

```py
soft_zero, soft_one = 0, 0.95
y = np.array([soft_one] * len(image_batch) + [soft_zero] * len(image_batch))
aux_y = np.concatenate((label_batch, sampled_labels), axis=0)

# we don’t want the discriminator to also maximize the classification
# accuracy of the auxiliary classifier on generated images, so we
# don’t train discriminator to produce class labels for generated
# images (see
# To preserve sum of sample weights for the auxiliary classifier,
# we assign sample weight of 2 to the real images.
```
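Here is what the targets look like for a hypothetical batch of three real and three fake images. The order must match x: real images first, then fakes. Using 0.95 instead of 1 for real images while keeping 0 for fakes is usually called one-sided label smoothing.

```py
import numpy as np

soft_zero, soft_one = 0, 0.95
label_batch = np.array([5, 0, 4])     # hypothetical classes of 3 real images
sampled_labels = np.array([1, 9, 2])  # classes requested for 3 fake images
n = len(label_batch)

y = np.array([soft_one] * n + [soft_zero] * n)  # three 0.95s, then three 0s
aux_y = np.concatenate((label_batch, sampled_labels), axis=0)
print(aux_y)  # [5 0 4 1 9 2]
```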
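7. Assign sample weights and train the discriminator

TODO: I don't fully understand these weights yet.
During training, the discriminator's input x contains both real and fake images. Its targets have two parts: y, which says whether each image is real, and aux_y, which gives each image's class.

```py
disc_sample_weight = [np.ones(2 * len(image_batch)),
                      np.concatenate((np.ones(len(image_batch)) * 2,
                                      np.zeros(len(image_batch))))]
epoch_disc_loss.append(discriminator.train_on_batch(
    x, [y, aux_y], sample_weight=disc_sample_weight))
```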
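The original script's comments explain the weights. The discriminator should not be trained to classify generated images, so the fake half gets weight 0 on the auxiliary class output. To keep the total auxiliary weight at 2N, the value it would have if every weight were 1, the real half gets weight 2. The real/fake output keeps weight 1 for all 2N images. A quick check with N = 3:

```py
import numpy as np

n = 3
disc_sample_weight = [
    np.ones(2 * n),                                 # real/fake output: every image counts once
    np.concatenate((np.ones(n) * 2, np.zeros(n))),  # class output: reals x2, fakes x0
]
print(disc_sample_weight[1])  # [2. 2. 2. 0. 0. 0.]
print(disc_sample_weight[0].sum(), disc_sample_weight[1].sum())  # 6.0 6.0
```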
8. Generate fake data again

```py
# make new noise. we generate 2 * batch size here such that we have
# the generator optimize over an identical number of images as the
# discriminator
noise = np.random.uniform(-1, 1, (2 * len(image_batch), latent_size))
sampled_labels = np.random.randint(0, num_classes, 2 * len(image_batch))
```
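9. Train the combined model

Note the second argument passed during training, [trick, sampled_labels]. Here trick is the real/fake target and sampled_labels is the class target. To push the generator toward fake images that look real, every trick value is deliberately set to soft_one, i.e. close to 1. The generator is therefore rewarded when the discriminator labels its fakes as real.

```py
# we want to train the generator to trick the discriminator
# For the generator, we want all the {fake, not-fake} labels to say
# not-fake
trick = np.ones(2 * len(image_batch)) * soft_one

epoch_gen_loss.append(combined.train_on_batch(
    [noise, sampled_labels.reshape((-1, 1))],
    [trick, sampled_labels]))
```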
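This sketch assumes, as in the Keras example, that the real/fake output is trained with binary cross-entropy. The generator's loss on that output is then the cross-entropy between the target soft_one and the discriminator's "real" probability for each fake image. The helper below is hand-written, not the Keras loss function. It shows that this loss falls as the discriminator is fooled. As far as I know, the Keras example also freezes the discriminator inside combined, so this call only updates the generator's weights.

```py
import numpy as np

def binary_crossentropy(target, p, eps=1e-7):
    """Per-sample binary cross-entropy; p is the predicted 'real' probability."""
    p = np.clip(p, eps, 1 - eps)
    return -(target * np.log(p) + (1 - target) * np.log(1 - p))

soft_one = 0.95
probs = [0.05, 0.5, 0.95]  # discriminator's "real" probability for a fake image
losses = [float(binary_crossentropy(soft_one, p)) for p in probs]
for p, loss in zip(probs, losses):
    print(p, round(loss, 3))
```

With a soft target of 0.95, the loss is smallest when the discriminator outputs exactly 0.95, not 1.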