keras - Building and training a model: the very basics

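Anyone who opens this post probably already knows what Keras is. Still, as a beginner myself, I'll start with a quick word on what it is.

Like tensorflow, Keras is a Python library, one devoted entirely to neural networks. In deep learning we train a model, and a model is assembled from many small components. For example:

Which activation function to use: relu or sigmoid
Which optimizer to use: gradient descent or adam
Whether to add regularization to avoid overfitting, and if so, L2 regularization or dropout
Whether to use batch normalization
.....

Writing all of these by hand takes a lot of time and effort. Keras has already implemented these components, so you only need to call them!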

====================== Main text ======================

1. Defining the model

2. Initializing and compiling the model

2.1 optimizer [ref]

2.1.1 SGD

2.1.2 RMSprop

2.1.3 Adagrad

2.1.4 Adadelta

2.1.5 Adam

2.2 loss

2.3 metrics

2.4 Other parameters

3. Training the model

4. Testing the model

5. Plotting the model structure

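Building a model in Keras, then training and testing it, takes the following steps:

Define the model
Initialize and compile the model
Train it on the training set
Evaluate it on the test set

1. Defining the model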

from keras.layers import Input, ZeroPadding2D, Conv2D, BatchNormalization, Activation, MaxPooling2D, Flatten, Dense
from keras.models import Model

def HappyModel(input_shape):
    """
    Implementation of the HappyModel.

    Arguments:
    input_shape -- shape of the images of the dataset
        (height, width, channels) as a tuple.
        Note that this does not include the 'batch' as a dimension.
        If you have a batch like 'X_train', then you can provide the
        input_shape using X_train.shape[1:]

    Returns:
    model -- a Model() instance in Keras
    """
    # Define the input placeholder as a tensor with shape input_shape. Think of this as your input image!
    X_input = Input(input_shape)

    # Zero-Padding: pads the border of X_input with zeroes
    X = ZeroPadding2D((3, 3))(X_input)

    # CONV -> Batch Normalization -> RELU Block applied to X
    X = Conv2D(32, (7, 7), strides = (1, 1), name = 'conv0')(X)
    X = BatchNormalization(axis = 3, name = 'bn0')(X)
    X = Activation('relu')(X)

    # MAXPOOL
    X = MaxPooling2D((2, 2), name='max_pool')(X)

    # FLATTEN X (means convert it to a vector) + FULLYCONNECTED
    X = Flatten()(X)
    X = Dense(1, activation='sigmoid', name='fc')(X)

    # Create model. This creates your Keras model instance, you'll use this instance to train/test the model.
    model = Model(inputs = X_input, outputs = X, name='HappyModel')

    return model
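To get a feel for what each layer in HappyModel does to the data, it helps to trace the tensor shape and the parameter count by hand. The sketch below does this in plain Python. It assumes a 64x64 RGB input, which is my own choice for illustration (the post does not state the image size), and the helper conv_output_size is a made-up name, not a Keras function.

```python
# Trace tensor shapes and parameter counts through the HappyModel layers.
# ASSUMPTION: 64x64 RGB input; the post does not state the image size.

def conv_output_size(size, kernel, stride=1, padding=0):
    """Output length of one spatial dimension for a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

height, width, channels = 64, 64, 3          # hypothetical input_shape

# ZeroPadding2D((3, 3)) adds 3 pixels on every side.
h, w = height + 2 * 3, width + 2 * 3         # 70 x 70

# Conv2D(32, (7, 7), strides=(1, 1)) with the default 'valid' padding.
h, w = conv_output_size(h, 7), conv_output_size(w, 7)   # 64 x 64
filters = 32
conv_params = 7 * 7 * channels * filters + filters      # kernel weights + biases

# BatchNormalization(axis=3): gamma, beta, moving mean, moving variance per channel.
bn_params = 4 * filters

# MaxPooling2D((2, 2)): the stride defaults to the pool size.
h, w = conv_output_size(h, 2, stride=2), conv_output_size(w, 2, stride=2)  # 32 x 32

# Flatten, then Dense(1): one weight per flattened feature plus one bias.
flat_features = h * w * filters
dense_params = flat_features + 1

total_params = conv_params + bn_params + dense_params
print((h, w, filters), flat_features, total_params)
```

Under that input-size assumption, the totals should match the figures that model.summary() reports in the last section.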

2. Initializing and compiling the model

# Initialize the model
model = HappyModel(X_train.shape[1:])
# Compile the model
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

Here is a closer look at the model.compile() function:

compile(optimizer, loss=None, metrics=None, loss_weights=None, sample_weight_mode=None, weighted_metrics=None, target_tensors=None)

Parameters in detail:

2.1 optimizer [ref]

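The optimizer argument takes either a string or an optimizer instance, so there are two ways to set it:

1. Pass a string naming an optimizer, which is then used with its default parameters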

model.compile(loss='mean_squared_error', optimizer='sgd')

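2. Create an optimizer instance and pass it in

from keras import optimizers
sgd = optimizers.SGD(learning_rate=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='mean_squared_error', optimizer=sgd)

Five commonly used optimizers are described below (Keras also ships others, such as Adamax and Nadam):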

2.1.1 SGD

keras.optimizers.SGD(learning_rate=0.01, momentum=0.0, nesterov=False)

Stochastic gradient descent optimizer.

Includes support for momentum, learning rate decay, and Nesterov momentum.

Arguments

learning_rate: float >= 0. Learning rate.
momentum: float >= 0. Parameter that accelerates SGD in the relevant direction and dampens oscillations.
nesterov: boolean. Whether to apply Nesterov momentum.
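To make the momentum and nesterov arguments concrete, here is a minimal pure-Python sketch of one update step for a single scalar weight. It is my own illustration of the usual update rule, not Keras's actual implementation; the function name sgd_step and the toy objective f(w) = w^2 are made up for the example.

```python
def sgd_step(w, grad, velocity, learning_rate=0.01, momentum=0.0, nesterov=False):
    """One SGD update for a scalar weight; returns the new weight and velocity."""
    # The velocity is a decaying sum of past gradient steps.
    velocity = momentum * velocity - learning_rate * grad
    if nesterov:
        # Nesterov momentum looks one step ahead along the velocity.
        w = w + momentum * velocity - learning_rate * grad
    else:
        w = w + velocity
    return w, velocity

# A single step from w = 1.0 with gradient 2.0 (the gradient of w^2 at w = 1).
w_plain, v_plain = sgd_step(1.0, 2.0, 0.0, learning_rate=0.1, momentum=0.9)
w_nesterov, _ = sgd_step(1.0, 2.0, 0.0, learning_rate=0.1, momentum=0.9, nesterov=True)

# Minimize the toy objective f(w) = w^2, whose gradient is 2 * w.
w, v = 1.0, 0.0
for _ in range(100):
    w, v = sgd_step(w, 2.0 * w, v, learning_rate=0.1, momentum=0.9)
print(w_plain, w_nesterov, w)
```

With momentum=0.0 the velocity is just the plain gradient step, which is ordinary gradient descent.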

2.1.2 RMSprop

keras.optimizers.RMSprop(learning_rate=0.001, rho=0.9)

RMSProp optimizer.

It is recommended to leave the parameters of this optimizer at their default values (except the learning rate, which can be freely tuned).

Arguments

learning_rate: float >= 0. Learning rate.
rho: float >= 0.

References

rmsprop: Divide the gradient by a running average of its recent magnitude
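The phrase "divide the gradient by a running average of its recent magnitude" can be sketched in a few lines. This is an illustration under my own assumptions (a scalar weight, a running average that starts at zero, epsilon=1e-7), not the Keras source; rmsprop_step is a hypothetical name.

```python
import math

def rmsprop_step(w, grad, avg_sq, learning_rate=0.001, rho=0.9, epsilon=1e-7):
    """One RMSprop update for a single scalar weight."""
    # Running average of the squared gradient.
    avg_sq = rho * avg_sq + (1.0 - rho) * grad ** 2
    # Scale the step by the root of that average.
    w = w - learning_rate * grad / (math.sqrt(avg_sq) + epsilon)
    return w, avg_sq

# Two gradients that differ by a factor of 100 produce almost the same first step.
w_small, _ = rmsprop_step(0.0, 2.0, 0.0)
w_large, _ = rmsprop_step(0.0, 200.0, 0.0)
step_small, step_large = -w_small, -w_large
print(step_small, step_large)
```

Because the gradient is normalized by its own recent magnitude, the step size depends mostly on learning_rate, which is one way to read the advice that only the learning rate needs tuning.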

2.1.3 Adagrad

keras.optimizers.Adagrad(learning_rate=0.01)

Adagrad optimizer.

Adagrad is an optimizer with parameter-specific learning rates, which are adapted relative to how frequently a parameter gets updated during training. The more updates a parameter receives, the smaller the learning rate.

It is recommended to leave the parameters of this optimizer at their default values.

Arguments

learning_rate: float >= 0. Initial learning rate.

References

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
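The claim that "the more updates a parameter receives, the smaller the learning rate" is easy to see numerically. The following sketch assumes a scalar weight, an accumulator that starts at zero, and a constant gradient of 1.0; it illustrates the idea and is not Keras's code (adagrad_step is a hypothetical name).

```python
import math

def adagrad_step(w, grad, accum, learning_rate=0.01, epsilon=1e-7):
    """One Adagrad update for a single scalar weight."""
    # Accumulate ALL past squared gradients (no decay).
    accum = accum + grad ** 2
    w = w - learning_rate * grad / (math.sqrt(accum) + epsilon)
    return w, accum

w, accum = 0.0, 0.0
step_sizes = []
for _ in range(5):
    new_w, accum = adagrad_step(w, 1.0, accum)  # constant gradient of 1.0
    step_sizes.append(w - new_w)                # how far this update moved w
    w = new_w
print(step_sizes)
```

With a constant gradient, the k-th step is roughly learning_rate / sqrt(k), so the steps keep shrinking as long as training continues.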

2.1.4 Adadelta

keras.optimizers.Adadelta(learning_rate=1.0, rho=0.95)

Adadelta optimizer.

Adadelta is a more robust extension of Adagrad that adapts learning rates based on a moving window of gradient updates, instead of accumulating all past gradients. This way, Adadelta continues learning even when many updates have been done. Compared to Adagrad, in the original version of Adadelta you don't have to set an initial learning rate. In this version, initial learning rate and decay factor can be set, as in most other Keras optimizers.

It is recommended to leave the parameters of this optimizer at their default values.

Arguments

learning_rate: float >= 0. Initial learning rate, defaults to 1. It is recommended to leave it at the default value.
rho: float >= 0. Adadelta decay factor, corresponding to fraction of gradient to keep at each time step.

References

Adadelta - an adaptive learning rate method
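For comparison with Adagrad, here is a rough sketch of an Adadelta step, which keeps decaying averages instead of an ever-growing sum. It is written under my own assumptions (a scalar weight, both averages starting at zero, epsilon=1e-7, and the toy objective f(w) = w^2); adadelta_step is a hypothetical name, and details may differ from what Keras does internally.

```python
import math

def adadelta_step(w, grad, avg_sq_grad, avg_sq_delta,
                  learning_rate=1.0, rho=0.95, epsilon=1e-7):
    """One Adadelta update for a single scalar weight."""
    # Decaying average of squared gradients (a moving window, unlike Adagrad).
    avg_sq_grad = rho * avg_sq_grad + (1.0 - rho) * grad ** 2
    # Step scaled by the ratio of past update size to past gradient size.
    delta = grad * math.sqrt(avg_sq_delta + epsilon) / math.sqrt(avg_sq_grad + epsilon)
    w = w - learning_rate * delta
    # Decaying average of squared updates.
    avg_sq_delta = rho * avg_sq_delta + (1.0 - rho) * delta ** 2
    return w, avg_sq_grad, avg_sq_delta

# Three steps on the toy objective f(w) = w^2 (gradient 2 * w), starting at w = 1.
w, g2, d2 = 1.0, 0.0, 0.0
history = [w]
for _ in range(3):
    w, g2, d2 = adadelta_step(w, 2.0 * w, g2, d2)
    history.append(w)
print(history)
```

The first steps are tiny because the average of past updates starts at zero; they grow as that average builds up.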

2.1.5 Adam

keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, amsgrad=False)

Adam optimizer.

Default parameters follow those provided in the original paper.

Arguments

learning_rate: float >= 0. Learning rate.
beta_1: float, 0 < beta < 1. Generally close to 1.
beta_2: float, 0 < beta < 1. Generally close to 1.
amsgrad: boolean. Whether to apply the AMSGrad variant of this algorithm from the paper "On the Convergence of Adam and Beyond".
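The roles of beta_1 and beta_2 are easier to see in code. Below is a minimal sketch of one Adam step with bias correction, following the update rule from the original paper for a single scalar weight. The AMSGrad variant is left out, epsilon=1e-7 is my assumption, and adam_step is a hypothetical name rather than anything from Keras.

```python
import math

def adam_step(w, grad, m, v, t, learning_rate=0.001,
              beta_1=0.9, beta_2=0.999, epsilon=1e-7):
    """One Adam update for a scalar weight; t is the 1-based step count."""
    m = beta_1 * m + (1.0 - beta_1) * grad          # average of gradients
    v = beta_2 * v + (1.0 - beta_2) * grad ** 2     # average of squared gradients
    m_hat = m / (1.0 - beta_1 ** t)                 # bias correction: m and v start at 0
    v_hat = v / (1.0 - beta_2 ** t)
    w = w - learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return w, m, v

# On the very first step the move is about learning_rate, whatever the gradient size.
w_a, _, _ = adam_step(1.0, 2.0, 0.0, 0.0, t=1)
w_b, _, _ = adam_step(1.0, 200.0, 0.0, 0.0, t=1)
print(w_a, w_b)
```

Without the two bias-correction lines, the zero-initialized averages would make the early steps much smaller than intended.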

2.2 loss

loss is a TensorFlow/Theano symbolic function that takes two arguments (y_true, y_pred) and returns a scalar loss for each data point. Both arguments are TensorFlow/Theano tensors of the same shape.

loss can be set in two ways:

# Method 1: pass the loss function object
from keras import losses
model.compile(loss=losses.mean_squared_error, optimizer='sgd')

# Method 2: pass the name of a loss function predefined in Keras
model.compile(loss='mean_squared_error', optimizer='sgd')

Keras defines far too many loss functions to list them all, so only two that seem commonly used are covered here; see losses for the rest.

2.2.1 binary_crossentropy

For models that make a 0/1 (binary) decision

keras.losses.binary_crossentropy(y_true, y_pred, from_logits=False, label_smoothing=0)

2.2.2 categorical_crossentropy

For multi-class classifiers (e.g. with a softmax output)

keras.losses.categorical_crossentropy(y_true, y_pred, from_logits=False, label_smoothing=0)
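To show what these two losses actually compute, here is a small pure-Python sketch. It covers only the default case (from_logits=False, label_smoothing=0), works on plain lists rather than tensors, and clips probabilities with an epsilon of 1e-7 that I picked for the example. The function names mirror the Keras ones for readability, but the bodies are my own illustration.

```python
import math

def binary_crossentropy(y_true, y_pred, epsilon=1e-7):
    """Mean binary cross-entropy over a list of 0/1 labels and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, epsilon), 1.0 - epsilon)   # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

def categorical_crossentropy(y_true, y_pred, epsilon=1e-7):
    """Cross-entropy for ONE sample: y_true is one-hot, y_pred is a probability vector."""
    return -sum(t * math.log(max(p, epsilon)) for t, p in zip(y_true, y_pred))

# Two binary predictions, both fairly confident and correct.
bce = binary_crossentropy([1, 0], [0.9, 0.1])
# One 3-class sample whose true class (index 1) gets probability 0.7.
cce = categorical_crossentropy([0, 1, 0], [0.2, 0.7, 0.1])
print(bce, cce)
```

In both cases the loss is the negative log of the probability assigned to the correct answer, so confident correct predictions give values near zero.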

2.3 metrics

metrics are used to evaluate how well the model performs. Like loss, a metric is a TensorFlow/Theano symbolic function that takes two arguments (y_true, y_pred) and returns a single tensor value. Both arguments are TensorFlow/Theano tensors.

metrics can also be specified in two ways: as function objects or by their string names.

# Method 1: function objects
from keras import metrics
model.compile(loss='mean_squared_error', optimizer='sgd', metrics=[metrics.mae, metrics.categorical_accuracy])

# You can also define your own metric to evaluate the model.
import keras.backend as K

def mean_pred(y_true, y_pred):
    return K.mean(y_pred)

model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy', mean_pred])

# Method 2: by name
model.compile(loss='mean_squared_error', optimizer='sgd', metrics=['mae', 'acc'])

One common metric is shown below; see metrics for the rest.

2.3.1 accuracy

keras.metrics.accuracy(y_true, y_pred)
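For a model like HappyModel, with a single sigmoid output, accuracy boils down to thresholding the predicted probability and counting matches. The sketch below shows that idea on plain Python lists, together with a list version of the custom mean_pred metric from above. The 0.5 threshold and the sample values are assumptions for illustration only.

```python
def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the 0/1 label."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if (p > threshold) == bool(t))
    return hits / len(y_true)

def mean_pred(y_true, y_pred):
    """List version of the custom metric: the average predicted value."""
    return sum(y_pred) / len(y_pred)

y_true = [1, 0, 1, 1]            # hypothetical labels
y_pred = [0.8, 0.3, 0.4, 0.9]    # hypothetical sigmoid outputs
acc = binary_accuracy(y_true, y_pred)
avg = mean_pred(y_true, y_pred)
print(acc, avg)
```

Here the third sample is the only miss (0.4 falls below the threshold while its label is 1), so three of four predictions count as correct.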

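2.4 Other parameters

I have not used the remaining parameters yet; I will add notes here once I do.

3. Training the model

Finally we reach training, which is done by calling fit. If fit has been called before, calling it again continues training from the parameters learned so far.

model.fit(x = X_train, y = Y_train, epochs=10, batch_size = 64)

fit actually takes a great many parameters. The full signature is as follows (see model.fit):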

fit(x=None, y=None, batch_size=None, epochs=1, verbose=1, callbacks=None, validation_split=0.0, validation_data=None, shuffle=True, class_weight=None, sample_weight=None, initial_epoch=0, steps_per_epoch=None, validation_steps=None, validation_freq=1, max_queue_size=10, workers=1, use_multiprocessing=False)

Parameters:

x: the training data
y: the training labels
epochs: the number of passes over the training set
batch_size: the number of samples in one batch
.....

Returns:

A History object. Its History.history attribute is a record of training loss values and metrics values at successive epochs, as well as validation loss values and validation metrics values (if applicable).
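How epochs, batch_size and validation_split interact is mostly bookkeeping, which the sketch below works through in plain Python. It is a simplified picture under my own assumptions: a made-up dataset of 640 samples, a validation split taken from the end of the data, and no shuffling (fit shuffles the training part each epoch by default). The helper names are hypothetical.

```python
import math

def split_validation(samples, validation_split):
    """Hold out the last fraction of the samples for validation."""
    split_at = int(len(samples) * (1.0 - validation_split))
    return samples[:split_at], samples[split_at:]

def iterate_batches(samples, batch_size):
    """Yield consecutive slices of at most batch_size samples."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

samples = list(range(640))                     # hypothetical dataset of 640 examples
train, val = split_validation(samples, 0.25)   # 480 for training, 160 for validation
batch_sizes = [len(b) for b in iterate_batches(train, batch_size=64)]
steps_per_epoch = math.ceil(len(train) / 64)   # weight updates per epoch
total_updates = 10 * steps_per_epoch           # with epochs=10
print(len(train), len(val), batch_sizes, total_updates)
```

Note that the last batch is smaller than the others whenever batch_size does not divide the number of training samples evenly.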

4. Testing the model

model.evaluate(x=X_test, y=Y_test)

The full signature is, of course, much longer:

evaluate(x=None, y=None, batch_size=None, verbose=1, sample_weight=None, steps=None, callbacks=None, max_queue_size=10, workers=1, use_multiprocessing=False)

Parameters:

x: the test data
y: the test labels
...

Returns:

Scalar test loss (if the model has a single output and no metrics) or list of scalars (if the model has multiple outputs and/or metrics). The attribute model.metrics_names will give you the display labels for the scalar outputs.
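Since evaluate returns a bare list when metrics are present, pairing it with model.metrics_names makes the result easier to read. The sketch below shows only the pairing step. Both lists are hypothetical placeholders standing in for model.metrics_names and the return value of model.evaluate, not real output from the model in this post.

```python
# Hypothetical stand-ins for model.metrics_names and model.evaluate(...).
metrics_names = ["loss", "accuracy"]
results = [0.12, 0.95]

# Pair each display label with its scalar value.
report = dict(zip(metrics_names, results))
for name, value in report.items():
    print(f"{name}: {value:.4f}")
```

The exact labels depend on the Keras version and on what was passed to compile, so it is safer to read them from metrics_names than to hard-code them.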

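5. Plotting the model structure

Once everything is done, you can print the model structure to take a look.

There are two ways to view it:

As text: model.summary()
As an image: plot_model(model, to_file='Model.png')

from IPython.display import SVG
from keras.utils import plot_model
from keras.utils.vis_utils import model_to_dot

# Print the model structure as text
model.summary()
# Draw the model structure and save it as an image file
plot_model(model, to_file='HappyModel.png')
# Render the model structure as an inline SVG (e.g. in a Jupyter notebook)
SVG(model_to_dot(model).create(prog='dot', format='svg'))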

Reference: https://keras.io/