Source code: https://github.com/fzliu/style-transfer

Using scipy.optimize.minimize

scipy.optimize.minimize(fun, x0, args=(), method=None, jac=None, hess=None, hessp=None, bounds=None, constraints=(), tol=None, callback=None, options=None)

fun: the objective function, i.e. the function to be minimized.

x0: the initial guess.

args: extra arguments passed through to fun.

method: the optimizer to use; the paper uses L-BFGS-B.

options: a dict of solver options; maxiter sets the maximum number of iterations.

Under method L-BFGS-B (other methods accept different options):

jac=True means fun must return, in addition to the loss, the gradient vector grad with respect to the variables.

bounds: the allowed range of each variable, given as a list of tuples.

An example:

\min_x \ f(x)=\|x\|^2

\mathrm{s.t.}\ \ -5<x_1<5,\ \ 12<x_2<23

In the original post's example (shown there as a screenshot): greeting is an extra argument, and because jac=True, the objective returns the gradient vector 2*x in addition to the loss.
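A minimal sketch reconstructing that example (the greeting argument and the iteration limit are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

def f(x, greeting):
    """Objective for jac=True: return both the loss and its gradient."""
    # 'greeting' arrives via args=(...) and is otherwise unused here
    loss = np.sum(x ** 2)   # f(x) = ||x||^2
    grad = 2 * x            # analytic gradient
    return loss, grad

x0 = np.array([3.0, 20.0])      # initial guess
bounds = [(-5, 5), (12, 23)]    # -5 < x1 < 5, 12 < x2 < 23
res = minimize(f, x0, args=("hello",), method="L-BFGS-B",
               jac=True, bounds=bounds, options={"maxiter": 100})
print(res.x)  # x1 is driven to 0; x2 is pinned at its lower bound 12
```

Since the unconstrained minimum x2 = 0 lies outside the box, L-BFGS-B projects x2 onto its lower bound 12, while x1 converges to 0.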

Explanation of the loss

L_{total}(\vec{p},\vec{a},\vec{x})={\alpha}L_{content}(\vec{p},\vec{x})+{\beta}L_{style}(\vec{a},\vec{x})

Here \vec{p}, \vec{a}, and \vec{x} denote the content photo, the artwork photo, and the generated photo, respectively; \alpha and \beta weight the two loss terms.

L_{style} measures the loss in texture and color, expressed through Gram matrices.
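As a sketch (the 64-channel shape is illustrative), a layer's Gram matrix is the inner product of its flattened feature maps, which is what the sgemm(gram_scale, F, F.T) call in the source's _compute_reprs computes:

```python
import numpy as np

# F: one layer's feature maps flattened to (channels, height*width);
# a 64 x 1024 shape is used here purely for illustration
F = np.random.rand(64, 32 * 32).astype(np.float32)

# Gram matrix: channel-by-channel feature correlations, (channels, channels)
G = F.dot(F.T)
```

Because it only records correlations between channels, G discards spatial arrangement and keeps texture/color statistics, which is why it serves as the style representation.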

L_{content} measures the loss on the features directly, expressed through the feature maps themselves.
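The content term can be sketched in a few lines mirroring the source's _compute_content_grad (feature shapes are illustrative):

```python
import numpy as np

# features of the generated and content images at one layer,
# flattened to (channels, height*width); shapes are illustrative
Fl = np.random.randn(64, 1024).astype(np.float32)
F_content = np.random.randn(64, 1024).astype(np.float32)

El = Fl - F_content
loss = (El ** 2).sum() / 2   # squared-error content loss
grad = El * (Fl > 0)         # gradient, gated by the ReLU activations
```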

See the paper for further details.

Source code

The "## n" comments mark the order in which the program runs; only the default execution path is annotated here.
Start reading from "## 1" (note the space after "##").
For the argparse module, see: http://blog.csdn.net/stepleave/article/details/51737211

# system imports
import argparse
import logging
import os
import sys
import timeit

# library imports
import caffe
import numpy as np
import progressbar as pb
from scipy.fftpack import ifftn
from scipy.linalg.blas import sgemm
from scipy.misc import imsave
from scipy.optimize import minimize
from skimage import img_as_ubyte
from skimage.transform import rescale

# logging
LOG_FORMAT = "%(filename)s:%(funcName)s:%(asctime)s.%(msecs)03d -- %(message)s"

# numeric constants
INF = np.float32(np.inf)
STYLE_SCALE = 1.2

# weights for the individual models
# assume that corresponding layers' top blob matches its name
VGG19_WEIGHTS = {"content": {"conv4_2": 1},
                 "style": {"conv1_1": 0.2, "conv2_1": 0.2, "conv3_1": 0.2,
                           "conv4_1": 0.2, "conv5_1": 0.2}}
VGG16_WEIGHTS = {"content": {"conv4_2": 1},
                 "style": {"conv1_1": 0.2, "conv2_1": 0.2, "conv3_1": 0.2,
                           "conv4_1": 0.2, "conv5_1": 0.2}}
GOOGLENET_WEIGHTS = {"content": {"conv2/3x3": 2e-4, "inception_3a/output": 1-2e-4},
                     "style": {"conv1/7x7_s2": 0.2, "conv2/3x3": 0.2,
                               "inception_3a/output": 0.2, "inception_4a/output": 0.2,
                               "inception_5a/output": 0.2}}
CAFFENET_WEIGHTS = {"content": {"conv4": 1},
                    "style": {"conv1": 0.2, "conv2": 0.2, "conv3": 0.2,
                              "conv4": 0.2, "conv5": 0.2}}

# argparse
parser = argparse.ArgumentParser(description="Transfer the style of one image to another.",
                                 usage="style.py -s <style_image> -c <content_image>")
parser.add_argument("-s", "--style-img", type=str, required=True, help="input style (art) image")
parser.add_argument("-c", "--content-img", type=str, required=True, help="input content image")
parser.add_argument("-g", "--gpu-id", default=0, type=int, required=False, help="GPU device number")
parser.add_argument("-m", "--model", default="vgg16", type=str, required=False, help="model to use")
parser.add_argument("-i", "--init", default="content", type=str, required=False, help="initialization strategy")
parser.add_argument("-r", "--ratio", default="1e4", type=str, required=False, help="style-to-content ratio")
parser.add_argument("-n", "--num-iters", default=512, type=int, required=False, help="L-BFGS iterations")
parser.add_argument("-l", "--length", default=512, type=float, required=False, help="maximum image length")
parser.add_argument("-v", "--verbose", action="store_true", required=False, help="print minimization outputs")
parser.add_argument("-o", "--output", default=None, required=False, help="output path")

## 5.1.6.2
def _compute_style_grad(F, G, G_style, layer):
    """Computes style gradient and loss from activation features."""
    # compute loss and gradient
    (Fl, Gl) = (F[layer], G[layer])
    c = Fl.shape[0]**-2 * Fl.shape[1]**-2
    El = Gl - G_style[layer]
    loss = c/4 * (El**2).sum()
    grad = c * sgemm(1.0, El, Fl) * (Fl>0)
    return loss, grad

## 5.1.6.3
def _compute_content_grad(F, F_content, layer):
    """Computes content gradient and loss from activation features."""
    # compute loss and gradient
    Fl = F[layer]
    El = Fl - F_content[layer]
    loss = (El**2).sum() / 2
    grad = El * (Fl>0)
    return loss, grad

## 5.1.2 compute the content and style representations and return them
def _compute_reprs(net_in, net, layers_style, layers_content, gram_scale=1):
    """Computes representation matrices for an image."""
    # input data and forward pass
    (repr_s, repr_c) = ({}, {})
    net.blobs["data"].data[0] = net_in
    net.forward()
    # loop through combined set of layers
    for layer in set(layers_style)|set(layers_content):
        F = net.blobs[layer].data[0].copy()
        F.shape = (F.shape[0], -1)
        repr_c[layer] = F
        if layer in layers_style:
            repr_s[layer] = sgemm(gram_scale, F, F.T)
    return repr_s, repr_c

## 5.1.6 the objective function: returns the loss and the gradient vector
def style_optfn(x, net, weights, layers, reprs, ratio):
    """Style transfer optimization callback for scipy.optimize.minimize().

    :param numpy.ndarray x:
        Flattened data array.
    :param caffe.Net net:
        Network to use to generate gradients.
    :param dict weights:
        Weights to use in the network.
    :param list layers:
        Layers to use in the network.
    :param tuple reprs:
        Representation matrices packed in a tuple.
    :param float ratio:
        Style-to-content ratio.
    """
    # parse the input arguments
    layers_style = weights["style"].keys()
    layers_content = weights["content"].keys()
    net_in = x.reshape(net.blobs["data"].data.shape[1:])
    (G_style, F_content) = reprs
    ## 5.1.6.1 enter _compute_reprs again; the earlier calls computed only the style
    ## or only the content representation, whereas here the current guess
    ## (net_in = img0) has its content features extracted at every layer in
    ## layers_style|layers_content, and its style features at the layers_style layers.
    (G, F) = _compute_reprs(net_in, net, layers_style, layers_content)

    # backpropagate, layer by layer
    # initialize the loss and the diff of the last layer in layers
    loss = 0
    net.blobs[layers[-1]].diff[:] = 0  # diff is the gradient buffer; zero it out
    for i, layer in enumerate(reversed(layers)):
        next_layer = None if i == len(layers)-1 else layers[-i-2]
        # the loop proceeds one layer at a time:
        #   conv5_1 --> conv4_2 --> conv4_1 --> conv3_1 --> conv2_1
        #   --> conv1_1 --> the data layer;
        # on the first iteration, layer=conv5_1 and next_layer=conv4_2
        grad = net.blobs[layer].diff[0]

        # style contribution to the loss
        if layer in layers_style:
            wl = weights["style"][layer]
            ## 5.1.6.2 enter _compute_style_grad: this layer's style contribution
            ## to the loss, and the derivative of the loss with respect to F
            ## (the current guess's features at this layer)
            (l, g) = _compute_style_grad(F, G, G_style, layer)
            loss += wl * l * ratio  # ratio trades off the style and content terms
            grad += wl * g.reshape(grad.shape) * ratio  # update the gradient

        # content contribution to the loss
        if layer in layers_content:
            wl = weights["content"][layer]
            ## 5.1.6.3 enter _compute_content_grad: this layer's content
            ## contribution to the loss, and the derivative of the loss
            ## with respect to F
            (l, g) = _compute_content_grad(F, F_content, layer)
            loss += wl * l
            grad += wl * g.reshape(grad.shape)

        # backpropagate the gradient
        net.backward(start=layer, end=next_layer)
        if next_layer is None:
            grad = net.blobs["data"].diff[0]
        else:
            grad = net.blobs[next_layer].diff[0]

    # flatten the gradient into one long vector
    grad = grad.flatten().astype(np.float64)
    # one evaluation finished; return the loss and the gradient vector
    return loss, grad


class StyleTransfer(object):
    ## 4.1 instantiate the object
    '''
    __init__ initializes the following attributes:
        self.net         # the network
        self.transformer # the network's transformer
        self.weights     # the layer weights
        self.layers      # the layers used
        self.callback    # the callback function
        self.use_pbar    # whether to show a progress bar
    '''

    def __init__(self, model_name, use_pbar=True):
        """Initialize the model used for style transfer.

        :param str model_name:
            Model to use.
        :param bool use_pbar:
            Use progressbar flag.
        """
        style_path = os.path.abspath(os.path.split(__file__)[0])
        base_path = os.path.join(style_path, "models", model_name)

        # vgg19
        if model_name == "vgg19":
            model_file = os.path.join(base_path, "VGG_ILSVRC_19_layers_deploy.prototxt")
            pretrained_file = os.path.join(base_path, "VGG_ILSVRC_19_layers.caffemodel")
            mean_file = os.path.join(base_path, "ilsvrc_2012_mean.npy")
            weights = VGG19_WEIGHTS
        # vgg16
        elif model_name == "vgg16":
            model_file = os.path.join(base_path, "VGG_ILSVRC_16_layers_deploy.prototxt")
            pretrained_file = os.path.join(base_path, "VGG_ILSVRC_16_layers.caffemodel")
            mean_file = os.path.join(base_path, "ilsvrc_2012_mean.npy")
            weights = VGG16_WEIGHTS
        # googlenet
        elif model_name == "googlenet":
            model_file = os.path.join(base_path, "deploy.prototxt")
            pretrained_file = os.path.join(base_path, "bvlc_googlenet.caffemodel")
            mean_file = os.path.join(base_path, "ilsvrc_2012_mean.npy")
            weights = GOOGLENET_WEIGHTS
        # caffenet
        elif model_name == "caffenet":
            model_file = os.path.join(base_path, "deploy.prototxt")
            pretrained_file = os.path.join(base_path, "bvlc_reference_caffenet.caffemodel")
            mean_file = os.path.join(base_path, "ilsvrc_2012_mean.npy")
            weights = CAFFENET_WEIGHTS
        else:
            assert False, "model not available"

        # add model and weights
        ## 4.1.1 enter self.load_model, which initializes self.net and self.transformer
        self.load_model(model_file, pretrained_file, mean_file)
        self.weights = weights.copy()
        self.layers = []
        for layer in self.net.blobs:
            if layer in self.weights["style"] or layer in self.weights["content"]:
                self.layers.append(layer)
        self.use_pbar = use_pbar

        # set the callback function
        # this callback is invoked once per iteration to advance the progress bar
        if self.use_pbar:
            def callback(xk):
                self.grad_iter += 1
                try:
                    self.pbar.update(self.grad_iter)
                except:
                    self.pbar.finished = True
                if self._callback is not None:
                    net_in = xk.reshape(self.net.blobs["data"].data.shape[1:])
                    self._callback(self.transformer.deprocess("data", net_in))
        else:
            def callback(xk):
                if self._callback is not None:
                    net_in = xk.reshape(self.net.blobs["data"].data.shape[1:])
                    self._callback(self.transformer.deprocess("data", net_in))
        self.callback = callback

    ## 4.1.1 fairly basic; needs no further explanation
    def load_model(self, model_file, pretrained_file, mean_file):
        """Loads specified model from caffe install (see caffe docs).

        :param str model_file:
            Path to model protobuf.
        :param str pretrained_file:
            Path to pretrained caffe model.
        :param str mean_file:
            Path to mean file.
        """
        # load net (suppressing stderr output)
        null_fds = os.open(os.devnull, os.O_RDWR)
        out_orig = os.dup(2)
        os.dup2(null_fds, 2)
        net = caffe.Net(model_file, pretrained_file, caffe.TEST)
        os.dup2(out_orig, 2)
        os.close(null_fds)

        # all models used are trained on imagenet data
        transformer = caffe.io.Transformer({"data": net.blobs["data"].data.shape})
        transformer.set_mean("data", np.load(mean_file).mean(1).mean(1))
        transformer.set_channel_swap("data", (2,1,0))
        transformer.set_transpose("data", (2,0,1))
        transformer.set_raw_scale("data", 255)

        # add net parameters
        self.net = net
        self.transformer = transformer

    ## 5.2 retrieve the generated image; it can be read back from the 'data'
    ## blob because the backward passes in 5.1.6 style_optfn kept updating
    ## the initial guess stored there
    def get_generated(self):
        """Saves the generated image (net input, after optimization).

        :param str path:
            Output path.
        """
        data = self.net.blobs["data"].data
        img_out = self.transformer.deprocess("data", data)
        return img_out

    ## 5.1.1 rescale the network input to fit img
    def _rescale_net(self, img):
        """Rescales the network to fit a particular image."""
        # get new dimensions and rescale net + transformer
        new_dims = (1, img.shape[2]) + img.shape[:2]
        self.net.blobs["data"].reshape(*new_dims)
        self.transformer.inputs["data"] = new_dims

    def _make_noise_input(self, init):
        """Creates an initial input (generated) image."""
        # specify dimensions and create grid in Fourier domain
        dims = tuple(self.net.blobs["data"].data.shape[2:]) + \
               (self.net.blobs["data"].data.shape[1], )
        grid = np.mgrid[0:dims[0], 0:dims[1]]

        # create frequency representation for pink noise
        Sf = (grid[0] - (dims[0]-1)/2.0) ** 2 + \
             (grid[1] - (dims[1]-1)/2.0) ** 2
        Sf[np.where(Sf == 0)] = 1
        Sf = np.sqrt(Sf)
        Sf = np.dstack((Sf**int(init),)*dims[2])

        # apply ifft to create pink noise and normalize
        ifft_kernel = np.cos(2*np.pi*np.random.randn(*dims)) + \
                      1j*np.sin(2*np.pi*np.random.randn(*dims))
        img_noise = np.abs(ifftn(Sf * ifft_kernel))
        img_noise -= img_noise.min()
        img_noise /= img_noise.max()

        # preprocess the pink noise image
        x0 = self.transformer.preprocess("data", img_noise)
        return x0

    def _create_pbar(self, max_iter):
        """Creates a progress bar."""
        self.grad_iter = 0
        self.pbar = pb.ProgressBar()
        self.pbar.widgets = ["Optimizing: ", pb.Percentage(),
                             " ", pb.Bar(marker=pb.AnimatedMarker()),
                             " ", pb.ETA()]
        self.pbar.maxval = max_iter

    ## 5.1 the key function: perform the style transfer
    def transfer_style(self, img_style, img_content, length=512, ratio=1e5,
                       n_iter=512, init="-1", verbose=False, callback=None):
        """Transfers the style of the artwork to the input image.

        :param numpy.ndarray img_style:
            A style image with the desired target style.
        :param numpy.ndarray img_content:
            A content image in floating point, RGB format.
        :param function callback:
            A callback function, which takes images at iterations.
        """
        # assume the network input is square (height == width)
        orig_dim = min(self.net.blobs["data"].shape[2:])

        # rescale both images to respect the maximum length and the net's input size
        scale = max(length / float(max(img_style.shape[:2])),
                    orig_dim / float(min(img_style.shape[:2])))
        img_style = rescale(img_style, STYLE_SCALE*scale)
        scale = max(length / float(max(img_content.shape[:2])),
                    orig_dim / float(min(img_content.shape[:2])))
        img_content = rescale(img_content, scale)

        # compute the style representation
        ## 5.1.1 enter _rescale_net: adjust the network input to fit img_style
        self._rescale_net(img_style)
        layers = self.weights["style"].keys()
        net_in = self.transformer.preprocess("data", img_style)
        gram_scale = float(img_content.size)/img_style.size  # computed but never used
        ## 5.1.2 enter _compute_reprs: compute and return the style representation of img_style
        G_style = _compute_reprs(net_in, self.net, layers, [],
                                 gram_scale=1)[0]

        # compute the content representation
        ## 5.1.3 enter _rescale_net: adjust the network input to fit img_content;
        ## note the input shape is never changed again after this point
        self._rescale_net(img_content)
        layers = self.weights["content"].keys()
        net_in = self.transformer.preprocess("data", img_content)
        ## 5.1.4 enter _compute_reprs: compute and return the content representation of img_content
        F_content = _compute_reprs(net_in, self.net, [], layers)[1]

        # generate the initial network input img0
        # the default is init='content'; see the argparse section at the top
        if isinstance(init, np.ndarray):  # init is an array
            img0 = self.transformer.preprocess("data", init)
        elif init == "content":
            img0 = self.transformer.preprocess("data", img_content)
        elif init == "mixed":
            img0 = 0.95*self.transformer.preprocess("data", img_content) + \
                   0.05*self.transformer.preprocess("data", img_style)
        else:
            img0 = self._make_noise_input(init)  # random noise image; not used by default

        # compute the range of each pixel
        data_min = -self.transformer.mean["data"][:,0,0]
        data_max = data_min + self.transformer.raw_scale["data"]
        data_bounds = [(data_min[0], data_max[0])]*(img0.size/3) + \
                      [(data_min[1], data_max[1])]*(img0.size/3) + \
                      [(data_min[2], data_max[2])]*(img0.size/3)

        # set the optimization parameters
        grad_method = "L-BFGS-B"
        reprs = (G_style, F_content)
        minfn_args = {
            "args": (self.net, self.weights, self.layers, reprs, ratio),
            "method": grad_method, "jac": True, "bounds": data_bounds,
            "options": {"maxcor": 8, "maxiter": n_iter, "disp": verbose}
        }

        # run the optimization
        self._callback = callback  # the input parameter, None by default
        minfn_args["callback"] = self.callback  # self.callback is a function; note the difference from the line above
        if self.use_pbar and not verbose:  # the default case
            ## 5.1.5 enter _create_pbar to build the progress bar
            self._create_pbar(n_iter)
            self.pbar.start()
            ## 5.1.6 the most important function: style_optfn
            ## minimize returns a result object; only its nit attribute
            ## (the number of iterations) is used here
            res = minimize(style_optfn, img0.flatten(), **minfn_args).nit
            self.pbar.finish()
        else:
            res = minimize(style_optfn, img0.flatten(), **minfn_args).nit

        return res


def main(args):
    ## 1. logging
    level = logging.INFO if args.verbose else logging.DEBUG
    logging.basicConfig(format=LOG_FORMAT, datefmt="%H:%M:%S", level=level)
    logging.info("Starting style transfer.")

    ## 2. set GPU/CPU mode
    if args.gpu_id == -1:
        caffe.set_mode_cpu()
        logging.info("Running net on CPU.")
    else:
        caffe.set_device(args.gpu_id)
        caffe.set_mode_gpu()
        logging.info("Running net on GPU {0}.".format(args.gpu_id))

    ## 3. load the images, RGB format, values in (0,1)
    img_style = caffe.io.load_image(args.style_img)
    img_content = caffe.io.load_image(args.content_img)
    logging.info("Successfully loaded images.")

    ## 4. create the StyleTransfer object
    use_pbar = not args.verbose
    ## 4.1 enter the StyleTransfer class
    st = StyleTransfer(args.model.lower(), use_pbar=use_pbar)
    logging.info("Successfully loaded model {0}.".format(args.model))

    ## 5. perform the style transfer
    start = timeit.default_timer()
    ## 5.1 the key function: transfer the style
    n_iters = st.transfer_style(img_style, img_content, length=args.length,
                                init=args.init, ratio=np.float(args.ratio),
                                n_iter=args.num_iters, verbose=args.verbose)
    end = timeit.default_timer()
    logging.info("Ran {0} iterations in {1:.0f}s.".format(n_iters, end-start))
    ## 5.2 get the output image
    img_out = st.get_generated()

    ## 6. output path
    if args.output is not None:
        out_path = args.output
    else:
        out_path_fmt = (os.path.splitext(os.path.split(args.content_img)[1])[0],
                        os.path.splitext(os.path.split(args.style_img)[1])[0],
                        args.model, args.init, args.ratio, args.num_iters)
        out_path = "outputs/{0}-{1}-{2}-{3}-{4}-{5}.jpg".format(*out_path_fmt)

    ## 7. save the image
    imsave(out_path, img_as_ubyte(img_out))
    logging.info("Output saved to {0}.".format(out_path))


if __name__ == "__main__":
    args = parser.parse_args()
    main(args)
