首先使用 numpy 实现网络。

Numpy 提供了一个 n 维数组对象，以及许多用于操纵这些数组的函数。 Numpy 是用于科学计算的通用框架。它对计算图，深度学习或梯度一无所知。但是，我们可以使用 numpy 操作手动实现通过网络的前向和后向传递，从而轻松地使用 numpy 使两层网络适合随机数据：

# -*- coding: utf-8 -*-
import numpy as np# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10# Create random input and output data
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)# Randomly initialize weights
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)learning_rate = 1e-6
for t in range(500):# Forward pass: compute predicted yh = x.dot(w1)h_relu = np.maximum(h, 0)y_pred = h_relu.dot(w2)# Compute and print lossloss = np.square(y_pred - y).sum()print(t, loss)# Backprop to compute gradients of w1 and w2 with respect to lossgrad_y_pred = 2.0 * (y_pred - y)grad_w2 = h_relu.T.dot(grad_y_pred)grad_h_relu = grad_y_pred.dot(w2.T)grad_h = grad_h_relu.copy()grad_h[h < 0] = 0grad_w1 = x.T.dot(grad_h)# Update weightsw1 -= learning_rate * grad_w1w2 -= learning_rate * grad_w2

张量

PyTorch 张量在概念上与 numpy 数组相同：张量是 n 维数组，而 PyTorch 提供了许多在这些张量上运行的功能。在幕后，张量可以跟踪计算图和渐变，但它们也可用作科学计算的通用工具。

与 numpy 不同，PyTorch 张量可以利用 GPU 加速其数字计算。要在 GPU 上运行 PyTorch Tensor，只需要将其转换为新的数据类型。

在这里，我们使用 PyTorch 张量使两层网络适合随机数据。像上面的 numpy 示例一样，我们需要手动实现通过网络的正向和反向传递：

# -*- coding: utf-8 -*-import torchdtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0") # Uncomment this to run on GPU# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10# Create random input and output data
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)# Randomly initialize weights
w1 = torch.randn(D_in, H, device=device, dtype=dtype)
w2 = torch.randn(H, D_out, device=device, dtype=dtype)learning_rate = 1e-6
for t in range(500):# Forward pass: compute predicted yh = x.mm(w1)h_relu = h.clamp(min=0)y_pred = h_relu.mm(w2)# Compute and print lossloss = (y_pred - y).pow(2).sum().item()if t % 100 == 99:print(t, loss)# Backprop to compute gradients of w1 and w2 with respect to lossgrad_y_pred = 2.0 * (y_pred - y)grad_w2 = h_relu.t().mm(grad_y_pred)grad_h_relu = grad_y_pred.mm(w2.t())grad_h = grad_h_relu.clone()grad_h[h < 0] = 0grad_w1 = x.t().mm(grad_h)# Update weights using gradient descentw1 -= learning_rate * grad_w1w2 -= learning_rate * grad_w2

autograd

PyTorch 中的 autograd 软件包正是提供了此功能。使用 autograd 时，网络的正向传递将定义计算图；图中的节点为张量，边为从输入张量生成输出张量的函数。然后通过该图进行反向传播，可以轻松计算梯度。

这听起来很复杂，在实践中非常简单。每个张量代表计算图中的一个节点。如果x是具有x.requires_grad=True的张量，则x.grad是另一个张量，其保持x相对于某个标量值的梯度。

在这里，我们使用 PyTorch 张量和 autograd 来实现我们的两层网络。现在我们不再需要手动通过网络实现反向传递：

# -*- coding: utf-8 -*-
import torchdtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0") # Uncomment this to run on GPU# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10# Create random Tensors to hold input and outputs.
# Setting requires_grad=False indicates that we do not need to compute gradients
# with respect to these Tensors during the backward pass.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)# Create random Tensors for weights.
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Tensors during the backward pass.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)learning_rate = 1e-6
for t in range(500):# Forward pass: compute predicted y using operations on Tensors; these# are exactly the same operations we used to compute the forward pass using# Tensors, but we do not need to keep references to intermediate values since# we are not implementing the backward pass by hand.y_pred = x.mm(w1).clamp(min=0).mm(w2)# Compute and print loss using operations on Tensors.# Now loss is a Tensor of shape (1,)# loss.item() gets the scalar value held in the loss.loss = (y_pred - y).pow(2).sum()if t % 100 == 99:print(t, loss.item())# Use autograd to compute the backward pass. This call will compute the# gradient of loss with respect to all Tensors with requires_grad=True.# After this call w1.grad and w2.grad will be Tensors holding the gradient# of the loss with respect to w1 and w2 respectively.loss.backward()# Manually update weights using gradient descent. Wrap in torch.no_grad()# because weights have requires_grad=True, but we don't need to track this# in autograd.# An alternative way is to operate on weight.data and weight.grad.data.# Recall that tensor.data gives a tensor that shares the storage with# tensor, but doesn't track history.# You can also use torch.optim.SGD to achieve this.with torch.no_grad():w1 -= learning_rate * w1.gradw2 -= learning_rate * w2.grad# Manually zero the gradients after updating weightsw1.grad.zero_()w2.grad.zero_()

定义torch.autograd.Function的子类

在幕后，每个原始的 autograd 运算符实际上都是在 Tensor 上运行的两个函数。正向函数从输入张量计算输出张量。向后函数接收相对于某个标量值的输出张量的梯度，并计算相对于相同标量值的输入张量的梯度。

在 PyTorch 中，我们可以通过定义torch.autograd.Function的子类并实现forward和backward函数来轻松定义自己的 autograd 运算符。然后，我们可以通过构造实例并像调用函数一样调用新的 autograd 运算符，并传递包含输入数据的张量。

在此示例中，我们定义了自己的自定义 autograd 函数来执行 ReLU 非线性，并使用它来实现我们的两层网络：

# -*- coding: utf-8 -*-
import torchclass MyReLU(torch.autograd.Function):"""We can implement our own custom autograd Functions by subclassingtorch.autograd.Function and implementing the forward and backward passeswhich operate on Tensors."""@staticmethoddef forward(ctx, input):"""In the forward pass we receive a Tensor containing the input and returna Tensor containing the output. ctx is a context object that can be usedto stash information for backward computation. You can cache arbitraryobjects for use in the backward pass using the ctx.save_for_backward method."""ctx.save_for_backward(input)return input.clamp(min=0)@staticmethoddef backward(ctx, grad_output):"""In the backward pass we receive a Tensor containing the gradient of the losswith respect to the output, and we need to compute the gradient of the losswith respect to the input."""input, = ctx.saved_tensorsgrad_input = grad_output.clone()grad_input[input < 0] = 0return grad_inputdtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0") # Uncomment this to run on GPU# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10# Create random Tensors to hold input and outputs.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)# Create random Tensors for weights.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)learning_rate = 1e-6
for t in range(500):# To apply our Function, we use Function.apply method. We alias this as 'relu'.relu = MyReLU.apply# Forward pass: compute predicted y using operations; we compute# ReLU using our custom autograd operation.y_pred = relu(x.mm(w1)).mm(w2)# Compute and print lossloss = (y_pred - y).pow(2).sum()if t % 100 == 99:print(t, loss.item())# Use autograd to compute the backward pass.loss.backward()# Update weights using gradient descentwith torch.no_grad():w1 -= learning_rate * w1.gradw2 -= learning_rate * w2.grad# Manually zero the gradients after updating weightsw1.grad.zero_()w2.grad.zero_()

nn包

计算图和 autograd 是定义复杂运算符并自动采用导数的非常强大的范例。但是对于大型神经网络，原始的 autograd 可能会有点太低了。

在构建神经网络时，我们经常想到将计算安排在层中，其中某些层具有可学习的参数，这些参数将在学习期间进行优化。

在 TensorFlow 中，像 Keras ， TensorFlow-Slim 和 TFLearn 之类的软件包在原始计算图上提供了更高层次的抽象，可用于构建神经网络。

在 PyTorch 中，nn包也达到了相同的目的。 nn包定义了一组模块，它们大致等效于神经网络层。模块接收输入张量并计算输出张量，但也可以保持内部状态，例如包含可学习参数的张量。 nn程序包还定义了一组有用的损失函数，这些函数通常在训练神经网络时使用。

在此示例中，我们使用nn包来实现我们的两层网络：

# -*- coding: utf-8 -*-
import torch# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)# Use the nn package to define our model as a sequence of layers. nn.Sequential
# is a Module which contains other Modules, and applies them in sequence to
# produce its output. Each Linear Module computes output from input using a
# linear function, and holds internal Tensors for its weight and bias.
model = torch.nn.Sequential(torch.nn.Linear(D_in, H),torch.nn.ReLU(),torch.nn.Linear(H, D_out),
)# The nn package also contains definitions of popular loss functions; in this
# case we will use Mean Squared Error (MSE) as our loss function.
loss_fn = torch.nn.MSELoss(reduction='sum')learning_rate = 1e-4
for t in range(500):# Forward pass: compute predicted y by passing x to the model. Module objects# override the __call__ operator so you can call them like functions. When# doing so you pass a Tensor of input data to the Module and it produces# a Tensor of output data.y_pred = model(x)# Compute and print loss. We pass Tensors containing the predicted and true# values of y, and the loss function returns a Tensor containing the# loss.loss = loss_fn(y_pred, y)if t % 100 == 99:print(t, loss.item())# Zero the gradients before running the backward pass.model.zero_grad()# Backward pass: compute gradient of the loss with respect to all the learnable# parameters of the model. Internally, the parameters of each Module are stored# in Tensors with requires_grad=True, so this call will compute gradients for# all learnable parameters in the model.loss.backward()# Update the weights using gradient descent. Each parameter is a Tensor, so# we can access its gradients like we did before.with torch.no_grad():for param in model.parameters():param -= learning_rate * param.grad

优化模型

到目前为止，我们通过手动更改持有可学习参数的张量(使用torch.no_grad()或.data来避免在自动分级中跟踪历史记录）来更新模型的权重。对于像随机梯度下降这样的简单优化算法而言，这并不是一个巨大的负担，但是在实践中，我们经常使用更复杂的优化器(例如 AdaGrad，RMSProp，Adam 等）来训练神经网络。

PyTorch 中的optim软件包抽象了优化算法的思想，并提供了常用优化算法的实现。

在此示例中，我们将使用nn包像以前一样定义我们的模型，但是我们将使用optim包提供的 Adam 算法优化模型：

# -*- coding: utf-8 -*-
import torch# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)# Use the nn package to define our model and loss function.
model = torch.nn.Sequential(torch.nn.Linear(D_in, H),torch.nn.ReLU(),torch.nn.Linear(H, D_out),
)
loss_fn = torch.nn.MSELoss(reduction='sum')# Use the optim package to define an Optimizer that will update the weights of
# the model for us. Here we will use Adam; the optim package contains many other
# optimization algoriths. The first argument to the Adam constructor tells the
# optimizer which Tensors it should update.
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for t in range(500):# Forward pass: compute predicted y by passing x to the model.y_pred = model(x)# Compute and print loss.loss = loss_fn(y_pred, y)if t % 100 == 99:print(t, loss.item())# Before the backward pass, use the optimizer object to zero all of the# gradients for the variables it will update (which are the learnable# weights of the model). This is because by default, gradients are# accumulated in buffers( i.e, not overwritten) whenever .backward()# is called. Checkout docs of torch.autograd.backward for more details.optimizer.zero_grad()# Backward pass: compute gradient of the loss with respect to model# parametersloss.backward()# Calling the step function on an Optimizer makes an update to its# parametersoptimizer.step()

自定义 nn 模块

有时，您将需要指定比一系列现有模块更复杂的模型。对于这些情况，您可以通过子类化nn.Module并定义一个forward来定义自己的模块，该模块使用其他模块或在 Tensors 上的其他自动转换操作来接收输入 Tensors 并生成输出 Tensors。

在此示例中，我们将两层网络实现为自定义的 Module 子类：

# -*- coding: utf-8 -*-
import torchclass TwoLayerNet(torch.nn.Module):def __init__(self, D_in, H, D_out):"""In the constructor we instantiate two nn.Linear modules and assign them asmember variables."""super(TwoLayerNet, self).__init__()self.linear1 = torch.nn.Linear(D_in, H)self.linear2 = torch.nn.Linear(H, D_out)def forward(self, x):"""In the forward function we accept a Tensor of input data and we must returna Tensor of output data. We can use Modules defined in the constructor aswell as arbitrary operators on Tensors."""h_relu = self.linear1(x).clamp(min=0)y_pred = self.linear2(h_relu)return y_pred# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)# Construct our model by instantiating the class defined above
model = TwoLayerNet(D_in, H, D_out)# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the two
# nn.Linear modules which are members of the model.
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
for t in range(500):# Forward pass: Compute predicted y by passing x to the modely_pred = model(x)# Compute and print lossloss = criterion(y_pred, y)if t % 100 == 99:print(t, loss.item())# Zero gradients, perform a backward pass, and update the weights.optimizer.zero_grad()loss.backward()optimizer.step()

控制流+权重共享?是不是级联的思想？

作为动态图和权重共享的示例，我们实现了一个非常奇怪的模型：一个完全连接的 ReLU 网络，该网络在每个前向传递中选择 1 到 4 之间的随机数，并使用那么多隐藏层，多次重复使用相同的权重计算最里面的隐藏层。

对于此模型，我们可以使用常规的 Python 流控制来实现循环，并且可以通过在定义前向传递时简单地多次重复使用同一模块来实现最内层之间的权重共享。

我们可以轻松地将此模型实现为 Module 子类：

# -*- coding: utf-8 -*-
import random
import torchclass DynamicNet(torch.nn.Module):def __init__(self, D_in, H, D_out):"""In the constructor we construct three nn.Linear instances that we will usein the forward pass."""super(DynamicNet, self).__init__()self.input_linear = torch.nn.Linear(D_in, H)self.middle_linear = torch.nn.Linear(H, H)self.output_linear = torch.nn.Linear(H, D_out)def forward(self, x):"""For the forward pass of the model, we randomly choose either 0, 1, 2, or 3and reuse the middle_linear Module that many times to compute hidden layerrepresentations.Since each forward pass builds a dynamic computation graph, we can use normalPython control-flow operators like loops or conditional statements whendefining the forward pass of the model.Here we also see that it is perfectly safe to reuse the same Module manytimes when defining a computational graph. This is a big improvement from LuaTorch, where each Module could be used only once."""h_relu = self.input_linear(x).clamp(min=0)for _ in range(random.randint(0, 3)):h_relu = self.middle_linear(h_relu).clamp(min=0)y_pred = self.output_linear(h_relu)return y_pred# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)# Construct our model by instantiating the class defined above
model = DynamicNet(D_in, H, D_out)# Construct our loss function and an Optimizer. Training this strange model with
# vanilla stochastic gradient descent is tough, so we use momentum
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
for t in range(500):# Forward pass: Compute predicted y by passing x to the modely_pred = model(x)# Compute and print lossloss = criterion(y_pred, y)if t % 100 == 99:print(t, loss.item())# Zero gradients, perform a backward pass, and update the weights.optimizer.zero_grad()loss.backward()optimizer.step()

09月27日 pytorch与resnet（二）相关推荐

2013年09月27日
原文地址:2013年09月27日作者:韩寒这么多年来,一直是我脚下的流沙裹着我四处漂泊,它也不淹没我,它只是时不时提醒我,你没有别的选择,否则你就被风吹走了.我就这么浑浑噩噩地度过了我所有热血的岁 ...
比特币早期投资家：没有人能够阻止其发展 TechWeb 09-27 09:10 凤凰科技讯据CNBC网站北京时间9月27日报道，风险投资家、“Social+Capital”基金创始人Chamath
比特币早期投资家:没有人能够阻止其发展 TechWeb 09-27 09:10 凤凰科技讯据CNBC网站北京时间9月27日报道,风险投资家."Social+Capital"基金创 ...
解密谷歌机器学习工程最佳实践——机器学习43条军规翻译 2017年09月19日 10:54:58 98310 本文是对Rules of Machine Learning: Best Practice
解密谷歌机器学习工程最佳实践--机器学习43条军规翻译 2017年09月19日 10:54:58 983 1 0 本文是对Rules of Machine Learning: Best Practi ...
【财经期刊FM-Radio｜2020年09月25日】
[财经期刊FM-Radio|2020年09月25日] 微信公众号: 张良信息咨询服务工作室 [今日热点新闻一览↓↓] 要闻: 欧股创三月新低,美股大震荡最终惊险收涨,期银抹平6%跌幅转涨,天然气两日高 ...
8月27日计算机视觉理论学习笔记——图说
文章目录前言一.RNN 二.LSTM 1.数学模型三.图说模型 Image Captioning 1.State-of-the-art 模型组成 2.NIC 模型 3.Beam Search 4 ...
交通规划、交通设计主题汇总（更新至2023年1月27日）
本部分以交通设计为主线,兼顾交通分析.交通规划.城市总体规划和相关导则和指南. (点击链接,可查看相关文档介绍.加入智能交通技术星球,可下载相关文档.) 交通规划 (更新至2023年1月27日) 研究 ...
6月27日服务器例行维护公告,2019年06月27日维护公告
原标题:2019年06月27日维护公告亲爱的玩家: 为了保证服务器的稳定和服务质量,<大话西游2经典版>将于2019年06月27日(本周四)早上8:00停机,进行每周例行的维护工作,预计 ...
2021年4月27日华为Cloud AI 通用软件开发实习面试（一面）
title: 2021年4月27日华为Cloud AI 通用软件开发实习面试(一面) tags: 面经 2021年4月27日华为Cloud AI 通用软件开发实习面试(一面) 自我介绍(这个地方由 ...
御剑情缘服务器维护,御剑情缘7月27日更新维护内容及活动详解介绍
导读御剑情缘7月27日更新维护了什么内容?御剑情缘目前在7月27日为玩家们进行了游戏更新,不少小伙伴们还不清楚有哪些玩法吧!下面是御剑情缘7月27日更新维护内容及活动详解介绍,一起来看下吧! ▲燕 ...
PANDAS 数据合并与重塑（concat篇）原创 2016年09月13日 19:26:30 47784 pandas作者Wes McKinney 在【PYTHON FOR DATA ANALYS
PANDAS 数据合并与重塑(concat篇) 原创 2016年09月13日 19:26:30 标签: 47784 编辑删除 pandas作者Wes McKinney 在[PYTHON FOR DA ...

09月27日 pytorch与resnet（二）

文章目录