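Using Colab
Splitting off part of the training set as a validation set
Selecting features
Using tqdm and TensorBoard
Switching to evaluation mode and disabling gradient computation during validation
Keeping the model and the data on the same device
Ensuring reproducibility
Differences between PyTorch and NumPy when creating arrays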

I. Using Colab

1 Mount Google Drive in Colab:

from google.colab import drive
drive.mount('/content/device')

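The leading / denotes the root directory of the Colab virtual machine's file system, so the drive is mounted at the absolute path /content/device.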

2 Magic commands:
A command prefixed with a percent sign (%) affects the process associated with the notebook itself; it is called a magic command.

%cd /content/device/MyDrive/

Use %cd rather than !cd to change directory: !cd runs in a throwaway shell, so the change does not persist into later cells.

3 Other commands:
An exclamation mark (!) starts a new shell, runs the command in it, and then kills that shell.
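
The difference between ! and % can be reproduced outside a notebook. The sketch below is only an analogy in plain Python: subprocess.run with shell=True stands in for !cd, and os.chdir stands in for %cd. The temporary directory is a hypothetical target chosen for illustration.

```python
import os
import subprocess
import tempfile

start_dir = os.getcwd()
target_dir = tempfile.mkdtemp()  # hypothetical directory to "cd" into

# Like !cd: the cd happens inside a child shell that exits immediately,
# so the parent process (the notebook kernel) is unaffected.
subprocess.run(f'cd "{target_dir}"', shell=True, check=True)
after_shell_cd = os.getcwd()

# Like %cd: the working directory of the current process itself changes.
os.chdir(target_dir)
after_chdir = os.getcwd()

print(after_shell_cd == start_dir)                 # the subshell cd did not persist
print(os.path.samefile(after_chdir, target_dir))   # the in-process change did

os.chdir(start_dir)  # restore the original working directory
```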

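II. Carving a validation set out of the training set

import numpy as np
import torch
from torch.utils.data import random_split

def train_valid_split(data_set, valid_ratio, seed):
    '''Split provided training data into training set and validation set'''
    valid_set_size = int(valid_ratio * len(data_set))
    train_set_size = len(data_set) - valid_set_size
    # random_split returns Dataset objects (Subset), which implement __getitem__.
    # Fix the generator for reproducible results.
    train_set, valid_set = random_split(data_set, [train_set_size, valid_set_size], generator=torch.Generator().manual_seed(seed))
    # This conversion is required: without it the results behave much like lists
    # and cannot be sliced by column to select features.
    return np.array(train_set), np.array(valid_set)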
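
When the data is already a NumPy array, the same split can be sketched without random_split, which avoids the conversion back to an array. This is an alternative under that assumption, not the homework's method; train_valid_split_np and toy_data are names made up for the example.

```python
import numpy as np

def train_valid_split_np(data, valid_ratio, seed):
    """Hypothetical NumPy-only variant: shuffle row indices, then slice."""
    rng = np.random.default_rng(seed)       # seeded generator for reproducibility
    indices = rng.permutation(len(data))    # shuffled row indices
    valid_size = int(valid_ratio * len(data))
    valid_idx, train_idx = indices[:valid_size], indices[valid_size:]
    return data[train_idx], data[valid_idx]

toy_data = np.arange(50, dtype=float).reshape(10, 5)   # 10 rows, 5 columns
train_data, valid_data = train_valid_split_np(toy_data, valid_ratio=0.2, seed=0)
print(train_data.shape, valid_data.shape)   # (8, 5) (2, 5)
```

Calling the function again with the same seed returns the same rows, which is the property the fixed generator provides in the torch version.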

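III. Selecting features

def select_feat(train_data, valid_data, test_data, select_all=True):
    '''Selects useful features to perform regression'''
    # Target: the last column of every row.
    y_train, y_valid = train_data[:,-1], valid_data[:,-1]
    # Features: every row, columns 37 through the second-to-last.
    # The test data has no target column, so it is sliced from column 37 as well;
    # otherwise feat_idx would point at different columns in the test set.
    raw_x_train, raw_x_valid, raw_x_test = train_data[:,37:-1], valid_data[:,37:-1], test_data[:,37:]
    if select_all:
        feat_idx = list(range(raw_x_train.shape[1]))
    else:
        feat_idx = [0,1,2,3,4] # TODO: Select suitable feature columns.
    return raw_x_train[:,feat_idx], raw_x_valid[:,feat_idx], raw_x_test[:,feat_idx], y_train, y_valid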
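
The slicing in select_feat is easier to follow on a small array. In the sketch below the toy array and the skip value of 3 are made up; skip plays the role of 37 in the function above.

```python
import numpy as np

# Toy data: 4 rows and 7 columns. Columns 0-2 stand in for the leading columns
# that are skipped, columns 3-5 are features, and the last column is the target.
toy_train = np.arange(28).reshape(4, 7)
skip = 3   # plays the role of 37 in select_feat

y = toy_train[:, -1]            # last column of every row
raw_x = toy_train[:, skip:-1]   # columns skip through the second-to-last
feat_idx = [0, 2]               # positions are relative to raw_x, not to toy_train
x = raw_x[:, feat_idx]

print(y)      # [ 6 13 20 27]
print(x[0])   # [3 5]
```

Note that feat_idx indexes into the already-sliced raw_x, so index 0 here is column 3 of the original array.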

IV. Using tqdm and TensorBoard

Displaying a progress bar:

for epoch in range(n_epochs):
    model.train()  # Set your model to train mode.
    loss_record = []

    # tqdm is a package to visualize your training progress.
    train_pbar = tqdm(train_loader, position=0, leave=True)

    for x, y in train_pbar:
        optimizer.zero_grad()               # Set gradient to zero.
        x, y = x.to(device), y.to(device)   # Move your data to device.
        pred = model(x)
        loss = criterion(pred, y)
        loss.backward()                     # Compute gradient (backpropagation).
        optimizer.step()                    # Update parameters.
        step += 1
        loss_record.append(loss.detach().item())

        # Display current epoch number and loss on tqdm progress bar.
        train_pbar.set_description(f'Epoch [{epoch+1}/{n_epochs}]')
        train_pbar.set_postfix({'loss': loss.detach().item()})

    mean_train_loss = sum(loss_record)/len(loss_record)
    writer.add_scalar('Loss/train', mean_train_loss, step)
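
The tqdm calls can be tried without a model. The sketch below replaces the data loader with a short list of made-up loss values, so the numbers are placeholders rather than training output. The writer object in the loop above is presumably a torch.utils.tensorboard.SummaryWriter created before the loop; it is left out here so that the example only depends on tqdm.

```python
from tqdm import tqdm

n_epochs = 2
fake_batches = [0.9, 0.7, 0.5]   # made-up per-batch loss values, not real results
mean_losses = []

for epoch in range(n_epochs):
    loss_record = []
    pbar = tqdm(fake_batches, position=0, leave=True)
    for loss in pbar:
        loss_record.append(loss)
        # Text shown to the left of the bar.
        pbar.set_description(f'Epoch [{epoch+1}/{n_epochs}]')
        # Key/value pairs shown to the right of the bar.
        pbar.set_postfix({'loss': loss})
    # One mean per epoch; this is the value the loop above logs with add_scalar.
    mean_losses.append(sum(loss_record) / len(loss_record))

print(mean_losses)
```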

V. Switching to evaluation mode and disabling gradient computation during validation

model.eval()  # Set your model to evaluation mode.
loss_record = []
for x, y in valid_loader:
    x, y = x.to(device), y.to(device)
    with torch.no_grad():
        pred = model(x)
        loss = criterion(pred, y)
    loss_record.append(loss.item())

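Why switch to evaluation mode?
model.eval() turns dropout off and makes batch normalization layers use their running statistics instead of the statistics of the current batch.
Why disable gradient computation?
Validation does not update the model, so no gradients are needed; skipping them saves memory and computation.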
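
Both effects can be observed on a single layer. The following is a minimal sketch with a standalone nn.Dropout layer and a toy weight tensor, both chosen only for illustration.

```python
import torch
import torch.nn as nn

layer = nn.Dropout(p=0.5)
x = torch.ones(1000)

layer.train()          # training mode: dropout is active
train_out = layer(x)   # roughly half the entries are zeroed, the rest scaled to 2.0

layer.eval()           # evaluation mode: dropout passes the input through unchanged
eval_out = layer(x)

w = torch.ones(3, requires_grad=True)
with torch.no_grad():
    y_no_grad = (w * 2).sum()   # no computation graph is recorded here
y_grad = (w * 2).sum()          # graph recorded as usual

print(torch.equal(eval_out, x))                        # True
print(y_no_grad.requires_grad, y_grad.requires_grad)   # False True
```

The two switches are independent: model.eval() changes layer behavior but still records gradients, and torch.no_grad() stops gradient tracking but leaves dropout active, which is why the validation loop uses both.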

VI. Keeping the model and the data on the same device

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = My_Model(input_dim=x_train.shape[1]).to(device)
for x, y in valid_loader:
    x, y = x.to(device), y.to(device)
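
A self-contained version of the same pattern, as a sketch: nn.Linear stands in for My_Model and a random tensor stands in for a batch from the loader. It falls back to the CPU when no GPU is available.

```python
import torch
import torch.nn as nn

device = 'cuda' if torch.cuda.is_available() else 'cpu'

model = nn.Linear(4, 1).to(device)   # stand-in for My_Model
x = torch.randn(8, 4)                # tensors are created on the CPU by default

# For tensors, .to() returns a new tensor on the target device,
# so the result has to be assigned back.
x = x.to(device)
pred = model(x)

print(next(model.parameters()).device.type == x.device.type)
print(pred.shape)   # torch.Size([8, 1])
```

If the model is on the GPU and the batch is left on the CPU, the forward pass raises a device mismatch error, which is why every batch is moved inside the loop.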

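VII. Ensuring reproducibility

def same_seed(seed):
    '''Fixes random number generator seeds for reproducibility.'''
    # Make cuDNN return deterministic convolution algorithms on every run.
    torch.backends.cudnn.deterministic = True
    # A bool that, if True, causes cuDNN to benchmark multiple convolution
    # algorithms and select the fastest, so it is turned off here.
    torch.backends.cudnn.benchmark = False
    # Seeds NumPy's generator only; it has no effect on PyTorch,
    # which is seeded separately below.
    np.random.seed(seed)
    # Seeds PyTorch's default random number generator.
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        # Seeds the generators of all GPUs, not only the current one.
        torch.cuda.manual_seed_all(seed)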
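
The question raised in the comments, whether seeding NumPy also fixes PyTorch, can be checked directly. The sketch below uses a hypothetical helper named draw and shows that the two libraries keep separate generators.

```python
import numpy as np
import torch

def draw(seed):
    """Seed both libraries, then draw a few numbers from each."""
    np.random.seed(seed)      # affects NumPy's global generator only
    torch.manual_seed(seed)   # affects PyTorch's default generator only
    return np.random.rand(3), torch.rand(3)

np_a, torch_a = draw(0)
np_b, torch_b = draw(0)
print(np.array_equal(np_a, np_b), torch.equal(torch_a, torch_b))   # True True

# Re-seeding NumPy alone does not rewind PyTorch's generator.
torch.manual_seed(0)
first = torch.rand(3)
np.random.seed(0)
second = torch.rand(3)
print(torch.equal(first, second))   # False
```

This is why same_seed calls np.random.seed and torch.manual_seed separately.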

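VIII. Differences between PyTorch and NumPy when creating arrays

NumPy can build an array directly from a list of NumPy arrays with np.array. PyTorch is less accommodating: torch.tensor typically fails on a list of multi-element tensors, and for a list of NumPy arrays recent versions warn that the conversion is extremely slow. Convert the list to a single ndarray first, or use torch.stack on a list of tensors.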
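
A short sketch of the conversions that do work, using toy values. The direct torch.tensor call is wrapped in try/except because its behavior may differ between PyTorch versions; treat the claim that it fails as an assumption to check on your installation.

```python
import numpy as np
import torch

arrays = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
tensors = [torch.tensor([1.0, 2.0]), torch.tensor([3.0, 4.0])]

stacked_np = np.array(arrays)            # NumPy: list of arrays -> one (2, 2) array
from_np = torch.from_numpy(stacked_np)   # wrap the single ndarray as a tensor
stacked_torch = torch.stack(tensors)     # PyTorch: stack a list of tensors

try:
    torch.tensor(tensors)                # list of multi-element tensors
    direct_ok = True
except (ValueError, TypeError, RuntimeError):
    direct_ok = False

print(stacked_np.shape, tuple(from_np.shape), tuple(stacked_torch.shape))
print(direct_ok)
```

torch.from_numpy keeps the float64 dtype of the NumPy array, while tensors built from Python floats default to float32, so a cast may be needed before mixing the two.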

Takeaways from Hung-yi Lee's Machine Learning 2022 HW1