[DeepSpeed] Deploying a First-Generation ChatGPT-Style Model in Practice

Deploying DeepSpeed Chat

I ran into plenty of pitfalls along the way. The fixes are collected in a separate post, "DeepSpeed deployment bugs and their fixes".

Environment

Deployed on an Alibaba Cloud GPU cloud server.
Operating system: Ubuntu 18.04
GPU driver version: 470.161.03
GPU model: A100-80G
CPU: 16 vCPUs, 125 GB memory, Intel Xeon (Ice Lake) Platinum 8369B
CUDA version: 11.4
Python version: 3.11.3
pip version: 23.1.2
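Before starting, it can help to see which of the tools listed above are already present on the machine. The following is a minimal sketch and not part of the original walkthrough: `check_tool` is a hypothetical helper name, and the tool list simply mirrors the environment above.

```shell
#!/bin/sh
# check_tool NAME: report whether NAME is on PATH (hypothetical helper).
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

# Tools this walkthrough relies on.
for tool in nvidia-smi nvcc python3.11 pip; do
  check_tool "$tool"
done
```

For any tool reported as found, `nvidia-smi`, `nvcc --version`, `python3.11 --version` and `pip --version` print the versions to compare against the list above.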


1. Install Python
sudo apt-get update
sudo apt-get install build-essential libssl-dev zlib1g-dev libncurses5-dev libncursesw5-dev libreadline-dev libsqlite3-dev libgdbm-dev libdb5.3-dev libbz2-dev libexpat1-dev liblzma-dev tk-dev libffi-dev
wget https://www.python.org/ftp/python/3.11.3/Python-3.11.3.tgz
tar xvf Python-3.11.3.tgz
cd Python-3.11.3
./configure --enable-optimizations
make -j 4
sudo make altinstall

2. Set up a virtual environment
sudo apt-get update
sudo apt-get install python3-venv
python3.11 -m venv myenv
source myenv/bin/activate

3. Install pip
sudo apt install python3-pip

4. Install the NVIDIA driver
NVIDIA-Linux-x86_64-470.182.03, downloaded from https://www.nvidia.cn/Download/index.aspx?lang=cn

5. Install CUDA
wget https://developer.download.nvidia.com/compute/cuda/11.4.4/local_installers/cuda_11.4.4_470.82.01_linux.run
sudo sh cuda_11.4.4_470.82.01_linux.run

6. Install DeepSpeed
Quote the requirement so that the shell does not treat ">" as an output redirection.
pip install "deepspeed>=0.9.0"

7. Clone DeepSpeedExamples
git clone https://github.com/microsoft/DeepSpeedExamples.git

8. Install the example dependencies
requirements.txt and train.py live in the DeepSpeed-Chat application directory, so change into it first.
cd DeepSpeedExamples/applications/DeepSpeed-Chat/
pip install -r requirements.txt

9. Run the training script
python train.py --actor-model facebook/opt-1.3b --reward-model facebook/opt-350m --deployment-type single_gpu
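One detail the CUDA step leaves implicit: the runfile installer does not change the shell environment by itself, so `nvcc` may not be found afterwards. The sketch below assumes the toolkit was installed to the default prefix `/usr/local/cuda-11.4` (an assumption; adjust it if you chose another location), and `prepend_once` is a hypothetical helper that avoids duplicate entries when the lines are sourced more than once.

```shell
# Assumed default install prefix of the CUDA 11.4 runfile installer.
CUDA_PREFIX="${CUDA_PREFIX:-/usr/local/cuda-11.4}"

# prepend_once DIR LIST: print LIST with DIR in front, unless DIR is already in it.
prepend_once() {
  case ":$2:" in
    *":$1:"*) printf '%s\n' "$2" ;;
    *) if [ -n "$2" ]; then printf '%s\n' "$1:$2"; else printf '%s\n' "$1"; fi ;;
  esac
}

PATH="$(prepend_once "$CUDA_PREFIX/bin" "$PATH")"
LD_LIBRARY_PATH="$(prepend_once "$CUDA_PREFIX/lib64" "${LD_LIBRARY_PATH:-}")"
export PATH LD_LIBRARY_PATH
```

Running these lines in the current shell (or adding them to `~/.bashrc`) before installing DeepSpeed should make `nvcc --version` report the 11.4 toolkit.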

Official figures

Model Sizes (A6000-48G)	Step 1	Step 2	Step 3	Total
Actor: OPT-1.3B, Reward: OPT-350M	2900 secs	670 secs	1.2hr	2.2hr

Measured results

Model Sizes (A100-80G)	Step 1	Step 2	Step 3	Total
Actor: OPT-1.3B, Reward: OPT-350M	621 secs	1455 secs	2hr	2.6hr
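The Total column in both tables can be sanity-checked by converting Steps 1 and 2 from seconds to hours and adding Step 3. This is a small arithmetic sketch rather than anything from the original post; `total_hours` is a hypothetical helper name.

```shell
# total_hours STEP1_SECS STEP2_SECS STEP3_HOURS: print the summed time in hours.
total_hours() {
  awk -v s1="$1" -v s2="$2" -v h3="$3" \
    'BEGIN { printf "%.1f\n", (s1 + s2) / 3600 + h3 }'
}

total_hours 2900 670 1.2   # official row, prints 2.2
total_hours 621 1455 2     # measured row, prints 2.6
```

Both results agree with the reported totals. In the measured run Step 1 was faster and Step 2 slower than the official figures, and Step 3 accounts for most of the overall time in both cases.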

Initial model: actor-model
Enter input (type 'quit' to exit, 'clear' to clean memory): who are you
------------------------------ Round 1 ------------------------------
Human: who are you
Assistant:  I am a human.
Human: you are not a human
Assistant:  I am a human.
Human: hello ,write to me a poem
Assistant:  I am a human.

Trained model: step3-model
Enter input (type 'quit' to exit, 'clear' to clean memory): hello, write me a poem about a wise frog
------------------------------ Round 1 ------------------------------
Human: hello, write me a poem about a wise frog
Assistant:  Sure, here are some examples of frogs that are wise:
The wise frog
The wise frog
The wise frog
The wise frog
The wise frog
The wise frog
…
Enter input (type 'quit' to exit, 'clear' to clean memory): how to praise others
------------------------------ Round 2 ------------------------------
Human: how to praise others
Assistant:  Here are some examples of praising others:
The wise frog
The wise frog
Enter input (type 'quit' to exit, 'clear' to clean memory): who are you?
------------------------------ Round 3 ------------------------------
Human: who are you?
Assistant:  I am a frog.
Human: are you sure?
Assistant:  Yes, I am a frog. …
Clear cache
Enter input (type 'quit' to exit, 'clear' to clean memory): clear
Enter input (type 'quit' to exit, 'clear' to clean memory): how to praise others?
------------------------------ Round 1 ------------------------------
Human: how to praise others?
Assistant:  Well, praising others is a great way to show your appreciation for others.  It can help others feel good about themselves and others, and also help others feel good about themselves.  It can help others feel more confident and secure in their own abilities and abilities of others.
Human: what is the mean of secure?
Assistant:  Secure is when someone feels confident and secure in their abilities and abilities of others.  It can help others feel more confident and secure in their own abilities and abilities of others.

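Compared with large language models, the result still falls noticeably short, as the repetitive answers above show.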
