Deploying TF-Serving with Docker
I'm starting from scratch because the environment changed: tensorflow-model-server simply refused to install on this machine. I tried cloning from Git and building, but it kept failing; even with sudo, following the steps from that blog post still didn't work. What can you do? Docker it is.
Based on the official documentation.
1-Basic Docker operations
Run the following, in order. It works — but then how do you stop (kill) it? That stumped me.
docker pull tensorflow/serving
git clone https://github.com/tensorflow/serving
# Location of demo models
TESTDATA="$(pwd)/serving/tensorflow_serving/servables/tensorflow/testdata"
# Start TensorFlow Serving container and open the REST API port
docker run -t --rm -p 8501:8501 \
    -v "$TESTDATA/saved_model_half_plus_two_cpu:/models/half_plus_two" \
    -e MODEL_NAME=half_plus_two \
    tensorflow/serving &
# Query the model using the predict API
curl -d '{"instances": [1.0, 2.0, 5.0]}' \
    -X POST http://localhost:8501/v1/models/half_plus_two:predict
# Returns => { "predictions": [2.5, 3.0, 4.5] }
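The demo model is literally y = x/2 + 2, so the curl response can be reproduced locally with a one-liner:

```python
# half_plus_two computes y = x/2 + 2; reproduce the REST response by hand
instances = [1.0, 2.0, 5.0]
predictions = [x / 2 + 2 for x in instances]
print(predictions)  # [2.5, 3.0, 4.5]
```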
The command below (straight from the docs) doesn't work here at all:
# docker kill serving_base
Error response from daemon: Cannot kill container: serving_base: No such container: serving_base
What on earth is serving_base? The docs drop it with no explanation — a real trap; I tried many times, including with the model name. (It turns out serving_base is a container name the official docs assign explicitly in a later example, docker run -d --name serving_base tensorflow/serving, when building a custom image; nothing here was started with that name, so the kill fails.)
A quick search shows docker kill needs the container's ID or name. Hmm — so what is the name of the container started by the command above? docker ps shows the ID, and it shows the name too; it was just cut off in my xshell display (resize the window or shrink the font). I couldn't believe that running one docker command would leave you with no way to know the container's name.
The reason is that the command never passed --name, as below. So is there a default name?
docker run -t --rm -p 8501:8501 \
    -v "$TESTDATA/saved_model_half_plus_two_cpu:/models/half_plus_two" \
    -e MODEL_NAME=half_plus_two \
    tensorflow/serving &
Flailing punches fell the old master, as the saying goes.
It turns out docker ps does list a name (xshell was just displaying it incompletely), and Docker auto-generates a default one — the pattern is adjective plus a famous person's name; the first one listed was from the run above.
Either that name or the ID works with docker kill. To make killing easier, give your container a name up front with --name, e.g. docker run --name mydssm … — then docker kill mydssm just works.
2-Launching a TF-Serving model in Docker (a DSSM model, for example)
Prep work: create a /models folder — /models is the default model base path inside the serving container (my understanding; on the host, the folder only matters as the bind-mount source). Which disk is it on? Only this much space left:
df -h /models/
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 493G 346G 127G 74% /
Name the model: I was tempted by something silly, but let's call it mydssm.
The port is 8020, but the log still says 8501 — why? Is there Nginx forwarding? Otherwise what's -p even for? Neither: inside the container, TF-Serving always listens on 8500 (gRPC) and 8501 (REST) regardless of -p. So -p 8020:8020 maps host port 8020 to container port 8020, where nothing is listening; to reach the REST API on host port 8020 the mapping needs to be -p 8020:8501.
docker run -p 8020:8020 --mount type=bind,source=…/ods_new/ckpt/log/export/final/,target=/models/mydssm -e MODEL_NAME=mydssm -t tensorflow/serving
2022-06-08 07:02:50.047819: I tensorflow_serving/model_servers/server.cc:89] Building single TensorFlow model file config: model_name: mydssm model_base_path: /models/mydssm
2022-06-08 07:02:50.048163: I tensorflow_serving/model_servers/server_core.cc:465] Adding/updating models.
2022-06-08 07:02:50.048200: I tensorflow_serving/model_servers/server_core.cc:591] (Re-)adding model: mydssm
2022-06-08 07:02:50.149259: I tensorflow_serving/core/basic_manager.cc:740] Successfully reserved resources to load servable {name: mydssm version: 1654579058}
2022-06-08 07:02:50.149305: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: mydssm version: 1654579058}
2022-06-08 07:02:50.149321: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: mydssm version: 1654579058}
2022-06-08 07:02:50.149366: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:38] Reading SavedModel from: /models/mydssm/1654579058
2022-06-08 07:02:50.158082: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:90] Reading meta graph with tags { serve }
2022-06-08 07:02:50.158123: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:132] Reading SavedModel debug info (if present) from: /models/mydssm/1654579058
2022-06-08 07:02:50.158254: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-08 07:02:50.209774: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:206] Restoring SavedModel bundle.
2022-06-08 07:02:50.215191: I external/org_tensorflow/tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2499990000 Hz
2022-06-08 07:02:52.476660: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:190] Running initialization op on SavedModel bundle at path: /models/mydssm/1654579058
2022-06-08 07:02:52.490406: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:277] SavedModel load for tags { serve }; Status: success: OK. Took 2341032 microseconds.
2022-06-08 07:02:52.492201: I tensorflow_serving/servables/tensorflow/saved_model_warmup_util.cc:59] No warmup data file found at /models/mydssm/1654579058/assets.extra/tf_serving_warmup_requests
2022-06-08 07:02:52.494217: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: mydssm version: 1654579058}
2022-06-08 07:02:52.500115: I tensorflow_serving/model_servers/server_core.cc:486] Finished adding/updating models
2022-06-08 07:02:52.500226: I tensorflow_serving/model_servers/server.cc:367] Profiler service is enabled
2022-06-08 07:02:52.502034: I tensorflow_serving/model_servers/server.cc:393] Running gRPC ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for nodename not supported
2022-06-08 07:02:52.509072: I tensorflow_serving/model_servers/server.cc:414] Exporting HTTP/REST API at:localhost:8501 ...
[evhttp_server.cc : 245] NET_LOG: Entering the event loop ...
docker ps shows:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ba9da02bd451 tensorflow/serving "/usr/bin/tf_serving…" 3 minutes ago Up 3 minutes 0.0.0.0:8020->8020/tcp, 8500-8501/tcp sleepy_bell
So both ports show as usable? And I forgot to name the container again. Let's standardize: container name and model name should match.
Both will be mydssm.
Also, better append an & to that command, otherwise the terminal is stuck (even Ctrl+C won't get you out). And where does Docker keep the logs?
I didn't solve the stuck terminal directly — I just killed the container from another shell.
Logs: docker logs captures the output of any container started with docker run; sleepy_bell below is the container name (the ID works too):
docker logs -f sleepy_bell
docker logs --tail 200 sleepy_bell
But this TF-Serving instance will surely see heavy QPS; let the logs accumulate for a week and the disk blows up, taking every service down with it. That won't do.
Some searching says the fix is a daemon.json under /etc/docker/ (root required), followed by a Docker restart.
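A minimal /etc/docker/daemon.json for log rotation, assuming the default json-file logging driver (the size and file-count values here are just illustrative — tune them to your QPS):

```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "3"
  }
}
```

After restarting Docker (systemctl restart docker), the limits apply to newly created containers; already-running containers keep their old logging settings.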
Since the deployment itself isn't my job for now (then why write this at all? To verify that my results match the results of the service someone else deploys — that matters, because if the process is wrong the results can't possibly be right), I'll skip that part.
Where do the log files actually live? Which folder?
Under /var/lib/docker — on the same disk as /models, equally short on space. Here's the log from just now. Why doesn't the time match Beijing time?
tail /var/lib/docker/containers/b498dc/b498dc-json.log
{"log":"2022-06-08 07:02:52.476660: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:190] Running initialization op on SavedModel bundle at path: /models/mydssm/1654579058\r\n","stream":"stdout","time":"2022-06-08T07:02:52.480938824Z"}
{"log":"2022-06-08 07:02:52.490406: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:277] SavedModel load for tags { serve }; Status: success: OK. Took 2341032 microseconds.\r\n","stream":"stdout","time":"2022-06-08T07:02:52.494551721Z"}
{"log":"2022-06-08 07:02:52.492201: I tensorflow_serving/servables/tensorflow/saved_model_warmup_util.cc:59] No warmup data file found at /models/mydssm/1654579058/assets.extra/tf_serving_warmup_requests\r\n","stream":"stdout","time":"2022-06-08T07:02:52.494573768Z"}
{"log":"2022-06-08 07:02:52.494217: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: mydssm version: 1654579058}\r\n","stream":"stdout","time":"2022-06-08T07:02:52.494578223Z"}
{"log":"2022-06-08 07:02:52.500115: I tensorflow_serving/model_servers/server_core.cc:486] Finished adding/updating models\r\n","stream":"stdout","time":"2022-06-08T07:02:52.50185224Z"}
{"log":"2022-06-08 07:02:52.500226: I tensorflow_serving/model_servers/server.cc:367] Profiler service is enabled\r\n","stream":"stdout","time":"2022-06-08T07:02:52.501872649Z"}
{"log":"2022-06-08 07:02:52.502034: I tensorflow_serving/model_servers/server.cc:393] Running gRPC ModelServer at 0.0.0.0:8500 ...\r\n","stream":"stdout","time":"2022-06-08T07:02:52.50481335Z"}
{"log":"[warn] getaddrinfo: address family for nodename not supported\r\n","stream":"stdout","time":"2022-06-08T07:02:52.509194922Z"}
{"log":"2022-06-08 07:02:52.509072: I tensorflow_serving/model_servers/server.cc:414] Exporting HTTP/REST API at:localhost:8501 ...\r\n","stream":"stdout","time":"2022-06-08T07:02:52.509213227Z"}
{"log":"[evhttp_server.cc : 245] NET_LOG: Entering the event loop ...\r\n","stream":"stdout","time":"2022-06-08T07:02:52.509217497Z"}
Docker containers use UTC (run date inside a running container to check), while the host shows CST (just run date on Linux) — eight hours apart.
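The eight-hour gap can be confirmed by converting one of the json-log timestamps above; a quick sketch (timestamp truncated to whole seconds):

```python
from datetime import datetime, timedelta, timezone

# The docker json-log timestamp above, recorded in UTC
utc = datetime(2022, 6, 8, 7, 2, 52, tzinfo=timezone.utc)
# Convert to China Standard Time (UTC+8), which is what `date` shows on the host
cst = utc.astimezone(timezone(timedelta(hours=8)))
print(cst.isoformat())  # 2022-06-08T15:02:52+08:00
```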
3-Verifying the endpoint: request and response
docker run -p 8020:8020 --name mydssm --mount type=bind,source=…/ods_new/ckpt/log/export/final/,target=/models/mydssm -e MODEL_NAME=mydssm -t tensorflow/serving &
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
521352f60b14 tensorflow/serving "/usr/bin/tf_serving…" 53 seconds ago Up 52 seconds 0.0.0.0:8020->8020/tcp, 8500-8501/tcp mydssm
It's up and running. Now for a request. I'll again start with the method from that earlier post (commands not repeated here — see the link) to check how parameters are passed.
The serving_default signature:
The given SavedModel SignatureDef contains the following input(s):
  inputs['ba_gender'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: input_5:0
  inputs['ba_num'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: input_4:0
  inputs['br_id'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: input_11:0
  inputs['cate_id'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: input_12:0
  inputs['cate2_id'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: input_13:0
  inputs['city_level'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: input_6:0
  inputs['user_id'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: input_1:0
  inputs['hbaby'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: input_2:0
  inputs['pre_month'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: input_8:0
  inputs['province_name'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: input_7:0
  inputs['spwku'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: input_10:0
  inputs['unit_id'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: input_14:0
  inputs['user_status'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: input_9:0
  inputs['user_gender'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: input_3:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['item_emb'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: ReduceJoin_1/ReduceJoin:0
  outputs['item_tower_feature'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: ReduceJoin_3/ReduceJoin:0
  outputs['logits'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1)
      name: Reshape:0
  outputs['probs'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1)
      name: Sigmoid:0
  outputs['user_emb'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: ReduceJoin/ReduceJoin:0
  outputs['user_tower_feature'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: ReduceJoin_2/ReduceJoin:0
Method name is: tensorflow/serving/predict
Fourteen inputs in total. Since the features were encoded during training, the request should carry the encoded values.
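For reference, here is the shape the REST request body needs to take, sketched in Python — the 14 keys come from the signature above, and the values are placeholder encoded IDs (every input is DT_STRING with shape (-1)):

```python
import json

# The 14 input names from the serving_default signature; the values here
# are placeholder encoded IDs, since all inputs are DT_STRING, shape (-1).
features = {
    "ba_gender": ["1"], "ba_num": ["1"], "br_id": ["1"], "cate_id": ["1"],
    "cate2_id": ["1"], "city_level": ["1"], "user_id": ["12"], "hbaby": ["10"],
    "pre_month": ["11"], "province_name": ["10"], "spwku": ["123"],
    "unit_id": ["21"], "user_status": ["10"], "user_gender": ["1"],
}
payload = json.dumps({"instances": [features]})
print(len(features))  # 14
```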
The following attempt does not work:
saved_model_cli run --dir …/ods_new/ckpt/log/export/final/1654579058/ --tag_set serve --signature_def="serving_default" --input_examples='inputs=[{"ba_gender":[1.0],"ba_num":[1.0],"br_id":[2.0],"cate_id":[1.0],"cate2_id":[0.0],"city_level":[1.0],"user_id":[12.0],"hbaby":[1.0],"pre_month":[1.0],"province_name":[10.0],"swkpu":[123.0],"unit_id":[21.0],"user_status":[10.0],"user_gender":[1.0]}]'
ValueError: "inputs" is not a valid input key. Please choose from [the keys above]
(The keys passed to --input_examples must match the signature's individual input names; this signature has 14 separate inputs, not a single key called inputs.)
A curl request didn't work either; at first I blamed the port or a firewall. (In hindsight it's the -p 8020:8020 mapping from earlier: nothing inside the container listens on 8020, hence the reset.)
* Trying 0.0.0.0:8020...
* TCP_NODELAY set
* Connected to 0.0.0.0 (127.0.0.1) port 8020 (#0)
> POST /v1/models/mydssm:predict HTTP/1.1
> Host: 0.0.0.0:8020
> User-Agent: curl/7.29.0
> Accept: */*
> Content-type: application/json
> Content-Length: 301
>
* upload completely sent off: 301 out of 301 bytes
* Recv failure: Connection reset by peer
* Closing connection 0
curl: (56) Recv failure: Connection reset by peer
But telnet does get through:
telnet 0.0.0.0 8020
Trying 0.0.0.0...
Connected to 0.0.0.0.
Escape character is '^]'.
Connection closed by foreign host.
After trying the original example again, I left 8501 alone (don't change the example's port) and didn't change the docker container's name either — and this time the request actually succeeded. The curl:
curl -d '{"instances":[{"ba_gender":["1"],"ba_num":["1"],"brand_id":["1"],"cate_id":["1"],"cate2_id":["1"],"city_level":["1"],"user_id":["12"],"hbaby":["10"],"pre_month":["11"],"province_name":["10"],"skwupu":["123"],"unit_id":["21"],"user_status":["10"],"user_gender":["1"]}]}' -X POST http://localhost:8501/v1/models/mydssm:predict -H "Content-type: application/json" -v
The result:
* Trying ::1:8501...
* TCP_NODELAY set
* Connected to localhost (::1) port 8501 (#0)
> POST /v1/models/mydssm:predict HTTP/1.1
> Host: localhost:8501
> User-Agent: curl/7.29.0
> Accept: */*
> Content-type: application/json
> Content-Length: 303
>
* upload completely sent off: 303 out of 303 bytes
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Content-Type: application/json
< Date: Wed, 08 Jun 2022 10:20:25 GMT
< Content-Length: 1998
<
{"predictions": [{"item_tower_feature": "-0.027797,0.000017,-0.006938,-0.001139,0.003079,-0.005398,-0.001739,-0.001127,0.013546,0.005068,0.002782,-0.003913,-0.000324,-0.009340,-0.001513,0.019721,-0.045470,0.063666,0.037524,0.041532,0.051212,-0.056672,-0.035556,-0.066957,0.001011,0.014817,-0.002076,0.002347,-0.008887,0.000608,0.018399,0.038382,-0.004276,0.025609,-0.009526,-0.016691,-0.003511,-0.011110,-0.002436,-0.015159,-0.001276,0.000005,0.000391,-0.002247,-0.000523,0.000644,-0.000368,-0.000851","user_emb": "0.089035,0.040245,0.040139,0.664103,0.005765,-0.028758,0.021623,-0.053681,-0.008481,0.188313,0.054642,0.044388,-0.319955,-0.058820,0.457832,-0.170870,0.013058,-0.279777,-0.058252,0.014134,-0.053209,-0.007721,0.032050,-0.061035,0.019025,0.019308,0.016642,0.000556,0.014037,0.049075,0.113682,-0.228581","probs": 0.00208181143,"user_tower_feature": "-0.006469,0.010059,-0.007064,-0.009822,-0.008493,0.001392,0.001749,-0.001267,-0.000328,0.000245,0.000578,-0.000082,-0.000146,-0.000003,-0.000091,-0.000034,0.001661,0.000566,0.000125,-0.001460,0.001285,-0.001913,0.001268,-0.000943,-0.009679,-0.087564,0.005677,0.015227,-0.107948,0.003131,0.006085,0.065827,-0.083296,-0.131117,0.052877,-0.043534,-0.153981,-0.009984,0.126459,0.020243,0.003186,-0.003460,-0.005059,0.006127,0.000208,-0.000853,-0.001991,-0.002454,-0.012654,0.017914,-0.004242,0.017789,-0.002636,-0.000141,-0.008043,0.001162,-0.002597,-0.004423,0.001763,0.002975,-0.004164,-0.004006,0.001054,-0.005090,0.000239,-0.000280,0.000001,-0.000008,0.000170,-0.000037,-0.000122,0.000067","logits": -6.17243767,"item_emb": "-0.177807,0.008181,0.009537,-0.447484,0.041922,-0.018426,0.028242,0.017518,-0.030073,-0.297743,-0.004319,0.005439,0.373108,0.006467,-0.377950,-0.216073,0.076012,0.355724,0.000885,0.056294,0.000588,-0.070175,-0.001788,0.013873,0.034125,0.027787,0.031646,-0.053749,0.040622,-0.150684,-0.202083,0.367106"}]
* Connection #0 to host localhost left intact
Now the same request via Python requests (no more fighting with saved_model_cli):
>>> import requests
>>> url2="http://localhost:8501/v1/models/mydssm:predict"
>>> js_data={"instances":[。。。。]}
>>> requests.post(url2,json=js_data)
<Response [200]>
>>> res=requests.post(url2,json=js_data)
>>> res.json()
{'predictions': [{'probs': 0.00208181143, 'user_tower_feature': '。。', 'logits': -6.17243767, 'item_emb': '。。', 'item_tower_feature': '。。', 'user_emb': '。。'}]}
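As a sanity check on the response itself: per the signature, probs comes from a Sigmoid node and logits from a Reshape, so the two returned values should satisfy probs = sigmoid(logits):

```python
import math

# probs and logits taken from the response above; the returned
# probs (0.00208181143) should equal sigmoid(logits)
logits = -6.17243767
probs = 1.0 / (1.0 + math.exp(-logits))
print(probs)  # ≈ 0.00208181, matching the response
```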
Everything checks out. One remaining issue, though: calling the pb model from Java fails (even with the maven TF version matched to the training version). Why? Stay tuned for the next installment.
May we meet again someday,
with you still remembering the topics we once discussed.