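I'm starting from scratch because the environment changed: I simply could not install tf-model-server here. I tried cloning from Git and building, but it kept failing. Even after getting sudo and following this blog post, it still didn't work. So what else can I do? Docker it is.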

See the official documentation.

1 - Basic Docker operations

Running the following commands in order works fine, but how do I stop (kill) the container afterwards? I was stumped.

docker pull tensorflow/serving
git clone https://github.com/tensorflow/serving
# Location of demo models
TESTDATA="$(pwd)/serving/tensorflow_serving/servables/tensorflow/testdata"
# Start TensorFlow Serving container and open the REST API port
docker run -t --rm -p 8501:8501 \
    -v "$TESTDATA/saved_model_half_plus_two_cpu:/models/half_plus_two" \
    -e MODEL_NAME=half_plus_two \
    tensorflow/serving &
# Query the model using the predict API
curl -d '{"instances": [1.0, 2.0, 5.0]}' \
    -X POST http://localhost:8501/v1/models/half_plus_two:predict
# Returns => { "predictions": [2.5, 3.0, 4.5] }

The following command simply does not work:

# docker kill serving_base
Error response from daemon: Cannot kill container: serving_base: No such container: serving_base

What is serving_base supposed to be? There is no explanation anywhere, which makes it a trap. I tried it many times, and tried the model name too.

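It turns out docker kill needs the container ID or the container name. So what is the name of the container started by the command above? docker ps shows the container ID, but I couldn't see a name [it is there; adjust the Xshell display or the font size]. I ran a single docker command and couldn't tell what its container was called? I could hardly believe something this basic was so hard to find.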

The reason is that the command was run without --name, as shown again below. So is there no default name?

docker run -t --rm -p 8501:8501 \
    -v "$TESTDATA/saved_model_half_plus_two_cpu:/models/half_plus_two" \
    -e MODEL_NAME=half_plus_two \
    tensorflow/serving &

Flailing around got me there in the end.

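It turns out docker ps does show a name (Xshell just wasn't displaying the full line). When no name is given, Docker generates a default one following the pattern adjective + a person's name, and there were quite a few such containers. The first one in the docker ps listing was the one started by this command.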

You can kill the container by that name or by its ID. To make killing easier, I still recommend giving your container a name.
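To avoid depending on how wide the terminal is, the ID and name columns can be requested on their own. The sketch below is only an illustration: it assumes docker is on PATH and uses the real `--format` option of `docker ps` with a Go template. The helper names `parse_ps_output` and `running_containers` are made up for this example, and the sample ID/name pair is the one docker ps reports later in this post.

```python
import subprocess

# Go template understood by `docker ps --format`: container ID, a tab, then the name.
PS_FORMAT = "{{.ID}}\t{{.Names}}"


def parse_ps_output(text):
    """Turn tab-separated `docker ps` output into a {name: container_id} dict."""
    containers = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        container_id, name = line.split("\t", 1)
        containers[name.strip()] = container_id.strip()
    return containers


def running_containers():
    """Ask the local Docker daemon for running containers (needs docker on PATH)."""
    result = subprocess.run(
        ["docker", "ps", "--format", PS_FORMAT],
        capture_output=True, text=True, check=True,
    )
    return parse_ps_output(result.stdout)


if __name__ == "__main__":
    # Sample text only; no Docker daemon is contacted here.
    sample = "ba9da02bd451\tsleepy_bell\n"
    print(parse_ps_output(sample))
```

Either the name or the ID returned here can then be passed to docker kill.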

2 - Starting a tf-serving model with Docker, e.g. a DSSM model

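Preparation: create a /models folder. This is the default model path used by the tensorflow/serving image (that is my own wording, but I believe the understanding is right). Which disk is this folder on? [Only this much space left.]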

df -h /models/
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1       493G  346G  127G  74% /

Naming the model: I could call it Gousheng again (haha, no). Let's call it mydssm.

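I set the port to 8020, but the log still says 8501. What? Is Nginx forwarding something? Otherwise what is -p even for? [-p takes HOST:CONTAINER. The server inside the container still listens on 8501 for REST, so 8020:8020 publishes a container port that nothing is listening on.]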

docker run -p 8020:8020 --mount type=bind,source=。。/ods_new/ckpt/log/export/final/,target=/models/mydssm -e MODEL_NAME=mydssm -t tensorflow/serving
2022-06-08 07:02:50.047819: I tensorflow_serving/model_servers/server.cc:89] Building single TensorFlow model file config:  model_name: mydssm model_base_path: /models/mydssm
2022-06-08 07:02:50.048163: I tensorflow_serving/model_servers/server_core.cc:465] Adding/updating models.
2022-06-08 07:02:50.048200: I tensorflow_serving/model_servers/server_core.cc:591]  (Re-)adding model: mydssm
2022-06-08 07:02:50.149259: I tensorflow_serving/core/basic_manager.cc:740] Successfully reserved resources to load servable {name: mydssm version: 1654579058}
2022-06-08 07:02:50.149305: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: mydssm version: 1654579058}
2022-06-08 07:02:50.149321: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: mydssm version: 1654579058}
2022-06-08 07:02:50.149366: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:38] Reading SavedModel from: /models/mydssm/1654579058
2022-06-08 07:02:50.158082: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:90] Reading meta graph with tags { serve }
2022-06-08 07:02:50.158123: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:132] Reading SavedModel debug info (if present) from: /models/mydssm/1654579058
2022-06-08 07:02:50.158254: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-08 07:02:50.209774: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:206] Restoring SavedModel bundle.
2022-06-08 07:02:50.215191: I external/org_tensorflow/tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2499990000 Hz
2022-06-08 07:02:52.476660: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:190] Running initialization op on SavedModel bundle at path: /models/mydssm/1654579058
2022-06-08 07:02:52.490406: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:277] SavedModel load for tags { serve }; Status: success: OK. Took 2341032 microseconds.
2022-06-08 07:02:52.492201: I tensorflow_serving/servables/tensorflow/saved_model_warmup_util.cc:59] No warmup data file found at /models/mydssm/1654579058/assets.extra/tf_serving_warmup_requests
2022-06-08 07:02:52.494217: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: mydssm version: 1654579058}
2022-06-08 07:02:52.500115: I tensorflow_serving/model_servers/server_core.cc:486] Finished adding/updating models
2022-06-08 07:02:52.500226: I tensorflow_serving/model_servers/server.cc:367] Profiler service is enabled
2022-06-08 07:02:52.502034: I tensorflow_serving/model_servers/server.cc:393] Running gRPC ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for nodename not supported
2022-06-08 07:02:52.509072: I tensorflow_serving/model_servers/server.cc:414] Exporting HTTP/REST API at:localhost:8501 ...
[evhttp_server.cc : 245] NET_LOG: Entering the event loop ...

Checking with docker ps gives:

CONTAINER ID        IMAGE                COMMAND                  CREATED             STATUS              PORTS                                   NAMES
ba9da02bd451        tensorflow/serving   "/usr/bin/tf_serving…"   3 minutes ago       Up 3 minutes        0.0.0.0:8020->8020/tcp, 8500-8501/tcp   sleepy_bell
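Does this mean both ports are usable? And I forgot to name the container again. Let's be consistent: the container name and the model name should be the same.

Call both of them mydssm.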

It is best to append & to the command above, otherwise it is a real hassle (even Ctrl+C did not work). Also, where does Docker keep its logs?

I did not solve the Ctrl+C problem; I simply killed the container from another shell session.
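The naming, port, and foreground issues above can be collected in one place. The following is a minimal sketch, not the command used in this post: it assembles a `docker run` argument list with `--name`, with `-p HOST:CONTAINER` pointing at 8501 (the REST port the server log reports inside the container), and with `-d` (detached mode) as an alternative to a trailing `&`. The function name `serving_command` and the path `/path/to/export/final` are placeholders invented for the example.

```python
import shlex

# Port the server log reports for HTTP/REST inside the container.
REST_PORT_IN_CONTAINER = 8501


def serving_command(model_name, model_dir, host_port):
    """Build a `docker run` argument list for the tensorflow/serving image."""
    return [
        "docker", "run",
        "-d",                                            # detached: returns the shell immediately
        "--name", model_name,                            # container name == model name
        "-p", f"{host_port}:{REST_PORT_IN_CONTAINER}",   # HOST:CONTAINER
        "--mount", f"type=bind,source={model_dir},target=/models/{model_name}",
        "-e", f"MODEL_NAME={model_name}",
        "tensorflow/serving",
    ]


if __name__ == "__main__":
    # Hypothetical export directory; replace it with the real SavedModel parent folder.
    cmd = serving_command("mydssm", "/path/to/export/final", 8020)
    print(shlex.join(cmd))  # print only; nothing is executed
```

With a mapping of this shape, requests sent to the host port are forwarded to the port the server actually listens on, and `docker kill mydssm` works without looking up a generated name.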

Logs: docker logs works for any container started with docker run; sleepy_bell below is the container name (the ID works too).

docker logs -f sleepy_bell
docker logs --tail 200 sleepy_bell

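But this tf-serving instance will certainly see a high QPS. After a week of accumulated logs the disk would fill up and every service on the machine would go down. That won't do.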

From what I found, you need to set up a daemon.json file under /etc/docker/ (root required) and then restart Docker.
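As a rough sketch of what such a file could contain, the snippet below generates a daemon.json body that caps log size using the json-file logging driver's `max-size` and `max-file` options. The values `100m` and `3` are arbitrary assumptions, not settings taken from this post, and `log_rotation_config` is a made-up helper name. Docker expects the `log-opts` values as strings, and the settings only apply to containers created after the daemon restarts.

```python
import json


def log_rotation_config(max_size="100m", max_file=3):
    """Return a daemon.json structure that rotates container logs.

    max_size: size at which a log file is rotated (assumed value).
    max_file: number of rotated files to keep (assumed value).
    """
    return {
        "log-driver": "json-file",
        "log-opts": {
            "max-size": max_size,
            "max-file": str(max_file),  # log-opts values must be strings
        },
    }


if __name__ == "__main__":
    # Print the JSON; writing it to /etc/docker/daemon.json needs root.
    print(json.dumps(log_rotation_config(), indent=2))
```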

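Since I am not the one deploying the service for now, I'll skip this. (Why write any of this, then? To check that my results match the results from someone else's deployment. That matters a lot: if the process is wrong, the result can't be right.)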

Where are the logs stored? Which folder?

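Under /var/lib/docker. That folder is on the same disk as /models, so it has the same limited space. The log from the run above is shown below. Why do its timestamps differ from Beijing time?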

tail /var/lib/docker/containers/b498dc/b498dc-json.log
{"log":"2022-06-08 07:02:52.476660: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:190] Running initialization op on SavedModel bundle at path: /models/mydssm/1654579058\r\n","stream":"stdout","time":"2022-06-08T07:02:52.480938824Z"}
{"log":"2022-06-08 07:02:52.490406: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:277] SavedModel load for tags { serve }; Status: success: OK. Took 2341032 microseconds.\r\n","stream":"stdout","time":"2022-06-08T07:02:52.494551721Z"}
{"log":"2022-06-08 07:02:52.492201: I tensorflow_serving/servables/tensorflow/saved_model_warmup_util.cc:59] No warmup data file found at /models/mydssm/1654579058/assets.extra/tf_serving_warmup_requests\r\n","stream":"stdout","time":"2022-06-08T07:02:52.494573768Z"}
{"log":"2022-06-08 07:02:52.494217: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: mydssm version: 1654579058}\r\n","stream":"stdout","time":"2022-06-08T07:02:52.494578223Z"}
{"log":"2022-06-08 07:02:52.500115: I tensorflow_serving/model_servers/server_core.cc:486] Finished adding/updating models\r\n","stream":"stdout","time":"2022-06-08T07:02:52.50185224Z"}
{"log":"2022-06-08 07:02:52.500226: I tensorflow_serving/model_servers/server.cc:367] Profiler service is enabled\r\n","stream":"stdout","time":"2022-06-08T07:02:52.501872649Z"}
{"log":"2022-06-08 07:02:52.502034: I tensorflow_serving/model_servers/server.cc:393] Running gRPC ModelServer at 0.0.0.0:8500 ...\r\n","stream":"stdout","time":"2022-06-08T07:02:52.50481335Z"}
{"log":"[warn] getaddrinfo: address family for nodename not supported\r\n","stream":"stdout","time":"2022-06-08T07:02:52.509194922Z"}
{"log":"2022-06-08 07:02:52.509072: I tensorflow_serving/model_servers/server.cc:414] Exporting HTTP/REST API at:localhost:8501 ...\r\n","stream":"stdout","time":"2022-06-08T07:02:52.509213227Z"}
{"log":"[evhttp_server.cc : 245] NET_LOG: Entering the event loop ...\r\n","stream":"stdout","time":"2022-06-08T07:02:52.509217497Z"}

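The container's clock is in UTC (run date inside the running container to check), while the host's local time is CST (just run date on Linux). The two differ by eight hours.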
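The eight-hour offset can be checked against one of the json log lines above. This sketch assumes the `time` field is an RFC 3339 UTC timestamp with up to nanosecond precision, as in the lines shown, and uses a fixed UTC+8 offset for CST. The function names are illustrative only.

```python
import json
from datetime import datetime, timedelta, timezone

CST = timezone(timedelta(hours=8), "CST")  # China Standard Time, fixed UTC+8


def docker_time_to_cst(stamp):
    """Convert a timestamp such as 2022-06-08T07:02:52.509194922Z to CST."""
    body = stamp.rstrip("Z")
    seconds, _, frac = body.partition(".")
    frac = frac[:6].ljust(6, "0")  # datetime keeps microseconds, not nanoseconds
    utc = datetime.strptime(f"{seconds}.{frac}", "%Y-%m-%dT%H:%M:%S.%f")
    return utc.replace(tzinfo=timezone.utc).astimezone(CST)


def parse_log_line(line):
    """Return (CST datetime, message) for one line of a *-json.log file."""
    record = json.loads(line)
    return docker_time_to_cst(record["time"]), record["log"].rstrip()


if __name__ == "__main__":
    # One of the log lines shown above, kept as a raw string so \r\n stays JSON-escaped.
    sample = (r'{"log":"[warn] getaddrinfo: address family for nodename not supported\r\n",'
              r'"stream":"stdout","time":"2022-06-08T07:02:52.509194922Z"}')
    when, message = parse_log_line(sample)
    print(when.strftime("%Y-%m-%d %H:%M:%S %Z"), message)
```

For the sample line, 07:02:52 UTC comes out as 15:02:52 CST.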

3 - Verifying the endpoint: request and response

docker run -p 8020:8020 --name mydssm --mount type=bind,source=。。。/ods_new/ckpt/log/export/final/,target=/models/mydssm -e MODEL_NAME=mydssm -t tensorflow/serving &

CONTAINER ID        IMAGE                COMMAND                  CREATED             STATUS              PORTS                                   NAMES
521352f60b14        tensorflow/serving   "/usr/bin/tf_serving…"   53 seconds ago      Up 52 seconds       0.0.0.0:8020->8020/tcp, 8500-8501/tcp   mydssm
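It is running. Now for some requests. I'll again start with the method from this article (the commands are not repeated here; see the link) to check how the arguments are passed.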

The serving_default signature is:

The given SavedModel SignatureDef contains the following input(s):
  inputs['ba_gender'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: input_5:0
  inputs['ba_num'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: input_4:0
  inputs['br_id'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: input_11:0
  inputs['cate_id'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: input_12:0
  inputs['cate2_id'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: input_13:0
  inputs['city_level'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: input_6:0
  inputs['user_id'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: input_1:0
  inputs['hbaby'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: input_2:0
  inputs['pre_month'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: input_8:0
  inputs['province_name'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: input_7:0
  inputs['spwku'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: input_10:0
  inputs['unit_id'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: input_14:0
  inputs['user_status'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: input_9:0
  inputs['user_gender'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: input_3:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['item_emb'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: ReduceJoin_1/ReduceJoin:0
  outputs['item_tower_feature'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: ReduceJoin_3/ReduceJoin:0
  outputs['logits'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1)
      name: Reshape:0
  outputs['probs'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1)
      name: Sigmoid:0
  outputs['user_emb'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: ReduceJoin/ReduceJoin:0
  outputs['user_tower_feature'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: ReduceJoin_2/ReduceJoin:0
Method name is: tensorflow/serving/predict

There are 14 inputs in total. Since I encoded the features during training, the inputs should be the encoded values.

The following test does not work:

saved_model_cli run --dir 。。/ods_new/ckpt/log/export/final/1654579058/ --tag_set serve --signature_def="serving_default" --input_examples='inputs=[{"ba_gender":[1.0],"ba_num":[1.0],"br_id":[2.0],"cate_id":[1.0],"cate2_id":[0.0],"city_level":[1.0],"user_id":[12.0],"hbaby":[1.0],"pre_month":[1.0],"province_name":[10.0],"swkpu":[123.0],"unit_id":[21.0],"user_status":[10.0],"user_gender":[1.0]}]'
ValueError: "inputs" is not a valid input key. Please choose from <the keys listed above>

A curl request did not work either; I suspected a port or firewall problem.

*   Trying 0.0.0.0:8020...
* TCP_NODELAY set
* Connected to 0.0.0.0 (127.0.0.1) port 8020 (#0)
> POST /v1/models/mydssm:predict HTTP/1.1
> Host: 0.0.0.0:8020
> User-Agent: curl/7.29.0
> Accept: */*
> Content-type: application/json
> Content-Length: 301
>
* upload completely sent off: 301 out of 301 bytes
* Recv failure: Connection reset by peer
* Closing connection 0
curl: (56) Recv failure: Connection reset by peer

Yet the following connects:

telnet 0.0.0.0 8020
Trying 0.0.0.0...
Connected to 0.0.0.0.
Escape character is '^]'.
Connection closed by foreign host.

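After going back to the original example and leaving 8501 alone (don't change the example's port), with the container name also unchanged, the request suddenly worked. The curl command: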

curl -d '{"instances":[{"ba_gender":["1"],"ba_num":["1"],"brand_id":["1"],"cate_id":["1"],"cate2_id":["1"],"city_level":["1"],"user_id":["12"],"hbaby":["10"],"pre_month":["11"],"province_name":["10"],"skwupu":["123"],"unit_id":["21"],"user_status":["10"],"user_gender":["1"]}]}' -X POST http://localhost:8501/v1/models/mydssm:predict  -H "Content-type: application/json" -v

The result:

*   Trying ::1:8501...
* TCP_NODELAY set
* Connected to localhost (::1) port 8501 (#0)
> POST /v1/models/mydssm:predict HTTP/1.1
> Host: localhost:8501
> User-Agent: curl/7.29.0
> Accept: */*
> Content-type: application/json
> Content-Length: 303
>
* upload completely sent off: 303 out of 303 bytes
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Content-Type: application/json
< Date: Wed, 08 Jun 2022 10:20:25 GMT
< Content-Length: 1998
<
{"predictions": [{"item_tower_feature": "-0.027797,0.000017,-0.006938,-0.001139,0.003079,-0.005398,-0.001739,-0.001127,0.013546,0.005068,0.002782,-0.003913,-0.000324,-0.009340,-0.001513,0.019721,-0.045470,0.063666,0.037524,0.041532,0.051212,-0.056672,-0.035556,-0.066957,0.001011,0.014817,-0.002076,0.002347,-0.008887,0.000608,0.018399,0.038382,-0.004276,0.025609,-0.009526,-0.016691,-0.003511,-0.011110,-0.002436,-0.015159,-0.001276,0.000005,0.000391,-0.002247,-0.000523,0.000644,-0.000368,-0.000851","user_emb": "0.089035,0.040245,0.040139,0.664103,0.005765,-0.028758,0.021623,-0.053681,-0.008481,0.188313,0.054642,0.044388,-0.319955,-0.058820,0.457832,-0.170870,0.013058,-0.279777,-0.058252,0.014134,-0.053209,-0.007721,0.032050,-0.061035,0.019025,0.019308,0.016642,0.000556,0.014037,0.049075,0.113682,-0.228581","probs": 0.00208181143,"user_tower_feature": "-0.006469,0.010059,-0.007064,-0.009822,-0.008493,0.001392,0.001749,-0.001267,-0.000328,0.000245,0.000578,-0.000082,-0.000146,-0.000003,-0.000091,-0.000034,0.001661,0.000566,0.000125,-0.001460,0.001285,-0.001913,0.001268,-0.000943,-0.009679,-0.087564,0.005677,0.015227,-0.107948,0.003131,0.006085,0.065827,-0.083296,-0.131117,0.052877,-0.043534,-0.153981,-0.009984,0.126459,0.020243,0.003186,-0.003460,-0.005059,0.006127,0.000208,-0.000853,-0.001991,-0.002454,-0.012654,0.017914,-0.004242,0.017789,-0.002636,-0.000141,-0.008043,0.001162,-0.002597,-0.004423,0.001763,0.002975,-0.004164,-0.004006,0.001054,-0.005090,0.000239,-0.000280,0.000001,-0.000008,0.000170,-0.000037,-0.000122,0.000067","logits": -6.17243767,"item_emb": "-0.177807,0.008181,0.009537,-0.447484,0.041922,-0.018426,0.028242,0.017518,-0.030073,-0.297743,-0.004319,0.005439,0.373108,0.006467,-0.377950,-0.216073,0.076012,0.355724,0.000885,0.056294,0.000588,-0.070175,-0.001788,0.013873,0.034125,0.027787,0.031646,-0.053749,0.040622,-0.150684,-0.202083,0.367106"}]
* Connection #0 to host localhost left intact
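Typing the 14-key body by hand is error-prone, so the request body can be assembled and checked in code first. This is a sketch under assumptions: the key list is copied from the serving_default signature dump above (the commands in this post spell a few keys differently), each value is wrapped as a one-element list of strings to mirror the shape of the curl body, and `build_payload` is a made-up helper name.

```python
import json

# Input keys as listed in the serving_default signature dump above.
SIGNATURE_INPUTS = [
    "ba_gender", "ba_num", "br_id", "cate_id", "cate2_id", "city_level",
    "user_id", "hbaby", "pre_month", "province_name", "spwku", "unit_id",
    "user_status", "user_gender",
]


def build_payload(features):
    """Build a {"instances": [...]} body, failing early on wrong keys."""
    missing = sorted(set(SIGNATURE_INPUTS) - set(features))
    unknown = sorted(set(features) - set(SIGNATURE_INPUTS))
    if missing or unknown:
        raise ValueError(f"missing keys: {missing}, unknown keys: {unknown}")
    # All inputs are DT_STRING, so every value is sent as a string.
    instance = {key: [str(features[key])] for key in SIGNATURE_INPUTS}
    return {"instances": [instance]}


if __name__ == "__main__":
    # Dummy encoded feature values, for illustration only.
    demo = {key: 1 for key in SIGNATURE_INPUTS}
    print(json.dumps(build_payload(demo)))
```

The resulting dictionary can be passed as the JSON body of a POST to the :predict URL.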

Next, a request from Python with requests; I gave up on trying saved_model_cli.

>>> import requests
>>> url2="http://localhost:8501/v1/models/mydssm:predict"
>>> js_data={"instances":[。。。。]}
>>> requests.post(url2,json=js_data)
<Response [200]>
>>> res=requests.post(url2,json=js_data)
>>> res.json()
{'predictions': [{'probs': 0.00208181143, 'user_tower_feature': '。。', 'logits': -6.17243767, 'item_emb': '。。', 'item_tower_feature': '。。', 'user_emb': '。。'}]}
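The embedding outputs come back as comma-separated strings (they are DT_STRING in the signature), so they need one more step before they can be compared numerically with another deployment's results. A minimal sketch, with hypothetical helper names and a sample shortened to the first three user_emb values from the response above:

```python
def parse_embedding(text):
    """Split a comma-separated string such as "0.1,-0.2" into a list of floats."""
    return [float(part) for part in text.split(",") if part.strip()]


def decode_prediction(prediction):
    """Convert string-valued outputs to float lists; leave numeric outputs alone."""
    decoded = {}
    for key, value in prediction.items():
        decoded[key] = parse_embedding(value) if isinstance(value, str) else value
    return decoded


if __name__ == "__main__":
    # Shortened sample; a real response carries the full vectors.
    sample = {
        "probs": 0.00208181143,
        "logits": -6.17243767,
        "user_emb": "0.089035,0.040245,0.040139",
    }
    print(decode_prediction(sample))
```

Applied to each element of res.json()['predictions'], this gives float vectors that can be diffed against the other deployment's output.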

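No problems here. One more piece of information, though: calling the pb model from Java did not succeed (even with the TF and Maven versions aligned with the ones used for training). What is going on? To be continued next time.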

May we meet again someday,

and may you still remember the topics we once discussed.
