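Steps

  • 1. Why build TensorFlow yourself?
  • 2. Build environment
    • 2.1 Install the required software
  • 3. Build steps
    • 3.1 Install Python packages
    • 3.2 Clone the source
    • 3.3 Patch the source
    • 3.4 Configure the build
    • 3.5 Build the code
    • 3.6 Package the wheel
    • 3.7 Install the built package
    • 3.8 Run a test
  • 4. Summary
    • 4.1 References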

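1. Why build TensorFlow yourself?

TensorFlow no longer ships official GPU packages for macOS, because Nvidia no longer provides graphics drivers for macOS. The official CPU packages are also not optimized for instruction sets such as AVX2 and FMA, so running a model prints: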

Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA

Enabling AVX2, FMA and similar instruction sets reportedly speeds up CNN models by about 40%.

To see which instruction sets your CPU supports, run:

sysctl -a | grep "machdep.cpu.*features:"
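The output of that command can also be checked programmatically. The following is a minimal sketch, assuming the usual `machdep.cpu.*features: ...` line format; the helper names and the abbreviated sample text are made up for illustration and are not part of any tool.

```python
import re

# Instruction sets this post wants the build to use.
WANTED = {"AVX2", "FMA", "SSE4.2"}


def parse_cpu_features(sysctl_text):
    """Collect every feature token from machdep.cpu.*features lines."""
    features = set()
    for line in sysctl_text.splitlines():
        match = re.match(r"machdep\.cpu\.\w*features:\s*(.*)", line)
        if match:
            features.update(match.group(1).split())
    return features


def missing_features(sysctl_text, wanted=WANTED):
    """Return the wanted instruction sets that the CPU does not report."""
    return sorted(set(wanted) - parse_cpu_features(sysctl_text))


# Abbreviated, hypothetical sample in the shape of the sysctl output.
SAMPLE = """\
machdep.cpu.features: FPU VME SSE SSE2 SSE3 FMA SSE4.1 SSE4.2 AVX1.0
machdep.cpu.leaf7_features: SMEP ERMS BMI1 AVX2 BMI2
machdep.cpu.extfeatures: SYSCALL XD EM64T LAHF RDTSCP
"""

print("missing:", missing_features(SAMPLE))
```

To use it on a real machine, pass the text printed by `sysctl -a` instead of `SAMPLE`. An empty list means every wanted instruction set is reported.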

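There are usually two reasons to build TensorFlow yourself:

  1. You run a Hackintosh with a graphics card installed, or a Mac with an external graphics card, and want to use the GPU.
  2. You run on CPU only and want the speedup from AVX2, FMA and similar instruction sets.

2. Build environment

I used a PC with an Intel i5 CPU and a GTX 1050 Ti GPU, running macOS as a Hackintosh. Nvidia's drivers support macOS only up to 10.13, so the system has to run High Sierra.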

Note:

To drive the GPU you must install High Sierra.

If you only want an optimized CPU build, a newer macOS version is fine.
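The version constraint above can be expressed as a small check. This is only a sketch: the 10.13 ceiling comes from the text, while the function names `parse_macos_version` and `can_build_gpu` are hypothetical helpers introduced for illustration.

```python
import platform

# Last macOS release with Nvidia driver support, per the note above.
MAX_GPU_MACOS = (10, 13)


def parse_macos_version(version):
    """Turn a string like '10.13.6' into (10, 13); return None if it is not numeric."""
    parts = version.split(".")
    if not version or not all(part.isdigit() for part in parts):
        return None
    numbers = [int(part) for part in parts] + [0]  # pad so '11' becomes (11, 0)
    return tuple(numbers[:2])


def can_build_gpu(version):
    """True when the macOS version is at or below the driver ceiling."""
    parsed = parse_macos_version(version)
    return parsed is not None and parsed <= MAX_GPU_MACOS


# platform.mac_ver() returns an empty release string on non-macOS systems.
current = platform.mac_ver()[0]
print("macOS version:", current or "not macOS")
print("GPU build possible:", can_build_gpu(current))
```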

2.1 Install the required software

  1. Graphics driver and CUDA 10.1
    A shortcut for installing the Nvidia driver on a Mac is available here: https://github.com/Benjamin-Dobell/nvidia-update
    Just run:
bash <(curl -s https://raw.githubusercontent.com/Benjamin-Dobell/nvidia-update/master/nvidia-update.sh)

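After installing, make sure System Information reports the graphics card correctly.

Download the CUDA 10.1 installer for OS X from the Nvidia website and install it. Afterwards a new "CUDA" item appears in System Preferences.

Download the cuDNN 7.6 package from the Nvidia website, unpack it, and copy its files into the CUDA installation directory.

  2. Xcode 10.1
    Download it from the Apple Developer site (or search Baidu Cloud for a copy) and install it.

  3. Python 3
    Download and install: https://www.python.org/ftp/python/3.7.9/python-3.7.9-macosx10.9.pkg

  4. Bazel 3.7.2
    This is the version used here to build TensorFlow 2.4. The original text calls it the minimum required version; ./configure checks the installed Bazel version itself, so treat that check as authoritative.
    Download the executable: https://github.com/bazelbuild/bazel/releases/download/3.7.2/bazel-3.7.2-darwin-x86_64
    Then link it into /usr/local/bin and check that it prints its version number: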

chmod +x ~/Downloads/bazel-3.7.2-darwin-x86_64
ln -s ~/Downloads/bazel-3.7.2-darwin-x86_64 /usr/local/bin/bazel
bazel --version
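If you script the setup, the version string can be verified automatically. Below is a minimal sketch, assuming `bazel --version` prints a line of the form `bazel 3.7.2`; the function names and the `EXPECTED` constant are illustrative only.

```python
import re

# Version installed in this post.
EXPECTED = (3, 7, 2)


def parse_bazel_version(output):
    """Extract (major, minor, patch) from text such as 'bazel 3.7.2'."""
    match = re.search(r"bazel (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("unrecognized bazel version output: %r" % output)
    return tuple(int(part) for part in match.groups())


def is_expected_bazel(output, expected=EXPECTED):
    """True when the reported version equals the expected one."""
    return parse_bazel_version(output) == expected


print(is_expected_bazel("bazel 3.7.2"))
```

On a real machine, the input would be the standard output of running `bazel --version`, for example captured with `subprocess.run(["bazel", "--version"], capture_output=True, text=True).stdout`.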

3. Build steps

3.1 Install Python packages

pip3 install -U pip numpy wheel
pip3 install -U keras_preprocessing --no-deps

3.2 Clone the source

git clone https://github.com/tensorflow/tensorflow
cd tensorflow
git checkout v2.4.1

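3.3 Patch the source

A few source files must be modified before the code compiles on macOS. Download the patch file from the GitHub project (https://github.com/evan-wu/tensorflow-macosx-build), then run: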

git am 2.4.1.patch

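3.4 Configure the build

When building the GPU package, answer y at the CUDA support prompt; otherwise answer N. As the compute capability prompt itself warns, each additional capability increases build time and binary size; the GTX 1050 Ti used here reports compute capability 6.1.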

./configure
You have bazel 3.7.2 installed.
Please specify the location of python. [Default is /usr/local/bin/python3]:
Found possible Python library paths:
  /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages
Please input the desired Python library path to use.  Default is [/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages]
Do you wish to build TensorFlow with ROCm support? [y/N]: N
No ROCm support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.
Found CUDA 10.1 in:
    /usr/local/cuda/lib
    /Developer/NVIDIA/CUDA-10.1/include
Found cuDNN 7 in:
    /usr/local/cuda/lib
    /Developer/NVIDIA/CUDA-10.1/include
Please specify a list of comma-separated CUDA compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus. Each capability can be specified as "x.y" or "compute_xy" to include both virtual and binary GPU code, or as "sm_xy" to only include the binary code.
Please note that each additional compute capability significantly increases your build time and binary size, and that TensorFlow only supports compute capabilities >= 3.5 [Default is: 3.5,7.0]: 3.0,3.5,5.0,6.1,7.0
WARNING: XLA does not support CUDA compute capabilities lower than 3.5. Disable XLA when running on older GPUs.
Do you want to use clang as CUDA compiler? [y/N]: N
nvcc will be used as CUDA compiler.
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -Wno-sign-compare]:
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: N
Not configuring the WORKSPACE for Android builds.
Do you wish to build TensorFlow with iOS support? [y/N]: N
No iOS support will be enabled for TensorFlow.
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
    --config=mkl             # Build with MKL support.
    --config=mkl_aarch64     # Build with oneDNN support for Aarch64.
    --config=monolithic      # Config for mostly static monolithic build.
    --config=ngraph          # Build with Intel nGraph support.
    --config=numa            # Build with NUMA support.
    --config=dynamic_kernels # (Experimental) Build kernels into separate shared objects.
    --config=v2              # Build TensorFlow 2.x instead of 1.x.
Preconfigured Bazel build configs to DISABLE default on features:
    --config=noaws           # Disable AWS S3 filesystem support.
    --config=nogcp           # Disable GCP support.
    --config=nohdfs          # Disable HDFS support.
    --config=nonccl          # Disable NVIDIA NCCL support.
Configuration finished

3.5 Build the code

Build with the following command:

bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-msse4.2 //tensorflow/tools/pip_package:build_pip_package
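Each `--copt=-m...` flag enables one instruction set, so the list should match what your CPU actually reports. As a rough sketch (the helper name `bazel_build_command` is made up here), the command line above can be assembled from a list of instruction-set names, which makes it easy to drop any your CPU lacks:

```python
import shlex

# Build target used in this post.
TARGET = "//tensorflow/tools/pip_package:build_pip_package"


def bazel_build_command(instruction_sets, target=TARGET):
    """Return the bazel argument list with one --copt=-m<name> per instruction set."""
    args = ["bazel", "build", "-c", "opt"]
    args += ["--copt=-m" + name for name in instruction_sets]
    args.append(target)
    return args


cmd = bazel_build_command(["avx", "avx2", "fma", "sse4.2"])
# Print a copy-pasteable shell command; nothing is executed here.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The script only prints the command. Running the build remains a manual step from inside the tensorflow checkout.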

If you hit the error below, comment out the two offending functions in bazel-tensorflow/external/com_google_absl/absl/container/internal/compressed_tuple.h:

external/com_google_absl/absl/container/internal/compressed_tuple.h:171:53: error: use 'template' keyword to treat 'Storage' as a dependent template name
return (std::move(*this).internal_compressed_tuple::Storage< CompressedTuple, I> ::get());
                                                    ^
                                                    template
external/com_google_absl/absl/container/internal/compressed_tuple.h:177:54: error: use 'template' keyword to treat 'Storage' as a dependent template name
return (absl::move(*this).internal_compressed_tuple::Storage< CompressedTuple, I> ::get());
                                                     ^
                                                     template
2 errors generated.

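The build takes a long time, possibly as much as 8 hours (the run shown below took about 5 hours). If there are no errors, it ends with: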

...
Target //tensorflow/tools/pip_package:build_pip_package up-to-date:
  bazel-bin/tensorflow/tools/pip_package/build_pip_package
INFO: Elapsed time: 17902.809s, Critical Path: 684.61s
INFO: 7578 processes: 41 internal, 7537 local.
INFO: Build completed successfully, 7578 total actions

3.6 Package the wheel

./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

This produces a wheel package under /tmp/tensorflow_pkg, for example: tensorflow-2.4.1-cp37-cp37m-macosx_10_13_x86_64.whl
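The wheel file name encodes which Python and platform the package was built for, following the standard wheel naming scheme (name, version, Python tag, ABI tag, platform tag). A minimal sketch for splitting it apart, assuming no optional build tag is present; `parse_wheel_filename` and `WheelTags` are illustrative names:

```python
from collections import namedtuple

WheelTags = namedtuple("WheelTags", "name version python abi platform")


def parse_wheel_filename(filename):
    """Split a wheel file name into its five dash-separated fields."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        # A sixth field would be the optional build tag, which this sketch ignores.
        raise ValueError("expected 5 dash-separated fields: " + filename)
    return WheelTags(*parts)


tags = parse_wheel_filename("tensorflow-2.4.1-cp37-cp37m-macosx_10_13_x86_64.whl")
print(tags.python, tags.abi, tags.platform)
```

Here `cp37` means the wheel targets CPython 3.7, and the platform tag records macOS 10.13 on x86_64, matching the build machine described earlier.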

3.7 Install the built package

pip3 install /tmp/tensorflow_pkg/tensorflow-2.4.1-cp37-cp37m-macosx_10_13_x86_64.whl

3.8 Run a test

Run any model. Output similar to the following means the GPU build is working:

2021-02-19 12:27:55.699299: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.10.1.dylib
2021-02-19 12:27:57.935779: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-02-19 12:27:57.959578: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.dylib
2021-02-19 12:27:57.984405: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:902] OS X does not support NUMA - returning NUMA node zero
2021-02-19 12:27:57.984958: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1050 Ti computeCapability: 6.1
coreClock: 1.392GHz coreCount: 6 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 104.43GiB/s
2021-02-19 12:27:57.985295: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.10.1.dylib
2021-02-19 12:27:58.067490: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.10.dylib
2021-02-19 12:27:58.067768: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.10.dylib
2021-02-19 12:27:58.121329: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.10.dylib
2021-02-19 12:27:58.143269: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.10.dylib
2021-02-19 12:27:58.234257: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.10.dylib
2021-02-19 12:27:58.288708: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.10.dylib
2021-02-19 12:27:58.372196: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.7.dylib
2021-02-19 12:27:58.372467: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:902] OS X does not support NUMA - returning NUMA node zero
2021-02-19 12:27:58.372987: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:902] OS X does not support NUMA - returning NUMA node zero
2021-02-19 12:27:58.373252: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
...
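When the log is long, it can help to check mechanically that every CUDA library was opened. The sketch below assumes the `Successfully opened dynamic library ...` wording shown in the log above; the `REQUIRED` set is simply the libraries that appear in that log, and the function names are made up for illustration.

```python
import re

# Library base names taken from the log above.
REQUIRED = {
    "libcudart", "libcublas", "libcufft", "libcurand",
    "libcusolver", "libcusparse", "libcudnn",
}


def loaded_libraries(log_text):
    """Return base names (text before the first dot) of libraries the log reports as opened."""
    names = re.findall(r"Successfully opened dynamic library (\S+)", log_text)
    return {name.split(".")[0] for name in names}


def missing_libraries(log_text, required=REQUIRED):
    """Return the required libraries that never appear as opened."""
    return sorted(set(required) - loaded_libraries(log_text))


# Two lines copied from the log above, as a small sample.
SAMPLE_LOG = """\
2021-02-19 12:27:55.699299: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.10.1.dylib
2021-02-19 12:27:58.372196: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.7.dylib
"""

print("opened:", sorted(loaded_libraries(SAMPLE_LOG)))
print("missing:", missing_libraries(SAMPLE_LOG))
```

Feeding it the full log from a working build should give an empty missing list.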

If the following error appears, numpy needs to be upgraded:

F tensorflow/python/lib/core/bfloat16.cc:714] Check failed: PyBfloat16_Type.tp_base != nullptr
Abort trap: 6

Upgrade it with:

pip3 install -U numpy

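4. Summary

Good luck with your build! If you run into problems, feel free to contact me. The project's GitHub address is https://github.com/evan-wu/tensorflow-macosx-build. Follows and likes are welcome.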
