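plink2.0 has been around for a long time, but I have kept using plink1.9 because I know its syntax. The bigger reason is that the plink2.0 syntax changed so much that I was afraid of taking too large a step...

Today let's look at the commonly used plink2.0 options for reading and writing data.

I am not going to run my analyses in plink2.0, not in 2022!!! But when I run into bgen or pgen data, converting it to bed, bim and fam files and then carrying on in plink1.9 is something I very much want to do!!!

Goal of this post: use plink2.0 to read and write any of the following formats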

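  • plink1.9 ped and map data, for example: a.ped, a.map
  • plink1.9 bed, bim and fam data, for example: a.bim, a.bed, a.fam
  • bgen and sample data (Oxford format, readable by plink2.0), for example: a.bgen, a.sample
  • plink2.0 pgen, bim and fam data, for example: a.pgen, a.fam, a.bim
  • vcf data, for example: a.vcf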

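1. Improvements in plink2.0

Compared with plink1.9, plink2.0 improves mainly in the following areas:

  • 1. Reference allele information is preserved. With vcf input, for example, you no longer need to add --keep-allele-order. Converting vcf to plink format and back no longer requires specifying ref and alt by hand.

  • 2. The new .pgen format, with SNPack-style compression, can cut file size by about 80%. For the 1000 Genomes data, for example, the .pgen file is about 70% smaller than the gzip-compressed file, with no loss of information. Files are smaller and faster to process.

  • 3. The old binary fileset (bed, bim and fam) is still supported. On the output side, --make-bed writes .bed + .bim + .fam and --make-bpgen writes .pgen + .bim + .fam (readable again with --bpfile). plink1.9 formats remain usable, whether ped and map text files or the bed, bim and fam fileset.

  • 4. The analysis modules have been optimized. When standard logistic regression fails (an NA or meaningless result), --glm falls back to Firth regression by default, and --glm can be up to 1000 times faster than plink1.9's --linear, especially for imputed dosage genotypes (non-integer values such as 0.2 or 1.8). For PCA, the new 'approx' modifier (--pca approx) greatly improves efficiency once the sample size exceeds 10,000, at an accuracy cost of under 1%. Larger sample sizes are supported and processing is faster.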
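Point 1 can be tried out with a small round trip. The sketch below is only an illustration under stated assumptions: the function name vcf_roundtrip, the PLINK2 variable and the file names in the usage comment are made up for this example. It imports a vcf into a .pgen fileset with --make-pgen and exports it back with --export vcf, without --keep-allele-order.

```shell
# Path to the plink2 binary; override with PLINK2=/path/to/plink2 if needed.
# (PLINK2 is a convention of this sketch, not something plink2 itself reads.)
PLINK2="${PLINK2:-plink2}"

# vcf_roundtrip IN_VCF TMP_PREFIX OUT_PREFIX
# Step 1: IN_VCF -> TMP_PREFIX.pgen/.pvar/.psam
# Step 2: TMP_PREFIX fileset -> OUT_PREFIX.vcf
vcf_roundtrip() {
    if [ "$#" -ne 3 ]; then
        echo "usage: vcf_roundtrip <in.vcf> <tmp prefix> <out prefix>" >&2
        return 2
    fi
    "$PLINK2" --vcf "$1" --make-pgen --out "$2" &&
        "$PLINK2" --pfile "$2" --export vcf --out "$3"
}

# Example (needs plink2 on PATH and a.vcf in the current directory):
# vcf_roundtrip a.vcf a_tmp a_back
```

Comparing the REF and ALT columns of a.vcf and a_back.vcf afterwards is one way to confirm that the allele order survived.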

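2. Installing plink2.0

plink2.0 website: https://www.cog-genomics.org/plink/2.0/

The download is a prebuilt binary that runs directly. Supported platforms:

  • Intel
  • AMD
  • Apple M1

Suggestion: name the plink1.9 binary plink and the plink2.0 binary plink2.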

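3. plink help documentation

The individual options can be looked up on the official site: https://www.cog-genomics.org/plink/2.0/

The help text is also available on the command line.

For example, typing plink2 on its own prints the basic usage: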

$ plink2
PLINK v2.00a3.7LM AVX2 Intel (24 Oct 2022)     www.cog-genomics.org/plink/2.0/
(C) 2005-2022 Shaun Purcell, Christopher Chang   GNU General Public License v3

  plink2 <input flag(s)...> [command flag(s)...] [other flag(s)...]
  plink2 --help [flag name(s)...]

Commands include --rm-dup list, --make-bpgen, --export, --freq, --geno-counts,
--sample-counts, --missing, --hardy, --het, --fst, --indep-pairwise, --ld,
--sample-diff, --make-king, --king-cutoff, --pmerge, --pgen-diff,
--write-samples, --write-snplist, --make-grm-list, --pca, --glm, --adjust-file,
--score, --variant-score, --genotyping-rate, --pgen-info, --validate, and
--zst-decompress.

"plink2 --help | more" describes all functions.

To see how --export is used, ask for its help text. The main output formats are:

  • A: 0-1-2 coding
  • ped: map and ped format
  • vcf: vcf format
  • bgen-1.x: bgen format, covering 1.1, 1.2 and 1.3
$ plink2 --export --help
PLINK v2.00a3.7LM AVX2 Intel (24 Oct 2022)     www.cog-genomics.org/plink/2.0/
(C) 2005-2022 Shaun Purcell, Christopher Chang   GNU General Public License v3
--help present, ignoring other flags.

--export <output format(s)...> [{01 | 12}] ['bgz'] ['id-delim='<char>]
         ['id-paste='<column set descriptor>] ['include-alt']
         ['omit-nonmale-y'] ['spaces'] ['vcf-dosage='<field>] ['ref-first']
         ['bits='<#>] ['sample-v2']
  Create a new fileset with all filters applied.  The following output
  formats are supported:
  (actually, only A, AD, Av, bcf, bgen-1.x, haps, hapslegend, ind-major-bed,
  oxford, ped, tped, and vcf are implemented for now)
  * '23': 23andMe 4-column format.  This can only be used on a single
          sample's data (--keep may be handy), and does not support
          multicharacter allele codes.
  * 'A': Sample-major additive (0/1/2) coding, suitable for loading from R.
         If you need uncounted alleles to be named in the header line, add
         the 'include-alt' modifier.
  * 'AD': Sample-major additive (0/1/2) + dominant (het=1/hom=0) coding.
          Also supports 'include-alt'.
  * 'Av': Variant-major 0/1/2.
  * 'beagle': Unphased per-autosome .dat and .map files, readable by early
              BEAGLE versions.
  * 'beagle-nomap': Single .beagle.dat file.
  * 'bgen-1.x': Oxford-format .bgen + .sample.  For v1.2/v1.3, sample
                identifiers are stored in the .bgen (with id-delim and
                id-paste settings applied), and default precision is 16-bit
                (use the 'bits' modifier to reduce this).
  * 'bimbam': Regular BIMBAM format.
  * 'bimbam-1chr': BIMBAM format, with a two-column .pos.txt file.  Does not
                   support multiple chromosomes.
  * 'fastphase': Per-chromosome fastPHASE files, with
                 .chr-<chr #>.phase.inp filename extensions.
  * 'fastphase-1chr': Single .phase.inp file.  Does not support
                      multiple chromosomes.
  * 'haps', 'hapslegend': Oxford-format .haps + .sample[ + .legend].  All
                          data must be biallelic and phased.  When the 'bgz'
                          modifier is present, the .haps file is
                          block-gzipped.
  * 'HV': Per-chromosome Haploview files, with .chr-<chr #>{.ped,.info}
          filename extensions.
  * 'HV-1chr': Single Haploview .ped + .info file pair.  Does not support
               multiple chromosomes.
  * 'ind-major-bed': PLINK 1 sample-major .bed (+ .bim + .fam).
  * 'lgen': PLINK 1 long-format (.lgen + .fam + .map), loadable with --lfile.
  * 'lgen-ref': .lgen + .fam + .map + .ref, loadable with --lfile +
                --reference.
  * 'list': Single genotype-based list, up to 4 lines per variant.  To omit
            nonmale genotypes on the Y chromosome, add the 'omit-nonmale-y'
            modifier.
  * 'rlist': .rlist + .fam + .map fileset, where the .rlist file is a
             genotype-based list which omits the most common genotype for
             each variant.  Also supports 'omit-nonmale-y'.
  * 'oxford', 'oxford-v2': Oxford-format .gen + .sample.  When the 'bgz'
                           modifier is present, the .gen file is
                           block-gzipped.  'oxford' requests the original
                           .gen file format with 5 leading columns
                           (understood by older PLINK builds); 'oxford-v2'
                           requests the current 6-leading-column flavor.
  * 'ped', 'compound-genotypes': PLINK 1 sample-major (.ped + .map),
                                 loadable with --pedmap.
  * 'structure': Structure-format.
  * 'tped': PLINK 1 variant-major (.tped + .tfam), loadable with --tfile.
  * 'vcf',     : VCF (default version 4.3).  If PAR1 and PAR2 are present,
    'vcf-4.2',   they are automatically merged with chrX, with proper
    'bcf',       handling of chromosome codes and male ploidy.
    'bcf-4.2'    When the 'bgz' modifier is present, the VCF file is
                 block-gzipped.  (This always happens with BCF output.)
                 The 'id-paste' modifier controls which .psam columns are
                 used to construct sample IDs (choices are maybefid, fid,
                 iid, maybesid, and sid; default is maybefid,iid,maybesid),
                 while the 'id-delim' modifier sets the character between the
                 ID pieces (default '_').
                 Genotypes are always exported.  If you want to export a
                 sites-only VCF instead, see --make-pgen/--make-just-pvar's
                 'vcfheader' column set.
                 Dosages are not exported unless the 'vcf-dosage=' modifier
                 is present.  The following six dosage export modes are
                 supported:
                   'GP': genotype posterior probabilities (v4.3 only).
                   'DS': Minimac3-style dosages, omitted for hardcalls.
                   'DS-force': Minimac3-style dosages, never omit.
                   'DS-only': Same as DS-force, except GT field is omitted.
                   'HDS': Minimac3-style phased dosages, omitted for hardcalls
                          and unphased calls.  Also includes 'DS' output.
                   'HDS-force': Always report DS and HDS.
  In addition,
  * The '12' modifier causes alt1 alleles to be coded as '1' and ref alleles
    to be coded as '2', while '01' maps alt1 -> 0 and ref -> 1.
  * The 'spaces' modifier makes the output space-delimited instead of
    tab-delimited, whenever both are permitted.
  * For biallelic formats where it's unspecified whether the reference/major
    allele should appear first or second, --export defaults to second for
    compatibility with PLINK 1.9.  Use 'ref-first' to change this.
    (Note that this doesn't apply to the 'A', 'AD', and 'Av' formats; use
    --export-allele to control which alleles are counted there.)
  * 'sample-v2' exports .sample files according to the QCTOOLv2 rather than
    the original specification.  Only one ID column is exported ('id-paste'
    and 'id-delim' settings apply), parental IDs are exported if present, and
    category names are preserved rather than converted to positive integers.

--export-allele <file> : With --export A/AD/Av, count alleles named in the
                         file, instead of REF alleles.
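As a rough illustration of the formats highlighted above, the sketch below wraps --export for a plink1.9 binary fileset. The function name bfile_export, the PLINK2 variable and the file names in the usage comment are assumptions made for this example. Only the format names listed earlier (A, ped, vcf, bgen-1.x) are accepted, so a typo fails before plink2 is started.

```shell
# Path to the plink2 binary; override with PLINK2=/path/to/plink2 if needed.
PLINK2="${PLINK2:-plink2}"

# bfile_export PREFIX FORMAT OUT
# Reads PREFIX.bed/.bim/.fam and writes the requested export format to OUT.*
bfile_export() {
    if [ "$#" -ne 3 ]; then
        echo "usage: bfile_export <bfile prefix> <format> <output prefix>" >&2
        return 2
    fi
    case "$2" in
        A|ped|vcf|bgen-1.1|bgen-1.2|bgen-1.3)
            "$PLINK2" --bfile "$1" --export "$2" --out "$3"
            ;;
        *)
            echo "unsupported format in this sketch: $2" >&2
            return 2
            ;;
    esac
}

# Examples (need plink2 on PATH and a1.bed / a1.bim / a1.fam):
# bfile_export a1 A   a1_additive   # 0-1-2 coding
# bfile_export a1 vcf a1_vcf        # vcf
```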

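4. plink2.0 notes

4.1 Reading plink ped and map data

plink2.0 no longer has the --file option. It is replaced by --pedmap, or the two files can be passed separately with --ped and --map.

plink2 --ped yuanshi.ped --map yuanshi.map

Or equivalently:

plink2 --pedmap yuanshi

Default output files:

plink2.log  plink2.pgen  plink2.psam  plink2.pvar
  • plink2.log: the log file, can be ignored
  • plink2.pgen: binary genotype file, the counterpart of the plink1.9 bed file
  • plink2.psam: sample and sex information, the counterpart of the fam file
  • plink2.pvar: SNP information, including chromosome, physical position, ID, ref and alt, the counterpart of the bim file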
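If you would rather not rely on the default plink2.* output names, the import can be given an explicit command and output prefix. This is a minimal sketch: the function name pedmap_to_pgen, the PLINK2 variable and the output prefix in the usage comment are assumptions for illustration, while --pedmap, --make-pgen and --out are the plink2 flags.

```shell
# Path to the plink2 binary; override with PLINK2=/path/to/plink2 if needed.
PLINK2="${PLINK2:-plink2}"

# pedmap_to_pgen PREFIX OUT
# Reads PREFIX.ped + PREFIX.map and writes OUT.pgen, OUT.pvar, OUT.psam, OUT.log.
pedmap_to_pgen() {
    if [ "$#" -ne 2 ]; then
        echo "usage: pedmap_to_pgen <ped/map prefix> <output prefix>" >&2
        return 2
    fi
    "$PLINK2" --pedmap "$1" --make-pgen --out "$2"
}

# Example (needs plink2 on PATH and yuanshi.ped / yuanshi.map):
# pedmap_to_pgen yuanshi yuanshi_pgen
```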

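4.2 Reading plink bed, bim and fam data

The plink1.9 binary fileset consists of bed, bim and fam files. plink2.0 reads it with --bfile.

For example, to read a plink1.9 binary fileset and write bgen format:

plink2 --bfile a1 --export bgen-1.1 --out t1

Log: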

$ plink2 --bfile a1 --export bgen-1.1 --out t1
PLINK v2.00a3.7LM AVX2 Intel (24 Oct 2022)     www.cog-genomics.org/plink/2.0/
(C) 2005-2022 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to t1.log.
Options in effect:
  --bfile a1
  --export bgen-1.1
  --out t1

Start time: Thu Dec  1 18:59:10 2022
1031523 MiB RAM detected; reserving 515761 MiB for main workspace.
Using up to 96 threads (change this with --threads).
86 samples (0 females, 0 males, 86 ambiguous; 86 founders) loaded from a1.fam.
50904 variants loaded from a1.bim.
Note: No phenotype data present.
Writing t1.bgen ... done.
Writing t1.sample ... done.

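4.3 Reading bgen and sample data

Export to ped and map format:

plink2 --bgen t1.bgen 'ref-last' --sample t1.sample --export ped --out x1

Note: bgen and sample files come as a pair and must be given separately with --bgen and --sample. The allele order also has to be stated: 'ref-last' means the second allele is ref and the first is alt. That matches the t1.bgen written in 4.2, because --export puts ref second by default. For files that store ref first, use 'ref-first' instead.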
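The conversion this post is really after, bgen or pgen into bed, bim and fam for plink1.9, can be sketched along the same lines. The function names, the PLINK2 variable and the file names in the usage comments are made up for this example; the flags (--bgen, --sample, --pfile, --make-bed, --out) are plink2's own. Keep in mind that the bed format stores hard calls only, so dosage information in the input is not carried over.

```shell
# Path to the plink2 binary; override with PLINK2=/path/to/plink2 if needed.
PLINK2="${PLINK2:-plink2}"

# bgen_to_bed BGEN SAMPLE OUT [REF_MODE]
# REF_MODE defaults to ref-last, matching files written by --export bgen-1.x
# without the 'ref-first' modifier.
bgen_to_bed() {
    if [ "$#" -lt 3 ]; then
        echo "usage: bgen_to_bed <file.bgen> <file.sample> <out prefix> [ref-first|ref-last]" >&2
        return 2
    fi
    ref_mode="${4:-ref-last}"
    "$PLINK2" --bgen "$1" "$ref_mode" --sample "$2" --make-bed --out "$3"
}

# pfile_to_bed PREFIX OUT
# Reads PREFIX.pgen/.pvar/.psam and writes OUT.bed/.bim/.fam.
pfile_to_bed() {
    if [ "$#" -ne 2 ]; then
        echo "usage: pfile_to_bed <pfile prefix> <out prefix>" >&2
        return 2
    fi
    "$PLINK2" --pfile "$1" --make-bed --out "$2"
}

# Examples (need plink2 on PATH and the input files):
# bgen_to_bed t1.bgen t1.sample t1_bed
# pfile_to_bed plink2 plink2_bed
```

The resulting fileset can then be loaded in plink1.9 with --bfile.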

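4.4 Reading vcf data

Exporting ped and map data

To read a vcf and write ped and map files, use --export ped:

plink2 --vcf x2.vcf --export ped --out y1

Exporting bed, bim and fam data

To read a vcf and write bim, bed and fam files, use --make-bed:

plink2 --vcf x2.vcf --make-bed --out y2

Exporting bgen data

To read a vcf and write bgen and sample files, use --export bgen-1.1: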

plink2 --vcf x2.vcf --export bgen-1.1 --out y3
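The three vcf conversions above differ only in the output flag, so they fold naturally into one helper. This is a sketch under assumptions: the function name vcf_convert, the short target names (ped, bed, bgen) and the PLINK2 variable are invented for this example; the plink2 commands themselves are the ones shown above.

```shell
# Path to the plink2 binary; override with PLINK2=/path/to/plink2 if needed.
PLINK2="${PLINK2:-plink2}"

# vcf_convert VCF TARGET OUT
# TARGET is one of: ped (ped + map), bed (bed + bim + fam), bgen (bgen + sample).
vcf_convert() {
    if [ "$#" -ne 3 ]; then
        echo "usage: vcf_convert <file.vcf> <ped|bed|bgen> <out prefix>" >&2
        return 2
    fi
    case "$2" in
        ped)  "$PLINK2" --vcf "$1" --export ped --out "$3" ;;
        bed)  "$PLINK2" --vcf "$1" --make-bed --out "$3" ;;
        bgen) "$PLINK2" --vcf "$1" --export bgen-1.1 --out "$3" ;;
        *)
            echo "unknown target: $2 (expected ped, bed or bgen)" >&2
            return 2
            ;;
    esac
}

# Examples (need plink2 on PATH and x2.vcf):
# vcf_convert x2.vcf ped  y1
# vcf_convert x2.vcf bed  y2
# vcf_convert x2.vcf bgen y3
```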

That is my summary. For more content, you are welcome to follow my WeChat official account: 育种数据分析之放飞自我
