Lab notes | 6/3: Updating the file paths in somatic.pl

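In the previous post we finished installing all the required dependencies on the server. I am not root, so I cannot install these programs into the system-wide directories that are already on PATH. Instead, we look up the absolute path of each program and call it by that path.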

Already configured:
Rscript,
annovar (>=2019Oct24,refGene,ljb26_all,cosmic70,esp6500siv2_all,exac03,1000g2015aug downloaded in humandb and refGene downloaded in mousedb)
perl (need Parallel::ForkManager)
python (2.7)
java (1.8)

Paths that need to be spelled out:
bwa (>=0.7.15): /home/xxzhang/workplace/software/bwa/bwa
STAR (>=2.7.2): /home/xxzhang/workplace/software/STAR/bin/Linux_x86_64/STAR
sambamba: /home/xxzhang/workplace/software/sambamba/sambamba
speedseq: /home/xxzhang/workplace/software/speedseq/bin/speedseq
varscan: java -jar /home/xxzhang/workplace/software/VarScan.jar
samtools (>=1.6): /home/xxzhang/miniconda3/bin/samtools
shimmer: /home/xxzhang/workplace/software/Shimmer/shimmer.pl
strelka (>=2.8.3, note: strelka is tuned to run exome-seq or RNA-seq): /home/xxzhang/workplace/software/strelka/bin/
manta (>=1.4.0): /home/xxzhang/workplace/software/manta/bin
lofreq_star (>=2.1.3): /home/xxzhang/miniconda3/bin/lofreq
bowtie2 (>=2.3.4.3, for PDX mode): /home/xxzhang/workplace/software/bowtie2/bowtie2
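
Before editing somatic.pl it can help to confirm that each absolute path really points at something executable. The following is a minimal sketch, not part of the pipeline: `tool_status` is a hypothetical helper name, and the three paths are copied from the list above.

```shell
#!/bin/sh
# tool_status is a hypothetical helper: prints "ok" if the path exists
# and is executable by the current user, otherwise "missing".
tool_status() {
    if [ -x "$1" ]; then
        echo "ok"
    else
        echo "missing"
    fi
}

# Paths copied from the list above; adjust to your own installation.
for tool in \
    /home/xxzhang/workplace/software/bwa/bwa \
    /home/xxzhang/workplace/software/sambamba/sambamba \
    /home/xxzhang/miniconda3/bin/samtools
do
    printf '%s\t%s\n' "$(tool_status "$tool")" "$tool"
done
```

VarScan is a jar rather than an executable, so for it a plain existence test (`[ -f ... ]`) is the better check.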

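That covers the setup.

Next, we switch to the Windows side and update somatic.pl with these paths.
Update done.

Run it again.
The good news is that we finally moved one step forward: the alignment step produced the alignment.bam file. Time to celebrate. I feel that the earlier practice and struggle have left me better equipped for the problems ahead.
However, the following steps (starting with Picard, which adds the read-group information) produced new errors.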

java -Djava.io.tmpdir=./output/tumor/tmp -jar /home/xxzhang/workplace/QBRC//somatic_script/picard.jar AddOrReplaceReadGroups INPUT=./output/tumor/alignment.sam OUTPUT=./output/tumor/rgAdded.bam SORT_ORDER=coordinate RGID=tumor RGLB=tumor RGPL=illumina RGPU=tumor RGSM=tumor CREATE_INDEX=true VALIDATION_STRINGENCY=LENIENT

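I noticed an odd-looking path here: /home/xxzhang/workplace/QBRC//somatic_script/picard.jar
Why are there two slashes after QBRC? Will that stop java from finding the jar file?
No. On Linux, consecutive slashes in a path are treated as a single slash.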
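
A quick way to convince yourself: the sketch below builds a throwaway directory (every name in it is made up for the demo) and checks whether a path with a doubled slash still resolves to the same file.

```shell
#!/bin/sh
# Build a scratch directory that mimics the layout QBRC/somatic_script/picard.jar.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/QBRC/somatic_script"
echo "placeholder" > "$demo_dir/QBRC/somatic_script/picard.jar"

# Look the file up through a path containing "//".
if [ -f "$demo_dir/QBRC//somatic_script/picard.jar" ]; then
    double_slash_result="found"
else
    double_slash_result="missing"
fi
echo "$double_slash_result"

# Clean up the scratch directory.
rm -rf "$demo_dir"
```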

java -Djava.io.tmpdir=./output/tumor/tmp -jar /home/xxzhang/workplace/QBRC//somatic_script/GenomeAnalysisTK.jar -T RealignerTargetCreator -R ./geneome/hg19/hg19.fa --num_threads 32 -known ./geneome/hg19/hg19.fa_resource/Mills_and_1000G_gold_standard.indels.hg19.vcf -known ./geneome/hg19/hg19.fa_resource/1000G_phase1.snps.high_confidence.hg19.vcf  -o ./output/tumor/tumor_intervals.list -I ./output/tumor/dupmark.bam > ./output/tumor/index.out

I think the main problem is at this step, because its output file is empty. Why?
The errors start here:

ERROR MESSAGE: Invalid command line: Malformed walker argument: Could not find walker with name: RealignerTargetCreator

java -Djava.io.tmpdir=./output/tumor/tmp -jar /home/xxzhang/workplace/QBRC//somatic_script/GenomeAnalysisTK.jar -T IndelRealigner  --filter_bases_not_stored --disable_auto_index_creation_and_locking_when_reading_rods -R ./geneome/hg19/hg19.fa -known ./geneome/hg19/hg19.fa_resource/Mills_and_1000G_gold_standard.indels.hg19.vcf -known ./geneome/hg19/hg19.fa_resource/1000G_phase1.snps.high_confidence.hg19.vcf  -targetIntervals ./output/tumor/tumor_intervals.list -I ./output/tumor/dupmark.bam -o ./output/tumor/realigned.bam >./output/tumor/tumor_realign.out

ERROR MESSAGE: Invalid command line: Malformed walker argument: Could not find walker with name: IndelRealigner

java -Djava.io.tmpdir=./output/tumor/tmp -jar /home/xxzhang/workplace/QBRC//somatic_script/GenomeAnalysisTK.jar -T BaseRecalibrator -R ./geneome/hg19/hg19.fa  -knownSites ./geneome/hg19/hg19.fa_resource/dbsnp.hg19.vcf -knownSites ./geneome/hg19/hg19.fa_resource/Mills_and_1000G_gold_standard.indels.hg19.vcf  -I ./output/tumor/realigned.bam -o ./output/tumor/tumor_bqsr > ./output/tumor/table.out

ERROR MESSAGE: Invalid command line: Malformed walker argument: Could not find walker with name: BaseRecalibrator

java -Djava.io.tmpdir=./output/tumor/tmp -jar /home/xxzhang/workplace/QBRC//somatic_script/GenomeAnalysisTK.jar -T PrintReads -rf NotPrimaryAlignment -R ./geneome/hg19/hg19.fa -I ./output/tumor/realigned.bam -BQSR ./output/tumor/tumor_bqsr -o ./output/tumor/tumor.bam > ./output/tumor/tumor_recal.out

ERROR MESSAGE: Invalid command line: Malformed walker argument: Could not find walker with name: PrintReads

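In short, GATK does not recognize part of each command. When I first went through this part of the code I only skimmed it and did not dig in.
All four commands involve GenomeAnalysisTK.jar.

We now have two ways to approach this:
(1) Go back to the original script and see how the author commented this part. What is this code supposed to do?
(2) Search the web for how GenomeAnalysisTK.jar is used.

Let's take them one at a time.
(1) What does this part of the source do?
Searching the source file shows that this part performs: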

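indel realignment: realigning reads around insertions/deletions
base recalibration: recalibrating base quality scores

(This step is there to make our mutation calling more accurate. A pipeline I read earlier mentioned it.)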

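(2) Back to GenomeAnalysisTK.jar itself: what does this tool do, and why are the names not recognized? And this jar is the one shipped with the author's source code, right? (It should not be something I added myself, since that might not match.)
java -jar GenomeAnalysisTK.jar --help
This brings up the help text, which may give me some clues.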

The Genome Analysis Toolkit (GATK) v3.5-0-g36282e4, Compiled 2015/11/25 04:03:56
Copyright © 2010 The Broad Institute
For support and documentation go to http://www.broadinstitute.org/gatk
usage: java -jar GenomeAnalysisTK.jar -T <analysis_type> [-I <input_file>] [–showFullBamList] [-rbs <read_buffer_size>] [-et
<phone_home>] [-K <gatk_key>] [-tag ] [-rf <read_filter>] [-drf <disable_read_filter>] [-L ] [-XL
] [-isr <interval_set_rule>] [-im <interval_merging>] [-ip <interval_padding>] [-R
<reference_sequence>] [-ndrs] [-maxRuntime ] [-maxRuntimeUnits ] [-dt <downsampling_type>]
[-dfrac <downsample_to_fraction>] [-dcov <downsample_to_coverage>] [-baq ] [-baqGOP ] [-fixNDN]
[-fixMisencodedQuals] [-allowPotentiallyMisencodedQuals] [-OQ] [-DBQ ] [-PF ]
[-BQSR ] [-qq <quantize_quals>] [-SQQ <static_quantized_quals>] [-DIQ] [-EOQ] [-preserveQ
<preserve_qscores_less_than>] [-globalQScorePrior ] [-S <validation_strictness>] [-rpr] [-kpr]
[-sample_rename_mapping_file <sample_rename_mapping_file>] [-U ]
[-disable_auto_index_creation_and_locking_when_reading_rods] [-sites_only] [-writeFullFormat] [-compress
<bam_compression>] [-simplifyBAM] [–disable_bam_indexing] [–generate_md5] [-nt <num_threads>] [-nct
<num_cpu_threads_per_data_thread>] [-mte] [-rgbl <read_group_black_list>] [-ped ] [-pedString
] [-pedValidationType ] [-variant_index_type <variant_index_type>]
[-variant_index_parameter <variant_index_parameter>] [-ref_win_stop <reference_window_stop>] [-l <logging_level>] [-log <log_to_file>] [-h] [-version]

-T,–analysis_type <analysis_type> Name of the tool to run
-I,–input_file <input_file> Input file containing sequence
data (BAM or CRAM)
–showFullBamList Emit a log entry (level INFO)
containing the full list of
sequence data files to be
included in the analysis
(including files inside
.bam.list or .cram.list
files).
-rbs,–read_buffer_size <read_buffer_size> Number of reads per SAM file
to buffer in memory
-et,–phone_home <phone_home> Run reporting mode (NO_ET|AWS|
STDOUT)
-K,–gatk_key <gatk_key> GATK key file required to run
with -et NO_ET
-tag,–tag Tag to identify this GATK run
as part of a group of runs
-rf,–read_filter <read_filter> Filters to apply to reads
before analysis
-drf,–disable_read_filter <disable_read_filter> Read filters to disable
-L,–intervals One or more genomic intervals
over which to operate
-XL,–excludeIntervals One or more genomic intervals
to exclude from processing
-isr,–interval_set_rule <interval_set_rule> Set merging approach to use
for combining interval inputs
(UNION|INTERSECTION)
-im,–interval_merging <interval_merging> Interval merging rule for
abutting intervals (ALL|
OVERLAPPING_ONLY)
-ip,–interval_padding <interval_padding> Amount of padding (in bp) to
add to each interval
-R,–reference_sequence <reference_sequence> Reference sequence file
-ndrs,–nonDeterministicRandomSeed Use a non-deterministic random
seed
-maxRuntime,–maxRuntime Stop execution cleanly as soon
as maxRuntime has been reached
-maxRuntimeUnits,–maxRuntimeUnits Unit of time used by
maxRuntime (NANOSECONDS|
MICROSECONDS|MILLISECONDS|
SECONDS|MINUTES|HOURS|DAYS)
-dt,–downsampling_type <downsampling_type> Type of read downsampling to
employ at a given locus (NONE|
ALL_READS|BY_SAMPLE)
-dfrac,–downsample_to_fraction <downsample_to_fraction> Fraction of reads to
downsample to
-dcov,–downsample_to_coverage <downsample_to_coverage> Target coverage threshold for
downsampling to coverage
-baq,–baq Type of BAQ calculation to
apply in the engine (OFF|
CALCULATE_AS_NECESSARY|
RECALCULATE)
-baqGOP,–baqGapOpenPenalty BAQ gap open penalty
-fixNDN,–refactor_NDN_cigar_string Reduce NDN elements in CIGAR
string
-fixMisencodedQuals,–fix_misencoded_quality_scores Fix mis-encoded base quality
scores
-allowPotentiallyMisencodedQuals,–allow_potentially_misencoded_quality_scores Ignore warnings about base
quality score encoding
-OQ,–useOriginalQualities Use the base quality scores
from the OQ tag
-DBQ,–defaultBaseQualities Assign a default base quality
-PF,–performanceLog Write GATK runtime performance
log to this file
-BQSR,–BQSR Input covariates table file
for on-the-fly base quality
score recalibration
-qq,–quantize_quals <quantize_quals> Quantize quality scores to a
given number of levels (with
-BQSR)
-SQQ,–static_quantized_quals <static_quantized_quals> Use static quantized quality
scores to a given number of
levels (with -BQSR)
-DIQ,–disable_indel_quals Disable printing of base
insertion and deletion tags
(with -BQSR)
-EOQ,–emit_original_quals Emit the OQ tag with the
original base qualities (with
-BQSR)
-preserveQ,–preserve_qscores_less_than <preserve_qscores_less_than> Don’t recalibrate bases with
quality scores less than this
threshold (with -BQSR)
-globalQScorePrior,–globalQScorePrior Global Qscore Bayesian prior
to use for BQSR
-S,–validation_strictness <validation_strictness> How strict should we be with
validation (STRICT|LENIENT|
SILENT)
-rpr,–remove_program_records Remove program records from
the SAM header
-kpr,–keep_program_records Keep program records in the
SAM header
-sample_rename_mapping_file,–sample_rename_mapping_file <sample_rename_mapping_file> Rename sample IDs on-the-fly
at runtime using the provided
mapping file
-U,–unsafe Enable unsafe operations:
nothing will be checked at
runtime (ALLOW_N_CIGAR_READS|
ALLOW_UNINDEXED_BAM|
ALLOW_UNSET_BAM_SORT_ORDER|
NO_READ_ORDER_VERIFICATION|
ALLOW_SEQ_DICT_INCOMPATIBILITY|
LENIENT_VCF_PROCESSING|ALL)
d_locking_when_reading_rods,–disable_auto_index_creation_and_locking_when_reading_rods Disable both auto-generation
of index files and index file
locking
-sites_only,–sites_only Just output sites without
genotypes (i.e. only the first
8 columns of the VCF)
-writeFullFormat,–never_trim_vcf_format_field Always output all the records
in VCF FORMAT fields, even if
some are missing
-compress,–bam_compression <bam_compression> Compression level to use for
writing BAM files (0 - 9,
higher is more compressed)
-simplifyBAM,–simplifyBAM If provided, output BAM/CRAM
files will be simplified to
include just key reads for
downstream variation discovery
analyses (removing duplicates,
PF-, non-primary reads), as
well stripping all extended
tags from the kept reads
except the read group
identifier
–disable_bam_indexing Turn off on-the-fly creation
of indices for output BAM/CRAM
files.
–generate_md5 Enable on-the-fly creation of
md5s for output BAM files.
-nt,–num_threads <num_threads> Number of data threads to
allocate to this analysis
-nct,–num_cpu_threads_per_data_thread <num_cpu_threads_per_data_thread> Number of CPU threads to
allocate per data thread
-mte,–monitorThreadEfficiency Enable threading efficiency
monitoring
-rgbl,–read_group_black_list <read_group_black_list> Exclude read groups based on
tags
-ped,–pedigree Pedigree files for samples
-pedString,–pedigreeString Pedigree string for samples
-pedValidationType,–pedigreeValidationType Validation strictness for
pedigree information (STRICT|
SILENT)
-variant_index_type,–variant_index_type <variant_index_type> Type of IndexCreator to use
for VCF/BCF indices
(DYNAMIC_SEEK|DYNAMIC_SIZE|
LINEAR|INTERVAL)
-variant_index_parameter,–variant_index_parameter <variant_index_parameter> Parameter to pass to the
VCF/BCF IndexCreator
-ref_win_stop,–reference_window_stop <reference_window_stop> Reference window stop
-l,–logging_level <logging_level> Set the minimum level of
logging
-log,–log_to_file <log_to_file> Set the logging location
-h,–help Generate the help message
-version,–version Output version information

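All of the lines above can be ignored except -T,--analysis_type <analysis_type> Name of the tool to run. Looking back, every failure above is a name-matching error, that is, the value after -T.

So my next question is: what values can follow -T? Checking the official site (http://www.broadinstitute.org/gatk) might give me some answers.

In the end, searching the web was quicker.
It is a Java version problem. Reference: https://www.cnblogs.com/raybiolee/p/6664259.html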

Let me check the Java version requirements the author lists in the project description:

java (1.8)
java17: path (including the executable file name) to java 1.7 (needed only for mutect). must be strictly followed

There are two requirements: java (1.8), plus the path to java 1.7, which is used only for mutect.

My current Java version is:

java -version
openjdk version "16" 2021-03-16
OpenJDK Runtime Environment (build 16+36-2231)
OpenJDK 64-Bit Server VM (build 16+36-2231, mixed mode, sharing)

My GATK version is:

3.5-0-g36282e4
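
Since the mismatch comes down to the Java major version, a small check can make it explicit. This is a sketch under assumptions: `version_from_banner` and `java_major` are hypothetical helper names, and the parsing assumes the usual first line of `java -version` with the version in straight double quotes (the command writes that banner to stderr).

```shell
#!/bin/sh
# Extract the quoted version string from the first line of `java -version`.
version_from_banner() {
    echo "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p'
}

# Reduce a version string to its major number:
# legacy scheme 1.8.0_291 -> 8, newer scheme 16 -> 16.
java_major() {
    case "$1" in
        1.*) echo "$1" | cut -d. -f2 ;;
        *)   echo "$1" | cut -d. -f1 ;;
    esac
}

# Banner copied from the java -version output in these notes.
banner='openjdk version "16" 2021-03-16'
major=$(java_major "$(version_from_banner "$banner")")
if [ "$major" -ne 8 ]; then
    echo "Java $major found, but this pipeline asks for Java 8"
fi
```

On a real system the banner could be captured with `banner=$(/path/to/java -version 2>&1 | head -n 1)`.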

So my Java (16) is not what the author requires (1.8). What should I do?

alias gatk.jar="/path/to/jre/bin/java -jar /path/to/GenomeAnalysisTK.jar"

I do not fully understand alias yet, but this line pins the java binary that is used to run GenomeAnalysisTK.jar.
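
An alias is a shell shortcut: typing the alias name expands to the quoted command. Bash does not expand aliases in non-interactive scripts by default, so inside a script a small function is a common substitute. The sketch below only builds and prints the command line rather than running it. `gatk_cmd`, `JAVA8`, and `GATK_JAR` are hypothetical names, and the two paths are the ones that appear elsewhere in these notes.

```shell
#!/bin/sh
# Assumed locations, copied from the commands in these notes.
JAVA8=/home/xxzhang/workplace/software/java/jdk1.8.0_291/bin/java
GATK_JAR=/home/xxzhang/workplace/QBRC/somatic_script/GenomeAnalysisTK.jar

# Print the full command that would run GATK with the pinned Java binary.
# To execute instead of print, replace the body with:
#   "$JAVA8" -jar "$GATK_JAR" "$@"
gatk_cmd() {
    printf '%s -jar %s' "$JAVA8" "$GATK_JAR"
    for arg in "$@"; do
        printf ' %s' "$arg"
    done
    printf '\n'
}

gatk_cmd --help
```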

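OK, now let's install Java.

Most Java download sites seem to require registration. I remember coming across a site that mirrored these downloads, so I went back to my earlier installation notes.
Oracle's Java downloads require a login, but I found a site online that someone built with a crawler, which collects account names and passwords that can be used to log in.
http://bugmenot.com/view/oracle.com (bookmarked)
Using one of the accounts with a higher success rate, I logged in at https://www.oracle.com/cn/java/technologies/javase/javase-jdk8-downloads.html, and it worked!

After logging in, download Java 1.8.

Extracting it with tar fails with an error.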
tar -zxvf jdk-8u291-linux-x64.tar.gz

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now

https://www.cnblogs.com/shamo89/p/9265220.html

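Solution 1: drop the z flag.

No effect.

The actual fix: the file was downloaded with wget, and the license agreement was never accepted during the download, so the file that arrived is not the real archive.
The file should be downloaded on a local machine first and then uploaded to the server.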
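
One way to catch this kind of failed download before calling tar is to look at the first two bytes of the file: a gzip stream always starts with the magic bytes 1f 8b. A minimal sketch, with `is_gzip` as a hypothetical helper name and throwaway demo files:

```shell
#!/bin/sh
# Succeeds if the file starts with the gzip magic bytes 1f 8b.
is_gzip() {
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Demo on two scratch files: a real gzip stream and an HTML page
# that merely carries a .gz name.
scratch=$(mktemp -d)
printf 'hello\n' | gzip > "$scratch/real.gz"
printf '<html>login required</html>\n' > "$scratch/fake.gz"

for f in "$scratch/real.gz" "$scratch/fake.gz"; do
    if is_gzip "$f"; then
        echo "$(basename "$f"): gzip data"
    else
        echo "$(basename "$f"): not gzip"
    fi
done

rm -rf "$scratch"
```

Where the `file` utility is installed, running it on the downloaded archive gives a similar answer.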

Download finished.
Download links:
(1) java (1.8): https://www.oracle.com/cn/java/technologies/javase/javase-jdk8-downloads.html
(2) java (1.7): https://www.oracle.com/java/technologies/javase/javase7-archive-downloads.html#license-lightbox

It looks like simply extracting the archive is enough to get a working Java environment.
Location of java 1.7:

/home/xxzhang/workplace/software/java/jdk1.7.0_80/bin/java

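Location of java 1.8:

/home/xxzhang/workplace/software/java/jdk1.8.0_291/bin/java

So the java 1.7 path passed to the pipeline can now be updated, and so can the java used to run GenomeAnalysisTK.jar.
Every java call other than the 1.7 one is changed to the 1.8 path.

Back on the Windows side, we edit the java paths and update somatic.pl.

Update done.

The updated command: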

perl somatic.pl NA NA ./example/example_dataset/sequencing/SRR7246238_1.fastq.gz ./example/example_dataset/sequencing/SRR7246238_2.fastq.gz 32 hg19 ./geneome/hg19/hg19.fa /home/xxzhang/workplace/software/java/jdk1.7.0_80/bin/java ./output human 1 ./disambiguate_pipeline

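Tracing the whole run, alignment and duplicate marking both complete correctly. In the next step the old error is gone, but a new one appears. So the program is no longer stuck at the old step: it now understands the commands and carries on, which is good news.
The error message is informative, so let's go through it carefully.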

/home/xxzhang/workplace/software/java/jdk1.8.0_291/bin/java -Djava.io.tmpdir=./output/tumor/tmp -jar /home/xxzhang/workplace/QBRC//somatic_script/GenomeAnalysisTK.jar -T RealignerTargetCreator -R ./geneome/hg19/hg19.fa --num_threads 32 -known ./geneome/hg19/hg19.fa_resource/Mills_and_1000G_gold_standard.indels.hg19.vcf -known ./geneome/hg19/hg19.fa_resource/1000G_phase1.snps.high_confidence.hg19.vcf  -o ./output/tumor/tumor_intervals.list -I ./output/tumor/dupmark.bam > ./output/tumor/index.out

Error:

MESSAGE: Input files known2 and reference have incompatible contigs. Please see http://gatkforums.broadinstitute.org/discussion/63/input-files-have-incompatible-contigsfor more information.
Error details: The contig order in known2 and referenceis not the same; to fix this please see: (https://www.broadinstitute.org/gatk/guide/article?id=1328), which describes reordering contigs in BAM and VCF files…

ERROR known2 contigs = [chrM, chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chr1_gl000191_random, chr1_gl000192_random, chr4_ctg9_hap1, chr4_gl000193_random, chr4_gl000194_random, chr6_apd_hap1, chr6_cox_hap2, chr6_dbb_hap3, chr6_mann_hap4, chr6_mcf_hap5, chr6_qbl_hap6, chr6_ssto_hap7, chr7_gl000195_random, chr8_gl000196_random, chr8_gl000197_random, chr9_gl000198_random, chr9_gl000199_random, chr9_gl000200_random, chr9_gl000201_random, chr11_gl000202_random, chr17_ctg5_hap1, chr17_gl000203_random, chr17_gl000204_random, chr17_gl000205_random, chr17_gl000206_random, chr18_gl000207_random, chr19_gl000208_random, chr19_gl000209_random, chr21_gl000210_random, chrUn_gl000211, chrUn_gl000212, chrUn_gl000213, chrUn_gl000214, chrUn_gl000215, chrUn_gl000216, chrUn_gl000217, chrUn_gl000218, chrUn_gl000219, chrUn_gl000220, chrUn_gl000221, chrUn_gl000222, chrUn_gl000223, chrUn_gl000224, chrUn_gl000225, chrUn_gl000226, chrUn_gl000227, chrUn_gl000228, chrUn_gl000229, chrUn_gl000230, chrUn_gl000231, chrUn_gl000232, chrUn_gl000233, chrUn_gl000234, chrUn_gl000235, chrUn_gl000236, chrUn_gl000237, chrUn_gl000238, chrUn_gl000239, chrUn_gl000240, chrUn_gl000241, chrUn_gl000242, chrUn_gl000243, chrUn_gl000244, chrUn_gl000245, chrUn_gl000246, chrUn_gl000247, chrUn_gl000248, chrUn_gl0002

ERROR reference contigs = [chr1, chr1_gl000191_random, chr1_gl000192_random, chr2, chr3, chr4, chr4_ctg9_hap1, chr4_gl000193_random, chr4_gl000194_random, chr5, chr6, chr6_apd_hap1, chr6_cox_hap2, chr6_dbb_hap3, chr6_mann_hap4, chr6_mcf_hap5, chr6_qbl_hap6, chr6_ssto_hap7, chr7, chr7_gl000195_random, chr8, chr8_gl000196_random, chr8_gl000197_random, chr9, chr9_gl000198_random, chr9_gl000199_random, chr9_gl000200_random, chr9_gl000201_random, chr10, chr11, chr11_gl000202_random, chr12, chr13, chr14, chr15, chr16, chr17, chr17_ctg5_hap1, chr17_gl000203_random, chr17_gl000204_random, chr17_gl000205_random, chr17_gl000206_random, chr18, chr18_gl000207_random, chr19, chr19_gl000208_random, chr19_gl000209_random, chr20, chr21, chr21_gl000210_random, chr22, chrM, chrUn_gl000211, chrUn_gl000212, chrUn_gl000213, chrUn_gl000214, chrUn_gl000215, chrUn_gl000216, chrUn_gl000217, chrUn_gl000218, chrUn_gl000219, chrUn_gl000220, chrUn_gl000221, chrUn_gl000222, chrUn_gl000223, chrUn_gl000224, chrUn_gl000225, chrUn_gl000226, chrUn_gl000227, chrUn_gl000228, chrUn_gl000229, chrUn_gl000230, chrUn_gl000231, chrUn_gl000232, chrUn_gl000233, chrUn_gl000234, chrUn_gl000235, chrUn_gl000236, chrUn_gl000237, chrUn_gl000238, chrUn_gl000239, chrUn_gl000240, chrUn_gl000241, chrUn_gl000242, chrUn_gl000243, chrUn_gl000244, chrUn_gl000245, chrUn_gl000246, chrUn_gl000247, chrUn_gl000248, chrUn_gl000249, chrX, chrY]

/home/xxzhang/workplace/software/java/jdk1.8.0_291/bin/java -Djava.io.tmpdir=./output/tumor/tmp -jar /home/xxzhang/workplace/QBRC//somatic_script/GenomeAnalysisTK.jar -T IndelRealigner  --filter_bases_not_stored --disable_auto_index_creation_and_locking_when_reading_rods -R ./geneome/hg19/hg19.fa -known ./geneome/hg19/hg19.fa_resource/Mills_and_1000G_gold_standard.indels.hg19.vcf -known ./geneome/hg19/hg19.fa_resource/1000G_phase1.snps.high_confidence.hg19.vcf  -targetIntervals ./output/tumor/tumor_intervals.list -I ./output/tumor/dupmark.bam -o ./output/tumor/realigned.bam >./output/tumor/tumor_realign.out

ERROR MESSAGE: Input files knownAlleles2 and reference have incompatible contigs. Please see http://gatkforums.broadinstitute.org/discussion/63/input-files-have-incompatible-contigsfor more information. Error details: The contig order in knownAlleles2 and referenceis not the same; to fix this please see: (https://www.broadinstitute.org/gatk/guide/article?id=1328), which describes reordering contigs in BAM and VCF files…

ERROR knownAlleles2 contigs = [chrM, chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chr1_gl000191_random, chr1_gl000192_random, chr4_ctg9_hap1, chr4_gl000193_random, chr4_gl000194_random, chr6_apd_hap1, chr6_cox_hap2, chr6_dbb_hap3, chr6_mann_hap4, chr6_mcf_hap5, chr6_qbl_hap6, chr6_ssto_hap7, chr7_gl000195_random, chr8_gl000196_random, chr8_gl000197_random, chr9_gl000198_random, chr9_gl000199_random, chr9_gl000200_random, chr9_gl000201_random, chr11_gl000202_random, chr17_ctg5_hap1, chr17_gl000203_random, chr17_gl000204_random, chr17_gl000205_random, chr17_gl000206_random, chr18_gl000207_random, chr19_gl000208_random, chr19_gl000209_random, chr21_gl000210_random, chrUn_gl000211, chrUn_gl000212, chrUn_gl000213, chrUn_gl000214, chrUn_gl000215, chrUn_gl000216, chrUn_gl000217, chrUn_gl000218, chrUn_gl000219, chrUn_gl000220, chrUn_gl000221, chrUn_gl000222, chrUn_gl000223, chrUn_gl000224, chrUn_gl000225, chrUn_gl000226, chrUn_gl000227, chrUn_gl000228, chrUn_gl000229, chrUn_gl000230, chrUn_gl000231, chrUn_gl000232, chrUn_gl000233, chrUn_gl000234, chrUn_gl000235, chrUn_gl000236, chrUn_gl000237, chrUn_gl000238, chrUn_gl000239, chrUn_gl000240, chrUn_gl000241, chrUn_gl000242, chrUn_gl000243, chrUn_gl000244, chrUn_gl000245, chrUn_gl000246, chrUn_gl000247, chrUn_gl000248, chrUn_gl000249]

ERROR reference contigs = [chr1, chr1_gl000191_random, chr1_gl000192_random, chr2, chr3, chr4, chr4_ctg9_hap1, chr4_gl000193_random, chr4_gl000194_random, chr5, chr6, chr6_apd_hap1, chr6_cox_hap2, chr6_dbb_hap3, chr6_mann_hap4, chr6_mcf_hap5, chr6_qbl_hap6, chr6_ssto_hap7, chr7, chr7_gl000195_random, chr8, chr8_gl000196_random, chr8_gl000197_random, chr9, chr9_gl000198_random, chr9_gl000199_random, chr9_gl000200_random, chr9_gl000201_random, chr10, chr11, chr11_gl000202_random, chr12, chr13, chr14, chr15, chr16, chr17, chr17_ctg5_hap1, chr17_gl000203_random, chr17_gl000204_random, chr17_gl000205_random, chr17_gl000206_random, chr18, chr18_gl000207_random, chr19, chr19_gl000208_random, chr19_gl000209_random, chr20, chr21, chr21_gl000210_random, chr22, chrM, chrUn_gl000211, chrUn_gl000212, chrUn_gl000213, chrUn_gl000214, chrUn_gl000215, chrUn_gl000216, chrUn_gl000217, chrUn_gl000218, chrUn_gl000219, chrUn_gl000220, chrUn_gl000221, chrUn_gl000222, chrUn_gl000223, chrUn_gl000224, chrUn_gl000225, chrUn_gl000226, chrUn_gl000227, chrUn_gl000228, chrUn_gl000229, chrUn_gl000230, chrUn_gl000231, chrUn_gl000232, chrUn_gl000233, chrUn_gl000234, chrUn_gl000235, chrUn_gl000236, chrUn_gl000237, chrUn_gl000238, chrUn_gl000239, chrUn_gl000240, chrUn_gl000241, chrUn_gl000242, chrUn_gl000243, chrUn_gl000244, chrUn_gl000245, chrUn_gl000246, chrUn_gl000247, chrUn_gl000248, chrUn_gl000249, chrX, chrY]

Interesting: this time the problem is the contig order.
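
The mismatch can be checked without GATK by comparing the contig names in the VCF header with those in the FASTA index. The sketch below assumes the VCF carries `##contig=<ID=...>` header lines and that a `.fai` index (for example one written by `samtools faidx`) exists for the reference. The helper names and the `hg19.fa.fai` path are hypothetical.

```shell
#!/bin/sh
# Contig names in header order, from the ##contig=<ID=...> lines of a VCF.
vcf_contigs() {
    sed -n 's/^##contig=<ID=\([^,>]*\).*/\1/p' "$1"
}

# Contig names in file order: first column of a FASTA .fai index.
fai_contigs() {
    cut -f1 "$1"
}

# Succeeds only if both files list the same contigs in the same order.
same_contig_order() {
    [ "$(vcf_contigs "$1")" = "$(fai_contigs "$2")" ]
}

# Example call (VCF path as used in the commands above, .fai path assumed):
# same_contig_order ./geneome/hg19/hg19.fa_resource/1000G_phase1.snps.high_confidence.hg19.vcf \
#                   ./geneome/hg19/hg19.fa.fai && echo "same order" || echo "different order"
```

If the orders differ, the known-sites VCF has to be reordered to follow the reference, which is what the GATK article linked in the error message describes.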

This link also summarizes the issue well: https://www.cnblogs.com/timeisbiggestboss/articles/8708203.html (bookmarked)

I will look into it tomorrow.
