The figures in this post come from the lecture video "Next-Generation Sequencing Data Analysis", Lecture 3: DNA-seq

Review

Alignment strategies
Smith-Waterman (too slow for practical use)
Fast alignment
Hash table
Seed and extension
Mask(for mismatches)
Suffix tree/prefix tree
Suffix array
Burrows-Wheeler Transformation
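
To make the last item concrete, here is a minimal sketch of the Burrows-Wheeler transform. It uses the naive "sort all rotations" construction, which is only suitable for short strings; real aligners build the transform from a suffix array instead. The function names and the `$` sentinel are choices made for this illustration, not something taken from the lecture.

```python
def bwt(text, sentinel="$"):
    """Burrows-Wheeler transform via sorted rotations (naive, for illustration)."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    # Every rotation of the text, sorted lexicographically
    # ('$' sorts before A, C, G, T in ASCII).
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed, sentinel="$"):
    """Invert the transform by repeatedly prepending the last column and sorting."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(c + row for c, row in zip(transformed, table))
    # The original text is the row that ends with the sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


if __name__ == "__main__":
    print(bwt("ACAACG"))               # GC$AAAC
    print(inverse_bwt(bwt("ACAACG")))  # ACAACG
```

The transform groups identical characters that share a right-hand context, which is what makes the compressed index used by BWT-based aligners possible.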

File format

VCF (Variant Call Format)

Usually stored in a compressed manner and can be indexed
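QUAL: Phred-scaled quality of the variant call, i.e. -10 log10 of the probability that the call is wrong
Higher QUAL value: lower chance that the call is a mistake
FILTER
PASS: this position passed all the filters defined in the file header
q10;s50: semicolon-separated list of the filters that were not met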

INFO: additional information
Optional
18 predefined options
Examples:
DB: dbSNP membership
DP: combined depth across all the samples
NS: number of samples with data
AF: estimated allele frequency
SB: strand bias at this position
AA: ancestral allele
Genotype fields: individual samples
Examples:
GT: genotype (0 reference, 1 first alternative, 2 second alternative…)
GQ: conditional genotype quality
-10 log10 P(GT call is wrong | site is variant)
DP: read depth at this position in this sample
HQ: haplotype qualities
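
The fields above can be sketched in code. The following is a minimal, hand-rolled parser for a single VCF data line, assuming tab-separated columns in the standard order (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, then one column per sample). The function names, the returned dictionary layout and the sample names are assumptions made for this example; the record is modeled on the example in the VCF specification. For real work a dedicated VCF library is a better choice.

```python
def phred_to_error_prob(q):
    """Convert a Phred-scaled value Q = -10 log10(p) back to the probability p."""
    return 10 ** (-q / 10)


def parse_vcf_line(line, sample_names):
    """Split one VCF data line into its fixed fields, INFO and per-sample fields."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, var_id, ref, alt, qual, flt, info = fields[:8]

    # FILTER is PASS, or a semicolon-separated list of failed filters.
    failed_filters = [] if flt in ("PASS", ".") else flt.split(";")

    # INFO holds key=value pairs; flags such as DB carry no value.
    info_dict = {}
    for item in info.split(";"):
        if item == ".":
            continue
        key, sep, value = item.partition("=")
        info_dict[key] = value if sep else True

    # Genotype columns follow the key order given in the FORMAT column.
    samples = {}
    if len(fields) > 9:
        keys = fields[8].split(":")
        for name, column in zip(sample_names, fields[9:]):
            samples[name] = dict(zip(keys, column.split(":")))

    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": var_id,
        "ref": ref,
        "alt": alt.split(","),
        "qual": None if qual == "." else float(qual),
        "failed_filters": failed_filters,
        "info": info_dict,
        "samples": samples,
    }


# Illustrative record and hypothetical sample names.
example = "\t".join([
    "20", "14370", "rs6054257", "G", "A", "29", "PASS",
    "NS=3;DP=14;AF=0.5;DB", "GT:GQ:DP:HQ",
    "0|0:48:1:51,51", "1|0:48:8:51,51", "1/1:43:5:.,.",
])
names = ["NA00001", "NA00002", "NA00003"]

if __name__ == "__main__":
    record = parse_vcf_line(example, names)
    print(record["info"])
    print(record["samples"]["NA00002"])
    print(phred_to_error_prob(record["qual"]))
```

All INFO and genotype values are kept as strings here; converting them to numbers would require reading the type declarations in the file header.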

Data visualization

Genome Browsers

Bring together genome data and additional annotation data for viewing in a single browser of the genome
Genome Browsers provide context
Organize data based on chromosomal locations
Search for or navigate to genomic areas of interest to select and view annotation track for the region
EBI (Ensembl) genome browser
NCBI(Map Viewer)
UCSC Genome Browser
http://genome.ucsc.edu

Use this Gateway to search by
Gene names, symbols, IDs
Chromosome number (chr7) or region (chr11:1038475-1075482)
Keywords: kinase, receptor…
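
Position strings such as `chr11:1038475-1075482` are easy to take apart in a script before querying or filtering data. A minimal sketch follows, assuming the `chrom:start-end` form shown above, with an optional range and optional thousands separators; `parse_region` is a hypothetical helper written for this post and not part of any genome browser API.

```python
import re

# chrom, optionally followed by :start-end (commas allowed inside the numbers)
REGION_RE = re.compile(r"^(chr\w+)(?::([\d,]+)-([\d,]+))?$")


def parse_region(text):
    """Return (chrom, start, end); start and end are None for a whole chromosome."""
    match = REGION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognised region string: {text!r}")
    chrom, start, end = match.groups()
    if start is None:
        return chrom, None, None
    start_pos = int(start.replace(",", ""))
    end_pos = int(end.replace(",", ""))
    if start_pos > end_pos:
        raise ValueError("start must not be greater than end")
    return chrom, start_pos, end_pos


if __name__ == "__main__":
    print(parse_region("chr11:1038475-1075482"))
    print(parse_region("chr7"))
```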


Viewing NGS data
Text files
Upload data/files to GENOME BROWSER sites
BED, GFF, GTF, WIG, MAF, BED detail, Personal Genome SNP, PSL
Binary files
Only the portions of the files needed for display are transferred to UCSC
This makes it possible to display very large files
BAM, bigBed, bigWig, …
Viewing options
Hide: removes a track from view
Dense: all items collapsed into a single line
Squish: each item on a separate line, but at 50% height and packed
Pack: each item separate, but efficiently stacked(full height)
Full: each item on separate line

Integrative Genomics Viewer(IGV)

Supports a wide variety of data including sequence alignments, microarrays and genomic annotations
Java-based

Genetic variation

SNP(Single nucleotide polymorphism)
1 in every few hundred bp
Mutation rate ≈ 10^-9
Short indels(insertion/deletion)
1 in every few kb
Mutation rate: variable
Microsatellite(STR) repeat number
1 in every few kb
2-6 bp repeat units
Mutation rate < 10^-3
Minisatellites
1 in every few kb
10-100bp repeat units
Mutation rate < 10^-1
Repeated genes
rRNA, histones
Large structural variations
Insertions/deletions
Duplications
Inversions
Copy number variations

SNP

Types of SNP
Transition: A<->G or C<->T (purine to purine, or pyrimidine to pyrimidine)
Transversion: substitution between a purine and a pyrimidine
Across the whole human genome, a ts/tv ratio of about 2.0-2.1 is generally expected; in exons it is about 2.8-3.0
SNPs and haplotype
Haplotypes are ‘blocks’ of associated SNPs
Structural variations
Traditionally defined as deletions, insertions or inversions > 1 kb
Often involves repetitive regions of the genome and complex rearrangements
No optimal method for SV discovery (before NGS)
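
Returning to the transition/transversion definitions under "Types of SNP": the ts/tv ratio is a common sanity check on a SNP call set, and it is simple to compute. The sketch below assumes SNPs are given as `(ref, alt)` pairs of single bases; the function names and the toy input are made up for illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(ref, alt):
    """Label a single-base substitution as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    # Transition: both bases are in the same chemical class.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def ts_tv_ratio(snps):
    """Number of transitions divided by number of transversions."""
    labels = [classify_snp(ref, alt) for ref, alt in snps]
    ts = labels.count("transition")
    tv = labels.count("transversion")
    if tv == 0:
        raise ValueError("no transversions, ratio is undefined")
    return ts / tv


if __name__ == "__main__":
    # Toy call set: 4 transitions and 2 transversions.
    calls = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("T", "C"), ("G", "T")]
    print(ts_tv_ratio(calls))  # 2.0
```

A ratio far below the expected range for a call set usually points to an excess of false-positive calls, since random errors would give a ratio near 0.5.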

Underlying hypothesis for GWAS
Common disease, common variants
Common variants present in more than 1-5% of the population contribute to common disease
GWAS generally do not capture rare variants
Successful GWAS stories
Significant associations reported through March 2010 (Manolio, N Engl J Med, 2010)
~800 SNPs, 545 studies, 150 diseases/traits
GWAS limitations: lack of functional information
Disease/trait-associated SNPs are not necessarily causative variants
Statistical power
Need to reduce false positives and improve reproducibility of results
Missing heritability
Median odds ratio per copy of the risk allele: 1.33
NGS breakthrough in genetics of complex disease
Whole-genome sequencing following GWAS (Holm et al. Nat Gen 2011): sick sinus syndrome
Exome sequencing (Ng et al. Nat Gen 2011): Miller syndrome
Pooled sequencing (Calvo et al. Nat Gen 2011): human complex I deficiency
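
To give a feel for the "odds ratio per copy of the risk allele" quoted under missing heritability, here is a minimal sketch of an allelic odds ratio computed from a 2x2 table of allele counts, with an approximate 95% confidence interval from the standard error of the log odds ratio. The counts are invented for illustration and the function name is hypothetical; a real GWAS analysis would use regression models with covariates.

```python
import math


def allelic_odds_ratio(case_risk, case_other, control_risk, control_other):
    """Odds ratio and approximate 95% CI from allele counts in cases and controls."""
    counts = (case_risk, case_other, control_risk, control_other)
    if min(counts) <= 0:
        raise ValueError("all four allele counts must be positive")
    # Odds of carrying the risk allele in cases divided by the odds in controls.
    odds_ratio = (case_risk * control_other) / (case_other * control_risk)
    # Standard error of log(OR) from the four cell counts.
    se = math.sqrt(sum(1 / c for c in counts))
    log_or = math.log(odds_ratio)
    ci = (math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se))
    return odds_ratio, ci


if __name__ == "__main__":
    # Invented counts: 1000 case alleles and 1000 control alleles.
    odds_ratio, (low, high) = allelic_odds_ratio(400, 600, 334, 666)
    print(f"OR = {odds_ratio:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

With these made-up counts the odds ratio comes out close to the median of 1.33, which shows how modest a typical GWAS effect is: the risk allele frequency differs by only a few percentage points between cases and controls.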
