A Rosalind programming problem: computing the profile matrix and the consensus string for a collection of sequences.

Consensus and Profile

Problem:
A matrix is a rectangular table of values divided into rows and columns. An m×n matrix has m rows and n columns. Given a matrix A, we write A_{i,j} to indicate the value found at the intersection of row i and column j.

Say that we have a collection of DNA strings, all having the same length n. Their profile matrix is a 4×n matrix P in which P_{1,j} represents the number of times that 'A' occurs in the jth position of one of the strings, P_{2,j} represents the number of times that 'C' occurs in the jth position, and so on (see below).

A consensus string c is a string of length n formed from our collection by taking the most common symbol at each position; the jth symbol of c therefore corresponds to the symbol having the maximum value in the jth column of the profile matrix. Of course, there may be more than one most common symbol, leading to multiple possible consensus strings.

Given: A collection of at most 10 DNA strings of equal length (at most 1 kbp) in FASTA format.
Sample input

>Rosalind_1
ATCCAGCT
>Rosalind_2
GGGCAACT
>Rosalind_3
ATGGATCT
>Rosalind_4
AAGCAACC
>Rosalind_5
TTGGAACT
>Rosalind_6
ATGCCATT
>Rosalind_7
ATGGCACT

Return: A consensus string and profile matrix for the collection. (If several possible consensus strings exist, then you may return any one of them.)
Sample output

ATGCAACT
A: 5 1 0 0 5 5 0 0
C: 0 0 1 4 2 0 6 1
G: 1 1 6 3 0 1 0 0
T: 1 5 0 0 0 1 1 6


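The problem gives several nucleic acid sequences. For each base type we need to output how many times that base occurs at each position (the profile); the most frequent base at each position then makes up the consensus string.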
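
To make the per-position counting concrete, here is a minimal sketch that handles a single column only. The class and method names (ColumnCountSketch, countColumn, mostCommon) are made up for illustration and are not part of the solution in this post; it also assumes all sequences have the same length, as the problem guarantees.

```java
import java.util.Arrays;
import java.util.List;

class ColumnCountSketch {
    // Row order follows the sample output: A, C, G, T.
    static final String BASES = "ACGT";

    /** Counts how often each base occurs at position j across all sequences. */
    static int[] countColumn(List<String> seqs, int j) {
        int[] counts = new int[BASES.length()];
        for (String s : seqs) {
            int row = BASES.indexOf(s.charAt(j));
            if (row >= 0) {          // skip anything that is not A/C/G/T
                counts[row]++;
            }
        }
        return counts;
    }

    /** Returns the most frequent base; on a tie the earliest base in BASES wins. */
    static char mostCommon(int[] counts) {
        int best = 0;
        for (int r = 1; r < counts.length; r++) {
            if (counts[r] > counts[best]) {
                best = r;
            }
        }
        return BASES.charAt(best);
    }

    public static void main(String[] args) {
        List<String> seqs = Arrays.asList("ATCCAGCT", "GGGCAACT", "ATGGATCT",
                "AAGCAACC", "TTGGAACT", "ATGCCATT", "ATGGCACT");
        int[] col0 = countColumn(seqs, 0);
        // prints [5, 0, 1, 1] -> A
        System.out.println(Arrays.toString(col0) + " -> " + mostCommon(col0));
    }
}
```

The counts for position 0 match the first column of the sample profile (A: 5, C: 0, G: 1, T: 1), and the winner 'A' is the first character of the sample consensus.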

A Map and the StringBuilder class handle this problem well. The implementation is below:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class Consensus_and_Profile {
    public static void main(String[] args) {
        // 1. Read the FASTA file
        List<String> fasta = BufferedReader2("C:/Users/Administrator/Desktop/rosalind_cons.txt", "fasta");
        // 2. Walk the sequences and build the consensus and the profile rows
        StringBuilder Consensus = new StringBuilder();
        StringBuilder ProfileA = new StringBuilder();
        StringBuilder ProfileT = new StringBuilder();
        StringBuilder ProfileG = new StringBuilder();
        StringBuilder ProfileC = new StringBuilder();
        ProfileA.append("A: ");
        ProfileT.append("T: ");
        ProfileG.append("G: ");
        ProfileC.append("C: ");
        // Index i walks the positions (columns)
        for (int i = 0; i < fasta.get(0).length(); i++) {
            Map<String, Integer> maps = new HashMap<>();
            maps.put("A", 0);
            maps.put("T", 0);
            maps.put("C", 0);
            maps.put("G", 0);
            // Index j walks the sequences
            for (int j = 0; j < fasta.size(); j++) {
                String key = String.valueOf(fasta.get(j).charAt(i));
                if (maps.containsKey(key)) {
                    maps.put(key, maps.get(key) + 1);
                }
            }
            // After counting, find the most frequent base at this position
            int maxvalue = 0;
            String maxKey = null;
            Set<String> keys = maps.keySet();
            for (String key : keys) {
                int value = maps.get(key);
                // With >=, a tie goes to whichever key is visited last; any choice is accepted
                if (value >= maxvalue) {
                    maxvalue = value;
                    maxKey = key;
                }
                // Extend the matching profile row
                switch (key) {
                    case "A":
                        ProfileA.append(value).append(" ");
                        break;
                    case "T":
                        ProfileT.append(value).append(" ");
                        break;
                    case "G":
                        ProfileG.append(value).append(" ");
                        break;
                    case "C":
                        ProfileC.append(value).append(" ");
                        break;
                    default:
                        break;
                }
            }
            Consensus.append(maxKey);
        }
        System.out.println(Consensus);
        System.out.println(ProfileA);
        System.out.println(ProfileC);
        System.out.println(ProfileG);
        System.out.println(ProfileT);
    }

    // Returns an ArrayList: the header lines when choose is "tag", otherwise the sequences.
    public static ArrayList<String> BufferedReader2(String path, String choose) {
        ArrayList<String> tag = new ArrayList<String>();
        ArrayList<String> fasta = new ArrayList<String>();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line = reader.readLine();
            StringBuilder sb = new StringBuilder();
            while (line != null) {
                // A line starting with ">" is a header and begins a new record
                if (line.startsWith(">")) {
                    tag.add(line);
                    // Save the sequence collected so far, with its line breaks removed
                    if (sb.length() != 0) {
                        String seq = sb.toString();
                        fasta.add(seq);
                        sb.delete(0, sb.length()); // clear the StringBuilder
                    }
                } else {
                    sb.append(line); // keep collecting the current sequence
                }
                // read next line
                line = reader.readLine();
            }
            // Save the last sequence
            String seq = sb.toString();
            fasta.add(seq);
        } catch (IOException e) {
            e.printStackTrace();
        }
        if (choose.equals("tag")) {
            return tag;
        }
        return fasta;
    }
}
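
As an alternative to the Map-based version, the whole profile can also be kept in a plain 4×n int array. The sketch below is only one possible arrangement under a few assumptions: the names ProfileSketch, parseFasta, profile, consensus and format are hypothetical, the FASTA text is passed in as a String (so it can be tried without a file on disk), and all sequences are assumed to have equal length as the problem guarantees.

```java
import java.util.ArrayList;
import java.util.List;

class ProfileSketch {
    // Row order follows the sample output: A, C, G, T.
    static final String BASES = "ACGT";

    /** Splits FASTA text into sequences; lines of one record are joined together. */
    static List<String> parseFasta(String text) {
        List<String> seqs = new ArrayList<>();
        StringBuilder current = null;
        for (String line : text.split("\\R")) {
            line = line.trim();
            if (line.startsWith(">")) {          // header: start a new record
                if (current != null) seqs.add(current.toString());
                current = new StringBuilder();
            } else if (current != null) {
                current.append(line);
            }
        }
        if (current != null) seqs.add(current.toString());
        return seqs;
    }

    /** Builds the profile: p[r][j] counts BASES.charAt(r) at position j. */
    static int[][] profile(List<String> seqs) {
        int n = seqs.isEmpty() ? 0 : seqs.get(0).length();
        int[][] p = new int[BASES.length()][n];
        for (String s : seqs) {
            for (int j = 0; j < n; j++) {
                int r = BASES.indexOf(s.charAt(j));
                if (r >= 0) p[r][j]++;           // skip symbols other than A/C/G/T
            }
        }
        return p;
    }

    /** Picks the most frequent base per column; ties go to the earliest base in BASES. */
    static String consensus(int[][] p) {
        int n = p[0].length;
        StringBuilder sb = new StringBuilder(n);
        for (int j = 0; j < n; j++) {
            int best = 0;
            for (int r = 1; r < p.length; r++) {
                if (p[r][j] > p[best][j]) best = r;
            }
            sb.append(BASES.charAt(best));
        }
        return sb.toString();
    }

    /** Formats the profile as lines like "A: 5 1 0", one per base. */
    static String format(int[][] p) {
        StringBuilder out = new StringBuilder();
        for (int r = 0; r < p.length; r++) {
            out.append(BASES.charAt(r)).append(':');
            for (int v : p[r]) out.append(' ').append(v);
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String sample = ">Rosalind_1\nATCCAGCT\n>Rosalind_2\nGGGCAACT\n"
                + ">Rosalind_3\nATGGATCT\n>Rosalind_4\nAAGCAACC\n"
                + ">Rosalind_5\nTTGGAACT\n>Rosalind_6\nATGCCATT\n"
                + ">Rosalind_7\nATGGCACT\n";
        int[][] p = profile(parseFasta(sample));
        System.out.println(consensus(p));
        System.out.print(format(p));
    }
}
```

On the sample input this prints the consensus ATGCAACT followed by the four profile rows from the sample output. Indexing rows by position in "ACGT" fixes the output order and the tie-breaking rule, which the HashMap iteration order in the Map-based version leaves implicit.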
