文章目录

  • CIBERSORT
  • 一、CIBERSORT 是什么
  • 二、CIBERSORT 怎么用
  • 三、CIBERSORT 结果可视化

CIBERSORT

CIBERSORT is an analytical tool from the Alizadeh Lab developed by Newman et al. to provide an estimation of the abundances of member cell types in a mixed cell population, using gene expression data.
CIBERSORTx, the next generation version of CIBERSORT, is now available (Newman et al.), with support for single-cell RNA-seq and cell type-specific gene expression purification. We recommend moving over to the CIBERSORTx website. For users who registered with CIBERSORT prior to 2018, you may log in with your CIBERSORT account credentials, otherwise please register for a new account.

CIBERSORT 旧版官网入口:https://cibersort.stanford.edu/
现在新版本名字加了一个x,叫CIBERSORTx,主要增加了适配大量组织(bulk tissue)样本单细胞测序方面的功能,新版入口:https://cibersortx.stanford.edu/
原文地址:https://www.nature.com/articles/nmeth.3337


一、CIBERSORT 是什么

CIBERSORTx is an analytical tool from the Alizadeh Lab and Newman Lab to impute gene expression profiles and provide an estimation of the abundances of member cell types in a mixed cell population, using gene expression data. (输入基因表达矩阵,输出样本中的各种细胞类型的丰度。比较有意思的是,CIBERSORT的设计之初的核心目标是“predicting fractions of multiple cell types in gene expression profiles (GEPs)”,可能例子里22个免疫细胞 signature让人太过印象深刻,几乎全互联网的教程都是基于免疫细胞的,CIBERSORT也被认为是专门用来量化免疫浸润的,其实signature是可以按需求自行选择的。)


二、CIBERSORT 怎么用

CIBERSORT可以在线用(第一部分有官网,里面有详细教程),也可以在本地用R环境跑。CIBERSORT是基于线性支持向量回归(SVR)开发的,gene作为输入在计算过程中去卷积化。CIBERSORT主要需要ν-SVR算法的库, 用R去执行CIBERSORT的话,需要加载 R package e1071作为前置条件。

CIBERSORT R script v1.04 代码:

# CIBERSORT R script v1.04 (last updated 10-24-2016)
# Note: Signature matrix construction is not currently available; use java version for full functionality.
# Author: Aaron M. Newman, Stanford University (amnewman@stanford.edu)
# Requirements:
#       R v3.0 or later. (dependencies below might not work properly with earlier versions)
#       install.packages('e1071')
#       install.pacakges('parallel')
#       install.packages('preprocessCore')
#       if preprocessCore is not available in the repositories you have selected, run the following:
#           source("http://bioconductor.org/biocLite.R")
#           biocLite("preprocessCore")
# Windows users using the R GUI may need to Run as Administrator to install or update packages.
# This script uses 3 parallel processes.  Since Windows does not support forking, this script will run
# single-threaded in Windows.
#
# Usage:
#       Navigate to directory containing R script
#
#   In R:
#       source('CIBERSORT.R')
#       results <- CIBERSORT('sig_matrix_file.txt','mixture_file.txt', perm, QN, absolute, abs_method)
#
#       Options:
#       i)   perm = No. permutations; set to >=100 to calculate p-values (default = 0)
#       ii)  QN = Quantile normalization of input mixture (default = TRUE)
#       iii) absolute = Run CIBERSORT in absolute mode (default = FALSE)
#               - note that cell subsets will be scaled by their absolute levels and will not be
#                 represented as fractions (to derive the default output, normalize absolute
#                 levels such that they sum to 1 for each mixture sample)
#               - the sum of all cell subsets in each mixture sample will be added to the ouput
#                 ('Absolute score'). If LM22 is used, this score will capture total immune content.
#       iv)  abs_method = if absolute is set to TRUE, choose method: 'no.sumto1' or 'sig.score'
#               - sig.score = for each mixture sample, define S as the median expression
#                 level of all genes in the signature matrix divided by the median expression
#                 level of all genes in the mixture. Multiple cell subset fractions by S.
#               - no.sumto1 = remove sum to 1 constraint
#
# Input: signature matrix and mixture file, formatted as specified at http://cibersort.stanford.edu/tutorial.php
# Output: matrix object containing all results and tabular data written to disk 'CIBERSORT-Results.txt'
# License: http://cibersort.stanford.edu/CIBERSORT_License.txt#Core algorithm
CoreAlg <- function(X, y, absolute, abs_method){#try different values of nusvn_itor <- 3res <- function(i){if(i==1){nus <- 0.25}if(i==2){nus <- 0.5}if(i==3){nus <- 0.75}model<-svm(X,y,type="nu-regression",kernel="linear",nu=nus,scale=F)model}if(Sys.info()['sysname'] == 'Windows') out <- mclapply(1:svn_itor, res, mc.cores=1) elseout <- mclapply(1:svn_itor, res, mc.cores=svn_itor)nusvm <- rep(0,svn_itor)corrv <- rep(0,svn_itor)#do cibersortt <- 1while(t <= svn_itor) {weights = t(out[[t]]$coefs) %*% out[[t]]$SVweights[which(weights<0)]<-0w<-weights/sum(weights)u <- sweep(X,MARGIN=2,w,'*')k <- apply(u, 1, sum)nusvm[t] <- sqrt((mean((k - y)^2)))corrv[t] <- cor(k, y)t <- t + 1}#pick best modelrmses <- nusvmmn <- which.min(rmses)model <- out[[mn]]#get and normalize coefficientsq <- t(model$coefs) %*% model$SVq[which(q<0)]<-0if(!absolute || abs_method == 'sig.score') w <- (q/sum(q)) #relative space (returns fractions)if(absolute && abs_method == 'no.sumto1') w <- q #absolute space (returns scores)mix_rmse <- rmses[mn]mix_r <- corrv[mn]newList <- list("w" = w, "mix_rmse" = mix_rmse, "mix_r" = mix_r)}#do permutations
doPerm <- function(perm, X, Y, absolute, abs_method){itor <- 1Ylist <- as.list(data.matrix(Y))dist <- matrix()while(itor <= perm){#print(itor)#random mixtureyr <- as.numeric(Ylist[sample(length(Ylist),dim(X)[1])])#standardize mixtureyr <- (yr - mean(yr)) / sd(yr)#run CIBERSORT core algorithmresult <- CoreAlg(X, yr, absolute, abs_method)mix_r <- result$mix_r#store correlationif(itor == 1) {dist <- mix_r}else {dist <- rbind(dist, mix_r)}itor <- itor + 1}newList <- list("dist" = dist)
}#main function
CIBERSORT <- function(sig_matrix, mixture_file, perm=0, QN=TRUE, absolute=FALSE, abs_method='sig.score'){#dependenciesrequire(e1071)require(parallel)require(preprocessCore)if(absolute && abs_method != 'no.sumto1' && abs_method != 'sig.score') stop("abs_method must be set to either 'sig.score' or 'no.sumto1'")#read in dataX <- read.table(sig_matrix,header=T,sep="\t",row.names=1,check.names=F)Y <- read.table(mixture_file, header=T, sep="\t",check.names=F)#to prevent crashing on duplicated gene symbols, add unique numbers to identical namesdups <- dim(Y)[1] - length(unique(Y[,1]))if(dups > 0) {warning(paste(dups," duplicated gene symbol(s) found in mixture file!",sep=""))rownames(Y) <- make.names(Y[,1], unique=TRUE)}else {rownames(Y) <- Y[,1]}Y <- Y[,-1]X <- data.matrix(X)Y <- data.matrix(Y)#orderX <- X[order(rownames(X)),]Y <- Y[order(rownames(Y)),]P <- perm #number of permutations#anti-log if max < 50 in mixture fileif(max(Y) < 50) {Y <- 2^Y}#quantile normalization of mixture fileif(QN == TRUE){tmpc <- colnames(Y)tmpr <- rownames(Y)Y <- normalize.quantiles(Y)colnames(Y) <- tmpcrownames(Y) <- tmpr}#store original mixturesYorig <- YYmedian <- max(median(Yorig),1)#intersect genesXgns <- row.names(X)Ygns <- row.names(Y)YintX <- Ygns %in% XgnsY <- Y[YintX,]XintY <- Xgns %in% row.names(Y)X <- X[XintY,]#standardize sig matrixX <- (X - mean(X)) / sd(as.vector(X))#empirical null distribution of correlation coefficientsif(P > 0) {nulldist <- sort(doPerm(P, X, Y, absolute, abs_method)$dist)}header <- c('Mixture',colnames(X),"P-value","Correlation","RMSE")if(absolute) header <- c(header, paste('Absolute score (',abs_method,')',sep=""))output <- matrix()itor <- 1mixtures <- dim(Y)[2]pval <- 9999#iterate through mixtureswhile(itor <= mixtures){y <- Y[,itor]#standardize mixturey <- (y - mean(y)) / sd(y)#run SVR core algorithmresult <- CoreAlg(X, y, absolute, abs_method)#get resultsw <- result$wmix_r <- result$mix_rmix_rmse <- result$mix_rmseif(absolute && abs_method == 'sig.score') {w <- w * median(Y[,itor]) / Ymedian}#calculate p-valueif(P > 0) {pval <- 1 - (which.min(abs(nulldist - mix_r)) / length(nulldist))}#print outputout <- c(colnames(Y)[itor],w,pval,mix_r,mix_rmse)if(absolute) out <- c(out, sum(w))if(itor == 1) {output <- out}else {output <- rbind(output, out)}itor <- itor + 1}#save resultswrite.table(rbind(header,output), file="CIBERSORT-Results.txt", sep="\t", row.names=F, col.names=F, quote=F)#return matrix object containing all resultsobj <- rbind(header,output)obj <- obj[,-1]obj <- obj[-1,]obj <- matrix(as.numeric(unlist(obj)),nrow=nrow(obj))rownames(obj) <- colnames(Y)if(!absolute){colnames(obj) <- c(colnames(X),"P-value","Correlation","RMSE")}else{colnames(obj) <- c(colnames(X),"P-value","Correlation","RMSE",paste('Absolute score (',abs_method,')',sep=""))}obj
}

例子数据来自:https://github.com/zomithex/CIBERSORT,当然也可以从官网下,就是LM22后缀不一样,一个是txt,另一个是csv。

首先加载包,source脚本,读取数据:

# 我的例子是TCGA数据库大肠癌的100个样本
library("limma")
library('e1071')
source("CIBERSORT.R")
exp <- read.csv("TCGA_ROAD_100.csv",header=T,check.names=F)

# 整理一下行名,列名,删除表达特别低的基因
exp=as.matrix(exp)
rownames(exp)=exp[,1]
exp=exp[,2:ncol(exp)]
dimnames=list(rownames(exp),colnames(exp))
data=matrix(as.numeric(as.matrix(exp)),nrow=nrow(exp),dimnames=dimnames)
data=avereps(data)
data=data[rowMeans(data)>0,]

值得一提的事,CIBERSORT里面,自带log2转换功能。

#anti-log if max < 50 in mixture fileif(max(Y) < 50) {Y <- 2^Y}

当然log2转换也可以自己手动做,因为limma中负责Transform RNA-Seq Data Ready for Linear Modelling的voom函数不接受负值。

#把准备输入CIBERSORT的数据保存一下
v <-voom(data, plot = F, save.plot = F)
out=v$E
out=rbind(ID=colnames(out),out)
write.table(out,file="TCGA_100_ready.txt",sep="\t",quote=F,col.names=F)

CIBERSOFT 搞定

results=CIBERSORT("LM22.txt", "TCGA_100_ready.txt", perm=100, QN=TRUE)
write.csv(results,"TCGA_100_CIBERSORT_Output.csv")

输出的结果应该有25行,除了22种免疫细胞,还有P-value,Correlation, RMSE三行,滤掉不符合要求的样本就可以做可视化了。


三、CIBERSORT 结果可视化

简单做一个可视化吧。

#调包日常
pkgs <- c("matrixStats", "pheatmap", "RColorBrewer", "tidyverse", "cowplot","ggpubr","bslib","ggthemes")
lapply(pkgs, library, character.only = T)# Read in results ---------------------------------------------------------cibersort_raw <- read.csv("TCGA_100_CIBERSORT_Output.csv",row.names = 1,header = T)
library(dplyr)
library(tidyr)
dd1 <- cibersort_raw %>%as.data.frame() %>%rownames_to_column("sample") %>%pivot_longer(cols = 2:23,names_to = "CellType",values_to = "Composition")plot.info <- dd1[,c(5,1,6)]

画两个棒棒图,都是彩虹色。

ggboxplot(plot.info,x = "CellType",y = "Composition",color = "black",fill = "CellType",xlab = "",ylab = "Cell composition",main = "TME Cell composition") +theme_base() +theme(axis.text.x = element_text(angle = 90,hjust = 1,vjust = 1))

ggbarplot(plot.info,x = "sample",y = "Composition",size = 0,fill = "CellType",color = "CellType",) +theme_base() +theme(axis.text.x = element_text(angle = 90,hjust = 1,vjust = 1,size = 1),legend.position = "bottom")

CIBERSORT 学习笔记相关推荐

  1. PyTorch 学习笔记(六):PyTorch hook 和关于 PyTorch backward 过程的理解 call

    您的位置 首页 PyTorch 学习笔记系列 PyTorch 学习笔记(六):PyTorch hook 和关于 PyTorch backward 过程的理解 发布: 2017年8月4日 7,195阅读 ...

  2. 容器云原生DevOps学习笔记——第三期:从零搭建CI/CD系统标准化交付流程

    暑期实习期间,所在的技术中台-效能研发团队规划设计并结合公司开源协同实现符合DevOps理念的研发工具平台,实现研发过程自动化.标准化: 实习期间对DevOps的理解一直懵懵懂懂,最近观看了阿里专家带 ...

  3. 容器云原生DevOps学习笔记——第二期:如何快速高质量的应用容器化迁移

    暑期实习期间,所在的技术中台-效能研发团队规划设计并结合公司开源协同实现符合DevOps理念的研发工具平台,实现研发过程自动化.标准化: 实习期间对DevOps的理解一直懵懵懂懂,最近观看了阿里专家带 ...

  4. 2020年Yann Lecun深度学习笔记(下)

    2020年Yann Lecun深度学习笔记(下)

  5. 2020年Yann Lecun深度学习笔记(上)

    2020年Yann Lecun深度学习笔记(上)

  6. 知识图谱学习笔记(1)

    知识图谱学习笔记第一部分,包含RDF介绍,以及Jena RDF API使用 知识图谱的基石:RDF RDF(Resource Description Framework),即资源描述框架,其本质是一个 ...

  7. 计算机基础知识第十讲,计算机文化基础(第十讲)学习笔记

    计算机文化基础(第十讲)学习笔记 采样和量化PictureElement Pixel(像素)(链接: 采样的实质就是要用多少点(这个点我们叫像素)来描述一张图像,比如,一幅420x570的图像,就表示 ...

  8. Go 学习推荐 —(Go by example 中文版、Go 构建 Web 应用、Go 学习笔记、Golang常见错误、Go 语言四十二章经、Go 语言高级编程)

    Go by example 中文版 Go 构建 Web 应用 Go 学习笔记:无痕 Go 标准库中文文档 Golang开发新手常犯的50个错误 50 Shades of Go: Traps, Gotc ...

  9. MongoDB学习笔记(入门)

    MongoDB学习笔记(入门) 一.文档的注意事项: 1.  键值对是有序的,如:{ "name" : "stephen", "genda" ...

最新文章

  1. Not enough memory. Please load a smaller dataset or use larger heap size.
  2. 【机器学习】这次终于彻底理解了奇异值分解(SVD)原理及应用
  3. 2020牛客多校2 - Exclusive OR(FWT)
  4. c# winform 打包(带数据库安装)
  5. MySQL中修改表结构的关键字_下列SQL语句中,修改表结构的关键字是
  6. 实战系列-IDEA中Spring MVC实现接口功能
  7. Linux内存管理:TLB flush操作
  8. 运用c++结束学校机房红蜘蛛控制软件
  9. 谈谈线下消费分期的风险点
  10. 【matplotlib笔记】在图表中使用中文信息作为标签
  11. ARM基础相关寄存器的讲解-LPC21XX
  12. 【强烈推荐】Java入门基础笔记,超全!
  13. [PTA]7-116 计算圆周率(c语言)(学习记录)
  14. 不择手段背单词、新东方词根词缀词典、超级新华字典、英语词根词缀记忆大全词典
  15. 开源、个人博客等网站搭建、上云费用控、软件程序爱好者资源集锦
  16. 自然语言处理——字符串基础操作及应用
  17. OAI搭建——eNB搭建
  18. unl文件导入orcle数据库
  19. 2023高频经典前端面试题(es6+webpack+http网络+性能优化中篇,含答案)
  20. html表格美化模板,JavaScript + CSS 美化出的条纹表格样式

热门文章

  1. 计算机2级公务员考试成绩嵌套,2020广东公务员考试行测:双层嵌套式假言命题等价命题思路点拨...
  2. 【mybatis的复杂查询】
  3. 红帽Linux操作系统基本命令(学习笔记)
  4. 域名历史查询工具-批量域名历史注册记录查询
  5. python中save是什么意思_如何在Python中生成save函数
  6. 计算机网络常见问题归纳
  7. 长度一百万的数组,get(0)和get(999999)性能有区别吗?
  8. Mac Wi-Fi断断续续的问题
  9. mac 清理~$开头的垃圾文件
  10. (2)lordPE脱壳