Research Progress

Luonan Chen's Group Publishes Research on Constructing Cell-Specific Networks from Single-Cell Transcriptome Data

Date: 2019-03-26

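On March 13, the international journal Nucleic Acids Research published online a new study from Luonan Chen's group at the Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, titled "Cell-specific network constructed by single-cell RNA sequencing data". The study is the first to propose a theory and method for building a gene association network for each individual cell (CSN: cell-specific network) from single-cell RNA sequencing (scRNA-seq) data, making it possible for the first time to identify associations between genes (that is, networks) at single-cell resolution. With this method, scRNA-seq data can be clustered and subjected to pseudo-trajectory analysis from a network perspective, which opens a new route for scRNA-seq data analysis. The method can also uncover "dark" genes, which play important roles at the network level but are usually missed by traditional differential expression analysis. Its accuracy and robustness were validated on multiple public scRNA-seq datasets.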

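Single-cell RNA sequencing provides a high-throughput way to measure and compare gene expression at single-cell resolution. It thereby reveals heterogeneity and functional diversity among cells and helps discover new cell types with distinct functions. Because single-cell datasets contain very large numbers of samples, researchers can in principle also construct gene association networks from these data and, at a deeper level, discover how the underlying gene regulatory relationships change.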

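To this end, Luonan Chen's group proposed a new method for constructing a specific network for each cell at the single-cell level. The method derives from a new theoretical model of statistical dependence that the group developed earlier, and it can be viewed as a transformation from "unstable" gene expression data to "stable" gene association data. Computationally, it does not require cells to be clustered or classified in advance, and it can identify both linear and nonlinear associations between genes. Experiments on multiple scRNA-seq datasets all showed that the method is more accurate and more robust than traditional methods. It can also find genes that differ significantly at the network level rather than at the expression level, and so extracts richer information about the biological system from the network perspective.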

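Dr. Hao Dai of the Institute of Biochemistry and Cell Biology is the first author of the paper. The research was funded by the Ministry of Science and Technology, the Chinese Academy of Sciences, and the National Natural Science Foundation of China.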


Schematic illustration of CSN and NDM (network degree matrix) construction and the statistical model. (A) CSN and NDM construction. (i) Make a scatter diagram for every pair of genes, where each point represents a cell and its x- and y-values are the expression values of the two genes in that cell. With n cells and m genes, this gives m(m − 1)/2 scatter diagrams of n points each. (ii) In the scatter diagram of genes x and y, a red point i means that, according to the statistical model, there is an edge between genes x and y in the network of cell i; a blue point means there is no edge. In this way, n cell-specific networks are constructed, one for each of the n cells. (iii) Counting the number of edges connected to each gene in each CSN gives the network degree matrix, which has m rows and n columns, the same shape as the gene expression matrix (GEM), and can therefore be analyzed by any existing method. (B) The statistical model for an edge between genes x and y. Around point (cell) k, draw the light and medium grey boxes to represent the neighborhoods of x_k and y_k respectively. The intersection of the two boxes is the dark grey box, which represents the neighborhood of (x_k, y_k). The numbers of points in the light, medium and dark grey boxes are n_x^(k), n_y^(k) and n_xy^(k) respectively. The statistic ρ_xy^(k) is defined from these counts. If x and y are independent of each other, the statistic follows a normal distribution whose mean and variance can be calculated. If ρ_xy^(k) exceeds the threshold set by the chosen significance level, point k is colored red, meaning there is an edge between x and y in cell k; otherwise there is no edge.
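The construction described in the caption can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation. The caption does not spell out the formula, so the code assumes the statistic compares the observed joint density n_xy/n with the product of the marginal densities (n_x/n)(n_y/n), and uses the normalized form sqrt(n − 1)(n·n_xy − n_x·n_y) / sqrt(n_x·n_y·(n − n_x)(n − n_y)) as an approximately standard normal test statistic. The function names `csn_edges` and `network_degree_matrix`, the rank-based neighborhood with `box=0.1`, and the one-sided threshold `z_threshold=2.326` (roughly a 0.01 significance level) are all hypothetical choices made for illustration.

```python
import numpy as np


def csn_edges(x, y, box=0.1, z_threshold=2.326):
    """For one gene pair, return a boolean array whose entry k says whether
    an edge x-y exists in the network of cell k.

    Assumptions (not from the source text): neighborhoods are defined on
    ranks, each covering about `box` * n cells, and the normalized statistic
    below is compared with a one-sided normal threshold.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    half = max(1, int(box * n / 2))  # half-width of a box, in rank units

    # Rank of each cell along each gene (ties are broken arbitrarily).
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))

    # in_x[k, j] is True when cell j lies in the x-neighborhood of cell k.
    in_x = np.abs(rank_x[:, None] - rank_x[None, :]) <= half
    in_y = np.abs(rank_y[:, None] - rank_y[None, :]) <= half

    n_x = in_x.sum(axis=1).astype(float)            # light grey box counts
    n_y = in_y.sum(axis=1).astype(float)            # medium grey box counts
    n_xy = (in_x & in_y).sum(axis=1).astype(float)  # dark grey box counts

    # Assumed normalized statistic: approximately N(0, 1) under independence.
    stat = np.sqrt(n - 1) * (n * n_xy - n_x * n_y) / np.sqrt(
        n_x * n_y * (n - n_x) * (n - n_y)
    )
    return stat > z_threshold


def network_degree_matrix(gem, box=0.1, z_threshold=2.326):
    """Turn an m-by-n gene expression matrix into an m-by-n network degree
    matrix by counting, for every cell, the edges attached to each gene."""
    gem = np.asarray(gem, dtype=float)
    m, n = gem.shape
    ndm = np.zeros((m, n), dtype=int)
    for i in range(m):
        for j in range(i + 1, m):
            edges = csn_edges(gem[i], gem[j], box, z_threshold)
            ndm[i] += edges  # each edge raises the degree of both genes
            ndm[j] += edges
    return ndm


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, 200)
    # Gene 1 depends nonlinearly on gene 0; gene 2 is unrelated to both.
    gem = np.vstack([x, x ** 2, rng.uniform(-1, 1, 200)])
    print("cells with edge (0,1):", csn_edges(gem[0], gem[1]).sum())
    print("cells with edge (0,2):", csn_edges(gem[0], gem[2]).sum())
    print("NDM shape:", network_degree_matrix(gem).shape)
```

In the toy data, gene 1 is a quadratic function of gene 0, a relationship that a linear correlation coefficient would largely miss, so this pair should receive edges in far more cells than the independent pair. The sketch stores n-by-n boolean matrices and loops over all gene pairs, so it is suitable only for small examples; real scRNA-seq data, with many tied zero counts, would also need more careful handling of ties than the arbitrary rank order used here.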
