lqmmring/MTGDC

A Multi-scale Tensor Graph Diffusion Clustering for single-cell RNA sequencing data (MTGDC)

===========================================================================

MTGDC is an unsupervised clustering framework for single-cell RNA-seq (scRNA-seq) data. It learns global topological information between cells at multiple scales and uses a simple, efficient tensor diffusion update to propagate the high-order cell relationship graph to each cell's neighbors until it converges to a globally stable state that preserves both the local and the global topological structure of the cells. The cell graph itself is built with a multi-scale affinity learning method that constructs a fully connected graph between cells, so that potential similarity distributions can be mined even under a large amount of noise.

Overview

[Figure: Overview]

Introduction

The algorithm consists of three components: Multi-scale Affinity Learning, Tensor Graph Diffusion Learning, and the Mixture Operator.

  • Multi-scale Affinity Learning: To mine potential similarity distributions among cells under a large amount of noise, we design a multi-scale affinity learning method that constructs a fully connected graph between cells.
  • Tensor Graph Diffusion Learning: For each affinity matrix, we propose an efficient tensor graph diffusion learning framework that learns the high-order context of cells from the multi-scale affinity matrices.
  • Mixture Operator: Finally, we mix the multi-scale tensor graphs to obtain the final, complete high-order affinity matrix and apply spectral clustering to it.

Installation

- Required Installations

The software is written in MATLAB and is free for academic use.

  • MATLAB: MATLAB R2019b
  • R: Seurat, splatter, ggplot2

Part of the code is adapted from the following paper:

Bai, S., et al., Ensemble diffusion for retrieval. Proceedings of the IEEE International Conference on Computer Vision, 2017: p. 774-783.

- Install

To use MTGDC, download the MTGDC folder and follow the instructions below.

- Real Data

We selected 12 public scRNA-seq datasets to verify clustering performance. The data are stored in MAT format (.mat) in the Data folder.

1. Pollen, A.A., et al., Low-coverage single-cell mRNA sequencing reveals cellular heterogeneity and activated signaling pathways in developing cerebral cortex. Nature biotechnology, 2014. 32(10): p. 1053.
2. Deng, Q., et al., Single-cell RNA-seq reveals dynamic, random monoallelic gene expression in mammalian cells. Science, 2014. 343(6167): p. 193-196.
3. Schlitzer, A., et al., Identification of cDC1-and cDC2-committed DC progenitors reveals early lineage priming at the common DC progenitor stage in the bone marrow. Nature immunology, 2015. 16(7): p. 718.
4. Buettner, F., et al., Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells. Nature biotechnology, 2015. 33(2): p. 155.
5. Ting, D.T., et al., Single-cell RNA sequencing identifies extracellular matrix gene expression by pancreatic circulating tumor cells. Cell reports, 2014. 8(6): p. 1905-1918.
6. Treutlein, B., et al., Reconstructing lineage hierarchies of the distal lung epithelium using single-cell RNA-seq. Nature, 2014. 509(7500): p. 371.
7. Kolodziejczyk, A.A., et al., Single cell RNA-sequencing of pluripotent states unlocks modular transcriptional variation. Cell Stem Cell, 2015. 17(4): p. 471-485.
8. Angermueller, C., et al., Parallel single-cell sequencing links transcriptional and epigenetic heterogeneity. Nature methods, 2016. 13(3): p. 229-232.
9. Usoskin, D., et al., Unbiased classification of sensory neuron types by large-scale single-cell RNA sequencing. Nature neuroscience, 2015. 18(1): p. 145-153.
10. Tasic, B., et al., Adult mouse cortical cell taxonomy revealed by single cell transcriptomics. Nature neuroscience, 2016. 19(2): p. 335-346.
11. Macosko, E.Z., et al., Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell, 2015. 161(5): p. 1202-1214.
12. Zeisel, A., et al., Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq. Science, 2015. 347(6226): p. 1138-1142.

- Simulated Data

We also evaluated our method on synthetic datasets simulated with the R package Splatter. The simulation parameters are provided in splatter.

The data are stored in MAT format (.mat) in the Sim-Data folder.

- Files Illustration

  • MTGDC: the main MTGDC algorithm, consisting of the three steps.
  • NormalizeFea: normalizes the input data.
  • adaptiveGaussian: performs the multi-scale affinity learning.
  • IterativeDiffusionTPGKNN: implements the diffusion process.
  • knnSparse: sparsifies the affinity matrix by keeping k-nearest neighbors.
  • mergeW: implements the mixture of the diffused affinity matrices.
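
As an illustration of what k-nearest-neighbor sparsification of an affinity matrix usually means, here is a minimal sketch. It is not the repository's knnSparse: the toy matrix, the variable names, and the choice to symmetrize with an element-wise maximum are all assumptions made for this example.

```matlab
% Minimal kNN sparsification sketch (illustrative; not the repo's knnSparse).
rng(4);
n = 8;                                   % toy number of cells
k = 3;                                   % neighbors kept per cell
B = rand(n);
W = (B + B') / 2;                        % toy symmetric affinity matrix
W(1:n+1:end) = 0;                        % no self-affinity

% For every row, find the k largest affinities.
[~, idx] = sort(W, 2, 'descend');
rows = repmat((1:n)', 1, k);
mask = false(n);
mask(sub2ind([n n], rows, idx(:, 1:k))) = true;

Wk = W .* mask;                          % keep only the selected entries
Wk = max(Wk, Wk');                       % keep an edge if either endpoint selected it
```

After the symmetrization step a row can hold more than k nonzero entries, because an edge is kept when either of its two cells ranks the other among its k nearest neighbors.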

- Baselines

To verify the performance of our method (MTGDC), we compared it with some competitive baselines.

We selected several widely used scRNA-seq clustering tools, including graph-based methods (Seurat and SNN-Cliq), ensemble-based methods (SC3 and EMEP), and representation learning-based methods (MPSSC, SIMLR, SMSC and SinNLRR).

13. Grubman, A., et al., A single-cell atlas of entorhinal cortex from individuals with Alzheimer’s disease reveals cell-type-specific gene expression regulation. Nature neuroscience, 2019. 22(12): p. 2087-2097.
14. Xu, C. and Z. Su, Identification of cell types from single-cell transcriptomes using a novel clustering method. Bioinformatics, 2015. 31(12): p. 1974-1980.
15. Kiselev, V.Y., et al., SC3: consensus clustering of single-cell RNA-seq data. Nat Methods, 2017. 14(5): p. 483-486.
16. Li, X., S. Zhang, and K.C. Wong, Single-cell RNA-seq Interpretations using Evolutionary Multiobjective Ensemble Pruning. Bioinformatics, 2018.
17. Park, S. and H. Zhao, Spectral clustering based on learning similarity matrix. Bioinformatics, 2018. 34(12): p. 2069-2076.
18. Wang, B., et al., SIMLR: A Tool for Large-Scale Genomic Analyses by Multi-Kernel Learning. Proteomics, 2018. 18(2).
19. Qi, R., et al., A spectral clustering with self-weighted multiple kernel learning method for single-cell RNA-seq data. Briefings in Bioinformatics, 2020.
20. Zheng, R., et al., SinNLRR: a robust subspace clustering method for cell type detection by nonnegative and low rank representation. Bioinformatics, 2019.

Example Usage:

A demo is provided in the run file. It shows the details of data preprocessing and clustering with MTGDC.

- Preprocessing

The input is a matrix with one row per cell and one column per gene (cells × genes).

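clc;
clear;
close all;
addpath('MeasureTools');
addpath('LIB');
% compile_func(0);
load('Data/Data_Buettner.mat'); % load data: provides in_X and true_labs
X = in_X;                       % expression matrix, cells x genes
label = true_labs;              % ground-truth cell labels
k = 20;                         % number of nearest neighbors
ViewN = 3;                      % number of views (scales)
[m,n] = size(X);                % m cells (rows), n genes (columns)

% Optional: run on the first ncell cells only
% ncell = 5000;
% X = in_X(1:ncell,:);
% label = true_labs(1:ncell);

% Optional: set k as a fraction of the number of cells
% pr = 0.05;
% k = round(m*pr);

kmeansK = length(unique(label));   % number of clusters = number of true classes
TotalSampleNo = length(label);     % number of cells
TempvData = X;
NorTempvData = NormalizeFea(double(TempvData));   % normalized data
[tempN,tempD] = size(TempvData);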
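
To run the demo on your own data, the .mat file needs to expose the same two variables the script reads, in_X and true_labs. The sketch below shows one way to package a matrix in that layout. The file name Data_Toy.mat and the random counts are placeholders, and storing the labels as a column vector is an assumption; check the orientation against one of the bundled datasets.

```matlab
% Toy stand-in for a real expression matrix (placeholder data, not a real dataset).
rng(0);
nCells = 100;
nGenes = 50;
in_X = floor(10 * rand(nCells, nGenes));      % cells in rows, genes in columns
true_labs = [ones(50, 1); 2 * ones(50, 1)];   % one integer label per cell (assumed column vector)

% Basic sanity checks before saving.
assert(size(in_X, 1) == numel(true_labs), 'One label per cell is required.');
assert(all(isfinite(in_X(:))), 'Expression matrix contains NaN or Inf.');

% 'Data_Toy.mat' is a hypothetical file name.
save('Data_Toy.mat', 'in_X', 'true_labs');

% Reload and report the dimensions, as the demo script would see them.
S = load('Data_Toy.mat');
fprintf('%d cells x %d genes, %d classes\n', ...
    size(S.in_X, 1), size(S.in_X, 2), numel(unique(S.true_labs)));
```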

- Multi-scale Affinity Learning

tic
disp('Affinity learning......');
% W_G and W_KNN: the two sets of multi-scale affinities returned by adaptiveGaussian
[W_G,W_KNN] = adaptiveGaussian(NorTempvData, k, ViewN);
toc
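
For readers unfamiliar with adaptive Gaussian affinities, the following is a minimal sketch of one common construction, a Gaussian kernel whose bandwidth for each cell is the distance to its k-th nearest neighbor, evaluated at several neighborhood sizes. It is not the repository's adaptiveGaussian. The scale schedule (k, 2k, 3k), the toy data, and all variable names other than k and ViewN are assumptions.

```matlab
% Locally scaled Gaussian affinities at several scales (illustrative sketch).
rng(1);
X = [randn(30, 5); randn(30, 5) + 4];    % toy data: 60 cells x 5 genes, two groups
k = 5;                                   % base neighborhood size
ViewN = 3;                               % number of scales
n = size(X, 1);

% Pairwise squared Euclidean distances, no toolbox required.
sq = sum(X.^2, 2);
D2 = max(sq + sq' - 2 * (X * X'), 0);    % clamp small negatives from round-off
D2(1:n+1:end) = 0;                       % exact zeros on the diagonal

Dsorted = sort(sqrt(D2), 2);             % ascending per row; column 1 is the cell itself
W = cell(1, ViewN);
for v = 1:ViewN
    kv = v * k;                          % hypothetical scale schedule: k, 2k, 3k
    sigma = Dsorted(:, kv + 1);          % distance to the kv-th nearest neighbor
    W{v} = exp(-D2 ./ (sigma * sigma')); % W(i,j) = exp(-d(i,j)^2 / (sigma_i * sigma_j))
end
```

Larger neighborhoods give larger bandwidths, so the later matrices in W connect cells more broadly, while the first one emphasizes close neighbors.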

- Tensor Graph Diffusion Learning and Mixture Operator

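tic
% Diffuse each set of multi-scale affinity matrices
WW_G=IterativeDiffusionTPGKNN(W_G,k, ViewN);
WW_KNN=IterativeDiffusionTPGKNN(W_KNN,k, ViewN);
% Mix the ViewN diffused matrices into one final affinity matrix each
WW_MerG=mergeW(WW_G,TotalSampleNo,ViewN);
WW_MerKNN=mergeW(WW_KNN,TotalSampleNo,ViewN);
toc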
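
The two calls above can be pictured with a generic sketch of diffusion followed by mixing. It uses a widely used update for diffusion on a tensor product graph, A ← α·S·A·Sᵀ + (1 − α)·I with a row-normalized S, and then averages the diffused matrices. Whether IterativeDiffusionTPGKNN and mergeW use exactly this update and this mixing rule is not stated here, so treat the update, the value of alpha, the stopping rule, and the plain average as assumptions.

```matlab
% Generic iterative graph diffusion and averaging (illustrative sketch).
rng(2);
n = 40;                                  % toy number of cells
ViewN = 3;                               % number of affinity matrices
alpha = 0.8;                             % diffusion weight, must be < 1 (assumed value)
maxIter = 200;
tol = 1e-8;

% Toy symmetric affinity matrices standing in for the multi-scale affinities.
W = cell(1, ViewN);
for v = 1:ViewN
    B = rand(n);
    W{v} = (B + B') / 2;
end

A = cell(1, ViewN);
for v = 1:ViewN
    S = W{v} ./ sum(W{v}, 2);            % row-stochastic transition matrix
    Av = eye(n);                         % start from the identity
    for t = 1:maxIter
        Anew = alpha * (S * Av * S') + (1 - alpha) * eye(n);
        delta = norm(Anew - Av, 'fro');  % change between iterations
        Av = Anew;
        if delta < tol
            break;                       % converged
        end
    end
    A{v} = Av;
end

% Mix the diffused matrices by plain averaging, then symmetrize.
Wmix = zeros(n);
for v = 1:ViewN
    Wmix = Wmix + A{v} / ViewN;
end
Wmix = (Wmix + Wmix') / 2;
```

Because S is row-stochastic and alpha is below 1, the update is a contraction in the limit and the iterates settle to a fixed point regardless of the starting matrix.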

- Spectral Clustering

tic
disp('Spectral clustering......');
out_G = SpectralClustering(WW_MerG,kmeansK);
out_KNN = SpectralClustering(WW_MerKNN,kmeansK);
toc
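
For reference, a standard normalized spectral clustering routine looks roughly like the sketch below. It is not the repository's SpectralClustering function. It assumes the Statistics and Machine Learning Toolbox for kmeans, and the block-structured toy affinity is a placeholder.

```matlab
% Normalized spectral clustering on a toy affinity matrix (illustrative sketch).
rng(3);
n = 40;
K = 2;                                   % number of clusters
W = 0.01 * ones(n);                      % weak background affinity
W(1:20, 1:20) = 1;                       % block 1: cells 1-20
W(21:40, 21:40) = 1;                     % block 2: cells 21-40
W(1:n+1:end) = 0;                        % no self-affinity

% Symmetrically normalized affinity D^(-1/2) * W * D^(-1/2).
d = sum(W, 2);
Dinv = diag(1 ./ sqrt(d));
M = Dinv * W * Dinv;
M = (M + M') / 2;                        % guard against round-off asymmetry

% Embed each cell using the K leading eigenvectors.
[V, E] = eig(M);
[~, order] = sort(diag(E), 'descend');
U = V(:, order(1:K));
U = U ./ vecnorm(U, 2, 2);               % unit-length rows

% Cluster the embedded cells (kmeans requires the Statistics and Machine Learning Toolbox).
idx = kmeans(U, K, 'Replicates', 10);
```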

- Clustering Evaluation

tic
% result_*: ACC MIhat Purity ARI F-score Precision Recall; Con_*: contingency table
[result_G,Con_G] = ClusteringMeasure(label, out_G');
[result_KNN,Con_KNN] = ClusteringMeasure(label, out_KNN');
nmi_G = Cal_NMI(label, out_G');
nmi_KNN = Cal_NMI(label, out_KNN');
% Append NMI to the other metrics
RES_G = [result_G,nmi_G];
RES_KNN=[result_KNN,nmi_KNN];
toc
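
As a reminder of what NMI measures, the sketch below computes it directly from a contingency table, using the geometric-mean normalization NMI = I(A;B) / sqrt(H(A)·H(B)). Cal_NMI may normalize differently, so small numerical differences from the demo output are possible. The label vectors are toy values.

```matlab
% Normalized mutual information from two label vectors (illustrative sketch).
a = [1 1 1 2 2 2 3 3 3]';                % reference labels
b = [2 2 2 3 3 3 1 1 1]';                % predicted labels (same partition, renamed)

[~, ~, ia] = unique(a);                  % map labels to 1..Ka
[~, ~, ib] = unique(b);                  % map labels to 1..Kb
n = numel(a);

C = accumarray([ia ib], 1);              % contingency table of co-occurrence counts
Pab = C / n;                             % joint distribution
Pa = sum(Pab, 2);                        % marginal of a (column vector)
Pb = sum(Pab, 1);                        % marginal of b (row vector)
PaPb = Pa * Pb;                          % product of marginals

nz = Pab > 0;                            % skip empty cells to avoid log(0)
MI = sum(Pab(nz) .* log(Pab(nz) ./ PaPb(nz)));
Ha = -sum(Pa(Pa > 0) .* log(Pa(Pa > 0)));
Hb = -sum(Pb(Pb > 0) .* log(Pb(Pb > 0)));
nmi = MI / sqrt(Ha * Hb);
```

NMI depends only on how the cells are grouped, not on the cluster names, which is why no label matching step is needed before computing it.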

Results

Clustering results of MTGDC and the compared clustering algorithms.

- Scalability

Comparison of NMI among MTGDC and eight clustering algorithms on 12 real datasets.

Comparison of NMI among MTGDC and eight clustering algorithms on 15 simulated datasets.

- Implementation Time

Speed analysis of MTGDC on the Macosko dataset.

Acknowledgment

The authors would like to thank Dr. G.H. Wang and Dr. J. Li for their support and guidance.

Maintenance

If you have any questions about or problems with MTGDC, please feel free to contact Q.M. Liu (cslqm@hit.edu.cn). Thank you!
