Spectral Co-Clustering Algorithm in Scikit Learn
Last Updated: 26 Apr, 2025
Spectral co-clustering is a type of clustering algorithm that is used to find clusters in both rows and columns of a data matrix simultaneously. This is different from traditional clustering algorithms, which only cluster the rows or columns of a data matrix.
Spectral co-clustering is a powerful tool for data analysis, as it can help to uncover hidden patterns and relationships in the data. For example, it can be used to identify groups of similar items in a recommendation system or to find clusters of genes that have similar expression patterns in a gene expression dataset.
In this article, we will discuss the spectral co-clustering algorithm and how it can be implemented in Python using the Scikit-Learn library.
Spectral Co-Clustering Algorithm
Spectral co-clustering is a clustering algorithm that uses spectral graph theory to find clusters in both rows and columns of a data matrix simultaneously. This is done by constructing a bi-partite graph from the data matrix, where the rows and columns of the matrix are represented as nodes in the graph, and the entries in the matrix are represented as edges between the nodes.
The spectral co-clustering algorithm then uses the eigenvectors of the Laplacian of this bipartite graph to find the clusters in the data matrix. In practice, these eigenvectors are obtained through a singular value decomposition (SVD) of a normalized version of the data matrix. The rows and columns are embedded in a common spectral space, and partitioning that space (typically with k-means) assigns each row and each column to a cluster.
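These steps can be sketched in a few lines of NumPy, following Dhillon's bipartite spectral graph partitioning formulation (the same one scikit-learn implements). The toy matrix, its block structure, and k = 2 are assumptions chosen purely for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data matrix with an obvious block structure:
# rows 0-1 interact mostly with columns 0-1, rows 2-3 with columns 2-3.
A = np.array([[5., 4., 0., 1.],
              [4., 5., 1., 0.],
              [0., 1., 5., 4.],
              [1., 0., 4., 5.]])

# Normalize: A_n = D_r^{-1/2} A D_c^{-1/2}, where D_r and D_c hold
# the row sums and column sums of A.
r = A.sum(axis=1)
c = A.sum(axis=0)
A_n = A / np.sqrt(r)[:, None] / np.sqrt(c)[None, :]

# The singular vectors of A_n play the role of the bipartite-graph
# Laplacian eigenvectors.  For k clusters, keep vectors 2..l+1,
# where l = ceil(log2(k)).
k = 2
U, s, Vt = np.linalg.svd(A_n)
l = int(np.ceil(np.log2(k)))
z_rows = U[:, 1:l + 1] / np.sqrt(r)[:, None]
z_cols = Vt[1:l + 1].T / np.sqrt(c)[:, None]

# Cluster rows and columns jointly in the shared spectral space.
z = np.vstack([z_rows, z_cols])
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(z)
row_labels, col_labels = labels[:4], labels[4:]
print(row_labels, col_labels)
```

For k clusters only ceil(log2(k)) singular vectors are needed (skipping the first, trivial one), which is why a single vector suffices here: rows 0-1 land in the same cluster as columns 0-1, and rows 2-3 with columns 2-3.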
One practical advantage of the spectral co-clustering algorithm is that it works well with sparse data. Only the non-zero entries of the matrix contribute edges to the bipartite graph, so large, mostly-empty matrices (for example, user-item rating matrices) can be processed efficiently without being densified. Note, however, that a zero entry is treated as the absence of an edge, not as a missing value in the statistical sense.
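As a minimal sketch of this, scikit-learn's SpectralCoclustering accepts SciPy sparse matrices directly, so a mostly-empty matrix never has to be converted to a dense array; the tiny "ratings" matrix below is invented for illustration:

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.cluster import SpectralCoclustering

# Toy ratings matrix: users 0-1 rate items 0-1, users 2-3 rate items 2-3.
dense = np.array([[5., 4., 0., 1.],
                  [4., 5., 1., 0.],
                  [0., 1., 5., 4.],
                  [1., 0., 4., 5.]])
X_sparse = csr_matrix(dense)  # the zeros are simply not stored

model = SpectralCoclustering(n_clusters=2, random_state=0)
model.fit(X_sparse)
print(model.row_labels_)     # one label per user (row)
print(model.column_labels_)  # one label per item (column)
```

With real data the matrix would typically be far larger and far sparser; the fitting call stays the same.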
Another advantage of the spectral co-clustering algorithm is that it can find clusters of different sizes and shapes. This is because the algorithm uses the eigenvectors of the graph Laplacian, which are sensitive to the local structure of the graph and can therefore identify clusters of different shapes and sizes.
Now that we have discussed the basics of the spectral co-clustering algorithm, let’s see how it can be implemented in Python using the Scikit-Learn library.
First, let’s start by importing the necessary libraries:
Python3
from sklearn import datasets
from sklearn.cluster import SpectralCoclustering
Next, let’s load the dataset that we will use for our clustering analysis. For this example, we will use the iris dataset, which is a well-known dataset that contains 150 data points representing three different species of iris flowers (setosa, versicolor, and virginica).
Python3
iris = datasets.load_iris()
X = iris.data
Now that we have our dataset, we can proceed with implementing the spectral co-clustering algorithm.
To perform spectral co-clustering, we first need to create an instance of the SpectralCoclustering class. This class takes several parameters, including the number of biclusters to find (n_clusters), the SVD solver to use (svd_method), and a random_state for reproducible results. For this example, we will set n_clusters to 3, since there are 3 species of iris in the dataset.
Python3
clustering = SpectralCoclustering(n_clusters=3)
clustering.fit(X)
Once the spectral co-clustering algorithm has been applied to the dataset, we can use the row_labels_ and column_labels_ attributes to obtain the cluster labels for the rows and columns of the data matrix.
Python3
row_labels = clustering.row_labels_
column_labels = clustering.column_labels_
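As a quick sanity check, the label arrays have one entry per row and per column of X, and the get_indices helper (inherited from scikit-learn's BiclusterMixin) returns the rows and columns assigned to a given bicluster. The random_state=0 below is an addition for reproducibility:

```python
from sklearn import datasets
from sklearn.cluster import SpectralCoclustering

X = datasets.load_iris().data
clustering = SpectralCoclustering(n_clusters=3, random_state=0)
clustering.fit(X)

print(clustering.row_labels_.shape)     # (150,) -- one label per sample
print(clustering.column_labels_.shape)  # (4,)   -- one label per feature

# Rows and columns belonging to bicluster 0
rows, cols = clustering.get_indices(0)
print(len(rows), len(cols))
```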
Finally, we can use cluster labels to visualize the clusters and the relationships between them.
Python3
import matplotlib.pyplot as plt
plt.scatter(X[:, 0], X[:, 1], c=row_labels)
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()
This code generates a scatter plot of the first two features, with each point colored by its row-cluster label: points sharing a color have been assigned to the same row cluster. Note that only the row labels are visualized here; the column labels partition the four features.
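Because the scatter plot shows only the row labels, a complementary view (used in scikit-learn's own biclustering examples) is to reorder both the rows and the columns of the data matrix by their cluster labels, which makes the biclusters appear as contiguous blocks. The random_state=0 here is an assumption added for reproducibility:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.cluster import SpectralCoclustering

X = datasets.load_iris().data
clustering = SpectralCoclustering(n_clusters=3, random_state=0)
clustering.fit(X)

# Reorder rows, then columns, so members of the same bicluster are adjacent.
fit_data = X[np.argsort(clustering.row_labels_)]
fit_data = fit_data[:, np.argsort(clustering.column_labels_)]

plt.imshow(fit_data, aspect='auto', cmap='Blues')
plt.title('Data rearranged to show biclusters')
plt.show()
```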

Clusters formed by the SpectralCoclustering algorithm
Conclusion
In this article, we discussed the spectral co-clustering algorithm and how it can be used to find clusters in both rows and columns of a data matrix simultaneously. The spectral co-clustering algorithm is a powerful tool for data analysis, as it can uncover hidden patterns and relationships in the data.
We also saw an example of how the spectral co-clustering algorithm can be implemented in Python using the Scikit-Learn library. By applying this algorithm to a dataset, we can find clusters in both rows and columns of the data matrix and visualize the relationships between them. This can be useful for identifying patterns and trends in the data.