Definition

Co-clustering is an unsupervised learning technique that simultaneously groups both elements (rows) and features (columns) of a data matrix.

Imagine having a table where each row represents a document and each column represents a word. Each cell in the table indicates how many times that word appears in the corresponding document.
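
As a minimal sketch of how such a table could be built, the following Python snippet counts words per document. The two toy documents and the names documents, vocabulary and matrix are invented for illustration and are not part of the example discussed later.

```python
from collections import Counter

# Hypothetical toy documents, invented for this illustration.
documents = [
    "action explosion action",
    "love romantic love",
]

# Vocabulary: every distinct word, sorted to fix the column order.
vocabulary = sorted({word for doc in documents for word in doc.split()})

# One row per document, one column per word; each cell is a word count.
matrix = []
for doc in documents:
    counts = Counter(doc.split())
    matrix.append([counts[word] for word in vocabulary])

print(vocabulary)  # ['action', 'explosion', 'love', 'romantic']
print(matrix)      # [[2, 1, 0, 0], [0, 0, 2, 1]]
```

Real text would normally need tokenization and normalization first; splitting on whitespace is only a simplifying assumption here.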

Traditionally, clustering groups only one dimension:

  • Document clustering → we group similar documents together
  • Word clustering → we group words that often appear together

Co-clustering, on the other hand, simultaneously groups both documents and words, finding associations between groups of documents and groups of words.

Example

           Action  Shot  Explosion  Romantic  Love
Movie 1    5       5     4          0         0
Movie 2    6       4     5          0         1
Movie 3    0       0     1          4         5
Movie 4    1       0     1          6         6

Applied to this table of word counts, co-clustering helps us detect:
  • A group of action reviews (Movie 1 and Movie 2), associated with the words Action, Shot, Explosion
  • A group of romantic reviews (Movie 3 and Movie 4), associated with the words Romantic, Love
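
The grouping above can be reproduced with a small sketch in plain Python. It uses one simple approach, block co-clustering by alternating least squares (sometimes called double k-means), restricted to two row clusters and two column clusters. The source does not name a specific algorithm, so this choice, the farthest-point initialization, and all function names are assumptions made for illustration.

```python
# Word counts from the example: rows are Movie 1..4,
# columns are Action, Shot, Explosion, Romantic, Love.
X = [
    [5, 5, 4, 0, 0],
    [6, 4, 5, 0, 1],
    [0, 0, 1, 4, 5],
    [1, 0, 1, 6, 6],
]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def two_seed_labels(vectors):
    """Initial 2-way split: seed 0 is the first vector, seed 1 the one farthest from it."""
    first = vectors[0]
    far = max(vectors, key=lambda v: sq_dist(first, v))
    return [0 if sq_dist(v, first) <= sq_dist(v, far) else 1 for v in vectors]

def block_means(X, row_labels, col_labels, k=2):
    """Mean cell value of every (row cluster, column cluster) block."""
    sums = [[0.0] * k for _ in range(k)]
    counts = [[0] * k for _ in range(k)]
    for i, row in enumerate(X):
        for j, value in enumerate(row):
            sums[row_labels[i]][col_labels[j]] += value
            counts[row_labels[i]][col_labels[j]] += 1
    return [[sums[r][c] / counts[r][c] if counts[r][c] else 0.0
             for c in range(k)] for r in range(k)]

def best_labels(vectors, other_labels, means):
    """Assign each vector to the cluster whose block means fit it best (least squares)."""
    labels = []
    for v in vectors:
        costs = [sum((x - m[o]) ** 2 for x, o in zip(v, other_labels)) for m in means]
        labels.append(costs.index(min(costs)))
    return labels

def cocluster(X, max_iter=20):
    """Alternate row and column reassignment until neither labeling changes."""
    cols = [list(c) for c in zip(*X)]
    row_labels = two_seed_labels(X)
    col_labels = two_seed_labels(cols)
    for _ in range(max_iter):
        means = block_means(X, row_labels, col_labels)
        new_rows = best_labels(X, col_labels, means)
        means = block_means(X, new_rows, col_labels)
        means_t = [list(m) for m in zip(*means)]  # index as [col cluster][row cluster]
        new_cols = best_labels(cols, new_rows, means_t)
        if new_rows == row_labels and new_cols == col_labels:
            break
        row_labels, col_labels = new_rows, new_cols
    return row_labels, col_labels

row_labels, col_labels = cocluster(X)
print(row_labels)  # [0, 0, 1, 1]    -> Movies 1-2 vs. Movies 3-4
print(col_labels)  # [0, 0, 0, 1, 1] -> Action/Shot/Explosion vs. Romantic/Love
```

On a matrix this small the initial split already matches the final result and the loop only confirms it. For real data, a library implementation such as scikit-learn's SpectralCoclustering would usually be a more robust choice than this sketch.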

In practice, co-clustering helps us understand which groups of documents share relevant groups of words. This can be useful in many applications, such as content filtering or text data analysis.


References