FPF-SB: a Scalable Algorithm for Microarray Gene Expression Data Clustering

Filippo Geraci1,3, Mauro Leoncini2, Manuela Montangero2, Marco Pellegrini1 and M. Elena Renda1


1 Istituto di Informatica e Telematica (IIT)
Consiglio Nazionale delle Ricerche (CNR)
I-56100
Pisa (PI) ITALY

2 Dipartimento di Ingegneria dell'Informazione
University of Modena and Reggio Emilia
I-41100
Modena (MO) ITALY

3 Dipartimento di Ingegneria dell'Informazione
University of Siena
I-53100
Siena (SI) ITALY

Contacts:
Filippo.Geraci_AT_iit.cnr.it
Mauro.Leoncini_AT_unimo.it
Manuela.Montangero_AT_unimo.it
Marco.Pellegrini_AT_iit.cnr.it
Elena.Renda_AT_iit.cnr.it


Abstract. Efficient and effective analysis of large microarray gene expression datasets is one of the keys to time-critical personalized medicine. The issue we address here is the scalability of the data processing software that clusters gene expression data into groups with homogeneous expression profiles. In this paper we propose FPF-SB, a novel clustering algorithm that combines the Furthest-Point-First (FPF) heuristic for solving the k-center problem with a stability-based method for determining the number of clusters k. Our algorithm improves the state of the art: it scales to large datasets without sacrificing output quality.
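
As an illustration of the FPF building block named in the abstract, the following is a minimal sketch of the classic furthest-point-first heuristic for the k-center problem. It is not the FPF-SB implementation from the paper: the function name fpf_centers, the use of Euclidean distance, and the choice of the first center are assumptions made here for the example, and the stability-based selection of k is not shown (k is simply passed in).

```python
import math


def fpf_centers(points, k, first=0):
    """Pick k centers with the classic furthest-point-first heuristic.

    points: list of equal-length numeric tuples (e.g. expression profiles).
    k:      number of clusters, assumed to be given.
    first:  index of the initial center (an arbitrary choice, assumed here).
    Returns (centers, assign): the indices of the chosen centers, and for
    each point the position in `centers` of its closest center.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")

    centers = [first]
    # dist[i] is the distance from point i to its closest center so far.
    # Euclidean distance is an assumption; any metric could be swapped in.
    dist = [math.dist(p, points[first]) for p in points]
    assign = [0] * len(points)

    while len(centers) < k:
        # The next center is the point furthest from all current centers.
        nxt = max(range(len(points)), key=dist.__getitem__)
        centers.append(nxt)
        # Reassign every point that is closer to the new center.
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < dist[i]:
                dist[i] = d
                assign[i] = len(centers) - 1

    return centers, assign


if __name__ == "__main__":
    # Toy data: two tight pairs and one outlier.
    toy = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 20)]
    print(fpf_centers(toy, 3))
```

Each round costs one pass over the data, so selecting k centers among n points takes O(nk) distance computations, which is the property that makes this heuristic attractive for large datasets.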

BibTeX

@InProceedings{Geraci_et_alHCI07,
author = "Geraci, Filippo and Leoncini, Mauro and Montangero, Manuela and Pellegrini, Marco and Renda, M. Elena",
title = "FPF-SB: a Scalable Algorithm for Microarray Gene Expression Data Clustering",
booktitle = "Proc.\ of the 12th International Conference on Human-Computer Interaction (HCI'07)",
address = "Beijing, P.R. China",
year = "2007",
publisher = "LNCS",
number = "12",
pages = "606--615"
}