As a data miner, I develop and apply data mining scenarios using various machine learning techniques. In recent years, however, I have shifted increasingly toward bioinformatics and applied statistics.
In bioinformatics and applied statistics, I focus on the analysis of epigenetic variation such as methylation profiles. I am also interested in constructing weighted gene correlation networks, which make it possible to relate groups of correlated genes to clinical traits.
In unsupervised learning, I have carried out extensive research on identifying homogeneous subtypes in data by cluster analysis [1][2][3][4][5]. The main outcome is an R package called SDisc, available from CRAN (see Identifying homogeneous profiles in data).
In supervised learning, I have worked on text classification, running extensive experiments to assess and compare the behavior of Support Vector Machines (SVM), naive Bayes, and k-nearest neighbors (k-NN) in sparse bag-of-words feature spaces [3][6][7][8].
References
[1] Colas F, Meulenbelt I, Houwing-Duistermaat JJ, Kloppenburg M, Watt I, van Rooden SM, Visser M, Marinus J, Cannon EO, Bender A, et al. 2008. A Scenario Implementation in R for SubtypeDiscovery Examplified on Chemoinformatics Data. Leveraging Applications of Formal Methods, Verification and Validation, Communications in Computer and Information Science. 17:669-683.
[2] Colas F, Meulenbelt I, Houwing-Duistermaat JJ, Kloppenburg M, Watt I, van Rooden SM, Visser M, Marinus H, van Hilten JJ, Slagboom EP, et al. 2008. Stability of Clusters for Different Time Adjustments in Complex Disease Research. 30th Annual International IEEE EMBS Conference (EMBC'08), Vancouver, British Columbia, Canada.
[3] Colas F. 2009. Data Mining Scenarios for the Discovery of Subtypes and the Comparison of Algorithms.
[4] Citekey colas09discriminative not found.
[5] Colas F. 2009. R SDisc (vignette): Integrated methodology for the identification of homogeneous profiles in data distribution.
[6] Colas F, Paclík P, Kok JN, Brazdil P. 2007. Does SVM Really Scale Up to Large Bag of Words Feature Spaces? IDA 2007, Intelligent Data Analysis, Ljubljana, Slovenia. pp. 296-307.
[7] Colas F, Brazdil P. 2006. Comparison of SVM and some Older Classification Algorithms in Text Classification Tasks. IFIP-AI 2006, World Computer Congress. pp. 169-178.
[8] Colas F, Brazdil P. 2006. On the Behavior of SVM and Some Older Algorithms in Binary Text Classification Tasks. TSD 2006, Text, Speech and Dialogue, Brno, Czech Republic. pp. 45-52.