BioMiner

BioMiner software

BioMiner is a powerful graphical tool for defining, extracting and manipulating putative transcription modules from large-scale gene expression data. Such modules are sets of genes that are over- or under-expressed in a set of biological situations. This Java program carries out all the steps needed to extract transcription modules in an integrated framework that runs on Windows, Linux and Mac OS. BioMiner lets users define transcription modules through parameterized constraints, which helps domain experts understand the biological mechanisms involved in the phenomena they study.

Microarray technology enables researchers to measure gene expression at a high-throughput level, which leads to a new challenge: identifying transcription modules. A transcription module is a set of genes and a set of biological situations such that the genes are over- or under-expressed in those situations. Such a module can indicate that its genes are involved in the same biological mechanism and can thus suggest interesting hypotheses to the biologist.

Several bioinformatics techniques can isolate transcription modules from gene expression data, either through clustering techniques (Eisen et al. 98, Jiang et al. 2001) or through parameterized techniques that compute collections of local patterns (Ihmels et al. 2004, Cheng et al. 2000, Lazzeroni et al. 2000). Because local patterns are defined declaratively, algorithms can extract all the local patterns, i.e., the transcription modules, that satisfy constraints useful to end users. Completeness and correctness are the main advantages of these methods: every extracted module satisfies the constraints, and every module that satisfies them is discovered. To be usable in practice, these methods need some additional tools. BioMiner is an integrated software package that contains both the solvers that extract the modules and these additional tools. It supports three local pattern types on boolean matrices: formal concepts, fault-tolerant patterns and delta-free patterns. To that end, BioMiner offers different ways to define over- or under-expression, i.e., to encode the boolean matrix.

Features

BioMiner is a system for easily defining, extracting and manipulating putative transcription modules from large-scale gene expression data. BioMiner is implemented in Java and is therefore platform independent.

The central features of BioMiner cover all the steps needed to go from normalized raw gene expression data to putative transcription modules. The whole process consists of five steps: loading the data files; encoding the numerical values into boolean properties (e.g. up/down regulated, strong variation); selecting the genes and biological situations to process; setting the parameters of the constraints and computing the local patterns; and finally selecting the most appropriate transcription modules through a querying process.

The encoding (discretization) step specifies the type of gene expression property of interest. BioMiner can encode over-expression, under-expression or strong variation. The user sets the percentage of each gene's maximum (respectively minimum) value across the biological situations that is used to encode over-expression (resp. under-expression). For strong variation, both encodings are applied.
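The over-expression encoding can be illustrated with a minimal sketch. It assumes non-negative expression values and a rule where a cell is marked over-expressed when it reaches a given fraction of its gene's maximum; the class name, method name and exact thresholding rule are assumptions made for illustration, not BioMiner's actual code.

```java
// Hypothetical sketch: the names and the exact thresholding rule are assumptions,
// not BioMiner's actual implementation.
final class OverExpressionEncoder {

    /**
     * Encodes a numeric matrix (rows = genes, columns = biological situations)
     * into a boolean matrix. A cell is true when its value is at least
     * `fraction` times the maximum of its row.
     * Assumes non-negative values and 0 < fraction <= 1.
     */
    static boolean[][] encode(double[][] expression, double fraction) {
        boolean[][] encoded = new boolean[expression.length][];
        for (int g = 0; g < expression.length; g++) {
            double[] row = expression[g];

            // Per-gene maximum across all situations.
            double max = Double.NEGATIVE_INFINITY;
            for (double v : row) {
                max = Math.max(max, v);
            }

            // Threshold derived from the user-chosen fraction of the maximum.
            double threshold = fraction * max;
            encoded[g] = new boolean[row.length];
            for (int s = 0; s < row.length; s++) {
                encoded[g][s] = row[s] >= threshold;
            }
        }
        return encoded;
    }
}
```

Under the same assumptions, an under-expression encoder would compare each value against a threshold derived from the row minimum, and strong variation would combine the two boolean matrices.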

In the following step, BioMiner provides an interface for selecting the genes and biological situations of interest for the extraction of transcription modules. This selection is facilitated by the information attached to each gene and biological situation. Files containing information (e.g. names) on genes and on biological situations are requested during the loading step.

Then, the user chooses the type of transcription modules to extract. BioMiner offers three different types of transcription modules, i.e., local patterns. Each type corresponds to a specific algorithm:

  • D-Miner (Besson et al. 2005) extracts transcription modules in which all the genes are over- or under-expressed in all the biological situations of the module. Furthermore, the modules are maximal, i.e., no additional gene or situation can be added. To improve the relevance of the extracted transcription modules, the user can add constraints: a minimal size for each component of the bi-set (the gene set and the situation set), and genes or situations that must belong to it. These patterns correspond to formal concepts.
  • DR-Miner (Besson et al. 2006) extracts fault-tolerant formal concepts. These patterns are similar to the previous modules except that some faults are accepted: a module can contain genes that are not over- or under-expressed in some of its situations. The number of accepted faults per gene is bounded by the alpha parameter, and likewise the number of faults per situation.
  • Ac-like (Pensa et al. 2005) extracts modules such that, for each situation of the module, at most delta of the genes are not over- or under-expressed. Moreover, the modules contain at least x biological situations (x is called the support).
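To make the first two pattern definitions concrete, the sketch below checks whether a given bi-set (a set of gene indices and a set of situation indices) satisfies them on a boolean matrix. It only verifies a candidate; it is not the D-Miner or DR-Miner extraction algorithm, and the published DR-Miner definition includes further conditions not modeled here. The class and method names are hypothetical, and reading alpha as a bound on faults per gene and per situation is an assumption based on the description above.

```java
// Hypothetical sketch: class and method names are illustrative, not BioMiner's API.
final class BiSetChecks {

    /**
     * True when each selected gene has at most `alpha` false cells over the
     * selected situations, and each selected situation has at most `alpha`
     * false cells over the selected genes.
     */
    static boolean withinFaults(boolean[][] m, int[] genes, int[] situations, int alpha) {
        for (int g : genes) {
            int faults = 0;
            for (int s : situations) if (!m[g][s]) faults++;
            if (faults > alpha) return false;
        }
        for (int s : situations) {
            int faults = 0;
            for (int g : genes) if (!m[g][s]) faults++;
            if (faults > alpha) return false;
        }
        return true;
    }

    /** True when the bi-set contains only true cells and cannot be extended. */
    static boolean isFormalConcept(boolean[][] m, int[] genes, int[] situations) {
        // A formal concept tolerates no fault at all.
        if (!withinFaults(m, genes, situations, 0)) return false;

        // Maximality on genes: no outside gene is true in every selected situation.
        for (int g = 0; g < m.length; g++) {
            if (contains(genes, g)) continue;
            boolean coversAll = true;
            for (int s : situations) if (!m[g][s]) { coversAll = false; break; }
            if (coversAll) return false;
        }

        // Maximality on situations: no outside situation is true for every selected gene.
        int situationCount = m.length == 0 ? 0 : m[0].length;
        for (int s = 0; s < situationCount; s++) {
            if (contains(situations, s)) continue;
            boolean coversAll = true;
            for (int g : genes) if (!m[g][s]) { coversAll = false; break; }
            if (coversAll) return false;
        }
        return true;
    }

    private static boolean contains(int[] values, int x) {
        for (int v : values) if (v == x) return true;
        return false;
    }
}
```

With alpha set to 0, `withinFaults` reduces to the "all genes expressed in all situations" condition, which is why `isFormalConcept` reuses it before testing maximality.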

The last step retrieves the most important transcription modules by means of a querying process.