Statistical analysis

Over the past decade, high-throughput measurement techniques have brought about a revolutionary change in molecular biology and spurred large-scale development in areas such as genomics, proteomics, and metabolomics. The rise of "omics" has also opened the way for personalized medicine; however, a significant amount of research and development is still needed for it to reach its full potential.
First, it is essential to develop new modeling techniques within systems biology that can describe systems across scales, from cellular-level dynamical models up to the level of clinical descriptors, and can integrate the knowledge extracted at each level. Accordingly, knowledge representation methods have to cope with the large number of entities and their relations, the multitude of information sources, and the representation of uncertainty. These are actively researched areas with many emerging solutions, such as probabilistic knowledge bases.

Second, statistical analysis methods are required that can handle a large number of variables (up to tens of thousands) at a relatively small sample size (on the order of 100-1000 samples), and that allow the utilization of background knowledge (i.e., prior knowledge). Classical statistical methods alone are not sufficient, because the problem of multiple testing arises: after the required corrections, often none of the results remain statistically significant (see the first sketch below). Bayesian statistical methods based on Bayesian networks provide a solution to this problem. These methods learn complex models from the data, which enables a hypothesis-free yet knowledge-rich analysis. Although learning complex models avoids the multiple testing problem, another phenomenon, the so-called curse of dimensionality, arises: the more parameters the model has (i.e., the higher its dimension), the more samples are needed to estimate it with sufficient accuracy. (An insufficient number of samples may in turn make the model more sensitive to the bias-variance tradeoff.) Fortunately, this problem can be managed by Markov chain Monte Carlo (MCMC) simulation-based Bayesian model averaging (see the second sketch below). This approach handles the certainty or uncertainty of individual entities, expressed as probability values, in a coherent framework throughout the processes of study design, data analysis, and decision support. Furthermore, it is also suitable for causal modeling and for the integration of background knowledge.
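As a minimal illustration of the multiple testing problem, the following Python sketch runs independent t-tests on simulated omics-scale data; all counts and effect sizes here are illustrative assumptions, not figures from this text. With ten thousand tests, the Bonferroni-corrected threshold typically leaves even the genuine but weak effects undetected, while the uncorrected threshold admits hundreds of false positives.

```python
# Sketch: why classical corrections can erase findings at omics scale
# (variable count, sample sizes, and effect size are illustrative).
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n_vars, n_cases, n_ctrls = 10_000, 100, 100

# Null data for all variables, plus a modest true effect in the first 10.
cases = rng.normal(0.0, 1.0, (n_cases, n_vars))
ctrls = rng.normal(0.0, 1.0, (n_ctrls, n_vars))
cases[:, :10] += 0.3  # true but weak signal

pvals = ttest_ind(cases, ctrls, axis=0).pvalue
alpha = 0.05
print("uncorrected hits:", np.sum(pvals < alpha))           # mostly false positives
print("Bonferroni hits: ", np.sum(pvals < alpha / n_vars))  # often zero
```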
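To make the model-averaging idea concrete, here is a hedged sketch of MCMC-based Bayesian model averaging. For brevity it walks over variable-inclusion indicators of a linear model (an MC3-style sampler) rather than over Bayesian network structures, and it uses a BIC approximation of the marginal likelihood; both are simplifying assumptions, not the method of any particular study.

```python
# Sketch: MCMC-based Bayesian model averaging over variable-inclusion
# indicators (MC3-style; BIC approximates the log marginal likelihood).
import numpy as np

rng = np.random.default_rng(1)
n, p = 150, 30                                # small sample, moderate dimension
X = rng.normal(size=(n, p))
y = X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=1.0, size=n)

def log_marglik(mask):
    """BIC-style approximation of a submodel's log marginal likelihood."""
    k = int(mask.sum())
    if k == 0:
        rss = float(np.sum(y ** 2))           # empty (no-intercept) model
    else:
        beta, *_ = np.linalg.lstsq(X[:, mask], y, rcond=None)
        rss = float(np.sum((y - X[:, mask] @ beta) ** 2))
    return -0.5 * (n * np.log(rss / n) + k * np.log(n))

mask = np.zeros(p, dtype=bool)                # start from the empty model
score = log_marglik(mask)
inclusion = np.zeros(p)
n_iter, burn_in = 5000, 1000
for it in range(n_iter):
    j = rng.integers(p)                       # propose flipping one variable
    proposal = mask.copy()
    proposal[j] = not proposal[j]
    new_score = log_marglik(proposal)
    if np.log(rng.random()) < new_score - score:   # Metropolis acceptance
        mask, score = proposal, new_score
    if it >= burn_in:
        inclusion += mask                     # accumulate model-averaged indicators
inclusion /= n_iter - burn_in
print("posterior inclusion probabilities (first 5):", inclusion[:5].round(2))
```

Averaging the inclusion indicators over the sampled models yields posterior inclusion probabilities, the same kind of entity-level probability values referred to above.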
These knowledge-rich analysis methods are typically computation-intensive; various high-performance computing (HPC) and high-throughput computing (HTC) systems can support their implementation.