Products and services: GridR
News on the latest products and services in ACGT's area of interest
GridR
GridR is an analysis tool based on the statistical environment R (http://www.r-project.org/). GridR supports the use of the methodologies available as R packages in a grid environment. Within the ACGT project, the aim of GridR is to provide a powerful framework for the analysis of clinico-genomic trials involving large amounts of data (e.g. multilevel data from microarray-based clinical trials).
The R environment provides a broad range of state-of-the-art statistical and graphical techniques and advanced data mining methods, including comprehensive packages for linear and non-linear modelling, cluster analysis, prediction, hypothesis tests, resampling, survival analysis and time-series analysis. It is easily extensible and has become the de facto standard for statistical research and many applied statistics projects, especially in the biomedical field. The associated project BioConductor addresses the needs of the biomedical and biostatistics communities for R packages oriented towards genomic data analysis. Numerous methods available as R/BioConductor packages that were considered experimental a few years ago are now accepted as standard in the analysis of high-throughput genomic data.
In May 2008, a first beta version of the GridR R package will be published by ACGT as open source. The package contains functions and libraries that allow the user to access and use a distributed environment transparently from a client-side R environment. Based on call-backs, active bindings and parsing of error code, the package will provide remote function execution in distributed environments. More specifically, different modes for submitting and executing computations will be supported so that the user can work with different environments. For example, computations can be submitted via SSH or via web services and then executed directly on a remote machine, run on a Condor cluster, or forwarded to a GT4 (Globus Toolkit 4) machine. Parameter sweeps will also be supported when executing functions remotely.
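To make the idea of a parameter sweep concrete, the following is a minimal sketch in Python rather than R, and it does not use GridR's own functions. It only illustrates the general pattern: the same function is run once per parameter combination, each run is an independent job, and the results are collected on the client side. The names `run_sweep` and `toy_score` are hypothetical, and the local thread pool is merely a stand-in for a remote backend such as an SSH host, a cluster or a grid node.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def run_sweep(func, param_grid, max_workers=4):
    """Run func once per parameter combination and collect the results.

    Each combination is an independent job. The thread pool is only a local
    stand-in (an assumption of this sketch) for a remote execution backend.
    """
    names = sorted(param_grid)
    # Build every combination of the listed parameter values.
    combos = [dict(zip(names, values))
              for values in product(*(param_grid[n] for n in names))]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit all jobs first, then wait for them in submission order.
        futures = [pool.submit(func, **combo) for combo in combos]
        results = [future.result() for future in futures]
    return list(zip(combos, results))


def toy_score(k, alpha):
    # Hypothetical placeholder for an expensive analysis step.
    return k * alpha


sweep = run_sweep(toy_score, {"k": [1, 2, 3], "alpha": [0.5, 2.0]})
for params, value in sweep:
    print(params, value)
```

Because the jobs do not depend on each other, the dispatch step could in principle be swapped for any backend without changing the code that defines the function or the parameter grid, which is the kind of transparency the package aims for.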
GridR will be of great use to clinicians and clinical-data analysts interested in computationally heavy data mining tasks, such as resampling techniques, full cross-validation of classifiers or meta-analyses.
With the GridR R package in its current version, users can perform computations in distributed environments of different architectures from their local R environment in a nearly transparent way.