Products and Services
Literature Based Discovery (Andreas Persidis - Biovista)
Literature-based discovery (or “LBD” for short) is a relatively new approach to knowledge discovery. It assumes that by connecting seemingly disparate pieces of knowledge from within a relatively large corpus of scientific articles or other textual resources, it is possible to create new knowledge that is not stated anywhere in the original corpus. LBD is similar to data mining, the difference being that while the latter deals with large bodies of numeric data, the former uses running text as its primary source.
LBD has many applications, including identifying biomarkers, predicting adverse events (AEs) and finding new uses for existing drugs or compounds. Within ACGT, partner Biovista has developed versions of its proprietary LBD platform that are compatible with the basic ACGT infrastructure and can either be integrated into any workflow created with the ACGT Workflow Editor or incorporated into any other user application.
Figure 1: Basic architecture of Literature-based discovery platform
Figure 1 above shows the basic architecture of the system. Information extraction algorithms read scientific articles that are downloaded from Medline on a regular basis, ensuring the system is always up to date. The extracted information consists of about 25 classes of biologically relevant concepts, such as genes and pathways. Once extracted, these concepts are cross-correlated with one another and the resulting relationships are stored in a custom-designed database that ensures very fast response times. This database is then queried either by the LBD application or by the LBD functions that are accessible via the ACGT infrastructure.
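The cross-correlation and query steps described above can be illustrated with a minimal sketch. It uses the generic co-occurrence ("ABC") model often used to introduce LBD, not Biovista's proprietary method: two concepts are linked if they appear in the same article, and a new hypothesis is a pair of concepts that never co-occur directly but share an intermediate concept. The function names, the concept labels and the toy articles are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(docs):
    """Map each concept to the set of concepts it co-occurs with in any document."""
    links = defaultdict(set)
    for concepts in docs:
        # Link every pair of distinct concepts found in the same document.
        for a, b in combinations(sorted(set(concepts)), 2):
            links[a].add(b)
            links[b].add(a)
    return links


def indirect_links(links, start):
    """Find concepts connected to `start` only through an intermediate concept.

    Returns {candidate: {intermediates}} where start-intermediate and
    intermediate-candidate co-occur, but start-candidate never do.
    """
    direct = links.get(start, set())
    candidates = defaultdict(set)
    for b in direct:
        for c in links.get(b, set()):
            if c != start and c not in direct:
                candidates[c].add(b)
    return dict(candidates)


# Toy, made-up "extracted concepts" per article (illustrative only).
docs = [
    ["drug_X", "gene_A"],
    ["gene_A", "pathway_P"],
    ["pathway_P", "disease_D"],
    ["drug_X", "gene_B"],
    ["gene_B", "disease_D"],
]

links = build_cooccurrence(docs)
hypotheses = indirect_links(links, "drug_X")
```

In this toy corpus no article mentions drug_X together with disease_D, yet the two are connected through gene_B, which is the kind of implicit link an LBD query is meant to surface. A production system would also rank such candidates, for example by the number or strength of intermediates.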
LBD accuracy
As with any predictive system, one of the main concerns is its predictive accuracy; in other words, how confident we can be in the output of such a system.
To address this question, a study was carried out and will be reported in reference 1. The study examined the use of Biovista’s LBD platform for predicting AEs before clinical trials, using abstracts from PubMed as the primary raw data source. Using a description of the mode of action (MoA) of a drug as the starting point, we compared it, for similarities, with the MoAs underlying all AEs. The dataset was 66 unique drugs, of which 61 were oncology drugs, 7 were neurology drugs, and 3 were both; the AEs were those reported at the American Society of Clinical Oncology (ASCO) annual meeting in 2007 and the American Academy of Neurology (AAN) annual meeting in 2008, respectively. The primary focus was oncology, where our sample covered 87% of the MoAs of all FDA-approved cancer drugs. Using data from 1997 to 2007 divided into five time points, and a total of 881 measurements, a mean AE prediction rate of 79±22% was achieved. A similar AE prediction rate of 79±28% was achieved in the small neurology sample, in an additional 97 measurements (978 in total). We also found that when using data that pre-date any publication on a drug by five years, literature-based analytics predicted 72% of its AEs. Figure 2 shows how the predictive accuracy of the platform varies as a function of time (i.e. of the available data).
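As a rough illustration of how a summary figure such as "79±22%" can be derived, the sketch below computes a per-measurement prediction rate and then its mean and standard deviation. It assumes that the rate for one measurement is the fraction of reported AEs that had been predicted, and that "±" denotes the sample standard deviation; the text above does not spell out either definition. The AE names and the three measurements are invented for the example and are not study data.

```python
from statistics import mean, stdev


def prediction_rate(predicted, observed):
    """Fraction of observed (reported) AEs that were also predicted."""
    observed = set(observed)
    if not observed:
        raise ValueError("no observed AEs to score against")
    return len(observed & set(predicted)) / len(observed)


# Hypothetical measurements: (predicted AEs, reported AEs) per drug and time point.
measurements = [
    ({"nausea", "fatigue", "rash"}, {"nausea", "fatigue", "rash", "anemia"}),  # 3 of 4
    ({"nausea", "neutropenia"}, {"nausea", "neutropenia"}),                    # 2 of 2
    ({"fatigue"}, {"fatigue", "diarrhea"}),                                    # 1 of 2
]

rates = [prediction_rate(p, o) for p, o in measurements]
avg, sd = mean(rates), stdev(rates)
print(f"AE prediction rate: {avg:.0%} ± {sd:.0%}")
```

Repeating the same calculation on literature snapshots cut off at successive dates would give one point per time point, which is the kind of curve Figure 2 presents.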
Figure 2: Predictive accuracy versus time
Uses of LBD
To date, Biovista has used its LBD platform to successfully reposition 10 drugs and has applied for patents in all these cases. For example, its BVA201 drug has shown positive efficacy results in a pre-clinical trial for Multiple Sclerosis (see http://www.biovista.com/news.php?article_id=136), while its BVA-601 drug has shown positive efficacy results in a pre-clinical trial for Epilepsy (see http://www.biovista.com/news.php?article_id=132&year=2009).
LBD in ACGT
For ACGT, Biovista has made available a number of basic functions that support literature mining. These functions offer some of the basic analytics that are required to support a literature-based discovery process. The functions are directly accessible via a published API and can be combined into more complex functions to perform even more advanced LBD tasks. Additional information can be found at the ACGT site.
References
1. “Pro-active drug safety: combining existing data in new ways to predict serious adverse events of drugs.” Spyros N. Deftereos, et al. [in review]