Literature Mining
Literature Mining (LM) is the process of applying data mining techniques to databases of published literature. It aims to help users extract the "hidden" knowledge that exists in extremely large bodies of literature more easily than traditional search engines allow. The user is assumed to be in a "discovery mode", looking for interesting connections between seemingly disparate pieces of knowledge that may help solve the task at hand.
LM can also be seen as an "overview first, details next" approach to search, whereby the user is first given a summary of relevant existing knowledge and can then drill down to the supporting evidence (usually the underlying publications).
In ACGT, the LM services offered are based on two principles:
- Domain terms like "Wilms' tumor", "apoptosis" and "BRCA1" are organized into categories such as "disease", "pathway" and "gene", respectively, which practitioners in the life sciences recognize instantly. This organization lets users search without specifying each time exactly which gene they are interested in, only that they are interested in genes in general. For example, a medical doctor looking for potential markers of a disease may ask the ACGT LM service a question such as "which genes are most closely related to breast cancer?", which cannot be expressed as a query to a Boolean search engine.
- Two terms are considered to have "some" interesting connection if they are mentioned together in at least one publication. The details of the connection only become clear once the paper itself is read, and different connections may carry different weights depending on the exact context of the task at hand.
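The two principles can be combined in a minimal sketch: terms are mapped to categories, and any term that shares at least one publication with the query term counts as connected. This is an illustration under stated assumptions, not the ACGT implementation. The dictionary `TERM_CATEGORY`, the toy corpus `PUBLICATIONS`, the placeholder term `GENE_X`, the function `related_terms`, and the choice to rank connected terms by the number of shared publications are all hypothetical; the co-occurrences in the toy data are invented and do not reflect real literature.

```python
from collections import Counter

# Hypothetical term-to-category dictionary (not the real ACGT vocabulary).
TERM_CATEGORY = {
    "breast cancer": "disease",
    "Wilms' tumor": "disease",
    "BRCA1": "gene",
    "GENE_X": "gene",  # placeholder gene name, invented for illustration
    "apoptosis": "pathway",
}

# Hypothetical toy corpus: publication ID -> set of recognized terms it mentions.
# These co-occurrences are made up and are not real findings.
PUBLICATIONS = {
    "pub1": {"breast cancer", "BRCA1", "apoptosis"},
    "pub2": {"breast cancer", "BRCA1"},
    "pub3": {"breast cancer", "GENE_X"},
    "pub4": {"Wilms' tumor", "GENE_X"},
}


def related_terms(query, category, publications=PUBLICATIONS, term_category=TERM_CATEGORY):
    """Return (term, count) pairs for terms of `category` that co-occur with `query`.

    A term is "connected" if it appears in at least one publication together with
    `query`. Ranking by shared-publication count is an assumption of this sketch.
    """
    counts = Counter()
    for terms in publications.values():
        if query in terms:
            for term in terms:
                if term != query and term_category.get(term) == category:
                    counts[term] += 1
    # Highest count first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


print(related_terms("breast cancer", "gene"))  # [('BRCA1', 2), ('GENE_X', 1)]
```

Filtering by category is what lets the user ask for "genes in general" rather than naming each gene; a real service would weight connections by context, which a raw count ignores.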
The ACGT Literature Mining service currently offers the following search capabilities:
- For any given search term (such as "breast cancer"), it returns a ranked list of related terms of a user-specified category (e.g. pathways)
- For any set of terms, it returns all the relevant publications mentioning those terms
- For any set of terms, it provides publication trends in the related literature
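The last two capabilities can be sketched in a few lines of Python. This is a hypothetical illustration, not the service's documented behavior: the toy records and years are invented, the function names `publications_mentioning` and `publication_trend` are made up, a publication is assumed to match only if it mentions every given term, and a "trend" is assumed to be a count of matching publications per year.

```python
from collections import Counter

# Hypothetical toy records: publication ID -> (publication year, recognized terms).
# All values are invented for illustration.
PUBLICATIONS = {
    "pub1": (2001, {"breast cancer", "BRCA1", "apoptosis"}),
    "pub2": (2003, {"breast cancer", "BRCA1"}),
    "pub3": (2003, {"breast cancer", "BRCA1", "GENE_X"}),
    "pub4": (2005, {"breast cancer", "GENE_X"}),
}


def publications_mentioning(terms, publications=PUBLICATIONS):
    """Return sorted IDs of publications mentioning every term in `terms` (assumed AND semantics)."""
    wanted = set(terms)
    return sorted(pid for pid, (_, mentioned) in publications.items() if wanted <= mentioned)


def publication_trend(terms, publications=PUBLICATIONS):
    """Return (year, count) pairs of matching publications, in ascending year order."""
    years = Counter(publications[pid][0] for pid in publications_mentioning(terms, publications))
    return sorted(years.items())


print(publications_mentioning({"breast cancer", "BRCA1"}))  # ['pub1', 'pub2', 'pub3']
print(publication_trend({"breast cancer", "BRCA1"}))  # [(2001, 1), (2003, 2)]
```

Reusing the retrieval function for the trend keeps both views consistent: the trend always summarizes exactly the publications a user could drill down to.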
Specification of the ACGT Literature Mining Service
The ACGT LM service is continuously maintained and expanded. Its latest specification is as follows:
Categories of terms covered
The ACGT LM service recognizes 7 classes of terms:
- Genes
- Diseases
- Pathways
- Anatomical Locations
- Cell Lines
- Experiments
- Drugs
Database Coverage
- MEDLINE abstracts from 1975 to the present
- Full text of selected top-ranking ISI journals, such as Nature, Science, Cell and Nature Biotechnology