
Textpresso is a useful curation tool, as well as a search engine for researchers, and can readily be extended to other organism-specific corpora of text. The lexicon of the ontology contains 14,500 entries, each of which includes all versions of a specific word or phrase, and it includes all categories of the Gene Ontology database.
Textpresso's categories are classes of biological concepts (e.g., gene, allele, cell or cell group, phenotype, etc.) and classes that relate two objects (e.g., association, regulation, etc.) or describe one (e.g., biological process, etc.). Together they form a catalog of types of objects and concepts called an ontology. After this ontology is populated with terms, the whole corpus of articles and abstracts is marked up to identify terms of these categories. The current ontology comprises 33 categories of terms. A search engine enables the user to search for one or a combination of these tags and/or keywords within a sentence or document, and as the ontology allows word meaning to be queried, it is possible to formulate semantic queries. Full text access increases recall of biological data types from 45% to 95%. Extraction of particular biological facts, such as gene-gene interactions, can be accelerated significantly by ontologies, with Textpresso automatically performing nearly as well as expert curators to identify sentences; in searches for two uniquely named genes and an interaction term, the ontology confers a 3-fold increase of search efficiency. Textpresso currently focuses on Caenorhabditis elegans literature, with 3,800 full text articles and 16,000 abstracts.
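
The markup-and-search idea described above can be illustrated with a minimal Python sketch. It is not Textpresso's implementation: the three-category lexicon, the naive sentence splitter, and the function names `split_sentences`, `mark_up`, and `search` are all assumptions made for illustration, and the real lexicon holds 14,500 entries across 33 categories.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicon: category name -> terms belonging to that category.
LEXICON = {
    "gene": ["lin-12", "glp-1", "let-23"],
    "regulation": ["regulates", "represses", "activates"],
    "association": ["binds", "interacts with"],
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def mark_up(sentence):
    """Return {category: set of lexicon terms found in the sentence}."""
    tags = defaultdict(set)
    lowered = sentence.lower()
    for category, terms in LEXICON.items():
        for term in terms:
            # Match the term only when it is not embedded in a longer token.
            pattern = r"(?<![\w-])" + re.escape(term) + r"(?![\w-])"
            if re.search(pattern, lowered):
                tags[category].add(term)
    return dict(tags)


def search(sentences, required, keywords=()):
    """Return sentences that contain at least required[category] distinct
    terms of each listed category and every plain keyword."""
    hits = []
    for sentence in sentences:
        tags = mark_up(sentence)
        has_categories = all(
            len(tags.get(category, ())) >= count
            for category, count in required.items()
        )
        has_keywords = all(k.lower() in sentence.lower() for k in keywords)
        if has_categories and has_keywords:
            hits.append(sentence)
    return hits


if __name__ == "__main__":
    corpus = (
        "lin-12 activates glp-1 in the gonad. "
        "The let-23 gene was cloned. "
        "Vulval induction requires signaling."
    )
    sentences = split_sentences(corpus)
    # Semantic query: two distinct genes plus one regulation term per sentence.
    for hit in search(sentences, {"gene": 2, "regulation": 1}):
        print(hit)
```

Querying by category rather than by literal word is what makes the search semantic in this sketch: the query asks for "any two genes and any regulation term" without naming them, which mirrors the gene-gene interaction search mentioned in the text.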

Textpresso’s two major elements are a collection of the full text of scientific articles split into individual sentences, and the implementation of categories of terms for which a database of articles and individual sentences can be searched.

We have developed Textpresso, a new text-mining system for scientific literature whose capabilities go far beyond those of a simple keyword search engine.
