Authors: Juliane Fluck, Sumit Madan, Sam Ansari, Justyna Szostak, Julia Hoeng, Marc Zimmermann, Martin Hofmann-Apitius, Manuel C. Peitsch
Abstract: To extract networks for systems biology from the literature, we composed a UIMA-based extraction workflow that combines several named entity recognition processes with different relation extraction methods. The Unstructured Information Management Architecture (UIMA) is a Java-based framework for assembling complex workflows from a set of NLP components. The new system processes scientific articles and writes statements in the open-access Biological Expression Language (OpenBEL) as output. OpenBEL is a machine- and human-readable language with defined knowledge statements that can be used for knowledge representation, causal reasoning, hypothesis generation, and the assembly of causal biological network models, which in turn enable reliable quantification of perturbations within these networks. To support curation of the automatically derived OpenBEL statements, our workflow integrates a curation interface that provides access to the BEL statements generated by text mining and presents supporting information to facilitate manual curation. With this semi-automated curation pipeline, the expert time needed to model relevant causal relationships in BEL could be significantly reduced. This paper describes the UIMA workflow and the key features of the curation interface.
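As a rough illustration of the idea of chaining named entity recognition and relation extraction to produce BEL-style output, the following is a minimal sketch in Python. It is not the UIMA workflow described in the paper. The dictionary `GENE_DICTIONARY`, the trigger list `TRIGGERS`, and the functions `recognize_entities` and `extract_bel` are all hypothetical and chosen only for illustration; a real system would rely on far richer terminologies and extraction methods.

```python
import re
from dataclasses import dataclass

# Hypothetical toy dictionary: surface form -> namespaced identifier.
GENE_DICTIONARY = {"TNF": "HGNC:TNF", "IL6": "HGNC:IL6"}

# Hypothetical trigger words mapped to BEL relationship types.
TRIGGERS = {
    "increases": "increases",
    "induces": "increases",
    "decreases": "decreases",
    "inhibits": "decreases",
}


@dataclass
class Entity:
    text: str        # matched surface form
    identifier: str  # namespaced identifier from the dictionary
    start: int       # character offset in the sentence


def recognize_entities(sentence):
    """Dictionary-based NER: find every dictionary term in the sentence."""
    entities = []
    for term, identifier in GENE_DICTIONARY.items():
        for match in re.finditer(r"\b" + re.escape(term) + r"\b", sentence):
            entities.append(Entity(term, identifier, match.start()))
    return sorted(entities, key=lambda e: e.start)


def extract_bel(sentence):
    """Emit one BEL-style statement per adjacent entity pair
    that has a trigger word between the two mentions."""
    entities = recognize_entities(sentence)
    statements = []
    for subj, obj in zip(entities, entities[1:]):
        between = sentence[subj.start + len(subj.text):obj.start].lower()
        for trigger, relation in TRIGGERS.items():
            if re.search(r"\b" + trigger + r"\b", between):
                statements.append(
                    f"p({subj.identifier}) {relation} p({obj.identifier})"
                )
                break
    return statements


print(extract_bel("TNF induces IL6 expression in macrophages."))
```

Statements produced this way are only candidates. In the setting the abstract describes, such automatically derived statements would still be reviewed by an expert in the curation interface before being used in a network model.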