Here's the first focused book that puts the full range of cutting-edge biological text mining techniques and tools at your command. This comprehensive volume describes the methods of natural language processing (NLP) and their applications in the biological domain, and spells out in detail the various lexical, terminological, and ontological resources now at your disposal, and how best to use them. You see how terminology management tools like term extraction and term structuring facilitate effective mining, and learn ways to readily identify biomedical named entities and abbreviations. The book offers step-by-step guidance on implementing various information extraction methods for biological applications, from pattern matching and full parsing approaches to sublanguage- and ontology-driven extraction techniques. It discusses strategies to make the most of text collections and to use corpora and corpus annotation efficiently in text mining applications, and also gives you tested guidelines for evaluating and optimizing text mining systems. Rounding out the volume are techniques for integrating text mining and data mining efforts to further facilitate biological analyses. Both a critical review of the state of the art and a solution-focused guide packed with field-tested expertise and advice, this first-of-its-kind work will prove indispensable whether you're long experienced with text mining from biomedical literature or entirely new to the field.
Table of Contents
Introduction to Text Mining for Biology and Biomedicine - Text Mining: Aims, Challenges and Solutions. Outline of the Book. References.
Levels of Natural Language Processing for Text Mining - Introduction. The Lexical Level of Natural Language Processing. The Syntactic Level of Natural Language Processing. The Semantic Level of Natural Language Processing. Natural Language System Architecture for Text Mining. Conclusions and Outlook. References.
Lexical, Terminological and Ontological Resources for Biological Text Mining - Introduction. Extended Example. Lexical Resources. Terminological Resources. Ontological Resources. Issues Related to Entity Recognition. Issues Related to Relation Extraction. Conclusion. References.
Automatic Terminology Management in Biomedicine - Introduction. Terminological Resources in Biomedicine. Automatic Terminology Management. Automatic Term Recognition. Dealing with Term Variation and Ambiguity. Automatic Term Structuring. Examples of Automatic Term Management Systems. Conclusion. References.
Abbreviations in Biomedical Text - Introduction. Identifying Abbreviations. Normalizing Abbreviations. Defining Abbreviations in Text. Abbreviation Databases. Conclusions. References.
Named Entity Recognition - Introduction. Biomedical Named Entities. Issues in Gene/Protein Name Recognition. Approaches to Gene and Protein Name Recognition. Discussion. Conclusion. References.
Information Extraction - Information Extraction: The Task. The Message Understanding Conferences. Approaches to Information Extraction in Biology. Conclusion. References.
Corpora and their Annotation - Introduction. Literature Databases in Biology. Corpora. Corpus Annotation in Biology. Issues on Manual Annotation. Annotation Tools. Conclusion.
Evaluation of Text Mining in Biology - Introduction. Why Evaluate? What to Evaluate? Current Assessments for Text Mining in Biology. What Next? References.
Integrating Text Mining with Data Mining - Introduction: Biological Sequence Analysis and Text Mining. Gene Expression Analysis and Text Mining. Conclusion. References.
Sophia Ananiadou is deputy director of the National Centre for Text Mining and a reader in text mining at the School of Informatics at the University of Manchester.
John McNaught is associate director of the National Centre for Text Mining and a lecturer in informatics at the University of Manchester.