R Basics.- First Foray into Text Analysis with R.- Accessing and Comparing Word Frequency Data.- Token Distribution Analysis.- Correlation.- Measures of Lexical Variety.- Hapax Richness.- Do It KWIC.- Do It KWIC (Better).- Text Quality, Text Variety, and Parsing XML.- Clustering.- Classification.- Topic Modeling.- Appendix A: Variable Scope Example.- Appendix B: The LDA Buffet.- Appendix C: Code Repository.- Appendix D: R Resources.- Practice Exercise Solutions.- Index.
"I can't think of a more qualified person to guide readers through powerful R techniques for text analysis. While extremely useful for people studying literature, these techniques can be also used by anybody working with texts. Even if you simply want to understand how companies and data scientists are analyzing all kinds of texts, go through this book." (Lev Manovich, Department of Computer Science, The Graduate Center, City University of New York & author of The Language of New Media)

"The open source programming language R has become one of the most central statistical and analytical tools in many sciences. While it has already been used in linguistic applications, this book is the first to discuss the application of (corpus-linguistic and other) methods with R in the context of literary studies. The author covers a wide range of descriptive, analytical, and exploratory methods beautifully and in detail in a book that will appeal to a wide and diverse audience of both students and seasoned researchers from literary studies, linguistic computing, and the digital humanities more generally." (Stefan Th. Gries, Department of Linguistics, University of California, Santa Barbara & author of Quantitative corpus linguistics with R: A Practical Introduction)

"This book does a great service for literary scholars interested in computational approaches to text analysis, giving them ready access to powerful methods for exploring patterns and relationships across large quantities of text. Its clear and lucid explanations will also make it an easy textbook to teach from, especially for instructors with prior background who can then use it as a stepping stone to introducing more complex methods. Amateurs and those with little programming background will find it eminently accessible." (Hoyt Long, Department of East Asian Languages and Civilizations, University of Chicago)

"Through my work as an epidemiologist, I encounter electronic health records in an unstructured form (i.e. text), and Text Analysis with R covers many of the initial steps for studying these records. The book is very accessible; it provides a straightforward introduction to manipulating text information without presuming a background in programming or a familiarity with the jargon used in this field. I also appreciated Jockers' thoughtful inclusion of supplemental explanations and information in footnotes throughout the book. For example, text analysis often involves the use of "regular expressions"; a footnote concisely explains wildcard and escape characters and this explanation spared me a fair bit of confusion in my own work. Although I am not a "student of literature", I thought the book contained many generalizable and expertly-taught lessons that make it a valuable introduction to manipulating and analyzing text." (Matthew Maenner, Ph.D.)

"This book is a worthy introduction to computational text analysis, and it fills an important gap in the literature. It's very accessible and contains plenty of interesting examples and real applications, which have been collected and crafted over the many years the author taught text analysis to undergraduate and graduate students. Although it focuses on the study of literature, I would highly recommend this book to students in business administration and related fields." (Joao Quariguasi Frota Neto, School of Management, University of Bath)
The author, Matthew L. Jockers, is Associate Professor of English and Director of the Nebraska Literary Lab at the University of Nebraska in Lincoln. Jockers's text mining research has been featured in the New York Times, Nature, the Chronicle of Higher Education, Wired, New Scientist, Smithsonian, NBC News and many others. Jockers blogs about his research at www.matthewjockers.net.
"The aim of this book is ... to give the Literature students just the most basic tools needed to do some relatively straightforward textual analysis. ... Even though this is primarily a book intended for literature students, I would actually strongly recommend it to anyone interested in text mining, text analysis and natural language processing. It is a very gentle and approachable introduction to the whole world of textual analysis." (Bojan Tunguz, tunguzreview.com, July, 2015)

"This is a well written book on the topic of Text Analysis. There is enough information to give you a good start using R. Followed by easy to understand details about text analysis. ... This is a good book to have if you are doing text analysis." (Mary Anne, Cats and Dogs with Data, maryannedata.com, August, 2014)

"A remarkably well-crafted book that will allow students to get a quick start and progress toward quite sophisticated text mining tasks. ... exercises provided at the end of each chapter, with solutions at the end of the book, should serve well to help students solidify their knowledge and gain more confidence in their text mining skills. ... a great addition to the libraries of digital humanists and natural language enthusiasts who wish to expand their programming literacy ... ." (Denilson Barbosa, Computing Reviews, August, 2014)