Text Analysis with R for Students of Literature by Matthew L. Jockers
Author: Matthew L. Jockers
Language: eng
Publisher: Springer International Publishing, Cham
10.2 The Text Encoding Initiative (TEI)
The Text Encoding Initiative (TEI) offers a document-encoding standard that is commonly used by humanities scholars. The TEI markup scheme provides a way of storing an original text file alongside an almost unlimited amount of metadata. Since the files are extensible and editable, the amount of metadata available is limited only by the encoder’s willingness to modify the documents. Say, for example, that you are collecting novels written by Irish-American and German-American authors. For this project you might have a metadata field in your document where you can indicate the author’s national origins. You may have another field where you indicate the author’s gender, birth date, race, or sexual orientation. Once metadata of this sort is added to the XML files, it can be easily accessed by computer scripts and used, for example, as a sorting facet for a particular type of analysis.
In the rest of this book, you will be working with a corpus of texts that are encoded in TEI-compliant XML. Unlike the plain text files (Moby Dick and Sense and Sensibility) that you have processed thus far, these TEI-XML files contain extra-textual information in the metadata of the <teiHeader> element. To proceed, you must be able to parse the XML and extract the metadata while also separating the actual text of the book from the marked-up apparatus around it. In short, you need to know how to parse XML in R.
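To make the task concrete, here is a minimal sketch of what parsing a TEI file in R can look like. It assumes the XML package is installed and uses a tiny, made-up TEI document held in a string; the title, author, and paragraphs are hypothetical, and the header is simplified (a schema-valid <teiHeader> needs more elements). The variable names are illustrative only, and this is not necessarily the approach the book goes on to take.

```r
library(XML)  # assumption: install.packages("XML") has been run

# A small, invented TEI document used only for illustration.
tei_string <- '<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>An Example Novel</title>
        <author>Jane Example</author>
      </titleStmt>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>First paragraph of the novel.</p>
      <p>Second paragraph of the novel.</p>
    </body>
  </text>
</TEI>'

# Parse the string into an internal XML document.
doc <- xmlParse(tei_string, asText = TRUE)

# TEI elements live in a default namespace, so XPath queries
# need a prefix bound to that namespace URI.
ns <- c(tei = "http://www.tei-c.org/ns/1.0")

# Metadata: pull values out of the <teiHeader>.
title <- xpathSApply(doc, "/tei:TEI/tei:teiHeader//tei:title",
                     xmlValue, namespaces = ns)
author <- xpathSApply(doc, "/tei:TEI/tei:teiHeader//tei:author",
                      xmlValue, namespaces = ns)

# Text: collect the paragraphs of the <body>, leaving the header behind.
paras <- xpathSApply(doc, "/tei:TEI/tei:text/tei:body//tei:p",
                     xmlValue, namespaces = ns)
novel_text <- paste(paras, collapse = " ")
```

For a file on disk, the same calls should work with a file path in place of the string and without asText = TRUE. Keeping the header queries separate from the body query mirrors the split described above: metadata on one side, the text of the book on the other.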
