Rachel Buurma DH Seminar
Questions
- what training do we need? My sense is some coding but more stats.
- for teaching: how do we know when it's worked?
- what role did this sort of work play in your graduate studies? How do you envision us integrating projects like this with more traditional and institutionally legible graduate work?
Early Novels Database
Dataset
- https://github.com/earlynovels/end-dataset
- It consists of MARC catalog records enriched with custom subfields designed to offer new kinds of structured data about early fiction in English. The END dataset is comprised of high-quality, human-generated metadata that captures a much fuller range of edition- and copy-specific information about early novels than traditional library catalog records.
Readings
Moretti
- 135: the title is where the novel as language meets the novel as commodity
- 136: quantitative stylistics...strategies by which titles point to specific genres
- 139: [the shortening of C18 novel titles over time] ...as the number of new novels kept increasing, each of them had inevitably a much smaller window of visibility on the market, and it became vital for a title to catch quickly and effectively the eye of the public. Summaries were not good at that.
- 141: The market expands, and titles contract.
- Titles allow us to see a larger literary field...the first thing we see, at this moment in history, is the force of the market: how its growth creates a major constraint on the presentation of novels.
- But the trouble is, we literary historians don't really know how to think about what is frequent and small and slow; that's what makes it so hard to study the literary field as a whole: we must learn to find meaning in small changes and slow processes.
- 145: "A large change in size inevitably carries with it a change of form," wrote JBS Haldane, and here one sees how right he was: a title with 20 words and one with two are not the same creature...Different styles...
- 146: ...the adjective does not specify the semantic field; it transforms it.
- 151: That titles became short is interesting, but in the end, so what? That by becoming short they adopted a signifying strategy that made readers look for a unity in the narrative structure - this is a perceptual shift which has persisted for two hundred years. And mediocre conservative writers did more to make it happen than anything else.
- 158: There are differences, of course, between the history of nature and that of culture: the "fossils" of literary evolution are often not lost, but carefully preserved in some great library, like most of those 7000 novels I have discussed here; but for the purposes of our knowledge, it's as if they too had crumbled into dust, because we have never really tried to read the entire volume of the literary past. Studying titles is a small step in that direction.
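To make Moretti's "the market expands, and titles contract" concrete for myself: a minimal sketch of how one might measure title length by decade. The titles and years below are invented placeholders, not drawn from Moretti's data or from the END dataset, and the function names are my own.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (year, title) records, invented for illustration only.
SAMPLE_TITLES = [
    (1742, "The History of the Adventures of a Young Lady of Quality Written by Herself"),
    (1748, "Memoirs of a Gentleman Lately Returned from Abroad in a Series of Letters"),
    (1795, "The Orphan of the Abbey"),
    (1818, "Marianne"),
    (1819, "The Recluse"),
]


def title_length(title):
    """Count whitespace-separated words in a title."""
    return len(title.split())


def mean_length_by_decade(records):
    """Return {decade: mean title length in words}, ordered by decade."""
    lengths = defaultdict(list)
    for year, title in records:
        decade = (year // 10) * 10
        lengths[decade].append(title_length(title))
    return {decade: mean(values) for decade, values in sorted(lengths.items())}


if __name__ == "__main__":
    for decade, avg in mean_length_by_decade(SAMPLE_TITLES).items():
        print(f"{decade}s: {avg:.1f} words")
```

Counting words by whitespace is the crudest possible measure; a real study would have to decide how to treat subtitles, punctuation, and "or" alternatives.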
Underwood
http://dhdebates.gc.cuny.edu/debates/text/95
- In fact, as Katherine Bode has noted, the questions posed by distant readers are often continuous with the older tradition of book history (Reading); as Jim English has noted, they are also continuous with the sociology of literature (“Everywhere”).
- Distant reading is better understood as part of a broad intellectual shift that has also been transforming the social sciences. The
- In the twentieth century, the difficulty of representing unstructured text divided the quantitative social sciences from the humanities. Sociologists
- But much of the momentum it acquired over the last decade came from the same representational strategies that are transforming social science. Instead of simply counting words or volumes, distant readers increasingly treat writing as a field of relations to be modeled, using equations that connect linguistic variables to social ones
- Conversation of this kind amounts to an empty contest of slogans between the humanities and social sciences, and I think Thomas Piketty spends the right amount of time on those contests: “Disciplinary disputes and turf wars are of little or no importance” (Capital, 33).
- A grad student could do a lot of damage to received ideas with a thousand novels, manually gathered metadata, and logistic regression.
- What really matter, I think, are not new tools but three general principles. First, a negative principle: there’s simply a lot we don’t know about literary history above the scale of (say) a hundred volumes. We’ve become so used to ignorance at this scale, and so good at bluffing our way around it, that we tend to overestimate our actual knowledge. Second, the theoretical foundation for macroscopic research isn’t something we have to invent from scratch; we can learn a lot from computational social science. (The notion of a statistical model, for instance, is a good place to start.) The third thing that matters, of course, is getting at the texts themselves, on a scale that can generate new perspectives. This is probably where our collaborative energies could most fruitfully be focused. The tools we’re going to need are not usually specific to the humanities. But the corpora often are.
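Since Underwood names logistic regression and "equations that connect linguistic variables to social ones," here is a rough sketch of what that looks like at its smallest: one linguistic variable (title length in words) used to predict one binary label. The data are invented, the label ("published in 1800 or later") is my own choice of example, and the plain gradient-descent fit stands in for whatever library a real project would use.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.01, epochs=20000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # derivative of log loss w.r.t. the linear score
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Predicted probability that y = 1 for feature value x."""
    return sigmoid(w * x + b)


# Hypothetical data: title length in words, and 1 if "published in 1800 or later".
LENGTHS = [1, 2, 3, 4, 6, 8, 10, 12, 15, 18]
LATE = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

if __name__ == "__main__":
    w, b = fit_logistic(LENGTHS, LATE)
    print(f"w = {w:.3f}, b = {b:.3f}")
    for length in (2, 15):
        print(f"{length}-word title: p(late) = {predict(w, b, length):.2f}")
```

A negative weight here would mean that longer titles make the "late" label less likely, which is the kind of claim that can then be tested against held-out evidence.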
Goldstone, "Teaching Quant Methods"
- His lessons:
1. Cultivating technical facility with computer tools—including programming languages—should receive less attention than methodologies for analyzing quantitative or aggregative evidence. Despite the widespread DH interest in the former, it has little scholarly use without the latter.
2. Studying method requires pedagogically suitable material for study, but good teaching datasets do not exist. It will require communal effort to create them on the basis of existing research.
3. Following the “theory” model, DH has typically been inserted into curricula as a single-semester course. Yet as a training in method, the analysis of aggregate data will undoubtedly require more time, and a different rationale, than that offered by what Gerald Graff calls “the field-coverage principle” in the curriculum.
Allison, "Other People's Data"
http://culturalanalytics.org/2016/12/other-peoples-data-humanities-edition/
- Computational analysis of large corpora is a time-consuming process, and a lot of analysis ends up on the cutting room floor (or on the blog, or in a footnote or an appendix). We need to make better use of that discarded data[.]
- A topic model of selected criticism is something like an argument and something like an archive. Knitting it into a history as recognizable as Graff's transforms the "data" back into argument, which might be built on or expanded in more traditionally argumentative ways.
- really interesting discussion of Tait's being excluded from Underwood/Sellers model: "In the second, the study of volumes reviewed by nineteenth-century periodicals looks very much like a conventional periodical study, except that the foundational insight that frames it is drawn from a stylistic observation of great scope."
- It's time to reconsider what it means to build on other people's work.
Goldstone, "From Reproducible to Productive"
- Her essay [Allison's, above] describes a vision of cumulative research which I hope to see realized. Of course, it is particularly gratifying that the research she envisions building upon includes my own. But the real point, for Allison, is not that Ted Underwood and I said something convincing but that we—like Underwood and Jordan Sellers in their collaboration—produced reusable evidence. She provokes us to rethink the conditions in which such reuse could be possible for individual researchers and valued in our disciplines.
- the uses of the "byproducts of cultural analytics"
- But a probabilistic topic model ought normally to be understood like any other statistical model, as a selective picture of data rather than primary data itself.
- Either quantitative studies of culture will make claims that can be defeated by evidence, or they will devolve into games with computers.
MLA Digital Pedagogy
Houston, Text Analysis
- In humanities research, these steps are often iterative and recursive and are rarely labeled as hypothesis, data collection, experimentation, analysis, and argument. Instead, all of these things are called reading. This conflation of very different activities under one word has heightened recent debates between data driven approaches to large scale analysis, what Franco Moretti has termed distant reading, and the traditional formalist and hermeneutic approach called literary close reading (Moretti, Trumpener, Goodwin and Holbo). If reading is often hailed as a specific kind of pleasurable, human activity, the term text analysis may seem in contrast to emphasize statistical approaches to quantifiable aspects of language (Hoover; Jockers 25). The specific disciplinary and institutional histories of computer-assisted text analysis, humanities computing, and computational linguistics variously intersect and diverge from those of literary studies more generally (Rockwell, Jockers, Ramsay 2011, Bonelli).
- But other scholars have argued that computational analysis merely makes explicit the codes and rules already embedded in the nature of textuality itself. Michael Witmore explains: "I would argue that a text is a text because it is massively addressable at different levels of scale. Addressable here means that one can query a position within the text at a certain level of abstraction."
- unclear on addressable?
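One possible reading of "addressable," sketched in code: the same text can be split into units at different levels (characters, words, sentences), and a position can be queried at whichever level is chosen. This is my own interpretation, not Witmore's definition; the sample text, the three levels, and the `address` function are assumptions for illustration.

```python
import re

# Sample text made up for this sketch.
SAMPLE = "The market expands. Titles contract."


def units(text, level):
    """Split a text into addressable units at the given level of abstraction."""
    if level == "char":
        return list(text)
    if level == "word":
        return re.findall(r"\w+", text)
    if level == "sentence":
        # Split after sentence-final punctuation followed by whitespace.
        return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    raise ValueError(f"unknown level: {level}")


def address(text, level, index):
    """Query one position within the text at a chosen level."""
    return units(text, level)[index]


if __name__ == "__main__":
    print(address(SAMPLE, "char", 0))
    print(address(SAMPLE, "word", 3))
    print(address(SAMPLE, "sentence", 1))
```

Witmore's levels presumably extend well beyond these three (genre, edition, corpus), but the principle of querying a position at a chosen scale would be the same.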
Klein, Code
- It follows, then, that any instructor—in the humanities or otherwise—must first ask herself what she hopes her students will accomplish by learning to code. Is it an understanding of how to think algorithmically, so as to better comprehend how certain tasks can be abstracted into a series of steps? Is it a familiarity with the basic components of programming languages, so as to be able to understand how code is structured and produced? Is it the knowledge of a specialized programming language, one with specific applications in a particular field? Or is it the more experiential knowledge of what it feels like to move from defining functions and assigning variables to running executable code?
- But a pedagogy of code in the humanities must also explore the intersection of the technical and the theoretical as expressed in or through code itself—for instance, how we might locate certain literary or cultural concepts at work in the structure or syntax of code; or how we might ourselves write code that advances an argument, embodies a theory, or otherwise performs humanistic critique.
Forster, Fiction
- Teaching fiction often means trying to understand this peculiar power: how does the fiction—the ostensibly “untrue”—nevertheless become powerful through its representations? In practice, this involves questions of narrative and plot, a sensitivity to form and style, and the histories (material and conceptual) of all of these terms. Why does a story feel like it has a sense of closure? When does it not? And what are the meanings—political and otherwise—of such closure? What are the devices by which we come to recognize and view a story from a particular perspective? How does a single work of fiction’s participation in a genre shape our expectations and our understanding of the story? Moreover, studying fiction also invites reflection on medium; as the “same” story moves across genres—a process amply illustrated, for instance, by the many versions (dramatic, illustrated, filmed, etc) of Bram Stoker’s Dracula which began appearing soon after its initial publication. Teaching of fiction, therefore, requires weaving together close attention to the histories and textures of particular works, with larger threads: longer political and social histories; the rise and fall of genres; and narrative structures broader than any single medium or genre.
Look up
- Andrew Goldstone and Ted Underwood, "The Quiet Transformations of Literary Studies: What Thirteen Thousand Scholars Could Tell Us," New Literary History 45, no. 3 (2014): 359-384.
- Underwood and Sellers, "How Quickly Do Literary Standards Change?," 32: https://figshare.com/articles/How_Quickly_Do_Literary_Standards_Change_/1418394