Rachel Buurma DH Seminar
Revision as of 13:43, 27 April 2017
Questions
- what training do we need? My sense is some coding but more stats.
- Goldstone, "Teaching Quant Methods," echoes this
- for teaching: how do we know when it's worked?
- what role did this sort of work play in your graduate studies? How do you envision us integrating projects like this with more traditional and institutionally legible graduate work?
- More conceptual: What ways might you see for using this type of data to understand the relationship between bibliographical and narrative form? We can talk anecdotally about the effect serialization has on Bleak House as a narrative, but how might a dataset like END (or quantitative methods I'm unfamiliar with) help to understand that relationship more broadly?
Early Novels Database
Dataset
- https://github.com/earlynovels/end-dataset
- It consists of MARC catalog records enriched with custom subfields designed to offer new kinds of structured data about early fiction in English. The END dataset comprises high-quality, human-generated metadata that captures a much fuller range of edition- and copy-specific information about early novels than traditional library catalog records.
- Using library standards like MARCXML makes it easier to plug the data back into existing systems; whatever their problems, those standards are very durable (though clunky)
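Because the dataset is distributed in standard MARC form, the records can be read with nothing more than the standard library. A minimal sketch, using an invented sample record in the MARC21 slim XML schema (the actual END records carry different content and custom subfields):

```python
# Minimal MARCXML reading with the standard library. The sample record
# below is invented for illustration; real END records
# (github.com/earlynovels/end-dataset) use the same MARC21 slim schema
# but different fields and custom subfields.
import xml.etree.ElementTree as ET

MARC_NS = "{http://www.loc.gov/MARC21/slim}"

SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Belinda :</subfield>
    <subfield code="b">a tale, in three volumes.</subfield>
  </datafield>
  <datafield tag="500" ind1=" " ind2=" ">
    <subfield code="a">Preface signed by the author.</subfield>
  </datafield>
</record>"""

def fields(record, tag):
    """Yield the concatenated subfield text of every datafield with `tag`."""
    for df in record.iter(MARC_NS + "datafield"):
        if df.get("tag") == tag:
            yield " ".join(sf.text for sf in df.iter(MARC_NS + "subfield"))

record = ET.fromstring(SAMPLE)
title = next(fields(record, "245"))   # 245 = title statement
notes = list(fields(record, "500"))   # 500 = general note
print(title)
print(notes)
```

The same `fields` helper would pull out the 500 note fields the notes below describe as a place where flattened detail can be re-encoded.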
Readings
Moretti
- 135: the title is where the novel as language meets the novel as commodity
- 136: quantitative stylistics...strategies by which titles point to specific genres
- 139: [the shortening of C18 novel titles over time] ...as the number of new novels kept increasing, each of them had inevitably a much smaller window of visibility on the market, and it became vital for a title to catch quickly and effectively the eye of the public. Summaries were not good at that.
- 141: The market expands, and titles contract.
- Titles allow us to see a larger literary field...the first thing we see, at this moment in history, is the force of the market: how its growth creates a major constraint on the presentation of novels.
- But the trouble is, we literary historians don't really know how to think about what is frequent and small and slow; that's what makes it so hard to study the literary field as a whole: we must learn to find meaning in small changes and slow processes.
- 145: "A large change in size inevitably carries with it a change of form," wrote JBS Haldane, and here one sees how right he was: a title with 20 words and one with two are not the same creature...Different styles...
- 146: ...the adjective does not specify the semantic field; it transforms it.
- 151: That titles became short is interesting, but in the end, so what? That by becoming short they adopted a signifying strategy that made readers look for a unity in the narrative structure - this is a perceptual shift which has persisted for two hundred years. And mediocre conservative writers did more to make it happen than anything else.
- 158: There are differences, of course, between the history of nature and that of culture: the "fossils" of literary evolution are often not lost, but carefully preserved in some great library, like most of those 7000 novels I have discussed here; but for the purposes of our knowledge, it's as if they too had crumbled into dust, because we have never really tried to read the entire volume of the literary past. Studying titles is a small step in that direction.
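Moretti's "the market expands, and titles contract" is, mechanically, an aggregate claim: mean title length per period falls. A toy sketch of that computation — the (year, title) pairs are a handful of well-known examples chosen for illustration, not Moretti's 7,000-title corpus:

```python
# Toy sketch of the aggregate behind "the market expands, and titles
# contract": mean title word-count per decade over a tiny sample.
from collections import defaultdict
from statistics import mean

toy_titles = [
    (1741, "Pamela; or, Virtue Rewarded. In a Series of Familiar Letters "
           "from a Beautiful Young Damsel to her Parents"),
    (1749, "The History of Tom Jones, a Foundling"),
    (1794, "The Mysteries of Udolpho"),
    (1814, "Waverley"),
    (1818, "Persuasion"),
]

# Group word counts by decade of publication.
by_decade = defaultdict(list)
for year, title in toy_titles:
    by_decade[year // 10 * 10].append(len(title.split()))

for decade in sorted(by_decade):
    print(decade, round(mean(by_decade[decade]), 1))
```

The interesting methodological work, as Moretti's pages make clear, is not this arithmetic but deciding what corpus feeds it and what the contraction signifies.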
Underwood
http://dhdebates.gc.cuny.edu/debates/text/95
- In fact, as Katherine Bode has noted, the questions posed by distant readers are often continuous with the older tradition of book history (Reading); as Jim English has noted, they are also continuous with the sociology of literature (“Everywhere”).
- Distant reading is better understood as part of a broad intellectual shift that has also been transforming the social sciences. ...
- In the twentieth century, the difficulty of representing unstructured text divided the quantitative social sciences from the humanities. ...
- But much of the momentum it acquired over the last decade came from the same representational strategies that are transforming social science. Instead of simply counting words or volumes, distant readers increasingly treat writing as a field of relations to be modeled, using equations that connect linguistic variables to social ones
- Conversation of this kind amounts to an empty contest of slogans between the humanities and social sciences, and I think Thomas Piketty spends the right amount of time on those contests: “Disciplinary disputes and turf wars are of little or no importance” (Capital, 33).
- A grad student could do a lot of damage to received ideas with a thousand novels, manually gathered metadata, and logistic regression.
- What really matter, I think, are not new tools but three general principles. First, a negative principle: there’s simply a lot we don’t know about literary history above the scale of (say) a hundred volumes. We’ve become so used to ignorance at this scale, and so good at bluffing our way around it, that we tend to overestimate our actual knowledge. Second, the theoretical foundation for macroscopic research isn’t something we have to invent from scratch; we can learn a lot from computational social science. (The notion of a statistical model, for instance, is a good place to start.) The third thing that matters, of course, is getting at the texts themselves, on a scale that can generate new perspectives. This is probably where our collaborative energies could most fruitfully be focused. The tools we’re going to need are not usually specific to the humanities. But the corpora often are.
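Underwood's "a thousand novels, manually gathered metadata, and logistic regression" names a deliberately modest toolkit. As a sketch of the modeling half, here is a stdlib-only logistic regression fit by gradient descent; the features and labels are invented toy values, standing in for metadata-derived predictors and a binary outcome (say, reviewed vs. not reviewed):

```python
# Stdlib-only sketch of logistic regression fit by gradient descent.
# Data are invented toy values: each x is [bias term, one scaled feature].
import math

def predict(weights, x):
    """Sigmoid of the dot product: probability of the positive class."""
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def fit(xs, ys, lr=0.1, epochs=2000):
    """Per-example gradient descent on the log-loss."""
    weights = [0.0] * len(xs[0])
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = predict(weights, x) - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
    return weights

xs = [[1, 0.2], [1, 0.4], [1, 0.6], [1, 0.8]]
ys = [0, 0, 1, 1]
w = fit(xs, ys)
print([round(predict(w, x), 2) for x in xs])
```

In practice one would reach for a library such as scikit-learn, but the model itself is just this: a weighted sum passed through a sigmoid, with weights chosen to fit the labels — which is part of Underwood's point about how little machinery the work requires.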
Goldstone, "Teaching Quant Methods"
- His lessons:
1. Cultivating technical facility with computer tools—including programming languages—should receive less attention than methodologies for analyzing quantitative or aggregative evidence. Despite the widespread DH interest in the former, it has little scholarly use without the latter.
2. Studying method requires pedagogically suitable material for study, but good teaching datasets do not exist. It will require communal effort to create them on the basis of existing research.
3. Following the "theory" model, DH has typically been inserted into curricula as a single-semester course. Yet as a training in method, the analysis of aggregate data will undoubtedly require more time, and a different rationale, than that offered by what Gerald Graff calls "the field-coverage principle" in the curriculum.
- 7: The distinction between the tool user and the heroically flexible programmer is not so clear...programming competence is not competence in analytical methods.
- 9: important trade-off: "effort is out of all proportion to the result"
- ...what students need...is data about which at least some answers have already been given
- connection to Allison: reuse to sponsor learning
- 10 Better to start from tamed data...in order to focus on methods for answering substantive questions.
- 11: Better models for the pedagogical rationale of quantitative methods are found in the other intensively skills-based courses in graduate education: premodern languages and bibliography.
- 12: On the contrary, DH should be wary of promises of ease: in prepackaged tools, in well-meaning introductory tutorials and workshops that necessarily stop short of what a researcher would need to draw conclusions, in rationalizations of inconclusive arguments as exploration, play, or productive failure.
Allison, "Other People's Data"
http://culturalanalytics.org/2016/12/other-peoples-data-humanities-edition/
- Computational analysis of large corpora is a time-consuming process, and a lot of analysis ends up on the cutting room floor (or on the blog, or in a footnote or an appendix). We need to make better use of that discarded data[.]
- A topic model of selected criticism is something like an argument and something like an archive. Knitting it into a history as recognizable as Graff's transforms the "data" back into argument, which might be built on or expanded in more traditionally argumentative ways.
- really interesting discussion of Tait's being excluded from Underwood/Sellers model: "In the second, the study of volumes reviewed by nineteenth-century periodicals looks very much like a conventional periodical study, except that the foundational insight that frames it is drawn from a stylistic observation of great scope."
- It's time to reconsider what it means to build on other people's work.
Goldstone, "From Reproducible to Productive"
- Her essay [Allison's, above] describes a vision of cumulative research which I hope to see realized. Of course, it is particularly gratifying that the research she envisions building upon includes my own. But the real point, for Allison, is not that Ted Underwood and I said something convincing but that we—like Underwood and Jordan Sellers in their collaboration—produced reusable evidence. She provokes us to rethink the conditions in which such reuse could be possible for individual researchers and valued in our disciplines.
- the uses of the "byproducts of cultural analytics"
- But a probabilistic topic model ought normally to be understood like any other statistical model, as a selective picture of data rather than primary data itself.
- Either quantitative studies of culture will make claims that can be defeated by evidence, or they will devolve into games with computers.
MLA Digital Pedagogy
Houston, Text Analysis
- In humanities research, these steps are often iterative and recursive and are rarely labeled as hypothesis, data collection, experimentation, analysis, and argument. Instead, all of these things are called reading. This conflation of very different activities under one word has heightened recent debates between data-driven approaches to large-scale analysis, what Franco Moretti has termed distant reading, and the traditional formalist and hermeneutic approach called literary close reading (Moretti, Trumpener, Goodwin and Holbo). If reading is often hailed as a specific kind of pleasurable, human activity, the term text analysis may seem in contrast to emphasize statistical approaches to quantifiable aspects of language (Hoover; Jockers 25). The specific disciplinary and institutional histories of computer-assisted text analysis, humanities computing, and computational linguistics variously intersect and diverge from those of literary studies more generally (Rockwell, Jockers, Ramsay 2011, Bonelli).
- But other scholars have argued that computational analysis merely makes explicit the codes and rules already embedded in the nature of textuality itself. Michael Witmore explains: "I would argue that a text is a text because it is massively addressable at different levels of scale. Addressable here means that one can query a position within the text at a certain level of abstraction."
- unclear on addressable?
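One way to concretize Witmore's "addressable": the same string can be queried positionally at several levels of scale, each a different abstraction over identical data. A small illustration (the sample sentence is Dickens's, used only as an example):

```python
# A concrete gloss on Witmore's "massively addressable": one string,
# queried at different levels of scale.
text = "It was the best of times. It was the worst of times."

levels = {
    "char":     list(text),
    "word":     text.split(),
    "sentence": [s.strip() for s in text.split(".") if s.strip()],
    "document": [text],
}

# Each level supports the same kind of positional query, at a
# different degree of abstraction.
print(levels["word"][3])      # "best"
print(levels["sentence"][1])  # "It was the worst of times"
```

On this view, close and distant reading differ in which level they address, not in what the text is — which speaks to Houston's point about "reading" covering very different activities.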
Klein, Code
- It follows, then, that any instructor—in the humanities or otherwise—must first ask herself what she hopes her students will accomplish by learning to code. Is it an understanding of how to think algorithmically, so as to better comprehend how certain tasks can be abstracted into a series of steps? Is it a familiarity with the basic components of programming languages, so as to be able to understand how code is structured and produced? Is it the knowledge of a specialized programming language, one with specific applications in a particular field? Or is it the more experiential knowledge of what it feels like to move from defining functions and assigning variables to running executable code?
- But a pedagogy of code in the humanities must also explore the intersection of the technical and the theoretical as expressed in or through code itself—for instance, how we might locate certain literary or cultural concepts at work in the structure or syntax of code; or how we might ourselves write code that advances an argument, embodies a theory, or otherwise performs humanistic critique.
Forster, Fiction
- Teaching fiction often means trying to understand this peculiar power: how does the fiction—the ostensibly “untrue”—nevertheless become powerful through its representations? In practice, this involves questions of narrative and plot, a sensitivity to form and style, and the histories (material and conceptual) of all of these terms. Why does a story feel like it has a sense of closure? When does it not? And what are the meanings—political and otherwise—of such closure? What are the devices by which we come to recognize and view a story from a particular perspective? How does a single work of fiction’s participation in a genre shape our expectations and our understanding of the story? Moreover, studying fiction also invites reflection on medium; as the “same” story moves across genres—a process amply illustrated, for instance, by the many versions (dramatic, illustrated, filmed, etc.) of Bram Stoker’s Dracula which began appearing soon after its initial publication. Teaching of fiction, therefore, requires weaving together close attention to the histories and textures of particular works, with larger threads: longer political and social histories; the rise and fall of genres; and narrative structures broader than any single medium or genre.
Look up
- Andrew Goldstone and Ted Underwood, "The Quiet Transformations of Literary Studies: What Thirteen Thousand Scholars Could Tell Us," New Literary History 45, no. 3 (2014): 359-384.
- Underwood and Sellers, "How Quickly Do Literary Standards Change?," 32: https://figshare.com/articles/How_Quickly_Do_Literary_Standards_Change_/1418394
- Jockers, Text Analysis with R for Students of Literature
- The Programming Historian
- Karsdorp, Python Programming for the Humanities
- Klein, Lauren F. “The Image of Absence: Archival Silence, Data Visualization, and James Hemings.” American Literature 85, no. 4 (December 2013): 661–88.
Seminar Notes
- NLH article with Laura Heffernan - disciplinary history - look up
- Moretti, Underwood, Ramsay
  - exploratory data work vs. arguments with data
  - data viz - powerful - instantly authenticating, esp. when the ability to read it is variable in humanities
  - is "methodological continuum" a neoliberal scheme?
  - Underwood: it's a method moment; the method conversation about quant work is interesting, but the DH bubble may not last (though the methodological one will - there's a long history of computational method)
  - re-understanding the long history of quant work is the promise of DH - the value and impact of qualitative work, thinking about them together
  - Underwood, Periods - principle of historical contrast - that sort of frame for thinking could be reframed
- Learning statistics tools
  - should we trust the guides for humanists?
  - Stephen Ramsay, Math for Humanists
  - the "state of the discipline" question is important - how do you do it institutionally in an effective way?
  - collaboration - again, Ryan's dictum: knowing enough so coders can't bullshit you
- Close vs. Distant
  - iterative between those - the type of work RB does
  - understanding texts as a type of data already
  - critical information science
  - distant reading is literary history, not criticism: two separate projects
    - what Moretti does is old historicism with a bigger corpus
  - Leah Price - 20th Century - how "reading" came to mean "interpretation"
- Moretti
  - the copyright issue is latent and huge
  - runs into a wall: I can only get to know the long titles and short
  - based in traditional bibliographic methods
  - read this against the Nulty: http://paulnulty.net/wp-content/uploads/2016/11/booktitles1.html
  - taking on the position of the critic but doing literary history
  - reifying C20 first edition-specific bibliographical practices by using that data (vs. now more copy-specific)
    - how copy-specific your corpus is can change the nature of the questions you can ask (having two different editions of Belinda makes that come up more often) - the individual copy is the base record
  - what we think of as a C18 novel is an early C20 concept
  - first eds mean almost nothing in terms of how works accrue meaning, but Moretti privileges the initial synchronic agency of the publisher's reading
    - a reader's corpus from the early C19 would be completely different (as William St. Clair shows, readers didn't always read the newly published stuff)
    - END is on the way to a reading corpus, more so than Moretti - the collection is the corpus vs. a "representative corpus"
- END
  - prefaces are where literary theory happens in the C18, and this dataset can take a subset that privileges paratext as the fundamental record rather than the title
  - the poetics of the encounter between C18 knowledge organization, C20 library technologies, and C21 thinking about quantification
    - using controlled vocabularies to turn literature into data: flattening, but potentially re-encoding in a 500 note field
    - a history of the history of these novels being catalogued in these records - the ongoing encounter between literature and information
- Pedagogy
  - making the work you're doing legible to a discourse of originality and methodological innovation - important for publication, institutional visibility
  - Goldstone: DH is the inheritor of skills-based courses like learning a language or bibliography
    - pull apart programming and quant work: there are intellectually interesting things to learn from critically reading code, too
  - Book history: learn how to hold a book before XML; the C18 novel; rare books cataloguers; descriptive bibliography (trickier than XML)
  - Miriam Posner - can you make it
  - Book history and DH go together well: trying to describe the same book, producing a controlled description