Topic Modeling as an Archaeological Dig

I actually suspect that the topics identified by [latent Dirichlet allocation] probably always have the character of “discourses.”
—Ted Underwood, “What kinds of ‘topics’ does topic modeling actually produce?”

The tools that enable historians to carry out this work of analysis are partly inherited and partly of their own making […]. These tools have enabled workers in this historical field to distinguish various sedimentary strata; linear successions, which for so long had been the object of research, have given way to discoveries in depth.
—Michel Foucault, introduction to The Archaeology of Knowledge (tr. A.M. Sheridan Smith), p. 3

I’ve been thinking about topic modeling over the last few weeks as I re-read (the A.M. Sheridan Smith English translation of) Foucault’s The Archaeology of Knowledge, and about what topic modeling has to offer those of us who (more or less) count as methodological Foucauldians. What I’d like to suggest is that, in point of fact, not only is topic modeling a useful exploratory tool for those engaged in constructing Foucauldian archaeologies, but topic modeling is already a Foucauldian methodology. I’ll explain why I think this is the case in a bit, but first want to provide some quick definitions to anchor this discussion.

Topic modeling, as many readers are likely already aware, is a computational technique that aims to algorithmically discover clusters of words — the “topics” of “topic modeling,” though this nomenclature is itself not uncontroversial — in a text or set of texts that “belong together” by virtue of comparatively frequent co-occurrence in the source corpus. It does this by applying one or another variation of an algorithm called “latent Dirichlet allocation,” which constructs the “topics” algorithmically, without human intervention. (A good not-particularly-technical introduction to topic modeling is David Blei’s Topic Modeling and Digital Humanities; an introduction that talks more about how the algorithm actually works is Matt Burton’s The Joy of Topic Modeling. A more mathematically oriented overview is John Mohr and Petko Bogdanov’s Topic Models: What They Are and Why They Matter.) A commonly used piece of software that performs topic-modeling operations — as well as several other types of operations — is MALLET.
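For readers who like to see the moving parts, here is a toy sketch of the same pipeline in Python, using the gensim library as a stand-in for MALLET; the four-“document” corpus and all of the parameters are invented purely for illustration:

from gensim import corpora
from gensim.models import LdaModel

docs = [
    "the old house stood silent beneath the ancient hill",
    "beneath the ancient hill a nameless horror stirred in the silent house",
    "the professor found the manuscript in the university library",
    "the manuscript in the library described the horror the professor feared",
]
texts = [doc.split() for doc in docs]
dictionary = corpora.Dictionary(texts)            # word <-> integer-id mapping
corpus = [dictionary.doc2bow(t) for t in texts]   # each doc as (word-id, count) pairs

# Fit a two-topic model; MALLET's train-topics command plays the same role.
lda = LdaModel(corpus, num_topics=2, id2word=dictionary, passes=50, random_state=1)
for topic_id, words in lda.print_topics(num_words=6):
    print(topic_id, words)

Nothing in the sketch tells the model what a “house” or a “library” is; the clusters fall out of co-occurrence statistics alone, which is exactly the “without a priori subject definitions” point that Block makes below.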

Now, that was a very brief introduction, and not an adequate one by any means; anyone not familiar with topic modeling should probably take a look at one of the articles linked in the last paragraph. But rather than declare that everyone who isn’t already familiar with the methodology needs to go off and do background reading before continuing with this blog post (which would be almost unbearably snooty), I will say that probably the best very concise explanation of what topic modeling is can be found in Sharon Block’s Doing More with Digitization:

Topic modeling is based on the idea that individual documents are made up of one or more topics. It uses emerging technologies in computer science to automatically cluster topically similar documents by determining the groups of words that tend to co-occur in them. Most importantly, topic modeling creates topical categories without a priori subject definitions. This may be the hardest concept to understand about topic modeling: unlike traditional classification systems where texts are fit into preexisting schema (such as Library of Congress subject headings), topic modeling determines the comprehensive list of subjects through its analysis of the word occurrences throughout a corpus of texts. The content of the documents—not a human indexer—determines the topics collectively found in those documents.

Perhaps one reason why Block’s definition is so good is that it is a very early explanation: Matthew Jockers, certainly a person more experienced with topic modeling than I am, says that Block’s 2006 description “is, to my knowledge, the earliest example of topic modeling in the humanities” (123). As is so often the case with new methodologies, its earliest practitioners came up with (at least some of) the best explanations, perhaps in part because they (unlike me) couldn’t depend on their audiences already understanding anything about the methodology.

Discussion of a previous example

I’ve talked previously, a bit, about one of my own experiments with topic modeling, when I let MALLET run over a group of ten short stories by H.P. Lovecraft. There are several things that I take to be significant in these results. Perhaps most obviously, I take them to be fair and valid because the topic-modeling process has identified topics that are, in many cases, similar to the topics that an astute reader of these Lovecraft stories would likely identify, with less numeric data and more effort, as “what these stories are about”: the algorithm passes a basic sanity check, because if the program gave us topical clusters like “banking Jesus roses fedora Samsung puppy snickerdoodle polyethylene,” we would know that something was wrong. The fact that the algorithm is coming up with (more or less) what we expect suggests that it is doing more or less what we expect it to do.

Of course, this produces a separate problem, and those experienced with topic modeling are nodding their heads in advance here: if it just gives us what we expect, what good is it, even if it is accurate? We might just as well be doing the same thing ourselves. And, though one fair potential response might be “yes, but it does so much faster, because your laptop can ‘read’ those ten stories much more quickly than you can,” and though this in itself opens up a lot of possibilities that are currently being explored by intelligent people (a great example is Allison Chaney and David Blei’s Visualizing Topic Models project, which includes an example working over a large number of documents from Wikipedia), I’m interested specifically in the method’s exploratory possibilities, its ability to tell us “things about the texts” that we wouldn’t have noticed ourselves.

Because no one ever gets everything out of any given text, no matter how talented a reader she is: our attention is always focused on specific features and overlooks other features. No one can notice every feature of a text, though some people clearly notice more (or even much more) than others, and part of the point of a contemporary education in the humanities, and particularly in literature, is to teach students to notice more as they read. Topic modeling software, though, is a kind of “objective reader” in ways that human readers rarely or never are: it’s not previously involved in any scholarly or interpretive project other than counting words and noticing co-occurrences, and so it catches things that we miss.

It also misses things that we catch, and is therefore not a substitute for informed human readings: topic models supplement our existing reading practices by providing data that would be incredibly time-consuming to collect by hand but that are useful as places to start thinking about a text. A topic modeling application is not an automatic interpreter that grinds out final answers (“THE POEM IS ABOUT DEATH.”), crushing human creativity and enslaving all of us with its uncaring and inhuman efficiencies, as any number of 1970s dystopian sci-fi movies imagined. It produces intermediate data, not final interpretations, and part of the reason for this is that the data can’t be interpreted in any meaningful way without being informed by, well, an informed reading of the text being interpreted. Taking another look at the topic models I produced in my Lovecraft analysis shows that the topics on their own are just, as the phrase has it, “bags of words”: the model counts word frequency and the comparative distance between words but discards word order and the other grammatical aspects of the text it is analyzing. This means that topic modeling throws out a whole lot of information that it can’t process, unlike us: it doesn’t “really understand plot” (or characterization, or reference, or irony, or puns, or jokes, or any of hundreds of other things that we all pick up on when we read); it’s just seeing which words tend to occur near each other.
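The point about discarded word order is easy to demonstrate. In a bag-of-words representation (plain Python here; the example sentences are mine, not Lovecraft’s), two sentences with opposite meanings become indistinguishable:

from collections import Counter

sentence_a = "the dog bit the man"
sentence_b = "the man bit the dog"

# Both sentences reduce to the identical bag of words: word order,
# and with it "who did what to whom," is discarded entirely.
print(Counter(sentence_a.split()))   # Counter({'the': 2, 'dog': 1, 'bit': 1, 'man': 1})
print(Counter(sentence_b.split()))   # identical output

Everything downstream of this representation, topic modeling included, sees only the counts.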

But the intermediate data it provides is incredibly useful on its own: if words don’t regularly occur right next to each other, it’s hard for humans to notice a pattern in them, but topic modeling can count words that appear near each other under a much more expansive definition of “near” than humans can manage, and can do so more reliably than we could; and the fact that a group of words tends to appear comparatively close together actually says a lot about theme. Too, topic modeling can pick up on words that we tend to filter out, such as prepositions and conjunctions (a great example is cited by Matthew Jockers on page 26 of Macroanalysis: John Burrows’s — not topic-modeling-based, as it was written nearly two decades before topic modeling was invented, but computationally based and conceptually similar — 1987 study of pronoun usage in Jane Austen).
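As a rough illustration of what “near” can mean to a machine, here is a windowed pair-counter; it is admittedly cruder than the document-level statistics that LDA actually uses, and the input filename is a hypothetical stand-in:

from collections import Counter

def cooccurrences(tokens, window=10):
    """Count unordered word pairs that appear within `window` tokens of each other."""
    pairs = Counter()
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            pairs[tuple(sorted((word, other)))] += 1
    return pairs

tokens = open("innsmouth.txt").read().lower().split()  # hypothetical plain-text file
for pair, count in cooccurrences(tokens).most_common(20):
    print(count, pair)

A human reader loses track of pairings like these after a paragraph or two; the counter does not.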

In my own case, with the limited experiments in topic-modeling Lovecraft’s stories, there were a number of things that I noticed from the topic models that I had not previously noticed, even though I had read (some of) those particular stories multiple times across a substantial chunk of my reading life: the extreme prominence of questions of heredity in “Arthur Jermyn” is one of the less insightful examples here (I’d noticed the emphasis in the story, but not the degree to which the story emphasized it; in fact, discovering that the machine-detected topic 2 comprises 39.8% of the story was quite a surprise to me). But there are other, more subtle clues that the topic-modeling process turned up that are quite interesting and deserve further attention. Most notable, I think, are topics 10-12, which seem to be mutually exclusive in the stories where they appear: no more than one of topics 10-12 appears in the top nine topics for any of the stories under consideration. These topics are all variants on a single theme that Lovecraft scholars tend to collapse into a single “Lovecraftian concern”: horror and epistemology, inflected in different ways (topic 10 might be seen as a “pure” form of the topic, while topic 11 transposes these concerns onto rural settings and rural people, and topic 12 transposes them onto what might be considered the period’s version of “science-fiction” concerns). Topic modeling in this case has turned up something that, I think, scholars have overlooked, and that deserves more attention than it has received.

But for Foucault …

Turning back to Ted Underwood’s comment at the beginning of this post, I’d like to examine the question of whether topic modeling and its underlying algorithm, LDA, do in fact “always have the character of ‘discourses.’” Discourse is a slippery term for Foucault, one he uses in multiple senses; but halfway through the Archaeology, he looks back on his previous uses of the word “discourse” and says that

in the most general, and vaguest way, it denoted a group of verbal performances; and by discourse, then, I meant that which was produced (perhaps all that was produced) by the groups of signs. But I also meant a group of acts of formulation, a series of sentences or propositions. Lastly — and it is this meaning that was finally used (together with the first, which served in a provisional capacity) — discourse is constituted by a group of sequences of signs, in so far as they are statements, that is, in so far as they can be assigned particular modalities of existence. (107)

There are several reasons why I believe that topic modeling fits these overlapping definitions quite nicely:

  • Perhaps most obviously, Foucault specifies his first definition, “a group of verbal performances,” as “that which was produced […] by the groups of signs,” and this coincides with the “bag of words” nature of topic modeling quite well. Foucault is of course not (entirely) unconcerned with grammar, word order, or meaning; but neither does he assign them transcendentally revelatory functions; they can be better understood, and more accurately re-inscribed, when their underlying relations are understood. Which brings me to my next point:
  • Foucault writes near the beginning of the Archaeology that “there is a negative work to be carried out first: we must rid ourselves of a whole mass of notions […]. We must question those ready-made syntheses, those groupings that we normally accept before any examination, those links whose validity is recognized from the outset; we must oust those forms and obscure forces by which we usually link the discourse of one man with that of another; they must be driven out from the darkness in which they reign. And instead of according them unqualified, spontaneous value, we must accept, in the name of methodological rigour, that, in the first instance, they concern only a population of dispersed events.” (21-22) This is precisely what topic modeling does: it abandons traditional groupings of — and relations between — ideas in favor of groupings based purely on word frequency. To (tell a machine to) engage in a topic-modeling exercise is to take a methodological step that automates (part of) an analysis unbiased by any pre-existing ideas about how ideas should be grouped together (except insofar as words are grouped together based on how frequently they occur together — which is an ideological presupposition on its own, and therefore not neutral, but which has the advantage that the algorithm is known and that open-source implementations are available and can be examined).
  • Topic modeling is a way of throwing those “groups of sequences of signs” into relief, helping us to drive out “the forms and obscure forces by which we usually link the discourse of one man with that of another” “from the darkness in which they reign.” Seeing words whose frequent co-occurrence we had not noticed gathered into co-occurrent groups tells us something about the connection of signs, about how those signs construct meaning, about how “knowledge” and power are understood, and constructed, by a text. (But what it tells us, exactly, depends on the text, and requires interpretation, just as the text itself requires interpretation. Topic modeling is a tool for interpretation, not an algorithm that interprets for us.) As Foucault puts it: “In fact, the systematic erasure of all given unities enables us first of all to restore to the statement the specificity of its occurrence.” (28)

For all of these reasons, I believe that topic modeling is a technique whose underlying goals are basically compatible with Foucault’s; and I would also like to suggest that there are some parts of Foucault’s archaeological methodology to which topic modeling is a particularly well-suited investigative tool. In particular, I’d like to take a brief look at how Foucault analyzes what he calls “rules of formation,” with an emphasis on the formation of concepts (discussed in chapter five of part II of the Archaeology).

First, though, I want to quote Foucault’s explanation of a methodological problem that arises in the analysis of discourse, one that describes where discursive analysis found itself later in the twentieth century:

We sought the unity of discourse in the objects themselves, in their distribution, in the interplay of their differences, in their proximity or distance — in short, in what is given to the speaking subject; and, in the end, we are sent back to a setting-up of relations that characterizes discursive practice itself; and what we discover is neither a configuration, nor a form, but a group of rules that are immanent in practice, and define it in its specificity. (46)

This is precisely what topic modeling does: it analyzes and helps to uncover “relations that characterize discursive practice itself,” without regard to underlying “objects themselves,” and helps to reveal the “group of rules that are immanent in practice,” even when those rules are never specified explicitly, and to make them visible.

Focusing more closely on “The Formation of Concepts,” though, opens up questions (of course) about what Foucault means by “concept”; Foucault is rather in character here: he uses the word without defining it; it becomes, in part, the object of analysis in this chapter, and answers are suggested without being formulated explicitly. But, to keep this blog post from growing even longer than it already threatens to, I’m willing to simply fall back on the Stanford Encyclopedia of Philosophy‘s rather neutral definition, from which Foucault doesn’t explicitly dissent: that is, I’ll take “concept” to mean some combination of (1) a mental representation; (2) a mental ability held by cognitive agents; and (3) a Fregean sense (i.e., an abstract object), with Foucault’s usage being more closely involved with senses (1) and (3) than with (2), with which he is not particularly concerned throughout most of the Archaeology.

But what’s interesting about this particular chapter of the Archaeology, I think, is that topic modeling maps closely onto an appropriate investigative methodology for what Foucault describes as the linguistic “fields” constituted by various discursive actions and operations. Foucault’s brief summary:

The configuration of the enunciative field also involves forms of coexistence. These outline first a field of presence […] Distinct from this field one may also describe a field of concomitance. […] Lastly, the enunciative field involves what might be called a field of memory. (57-58)

The discursive fields described here — especially the first two — are precisely what topic modeling is designed to investigate directly. Of course, there are numerous other tools that can investigate a “field of presence,” particularly concordance tools (AntConc, in particular, deserves a mention), though I think that topic modeling takes this further than simple concordance work by beginning the analytical process and starting to reveal the “forms of coexistence” that Foucault names, automating some of the drudgery required and making the collection of numerical data possible. The “field of concomitance” that Foucault names is, of course, the primary target of topic modeling software, which aims to analyze precisely which words co-occur and how often, and to group them together to help show the researcher which words seem to be related to each other in a particular discourse. (The astute will notice, of course, that topic modeling analyzes [sets of] texts, not “discourses”; but then, so do literary scholars and historians: at some point, we need to define where the boundaries of the “discourse” we’re working with lie, how they’re instantiated in and reflected by individual texts, and to what extent a text constitutes a discourse; these, too, are not problems whose solutions topic modeling automates.)
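A concordancer’s contribution to the “field of presence” can be sketched in a few lines; this keyword-in-context routine is a toy version of what a tool like AntConc provides, and the input file is again a hypothetical stand-in:

import re

def kwic(text, term, width=40):
    """Print every occurrence of `term` with `width` characters of context per side."""
    for m in re.finditer(r"\b%s\b" % re.escape(term), text, re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()].replace("\n", " ")
        right = text[m.end():m.end() + width].replace("\n", " ")
        print(left.rjust(width), "|", m.group(0), "|", right)

kwic(open("corpus.txt").read(), "horror")  # hypothetical corpus file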

Topic modeling is not primarily targeted at analyzing the Foucauldian “field of memory,” as it doesn’t take temporal development across an individual text or between a set of texts into account; it simply groups words. But, as with the other fields discussed here, topic modeling has the potential to be a good preliminary and intermediate step toward investigating what’s still hanging on in conceptual groupings: collecting data on texts that we think of as “on related topics” and seeing how the clusters of words change can provide insight into this area, too, and (again) provides an opportunity to reveal previously unnoticed groupings.

Foucault also comments on the aim of discursive analysis in ways that show how topic modeling can contribute to discursive analysis:

The description of such a system could not be valid for a direct, immediate description of the concepts themselves. […] One stands back in relation to this manifest set of concepts; and one tries to determine according to what schemata (of series, simultaneous groupings, linear or reciprocal modification) the statements may be linked to one another in a type of discourse; one tries in this way to discover how the recurrent elements of statements can reappear, dissociate, recompose, gain in extension or determination, be taken up into new logical structures, acquire, on the other hand, new semantic contents, and constitute partial organizations among themselves. These schemata make it possible to describe — not the laws of the internal construction of concepts, not their progressive and individual genesis in the mind of man — but their anonymous dispersion through texts, books, and œuvres. A dispersion that characterizes a type of discourse, and which defines, between concepts, forms of deduction, derivation, and coherence, but also of incompatibility, intersection, substitution, exclusion, mutual alteration, displacement, etc. Such an analysis, then, concerns, at a kind of preconceptual level, the field in which concepts can coexist and the rules to which this field is subjected. (60)

Again, what I want to note here is primarily that topic modeling is itself a way of beginning to analyze discourse in a Foucauldian way, one that doesn’t presuppose that the discourse in question was constructed according to an intentional system by which “the laws of the internal construction of concepts” become articulate in a text, or according to which the ideas’ “progressive and individual genesis in the mind of man” is taken in advance to be the determining factor in the construction of the texts. Again, topic modeling is a way of getting at characteristics of the discourse itself without resorting to the traditional models involving intentionality and human creativity that human interpreters tend to re-inscribe as they interpret texts; it is a way of getting outside the presumption of the author’s intentionality, of which Foucault was so critical, as a latent assumption in interpretive method. “[N]ot the laws of the internal construction of concepts, not their progressive and individual genesis in the mind of man — but their anonymous dispersion through texts, books, and œuvres,” Foucault writes, describing the aims of archaeological analysis; this is precisely what word-frequency counting gets us over the set of selected texts (which may also constitute “books, or œuvres,” depending on how they are selected) that are imported into the topic modeling software. Again, the software does not perform genuine interpretive work on behalf of the scholar, but it does help to reveal connections that are hidden, that are difficult to notice, that slip past the attention of even the most determined scholar; it helps to get at “[a] dispersion that characterizes a type of discourse, and which defines, between concepts, forms of deduction, derivation, and coherence, but also of incompatibility, intersection, substitution, exclusion, mutual alteration, displacement, etc.” so that we can notice and interpret it. It brings these relations onto our radar and assists with our hermeneutic and structural work, so that we can see the “preconceptual level, the field in which concepts can coexist and the rules to which this field is subjected.”

Methodologically, Foucault says that, in investigating the linguistic fields he has described,

one does not subject the multiplicity of statements to the coherence of concepts, and this coherence to the silent recollection of a meta-historical ideality; one establishes the inverse series; one replaces the pure aims of non-contradiction in a complex network of conceptual compatibility and incompatibility; and one relates this complexity to the rules that characterize a particular discursive practice. (62)

Again, I’d like to point out that this is precisely what topic modeling does: it starts with a statistical analysis of word co-occurrence, without even a preconception of what the words in question mean; it establishes linguistic networks in a meaning-agnostic way, leaving the hermeneutics of those networks to the scholar running the software; it avoids “subject[ing] the multiplicity of statements to the coherence of concepts, and this coherence to the silent recollection of a meta-historical ideality.” It just counts words, notices how they “hang together” according to the “bag of words” model, and begins to describe the “complex network” (or networks?) “of conceptual compatibility and incompatibility” as a way of beginning to “[relate] this complexity to the rules that characterize a particular discursive practice.”
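The network metaphor can also be made literal. Here is a minimal sketch, assuming a handful of hypothetical plain-text files and using the networkx library: it builds a graph whose edges record nothing but shared-document co-occurrence, leaving all interpretation to the scholar. (For a real corpus one would first restrict the vocabulary to reasonably frequent words; connecting every pair of words is only feasible for toy inputs.)

from itertools import combinations
import networkx as nx

filenames = ("story1.txt", "story2.txt")  # hypothetical input files

G = nx.Graph()
for name in filenames:
    words = sorted(set(open(name).read().lower().split()))
    # Connect every pair of words that share a document, weighting edges
    # by the number of documents in which the pair co-occurs.
    for a, b in combinations(words, 2):
        weight = G.get_edge_data(a, b, {"weight": 0})["weight"]
        G.add_edge(a, b, weight=weight + 1)

# The graph is built with no reference to what any word means.
nx.write_gexf(G, "cooccurrence.gexf")  # e.g., for later inspection in Gephi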

It’s worth saying again (and again, and again) that topic modeling is not a replacement for analysis, but a productive tool for engaging in it. It is a place for beginning and a tool for noticing, which may very well lead to beginning again (and again, and again), running further experiments to investigate medium-to-large corpora (there are ways in which topic modeling is also useful on small data sets, but I am primarily concerned here with the large-scale archaeological digs made possible by software tools). Again, Foucault has anticipated me here:

when one speaks of a system of formation, one does not only mean the juxtaposition, coexistence, or interaction of heterogeneous elements (institutions, techniques, social groups, perceptual organizations, relation between various discourses), but also the relation that is established between them — and in a well determined form — by discursive practice. What is to be done with these […] systems or rather those […] groups of relations? How can they all define a single system of formation? (72)

Foucault answers his own question by reminding the reader that “the different levels thus defined are not independent of one another” and that “strategic choices do not emerge directly from a world-view or from a predominance of interests peculiar to this or that speaking subject; but […] their very possibility is determined by points of divergence in the group of concepts” (72).

Which is to say (again) that there are numerous points of contact between the archaeological methodology and the kinds of tasks that topic modeling aims to achieve; but I don’t want to belabor the point. There’s a book to be written here someday, I think, talking in more detail about how topic modeling achieves this, but this blog post has already grown much longer than I intended, so I’ll close up with a specific proposal for research that I’m not qualified to conduct but that might serve as a test case for how well topic modeling can be used to conduct archaeological research.

Epilogue: a Proposal

At the very end of the Archaeology, in the last three pages of (the 1972 publication of A.M. Sheridan Smith’s English translation of) the book, Foucault discusses the possibility of “other” archaeologies than those he has conducted throughout the book as methodological examples. (Indeed, I largely see my own mostly-yet-unwritten dissertation as one of these “other archaeologies” whose potential is briefly discussed on pp. 192-95 of the Archaeology.) His own first example in this section of such an alternative archaeology, first published seven years before the first volume of the History of Sexuality, is an archaeology of sexuality:

Such an archaeology would show, if it succeeded in its task, how the prohibitions, exclusions, limitations, values, freedoms, and transgressions of sexuality, all its manifestations, verbal or otherwise, are linked to a particular discursive practice. It would reveal, not of course as the ultimate truth of sexuality, but as one of the dimensions in accordance with which one can describe it, a certain ‘way of speaking’; and one would show how this way of speaking is invested not in scientific discourses, but in a system of prohibitions and values. (193)

In fact, this is a fair overview of Foucault’s approach to the topic in the first volume of the series — at least in its general outline. What I propose as a test of my thesis that topic modeling is a useful tool for the production of Foucauldian archaeologies is a series of topic modelings based on Foucault’s own data set, as discussed in the first volume of The History of Sexuality: what happens when MALLET (or another topic modeling tool) runs over (well-selected, pre-processed) groups of corpora identified by Foucault? What topics does it turn up, how do those topics change over time, and how does this compare to Foucault’s own analysis? What happens when other texts, unexamined by Foucault (or, at least, not explicitly theorized by him), but germane to his arguments and analyses in that volume, are added to these corpora? And what does this tell us about Foucault’s methodology, and about topic modeling?

There are whole series of potential books ready to be written on this topic. Alas, they should be written by someone who reads the French of Foucault’s source texts far better than I do.

(Print) References

Foucault, Michel. “The Archaeology of Knowledge.” The Archaeology of Knowledge and The Discourse on Language. Trans. A[lan] M. Sheridan Smith. New York: Pantheon Books, 1972. 3–211. Print.

Jockers, Matthew Lee. Macroanalysis: Digital Methods and Literary History. Urbana: University of Illinois Press, 2013. Print.

Topic Modeling Experiments with some Stories by H.P. Lovecraft

I’m re-posting a short write-up of a weekly practicum assignment from Alan Liu‘s Introduction to Digital Humanities graduate seminar in fall 2014 tonight so that I can cite it as an example from another blog post. There have been some (very small) modifications; the original write-up, from 3 November 2014, is currently available here for comparison, should anyone be interested in making one.

“The most merciful thing in the world, I think, is the inability of the human mind to correlate all its contents.” — H.P. Lovecraft, “The Call of Cthulhu” (1926)

I installed MALLET on my computer (this is surprisingly non-intuitive under Linux, and was for me an excellent example of prior knowledge being obtrusive in learning a new program). After getting everything installed and working and running through the Programming Historian tutorial, I selected a group of texts to play with, something I had lying around on my hard drive: 10 stories by H.P. Lovecraft, extracted from this collection on Project Gutenberg. Of the nearly seventy stories included, I picked two that I know well (“Facts Concerning the Late Arthur Jermyn and His Family” and The Shadow Over Innsmouth — I’ve taught them to undergrads, and they appear on the reading list for the first chapter of my dissertation) and eight more chosen because (a) I’m more or less familiar with them, and (b) they more or less cover the major branches of Lovecraft’s writings that his fans tend to be familiar with. (Incidentally, these texts are selected from the same corpus that my automatically generated experimental blog The Worst of Bad Lovecraft draws from.) I saved each story to its own text file in a single directory, then imported the lot into MALLET with

bin/mallet import-dir --input lovecraft/ --output lovecraft.mallet --keep-sequence --remove-stopwords

and then started playing with the .mallet file in various ways, including looking for an optimal number of topics. I initially tried 20, as in the tutorial, but they didn’t seem to be a coherent set — I couldn’t easily characterize any of them in any other way than by noticing that many of the topics were clearly centered around a particular story (the “Arthur Jermyn” cluster, the “Innsmouth” cluster, the “Herbert West” cluster …). So I tried larger and smaller numbers, but the sweet spot seems to be about 25 in this case: there are still identifiable clusters around most of the stories, but there are also more general clusters. Running it with a small number of topics seems (to me) to produce uninterpretable results that say little more than “H.P. Lovecraft wrote some creepy things.” (Well, we knew that already. In a lot of ways, he was a creepy guy.) But running it with a larger number tends to produce equally uninterpretable results. Or maybe it’s just that I’m not yet sufficiently conversant with interpreting the results of mallet runs and/or tweaking the search parameters (I haven’t yet played with the “hlda” command that the tutorial mentions briefly). But at about 25 topics over the corpus of ten stories, themes seem to emerge in at least some clusters. So I exported a dataset using

bin/mallet train-topics --input lovecraft.mallet --num-topics 25 --optimize-interval 1000 --output-state lovecraft-state.gz --output-topic-keys lovecraft_keys.csv --output-doc-topics lovecraft_composition.csv --word-topic-counts-file lovecraft_word_topic_counts.txt

which gave some interesting results. I’ve zipped up the relevant files into, well, a .zip file, lovecraft.zip, in case anyone wants to look at them, then visualized the results as word clouds, as Matthew Jockers did in Macroanalysis, by using Lexos (hence the --word-topic-counts-file switch), as described in this blog entry by Scott Kleinman. Here are the clouds: [word-cloud images omitted]

These clusters actually say a lot, and a number of things jump out at me here. One is that certain topics are certainly still clustering around certain stories. For instance:

  • Topic 0 is clearly a topic describing The Shadow Over Innsmouth, and this is notable even aside from the fact that the most prominent word visible in the cloud appears in that story’s title. Several other prominent words, for instance, are lower-cased proper names that appear repeatedly in that story: “eliot,” “dagon,” “zadok,” “walakea.” Also notable is the heavy representation of New England dialect spellings that are prominent in that particular story but don’t occur often in most of the other stories being analyzed: “aout,” “ye,” “ud” (for “would”), “agin,” “sech,” “feller,” “jest” (not a joke in this case, but rural Massachusetts for “just”), “taown,” “seed” (an irregular past participle here). Topic 14 is also an Innsmouth topic, for the same reasons (“marsh” here is a last name, not a geographical description). In fact, looking at the lovecraft_composition.csv file shows that these two topics together account for almost 30% of the story. Topic 19 is almost as prominent as 14, comprising just over another 7% of the story and having similar thematic concerns, though more diffusely so. But topic 19 is also a prominent component of two other stories and will be discussed in more detail later.
  • Topic 2 is clearly the “Arthur Jermyn” cluster, describing a story particularly concerned with ancestry and having a very troubling relationship to race. Notable here are the ancestral terms, since this story (note the titular end “… and his family”) is concerned with descent and biological race. Also notable are the names of Jermyns — here’s a family tree I threw together for a slide show when I was lecturing on it: [family-tree image omitted]. In this case, examining the lovecraft_composition.csv file shows that this cluster alone accounts for 39.8% of the topics in the story — the next largest topic is topic 24, which accounts for only about 8.38% of the story (note the prominent presence of the word “black” in that cluster, as well as prominent genealogical words (“child,” “children”) and words the story uses to describe physiognomy (“face,” “head,” “parts,” “arms,” “appeared,” “ghoulish,” “slope”)).
  • To avoid belaboring the point, I’ll just note quickly that, even without going through the data, and just from looking at the word clouds, it’s apparent that topic 6 is a cluster centering around “The Dunwich Horror” (names of the Whateleys, with the family name as the most prominent word in the cluster; again, the emergence of dialect renderings); topic 8 is the “Erich Zann” cluster (the name, and musical terminology); topic 9 is the “Herbert West” cluster (again, the name; also, words associated with the Frankenstein theme of the story: technical terminology from anatomy and the concern with the educational setting); topic 13 is the “Call of Cthulhu” cluster (multiple names are the indicators this time, since “Cthulhu” occurs a fair amount across H.P. Lovecraft’s stories; the settings; the thematic and symbolic elements: “museum,” “hieroglyphics,” “idol”); and topic 18 is the “Cats of Ulthar” cluster (“cat” and “cats” are prominent, a dead giveaway here: cats are not a large thematic concern of Lovecraft’s; the setting: “remote” “cottage”; the main characters: “wife” and “wanderers”).
  • I admit that until I looked at the data, it was not immediately apparent to me that topic 17 was a cluster centering around “The Lurking Fear,” but on second glance it clearly is: “thunder,” “lightning,” and “tempest” are major thematic concerns and plot-event motivators there, and the words “lurking” and “fear” occur prominently; there’s also a group of underground-related words (“digging,” “mounds,” “underground,” “tunnel”), and underground is where the terrifying creatures live in that story. Similarly, I missed the fact that topic 21 is the “Polaris” cluster, despite the clearly astronomical cast of the topic: “horizon,” “overhead,” “quarter,” “north,” “plateau,” “peaks,” “pole.” (But in my defense, I haven’t read “Polaris” in a long time, possibly since high school, and I only picked it out to have a tenth text.) Looking at the data clarifies a number of things.

Here’s a slightly cleaned-up table with each story and its two most prominent topics, with their percentages rounded off to three significant digits:

| Story Title | Most common topic | Most common topic frequency | 2nd topic | 2nd topic frequency |
| --- | --- | --- | --- | --- |
| “Facts Concerning The Late Arthur Jermyn And His Family” | 2 | 39.8% | 24 | 8.39% |
| “The Dunwich Horror” | 6 | 21.9% | 20 | 9.59% |
| “The Doom That Came To Sarnath” | 7 | 56.9% | 19 | 10.3% |
| “The Music Of Erich Zann” | 8 | 35.1% | 24 | 14.0% |
| “Herbert West, Reanimator” | 9 | 28.5% | 15 | 10.5% |
| “The Call Of Cthulhu” | 13 | 23.3% | 19 | 11.4% |
| The Shadow Over Innsmouth | 14 | 15.8% | 0 | 13.6% |
| “The Lurking Fear” | 17 | 25.2% | 5 | 12.8% |
| “The Cats Of Ulthar” | 18 | 50.9% | 24 | 12.8% |
| “Polaris” | 21 | 43.8% | 5 | 9.28% |
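For anyone who wants to reproduce a table like this one: older versions of MALLET’s --output-doc-topics format list, after a document number and filename, alternating topic/proportion pairs already sorted by prominence (newer versions emit a different arrangement, so the indices below may need adjusting; the output is also tab-separated despite my .csv filename). A sketch assuming that older layout:

import csv

with open("lovecraft_composition.csv") as f:
    for row in csv.reader(f, delimiter="\t"):
        if row[0].startswith("#"):   # skip the header line
            continue
        filename = row[1]
        top_topic, top_share = row[2], float(row[3])
        second_topic, second_share = row[4], float(row[5])
        print("%s: topic %s (%.1f%%), topic %s (%.1f%%)"
              % (filename, top_topic, top_share * 100, second_topic, second_share * 100))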

I take this data to have several implications:

  • There aren’t many thematic elements contributing prominently to the individual stories. For half of the stories on this list, the top two topics alone account for 48% or more of the stories in question; for three stories, the top two topics account for more than half of the stories’ content.
  • In all but one story (Innsmouth), the top two topics account for at least 30% of the story’s content, and Innsmouth comes close at 29.43%; Innsmouth is also the longest story, at nearly 150K of plain text, and one might reasonably expect a longer story to contain a larger number of prominent topics than does a shorter one (i.e., the longer it is, the more it has to contain to keep reader interest and to produce a horrifying effect).
  • In two stories (“The Cats of Ulthar” and “The Doom That Came to Sarnath”), the top two topics account for over 60% of the story’s content (63.71% and 67.25%, respectively). These are both short stories (7.2K and 14.6K of plain text, respectively; the first and third shortest stories in the group under analysis).
  • In fact, there’s a pretty strong negative statistical correlation between (a) the length of the story in question and (b) how prominently its top two topics are represented. Here’s a table (and, after this list, a sketch of how to check the correlation):
| Title | Prominence of First Two Topics | File Size (KB) |
| --- | --- | --- |
| The Shadow Over Innsmouth | 29.463% | 147.8 |
| “The Dunwich Horror” | 31.453% | 98.2 |
| “The Call Of Cthulhu” | 34.732% | 67.8 |
| “The Lurking Fear” | 37.986% | 42.3 |
| “Herbert West, Reanimator” | 39.040% | 69.6 |
| “Facts Concerning The Late Arthur Jermyn And His Family” | 48.200% | 20.9 |
| “The Music Of Erich Zann” | 49.101% | 18.9 |
| “Polaris” | 53.069% | 8.1 |
| “The Cats Of Ulthar” | 63.710% | 7.2 |
| “The Doom That Came To Sarnath” | 67.248% | 14.6 |

And here’s a scatter plot of the same data: [scatter plot omitted]

  • Lovecraft’s thematic concerns actually vary quite a bit from story to story, at least insofar as MALLET is able to determine them. Only topics 5, 19, and 24 are repeated in the table indicating the top two themes, above. (But these three topics appear in seven stories, leaving only 3 that have no topical overlap with the others in the top two topics for each story.) Horror writers are often represented by detractors as writing the same damn thing over and over, but there seems to be a lot of variation here. (This is admittedly a rather simplistic interpretation, and should really be supported by further analysis of what’s actually in each individual topic.)
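As promised above, the correlation claim is easy to check against the second table’s own numbers; this uses only the Python standard library (3.10 or later):

from statistics import correlation  # Pearson's r; Python 3.10+

prominence = [29.463, 31.453, 34.732, 37.986, 39.040,
              48.200, 49.101, 53.069, 63.710, 67.248]
file_size_kb = [147.8, 98.2, 67.8, 42.3, 69.6,
                20.9, 18.9, 8.1, 7.2, 14.6]

print(correlation(prominence, file_size_kb))  # roughly -0.83: strongly negative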

Looking at the repeated topics says something about the general thematic trend of the stories as a whole, I think:

  • Topic 5 contains the prominent words “time” and “night,” as well as “day,” plus a variety of nature-related words: “stream,” “wind,” “moon,” “nature,” “air,” “light,” “ground.” (I exempt “plain” from this group because it would be too much work tonight to determine whether it’s being used as a geographical feature or an adjective — my hunch is that it’s most often the latter. Similarly, it would be too much work to determine whether “rose” is the flower or the past tense of the verb “rise,” though again, I suspect the latter.) Some of the remaining words fit into a broader general pattern of indicating “setting”: “house,” “mill,” “region,” “place.” This topic has interesting placement in the ranked list of topics for each story: It occurs as the second most prominent topic twice, and usually in the top eight, but in one story (“The Call of Cthulhu”), it is the twelfth most common topic (but never lower, out of the 25 identified topics); on average, it ranked 5.5 out of the 25 topics identified. I take this to mean that setting is a moderately important background concern for the stories in question, and that the particular cluster of words that represents it shows Lovecraft’s creative debt to Poe’s American “Dark Romanticism.” More, there are two other general types of words occurring in this thematic cluster that are highly suggestive: “eyes,” “find,” “notice,” “visible” suggest that the concern is specifically with an active agent perceiving a setting, and these are modified by some adjectives and adverbs that suggest a mode in which the perception occurs: “distant,” “wholly,” “lines,” “narrow,” “absence,” “fact” all suggest the detached, analytical view of nature implied by abstract, industrial Western scientific thinking. Of course, Lovecraft’s overarching project becomes clear with the numerous emotional modifiers attached to this cluster: “horrible,” “lone,” “fear,” “shot,” “left” (behind, is my hunch, though, again, it’s too late tonight to verify), and “iron.”
  • Topic 19 might be designated the “patriarchy” cluster: “great,” “men,” “city,” “spoke,” “heaven,” “pillars,” “art” and “artists,” “worship,” “manuscript,” “day” and “time” and “year,” “notes,” “letters,” and “words” all fit this pattern. There is also a concern with patrilineal descent: “elder,” “young,” “aged,” “born,” “mortal,” “died,” “native” — and with the way that knowledge was transmitted: “spoke,” “showed,” “found,” “ears,” “clear.” This is not an unqualified endorsement, though, but rather an uneasy (“vaguely”; “half”; the qualifier “whilst”) anxiety about patriarchy’s displacement (“unknown”; “silence”; “strange”; “aspect”; “bizarre”).
  • Topic 24 might be said to be the most obviously “Lovecraftian” cluster, indicating the particular plot-mechanical devices that Lovecraft helped to solidify into the canon of hoary horror tropes: “nearer,” “door,” “room,” “sound” and “sounds,” “fire,” “missing,” “dark” and “black,” “lights,” “call,” “heard,” and that which provokes terror because it is very “large” (a particularly prominent thematic concern for Lovecraft — the back side of the Romantic sublime, in fact). And there are the keywords indicating the traditional horror setting and its problems: “amidst,” “room,” “slope,” “nearer.” The parts of the body mentioned are those that are both vulnerable and closely connected to identity and interpersonal connection, especially in the context of romantic and sexual love: “lips,” “face,” “head,” “hand,” “arms”; these provide a reading of a related word, the fragile human connection indicated by “touch” (which is precisely what is punished in much later high school slasher-horror movies). Also prominent are words associated with the emotions supposed to be provoked by the horror genre: “terror,” “wild,” “dare”; the typical “manner” in which horror achieves its effects and by which it advances its plots: “suddenly,” “finally,” “strange,” “stay” and “sat” (almost always mistakes), “met” (such a necessary plot device for the genre), “utter” (I suspect this is most often a verb in these texts), “sleep,” “guided,” “ascent” (Lovecraft is here the antecedent of the writer who sends the ditzy heroine up the stairs while she is being chased, though Lovecraft’s protagonists are more likely to be male and seeking rather than fleeing). Too, there are the traditional larger-scale thematic concerns for the genre, the “what’s really at stake” keywords: “exist”; “truth”; “memory” and “reason” (precisely what is threatened by Lovecraft’s supernatural forces); “man” (in the double sense of the masculine taken as the default human); “perfect”; “alive” (a state that is often itself a source of horror in Lovecraft’s fiction).

Only topics 1, 3, 4, 10, 11, 12, 16, 22, and 23 do not appear in the top two topics for any of the stories in question. These I take to indicate recurring background concerns and devices in the texts that are currently under the lens. To provide a quick sketch of potential readings here:

  • Topic 1 expresses a tension between traditionalist agricultural customs and modernization. (“iii,” alas, is just a Roman numeral commonly used as a heading, though I wonder whether this topic tends to occur in the third section of various stories. Again, more attention is needed here.)
  • Topic 3 is another knot of traditional horror settings, plot devices, and concerns.
  • Topic 4 is primarily concerned with genealogy, race, epistemology, and the ways that these concepts work themselves out in Lovecraft’s stories. (Again, there’s a structural debt to Poe indicated here, insofar as “the house of Usher” is both the decaying ancestral manse and the family as an institution.)
  • Topic 10 is concerned with epistemology as expressed in scientific-rational discourse, with the university as its exemplary setting, and with the consequences of the epiphanic realization of the limits of scientific epistemologies. (Again, “ii” is a Roman numeral heading.)
  • Topic 11 is much like topic 10, but critiques these concerns through juxtaposition with rural, uneducated people who have a rough and terrible wisdom passed down from generation to generation.
  • Topic 12 takes topic 10 and transposes it more directly onto what we might think of as a “science fiction” domain: the limits of human knowledge are critiqued not through homespun-though-terrible folk wisdom, as in topic 11, but rather by pushing the limits of scientific knowledge past what humans can understand. (Interestingly, never does more than one of topics 10-12 appear in the first nine topics for any of the stories under consideration.)
  • Topic 16 might plausibly be read biographically as the development (“increasingly”) of an antifeminist train of thought: “woman,” “home,” “half,” “voice,” “ways” (might be said to) fit this first cluster, with a variety of negative judgments providing a justification for my “antifeminist” claim. Flogging the biographical horse a bit more, we might argue that this misogyny was motivated by an early life largely dominated by his mother and aunts, who raised him after his father and grandfather passed away; he spent much of his life as a recluse in their home, influenced by them “evening” and “morning,” “general[ly]” and “complete[ly].” “Boston” would then be justified in its inclusion in the cluster as the location at which he attended a journalists’ convention several days after his mother’s death, which greatly expanded his social circle and coincided with the beginning of a period of increased creative output that led up to what might be thought of as his “mature” writing phase. (But this feels to me the least cohesive reading in this list.)
  • Topic 22 might be taken to once again express a tension between the epistemological claims of traditional religion and modernizing scientific thought.
  • Topic 23 might plausibly be read to express anxiety produced in response to the crossing of (generally never-questioned) boundaries, which I tend to see as a major characteristic of horror. More specifically, in this case, the boundaries in question are biological and generational. It appears prominently (i.e., in the top 5 topics) in only 3 of the stories under consideration: “Herbert West, Reanimator” (the eponymous character is a college Frankenstein); and “Arthur Jermyn” and “The Lurking Fear,” both of which are concerned with human-ape hybridism.

And that, there, is a set of preliminary observations about what I found topic modeling these texts.

Some Thoughts on the Politics of Coding

A problem: some preliminary remarks

I’ve been thinking tonight about Joris van Zundert’s post in Humanist 28.1, in which he asks how much humanities-oriented academia values the work of software builders. Van Zundert’s post responds to a previous question by list moderator Willard McCarty about how the increasing availability of build-it-yourself coding frameworks is changing the nature of the relationship between digital humanists and the people who build the kind of software that digital humanists use — or, as McCarty puts it, whether “the boundary between scholar and technical builder is moving.”

Van Zundert replies that it is not: “in the reality of projects I see the traditional scholar battling his turf to the bitter end. Anything as long as he does not have to seriously look at new technologies and methodologies.” As a very junior scholar and someone just beginning to dip my toes into … well, into the shadow of the big-DH tent … I want to avoid taking a stand on this particular issue: my own department, the Department of English at UC Santa Barbara, has a specific emphasis on digital humanities, and I’m not yet sufficiently well-established in my field to say that I have a fair perspective on the field as a whole. Rather, what I want to do tonight is to comment on a specific feature of van Zundert’s argument, because I take it to be rather prevalent in the digital humanities and in other fields that have an ambiguous relation to coding. So I think it’s worth quoting and summarizing some parts of van Zundert’s argument so that we can look closely at its features. Van Zundert’s post tells a story (“not my story,” he says explicitly)

about a man who […] is a builder of software. […] Our hero sees a concrete problem in the scientific workflow of the [humanities] scholars. His experience and expertise tell him he can solve it. The solving will involve true research to guarantee the sophistication and validity of the solution.

Once the software is developed,

a working solution is presented that not just solves the problem, but also identifies some aspects of the problem that clearly demarcate the boundaries between what a solvable problem of this type in humanities is and what remains as grounds for interpretation, yielding scholars much information about the current limits of formalization of their epistemics. A concrete problem is solved, effort for a labor-intensive task can be decimated. What used to take weeks, months, can be put forth in mere milliseconds. Only the willfully blind would not recognize the thus created potential to reallocate resources to scholarly research by eradicating an error prone and dull, yet scholarly skilled task.

I take this to be a fair description of the way that humanities scholars have traditionally viewed the development of software that assists them in performing the kinds of tasks that van Zundert has identified. After all, humanities scholars engage in detail-oriented work on massive scales all the time. I think of my own discipline for examples of this and can find numerous places where massive automated data collection can benefit more traditional textual scholarship: the way that close readings of texts can be contextualized by automated analysis. The way that hypotheses based on nuanced readings of sets of selected texts can be augmented by massive automated analysis of more texts than any scholar could possibly read so as to test hypotheses. The manner in which problems that are difficult to solve for human scholars simply because those human scholars, no matter how diligent, are imperfectly suited to these tasks (because of some combination of factors surrounding the tasks: say, tasks that are extremely detail-oriented, boring, time-consuming, and numerically oriented) can often yield to machine-based analyses. I think, for instance, of Ryan Heuser and Long Le-Khac’s automated analysis of 2,958 British novels published during the long 19th century, and their exploration of their starting premise, that “one promise of digital humanities is leveraging scale to move beyond the anecdotal” (4).

But van Zundert’s story does not display the optimism of Heuser and Le-Khac’s analysis: his unnamed protagonist discovers that

[i]f it is hard to value the labor involved with curating information as a scientific task, it is even harder for them to see how automation of such basic tasks would constitute research. Yes, it is important it should happen, no it is not research that we recognize. […] Our respectful scholars are not able to recognize the scholarly merit and quality of the software that our protagonist puts forward. Yes the great effort needed for a scholarly task is sincerely reduced. We see that, they say. But we can not see the work this man has done. We can not establish its scholarly correctness. And besides, this is a primitive task in the greater scholarly work. This man has not given us any synthesis, no broader scholarly perspective, no reasoning and argument on paper in a humanities journal.

Again, I take this to be a fair description of current difficulties involved in evaluating the scholarly value of coding work from the perspective of humanities institutions: How is a humanities department to allocate funding for this type of work? How does it contribute to the coder-scholar’s rise on the academic ladder? How is a department composed largely of people who do not have this specific, highly specialized skill able to evaluate the scholarly correctness of the automation of a basic scholarly task? Do our institutional values support the development of tools that automate the difficult and boring parts of our work so as to free us for the engaging work of analysis, synthesis, interpretation?

These are hard questions, and I do not propose to answer them here, though I think answers are needed. But what I would like to examine is a presumption in the post that I take to be typical of those people who walk the line between coding, tool use, and more traditionally oriented humanities scholarship: the assumption that the building of a tool is necessarily a service that unequivocally works for the good of the profession as a whole.

As van Zundert puts it, “we expect our hero to be celebrated, respected, recognized for his scientific interdisciplinary achievement.” I would like to suggest that, for many pieces of software currently enjoying cultural cachet under the big tent that describes itself as “digital humanities,” this is quite a rose-tinted view of the actual accomplishments of software built for the purpose of aiding digital research in the humanities, and it overlooks some important political questions.

This particular way of glancing at software through rose-colored glasses is hardly unique to van Zundert; the digital humanities describes many of the pieces of software it uses in similar terms. The website for Gephi, for instance, starts to describe the network visualization software by saying that it “is a tool for people that have to explore and understand graphs,” that it allows its user to “profit from the fastest graph visualization engine to speed-up understanding and pattern discovery in large graphs,” that it is “[u]ser-centric.” Gephi also asks for donations by saying that those who donate “[h]elp us to innovate and empower the community.” Similarly, in discussing the MALLET software for topic modeling, Andrew Goldstone and Ted Underwood write, “we argue that the mere counting of words can redress important blind spots in the history of literary scholarship, highlighting long-term changes […] that were not consciously thematized by scholars” (3).

There are plenty of other examples of software making grandiose claims for its own utility and importance, and even more examples of enthusiastic users making grandiose claims about the software that they use. And I would like to say up front that, based on my lamentably limited interactions with them, I believe that both Gephi and MALLET are, in their way, useful and important pieces of software that have a lot to contribute to digital approaches to literary studies. But I do think that they elide some important questions: Who are the people whom Gephi helps to understand and explore graphs? Who profits from the fastest graph visualizations? Around which users is Gephi’s “[u]ser-centric” architecture centered? Who is able to use the software that engages in the “mere counting of words”? Who are these “willfully blind” people who do not recognize the implied value of van Zundert’s coder’s contribution? Who are the members of the “all of us” set in the implicit assertion, underlying so many of these claims, that these works “benefit all of us”?

Ted Underwood begins to address this problem in a blog post, in which he writes:

The models I’ve been running, with roughly 2,000 volumes, are getting near the edge of what can be done on an average desktop machine, and commonly take a day. To go any further with this, I’m going to have to beg for computing time. That’s not a problem for me here at Urbana-Champaign (you may recall that we invented HAL), but it will become a problem for humanists at other kinds of institutions. (3)

But I don’t think that this acknowledgment goes far enough toward mapping the kinds of access problems that accompany DH-oriented software. Yes, there are access problems with being able to practically run topic modeling on a corpus of 2,000 texts, problems that make it difficult for humanities scholars at other institutions to replicate the massive-scale textual experiments that Underwood envisions; but there are also a host of other potential places where access is a problem.

An example: Gephi, a Java-based program

One such problem: digital humanities software is quite often difficult to install, configure, and use, and this is often the result of specific coding decisions that developers make. Even more significant, I think, is the way that DH-oriented software often involves particularly troubling trade-offs — in computer security and in computer resources — in order to run at all.

I’d like to take Gephi as a particular instance of this set of problems, because it illustrates a major subset of them particularly nicely. I’ve blogged about trying to get Gephi working on my computer at my personal blog, so I’ll just summarize here what I’ve described in more detail there: getting Gephi working under Linux is a mess. I haven’t tried to get it working under other platforms, but a quick search for “java” in just the “installation” section of the Gephi forums returns, at the time of this writing, 369 hits in the 96 posts in the “installation” forum, many of which mention either Windows or OS X. That is to say, quite a few people had enough trouble installing Gephi to come to the forums and post a request for help that included the word “java” in the problem description, or to propose “Java” as part of a solution. Some of them almost certainly had to take the extra step of registering for an account on the Gephi forums before they could post. (In fact, it’s likely that most of them did — people who are trying to install a piece of software are some of the least likely people to already have an account on that software’s support forums.) And I would suggest that many more people than these 369 are affected: there will be those who didn’t post a new topic, but just wrote something along the lines of “this affects me too” in an existing thread; there will be those who couldn’t install the software and just gave up without asking for help; and there will be those who searched the forums, plowed through the posts already made, and found a solution. There are algorithmic ways to approach some of these questions — say, scraping the results that turn up when a search is conducted and counting the unique users who post in these topics, as in the sketch below — but what I would like to suggest at this point is that this problem affects many people.
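For concreteness, here is roughly what that scraping-and-counting approach might look like. This is a back-of-the-envelope sketch, not something I have run: the search URL, the query parameters, and the CSS selector are all guesses about how the forum software (phpBB, as far as I can tell, in Gephi’s case) structures its pages, and they would need checking against the actual markup.

```python
# Hypothetical sketch: count distinct users posting in forum threads that
# match a search term. The endpoint, parameters, and "a.username" selector
# are guesses at phpBB conventions, not verified against the Gephi forums.
import requests
from bs4 import BeautifulSoup

SEARCH_URL = "https://forum.gephi.org/search.php"  # hypothetical endpoint

def unique_posters(keyword, pages=10, per_page=10):
    """Collect the set of distinct usernames appearing in the first few
    pages of search results for `keyword`."""
    posters = set()
    for start in range(0, pages * per_page, per_page):
        resp = requests.get(SEARCH_URL,
                            params={"keywords": keyword, "start": start})
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "html.parser")
        posters.update(a.get_text(strip=True)
                       for a in soup.select("a.username"))
    return posters

print(len(unique_posters("java")))
```

Even this would undercount, of course: it sees only people who posted, not the presumably larger population who hit the problem and never registered at all.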

Which means that it’s a real problem. Which means that it is a real barrier to software usage. This is a conclusion that can be reached fairly easily even without digging into the details of the problem reports, but these are worth looking at, too. Here are some recently active posts that I take to be symptomatic of broader problems:

  • Here, someone complains that reinstalling Java broke Gephi entirely. Their solution is to downgrade to an earlier version of Java, but another user reading the thread in hopes of finding a solution complains that he cannot downgrade, because other applications with which he works depend on the latest version of Java.
  • Here, another user is unable to get Gephi running. The forums provide her with enough information to get the program running, but another user says that the solution did not work for her. No further solutions are provided in that thread.
  • Here, a teacher is unable to use Gephi in his/her course because the local IT folks won’t install a version of Java old enough to run Gephi, since doing so would introduce security problems into the lab computers.
  • Here is a Linux user trying to run Gephi. Nothing happens. No support is provided. There is no indication that the problem is resolved.

There are lots and lots and lots of other posts about other Java-related issues. But even from this set of data, a number of conclusions can be drawn.

For one thing, Java problems seem to be a real barrier to entry for those using Gephi. I will go so far as to say that Java is a bad choice of development environment for this reason alone: the necessity of installing and maintaining an interpreted-code environment (or a just-in-time compilation environment, as modern Java implementations provide) imposes an ongoing maintenance burden on the user. Even when configuring specific versions of Java isn’t particularly challenging, it consumes user time (even if only in small amounts), and downloading and installing the initial environment, along with whatever other software packages are necessary, adds to the work of maintaining the user’s system. The environment also takes up space on the user’s hard drive and requires downloading updates, consuming the user’s bandwidth. Neither bandwidth nor hard drive space is likely to be particularly tight for a contemporary user in (what I take to be) the program’s likely target group — academics and other professionals running Gephi on laptop and desktop computers — but building the software on this assumption implicitly restricts the software’s usability, and I think that this is itself a reason for reconsidering the common belief that free and open-source software is a “gift to the world” in a general sense.

After all, Java introduces real overhead: recompilation time, extra processor resources required to recompile or interpret Java bytecode, extra hard drive space, bandwidth required to install and update the environment that allows the Java program to execute. Again, this is not likely to impact (what I take to be) the intended target group: hard drive space is probably not going to be at a premium for upper-class and upper-middle-class professionals, just as bandwidth is unlikely to be restricted in an inconvenient way, either in terms of total transfer during a billing period or in terms of unreasonably low maximum possible transfer rates. But I’d like to think about people who don’t fit into these categories: what about people who want to perform network analysis with Gephi who aren’t upper-middle-class professionals? What are the opportunity costs for them?

I’m thinking specifically here of the applications that Gephi might have for secondary education, and of the degree to which funding for technology purchases is scarce to nonexistent for schools in low-income neighborhoods. While I am not suggesting that Gephi should be developed in such a way that the Apple II is a target platform, I do want to ask: What about the Raspberry Pi, the computer that has been touted as an inexpensive solution for computer education for schools on a limited budget? It’s not that it’s theoretically impossible to run Gephi on the Raspberry Pi — but hardware limitations restrict what can be done with the software, and the Java-based overhead required by the software’s design decisions makes the limitations of the hardware environment even more restrictive. For instance, the extra disk space required to install a Java environment may not be inconvenient for me, with my 2-terabyte internal laptop hard drive, but I suspect that this is a different story for schools in poor neighborhoods running Raspbian Linux from an SD card whose capacity is measured in single- or double-digit numbers of gigabytes; in that case, installing a recent Java runtime environment may be a deal-breaker. Similarly, running Gephi on a Raspberry Pi with 128 MB of RAM restricts the program to operating on networks with no more than 1,000 nodes and edges, which makes the program, practically speaking, merely a toy in these environments, incapable of many kinds of serious tasks.

Similarly, what about the possibility of using Gephi in less-developed countries, where recent hardware is less easily available? Notably, Gephi’s system requirements state that

Gephi uses an OpenGL 3D engine to speed up graph visualization. However a compatible graphic card is required. If your graphic card is older than 5 years, or if your laptop doesn’t have a dedicated graphic card, you may have to upgrade your hardware to run Gephi.

It’s worth saying here that this rules out the use of Gephi on older hardware past a certain point: the developers have made a particular decision, buying increased processing speed for users who have newer hardware at the price of making Gephi entirely unusable for those using older hardware. This is a political choice, and one that favors the economically privileged in the global economy: it makes Gephi more usable for the privileged at the cost of making it entirely unusable for (some of) the underprivileged.

It’s also worth thinking about how Internet service works in other countries: though always-on Internet access with comparatively high data transfer rates and no overall bandwidth caps has become the norm in developed countries, there are still places where dial-up is a common way to get onto the Internet, where Internet access cannot be assumed to be constantly available, where exceeding a bandwidth cap will result in large overage charges, or where other infrastructural challenges are a necessary part of connectivity. In these cases, scheduling and executing the necessary updates to Gephi and its required Java environment become substantial impediments to using Gephi.

Some conclusions

My point here is not to pick on Gephi, though my own experience with it has convinced me that it’s a particularly egregious violator of what I think of as good software-development practices; my point is to take it as a case study in the unconsidered implications of coding practices. (A smaller example might have been built from the way that MALLET, a text-analysis toolkit, expects that data files will exist in the same directory as the program files themselves, which is also a bad practice: users should have the ability to organize their data as they see fit, and software should respect that choice.)

Nor is it the case that I don’t understand the attraction of Java as a development platform: I understand that Java is intended to provide the opportunity to write code that can be run under (more or less) any combination of hardware and operating system. Circumventing the problem that application code normally has to be written and compiled for a particular operating system on a particular hardware platform, Java aims to offer a Write Once, Run Anywhere experience for developers and to ameliorate the burden of adapting program code to different environments. But Java’s stewards have made some bad decisions over its nearly 20 years as a major language. A number of these are summed up by Eric S. Raymond in his book The Art of Unix Programming (which he has made freely available online), when he provides a general evaluation of Java as a general-purpose programming language. In part, Raymond says:

Against Java, we can say that (compared to, say, Python) some parts of it appear over-complex and others deficient. Java’s class-visibility and implicit-scoping rules are baroque. The interface facility avoids complex problems with multiple inheritance at the cost of being only slightly less difficult to understand and use in itself. […] While Java’s I/O facilities are very powerful, simple reading of text files is not simple.

I think that it’s worth pointing out that this is precisely the trap that Java development tends to fall into — accepting complex coding requirements as the price of platform-independence — and that Java-based projects often run afoul of it once their code reaches a size where abandoning Java as the development environment is no longer practically possible, given the amount of existing code written in the language: the language’s opacity has made debugging difficult, and the coders have tied themselves to a development environment whose standards are always gradually evolving — and evolving under the aegis of organizations that don’t necessarily take the needs of these particular projects into account. Java changes a fair amount from version to version (here is a list of incompatibilities between Java 7 and Java 6; those who want a sample of the kinds of arcane problems that Java developers need to deal with may want to dig through Vladimir Roubtsov’s What Version Is Your Java Code?), and end-users of Java are encouraged to update to newer versions of the Java Virtual Machine, the program that runs Java programs, as soon as possible (for many users this is a more or less automatic process), because doing so also fixes security problems. But these security fixes drag incompatibilities along with them, requiring that existing Java-based applications be updated in order to remain usable. Java developers are thus caught in a trap where they are constantly required to update their code to meet an evolving set of requirements, even though they have little to no input on what those requirements are. (It might be worth noting that many operating systems are more careful than Java is about ensuring that applications will continue to run when the operating system is updated.)
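One symptom of this churn shows up in the compiled artifacts themselves: every compiled .class file records the bytecode version it targets, which is why a runtime that is too old simply refuses to load code built for a newer one. Here is a minimal sketch of reading that stamp; the filename is hypothetical.

```python
# Read the version stamp from a compiled Java class file. The first four
# bytes are the 0xCAFEBABE magic number; the next four hold the minor and
# major bytecode versions, big-endian. Major version 50 is Java 6, 51 is
# Java 7.
import struct

def class_file_version(path):
    with open(path, "rb") as f:
        magic, minor, major = struct.unpack(">IHH", f.read(8))
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file: %s" % path)
    return major, minor

print(class_file_version("Example.class"))  # hypothetical filename
```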

Raymond continues:

There is a particularly invidious problem, resembling Windows DLL hell, with libraries. Java has no method to manage different library versions. This can create huge problems in environments like application servers, where the server might come equipped with one version of (say) an XML library, but the application ships with a different (usually newer) version. The only handle on such problems is the CLASSPATH environment variable, a source of chronic deployment problems.

This is, of course, precisely the problem with Gephi: it requires that the user’s installation fall within a narrow range of version options, and the workarounds for this problem involve setting the CLASSPATH environment variable. This is an imperfect solution: it requires that users who have updated to a newer version of Java (perhaps because they use multiple Java-based applications requiring different Java versions) continue to maintain an installation of an older version. This exacerbates the problems already identified — downloading and installing updates for multiple versions takes more time and bandwidth; more storage space is required to maintain multiple environments — and makes the maintenance of those environments more complex. Too, it requires that the user engage in comparatively complex system configuration by hand, and it requires that earlier versions of Java be installed, along with whatever security vulnerabilities they may include. Java has a spotty security history; Ars Technica has said that “plugins for Oracle’s Java software framework have emerged as one of the chief targets for drive-by attacks,” and uninstalling Java entirely has been recommended by Ars Technica, by Twitter, and by the Computer Emergency Response Team at the U.S. Department of Homeland Security; Apple blacklisted Java twice in three weeks in January 2013 in response to multiple security threats. (An argument has also been made that Java’s security model is itself fundamentally flawed.) Though Gephi may be immune to these vulnerabilities itself, its use requires that a vulnerable software environment be installed on the user’s computer, introducing vulnerabilities that the user may not understand, and that persist even when Gephi is not running.
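To make the workaround concrete: what the forum advice amounts to is pinning Gephi to an older Java installation while leaving the system default alone. Here is a minimal sketch of a launcher that does this. The paths are hypothetical and machine-specific, and the --jdkhome flag is an assumption on my part (Gephi is built on the NetBeans platform, whose launchers accept it).

```python
# Hypothetical launcher: run Gephi against a pinned, older JDK without
# touching the system-wide Java installation. All paths are guesses.
import os
import subprocess

OLD_JDK = "/opt/java/jdk1.6.0_45"   # hypothetical older JDK install
GEPHI = "/opt/gephi/bin/gephi"      # hypothetical Gephi launcher path

env = os.environ.copy()
env["JAVA_HOME"] = OLD_JDK
env["PATH"] = OLD_JDK + "/bin:" + env["PATH"]

# --jdkhome is the NetBeans-platform convention for pointing a launcher
# at a particular JDK; I am assuming Gephi's launcher honors it.
subprocess.run([GEPHI, "--jdkhome", OLD_JDK], env=env, check=True)
```

Notice what even this small script presupposes: a user comfortable with filesystem paths, environment variables, and the command line, which is to say exactly the population least inconvenienced by the problem in the first place.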

All of which is to say…

… that it’s worth thinking about what we offer when we offer software, and that users should think about the trade-offs involved when they install software. But this is simplistic: users often have a poor understanding of the technical trade-offs involved in installation choices, and computers, despite the assumptions that some developers seem to make, are not just for coders — they should be usable by everyone. This implies a number of things: that security problems and other trade-offs should be proactively disclosed, for one (particularly substantial) thing; but there are other real implications: that the needs of the underprivileged should be taken into account in designing software; that open-source software should in fact be a gift to the world, and not merely to those who already experience privilege; that applications should allow users to structure data in ways that the user finds sensible, rather than demanding that the user structure his/her data storage in ways that the application can deal with easily; that configuration options and requirements should be well documented, instead of relegated to user-run support forums; and that coding decisions for open-source software should be made in ways that allow future development efforts, including efforts that result in forks of the project, to proceed easily and intuitively, with a minimum of fuss.

Underlying all of these recommendations is a belief — my own strong belief — that software should not deform the underlying system to fit its needs, because the underlying system belongs to the user, not to the software. This is precisely the expectation that is violated by malware, which (so often) exploits the underlying system to turn it into a revenue-generating resource in ways that are unacknowledged by the program at installation; but it is also the expectation that is violated by poorly thought-out but sincere open-source applications that represent themselves as gifts to the outside world.

Once again, Eric S. Raymond has anticipated me in the spirit of this demand: at the conclusion of 2003’s The Art of Unix Programming, discussing the challenges that POSIX programmers faced in developing software for a future that allowed for genuinely populist computer use, he wrote,

The problem is that we increasingly face challenges that demand a more inclusive view. Most of the computers in the world don’t live in server rooms, but rather in the hands of those end users. In early Unix days, before personal computers, our culture defined itself partly as a revolt against the priesthood of the mainframes, the keepers of the big iron. Later, we absorbed the power-to-the-people idealism of the early microcomputer enthusiasts. But today we are the priesthood; we are the people who run the networks and the big iron. And our implicit demand is that if you want to use our software, you must learn to think like us.

In 2003, there is a deep ambivalence in our attitude — a tension between elitism and missionary populism. We want to reach and convert the 92% of the world for whom computing means games and multimedia and glossy GUI interfaces and (at their most technical) light email and word processing and spreadsheets. We are spending major effort on projects like GNOME and KDE designed to give Unix a pretty face. But we are still elitists at heart, deeply reluctant and in many cases unable to identify with or listen to the needs of the Aunt Tillies of the world.

Raymond’s question for Unix programmers has, as the open-source and Unix programming communities have grown together and learned from each other since 2003, become a central question for all open-source programmers: who owns the computers, and for whom are they used?

(Selected) (Print) References

Goldstone, Andrew, and Ted Underwood. “The Quiet Transformations of Literary Studies: What Thirteen Thousand Scholars Could Tell Us.” Preprint of article in New Literary History (2014): n. pag. Google Scholar. Web. 5 Nov. 2014. <https://rucore.libraries.rutgers.edu/rutgers-lib/43176/>

Heuser, Ryan, and Long Le-Khac. “A Quantitative Literary History of 2,958 Nineteenth-Century British Novels: The Semantic Cohort Method.” Stanford Literary Lab, Pamphlet 4. May 2012. Web. <http://litlab.stanford.edu/LiteraryLabPamphlet4.pdf>

An automated sentiment analysis of Keats’s “Ode on Melancholy”

Alas. My Internet connection went out suddenly last night, leaving me without the ability to complete an assignment at the last minute.* Since the submission window for the assignment has now closed on the course website, I’m posting this here, both because the results are a bit interesting and as a gesture of good faith toward a professor for whom I have a great admiration.


So I’m interested in Stanford University‘s automated natural-language-processing sentiment analysis tool, called (appropriately enough) Sentiment Analysis, and specifically in seeing what its boundaries are. So I’m going to run one of John Keats’s six great odes of 1819, the Ode on Melancholy, through it and see how well it works. My initial prediction is that Sentiment Analysis will have trouble, to some degree, with several aspects of the text, and part of the intent of this experiment is to test this hypothesis and see how it plays out. The aspects of the text that I hypothesize will be problematic are these:

  1. It’s a poem, whereas the other examples I’ve seen run through software of this type have all been prose.
  2. It uses archaic (and “elevated” and “poetic”) diction, whereas other examples of sentiment analysis that I’ve seen have used everyday, contemporary language.
  3. It personifies several key emotions, treating them as proper names rather than as unambiguously direct descriptions of emotional states; I’m curious to find out how this will affect the processing of the text.

I picked this particular poem in part because I know it well — I’ve been known to recite it off the top of my head in front of undergraduates, and it appears on the reading list for my dissertation’s (as-yet-unwritten) prologue — and because it already has a strong relationship to “sentiment” and to related features: emotion, affect, etc. After all, it’s the Ode on Melancholy. I wanted to see what the Sentiment Analysis program makes of it. Too, I wanted to work with a short text with complex syntactic structures, and “Melancholy” certainly qualifies. Besides, it’s on my mind in particular right now because we’ve just read Cleanth Brooks on Keats’s “Ode on a Grecian Urn” for class this week. And finally — to show my hand a bit — there’s a particular resonant discordance that exists when lining a poem by John Keats up with this particular type of automated reading tool, and I’ll talk about this later in this blog post.

A few preliminary words on the poem, in combination with some initial predictions, are in order. A thumbnail reading of (the aspects of) the poem (that interest me) might go like this, if we allow for what Cleanth Brooks called “the heresy of paraphrase”: the poem analyzes a state of mind tagged “melancholy” in the nineteenth century (and earlier, for that matter). Keats’s Ode is written when the belief that melancholy is caused by an “imbalance of humours” has lost a great deal of cultural and scientific currency and is no longer an unquestioned assumption of then-contemporary medicine; but he writes before the term “melancholy” had begun to be displaced by the later diagnosis of “depression.” In a very rough sense, then, we can take “melancholy” to be Keats’s word for “depression,” provided that we attach a number of provisos: it needs to be detached from our own contemporary understandings of neurobiology, for instance, and understood, at least to some degree, as something more of a trait than a state, though there’s a lot of blurring in that distinction. In summary: we might expect the overall evaluation of the poem to result in a judgment of “negative” or “very negative” (two of the Sentiment Analysis algorithm’s five possible evaluations) if we’re just taking the title uncritically as an indication of “what the fellow is really talking about”: “It’s an ode to [if we forget that the title’s preposition is ‘on’] melancholy! What else would we expect?”

Of course, this is not a fair way to take the title at all. This is partly because the title includes the preposition “on,” not “to,” as do the titles of two more of Keats’s six great odes. The poem is not merely an ode “in praise of” melancholy; it is an ode in the broader sense of being a poem in elevated language that meditates on a topic. Though I don’t want to take the time to discuss this here in detail, it’s worth pointing out that “Ode on Melancholy” also follows the ode genre’s originary strophe-antistrophe-epode form. In this form, a topic is considered from a particular point of view in the first portion (the first ten lines of “Melancholy” provide an exposition of despair and contain a series of elevated injunctions against various specific ways of committing suicide); this position is then problematized by a reply from a different viewpoint in the second portion (the second stanza of “Melancholy” consists of a series of recommendations that the speaker suggests as ways of dealing with affective distress that are more productive than suicide). The final section consists of a conclusion that considers, balances, and integrates both viewpoints (the third stanza of “Melancholy” takes a view of the melancholic condition that shows it to enhance an appreciation of the opportunities life offers — at least, this is a very crude paraphrase, made possible only by stretching the earlier proviso that we’re going to allow the heresy of paraphrase, and by stretching it as far as possible). In a broad sense, then, Keats’s “Melancholy” follows a dialectical formula in which the problems of melancholy are expounded upon, then re-framed, and in which this is followed by a resolution that takes a more abstract viewpoint, integrating the viewpoints of both earlier stanzas into a broader “philosophical” position.

So let’s add one more hypothesis to test to the exercise’s goals:

  1. I want to see whether Sentiment Analysis can pick up on the overall progress of Keats’s discussion of melancholy as I’ve sketched it out here, and whether its machine reading resonates with and supports my human reading, or whether there might be other features of the text that I’ve missed.

And, having set up some initial conjectures, here’s the series of experiments I ran with Sentiment Analysis.

First experiment: just running the poem through

I think it’s worth looking at the poem itself quickly here:

1

No, no, go not to Lethe, neither twist
Wolf’s-bane, tight-rooted, for its poisonous wine;
Nor suffer thy pale forehead to be kiss’d
By nightshade, ruby grape of Proserpine;
Make not your rosary of yew-berries,
Nor let the beetle, nor the death-moth be
Your mournful Psyche, nor the downy owl
A partner in your sorrow’s mysteries;
For shade to shade will come too drowsily,
And drown the wakeful anguish of the soul.

2

But when the melancholy fit shall fall
Sudden from heaven like a weeping cloud,
That fosters the droop-headed flowers all,
And hides the green hill in an April shroud;
Then glut thy sorrow on a morning rose,
Or on the rainbow of the salt sand-wave,
Or on the wealth of globed peonies;
Or if thy mistress some rich anger shows,
Emprison her soft hand, and let her rave,
And feed deep, deep upon her peerless eyes.

3

She dwells with Beauty—Beauty that must die;
And Joy, whose hand is ever at his lips
Bidding adieu; and aching Pleasure nigh,
Turning to poison while the bee-mouth sips:
Ay, in the very temple of Delight
Veil’d Melancholy has her sovran shrine,
Though seen of none save him whose strenuous tongue
Can burst Joy’s grape against his palate fine;
His soul shall taste the sadness of her might,
And be among her cloudy trophies hung.

For this experiment, I just copied the poem from a post on my personal blog (which was in turn sourced carefully from my edition of The Complete Poems of John Keats [fill in publication details when I get home]), removed the stanza numbers, and ran it through Sentiment Analysis. A screencap of the output, with all lines expanded, is here (note: it’s 1205 x 14907 pixels and 1.5 megabytes!), and the machine-parsable version is available here (79.6 kilobytes).
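(For what it is worth, the same run can be scripted rather than pasted into a web form. The sketch below is not what I did; I used the web interface. But it shows how one might query a locally running instance of the Stanford CoreNLP server, assuming the server has been started on port 9000 with the sentiment annotator available; the filename is hypothetical.)

```python
# Hypothetical sketch: send a text to a locally running Stanford CoreNLP
# server and print the sentiment label assigned to each unit it parses.
# Assumes a server on port 9000 with the sentiment annotator available.
import json
import requests

CORENLP_URL = "http://localhost:9000/"

def sentiment_by_unit(text):
    """Return (text, sentiment-label) pairs, one per parsed unit."""
    props = {"annotators": "tokenize,ssplit,parse,sentiment",
             "outputFormat": "json",
             # treat each newline-delimited line as its own unit
             "ssplit.eolonly": "true"}
    resp = requests.post(CORENLP_URL,
                         params={"properties": json.dumps(props)},
                         data=text.encode("utf-8"))
    resp.raise_for_status()
    return [(" ".join(t["word"] for t in s["tokens"]), s["sentiment"])
            for s in resp.json()["sentences"]]

with open("melancholy.txt") as f:  # hypothetical filename
    for unit, label in sentiment_by_unit(f.read()):
        print("{0:>12}: {1}".format(label, unit[:60]))
```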

What I noticed first was that Sentiment Analysis seems, indeed, to treat each line of the poem as a separate semantic unit. Keats’s individual stanzas are each only one sentence long, though each stanza is a syntactically complex sentence with multiple independent clauses, so what Sentiment Analysis is doing here is performing a semantic analysis of sentiments not one sentence at a time, but one line at a time. Whether this constitutes a problem depends on what we take “the point” of the exercise to be, it seems to me: there is no reason to think that the only valid unit of analysis is the sentence or the clause, and looking closely at the composition of each line is a worthwhile exercise on its own, I think. Indeed, the individual parse trees for each line are actually illuminating as analogues of those sentence-diagramming exercises that a lot of us will remember from grammar school. The dangers here, though, are two: first, uncritically taking the analysis as “saying something authoritative” about the structure of the poem (or each individual stanza) as a whole; second, failing to appreciate that what is being diagrammed are in fact individual lines that have been stripped of their context within other structures of meaning in the poem. Indeed, perhaps the most productive way to think about them is as an instance of what Jerome McGann refers to as “deformative readings” in Radiant Textuality. (About which more, perhaps, another time.) But what I would like to suggest at this point is that there is a value in re-encountering the individual line as a semantic unit that exists in tension with the larger-scale grammatical structures that produce the semantic meanings of the poem in larger blocks — that the exercise helps us encounter these units in a fresh way, without yielding to the hermeneutic pressure exerted by the poem’s (occasionally more, occasionally less) enjambed structures.

Also notable was the overall distribution of the “sentiment” of individual lines: only lines 1, 10, 14, 21, 22, 23, 25, 29, and 30 have an overall “sentiment” rating at all. Lines 1, 10, 14, and 23 are judged to be “negative”; lines 21, 22, 25, 29, and 30 are judged to be “positive.” No lines were judged to be “very positive” or “very negative” by the analysis. The rest of the lines — 2–9, 11–13, 15–20, 24, and 26–28 — had no “sentiment” rating assigned at all. Nor is the basis of these decisions immediately and transparently clear: some lines (2, 3, 7, 8, 9, 11, 12, 15, 17, 18, 20, 27, 28) have no overall sentiment rating assigned, even though there are ratings assigned to individual words in those lines, and often those ratings seem to exhibit a clear pattern; another line (14) has an overall sentiment rating that clashes with what seems to be the prevalent rating assigned to individual words in the line; and lines 22, 29, and 30 have an overall sentiment rating assigned, even though no words on those lines have an individual sentiment rating. In some cases these can be provisionally explained: in line 21, for instance, it seems fair for the algorithm to let the two instances of the word “Beauty” and the (somewhat ambiguously) positive “dwell” outweigh the single negative word “die” … but I think that some of the other anomalies deserve an algorithmic explanation. (Why does the “rich” of line 18, which is the single positive word that the algorithm identifies, outweigh the single negative, “anger”? Is being rich more significant than being angry, especially given the multiple possible meanings of “rich,” which we might reasonably expect the program to be nervous about interpreting?)
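(Tabulating that distribution by hand is tedious; if the machine-parsable output linked above is saved locally, a few lines will do it. Again, a hedged sketch: I am assuming a CoreNLP-style JSON export with a per-sentence “sentiment” field, and the filename is hypothetical.)

```python
# Hypothetical sketch: tally the sentiment labels in a saved CoreNLP-style
# JSON export, counting unlabeled units under "(no rating)".
import json
from collections import Counter

with open("melancholy-lines.json") as f:  # hypothetical filename
    doc = json.load(f)

counts = Counter(s.get("sentiment", "(no rating)")
                 for s in doc["sentences"])
for label, n in counts.most_common():
    print("{0:>12}: {1}".format(label, n))
```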

A partial list of words I’m surprised didn’t generate a sentiment weighting at all:

  • adieu (arguably, if Keats was using this word 195 years ago, it can be thought to have entered English parlance by now. And is not parting such sweet sorrow?)
  • bane (surely a negative word)
  • death-moth (if nothing else, this could be decomposed, I think; “Wolf’s” is decomposed into “Wolf” and “’s” in line 2)
  • drown (when is this ever positive?)
  • feed
  • kiss (certainly this is more commonly positive than negative, I would think)
  • nightshade
  • nor
  • poison (when is this a positive thing?)
  • poisonous
  • rose
  • sadness
  • shrine
  • shroud
  • weeping

Second experiment: collapsing sentences into single lines

For this iteration, I collapsed each sentence (i.e., stanza) into a single line and replaced what had been capital letters on the second through tenth lines of each stanza with lowercase letters. This yielded the following text, which I submitted to the algorithm:

No, no, go not to Lethe, neither twist wolf’s-bane, tight-rooted, for its poisonous wine; nor suffer thy pale forehead to be kiss’d by nightshade, ruby grape of Proserpine; make not your rosary of yew-berries, nor let the beetle, nor the death-moth be your mournful Psyche, nor the downy owl a partner in your sorrow’s mysteries; for shade to shade will come too drowsily, and drown the wakeful anguish of the soul.

But when the melancholy fit shall fall sudden from heaven like a weeping cloud, that fosters the droop-headed flowers all, and hides the green hill in an April shroud; then glut thy sorrow on a morning rose, or on the rainbow of the salt sand-wave, or on the wealth of globed peonies; or if thy mistress some rich anger shows, emprison her soft hand, and let her rave, and feed deep, deep upon her peerless eyes.

She dwells with Beauty—Beauty that must die; and Joy, whose hand is ever at his lips bidding adieu; and aching Pleasure nigh, turning to poison while the bee-mouth sips: ay, in the very temple of Delight veil’d Melancholy has her sovran shrine, though seen of none save him whose strenuous tongue can burst Joy’s grape against his palate fine; his soul shall taste the sadness of her might, and be among her cloudy trophies hung.
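(The collapsing itself is mechanical enough to script. Here is a minimal sketch, assuming the poem sits in a plain-text file, with the stanza numbers already removed and stanzas separated by blank lines; the filename is hypothetical.)

```python
# Minimal sketch of the collapsing step: join each stanza into one line,
# lowercasing the initial capital of every line after the first.
def collapse_stanza(stanza):
    lines = [l.strip() for l in stanza.splitlines() if l.strip()]
    rest = [l[0].lower() + l[1:] for l in lines[1:]]
    return " ".join([lines[0]] + rest)

with open("melancholy.txt") as f:  # hypothetical filename
    stanzas = [s for s in f.read().split("\n\n") if s.strip()]

for stanza in stanzas:
    print(collapse_stanza(stanza))
    print()
```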

I should say immediately that this is already a reinterpretation of Keats’s text in some important ways: by removing line breaks, I’ve removed an important form of punctuation; and by removing line indents, I’ve discarded a set of implicit “tags” that not only (perhaps) provide oral performance notes, but also register the ode’s historical relationship to the form’s — and this particular poem’s — roots in ancient Greek musical performance, a relationship this transformation obscures. Too, the text now “looks like” prose, which is misleading in a number of subtle but important ways. But I’ll deform the poem more in the next experiment, so I’ll just move on and say that here is the graphical representation of the parse tree (1205 x 2167 pixels, 604 kilobytes), and here is the machine-parsable output (87.4 kilobytes).

What’s noticeable here? A number of things, actually, especially in comparison with the previous results. Perhaps most immediately apparent is the fact that the algorithm is doing a better job of parsing what I will (admittedly roughly) call the “prose-like” semantic structures of each individual sentence-stanza; it is able, for instance, to let the computed sentimental value of the words it can evaluate propagate to higher levels of the semantic tree. Immediately noticeable to me, then, is that each stanza gets an overall “sentiment” ranking: the first stanza is ranked “very negative” — this is certainly not surprising; it’s a series of injunctions to the listener to refrain from committing suicide. But this is the first time in this analysis that any semantic structure has been ranked “very negative,” suggesting that achieving this ranking requires a number of previous rankings of “negative” — or, at least, that having a number of lower-level rankings of this nature is a way to move more quickly toward this ranking for higher-order semantic structures.

There’s a lot that could be said about the individual parse trees here and how they differ from the first experiment’s, but I’ll confine myself to one observation, in the interest of brevity: the rendered parse trees suggest a structural “movement” in the poem that the grammar itself doesn’t support. In more detail: the first stanza results in a parse tree that is more or less visually balanced, because the stanza’s major grammatical structures are roughly balanced; there are two large-scale “chunks” of four and six lines that the algorithm takes (correctly, I think) to be the top-level grammatical structure of the sentence-stanza. The second stanza, however, has the same structure — four lines ending with a semicolon, then six more lines — but the increased syntactic complexity of the second group of lines produces a visual effect that makes the stanza seem to be “back-loaded” in a way that the syntactic structures themselves don’t support. This is a problem of the hermeneutics applicable to the visualization; we (well, I, anyway) expect (perhaps unfairly) that visualizations will reveal structures in a more or less transparent way; this is often why we employ them: to throw structural features into immediate relief so that we can notice features that aren’t immediately apparent. But this visualization requires more careful attention to its details than I think we expect visualizations to require.

However, processing the text in this way before running it through Sentiment Analysis reveals an interesting feature that confirms my own thumbnail sketch of a reading above: the poem’s emotional tone develops from one stanza to the next, resulting in machine readings of the stanzas as “very negative,” “negative,” and “positive,” respectively. That is to say that, despite the numerous words that aren’t processed at all, the algorithm’s hermeneutics, when the text gets some preliminary hand-holding, pick up on an important emotional feature of the poem’s structural and intellectual dialectic that took me multiple readings to theorize explicitly.

Third experiment: modernization of spelling and punctuation

In this pass, I took the prose adaptation from the second pass further by regularizing Keats’s spelling and punctuation to (what I take to be) contemporary American usage. There are two major assumptions that I’m making here. First, I’m hypothesizing that the adoption of modern spelling will facilitate increased automated understanding of the adapted text, because the majority of texts that the program has encountered in the past are likely to have originated in the contemporary era (and the program is gradually being trained by feedback from its users). Second, I’m hypothesizing that adapting Keats to (what I take to be) contemporary American punctuation may facilitate the processing of grammatical structures by a program (presumably) written by Americans.

I’m also de-capitalizing the personified emotions in “Melancholy” to see whether that affects the algorithm’s evaluation of their emotional tone. Capitalized “Joy” in the third stanza, for instance, has in previous experiments been taken as a neutral word. On reflection, this seems to me to be a sensible decision for a program designed to engage in processing of contemporary text: “Joy” is also a name, and when it occurs in this way, it seems that the safest assumption for the algorithm to make is that it has no emotional connotation. (We can think of Joy/Hulga in Flannery O’Connor’s “Good Country People” for a strong implicit argument for this, if a literary example is thought desirable.) But, after all, the fact that someone named “Joy” may very well be angry or depressed “in real life” isn’t really relevant to the question of personified abstractions in the same way that it is to an evaluation of O’Connor’s story: it seems clear on even brief examination that Keats’s personification “Joy” is likely to be most profitably taken as, well, a personification of that emotion, and therefore to have emotional content. I want to see if de-capitalizing the noun has that effect.

Here is the text that I submitted for processing this time:

No, no, go not to Lethe, nor twist wolf’s bane, tight-rooted, for its poisonous wine; nor suffer your pale forehead to be kissed by nightshade, ruby grape of Proserpine. Make not your rosary of yew-berries, nor let the beetle, nor the death-moth be your mournful Psyche, nor the downy owl a partner in your sorrow’s mysteries. For shade to shade will come too drowsily, and drown the wakeful anguish of the soul.

But when the melancholy fit shall fall, sudden from heaven like a weeping cloud that fosters the droop-headed flowers all, and hides the green hill in an April shroud, then glut your sorrow on a morning rose, or on the rainbow of the salt sand-wave, or on the wealth of globed peonies; or if your mistress some rich anger shows, imprison her soft hand, and let her rave, and feed deep, deep upon her peerless eyes.

She dwells with beauty—beauty that must die, and joy, whose hand is ever at his lips, bidding adieu, and aching Pleasure nigh, turning to poison while the bee mouth sips: yes, in the very temple of delight, veiled melancholy has her sovereign shrine, though seen of none save him whose strenuous tongue can burst joy’s grape against his palate fine. His soul shall taste the sadness of her might, and be among her cloudy trophies hung.

This is admittedly a much more radical reinterpretation of Keats’s poem than the previous alteration, and yet I’ve tried to take a middle-of-the-road approach in the kinds of interpretive choices I’ve made: I’ve broken up sentences occasionally where (I feel that) contemporary American usage would find Keats’s structure very unwieldy, yet I haven’t moved adjectives before the nouns they modify. There are several other problematic moves that I’ve made that I’ll pass over in silence here, as well. A more radical contemporary American “prose translation” would be interesting to scan with Sentiment Analysis, but I’m going to skip it. At least for now.

Here are the graphically rendered parse trees (which, noticeably, include a new tree for each sentence, not each paragraph). And here is the machine-parsable output.

I find one thing primarily notable here besides the fact that the algorithm is processing individual sentences, not paragraphs: de-capitalizing nouns does, in fact, affect their machine-perceived emotional content. Lowercase-D “delight” is perceived (appropriately, I think) as “very positive,” as is “joy” when it appears with an initial lowercase J.

Unconducted experiments

As I’ve run out of time to play with this particular tool tonight, I’d like to suggest briefly a few experiments that I don’t have time to run but that would be interesting:

  • Removing punctuation entirely and seeing how the parsed tree structure looks, as well as what the overall evaluation of the tone is.
  • A series of transformations of the poem in which vocabularies and sentence structures are adapted to contemporary American prose usage in various ways:
    • Simply moving adjectives before the nouns they modify.
    • Using various vocabulary substitutions that approximate more and more closely Basic and/or Simplified English.
      • A humorous take on how this might play out might be inferred from xkcd comic #1133, “Up Goer Five.”
  • A set of experiments to see how the algorithm does when I’ve specifically trained it on how I think various non-evaluated words should be read. This would give additional insight into the workings of the algorithm itself … and would, arguably, be a small public-service contribution toward improving the algorithm itself. I’ve avoided doing this tonight for two reasons:
    1. I’m primarily interested — at least, this week, on my first encounter with the algorithm — in seeing how it does on its own. I think of this as a rather radical approach, especially in the first experiment, when no human assistance aside from typing was provided: what emotional structures can the algorithm detect without any more help than this?
    2. I think that the question of exactly how to rank various words, even within the rather coarse levels of granularity offered by Sentiment Analysis, deserves more thought than I have time to give it right now. Too, there is the consideration of how my rankings might influence the overall rankings of the program, which likewise deserves more thought than I can give it tonight.

Ideally, of course, all of these experiments would be accompanied by thoughtful readings of what is actually happening to the performative aspects of Keats’s poem when the transformations are made. If, as Archibald MacLeish has said, “A poem should not mean / But be,” then what is the poem when these transformations are applied? This is not a simple question, and there seems to be a fruitful opportunity here to explore the intersection of traditional close-reading techniques and more innovative machine-reading techniques.

And it’s worth saying that I wish I had more time to dig into the details of how the algorithm works, and that observing its emergent behaviors under more experiments would provide more insight into this.

Some provisional conclusions

On one hand, I think that the program does a fairly remarkable job of assessing emotional tone within its limited scope. (But perhaps this is — at least partly — because one of my experiments dovetails with some of my own preliminary hypotheses. Let us not rule out investigator bias.) The experiments, I feel, have resulted in basically successful and, in some ways, rather perceptive automated readings by the algorithm. I’m also pleased that the algorithm’s reading processes match mine in some basic ways, and tend to think that this confirms my own readings.

On the other hand, I think that the algorithm is structured in a number of ways that restrict its reading possibilities. The most glaringly obvious is that the algorithm “reads” only along a single axis (negative—positive) and with only five levels of granularity (very negative, negative, neutral, positive, very positive). Of course, we tend to think that sentiment is more complex than this: there is more than one axis along which it should be mapped, and there are finer levels of granularity than the algorithm seems to appreciate.

A related (though somewhat more abstract) point is that the single-axis reduction of what might be called “the sentimental field” invokes a rather regrettable term, one that is loaded with a lot of unnecessary and harmful baggage in contemporary discourse: “negative.” I am objecting here specifically to the uncritical judgment that is associated with the knee-jerk rejection of anything that gets labeled as “negativity.” Surely this label is preeminently idiotic in our own age: looking at how it is being used right now on Twitter — and looking at how quickly new results appear in that search — will almost certainly show that it is an excuse to label and avoid rather than to engage in critique. “Negativity” applies an abstracting suffix to a word that is already quite abstract; and, it seems to me, the deployment of this term is inevitably a dismissive move that is used to shut down discourse by denying — and denying implicitly — that critique is meaningful in a discussion. But of course an honest search for truths, in whatever sense we understand that term, requires that any position be open at least in principle to critique and discussion; and so any rhetorical move that aims to close down the possibility of critique and discussion is a sin against the intellect — and a move that abstracts from actual data twice, while denying that the data needs to be interpreted, is triply a sin against the intellect.

More concretely, of course, what is shut down in the refusal to countenance even the possibility of critique is the opportunity for productive change, because productive change requires that the imperfect nature of an existing state of affairs be recognized. The possibility for this to happen depends, to a large extent, on the ability to state explicitly that the existing state of affairs is imperfect. If this potential for critique is shut down, then productive change becomes possible only serendipitously, when a sudden epiphany happens to descend from above. Or, to put it another way, shutting down critique on the basis of tone is poor critical thinking, because, as even a venture capitalist recognizes, responding to tone is a poor way to debate (look for level “DH2” in that rather short essay).

For all of these reasons, I find it regrettable that Sentiment Analysis chooses to label one pole of its conceptual axis “negative” — in a nutshell, I think this constitutes a failure to disentangle its automated analyses from the implied judgment that the features of language it picks out at that end of the spectrum are “bad.” And the knee-jerk reaction that Sentiment Analysis (intentionally or otherwise) plays into has problems of its own. As Tom Scocca has put it:

Over time, it has become clear that anti-negativity is a worldview of its own, a particular mode of thinking and argument, no matter how evasively or vapidly it chooses to express itself. For a guiding principle of 21st century literary criticism, BuzzFeed’s [Isaac] Fitzgerald turned to the moral and intellectual teachings of Walt Disney, in the movie Bambi: “If you can’t say something nice, don’t say nothing at all.”

The line is uttered by Thumper, Bambi’s young bunny companion, but its attribution is more complicated than that—Thumper’s mother is making him recite a rule handed down by his father, by way of admonishing her son for unkindness. It is scolding, couched as an appeal to goodness, in the name of an absent authority.

The same maxim—minus the Disney citation and tidied up to “anything at all”—was offered by an organization called PRConsulting Group recently, in support of its announcement that the third Tuesday in October would be “Snark-Free Day.” “[I]f we can put the snark away for just one day,” the publicists wrote, “we can all be happier and more productive.” Is a world where public-relations professionals are more productive a more productive world overall? Are the goals of the public-relations profession the goals of the world in general?

(I might mention that I have a great admiration for this quite long article, and that much of what Scocca has to say is directly relevant here, and that I am quite looking forward to talking about this article with my freshman comp. students at the end of this quarter.) This is directly relevant to the question of how we read machine-produced readings, and related to the observations above about the expected hermeneutic transparency of visualizations: subcontracting the task of reading out to machines has clear benefits that Moretti has talked about [insert quote, or at least reference, here]; but, then, subcontracting the task of reading out, at all, either to machines or other humans, brings with it the associated dangers of dependence on the interpretations that those others, whoever they may be, produce. I do not mean to suggest that this means that we should jettison the possibilities of machine reading; I am merely noting that uncritical dependence on machine readings elides options for critical engagement directly with texts — and that the process of outsourcing this particular labor in itself makes it difficult to see what is being lost here. The opportunity that is missed is, in part, the opportunity to notice that something has been missed.

But there is another particular problem with reading a poem by Keats along a positive–negative axis, and any readers of this blog post who are British romanticists will almost certainly have been wondering when I’m going to get to the other half of this problem: the uncritical use of the word “negative” when talking about the poetry of John Keats is complicated by Keats’s specific way of bringing this discussion about “negativity” back to its starting point. I am thinking, of course, of his oft-cited and influential comment about “negative capability” in his letter of 21 December 1817 to his brothers, George and Thomas, in which he defines the phrase “negative capability” to mean

when a man is capable of being in uncertainties, mysteries, doubts, without any irritable reaching after fact and reason – Coleridge, for instance, would let go by a fine isolated verisimilitude caught from the Penetralium of mystery, from being incapable of remaining content with half-knowledge. This pursued through volumes would perhaps take us no further than this, that with a great poet the sense of Beauty overcomes every other consideration, or rather obliterates all consideration.

And it seems that one of the many things that I won’t be able to consider tonight in the depth that it deserves is the way in which this passage suggests a reading of the structure of “Melancholy,” though I would like to say that it does inform my earlier claim that the structure of the ode is “dialectical in a general sense,” and that the concept of negative capability suggests several things about the way in which – and the limits to which – it can be taken as “dialectical.” More, there are clues here about how the third stanza should be taken as reconciling the other two. But all I really have time to talk about tonight in any depth at all is the way in which Sentiment Analysis parses the particular poem by Keats. I will be the first to say that this is unfortunate.

Keats’s point – or, rather, the currently germane one among the many substantial points that he makes in this short passage – is that closing off interpretive possibilities by engaging in knee-jerk reactions means missing what he takes to be one of the major tasks of poetry, and that reductive interpretations of a text or a concept get in the way of higher-order understandings of how the text performs. I think that there is probably more to be said about the tension between the kind of interpretation that Keats suggests in this letter would best be applied to his own poetry, and the kind of interpretation that Sentiment Analysis applies to it. But this will require a closer look at what Sentiment Analysis is actually doing (in brief, as a thumbnail sketch of an answer: I suspect that a close look at the algorithm’s emergent patterns of behavior would show that the interpretation it provides is nowhere near as cut and dried as our knee-jerk dismissive assumptions about algorithmic behavior might suggest). This, alas, will have to be yet another task for another night.

References to print texts

Admittedly, there should be some of these in here, in the interest of scholarly honesty. Like so many other tasks that I wish I had time for, this will have to wait for another time. Unlike those many other tasks, though, I anticipate that this blog post will be edited soon in such a way as to solve this particular problem.


Footnote

* (Agreed, I should avoid trying to complete assignments at the last minute. However, some weeks in grad school, everything has to be done at the last minute, because there are multiple deadlines, each of which constitutes a crisis. Those interested in seeing my conversation with my less-than-admirable ISP about this and not queasy about the use of the F word can read this tweet, this tweet, this tweet, this tweet, the conversation starting with this tweet, and the conversation resulting from this tweet. As you can probably guess, I am of course thrilled to have a conversation with a company that holds a monopoly on an essential service closed off with a vague promise to pass my complaint on to someone or other, which may or may not mean anything other than that a PR person is going to send an email to an address that routes all of the mail it receives to /dev/null, just as I am of course thrilled by their strong implication that providing maintenance to their equipment absolutely requires shutting down service completely, which is of course not true.)