Last year, I delivered a research seminar on a recent project based on a corpus of philosophy texts. One of the comments was something along the lines of,
‘Ok, that was more interesting than I thought! When I heard the talk featured corpus linguistics, I was worried it was going to be all numbers, numbers, numbers, and we would all be snoring in our chairs.’
I hear a version of that fairly frequently when I mention corpus linguistics. Even among linguists, there is a common misconception that CL is all about probability scores and ignores the context in which stretches of discourse are situated. It’s all a matter of counting beans, right? Occasionally, other misconceptions make an appearance, such as what Raukko (2003: 165) writes:
The linguist looks at a large and somewhat pre-processed selection of text material and tries to find the relevant instances (instantiations, specimens) of the item that s/he wants to study.
Since I use CL methodology fairly frequently in my research, I will from time to time debunk myths like those above and explore the value of CL. And so I’ll begin by responding to a recent post by Tim Challies, where he laments what he suspects is a rise in the use of the phrase ‘I feel’ as opposed to ‘I think’ or ‘I believe.’ A friend sent that post to me, and I immediately responded with ‘Corpus linguistics can help answer that!’
That’s part of the awesome beauty of CL. Someone asks a question like ‘Aren’t these I feel statements more common than they used to be?’, and anyone who understands how corpus linguistics works can access one of the many large corpora freely available to the public and start to find out. For example, to answer Challies’ question, I turned to the Corpus of Contemporary American English (COCA) and compared ‘feel’ with both ‘think’ and ‘believe’ over a 25-year period.
As you can see in my findings above, Challies’ intuitions were right, at least to some extent. Gradually, feel is getting an edge on believe, and slowly gaining on think, at least in this corpus, which
The terms are hierarchical rather than synonymous and over time we ought to see a progression from feeling to thinking to believing.
which may be interpreted as a female tendency towards an emotional and attitudinal academic discourse.
Further, they note that the construction I feel was more frequent in highly or mostly interactive speech events, where emphasis was on building rapport.
In light of this, statements like Challies’ appear to be another in a long line of criticisms of what some evidence indicates are feminine ways of communicating. Challies writes:
The things I feel are the things I am unsure of, the things I am encountering and responding to on an impulsive or emotional level.
In fact, scholarship on feel, like that of Robert Dixon, suggests that feel is a non-face-threatening way of expressing something one knows intuitively but wishes to present sensitively. Challies’ suggestion that we abandon such language for the more forceful (and perhaps more male?) I think and I believe could therefore be seen as rooted in a belief that women and our ways of talking are inappropriately emotional and impulsive. Based on corpus-based research I’ve read, I feel like they aren’t.
All that said, I still have questions about points Challies raised that I can’t yet answer. For example, I didn’t have access to gender data in COCA, so I had to rely on other existing research for information about gendered usage. If I were carrying out a thorough corpus-driven analysis of think, feel, and believe, I would use the corpus statistics as a starting point from which to examine the texts themselves more closely, asking such questions as:
- In what contexts are feel, think, and believe used?
- To what extent is their usage gendered or marked by age?
- Is the increase in use of feel due to greater participation of women in public discourse?
- If not, how can we explain its increased use?
- Does feel behave like any other tokens? For example, is there a version of feel used more frequently by men, as some existing research suggests?
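For readers curious what "examining the texts themselves" can look like in practice, here is a minimal sketch of a keyword-in-context (KWIC) concordance in Python. It is only an illustration: the function name `kwic`, the `width` parameter, and the sample sentences are all invented for this example, and it is not the interface of COCA or of any particular concordancing tool.

```python
import re

def kwic(text, pattern, width=30):
    """Return one keyword-in-context line per match of `pattern` in `text`.

    `width` is the number of characters of context shown on each side
    (a hypothetical parameter chosen for this sketch).
    """
    lines = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines

# Invented sample sentences, NOT real corpus data.
sample = (
    "I feel that the argument is weak. She said I think we should wait. "
    "Honestly, I believe the data, but I feel like more context is needed."
)

for line in kwic(sample, r"\bI (?:feel|think|believe)\b"):
    print(line)
```

Reading down the aligned column of matches is one simple way to start asking in what contexts each verb appears, before moving on to closer analysis of the surrounding text.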
Like all methodologies, CL has its limitations and weaknesses. However, good, responsible corpus linguistics, with the proper tools, allows us to answer questions about how people use language, to what extent they use it and when, and in what contexts. It involves looking at patterns in often large bodies of naturally occurring data, carefully and cautiously selected, and examining language features in context. As I will explore in future posts, CL is often combined with other methodologies, such as discourse analysis, which is a text-analytical tool more familiar to biblical scholars.
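To make the "to what extent they use it and when" part concrete, here is a small Python sketch of how phrase frequencies might be compared across time periods. Raw counts are normalised to occurrences per million words, because sub-corpora for different periods are rarely the same size. Everything here is an assumption made for illustration: the function name `per_million`, the period labels, and the toy sentences are invented, and the numbers it produces say nothing about actual usage.

```python
import re

def per_million(texts_by_period, phrases):
    """Return {period: {phrase: occurrences per million words}}."""
    rates = {}
    for period, texts in texts_by_period.items():
        joined = " ".join(texts).lower()
        # Crude tokenisation: runs of letters and apostrophes count as words.
        total_words = len(re.findall(r"[a-z']+", joined))
        rates[period] = {}
        for phrase in phrases:
            pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
            count = len(re.findall(pattern, joined))
            # Normalise so that periods of different sizes are comparable.
            rates[period][phrase] = (
                count / total_words * 1_000_000 if total_words else 0.0
            )
    return rates

# Invented toy texts with hypothetical period labels, NOT real corpus data.
toy = {
    "1990-1994": ["I think it will rain.", "I believe you."],
    "2010-2014": ["I feel it will rain.", "I think so, but I feel unsure."],
}

rates = per_million(toy, ["I feel", "I think", "I believe"])
for period, by_phrase in rates.items():
    print(period, by_phrase)
```

With a real corpus, the per-period texts would come from the corpus itself, and the resulting rates would be the starting point for the closer, contextual reading described above rather than the end of the analysis.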
For more information about CL, below are some resources I have found helpful: