Corpus Linguistics: I feel, I think, I believe

Last year, I delivered a research seminar on a recent project based on a corpus of philosophy texts. One of the comments was something along the lines of,

‘Ok, that was more interesting than I thought! When I heard the talk featured corpus linguistics, I was worried it was going to be all numbers, numbers, numbers, and we would all be snoring in our chairs.’

I hear a version of that fairly frequently when I mention corpus linguistics. Even among linguists, there is a common misconception that CL is all about probability scores and ignores the context in which stretches of discourse are situated. It’s all a matter of counting beans, right? Occasionally, other misconceptions make an appearance, such as what Raukko (2003: 165) writes:

The linguist looks at a large and somewhat pre-processed selection of text material and tries to find the relevant instances (instantiations, specimens) of the item that s/he wants to study.

The idea here is that corpus linguistics involves picking out whatever instances of a stretch of discourse suit a researcher’s pre-determined hypothesis and ignoring the rest. While I won’t go into all the reasons some people are suspicious of CL, you hopefully get the general idea by now, and maybe have some others of your own, which I look forward to hearing.

Since I use CL methodology fairly frequently in my research, I will from time to time debunk myths like those above and explore the value of CL. And so I’ll begin by responding to a recent post by Tim Challies, where he laments what he suspects is a rise in the use of the phrase ‘I feel’ as opposed to ‘I think’ or ‘I believe.’ A friend sent that post to me, and I immediately responded with ‘Corpus linguistics can help answer that!’

That’s part of the awesome beauty of CL. Someone has a question like ‘Aren’t these I feel statements more common than they used to be?’ and anyone who understands how corpus linguistics works can access one of the many large corpora freely available to the public and start to find out. For example, to answer Challies’ question, I turned to COCA (the Corpus of Contemporary American English) and compared ‘feel’ with both ‘think’ and ‘believe’ over a 25-year period.

Results from COCA: Each number is the overall ratio between the two words. For example, for 1990-1994, there are 0.22 tokens (instances) of feel for every token of think.
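
For readers who want to see how a ratio of this kind is derived, here is a minimal sketch in Python. It assumes you already have raw token counts for each word in each time period. The counts and period labels below are invented placeholders (chosen so that the first one works out to 0.22), not actual COCA frequencies, and the function names are hypothetical.

```python
def ratio(count_a: int, count_b: int) -> float:
    """Tokens of word A for every single token of word B."""
    if count_b == 0:
        raise ValueError("the baseline word must occur at least once")
    return count_a / count_b


def ratios_by_period(counts: dict, word: str, baseline: str) -> dict:
    """Compute the word/baseline ratio for each period, rounded to 2 places."""
    return {
        period: round(ratio(freqs[word], freqs[baseline]), 2)
        for period, freqs in counts.items()
    }


# Placeholder counts for illustration only; these are NOT COCA figures.
placeholder_counts = {
    "period 1": {"feel": 22, "think": 100, "believe": 40},
    "period 2": {"feel": 30, "think": 120, "believe": 40},
}

for baseline in ("think", "believe"):
    for period, value in ratios_by_period(placeholder_counts, "feel", baseline).items():
        print(f"{period}: {value} tokens of feel per token of {baseline}")
```

A rising value from one period to the next means feel is gaining ground on the baseline word, which is the kind of trend the ratios are meant to expose.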

As you can see in my findings above, Challies’ intuitions were right, at least to some extent. Gradually, feel is getting an edge on believe, and slowly gaining on think, at least in this corpus, which

contains more than 520 million words of text and is equally divided among spoken, fiction, popular magazines, newspapers, and academic texts. It includes 20 million words each year from 1990-2015 and the corpus is also updated regularly (the most recent texts are from December 2015).

Now, you may wonder at this point, what does this have to do with women and families? And here we return to Challies’ post, where he argues that when it comes to feel, think, and believe,

The terms are hierarchical rather than synonymous and over time we ought to see a progression from feeling to thinking to believing.

This kind of prescription makes me immediately uncomfortable. There is, first of all, some evidence that women use I feel more frequently than men. While in my investigation of COCA I stopped after an initial, probing query, other linguists have looked further at the context surrounding instances of feel in other corpora. Bellés-Fortuño and Campoy-Cubillo, for example, in their analysis of I feel in the MICASE corpus, found that female academic speakers use this construction more frequently in discourse,

which may be interpreted as a female tendency towards an emotional and attitudinal academic discourse.

Further, they note that the construction I feel was more frequent in highly or mostly interactive speech events, where emphasis was on building rapport.

In light of this, statements like Challies’ appear to be another in a long line of criticisms of what some evidence indicates are feminine ways of communicating. Challies writes:

The things I feel are the things I am unsure of, the things I am encountering and responding to on an impulsive or emotional level.

In fact, scholarship on feel, such as that of Robert Dixon, suggests that feel is a non-face-threatening way of expressing something one knows intuitively but wishes to present sensitively. Challies’ suggestion that we abandon such language for the more forceful (and perhaps more male?) I think and I believe could therefore be seen as rooted in a belief that women and our ways of talking are inappropriately emotional and impulsive. Based on corpus-based research I’ve read, I feel like they aren’t.

All that said, I still have questions about points Challies raised, which I can’t answer at this point in time. For example, I didn’t have access to gender data in COCA, so I had to rely on other existing research for information about gendered usage. If I were carrying out a thorough corpus-driven analysis of think, feel, and believe, I would use the corpus statistics as a starting point from which to examine the texts themselves more closely, asking such questions as:

  • In what contexts are feel, think, and believe used?
  • To what extent is their usage gendered or marked by age?
  • Is the increase in use of feel due to greater participation of women in public discourse?
  • If not, how can we explain its increased use?
  • Does feel behave like any other tokens? For example, is there a version of feel used more frequently by men (as some existing research suggests)?
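
The first of these questions, about context, is typically approached with a concordance: every hit for a search term displayed with the words on either side of it (often called KWIC, for keyword in context). Below is a minimal sketch of the idea in Python using only the standard library. The sample sentence is invented, the function name kwic and its width parameter are hypothetical, and dedicated concordancing tools offer far more than this.

```python
import re


def kwic(text: str, phrase: str, width: int = 30) -> list:
    """Return (left context, hit, right context) for each whole-word match."""
    pattern = re.compile(rf"\b{re.escape(phrase)}\b", re.IGNORECASE)
    hits = []
    for match in pattern.finditer(text):
        # Take up to `width` characters on each side of the match.
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        hits.append((left, match.group(0), right))
    return hits


# Invented sample text, for illustration only.
sample = "I feel that we should wait. I think she is right, but I feel uneasy about it."

for left, hit, right in kwic(sample, "I feel", width=20):
    # Right-align the left context so the hits line up in a column.
    print(f"{left:>20} [{hit}] {right}")
```

Reading down the aligned column of hits is what lets an analyst move from raw frequencies to the kinds of contexts in which a construction appears.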

Like all methodologies, CL has its limitations and weaknesses. However, good, responsible corpus linguistics, with the proper tools, allows us to answer questions about how people use language, to what extent they use it and when, and in what contexts. It involves looking at patterns in often large bodies of naturally occurring data, carefully and cautiously selected, and examining language features in context. As I will explore in future posts, CL is often combined with other methodologies, such as discourse analysis, which is a text-analytical tool more familiar to biblical scholars.

For more information about CL, below are some resources I have found helpful:

Corpus Linguistics: Method, Theory and Practice

Theory-driven Corpus Research

Corpus Linguistics: A Practical Introduction



