Search engines, recalls and ratios

Just for fun, I tried three separate search engines, AltaVista, Google and Lycos, to see how their recalls differ. I chose those three since Wikipedia indicates that they index and search the interwebs independently of each other. (The figures below seem to substantiate this.)

I’m not interested in the actual number of hits, but rather in ratios. Hence I searched for words in pairs (one word, two spellings) such as "keyboard" vs "kyeboard" (the latter being a typo), "occurring" vs "occuring" (common misspelling), "organisation" vs "organization" (variant spelling), and a handful of others.

Ideally such recalls should tell me how much more common one construction is compared to another. As long as the indexing of the interwebs is comprehensive and/or sufficiently random, then each search engine should give me roughly equal ratios irrespective of the actual number of hits involved. However, the figures below indicate something different.

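TYPOS & MISSPELLINGS

| Engine | keyboard | kyeboard | Ratio | Note |
|---|---|---|---|---|
| AltaVista | 483,000,000 | 23,000 | 21,000:1 | |
| Lycos | 26,162,417 | 1,158 | 22,593:1 | |
| Google | 91,000,000 | 30,300 | 3,003:1 | ⬅ more |

| Engine | occurring | occuring | Ratio | Note |
|---|---|---|---|---|
| AltaVista | 149,000,000 | 12,400,000 | 12:1 | |
| Lycos | 8,235,538 | 675,116 | 12:1 | |
| Google | 45,500,000 | 14,400,000 | 3:1 | ⬅ more |

| Engine | episode | epsiode | Ratio | Note |
|---|---|---|---|---|
| AltaVista | 844,000,000 | 1,360,000 | 621:1 | |
| Lycos | 45,426,313 | 53,927 | 842:1 | |
| Google | 397,000,000 | 306,000 | 1,297:1 | ⬅ less |

VARIANT SPELLINGS & FORMS

| Engine | organization | organisation | Ratio | Note |
|---|---|---|---|---|
| AltaVista | 1,620,000,000 | 811,000,000 | 2.0:1 | |
| Lycos | 467,042,950 | 45,292,899 | 10.3:1 | ⬅ less |
| Google | 248,000,000 | 142,000,000 | 1.8:1 | |

| Engine | isn’t | ain’t | Ratio | Note |
|---|---|---|---|---|
| AltaVista | 368,000,000 | 82,900,000 | 4.5:1 | |
| Lycos | 80,699,040 | 13,814,999 | 5.8:1 | ⬅ less |
| Google | 223,000,000 | 52,500,000 | 4.2:1 | |

| Engine | "he isn’t" | "he ain’t" | Ratio | Note |
|---|---|---|---|---|
| AltaVista | 298,000 | 97,000 | 3.1:1 | ⬅ more |
| Lycos | 1,912,390 | 306,715 | 6.2:1 | ⬅ less |
| Google | 5,090,000 | 11,900,000 | 1:2.3 | ⬅ !!! |

| Engine | "than I" | "than me" | Ratio | Note |
|---|---|---|---|---|
| AltaVista | 216,000,000 | 69,000,000 | 3.1:1 | |
| Lycos | 11,704,123 | 3,402,830 | 3.4:1 | |
| Google | 54,700,000 | 15,300,000 | 3.6:1 | |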

Taken at face value, it would appear that Google disagrees with the other two search engines when it comes to typos and misspellings, although the disagreement does not appear to be consistently in any one direction. When it comes to variant spellings and forms, there seem to be no general tendencies. In one case ("than I" vs "than me") they all agree; in another ("he isn’t" vs "he ain’t") they all disagree. In the other two cases, AltaVista and Google agree, while Lycos does not.
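
For anyone who wants to redo the arithmetic, here is a minimal Python sketch that recomputes the ratios and compares them across engines. The hit counts are copied from three of the pairs listed above. The function names (`ratio`, `spread`) and the idea of measuring disagreement as the largest ratio divided by the smallest are illustrative assumptions only, not an established measure.

```python
# Hit counts copied from the tables above: (first form, second form).
HITS = {
    "keyboard / kyeboard": {
        "AltaVista": (483_000_000, 23_000),
        "Lycos": (26_162_417, 1_158),
        "Google": (91_000_000, 30_300),
    },
    "he isn't / he ain't": {
        "AltaVista": (298_000, 97_000),
        "Lycos": (1_912_390, 306_715),
        "Google": (5_090_000, 11_900_000),
    },
    "than I / than me": {
        "AltaVista": (216_000_000, 69_000_000),
        "Lycos": (11_704_123, 3_402_830),
        "Google": (54_700_000, 15_300_000),
    },
}


def ratio(first_hits, second_hits):
    """How many times more hits the first form has than the second."""
    return first_hits / second_hits


def spread(ratios):
    """Hypothetical disagreement measure: largest ratio over smallest.

    A value of 1.0 would mean every engine reports the same ratio.
    """
    ratios = list(ratios)
    return max(ratios) / min(ratios)


for pair, engines in HITS.items():
    ratios = {name: ratio(*counts) for name, counts in engines.items()}
    summary = ", ".join(f"{name} {value:,.1f}:1" for name, value in ratios.items())
    print(f"{pair}: {summary} (spread x{spread(ratios.values()):.1f})")
```

If the engines sampled the web in the same way, each spread would sit close to 1 regardless of how many hits each engine reports. With the figures above, that only holds for "than I" vs "than me".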

To be honest, I’m not entirely sure what this means. The fact that the differences are there ought to raise some alarm bells before trusting any figures provided by any of the search engines. Perhaps there’s a simple technical reason for all this. Idiosyncratic roundings off? Invisible spell-checkers? Biased indexing of the web? Biased recall procedures? Unfortunately, I’m too ignorant about how exactly search engines work. One thing is clear, however. Different search engines do it differently, which leads to the obvious question: should we trust any of them?

A corpus like COCA (i.e. Corpus of Contemporary American English) is more tailor-made for linguistics, and is therefore also more suited for linguistic queries. On the other hand, COCA doesn’t give us all aspects of actual language usage. The written-language part of the corpus contains texts drawn from published sources, and is thus composed of edited texts in which typos and non-standard usages have been weeded out. Typos like "epsiode" and "kyeboard", for instance, give no hits at all in COCA, while the “than I”/“than me” ratio is 6.5:1 in COCA (compared to the roughly 3.5:1 in the tables above).

Search engines like AltaVista, Google, Lycos, and others, index and search people’s unedited language usage out there "in the wild", warts, typos and all. Therefore their recalls should be more representative of actual usage. The trouble is, they give different results, as the above little exercise demonstrates.

At any rate, this isn’t a very comprehensive survey, being based on only a few searches. All I know at this point is that one obviously needs to be very cautious about interpreting numbers extracted from search engines. Most people seem to trust results offered by Google without even blinking. Indeed, many people use nothing *but* Google. Admittedly I do, too, normally. We might want to re-think our faith in Google. I know I will.

Nicknames for IPA symbols

A recent blog post by John Wells, which discusses transcriptions of toddler-speech and also mentions so-called phonetic alphabets, made me remember that I once created a phonetic alphabet for the symbols in the IPA (International Phonetic Association) alphabet.

Although I’ve used it largely for my own benefit, it could potentially be useful to anyone who wants to spell out phonetic transcriptions without resorting to cumbersome and complex phonetic descriptions. The full table, with all the nicknames, can be downloaded as a PDF from here. It’s been updated and revised a few times over the years.

Most of the names are arbitrary, though not entirely un-motivated. For instance, ƀ is named barbie, which is a contraction of barred b, while đ is named bardy, a shortening of barred d.

Some names derive from IPA’s own pamphlet, The principles of the International Phonetic Association, 1949. For instance, the ɮ symbol is named dhla simply because in the IPA pamphlet that particular symbol is exemplified with the Zulu word dhla, in which the sound represented by that symbol is said to occur.

Other names derive from a symbol’s graphic features. For instance, ɰ is named ham because it looks like a combination of lower-case H and M, though upside-down. Thus H+M ≈ ham. Perhaps I was hungry when I made that connection.

Anywho, as far as I know, no such or similar "phonetic alphabet" exists for the symbols of the IPA. Maybe phoneticians don’t need one? Or else they resort to all kinds of ad-hoc and impromptu solutions. Either way, here’s one they can use should they need one.

What’s wrong with bad language?

(Here’s a bit of a post-Christmas rant. Sorry for the length!)

Complaints like the following are not uncommon on the internet and elsewhere:

— “Why can’t people write/speak Proper English?”
— “Write proper English if you want others to understand you!”
— “People who don’t use Proper English are lazy, stupid, or both.”
— “English is ruined by sloppy pronunciation and bad grammar!”

Taken at face value, complaints like the above seem perfectly reasonable. If you want to get a message through, then you need to communicate in a way that is understood. What could be more reasonable than that? There are a few underlying assumptions here that need a fair bit of consideration before we can address that issue. For instance, what exactly is Proper English? And by implication, what is Improper English? But let us start with a few basics.

Spoken vs written language

We normally communicate by either speaking or writing. Speech is our primary mode of communication. We learnt to speak before we learnt to read/write, both as individuals and as a species. Without speech there would be no writing. The reverse is not true.

Speech and writing are separate, albeit related, code systems. Even though there is a considerable overlap in lexicon and grammar, they nonetheless function according to their own separate sets of rules and constraints. Hence we need to keep them clearly separated.

In speech, for instance, we have a whole range of extra-linguistic phenomena that help us interpret what people say, such as intonation, stress, loudness, mimicry, body posture, finger pointing, and so on. Even the clothes a speaker wears and the dirt under his/her finger nails can be used as clues when decoding a message. In a written text, all those speech-typical signals are lacking. On the other hand, written texts have their own distinguishing features, e.g. font type, text formatting (such as bold-print, italics, underlining, font colour, etc.), paragraph organisation, headlines, info boxes, illustrations, paper quality, and so on and so forth. In short, spoken and written messages are constructed differently, and consequently we also decode them differently.

However, there are differences also where you might hypothetically expect similarities. For instance, the spelling of English does not match the pronunciation. Hence while there are 6 letters signifying vowels in written English (a, e, i, o, u, y), there are more than 10 actual vowel sounds (excl. diphthongs) in any variety of spoken English. Some spelling/pronunciation mismatches are archaisms, testifying to old pronunciations. This does not, however, mean that they are not functional or useful. While the written words right/write/rite are kept apart in writing, they are pronounced identically in most forms of English. This actually makes reading easier. Since the words are visually distinct, they are not likely to be confused and thus their meanings are quicker to access. Moreover, the different spellings maintain a visual consistency among a set of related words, in this case write/wrote/written (all contain wr-t) and rite/ritual (both contain rit). Compare also anxious and anxiety which are pronounced something like angkshious and angziety, respectively, but are kept visually related by their anx-parts. This maintains a consistency on the word/grammar level at the expense of congruency on the spelling/pronunciation level. It’s an unavoidable trade-off, but very useful for readers.

Vocabulary and grammar differ between written and spoken language. We commonly use more elaborate sentence constructions in writing than we do in speech, and while we write £5, we use a different word order in speech, namely five pounds (instead of pounds five).

Speech is immediate and momentary, while writing is planned and lasting. We make up, edit and correct our spoken utterances on the spot, while the listener is hearing them. Written messages are often edited and revised before they are read by their intended recipients. Hence readers of written texts are less likely to witness any corrections and edits. In speech, however, these occur naturally during the actual communication. Thus it is only normal to expect more errors and mistakes in speech. This has to be taken into account when evaluating either speech or writing.

Change and variation are natural and ever-present

Human languages are heterogeneous and they constantly change. In fact, languages are in constant flux. They display variation in both time and space. They always have and they always will. It is a necessary property of any living language. When a language ceases to change, it dies. It becomes non-functional and no longer serves the purpose as a useful tool of communication.

Language cannot be a closed and rigid system of rules for a variety of reasons. It has to be adaptable and flexible, simply because we need to be able to talk about new things, or even old things in new ways. Such things as metaphors, similes, analogies, sarcasms, and so on, constantly change and enrich our languages with new forms and new constructions. This is what poets and authors do all the time. Well, at least the interesting ones. But it’s not only poets who do it. Other people do it, too. Some people do it more deliberately and innovatively than others, but we all do it to some extent; perhaps not with as impressive results as Shakespeare did, but still, we do it.

It is not only the use of words and phrases that changes. Every aspect of language changes, pronunciation and grammar included. The reasons can be many, ranging from the biological/physiological workings of our speech apparatus to intentional idiosyncrasies used to mark one’s own identity or make a joke. Some are due to errors and mistakes, no doubt. But whatever their reasons, some of them catch on and spread, either because they are considered more prestigious (e.g. if a famous person uses them), or because they are felt to be better in some way (e.g. when the shorter mobile is used instead of the compound mobile phone). Clearly we cannot dismiss changes in principle. They happen, sometimes for reasons known to us, but most commonly for reasons that are unknown. This, of course, leaves the field open for people to make up all kinds of speculative and bizarre reasons, such as “laziness” and “stupidity”.

Different individuals do have different language skills, habits and preferences. I know a different set of words than you do. I’m comfortable with a different way of expressing myself than you. These differences exist, but such differences do not unproblematically translate into values of good or bad.

More than just messages

Language is not only used to convey messages. With language, we also signal our identities and group belongings. This is an important function of language, especially spoken language. We imitate those we want to identify with. With language, we create boundaries towards people who do not speak like us, or towards people we do not want to sound like, for whatever reasons. For instance, kids do not want to sound like their grand-parents. Nor do grand-parents want to sound like thirteen-year-olds. Scots do not want to sound like Londoners, and vice-versa. Middle class people do not want to sound like chavs. Hence they make sure they don’t speak like them. In their opinion, they of course speak “proper” English while the so-called chavs speak “bad” English.

This segregating function of language exists in all speech communities, and the causes are largely social. People naturally form groups, both temporary and long-lasting ones, large and small. It can be professional groups like lawyers, electricians, clergy, etc., or it can be social groups like family circles, chess clubs, buddies, street gangs, and so on. These groups create and maintain in-group specific behaviours, be it dress codes, hand shakes, in-jokes, whatever, incl. particular linguistic behaviours such as specific forms of salutations and other fixed phrases, common technical terminology, peculiar pronunciations of certain words, and whatever else. These in-group peculiarities, furthermore, are part of what defines any given group. They reinforce the group’s identity and signal that identity towards outsiders.

What, who, how, where and when?

We constantly adapt our language use according to a variety of factors, such as what we say (the topic), to whom we are saying it (the intended receiver), how we say it (the medium), as well as when and where we say it (the context/milieu). For instance, if there is a lot of background noise we may choose to scream. If there are a lot of other people around, we may instead choose to whisper. We speak differently to our loved ones than we do to our bosses. When we communicate we usually do so with an intended audience in mind. It can be a single individual (e.g. in face-to-face dialogues) or it can be a large non-specific mass of people (e.g. when giving a national speech on radio). Thus we adjust our speech accordingly. That’s why “Sup?” is a perfectly valid formulation in one context, while “How do you do?” might be considered better in another. This is also the reason why there is no one single correct way of communicating. There are in fact innumerable correct ways.

Sometimes one can hear complaints from adults who have overheard adolescents speak unintelligible English on a bus, or some other public place. This kind of complaint has little rational basis to it. The needs for successful communication have evidently been adequately met, as the adolescents obviously understand each other. They have no intention, nor any obligation, to make themselves understood by outsiders. Their use of “unintelligible” language is no indication that the adolescents in question cannot speak “proper” English. In all likelihood, they can. They are simply sensitive to the fact that they are intending their communication for each other, not those around them. Their choice of language behaviour signals “This is who we are” and “This conversation is not meant for you but my mate(s)”.

Standard English

When people talk about Proper English, it is usually some form of written English they mean. In particular, they imply the kind of English taught in schools, and which is commonly referred to as Standard English or Queen’s English.

Sometimes you hear people say things like “Common rules (of Standard English) are necessary to secure good communication”. However, this is looking at it the wrong way around. It is our need to communicate that has created and continues to maintain whatever rules there are. Some of these rules (generalisations) have been “discovered” by scholars and subsequently printed in books (grammars), which others then have come to treat as indisputable dogma. It is important to remember here that writers sometimes get it wrong, and not every stipulated grammar rule is a valid rule. Nor can a grammar ever be complete. And because languages constantly change and adapt, every published grammar is instantly obsolete. Think of a grammar as a photograph capturing a single moment in time. And not only that, no matter how much of the landscape it captures, there will always be something outside the frame, or beyond the horizon.

Standard English is not a natural language. It is an artificial construct existing only in written form. Even though it is a written language, many people try to emulate it in their speech, especially in formal situations. It is not the case that dialectal forms of English are deviations or even variations of Standard English. If anything, it is the other way round. Standard English is to spoken English dialects as the poodle is to wild dog species. More specifically, Standard English is an artificial variation of Midlands-based dialects, just like the poodle is an artificially created variation of a once-domesticated wolf. What we today recognise as Standard English has been deliberately engineered and promoted by the social and academic elite over the past 500 years or so.

It is some idealised form of this artificial, written Standard English that people usually have in mind when they complain about other people’s Englishes, be they written or spoken. And this is the basis of their irrationality. Everyday English is not the same as Standard English, nor should it be, and anyone expecting it to be is by definition wrong, even foolish.

So what are people complaining about?

There are seemingly no limits to what people can complain about when it comes to language. However, some complaints have been repeated so often that they have become unquestioned clichés rather than observations based on any rational thinking. The ironic thing is that many of the things that “language snobs” complain about aren’t even errors to begin with. Their complaints have been refuted many times by linguists, but the internet in particular is rife with the same age-old complaints, including such dear things as split infinitives, double negations, the word “like”, saying bigger than me instead of bigger than I, writing could of instead of could have, and many others. Let’s have a look at some of these.

Split infinitives

Complaining about split infinitives seems to be a favourite. However, there is nothing wrong with them. They are fully permissible by English grammar, and they have been used by many generations of speakers and writers. Sometimes sentences become more clear with split infinitives than without them. Compare the following three versions:
 — He prepared silently to accompany her
 — He prepared to silently accompany her
 — He prepared to accompany her silently
In the first sentence, silently modifies prepared. In the second, it modifies accompany. The third sentence is ambiguous. If you want to make it clear that it is the accompanying that is done silently, instead of the preparation, then you choose the second sentence, the one with the split infinitive. That is not grammatically flawed. It is stylistically good.

Double negations

Double negations are another favourite gripe among language snobs. They are sometimes claimed to cancel each other out. Thus He didn’t say nothing can allegedly be misinterpreted as He said something. This is just plain wrong. No one is likely to construe such a meaning unless they intentionally try to. The double negation is indeed redundantly marked (i.e. it is pleonastic), but this is quite a common phenomenon in languages. It can be used for emphasis or simply to make sure that the negation is heard. This is valuable in speech if not in writing. There is certainly nothing ungrammatical about it. There are even constructions in which we expect double negations to occur, as in neither … nor, in which negation is doubly (redundantly) marked.

It is even possible to argue that those who use double negations are more attuned to communicative needs than those who don’t use them. In Old English, the common negative structure was something like ic ne lufie (lit. I not love). The negative particle ne was frequently destressed in speech, for which reason it was strengthened with another negation marker, noht (nothing), giving rise to the Middle English construction ic ne seye not (lit. I not say nothing). This doubly-negated construction ensured that the negation wasn’t lost in transmission. (When the original negation ne later disappeared altogether, the newer negation, noht, was brought forward with the help of the auxiliary do giving rise to the Modern English construction I do not know.) In modern times, we see the same process happening again. Negative particles are frequently destressed in constructions like I don’t have it, as opposed to I do not have it. This creates a natural need to strengthen the destressed negation with an additional negation, as in I don’t have nothing. Thus the second negation is not there to cancel the first one out. It’s there to make sure the negation is heard. That’s not being ungrammatical or unidiomatic. It’s being sensitive to communicative needs.

(As a side-note, there are cases where we use double negations with the seemingly intended purpose of having them cancel each other out. But this seems to be possible only when the negations apply to the same word, as in not uncommon, in which the negations are not and un-. However, in such cases the resulting semantics does not equal the simple positive root, in this case common, so that semantically speaking, there is more going on here than a mere cancelling out.)

I or me

Whether to use me or I after than depends on how one interprets than. Is it a conjunction or a preposition? Both interpretations are permissible in English. Many words can function as both prepositions and conjunctions. In a somewhat simplified way, you can say that it depends on what comes afterwards. Is it a noun phrase or is it a verb phrase? In he did it before me, before is a preposition since what follows is a single pronoun. In he did it before I did, it is a conjunction since what follows is a verb phrase. This double-functionality is an integral part of English grammar (spoken as well as written), and there is nothing wrong with it. Hence he is bigger than me (preposition) is just as correct as he is bigger than I am (conjunction).

The curse of the Bishop

As already indicated, the above-mentioned complaints aren’t valid complaints at all. They have nonetheless been prevalent complaints for many generations, despite having been refuted many, many times. They seem to originate from a set of (pseudo)rules established by Bishop Lowth in an influential book on English grammar which he published in 1762. They are likely based on Bishop Lowth’s own, idiosyncratic aversions to other people’s English which just happened to differ from his own. Due to his social prestige and position in the church, his opinions came to be propagated by generations of English teachers, who are no doubt well-meaning, but misguided and wrong nonetheless.

Words like like

A more recent annoyance is directed towards the use of like in phrases such as it’s only, like, an hour and he was, like, stupid or something. There seems to be no end to the stigma attached to those who use like like this. Needless to say, the complaints are emotional rather than rational. The word like is multifunctional in most people’s English. It can appear as a verb, as in I like it, an adjective, of like mind, a conjunction, he eats like there is no tomorrow, or a preposition, she walks like a duck. For many people, these (and perhaps a handful of others) are the only accepted uses of like. Like in so many other cases, this, too, rests on the misapplication of ideal written standards to the spoken language. In spoken English, like has a further function seldom noted by grammar books (which are based on written language). It can also be a so-called discourse particle/marker, a functional category found in all languages. Indeed, discourse particles form an important and seemingly essential part of any spoken language. In English, words and phrases like like, well, I mean, you know, as well as others, can all be used as discourse particles fulfilling a variety of important functions. For instance, like can be used as an approximation marker, as in it was, like, two hours ago (emphasising that something is an estimate), mitigation marker, he was, like, stupid (lessening the impact of the accusation of being stupid), quotation marker, e.g. so he was like, “Ooh, my brain hurts” (framing reported speech, i.e. signalling that something is being quoted), as well as other things. Discourse particles are seldom, if ever, meaningless tics. They only appear superfluous and meaningless to those whose ideal language is some form of Standard Written English and/or to those who don’t pay attention to what those little particles actually do.

Could of

A frequently ridiculed construction in written English, especially on the internet, is the use of of where you would normally expect a have, as in must of, should of, etc. Worth noting is that this particular mistake is always self-correcting, so even though it can be construed as a grammatical error, it does not create any confusion with regard to the message conveyed. However, it is also worth noting that it is not the grave error many people think it is. The grammatical template Auxiliary Verb + Preposition + Main Verb is already acceptable in English. It is used in, for instance, I ought to go home, in which ought to go is grammatically analogous to should of done, i.e. Auxiliary + Preposition + Main Verb. There are even hints of a consistent division of labour here. While to is followed by infinitive forms, of is followed by past participle forms. There is clearly more at play here than it being a simple error.

Commonly mixed homophones

Other oft-noted mistakes/errors concern the mixing of it’s/its, they’re/their/there, we’re/were/where, you’re/your, hear/here, and others. Admittedly, these are genuine errors, but they are always self-correcting. It is very difficult to come up with contexts where they’d create any serious misunderstandings. They may be eye-catching, but can hardly be considered detrimental to communication as such. Some might even argue that since they aren’t distinguished in speech at all, being pronounced identically, we might as well dispense with the distinctions entirely and use a single form for each.

Language is self-regulating and optimal

Any given speech community will always regulate its language behaviour so as to be an optimal tool designed for easy flow of information (make yourself understood within your group) as well as a tool for signalling an individual speaker’s identity (tell your surroundings who you are and who you are not). The need to successfully convey messages favours similarities and common rules, while the need to mark one’s identity favours differences and idiosyncrasies.

Since language is also context-dependent (who says what to whom, how, when and where), what counts as optimal varies from situation to situation. Sometimes being only able to haggle over prices at a flea market is enough to qualify as functionally optimal. Sometimes it’s enough only to be able to hurl insults across a border. In other situations, such as parliamentary debates or when romancing a loved one, more elaborate language behaviour is called for. The situation, the participants and their individual motives dictate how communication occurs and what forms it takes.

The point here is that many factors have to be taken into account when assessing what is or is not appropriate and/or functional language. Using the sole reference frame of Standard (Written) English simply won’t do. It’s deeply ignorant of what language is and how it works. When a perceived error keeps being repeated over and over again, generation after generation, then there is always more to it than it being a simple “error”. Sloppiness, laziness and stupidity are never the answers, provided you’re interested in understanding language as opposed to merely denouncing what breaks a perceived idealised dogma.

What people use in their daily lives could do well with a lot less respect for normative standards. Correcting is completely superfluous in ordinary, everyday language use, mainly because linguistic errors are typically self-correcting, be they grammatical errors, typos or mispronunciations. If they weren’t, they wouldn’t be spotted. And when they do create genuine ambiguities, it is always better to ask for clarifications rather than trying to correct them. Communicatively speaking, that is a much more productive solution.

I think people pay way too much attention to spelling in their writing. There’s no harm whatsoever in allowing a more free and liberated spelling in ordinary language use, especially in personal letters, emails, internet forums, etc. There are admittedly some contexts where a normative spelling is preferable, and where correcting (genuine) errors is a good thing. This concerns mainly educational settings where language either is being taught as a subject or forms part of the curricular activities (e.g. essay writing). Official texts, regulations and legal contracts may also benefit from fixed spellings, but newspapers and prose publishers have hardly any reasons to abide by dogmatic spelling conventions.

Linguistic errors (real or imagined) are no more harmful to the English language, or even communication in general, than picking up the wrong fork at a fancy dinner party. That is, it may jar the (over)sensibilities of some snobs, but ultimately has no effects beyond that. To me, language snobs (prescriptivists, purists, the Grammar Police) are like extreme creationists. Instead of observing and understanding what language really is, they choose to believe in some sort of mythic ideal (Standard/Proper English) which they use as a holy dogma, especially when judging and denouncing the behaviour of other people. They then vilify whatever behaviour they perceive as breaking the rules of their interpretation of this revered dogma.

Virtually every complaint about “bad language” is nothing more than just another stick to beat other people over the head with in order to feel superior. It is a behaviour not very different from school-yard bullies who point a finger at the kid who dresses differently or talks with a lisp. Unfortunately, many complaints (be they valid or not) seem to rest on an underlying rationale that goes something like “Yes, I can understand what you’re saying, but I don’t want to, so it’s your fault!”

That’s a shame.

SELECT BIBLIOGRAPHY

Aitchison, Jean. 1991. Language change: progress or decay? 2nd edition. Cambridge University Press.
Andersson, Lars-Gunnar & Peter Trudgill. 1992. Bad language. London: Penguin Books.
Brook, G.L. 1978. English dialects. 3rd edition. London: Andre Deutsch.
Coates, Richard. 1989. A solution to the ‘must of’ problem. In: York papers in linguistics, v. 14, p. 159-167.
Crystal, David. 2008. Txtng: the gr8 db8. Oxford University Press.
D’Arcy, Alexandra. 2007. Like and language ideology: disentangling fact from fiction. In: American speech, v. 82, p. 386-419.
Foster, Brian. 1970. The changing English language. Harmondsworth: Penguin Books.
Janson, Tore. 2002. Speak: a short history of languages. Oxford University Press.
Jespersen, Otto. 1933. Essentials of English grammar. London: George Allen & Unwin.
Jespersen, Otto. 1938. Growth and structure of the English language. 9th edition. Oxford: Basil Blackwell.
Mattson, Jenny. 2009. The subtitling of discourse particles: a corpus-based study of well, you know, I mean, and like, and their Swedish translations in ten American films. PhD dissertation. University of Gothenburg.
Todd, Loreto & Ian Hancock. 1990. International English usage. London: Routledge.

A FEW USEFUL WEBSITES

David Crystal’s blog
Language log
Urban dictionary
Wordspy
World Wide Words

Welcome to XX10!

So, the new year’s begun.

It seems most people will be calling it twenty ten. How unimaginative! In fact, I’ve tried, unsuccessfully so far, to introduce my own little name for it, namely, Double X Ten, i.e. twenty = XX (and) ten. However, it doesn’t seem to be catching on.

(Come to think of it, I guess it could be called Triple X, too, or XXX, i.e. twenty = XX, ten = X. Ha!)

Is illegal copying of software theft?

If I take a DVD with a piece of software on it, without the owner’s approval, then everyone would agree that it is an act of theft. But if I only copy the disc’s content, or download it over the internet, and thus do not deprive the owner of any physical object, is it still theft?

The most basic understanding of theft would be when you take possession of someone’s physical property without that someone’s approval, be it a car, a wallet, or some jewels. Prototypically theft involves touchable things, but it can also extend to nonphysical things. For instance, you can illegally empty someone else’s bank account without actually moving any physical coins or money around. It would still be theft, even though no physical objects are involved. In both cases, you would have deprived the rightful owner of something of value.

But can the concept of ‘theft’ be extended to illegal copying and downloading of software? Clearly you have not deprived the owner of anything physical or even digital. You have merely copied it. The software itself is still there in its original place, so how can it be theft?

If I steal a physical DVD with software on it, it is not the disc itself I want. It’s the content of the disc that I want. If I illegally copy or download it, I’m after the same thing. The fact that I’m not taking the software with its physical container/carrier seems irrelevant to me. I have illegally transferred something into my possession that doesn’t belong to me. I have thereby also unduly benefited from someone else’s property. I have infringed on the legal owner’s right to control its distribution. Does this amount to theft? Instinctively I would say yes, it does.

Now, I can understand if people object to this. It’s common to treat words and their meanings as fixed points in the universe. If you have a fixed concept of the word ‘theft’, and try to apply that to illegal copying/downloading, then you would naturally conclude that illegal copying is not an act of theft because you’re not depriving the owner of the thing you’re making a copy of.

But words and meanings are not fixtures. Nor should they be treated as such. The world around us changes all the time, and so we must constantly re-negotiate our vocabulary to match it. Otherwise our language would eventually be useless.

The meaning of ‘theft’ relies on (at least) three concepts, namely, property, ownership and possession, as well as on how those concepts are transferred between keeper and taker. When the idea of theft was originally thought up (an occasion now long lost to history), there were no digital products around. Now there are. I can have ownership and possession of a physical thing like a car, and I can have ownership and possession of a digital product like a piece of software or a digital recording.

A piece of software cannot normally change hands in a physical sense; it can only be copied. That is, while you can transfer the ownership of software, you cannot physically transfer the property itself. You can copy it and then delete the original, but unless you transfer the software’s physical carrier/container, the software by itself cannot be transferred.

If the concept of ‘theft’ depends necessarily on the illegal transfer of the property itself, it should by implication never be possible to steal digital products. To me, there’s something wrong, and obsolete, about that. In principle, anything that can be possessed can also be stolen. It really isn’t that much of a stretch to re-think the idea of ‘theft’ to include illegal copying/downloading. We need to focus on theft as an act of illegally taking possession of a property, and only that. The physical transfer of the property itself does not have to be involved.

I should perhaps emphasize that I’m not talking about the legal definition of ‘theft’ here. I’m trying to understand a colloquial usage of the word ‘theft’, in particular my own. And to be quite frank, I’m not even sure that I’m all that categorical about it. Perhaps we do need a new word for this. I guess my only point is that it’s at least not impossible to think of illegal copying/downloading as an act of theft.

For realz!

Have you ever wondered what the Z in for realz is actually doing there? I have. The dictionary form of the phrase is for real, without a final Z.

There is a (formally) similar expression, for keeps, in which the S is historically a plural marker. Apparently, the word keeps is short for keepies, and originates from some sort of game in which players collected marbles. The ones one won, one “kept”, and these became known as keepies. Possibly the phrase for keeps has contributed to the formation of for realz by way of analogy. Although admittedly, it sounds a bit far-fetched.

The Z in for realz seems clearly to be a plural marker (i.e. plural S), but why has it been added at all? There doesn’t seem to be any plurality involved in the semantics of the phrase, not even metaphorically. (It means ‘in earnest’, ‘really’, ‘truthfully’.)

We may get at a solution if we look at phrases like many thanks and many apologies, in which there clearly is a plural S on each respective noun. But here it makes sense. You can easily think of many acts of thanking or apologising, so here the plural meaning is semantically justified. The assumption is, of course, that the more you thank or apologise, the more sincere you are. However, when we say many thanks, we say just that. We don’t usually go on actually thanking multiple times (although that may happen, too). The point here being that the plural forms’ major function is to intensify or emphasize the act of thanking or apologising, not to mark a plurality of acts as such (which in these phrases would only count as a minor, secondary function).

It is conceivable that it is this intensifying function of the plural S that is being used in for realz. Hence in this phrase at least, English plural S seems to have developed a function devoid of plurality. If that really is so, then it would be interesting to see if this intensifying S pops up elsewhere in the language.

(Another explanation would be that there’s some cross-linguistic interference going on, namely, from Spanish de veras, in which veras is a plural noun with a Spanish plural marker S. If that’s the case, then it would seem that English has incorporated the Spanish plural S as an intensifier, and again, devoid of plurality.)

How much is every other?

On a recent math test for Swedish 7th graders, one question asked: How many percent is every tenth? (Hur många procent är var tionde?) You’d think the answer is a straightforward 10%, but is it? Several students have had difficulties with that particular question. Why?

In order to arrive at 10% you have to assume infinity. The question requires you to imagine a situation where you can pick every 10th apple (for instance) out of an infinite number of apples. Hence it requires a certain level of abstraction that isn’t necessarily intuitive, because in reality no one has an infinite number of apples to choose from.

In fact, the expression every Nth can correspond to a whole range of percentages unless you have been taught to assume infinity. For instance, every 4th equals 25% only when the total amount is either four, eight, twelve, any other number divisible by four, or else if the total is (the hypothetical) infinity. If the total is, say, five, then picking every 4th apple may only give you 20%. That is, { 1, 2, 3, 4=pick, 1 }, leaving you with four unpicked apples, namely, the first three and the one remaining after the one you picked. If the total is seven, every 4th equals a mere 14.3% (rounded), i.e. { 1, 2, 3, 4=pick, 1, 2, 3 }, or one out of seven.

If the total is:          1    2    3    4     5     6       7       8     9       10    11      …
then ‘every 4th’ equals:  0%   0%   0%   25%   20%   16.7%   14.3%   25%   22.2%   20%   18.2%   …
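For anyone who wants to check the figures, here is a minimal Python sketch of the counting rule as I understand it: only whole units are picked, and any remainder is left alone. The function name every_nth_percent is my own invention for illustration.

```python
def every_nth_percent(n: int, total: int) -> float:
    """Percentage picked when taking every n-th item out of `total` items."""
    picked = total // n  # whole units only; a leftover remainder is never picked
    return 100 * picked / total


# Print the percentage for 'every 4th' with totals from 1 to 11.
for total in range(1, 12):
    print(total, f"{every_nth_percent(4, total):.1f}%")
```

The integer division (//) is the whole point: it throws away the remainder, which is exactly what happens to the apples left over after the last pick.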

An important point here is that every Nth is not the same as an Nth. While a 4th always equals 25%, irrespective of the assumed total, every 4th does not, simply because every Nth is unit-based, meaning it counts only whole units. That is, you don’t cut up any "remainders" as you would if you were to choose a 4th of all apples.
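The difference between the two expressions, and why assuming infinity makes it disappear, can be sketched in a few lines of Python. This is only an illustration under my own reading of the two phrases; the function names are made up for the purpose.

```python
def a_nth_percent(n: int) -> float:
    """'An n-th' of anything: always 100/n percent, whatever the total."""
    return 100 / n


def every_nth_percent(n: int, total: int) -> float:
    """'Every n-th' out of `total` whole items: remainders are not cut up."""
    return 100 * (total // n) / total


def gap(n: int, total: int) -> float:
    """How far 'every n-th' falls short of 'an n-th' for a given total."""
    return a_nth_percent(n) - every_nth_percent(n, total)


# Totals that all leave three apples over when counting in fours.
for total in (7, 103, 10_003):
    print(total, gap(4, total))
```

The shortfall is zero whenever the total is divisible by N, and otherwise shrinks as the total grows, which is why the textbook answer quietly presupposes an endless supply of apples.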

I don’t have any statistics showing how often students experience difficulties with this seemingly uncomplicated question, but I know that at least some do. This is undoubtedly a tricky question, if not a trick question. Answering it in the expected way is in any case a cultural feat (i.e. you have to be taught to assume the abstract concept of infinity) just as much as it is a logical or mathematical one.

Don’t say he’s foreign

Here’s an odd bit of language usage.

What qualifies something or someone to be of a "foreign persuasion"? In its most literal interpretation, you would expect it to refer to someone being influenced by a foreigner, a foreign nation, or otherwise something foreign. After some googling, it seems clear that that is in fact also how many people use it.

However, the phrase has another usage, too, and an odd one at that. For some people, it’s a roundabout way of referring to foreign (and foreigner?), as when writing about "films of the independent and foreign persuasion", or the "foreign persuasion in NASCAR" (referring to foreign-born drivers). Can NASCAR drivers be persuaded to become foreign-born?

In those contexts, "foreign persuasion" doesn’t refer to any particular opinions (as in Christian persuasion), or people who have been persuaded into believing or doing something. It’s simply used as a tortuous alternative to foreign.

When did foreign become a derogatory word in English?

How literal is literal?

Have you noticed that literally doesn’t literally mean ‘literally’ anymore?

It’s now quite frequently used as a general intensifier. I’ve only recently begun to notice it in phrases like I literally died laughing and I literally worked myself to death. Online dictionaries say this is an "informal" usage.

Compact English Dictionary:

1. in a literal manner or sense
2. informal used for emphasis (rather than to suggest literal truth)

Cambridge International Dictionary of English:

1. used to emphasize what you are saying
  <He missed that kick literally by miles>
  <I was literally bowled over by the news>
2. simply or just
  <Then you literally cut the sausage down the middle>

Merriam-Webster’s Online Dictionary:

1. in a literal sense or manner : actually
  <took the remark literally>
  <was literally insane>
2. in effect : virtually
  <will literally turn the world upside down to combat cruelty or injustice — Norman Cousins>

They add:

Since some people take sense 2 to be the opposite of sense 1, it has been frequently criticized as a misuse. Instead, the use is pure hyperbole intended to gain emphasis, but it often appears in contexts where no additional emphasis is necessary.

None of my readily available print dictionaries mention this "informal" usage, although the thesaurus part of Collins dictionary and thesaurus (publ. 1987) lists actually and really under literally, words which can similarly be used for emphasis. Also, one of my English-Swedish dictionaries, Engelsk-Svensk ordbok by Kärre, Lindqvist, Nöjd & Redin, publ. 1938, does add "fullkomligt" (meaning ‘entirely’) as a possible translation for literally, but labels it "familjärt" (i.e. colloquial).

Its use for emphasis must be fairly new. At least I’ve just started noticing it, although that’s admittedly no proof of anything but my observational skills. However, older dictionaries don’t seem to recognise it at all. For instance, the online version of Webster’s 1828 dictionary says:

1. According to the primary and natural import of words; not figuratively.
  <A man and his wife cannot be literally one flesh>
2. With close adherence to words; word by word.
  <So wild and ungovernable a poet cannot be translated literally.>

(The same appears in the online version of Webster’s 1913 edition.)

Anyway, I’m not opposing this (new?) usage. The fact that words change meaning is an inevitable and natural feature of any living language. It’s a sign of health. Word meanings are only stable in dead languages.

You are what you blog

Well, you are, if you believe the Typealyzer, which is some sort of text analyser. Its makers don’t give any details of what exactly it does, and it should probably not be taken too seriously. Still, having typed in the URL of my own blog, I got the following conclusion about me:

The logical and analytical type. They are especially attuned to difficult creative and intellectual challenges and always look for something more complex to dig into. They are great at finding subtle connections between things and imagine far-reaching implications.

They enjoy working with complex things using a lot of concepts and imaginative models of reality. Since they are not very good at seeing and understanding the needs of other people, they might come across as arrogant, impatient and insensitive to people that need some time to understand what they are talking about.

So while I’m logical, creative and intellectual, I’m also an insensitive, arrogant bastard. That should make some people very happy to hear, I suppose.