English Morphemes
Listed by Frequency of Use
Created July 31, 2010
Last Updated Aug 1, 2010
What is This?
This is a list of morphemes, the smallest units of meaning, taken from typical English text as found in the Brown Corpus. Many words have been split into their constituent parts so that the frequency of occurrence of those parts could be tallied. For example, take the sentence:
I saw the boys running away.
In addition to the words themselves, there are three additional morphemes in this sentence. The verb see is marked as past tense, the noun boy is marked as plural with -s, and the verb run is marked as progressive with -ing.
I see+PST the boy+PL run+PROG away.
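The tallying step can be sketched in a few lines of Python. This is only an illustration, assuming sentences have already been annotated by hand in the `stem+TAG` notation shown above; the function name `tally_morphemes` is hypothetical and is not the tool used to build this list.

```python
from collections import Counter

def tally_morphemes(annotated_sentences):
    """Count every stem and marker in sentences written like 'see+PST'."""
    counts = Counter()
    for sentence in annotated_sentences:
        for token in sentence.split():
            # Drop sentence punctuation, then split the stem from its markers.
            for morpheme in token.strip(".,!?").split("+"):
                if morpheme:
                    counts[morpheme] += 1
    return counts

counts = tally_morphemes(["I see+PST the boy+PL run+PROG away."])
print(counts.most_common())
```

Each stem and each marker is counted once per occurrence, so the example sentence contributes nine morphemes in total: six words plus the three markers.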
In addition to these standard morphemes, this list also separates out derivational affixes by their function. For example, [n] from [v] means that some prefix or suffix was used that turned a verb into a noun, such as application from apply. Thus a sentence like:
He happily accepted the application.
Would be broken down into something like this:
He happy+([adv] from [adj]) accept+PST the apply+([n] from [v]).
A few non-standard derivational affixes are also included, such as -N'T for deriving can't from can and wouldn't from would.
This list, then, is the result of analyzing the Brown Corpus for the frequency of occurrence of all morphemes and derivational affixes. The first 35 entries account for just over 50% of all the text in the corpus and represent, therefore, the most important morphemes in English. The first hundred morphemes account for 60% of the corpus, but it takes the next 1200 morphemes to bring the total to 90%. Beyond that, another 4000 morphemes are required to account for the remaining 10%, minus a small residue of very rare words that occur fewer than 8 or 10 times in a million words of typical English text.
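The coverage figures above come from ranking entries by frequency and adding up their counts until a target share of the corpus is reached. A minimal sketch of that calculation follows. The function name `entries_needed` and the toy counts are assumptions made for illustration; they are not figures from the Brown Corpus.

```python
def entries_needed(counts, target_percent):
    """Return how many of the most frequent entries reach the target coverage."""
    total = sum(counts)
    running = 0
    for rank, count in enumerate(sorted(counts, reverse=True), start=1):
        running += count
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= target_percent * total:
            return rank
    return len(counts)

# Made-up counts for illustration only.
toy_counts = [50, 20, 10, 8, 5, 4, 2, 1]
print(entries_needed(toy_counts, 50))  # 1 entry covers half of the toy data
print(entries_needed(toy_counts, 90))  # 5 entries are needed to reach 90%
```

Even in this toy data the pattern matches the one described for the corpus: a handful of entries covers most of the text, and the long tail adds coverage slowly.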
It should be noted that many words appear more than once in the list because they have been split into separate entries depending on their use. For example, the word object can be either the verb in "I object to that remark." or the noun in "Give me that object you are holding." The frequency of occurrence for the word object is, therefore, split into two separate frequencies listed as two separate entries. The same is true for words that have many different meanings such as press, which has entries for [v] push, [v] squeeze juice from, [v] iron, as clothing, [v] persuade, [v] continue (press on), [n] machine tool, [n] printing press, [n] juice extractor (wine press, apple press), [n] publisher, i.e. a small press, [n] printed news media, [n] people from the media (the press are here), [adj] important (pressing matters)... and so on.
In addition to splitting words into separate entries, several English affixes have been combined into single entries. Affixes that perform the same function are grouped together. For example, -al on a noun performs the same function as -ly and -ful on a noun, namely that of creating an adjective for "having the property named by the noun." This can be seen in the examples magic -> magical, friend -> friendly, and beauty -> beautiful. These are considered to be three forms of the same morpheme, and are listed as one entry.
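One way to picture this lumping is as a lookup from a surface suffix, together with the part of speech of the base word, to a single functional entry. The sketch below is an illustration only: the table `ENTRY_FOR` and the hand-analyzed `observed` data are hypothetical, and the base's part of speech is part of the key because the same suffix can serve different functions (friendly is an adjective from a noun, while newly is an adverb from an adjective).

```python
from collections import Counter

# Hypothetical lookup: (part of speech of the base, suffix) -> list entry.
ENTRY_FOR = {
    ("n", "-al"): "[adj] from [n]",
    ("n", "-ly"): "[adj] from [n]",
    ("n", "-ful"): "[adj] from [n]",
    ("adj", "-ly"): "[adv] from [adj]",
}

# (base word, part of speech, suffix) triples from a hand analysis.
observed = [
    ("magic", "n", "-al"),    # magical
    ("friend", "n", "-ly"),   # friendly
    ("beauty", "n", "-ful"),  # beautiful
    ("new", "adj", "-ly"),    # newly
]

# Tally by function, so the three noun suffixes share one entry.
entry_counts = Counter(ENTRY_FOR[(pos, suffix)] for _base, pos, suffix in observed)
print(entry_counts)
```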
When viewing the list, remember that each entry represents a whole family of related and derived words, so that the entry [adj] new stands for such additional derived words as newly, newer, newest, newness and so on. Likewise entries like [n] natural include derived forms such as natural, naturally, unnatural, naturalistic, unnaturally, and so on.
Lumping and Splitting: The Rationale
Perhaps I should justify my choices and decision to lump surface forms and differing lexical forms. The principal reason for the list is to know which general types of morphemes to develop first for a new conlang. For example, since one would usually work out the whole system of personal pronouns as a unit, rather than one at a time, those were lumped together.
Also, non-overlapping forms apply specifically to English syntax and etymology and may not reflect the way in which such derived forms are used in a conlang. Many prepositions have been split by their general meaning and then recombined with others that share that meaning, even if they are not applied in a syntactically identical manner in English.
The gerund form has been separated from the progressive and from the use of "-ing" as an adjective, so that "I am running", "the running boy...", and "Running is not necessary" are represented by three different entries. The gerund form is far less frequent, so it does not appear in the top 100 of the list.
As for different meanings of derived forms, where different types of nouns can be derived from a single verb, these are noted, so that we might have "[n] from [v] the doer of the action (educator)", "[n] from [v] the result of the action (education)", and so on. Those that don't yet appear on the web page are on the list, but further down than those that have been listed so far. Some of those may need clarification, and I will straighten them out as I get them tallied and formatted.
Where are the Number Words?
With a few exceptions like first, number words such as fifty have been left out of the list. This is because, when developing a conlang, one usually works out the number system separately and as a cohesive whole, rather than one number at a time as needed.
The List
Here are the first 100 or so entries, accounting for almost 62% of all the morphemes in the corpus. The rest are still being worked on and formatted for the list. More will be added as they are completed.
(as of Aug 1, 2010)
| Morpheme | Example | Cumulative Usage |
|---|---|---|
| articles, definite and indefinite | | 8.71% |
| PERSONAL PRONOUNS + POSSESSIVE + REFLEXIVE | including "who" | 15.67% |
| PAST TENSE MARKER | | 22.02% |
| PLURAL MARKER | | 26.70% |
| and | | 29.21% |
| [n] from [v]: Result of the action. | illumination; education; a meeting, the building, containment, betrayal, proposal, annoyance, appearance | 31.72% |
| [prep] of, belonging to, part of | member of the club | 33.15% |
| COPULA | This is an apple. Apples are red. | 34.54% |
| PROGRESSIVE VERB MARKER | I am running. | 35.92% |
| [prep] at, to, for, toward, in the direction of | look at; laugh at, went to Boston…Head for home. | 37.22% |
| [prep] of, about, containing, measure of | Book of poems. Box of nails. Pound of butter… | 38.46% |
| [prep] in, located within | I hear it in their voices. The mood in Cuba... In 1845… | 39.41% |
| PASSIVE VERB MARKER | It has been taken. | 40.33% |
| [adj] from [n] indicating presence of a quality | magical, natural, friendly, extremely, graceful, beautiful | 41.24% |
| POSSESSIVE MARKER | John's book. The queen of England's crown… | 41.92% |
| no, not, un-, dis- | negation words and affixes | 42.60% |
| that, subord. Clause | I discovered that he already left. | 43.27% |
| [adv] from [adj] | newly, firmly | 43.93% |
| [prep] in, with, on, by means of | He was paid in cash. Hit it with a hammer. Fed on bread and water. | 44.51% |
| [prep] from, of, originating at, away or apart from | He is from Boston. Get away from that. Francis of Assisi. | 45.08% |
| [pron] this, these, demonstrative pronoun | I want this cookie. | 45.64% |
| [prep] for, in, regarding, concerning, with respect to | Thanks for your help. specializes in physics. Check on the baby. | 46.20% |
| [adj] from [v] | the falling water | 46.69% |
| what, which, which thing | Here's what (which thing) I can do. (what/which) train do I take? | 47.17% |
| [adj] many, much, more, most | (with comparative and superlative forms) | 47.63% |
| INFINITIVE MARKER | | 48.05% |
| [aux] must, have to, compelled to | We have to leave now. | 48.46% |
| [conj] but | | 48.86% |
| [n] from [v] or [n]: doer of the action | baker, actor, sailor, computer, cameraman, mountaineer, equestrian | 49.25% |
| [conj] or | | 49.59% |
| [v] have, hold, possess | I have a book | 49.93% |
| PRESENT PERFECT MARKER | | 50.24% |
| [v] say | | 50.55% |
| [v] do | | 50.86% |
| [prep] at, location in time or space | See you at 5:00. He's at the store. | 51.15% |
| [n] from [adj]: Name of the attribute | aggressiveness, agility | 51.43% |
| [adj] all, total number or amount | | 51.71% |
| [n] there, that place | | 51.97% |
| PARTICIPLE MARKER | | 52.22% |
| [aux] would | | 52.46% |
| [prep] to, until, extending to | from February to March | 52.69% |
| [prep] while, during, as, in, at | at night, In his travels he saw… As he left the room… | 52.92% |
| [prep] for, purpose, beneficiary | This book is for Tom. ...donations for children's education. | 53.15% |
| [prep] on, upon, located upon, located at a certain time | on this site... My hat is on my head. On the fourth of July. | 53.37% |
| [conj] when | I'll see you when you get home. | 53.60% |
| [prep] by, by means of, using | We got here by train. | 53.82% |
| that, which, who: restrictive clause | The boy that ran away | 54.03% |
| [prep] with, cooperation, interaction, co-participation | Work with me... Don't argue with me. He is with the FBI. | 54.25% |
| [n] man, male human | | 54.46% |
| [adv] out, away from the inside | | 54.67% |
| [adv/adj] up, upward | | 54.86% |
| [conj] if | I will see you if you get here on time. | 55.05% |
| FUTURE VERB MARKER | | 55.24% |
| [aux] can, be able to | I can run. | 55.43% |
| -N'T | can't, shouldn't | 55.62% |
| [v] go | motion in general or motion away from a referenced location: Go away. | 55.81% |
| [n] time | | 55.99% |
| [prep] of, point to an attribute | cost of government; top of the mountain; the inside of the box... | 56.18% |
| [pron] that, those, demonstrative pronoun | I want that cookie. | 56.35% |
| [v] see | | 56.52% |
| [prep] into, toward the inside of | We went into the cave. | 56.68% |
| [prep] as, role, identification | known as Robin Hood... identified as the man who..., in his role as Hamlet..., Vincent Price as Captain Hook. | 56.85% |
| [v] come | motion toward a referenced location: Come here. | 57.01% |
| [v] know | | 57.17% |
| could | | 57.33% |
| [v] make, create, cause, construct | make a mess; make them aware… | 57.49% |
| [n] year | | 57.64% |
| [adj] little, small | | 57.79% |
| [conj] than, unequal comparison | less than, greater than, taller than… | 57.94% |
| [v] take | | 58.09% |
| [adj] only, lone, sole | | 58.24% |
| [adj] other | | 58.39% |
| [prep] in, within a range or bounds | It is in his power | 58.53% |
| [adj] some | | 58.68% |
| [v] get, fetch | | 58.82% |
| [adv] then, next (in order) | First jump, then run. | 58.96% |
| [n] state, country, political territory | The state of Ohio. The Secretary of State | 59.10% |
| [prep] to, for the purpose of | He quit to open his own store. | 59.24% |
| [prep] as, for, like, function, purpose | He used a rock (as, like, for) a hammer. | 59.38% |
| [n] day | | 59.50% |
| [adj] new | | 59.63% |
| [adv] now | | 59.75% |
| [n] first | | 59.87% |
| COMPARATIVE MARKER | bigger, faster, longer | 59.99% |
| [v] give | | 60.11% |
| [adj] any | | 60.23% |
| [v] use | | 60.35% |
| [v] work | | 60.46% |
| [prep] in, into or in a state or condition | cut in half... fell in love. | 60.57% |
| [adj] from [v]: capable of performing the action | abusive, reflective, adaptive, adhesive | 60.67% |
| [v] look | | 60.78% |
| [prep] by, subject of a passive | seen by many… was inspired by Plato. | 60.89% |
| [prep] after, following in time or space | After the game… The clowns came after the elephants. | 60.99% |
| [aux] may, allowed, permitted to | Yes, you may go outside. | 61.09% |
| [mod] from [adj] | (high) highly unusual, (absolute) absolutely essential | 61.20% |
| [adj] great | | 61.30% |
| [prep] for, duration | I was there for two hours. | 61.40% |
| [adj] from [n] of a type named by the noun | acidic, linguistic | 61.50% |
| [v] find | | 61.59% |
| [adj] long | | 61.69% |
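
Because the table gives cumulative usage, the share of any single entry is the difference between its figure and the one on the row above. The sketch below recovers individual shares for the first five rows. The cumulative figures are copied from the table; the shortened labels and the function name `individual_shares` are assumptions made for illustration. `Decimal` is used so that the two-decimal percentages subtract exactly.

```python
from decimal import Decimal

# (label, cumulative usage in percent) for the first five rows of the table.
rows = [
    ("articles", "8.71"),
    ("personal pronouns", "15.67"),
    ("PAST TENSE MARKER", "22.02"),
    ("PLURAL MARKER", "26.70"),
    ("and", "29.21"),
]

def individual_shares(rows):
    """Turn cumulative percentages into per-entry percentages."""
    shares = []
    previous = Decimal("0")
    for label, cumulative in rows:
        current = Decimal(cumulative)
        shares.append((label, current - previous))
        previous = current
    return shares

shares = individual_shares(rows)
for label, share in shares:
    print(f"{label}: {share}%")
```

Read this way, the personal pronouns account for 6.96% of the corpus on their own, the past tense marker for 6.35%, the plural marker for 4.68%, and the word and for 2.51%.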