English Morphemes
Listed by Frequency of Use

Created July 31, 2010
Last Updated Aug 1, 2010

What is This?

This is a list of morphemes, the smallest units of meaning, taken from typical English text as found in the Brown Corpus. Many words have been split into their constituent parts so that the frequency of occurrence of those parts could be tallied. For example, take the sentence:

I saw the boys running away.

In addition to the words themselves, there are three additional morphemes in this sentence. The verb see is marked as past tense, the noun boy is marked as plural with -s, and the verb run is marked as progressive with -ing.

I see+PST the boy+PL run+PROG away.

In addition to these standard morphemes, this list also separates out derivational affixes by their function. For example, [n] from [v] means that some prefix or suffix was used that turned a verb into a noun, such as application from apply. Thus a sentence like:

He happily accepted the application.

Would be broken down into something like this:

He happy+([adv] from [adj]) accept+PST the apply+([n] from [v]).

A few non-standard derivational affixes are also included, such as -N'T for deriving can't from can and wouldn't from would.
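The glossed sentences above can be split into individual morphemes mechanically. The following is only a minimal sketch, assuming that morphemes are joined with "+" and that derivational tags are wrapped in parentheses, as in the examples; the function name morphemes and the pattern PIECE are made up for illustration and are not the tooling actually used to build this list.

```python
import re
from collections import Counter

# Either a parenthesised derivational tag such as "([n] from [v])",
# or a run of characters containing no whitespace, "+", or punctuation.
PIECE = re.compile(r"\(([^)]*)\)|([^\s+.,;!?()]+)")

def morphemes(gloss):
    """Return the morphemes of a glossed sentence, in order."""
    return [m.group(1) or m.group(2) for m in PIECE.finditer(gloss)]

first = morphemes("I see+PST the boy+PL run+PROG away.")
second = morphemes(
    "He happy+([adv] from [adj]) accept+PST the apply+([n] from [v])."
)

# Tally every morpheme across both sentences.
tally = Counter(first + second)
```

Here the six words of the first sentence yield nine morphemes, the three extra ones being PST, PL and PROG, and the tally counts PST twice because it occurs in both sentences.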

This list, then, is the result of analyzing the Brown Corpus for the frequency of occurrence of all morphemes and derivational affixes. The first 35 entries account for just over 50% of all the text in the corpus and represent, therefore, the most important morphemes in English. The first hundred morphemes account for 60% of the corpus, but it takes the next 1200 morphemes to bring the total to 90%. Beyond that, another 4000 morphemes are required to account for the remaining 10%, minus a small residue of very rare words that occur fewer than 8 or 10 times in a million words of typical English text.
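The cumulative percentages quoted above follow from sorting the entries by frequency and keeping a running total. A minimal sketch of that arithmetic is given here; the counts are invented toy numbers, not Brown Corpus figures, and the function names are hypothetical.

```python
from itertools import accumulate

def cumulative_coverage(counts):
    """Rank entries by count (highest first) and pair each with the
    cumulative percentage of all occurrences covered so far."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(counts.values())
    running = accumulate(count for _, count in ranked)
    return [(name, 100.0 * r / total) for (name, _), r in zip(ranked, running)]

def entries_needed(coverage, target_percent):
    """Return how many top-ranked entries reach the target coverage."""
    for rank, (_, percent) in enumerate(coverage, start=1):
        if percent >= target_percent:
            return rank
    return None

# Invented toy counts, for illustration only.
toy_counts = {"the": 50, "PST": 30, "and": 15, "zebra": 5}
coverage = cumulative_coverage(toy_counts)
needed_for_90 = entries_needed(coverage, 90)
```

With these toy counts the running totals are 50%, 80%, 95% and 100%, so three entries are needed to pass 90%. The same calculation over the real tallies gives the Cumulative Usage column of the list.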

It should be noted that many words appear more than once in the list because they have been split into separate entries depending on their use. For example, the word object can be either the verb in "I object to that remark." or the noun in "Give me that object you are holding." The frequency of occurrence for the word object is, therefore, split into two separate frequencies listed as two separate entries. The same is true for words that have many different meanings, such as press, which has entries for [v] push, [v] squeeze juice from, [v] iron, as clothing, [v] persuade, [v] continue (press on), [n] machine tool, [n] printing press, [n] juice extractor (wine press, apple press), [n] publisher, i.e. a small press, [n] printed news media, [n] people from the media (the press are here), [adj] important (pressing matters)... and so on.
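The effect of this splitting on the tallies can be sketched as follows. The occurrences and sense labels below are hypothetical hand-tagged data made up for illustration; the point is only that one surface word feeds several entries whose counts add up to the count for the word.

```python
from collections import Counter

# Hypothetical hand-tagged occurrences: (surface word, part of speech, sense).
occurrences = [
    ("object", "v", "protest"),
    ("object", "n", "thing"),
    ("object", "n", "thing"),
    ("press", "v", "push"),
    ("press", "n", "printed news media"),
]

# Counting by surface word alone lumps every use together.
by_surface = Counter(word for word, _, _ in occurrences)

# Counting by part of speech and sense gives one tally per list entry.
by_entry = Counter(
    f"[{pos}] {word}: {sense}" for word, pos, sense in occurrences
)

# The split entries for a word still sum to its surface count.
object_total = sum(n for key, n in by_entry.items() if " object:" in key)
```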

In addition to splitting words into separate entries, several English affixes have been combined into single entries. Affixes that perform the same function are grouped together. For example, -al on a noun performs the same function as -ly and -ful on a noun, namely that of creating an adjective for "having the property named by the noun." This can be seen in the examples magic -> magical, friend -> friendly, and beauty -> beautiful. These are considered to be three forms of the same morpheme, and are listed as one entry.
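Combining affixes works the other way around: several surface forms feed a single entry. A minimal sketch, assuming a hand-built lookup table (the name ENTRY_FOR_NOUN_SUFFIX is hypothetical) that applies only to suffixes attached to nouns:

```python
from collections import Counter

# Hypothetical lookup: suffixes on a noun that share one function.
# The same spelling on another word class (such as -ly on an adjective)
# would belong to a different entry and is not covered here.
ENTRY_FOR_NOUN_SUFFIX = {
    "-al": "[adj] from [n]",
    "-ly": "[adj] from [n]",
    "-ful": "[adj] from [n]",
}

# (noun, suffix) pairs taken from the examples in the text.
derivations = [("magic", "-al"), ("friend", "-ly"), ("beauty", "-ful")]

# All three surface forms are tallied under the same entry.
tally = Counter(ENTRY_FOR_NOUN_SUFFIX[suffix] for _, suffix in derivations)
```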

When viewing the list, remember that each entry represents a whole family of related and derived words, so that the entry [adj] new stands for such additional derived words as newly, newer, newest, newness and so on. Likewise, entries like [n] natural include derived forms such as natural, naturally, unnatural, naturalistic, unnaturally, and so on.

Lumping and Splitting: The Rationale

Perhaps I should justify my choices and my decision to lump surface forms and differing lexical forms. The principal reason for the list is to know which general types of morphemes to develop first for a new conlang. For example, since one would usually work out the whole system of personal pronouns as a unit, rather than one at a time, those were lumped together.

Also, non-overlapping forms apply specifically to English syntax and etymology and may not reflect the way in which such derived forms are used in a conlang. Many prepositions have been split by their general meaning and then recombined with others that share that meaning, even if they are not applied in a syntactically identical manner in English.

The gerund form has been separated from the progressive and from the use of "-ing" as an adjective, so that "I am running", "the running boy...", and "Running is not necessary" are represented by three different entries. The gerund form is far less frequent than the other two, so it does not appear in the top 100 of the list.

As for different meanings of derived forms, where different types of nouns can be derived from a single verb, these are noted, so that we might have "[n] from [v] the doer of the action (educator)", "[n] from [v] the result of the action (education)", and so on. Those that don't yet appear on the web page are on the list, but further down than those that have been listed so far. Some of those may need clarification, and I will straighten them out as I get them tallied and formatted.

Where are the Number Words?

With a few exceptions like first, the number words, like fifty and so on, have been left out of the list. This is because when developing a conlang one usually works out the number system separately, and as a cohesive whole rather than one number at a time as needed.

The List

Here are the first 100 or so entries, accounting for almost 62% of all the morphemes in the corpus. The rest are still being worked on and formatted for the list. More will be added as they are completed.
(as of Aug 1, 2010)

Morpheme | Example | Cumulative Usage
articles, definite and indefinite | | 8.71%
PERSONAL PRONOUNS + POSSESSIVE + REFLEXIVE | including "who" | 15.67%
PAST TENSE MARKER | | 22.02%
PLURAL MARKER | | 26.70%
and | | 29.21%
[n] from [v]: Result of the action. | illumination; education; a meeting, the building, containment, betrayal, proposal, annoyance, appearance | 31.72%
[prep] of, belonging to, part of | member of the club | 33.15%
COPULA | This is an apple. Apples are red. | 34.54%
PROGRESSIVE VERB MARKER | I am running. | 35.92%
[prep] at, to, for, toward, in the direction of | look at; laugh at, went to Boston… Head for home. | 37.22%
[prep] of, about, containing, measure of | Book of poems. Box of nails. Pound of butter… | 38.46%
[prep] in, located within | I hear it in their voices. The mood in Cuba... In 1845… | 39.41%
PASSIVE VERB MARKER | It has been taken. | 40.33%
[adj] from [n] indicating presence of a quality | magical, natural, friendly, extremely, graceful, beautiful | 41.24%
POSSESSIVE MARKER | John's book. The queen of England's crown… | 41.92%
no, not, un-, dis- | negation words and affixes | 42.60%
that, subord. clause | I discovered that he already left. | 43.27%
[adv] from [adj] | newly, firmly | 43.93%
[prep] in, with, on, by means of | He was paid in cash. Hit it with a hammer. Fed on bread and water. | 44.51%
[prep] from, of, originating at, away or apart from | He is from Boston. Get away from that. Francis of Assisi. | 45.08%
[pron] this, these, demonstrative pronoun | I want this cookie. | 45.64%
[prep] for, in, regarding, concerning, with respect to | Thanks for your help. specializes in physics. Check on the baby. | 46.20%
[adj] from [v] | the falling water | 46.69%
what, which, which thing | Here's what (which thing) I can do. (what/which) train do I take? | 47.17%
[adj] many, much, more, most | (with comparative and superlative forms) | 47.63%
INFINITIVE MARKER | | 48.05%
[aux] must, have to, compelled to | We have to leave now. | 48.46%
[conj] but | | 48.86%
[n] from [v] or [n]: doer of the action | baker, actor, sailor, computer, cameraman, mountaineer, equestrian | 49.25%
[conj] or | | 49.59%
[v] have, hold, possess | I have a book | 49.93%
PRESENT PERFECT MARKER | | 50.24%
[v] say | | 50.55%
[v] do | | 50.86%
[prep] at, location in time or space | See you at 5:00. He's at the store. | 51.15%
[n] from [adj]: Name of the attribute | aggressiveness, agility | 51.43%
[adj] all, total number or amount | | 51.71%
[n] there, that place | | 51.97%
PARTICIPLE MARKER | | 52.22%
[aux] would | | 52.46%
[prep] to, until, extending to | from February to March | 52.69%
[prep] while, during, as, in, at | at night, In his travels he saw… As he left the room… | 52.92%
[prep] for, purpose, beneficiary | This book is for Tom. ...donations for children's education. | 53.15%
[prep] on, upon, located upon, located at a certain time | on this site... My hat is on my head. On the fourth of July. | 53.37%
[conj] when | I'll see you when you get home. | 53.60%
[prep] by, by means of, using | We got here by train. | 53.82%
that, which, who: restrictive clause | The boy that ran away | 54.03%
[prep] with, cooperation, interaction, co-participation | Work with me... Don't argue with me. He is with the FBI. | 54.25%
[n] man, male human | | 54.46%
[adv] out, away from the inside | | 54.67%
[adv/adj] up, upward | | 54.86%
[conj] if | I will see you if you get here on time. | 55.05%
FUTURE VERB MARKER | | 55.24%
[aux] can, be able to | I can run. | 55.43%
-N'T | can't, shouldn't | 55.62%
[v] go | motion in general or motion away from a referenced location: Go away. | 55.81%
[n] time | | 55.99%
[prep] of, point to an attribute | cost of government; top of the mountain; the inside of the box... | 56.18%
[pron] that, those, demonstrative pronoun | I want that cookie. | 56.35%
[v] see | | 56.52%
[prep] into, toward the inside of | We went into the cave. | 56.68%
[prep] as, role, identification | known as Robin Hood... identified as the man who..., in his role as Hamlet..., Vincent Price as Captain Hook. | 56.85%
[v] come | motion toward a referenced location: Come here. | 57.01%
[v] know | | 57.17%
could | | 57.33%
[v] make, create, cause, construct | make a mess; make them aware… | 57.49%
[n] year | | 57.64%
[adj] little, small | | 57.79%
[conj] than, unequal comparison | less than, greater than, taller than… | 57.94%
[v] take | | 58.09%
[adj] only, lone, sole | | 58.24%
[adj] other | | 58.39%
[prep] in, within a range or bounds | It is in his power | 58.53%
[adj] some | | 58.68%
[v] get, fetch | | 58.82%
[adv] then, next (in order) | First jump, then run. | 58.96%
[n] state, country, political territory | The state of Ohio. The Secretary of State | 59.10%
[prep] to, for the purpose of | He quit to open his own store. | 59.24%
[prep] as, for, like, function, purpose | He used a rock (as, like, for) a hammer. | 59.38%
[n] day | | 59.50%
[adj] new | | 59.63%
[adv] now | | 59.75%
[n] first | | 59.87%
COMPARATIVE MARKER | bigger, faster, longer | 59.99%
[v] give | | 60.11%
[adj] any | | 60.23%
[v] use | | 60.35%
[v] work | | 60.46%
[prep] in, into or in a state or condition | cut in half... fell in love. | 60.57%
[adj] from [v]: capable of performing the action | abusive, reflective, adaptive, adhesive | 60.67%
[v] look | | 60.78%
[prep] by, subject of a passive | seen by many… was inspired by Plato. | 60.89%
[prep] after, following in time or space | After the game… The clowns came after the elephants. | 60.99%
[aux] may, allowed, permitted to | Yes, you may go outside. | 61.09%
[mod] from [adj] | (high) highly unusual, (absolute) absolutely essential | 61.20%
[adj] great | | 61.30%
[prep] for, duration | I was there for two hours. | 61.40%
[adj] from [n] of a type named by the noun | acidic, linguistic | 61.50%
[v] find | | 61.59%
[adj] long | | 61.69%



