Pattern Word Dictionary
Gary J. Shannon
Created 2002
Updated June 25, 2010
Changes to the New Pattern Word Dictionary
The original pattern word dictionary that I created in 2002 had over 100,000 entries. Unfortunately, those entries included a lot of garbage resulting from typos in the scanned source corpora and from rare words used in only one document. Over the years I tried on several occasions to clean up the file by hand, but its sheer size made that task impossible. Thus the need for this new and improved pattern word dictionary, version 2.0.
This pattern word dictionary was created by extracting words and counting their frequencies of occurrence from The Brown Corpus as well as the text of a dozen books from Project Gutenberg. To ensure that the words collected were legitimate words and not typographical errors in the source documents, I saved only words that occurred often enough to rank among the top 10,000 words collected. Approximately 700 of those words were then filtered out manually, leaving approximately 9,300 words. This is a much smaller pattern word dictionary than my previous one, but the quality of the word list is much higher, making the dictionary more useful overall.
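The counting step described above might be sketched roughly as follows. This is not the script actually used to build the dictionary. The function name top_words, the letters-only tokenizing rule, and the sample sentences are all assumptions made for illustration.

```python
import re
from collections import Counter

def top_words(texts, limit=10_000):
    """Count words across all texts and keep the `limit` most frequent."""
    counts = Counter()
    for text in texts:
        # Assumption: a "word" is a run of the letters a-z after lowercasing.
        counts.update(re.findall(r"[a-z]+", text.lower()))
    # most_common returns (word, count) pairs, highest count first.
    return counts.most_common(limit)

# Hypothetical sample with one typo ("teh") that occurs only once.
sample = ["The cat saw the dog.", "The dog saw teh cat."]
kept = dict(top_words(sample, limit=4))
print("teh" in kept)  # the rare typo falls below the cutoff
```

A frequency cutoff like this only removes rare errors; a typo that recurs often enough would still need the kind of manual filtering mentioned above.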
In addition, the format of the entries has changed. Previously a pattern word such as "position" would have been coded as ABCDEDBF where like letters in the original word became like letters in the pattern code.
In the new dictionary I've eliminated the letters that are not part of the pattern, leaving a cleaner, easier to read pattern code. The same word "position" is now coded as -A-B-BA- where only letters that are duplicated somewhere in the word are assigned pattern letters.
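To illustrate how the two formats relate, here is a small sketch that converts an old-style code into the new style. It is an assumption about how such a conversion could be done, not the method actually used to produce version 2.0, and the name convert_old_code is made up for this example.

```python
from collections import Counter
from string import ascii_uppercase

def convert_old_code(code):
    """Convert a 2002-style code such as ABCDEDBF to the 2.0 style."""
    counts = Counter(code)
    labels = {}  # old symbol -> new pattern letter, assigned in order of appearance
    out = []
    for symbol in code:
        if counts[symbol] == 1:
            # The symbol stands for a letter that is never repeated.
            out.append("-")
        else:
            if symbol not in labels:
                labels[symbol] = ascii_uppercase[len(labels)]
            out.append(labels[symbol])
    return "".join(out)

print(convert_old_code("ABCDEDBF"))  # -A-B-BA-
```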
Except for those changes, what follows below is the text from my original 2002 pattern word dictionary page.
Pattern Word Dictionary
In solving cryptograms I have often resorted to a little printed pattern word dictionary I have, but I was frustrated by how small it was and how often the needed word was not in the dictionary. I decided to start building my own pattern word dictionary, one that could continue to grow as I scanned more documents for more words to add to it.
Among the several methods of notating patterns, I've used the convention that each letter that appears more than once in a word is assigned the next sequential letter of the alphabet. Letters that are not duplicated anywhere in the word are replaced by hyphens. For example, the word "that" has the pattern A--A because the first and last letters are the same but the other letters are not duplicated. The word "impossible" has the pattern A---BBA---, highlighting exactly where to look for duplicated letters.
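The convention above can be expressed as a short function. This is a minimal sketch that assumes the input contains only letters. The name pattern is chosen for illustration and is not part of the dictionary file itself.

```python
from collections import Counter
from string import ascii_uppercase

def pattern(word):
    """Return the pattern code for a word, e.g. 'that' -> 'A--A'."""
    word = word.lower()
    counts = Counter(word)
    labels = {}  # repeated letter -> pattern letter, in order of first appearance
    out = []
    for ch in word:
        if counts[ch] == 1:
            out.append("-")  # letter is not duplicated anywhere in the word
            continue
        if ch not in labels:
            labels[ch] = ascii_uppercase[len(labels)]
        out.append(labels[ch])
    return "".join(out)

print(pattern("that"))        # A--A
print(pattern("impossible"))  # A---BBA---
print(pattern("XQKX"))        # A--A
```

Because a simple substitution cipher replaces each letter consistently, an enciphered word such as the made-up XQKX has the same pattern as its plaintext, which is what makes the pattern usable as a lookup key.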
Sorted by Frequency
One important feature is that the words are not sorted alphabetically within each pattern, but are sorted according to how frequently that word appears in all the text files scanned. Thus under the pattern A--A the word "that" occurs before the word "area" because "that" is a much more commonly used word.
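The ordering described above could be produced along these lines. This is a sketch only: the function name build_dictionary is hypothetical, and the counts in the sample data are invented for illustration and are not taken from the real corpus.

```python
from collections import defaultdict

def build_dictionary(entries):
    """Group (pattern, word, count) triples by pattern, most frequent word first."""
    grouped = defaultdict(list)
    for pat, word, count in entries:
        grouped[pat].append((count, word))
    return {
        # Within a pattern, sort by descending count rather than alphabetically.
        pat: [word for count, word in sorted(items, key=lambda item: -item[0])]
        for pat, items in sorted(grouped.items())
    }

# Made-up counts, for illustration only.
entries = [
    ("A--A", "area", 320),
    ("A--A", "that", 10500),
    ("-AA-", "look", 400),
    ("-AA-", "good", 800),
]
print(build_dictionary(entries)["A--A"])  # ['that', 'area']
```

Listing the most frequent words first means the likeliest candidates for a cipher word are the first ones seen under its pattern.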
Download the Dictionary
The entire text of this pattern word dictionary can be downloaded here. (87 KB)