I am working on something that needs a list of words, without regard to American vs. British spelling, accents, or anything else: it just has to contain as many words as possible. There are a whole bunch of aspell dictionaries available. First expand the files:
preunzip *.cwl
Then merge into a single list, eliminating duplicates:
sort --unique --ignore-case *.wl >list.txt
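To see what the merge does with case variants, here is a minimal sketch on two made-up files. The file names and words are hypothetical stand-ins for the expanded lists, and LC_ALL=C is my addition to make the ordering reproducible; it is not part of the original command.

```shell
# Hypothetical sample files standing in for the expanded .wl lists
demo_dir=$(mktemp -d)
printf 'colour\nApple\nzebra\n' >"$demo_dir/en_GB.wl"
printf 'color\napple\nzebra\n' >"$demo_dir/en_US.wl"

# Same merge as above; LC_ALL=C (an assumption) forces byte-based ordering
LC_ALL=C sort --unique --ignore-case "$demo_dir"/*.wl >"$demo_dir/list.txt"

# Count the merged entries: "Apple"/"apple" and the two "zebra" lines collapse
merged_count=$(wc -l <"$demo_dir/list.txt" | tr -d ' ')
echo "$merged_count"
```

Note that only one spelling of each case-insensitive duplicate survives, so the merged list may keep "Apple" or "apple" but not both.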
In addition, I want everything to be UTF-8:
iconv -f ISO8859-1 -t UTF-8 list.txt >ulist.txt
Pretty simple. The merged English word list has 137,883 words.
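A quick way to sanity-check the conversion step is to run it on a tiny sample and then read the result back as UTF-8. This is only a sketch: the one-word sample is made up, and it assumes the input really is ISO 8859-1, as the iconv command above does.

```shell
# Hypothetical sample: "café" in ISO 8859-1, where é is the single byte 0xE9 (octal 351)
conv_dir=$(mktemp -d)
printf 'caf\351\n' >"$conv_dir/list.txt"

# Same conversion as in the post
iconv -f ISO8859-1 -t UTF-8 "$conv_dir/list.txt" >"$conv_dir/ulist.txt"

# Reading the result as UTF-8 fails if it contains an invalid byte sequence
if iconv -f UTF-8 -t UTF-8 "$conv_dir/ulist.txt" >/dev/null; then
    utf8_ok=yes
else
    utf8_ok=no
fi
echo "$utf8_ok"
```

The same read-back check can be pointed at the real ulist.txt. It only confirms the output is well-formed UTF-8, not that the source encoding was guessed correctly.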
February 6, 2009 at 12:05 pm
I couldn't find a .wl file in the dictionary package I wanted to extract a word list from.
Then I figured out that I need to uncompress the .cwl file, NOT the .wl file:
preunzip *.cwl
The rest of the steps are pretty much the same.