Nonword databases

On this page I post nonword databases of interest for educational or research purposes. If you know of any other good resources, please send me a message!

ARC Nonword Database

The ARC Nonword Database contains 358,534 monosyllabic nonwords: 48,534 pseudohomophones and 310,000 non-pseudohomophonic nonwords. Items can be selected from the database on the basis of a wide variety of properties known or suspected to be of theoretical importance for the investigation of reading.

Rastle, K., Harrington, J., & Coltheart, M. (2002). 358,534 nonwords: The ARC Nonword Database. The Quarterly Journal of Experimental Psychology. A, Human Experimental Psychology, 55(4), 1339-1362.


Wuggy: A multilingual pseudoword generator

Wuggy is a pseudoword generator particularly geared towards making nonwords for psycholinguistic experiments. Wuggy makes pseudowords in Basque, Dutch, English, French, German, Serbian (Cyrillic and Latin), Spanish, and Vietnamese.

Keuleers, E., & Brysbaert, M. (2010). Wuggy: A multilingual pseudoword generator. Behavior Research Methods, 42(3), 627-633.


The English Lexicon Project

The English Lexicon Project provides access to a large set of lexical characteristics, along with behavioral data from visual lexical decision and naming studies of 40,481 words and 40,481 nonwords. The goal of the English Lexicon Project is to collect normative data for speeded naming and lexical decision for over 40,000 words across 1,200 subjects at six different universities. These data will be integrated into a database along with descriptive characteristics of the words used in the study. Researchers interested in psycholinguistics, human memory, computational modeling, and other fields will find these data useful.

Balota, D. A., Yap, M. J., Cortese, M. J., Hutchison, K. A., Kessler, B., Loftis, B., et al. (2007). The English Lexicon Project. Behavior Research Methods, 39(3), 445-459.



WordGen: A tool for word selection and nonword generation

WordGen is a program that uses the CELEX and Lexique lexical databases for word selection and nonword generation in Dutch, English, German, and French. Items can be generated by specifying any combination of seven linguistic constraints: number of letters, neighborhood size, frequency, summated position-nonspecific bigram frequency, minimum position-nonspecific bigram frequency, position-specific frequency of the initial and final bigram, and orthographic relatedness.
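
To make a few of these constraints concrete, here is a minimal Python sketch of how neighborhood size and the two position-nonspecific bigram measures could be computed for a candidate nonword. It is an illustration only, not WordGen's actual implementation: the function names, the toy lexicon, and its frequencies are all made up, and the definitions are simplified assumptions (a neighbor is a same-length word differing in exactly one letter; a bigram's frequency is the summed frequency of the words containing it, counted once per occurrence).

```python
from collections import Counter


def bigrams(word):
    """Return the adjacent letter pairs of a word, in order."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def neighborhood_size(item, lexicon):
    """Count lexicon words of the same length that differ from item in exactly one letter."""
    return sum(
        1
        for w in lexicon
        if len(w) == len(item) and sum(a != b for a, b in zip(w, item)) == 1
    )


def bigram_table(freqs):
    """Build position-nonspecific bigram frequencies from a word -> frequency mapping."""
    table = Counter()
    for word, freq in freqs.items():
        for bg in bigrams(word):
            table[bg] += freq  # position in the word is ignored
    return table


def summated_bigram_frequency(item, table):
    """Sum the frequencies of all bigrams in item (unseen bigrams count as 0)."""
    return sum(table[bg] for bg in bigrams(item))


def minimum_bigram_frequency(item, table):
    """Frequency of the rarest bigram in item (0 if any bigram is unseen)."""
    return min(table[bg] for bg in bigrams(item))


# Toy lexicon with made-up frequencies (hypothetical; not CELEX or Lexique data).
TOY_FREQS = {"cat": 50, "cot": 10, "can": 40, "hat": 30, "hit": 20}
TABLE = bigram_table(TOY_FREQS)

if __name__ == "__main__":
    candidate = "cit"
    print(neighborhood_size(candidate, TOY_FREQS))
    print(summated_bigram_frequency(candidate, TABLE))
    print(minimum_bigram_frequency(candidate, TABLE))
```

With a real lexical database in place of the toy dictionary, measures like these could be used to filter candidate nonwords, for example keeping only items whose neighborhood size and bigram frequencies match those of the word stimuli.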

Duyck, W., Desmet, T., Verbeke, L. P. C., & Brysbaert, M. (2004). WordGen: A tool for word selection and nonword generation in Dutch, English, German, and French. Behavior Research Methods, 36(3), 488-499.