We created 3085 unique sentences. We then presented each sentence, with its final word removed, to online participants and asked them to provide the most likely completion. For example:

He hated bees and feared encountering a _______________.

Common responses were "hive" and "beehive", but less common (and still appropriate) responses included "disease" and "yellowjacket". We counted how often each response was given and expressed that count as a proportion of all responses to the sentence, which gives the likelihood of any given response:

1. He hated bees and feared encountering a __________.

        * hive (0.42)
        * swarm (0.20)
        * bee (0.09)
        * nest (0.08)
        * wasp (0.06)
        * beehive (0.05)
        * sting (0.04)
        * stinger (0.03)
        * hornet (0.02)
        * disease (0.01)
        * yellowjacket (0.01)
        * No Response (0.01)
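
The counting step described above can be sketched in a few lines of Python. This is only an illustration, not the code from the repository: the function name `response_proportions`, the lowercasing and whitespace trimming, and the treatment of blank answers as "No Response" are all assumptions, and the example responses are made up.

```python
from collections import Counter

def response_proportions(responses):
    """Return (response, proportion) pairs, most frequent first.

    Hypothetical helper for illustration; the normalization rules
    below are assumptions, not the authors' documented procedure.
    """
    # Trim whitespace and lowercase; a blank answer becomes "No Response".
    cleaned = [r.strip().lower() or "No Response" for r in responses]
    counts = Counter(cleaned)
    total = len(cleaned)
    # Divide each count by the number of responses to get a proportion.
    return [(word, count / total) for word, count in counts.most_common()]

# Made-up responses for one sentence (not data from the norms).
example = ["hive", "Hive", "swarm", "hive ", "", "swarm", "nest", "hive", "bee", "hive"]

for word, proportion in response_proportions(example):
    print(f"{word} ({proportion:.2f})")
```

Proportions computed this way sum to 1 before rounding; after rounding to two decimals, as in the list above, they may add up to slightly more or less than 1.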

The code we used to generate these norms is available from http://github.com/jpeelle/sentence-prediction with the resulting norms in the examples folder. The key files are:

* output.md - a text file (in Markdown format) of all 3085 sentences and responses with corresponding probability of response
* sentences.csv - a text file (which you can open in Excel) with one row per sentence. In addition to the responses, it contains sentence-level information such as the number of responses and the response entropy.
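
For readers unfamiliar with response entropy, the sketch below shows one common way to compute it: Shannon entropy over the response proportions for a sentence. It is an assumption that the norms use this definition; the logarithm base (2, giving bits), the renormalization of rounded proportions, and the inclusion of "No Response" are choices made here for illustration, and `response_entropy` is a hypothetical name. The repository code remains the authoritative definition.

```python
import math

def response_entropy(proportions, base=2):
    """Shannon entropy of a response distribution.

    Hypothetical helper for illustration. Proportions are renormalized
    first, because rounded values may not sum to exactly 1.
    """
    total = sum(proportions)
    probs = [p / total for p in proportions if p > 0]
    # H = -sum(p * log(p)); higher values mean less predictable endings.
    return -sum(p * math.log(p, base) for p in probs)

# Rounded proportions from the "bees" example sentence above.
bees = [0.42, 0.20, 0.09, 0.08, 0.06, 0.05, 0.04, 0.03, 0.02, 0.01, 0.01, 0.01]
print(f"Entropy: {response_entropy(bees):.2f} bits")
```

Under this definition, a sentence that every participant completes with the same word has an entropy of 0, and entropy grows as responses spread more evenly across more words.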

For more information on how these norms were generated, please see this paper:

(coming soon)

If you use these norms, please cite the paper. Please post any questions to the GitHub page (under “issues”).

The Speech, Hearing, And Communication (SHAC) Lab