Roselab
Home page of Ralph L. Rose at Waseda University

This is the main information and download page for the application Word Quiz Constructor. You can download the application from the link below. Further below is the Readme.md file (also contained in the download archive) which explains how to set up and use the application.

Download link: Word Quiz Constructor (current version) (220 MB)

Following are several publications and presentations related to this application.

  • 2020 Rose, R. “Improving the Production Efficiency and Well-formedness of Automatically-Generated Multiple-Choice Cloze Vocabulary Questions”. In Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pp. 7096–7103. [link]
  • 2016 Rose, R. "Automatic Word Quiz Construction Using Regular and Simple English Wikipedia". In Proceedings of the International Technology, Education and Development Conference (INTED), pp. 8032–8040. [link]
  • 2014 Aug. Rose, R. “WQC: A tool for quick automatic word quiz construction”. International Association for Applied Linguistics (AILA), Brisbane, Australia.
  • 2014 Jul. Rose, R. “Automated vocabulary quiz creation using online and offline corpora”. Teaching and Language Corpora (TaLC) conference. Lancaster University, UK.

Word Quiz Constructor (Readme.md)

by Ralph L. Rose
Last updated: 2020 June 6

Thank you for downloading Word Quiz Constructor (WQC). This Readme.md file is intended to help you get the program set up and running. Please note that this is probably not for the faint of heart. Users will need some familiarity with how to run command-line programs and edit settings files to insert file paths.

Quickstart

For those who know what they're doing, here's a really brief quickstart guide for WQC.

  1. Install Java runtime (if not already installed): https://www.java.com/download/
  2. Unzip the WQC archive wherever you like.
  3. Open settings.ini in a text editor and change every instance of "/path/to/WQC" to the actual path to where you put the WQC folder. Also, update the proxy info, or delete it if unnecessary.
  4. At the command line, run the following command.
/path/to/runtime/java -jar /path/to/WQC/WordQuizConstructor.jar -e "/path/to/WQC/settings.ini"

If all is successful, after a little while (1-2 mins on my computer), you should see a very short quiz file named output.txt in the output folder.

Java runtime

WQC requires the Java runtime environment (JRE). [Note that you do NOT need the development environment, known as JDK.] Following is a link to the java.com web site where the JRE can be downloaded for free.

https://www.java.com/download/

Set up

Unzip the WQC archive wherever you wish. Inside the main folder are the following.

  • Readme.md - this file
  • WordQuizConstructor.jar - the main program file
  • settings.ini - the settings file
  • resources - a folder of various local resources that WQC depends on

Basic settings (settings.ini)

Before WQC can work, several file/folder paths need to be set in the settings.ini file. The complete file path from root (e.g., "C:" on Windows, "/" on Linux) will need to be specified. In most cases, the paths are simply subfolders of the WQC home directory, so you can just replace "/path/to/WQC" with the appropriate path.

If you are using WQC behind a proxy, then set the proxy information as shown. Otherwise, delete the lines or at least delete the values.
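
The placeholder replacement described above can also be scripted. The following is a minimal sketch, not part of WQC: the function name and the two sample settings lines are hypothetical, and the real key names in settings.ini may differ.

```python
def fill_in_wqc_path(settings_text, wqc_home):
    """Replace every '/path/to/WQC' placeholder with the real WQC folder."""
    # Drop a trailing slash or backslash so the result has no doubled separators.
    return settings_text.replace("/path/to/WQC", wqc_home.rstrip("/\\"))


# Hypothetical settings lines for illustration only.
sample = "resources=/path/to/WQC/resources\noutput=/path/to/WQC/output\n"
print(fill_in_wqc_path(sample, "/home/me/WQC"))
```

To apply this to the real file, read settings.ini into a string, pass it through the function, and write the result back. Keep a backup copy first.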

Running WQC

WQC is a command-line application, and customization is done through command-line arguments. The basic format to run WQC would be as follows.

/path/to/runtime/java -jar WordQuizConstructor.jar -e /path/to/settings.ini [other arguments...]

The arguments are explained in detail further below.

A WQC "quiz"

One run of WQC will generate one "quiz" as defined by the user. At present, WQC generates vocabulary quiz items based on the Coxhead (2000) Academic Word List (AWL) sublists. WQC can generate three types of questions testing these words, as follows.

Multiple choice cloze items (multichoice_cloze)

A question with a stem sentence containing a blank which corresponds to one AWL word, the key. After the stem, a (custom) number of options are given including the key and distractor words.

On the local level Benum was ________ in local politics in Verdal municipality
from 1959 to 1979.
a. involved   b. constituted   c. similar   d. uncontextualised

Multiple choice synonym items (multichoice_synonym)

A question with a stem sentence with one word 'highlighted' -- the target AWL word. Then, a (custom) number of options are given from which the user must choose the closest synonym.

The photo and data are {processed} and a physical card is printed on
hi-quality photo paper and sent from the USA to any destination.
Which of the following words is closest in meaning to the root word 'process'?
a. authoritative   b. contracted   c. treat   d. constituted

Free response cloze items (free_cloze)

A question with two stem sentences each containing a blank corresponding to the same AWL word, the key. Also, a 'hint' with the first letter of the word and a definition are given. Then, a blank space is given for testees to write a suitable word freely.

(a) On the supply side, it is ______ that lenders maximise expected profits
in a competitive market.
(b) The levels in AB could be ______ to be too low to induce pharyngeal cell fates.
Hint: This word begins 'a' and can be defined as 'take to be the case or to be true'
Answer: ____________    (assumed)

Quiz definition

The key to producing a quiz, then, is the definition which is a character string consisting of the name of an item type and the number desired separated by a colon (:). For a quiz with multiple types, the type definitions are separated by a comma (,). Hence,

multichoice_cloze:8,multichoice_synonym:5,free_cloze:3

generates a quiz with 8 multiple choice cloze items, 5 multiple choice synonym items, and 3 free response cloze items, in that order.

multichoice_synonym:6,multichoice_cloze:10

generates a quiz with 6 multiple choice synonym items and 10 multiple choice cloze items, in that order.
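
To make the format concrete, here is a small sketch of how such a definition string could be checked before passing it to WQC. It is an illustration only: WQC's own parser is not shown in this Readme, and the function names are hypothetical.

```python
# The three item types named in this Readme.
VALID_TYPES = ("multichoice_cloze", "multichoice_synonym", "free_cloze")


def parse_quiz_definition(definition):
    """Split 'type:count,type:count' into an ordered list of (type, count)."""
    parts = []
    for chunk in definition.split(","):
        item_type, sep, count = chunk.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in {chunk!r}")
        if item_type not in VALID_TYPES:
            raise ValueError(f"unknown item type: {item_type!r}")
        if not count.isdigit() or int(count) < 1:
            raise ValueError(f"count must be a positive integer: {count!r}")
        parts.append((item_type, int(count)))
    return parts


def total_items(definition):
    """Total number of items the definition asks for."""
    return sum(count for _, count in parse_quiz_definition(definition))


print(parse_quiz_definition("multichoice_cloze:8,multichoice_synonym:5,free_cloze:3"))
print(total_items("multichoice_synonym:6,multichoice_cloze:10"))
```

Because the result is a list rather than a dictionary, the order of the item types in the string is preserved.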

Command-line arguments

This section explains the various command-line argument options.

Quiz structure: -s

Possible values: As described just above in "Quiz definition".

Default value: multichoice_cloze:1,multichoice_synonym:1,free_cloze:1

-s multichoice_synonym:6,multichoice_cloze:10

Target context frequency (minimum) threshold: -t

Possible values: integer number

Default value: 100

This is the floor threshold for the frequency of the trigram containing the target word at its center (e.g., if the target is analysis, then the trigram might be detailed analysis of).

A high threshold is desirable here, so that WQC will choose contexts in which the target word often appears. However, if the threshold is set too high, then WQC may reject many candidate stems and take longer to produce items.

The optimal value here will depend on the frequency list used. In my experience, I have found that with the BAWE frequency list, a value of 2 is suitable. But for Google Books NGrams (i.e., phrasefinder), a value of 100 is suitable.

-t 100

Distractor context frequency (maximum) threshold: -p

Possible values: integer number

Default value: 40

This is the ceiling threshold for the frequency of the trigram containing the target word at its center, but where the target is replaced with a candidate distractor (e.g., if the target is analysis and the trigram context is detailed analysis of, and a candidate distractor is income, then the test context would be detailed income of).

In contrast to the target context frequency threshold, this threshold should be relatively low. If it is too high, then there is a greater risk that the distractor context is plausible.

The optimal value here will depend on the frequency list used. In my experience, I have found that with the BAWE frequency list, a value of 0 is optimal. For Google Books NGrams (i.e., phrasefinder), 40 is suitable.

-p 40
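
The interplay of the -t floor and the -p ceiling can be sketched as follows. This is an illustration under stated assumptions, not WQC's actual code: the trigram counts are invented, the function names are hypothetical, and treating both thresholds as inclusive is a guess (though a ceiling of 0 being usable with BAWE suggests the ceiling at least is inclusive).

```python
# Toy trigram counts, invented for illustration; real counts would come from
# the BAWE frequency list or the phrasefinder service.
TOY_TRIGRAM_COUNTS = {
    ("detailed", "analysis", "of"): 250,
    ("detailed", "income", "of"): 0,
    ("detailed", "assessment", "of"): 120,
}


def trigram_count(left, word, right):
    """Look up a trigram; unseen trigrams count as zero."""
    return TOY_TRIGRAM_COUNTS.get((left, word, right), 0)


def target_context_ok(left, target, right, floor=100):
    """-t: the trigram around the target must be at least this frequent."""
    return trigram_count(left, target, right) >= floor


def distractor_ok(left, distractor, right, ceiling=40):
    """-p: the same trigram with the distractor must be no more frequent than this."""
    return trigram_count(left, distractor, right) <= ceiling


print(target_context_ok("detailed", "analysis", "of"))
print(distractor_ok("detailed", "income", "of"))
print(distractor_ok("detailed", "assessment", "of"))
```

In this toy data, income would be accepted as a distractor for detailed ___ of, while assessment would be rejected because it fits the context too well.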

Number of options: -a

Possible values: integer number

Default value: 4

This determines the number of options WQC will provide for multiple choice items.

-a 4

Target AWL sublist tag: -g

Possible values: Coxhead AWL Sublist N where N = 1 to 10

Default value: Coxhead AWL Sublist 1

Multiple sublists may be selected by separating them with a comma, with no space after the comma:

-g "Coxhead AWL Sublist 1,Coxhead AWL Sublist 2"

Output format: -o

Possible values: text, csv, docx, moodlexml, quizlet

Default value: text

No matter which output format is selected, a text-formatted version will (also) be output.

The csv format was designed specifically for upload to a learning management system at my university that is now defunct. But with the right mapping, it can probably be imported into other learning management systems.

The quizlet format is also somewhat specialized and will only output multichoice cloze and free cloze items, but in the following way:

  • For multichoice cloze: target word [tab] stem (with blank)
  • For free cloze: target word [tab] gloss (from WordNet)

These can be readily imported into the Quizlet online flashcard tool.
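
The two-column layout above amounts to one tab-separated line per item. A minimal sketch follows; the function name is hypothetical, the stem is shortened, and whether WQC writes the inflected or the root form of the target word is not stated in this Readme.

```python
def quizlet_line(term, definition):
    """Join the two sides of a flashcard with a single tab."""
    if "\t" in term or "\t" in definition:
        raise ValueError("a tab inside a field would break the two-column layout")
    return f"{term}\t{definition}"


lines = [
    # multichoice cloze: target word, then the stem with its blank
    quizlet_line("involved", "On the local level Benum was ________ in local politics."),
    # free cloze: target word, then a gloss
    quizlet_line("assume", "take to be the case or to be true"),
]
print("\n".join(lines))
```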

-o docx

Frequency list: -n

Possible values: bawe, phrasefinder

Default value: phrasefinder

The BAWE frequency list is based on the texts in the BAWE corpus. The phrasefinder option is an on-line service that provides frequency info based on the Google Books NGrams.

-n phrasefinder

Corpus source for stem sentences: -c

Possible values: bawe, en_wikipedia, simple_wikipedia

Default value: en_wikipedia

The candidate stem sentences can be drawn from BAWE (off-line) or from Wikipedia (on-line). With Wikipedia, it is possible to choose between the regular English version and the simple English version. The simple English version is a better source when building vocabulary quiz items for learners of English as a second/foreign language.

-c simple_wikipedia

Output file name: -f

Possible values: filename string consistent with system filename restrictions

Default value: output.txt

-f awl_quiz_sublist_1.docx

Verbose mode: -v

Possible values: none

Default: not verbose

This option has no possible values. If included on the command-line, WQC will be more verbose in its output as it goes through the process of generating a quiz. Note that printing to standard output takes some time. When producing a long quiz or a large number of quizzes, it may be better to NOT include the verbose option.

Unique targets: -i

Possible values: none

Default: not unique

This option has no possible values. When included on the command-line, WQC will try to make sure that no target word is repeated within the quiz. However, multiple words from the same word family are allowed (e.g., analyzes and analysis). When using this option, be careful that the number of items in the quiz is not larger than the number of items in the selected sublist(s); otherwise, WQC may get stuck in an infinite loop.

Readability index algorithm: -r

Possible values: ari, linsear, flesch

Default: linsear

Three readability indexes are available to estimate the difficulty level of candidate stem sentences: Automated Readability Index (ARI), Linsear Write (LW), and Flesch-Kincaid (FK). Any of these may be selected. Preliminary testing suggests that LW is the best of these, producing items slightly faster than the other options.

-r linsear

Readability index limit: -l

Possible values: integer number

Default value: 12

This represents the ceiling threshold for the readability of a candidate stem sentence as computed by the readability index algorithm above. The index approximates the US grade level: 1-6 is US elementary school, 7-12 is junior high and high school, above 12 is college and beyond.

Setting this number too low will make it very difficult for WQC to find stem sentences. In my experience, I have found that 12 is a suitable value for learners of English as a foreign language. A higher number might be feasible for tests of native English speakers in, say, a university setting.

-l 12
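
To illustrate how a readability limit filters candidate stems, here is a sketch using the standard ARI formula. ARI is used only because it needs no syllable counting; WQC's default is Linsear Write. The tokenization, the function names, and the inclusive comparison are all assumptions, so the numbers need not match what WQC computes.

```python
import re


def automated_readability_index(text):
    """Standard ARI: 4.71*(chars/words) + 0.5*(words/sentences) - 21.43."""
    words = re.findall(r"[A-Za-z0-9']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text contains no words")
    # Count letters and digits only; apostrophes are not counted as characters.
    chars = sum(len(w.replace("'", "")) for w in words)
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43


def stem_is_readable(sentence, limit=12):
    """Keep a candidate stem only if its index does not exceed the limit (-l)."""
    return automated_readability_index(sentence) <= limit


print(stem_is_readable("The cat sat on the mat."))
print(stem_is_readable(
    "Institutional intergovernmental organisations characteristically "
    "underestimate multidimensional socioeconomic interdependencies."
))
```

A sentence of short, common words passes easily, while a sentence made up of very long words is rejected even though it is only eight words long.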

Settings file: -e

Possible values: character string representing system path and filename

-e "/path/to/WQC/settings.ini"

Examples

The longer commands below are wrapped across several lines for readability; enter each one as a single line.

/path/to/runtime/java -jar /path/to/WQC/WordQuizConstructor.jar -e "/path/to/WQC/settings.ini"

This is the bare minimum that would be needed to get WQC to run and output something. It will output a quiz with one question each of multiple choice cloze, multiple choice synonym, and free cloze with target words drawn from Sublist 1.

/path/to/runtime/java -jar /path/to/WQC/WordQuizConstructor.jar
-e "/path/to/WQC/settings.ini" -g "Coxhead AWL Sublist 1"
-s multichoice_cloze:10,multichoice_synonym:5,free_cloze:5
-a 5 -c en_wikipedia -n phrasefinder -t 100 -p 40 -r linsear
-l 12 -o docx -f awl_quiz_sublist_1.docx

This example shows a complete specification of nearly all the command-line arguments. This would create a 20-item quiz that draws stems from English Wikipedia, gives the multiple choice questions 5 options (including the key), uses the Google Books NGrams (phrasefinder) frequency list with 100 as the target threshold and 40 as the distractor threshold, uses the Linsear Write readability algorithm with a limiting threshold of 12, and outputs the quiz in docx format to a file named awl_quiz_sublist_1.docx.

/path/to/runtime/java -jar /path/to/WQC/WordQuizConstructor.jar
-e "/path/to/WQC/settings.ini" -g "Coxhead AWL Sublist 3"
-s multichoice_cloze:150 -c simple_wikipedia -o moodlexml
-f awl_quiz_sublist_3.xml

This example mostly uses the default settings but creates a set of 150 multiple choice cloze items in the Moodle XML format. Thus, it could be imported into the Moodle question bank and used to form a quiz which draws randomly from the 150 items.
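
When generating many quizzes, it may be convenient to assemble the command from a script instead of typing it. The following is a sketch only: the helper function is hypothetical and not part of WQC, and it does nothing more than arrange the flags documented in this Readme into an argument list.

```python
import shlex


def build_wqc_command(java, jar, settings, **options):
    """Return the argument list for one WQC run.

    Keyword names are the single-letter flags from this Readme
    (s, t, p, a, g, o, n, c, f, r, l); v and i take no value.
    """
    command = [java, "-jar", jar, "-e", settings]
    for flag, value in options.items():
        if flag in ("v", "i"):
            # Switches: include the bare flag only when set to a true value.
            if value:
                command.append(f"-{flag}")
        else:
            command.extend([f"-{flag}", str(value)])
    return command


command = build_wqc_command(
    "/path/to/runtime/java",
    "/path/to/WQC/WordQuizConstructor.jar",
    "/path/to/WQC/settings.ini",
    g="Coxhead AWL Sublist 3",
    s="multichoice_cloze:150",
    c="simple_wikipedia",
    o="moodlexml",
    f="awl_quiz_sublist_3.xml",
)
# Show the command as it could be typed in a POSIX shell.
print(shlex.join(command))
```

The list can be passed directly to subprocess.run(command, check=True). Passing a list avoids shell quoting problems with values that contain spaces, such as the sublist tag.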