Technical details

This project is co-ordinated by Nick Thieberger as part of his ARC-funded Future Fellowship grant. He arranged with the National Library of Australia to have all the microfilmed images from Section XII of the Bates papers digitised. The 24,670 images were renamed following the NLA's manuscript naming convention, and the typescripts in that collection (some 4,000 pages) were sent to be typed. The typing for this questionnaire-based material used tables to distinguish the words and their meanings. When we got the typed versions back we added tags to the content.

Conal Tuohy designed the structure of the dataset according to the Text Encoding Initiative (TEI) P5 Guidelines, so that it embodies both a facsimile of the original set of manuscripts and a structured dataset for complex research questions.

Sasha Wilmoth and Alice Kaiser-Schatzlein wrote the Python script that generates alternate spellings.
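The project's actual script and its spelling rules are not reproduced on this page. As a rough illustration only, alternate spellings could be generated by treating certain letter sequences as interchangeable and listing every combination. The equivalence groups, the function name `alternate_spellings`, and the sample word below are all invented for this sketch and are not taken from the Bates material or from the project's code.

```python
from itertools import product

# Hypothetical groups of interchangeable spellings (an assumption for
# illustration; these are not the project's actual rules).
EQUIVALENTS = [("oo", "u"), ("k", "g"), ("b", "p")]


def alternate_spellings(word, groups=EQUIVALENTS):
    """Return every spelling obtained by swapping interchangeable sequences."""
    # Map each letter sequence to the full group it belongs to.
    lookup = {seq: group for group in groups for seq in group}
    longest = max(len(seq) for seq in lookup)

    options = []  # one tuple of choices per segment of the word
    i = 0
    while i < len(word):
        # Prefer the longest matching sequence at this position.
        for size in range(longest, 0, -1):
            chunk = word[i:i + size]
            if chunk in lookup:
                options.append(lookup[chunk])
                i += len(chunk)
                break
        else:
            # No rule applies: keep the single letter unchanged.
            options.append((word[i],))
            i += 1

    return sorted({"".join(parts) for parts in product(*options)})


# "kooba" is a made-up string, used only to show the mechanics.
variants = alternate_spellings("kooba")
print(variants)
```

A search for any one of the generated forms could then be matched against all the others, which is one plausible way such variants might support searching across inconsistent spellings.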

Where possible, each language represented in a wordlist is identified and words from that language are tagged to distinguish them from English terms for searching. Places, language names, 'tribe' or local group names, and individual names are also tagged to allow them to be searched. Each document is also geocoded so it can be presented on the map of words.
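To give a sense of how this kind of tagging supports searching, here is a minimal sketch in Python. It assumes a wordlist encoded as a TEI table in which each non-English word is wrapped in a `foreign` element with an `xml:lang` attribute and each place in a `placeName` element. Those elements exist in TEI P5, but whether the project uses exactly these is an assumption, and the fragment, the language code `x-example`, the place name, and the words are placeholders rather than real data. The fragment is also not a complete, valid TEI document (it has no header).

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Placeholder fragment: the markup choices and all values are assumptions.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p>Recorded at <placeName>Exampletown</placeName>.</p>
    <table>
      <row>
        <cell>Water</cell>
        <cell><foreign xml:lang="x-example">aaa</foreign></cell>
      </row>
      <row>
        <cell>Fire</cell>
        <cell><foreign xml:lang="x-example">bbb</foreign></cell>
      </row>
    </table>
  </body></text>
</TEI>"""


def words_by_language(root):
    """Group (English term, tagged word) pairs by the word's language code."""
    result = {}
    for row in root.iter(f"{{{TEI_NS}}}row"):
        cells = row.findall(f"{{{TEI_NS}}}cell")
        if len(cells) < 2:
            continue  # skip rows that are not term/word pairs
        english = (cells[0].text or "").strip()
        for foreign in cells[1].iter(f"{{{TEI_NS}}}foreign"):
            lang = foreign.get(f"{{{XML_NS}}}lang", "und")
            word = (foreign.text or "").strip()
            result.setdefault(lang, []).append((english, word))
    return result


def place_names(root):
    """Collect the text of every tagged place name."""
    return [(el.text or "").strip() for el in root.iter(f"{{{TEI_NS}}}placeName")]


root = ET.fromstring(SAMPLE)
print(words_by_language(root))
print(place_names(root))
```

Because the language is recorded on each word rather than on the page as a whole, a search can be restricted to one language without picking up the English terms alongside it.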

Only words that occur in the questionnaire (around 1800 words, listed here) are presented on the map of words.

The "Map of vocabularies" lets you look at any given vocabulary and see where it was recorded.


How you can help!

There is still a lot to do to fix the relationship between the text and the images of the manuscripts. If you notice any mismatches or missing images, please let us know so we can fix them. Thanks!


Note that none of the 298 images in folder 46 are presented on these webpages. All 33 of the items in folder 46 are copied from existing published sources (such as the Science of Man or Curr wordlists) and so do not need to be republished here.

Note that none of the 1,024 items in folders 37 and 38 are included here. These are typed notes on the grammar of Western Australian languages.


Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

The moral rights of the speakers are asserted.