The 21st Century COE Program "Usage-Based Linguistic Informatics" (2002-2006)

Multilingual Corpora

Collection of linguistic usage data grounded in basic research continued in fiscal year 2004, and analysis of these data began. During the same period, graduate students conducted field research and retrieved and collected language resources via the Internet. Improvement of the discourse corpus, whose construction began in fiscal year 2002, also continued. Once these data had been integrated, function-specific and objective-specific multilingual corpora were constructed. These corpora are organized around linguistic functions such as desires, questions and permission, or around grammatical research topics such as qualification, voice and modality, and they provide essential material for both linguistic and discourse analysis. The data are then reclassified by basic and field-specific vocabulary and used to construct lexical category corpora (LC corpora). LC corpora are useful for extracting co-occurrence relationships among vocabulary items in specific fields. The function-specific and objective-specific corpora, as well as the LC corpora, are applied to linguistic research and can also be used to develop and improve the Dialogue and Vocabulary Modules. Moreover, objective-specific corpora can serve as a basis for developing advanced teaching materials for liberal arts courses, such as those related to linguistics and cultural research.
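As a rough illustration of the co-occurrence extraction mentioned above for LC corpora, the following minimal Python sketch counts how often pairs of field-specific words appear in the same sentence. It is not the program's actual method: the function name `cooccurrence_counts`, the sentence-level window, and the toy medical vocabulary are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences, field_vocabulary):
    """Count how often pairs of field-specific words share a sentence.

    `sentences` is an iterable of token lists; `field_vocabulary` is a set
    of words treated as belonging to one field (both are assumptions of
    this sketch, not a format defined by the project).
    """
    counts = Counter()
    for tokens in sentences:
        # Keep each field word once per sentence, in a stable sorted order
        # so that every pair is always stored under the same key.
        present = sorted(set(tokens) & field_vocabulary)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


# Hypothetical toy data; a real LC corpus would be far larger.
sentences = [
    ["the", "doctor", "examined", "the", "patient"],
    ["the", "patient", "thanked", "the", "doctor"],
    ["the", "nurse", "called", "the", "doctor"],
]
medical_vocabulary = {"doctor", "patient", "nurse"}

pair_counts = cooccurrence_counts(sentences, medical_vocabulary)
for pair, count in pair_counts.most_common():
    print(pair, count)
```

Counting at the sentence level is only one possible choice; a fixed-size token window or a statistical association measure could be substituted depending on the research objective.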

Language Function-Specific Corpus


Research Objective-Specific Corpus


Linguistic Culture Portal Site