Mandarin Topic-oriented Conversation Corpus (Academia Sinica)

Type

Collection

Collection Identifier

c0518

Description

The Mandarin Topic-oriented Conversation Corpus (MTCC) was recorded from January to March 2002. The conversations are natural discussions between two speakers who know each other. Each conversation centers on one chosen event that happened in 2001. The corpus comprises 30 conversations with 60 speakers in total, aged 14 to 63. The recordings total 11 hours, with an average length of 22 minutes per conversation. The annotation system is designed to mark discourse functions in natural conversations. Opening, main discussion, and closing are the three main parts of a natural, topic-oriented conversation. The main discussion contains discourse functions intended to start a discussion, to negotiate a topic, to introduce a topic, to talk about a topic, and to end the discussion. To build a multimodal database together with the metadata of the transcription texts, all sound files are segmented and stored as stereo files. The total size is 6.78 GB. With the help of Translist, 29 conversations (185,000 characters) have been fully transcribed and annotated.

Language

English;Chinese 

Rights

link

Subject

Language

Temporal Coverage

2001

Dates Collection Accumulated

2002 

Owner

Institute of Linguistics, Academia Sinica

Is Located At

Institute of Linguistics, Academia Sinica

Is Accessed Via

link: http://mmc.sinica.edu.tw/mtcc_e.htm

Super-Collection

Language

Associated collection

Academia Sinica Tagged Corpus of Early Mandarin Chinese (Institute of Linguistics, Academia Sinica); Formosan Language Archive (Institute of Linguistics, Academia Sinica); Southern-Min Archive: A Database of Historical Change in Language Distribution (Institute of Linguistics, Academia Sinica)