The Corpus of Spoken Istrovenetian/Fiuman and Croatian (C-ORAL-IC)
Keywords:
language sampling; spoken corpora; code-switching; bilingual discourse
Abstract
Bilingual conversational corpora are invaluable for studying genuine contact phenomena in spontaneous bilingual speech. This paper presents the Corpus of Spoken Istrovenetian/Fiuman and Croatian (C-ORAL-IC), the first corpus documenting unscripted Istrovenetian and Fiuman dialects spoken among bilinguals in the Istrian and Kvarner areas of Croatia. The region has a long history of Croatian and Italian cultural and linguistic interaction, shaping a complex sociolinguistic system with diglossic and polyglossic relations. C-ORAL-IC includes data from 87 bilingual/multilingual speakers and features over 85,000 tokens and 27,000 types. Available on TalkBank (BilingBank subsection) [https://talkbank.org, https://biling.talkbank.org/access/C-ORAL-IC.html], it includes transcribed, phonologically adapted, coded, segmented and morphologically tagged recordings. Additional participant data on language history and usage are available. C-ORAL-IC provides a rich resource for exploring spontaneous bilingual speech, offering insights into conversational features, structure, and synchronic changes in Istrovenetian/Fiuman.