
מאגר העברית המדוברת בישראל (מעמ"ד)

The Corpus of Spoken Israeli Hebrew (CoSIH)

For the Hebrew version, click here.

Plans for The Corpus of Spoken Israeli Hebrew (CoSIH) started to take shape in 1998. Under the original model, CoSIH was to consist of a thousand sets of recordings ("cells") of 5000 words each, i.e., a corpus of 5M words. This model was first described in an earlier version of this web page, which was eventually published as Hary and Izre'el 2003. A more sophisticated model was published in Izre'el, Hary and Rahav 2001.

Although in the initial stages of modeling the corpus and preparing for its compilation we designed a pilot of 20 sets of 3-hour recordings, we ended up with some 50 sets, each comprising between 8 and 16 hours of uninterrupted recording of everyday speech (Izre'el and Rahav 2004). Therefore, although insufficient funding has prevented us from carrying out our initial plans, we believe that the recordings made so far, which will become available through this channel, form a collection of texts large enough to provide a solid database of Spoken Israeli Hebrew, and thus to advance its study in ways hitherto unavailable to the research community.

Since the first recordings were made in 2001, we have made some progress in transcribing them, mainly through the work of students using these data for research in course papers, seminar papers, MA theses (Cohen 2003/04; Silber-Varod 2004/05) and doctoral dissertations. At this stage (summer 2009), we are editing and preparing these texts for dissemination with the aid of an Israel Science Foundation (ISF) grant given to Esther Borochovsky Bar-Aba for her research on concise utterances in spoken Israeli Hebrew. The recordings will be disseminated electronically using the alignment software ELAN, which is still to be further developed for web publication for CorpAfroAs, the Corpus of spoken Afroasiatic languages. We hope that CoSIH will be disseminated, at least with its first batch of texts, in 2011. Each of these texts will include at least 1000 to 2000 words and, at least in the initial stages, will consist mainly of recordings of native speakers.

Most texts will be transcribed in standard Hebrew orthography (Izre'el 2004) and prosodically parsed into prosodic groups (= intonation units) (Amir, Silber-Varod and Izre'el 2004; Izre'el 2005; Izre'el and Silber-Varod forthcoming). A small sample of texts has been published (in transcription only) in Izre'el 2001/02a. A sample of texts with ca. 42,000 words in preliminary transcriptions (with no prosodic marking) has been used as the basis for a tagged corpus by Dalia Bojan on the MILA site of the Technion.

CoSIH was initiated and designed by a team of Israeli and international scholars:

Core team: Shlomo Izre'el, Tel-Aviv University (director); Benjamin Hary, Emory University (principal investigator); John Du Bois, University of California at Santa Barbara (corpus analyst); Mira Ariel, Tel-Aviv University (discourse analysis and pragmatics); Giora Rahav, Tel-Aviv University (statistics and sociology).

Advisory board: Eliezer Ben-Rafael, Tel Aviv University (sociolinguistics – sociological aspects); Yaakov Bentolila, Ben Gurion University (sociolinguistics – linguistic aspects); Otto Jastrow, Universität Erlangen-Nürnberg (transcription, phonology, dialectology); Shmuel Bolozky, University of Massachusetts at Amherst (phonology, morphology); Geoffrey Khan, Cambridge University (syntax); Elana Shohamy, Tel Aviv University (language education).

 

To contact us, click here.

 

Bibliography (including studies that have been based – partly or wholly – on CoSIH texts)

Ozerov, Pavel. 2009/10 (5770). The Placement of Accent and the Structure of the Message in Spontaneous Discourse in Hebrew. MA thesis. The Hebrew University of Jerusalem. (In Hebrew.)

■ Gonen, Ilan. 2008/09 (5769). Morphophonology of the Root in the Verb in Spoken Israeli Hebrew. MA thesis. Tel-Aviv University. (In Hebrew.)

■ Gonen, Einat. 2005/06 (5766). Noun Inflection in Spoken Hebrew: Reduction Processes. PhD dissertation. The Hebrew University of Jerusalem. (In Hebrew.)

Silber-Varod, Vered. 2004/05 (5765). Characteristics of the Boundaries of Prosodic Units in Spoken Hebrew: A Perceptual and Acoustic Analysis. MA thesis. Tel-Aviv University. (In Hebrew.)

■ Izre'el, Shlomo. 2001/02a (5762a). The Corpus of Spoken Israeli Hebrew (CoSIH): Textual Samples. Leshonenu 64: 289-314. (In Hebrew. This article also includes several samples of texts transcribed in narrow transcription; these transcriptions have since been improved and posted on the Internet together with the recordings themselves: http://www.tau.ac.il/humanities/semitic/meeting.pdf ; http://www.tau.ac.il/humanities/semitic/meeting.wav ; http://www.tau.ac.il/humanities/semitic/cardrive.pdf ; http://www.tau.ac.il/humanities/semitic/cardrive.wav ; http://www.tau.ac.il/humanities/semitic/folkstory.pdf ; http://www.tau.ac.il/humanities/semitic/folkstory.wav )

■ Izre'el, Shlomo (ed.), with the assistance of Margalit Mendelson. 2001/02b (5762b). Speaking Hebrew: Studies in the Spoken Language and in Linguistic Variation in Israel. (Te'uda, 18.) Tel Aviv: Tel Aviv University. (In Hebrew.)

            Papers in this volume dealing directly with the compilation of CoSIH:

                Rahav, Giora, Population Sampling for the Establishment of a Representative Corpus, pp. 439-445;

                Hary, Benjamin and Shlomo Izre'el, The Preparatory Model of The Corpus of Spoken Israeli Hebrew (CoSIH), pp. 447-458 (abstract; for the full paper see http://www.tau.ac.il/humanities/semitic/maamad2000.html and, in English, Hary and Izre'el 2003 below);

                Werum, Regina, Methodological Remarks on Creating the Corpus of Spoken Israeli Hebrew (CoSIH), pp. 459-477.

■ Izre'el, Shlomo. 2002/03-2003/04 (5763-5764). The Study of Spoken Hebrew: The First Step - On Recording Speech for Research Purposes. Leshonenu La'am 54: 601-911. (In Hebrew.)

■ Izre'el, Shlomo. In press. From Speech to Syntax – From Theory to Transcript. In: Moshe Bar-Asher and Chaim Cohen (eds.). Jubilee Volume in Honor of Aron Dotan. (In Hebrew.)

■ Izre'el, Shlomo. 2002/03 (5763). The Corpus of Spoken Israeli Hebrew (CoSIH); Phase I: The Pilot Study - A Preliminary Report. In: Daniel Sivan and Pablo-Isaac Halevy-Kirtchuk (eds.). Kol le-Ya'akov: A Collection of Papers in Honor of Prof. Yaakov Bentolila. (Eshel Beer-Sheva: Studies in Jewish Studies, 8.) Beer-Sheva: Ben-Gurion University of the Negev. 211-222. (In Hebrew.)

■ Izre'el, Shlomo, Benjamin Hary and Giora Rahav. 2001/02 (5762). Towards the Compilation of the Corpus of Spoken Israeli Hebrew. Leshonenu 64 (5762): 265-287. (In Hebrew.)

■ Izre'el, Shlomo and Vered Silber-Varod. In press. Omer le-nateaḥ le-nateaḥ: On the Perception of the Prosodic Group in Spoken Hebrew. Hebrew Linguistics 63-64. (In Hebrew.)

■ Cohen, Smadar. 2003/04 (5764). "And the One Who Does Not Know How to Ask - What Does He Say?": Ways of Asking Questions in Spoken Hebrew. MA thesis, Tel-Aviv University. (In Hebrew.)

Amir, Noam, Vered Silber-Varod and Shlomo Izre'el. 2004. Characteristics of Intonation Unit Boundaries in Spontaneous Spoken Hebrew: Perception and Acoustic Correlates. In: Bernard Bel and Isabelle Marlien (eds.). Speech Prosody 2004, Nara, Japan, March 23-26, 2004: Proceedings. 677-680.

Dekel, Nurit. 2010. A Matter of Time: Tense, Mood and Aspect in Spontaneous Spoken Israeli Hebrew. PhD dissertation, The University of Amsterdam. Amsterdam: LOT.

Hary, Benjamin H. (ed.). 2003. Corpus Linguistics and Modern Hebrew: Towards the Compilation of The Corpus of Spoken Israeli Hebrew (CoSIH). Tel Aviv: Tel Aviv University, The Chaim Rosenberg School of Jewish Studies.

            Papers in this volume dealing directly with the compilation of CoSIH:

                Rahav, Giora, Population Sampling for the Establishment of a Representative Corpus, pp. 181-188.

                Hary, Benjamin and Shlomo Izre'el, The Preparatory Model of The Corpus of Spoken Israeli Hebrew (CoSIH), pp. 189-219.

                Werum, Regina E., Methodological Remarks on Creating the Corpus of Spoken Israeli Hebrew (CoSIH), pp. 221-241.

Izre'el, Shlomo, Benjamin Hary and Giora Rahav. 2001. Designing CoSIH: The Corpus of Spoken Israeli Hebrew. International Journal of Corpus Linguistics 6: 171-197.

Izre'el, Shlomo. 2004. Transcribing Spoken Israeli Hebrew: Preliminary Notes. In: Dorit Diskin Ravid and Hava Bat-Zeev Shyldkrot (eds.). Perspectives on Language and Language Development: Essays in Honor of Ruth A. Berman. Dordrecht: Kluwer. 61-72.

Izre'el, Shlomo. 2005. Intonation Units and the Structure of Spontaneous Spoken Language: A View from Hebrew. In: Cyril Auran, Roxanne Bertrand, Catherine Chanet, Annie Colas, Albert Di Cristo, Cristel Portes, Alain Reynier and Monique Vion (eds.). Proceedings of the IDP05 International Symposium on Discourse-Prosody Interfaces. CD ROM.

Izre'el, Shlomo and Giora Rahav. 2004. The Corpus of Spoken Israeli Hebrew (CoSIH); Phase I: The Pilot Study. In: Nelleke Oostdijk, Gjert Kristoffersen, and Geoffrey Sampson (eds.). LREC 2004 Satellite Workshop; Fourth International Conference on Language Resources and Evaluation: Compiling and Processing Spoken Language Corpora (Lisbon, Portugal). Paris: ELRA - European Language Resources Association. 1-7.

Mettouchi, Amina, Anne Lacheret-Dujour, Vered Silber-Varod and Shlomo Izre'el. 2007. Only Prosody? Perception of speech segmentation. Nouveaux cahiers de linguistique française 28: Interfaces discours-prosodie : actes du 2ème Symposium international & Colloque Charles Bally, 207-218. Sound files and transcriptions: http://clf.unige.ch/annexe.php?article=108.

Ozerov, Pavel. 2010. Accent and information structure in spontaneous Modern Hebrew conversation. MA thesis, The Hebrew University of Jerusalem. (Hebrew; English summary.)
