מאגר העברית המדוברת בישראל (מעמ"ד)
The Corpus of Spoken Israeli Hebrew (CoSIH)
For the Hebrew version click here.
Plans for The Corpus of Spoken Israeli Hebrew (CoSIH)
started to take shape in 1998. The model according to which CoSIH
would be compiled called for a thousand sets of recordings
("cells") of 5,000 words each, i.e., a corpus of 5 million words. This
model was first described in a previous version of this web page, which
eventually found its place in Hary and Izre'el 2003. A more sophisticated model
was published in Izre'el, Hary and Rahav 2001.
Although in the initial stages of modeling the corpus and
preparing for its compilation we designed a pilot of 20 sets of 3-hour recordings, we
eventually ended up with some 50 sets, each comprising between 8 and 16 hours of
uninterrupted recording of everyday speech (Izre'el and Rahav
2004). Thus, while a lack of sufficient financial support has prevented us from
pursuing our initial plans, we believe that the recordings made so far,
which will become available through this channel, will form a substantial collection
of texts, enough to provide a solid database of Spoken Israeli Hebrew and thus to
advance its study in ways hitherto unavailable to the research community.
Since the first recordings were made in 2001, we have made some progress
in transcribing them, mainly through students who used these data for research in
course papers, seminar papers, MA theses (Cohen 2004;
Silber-Varod 2005) and doctoral dissertations. At this stage (summer 2009), we
are editing and preparing these texts for dissemination with the aid of an
Israel Science Foundation (ISF) grant given to Esther Borochovsky
Bar-Aba for her research on
concise utterances in spoken Israeli Hebrew. The recordings will be
disseminated electronically using the alignment software ELAN, which is still to be
developed further for web publication for CorpAfroAs, the Corpus of Spoken Afroasiatic
Languages. We hope that at least a first batch of CoSIH texts will be
disseminated in 2011. Each of these texts will include
at least 1,000 to 2,000 words and, at least in the
initial stages, will consist mainly of recordings of native speakers.
Most texts will be transcribed in standard Hebrew orthography
(Izre'el 2004) and parsed into prosodic
groups (= intonation units) (Amir, Silber-Varod and
Izre'el 2004; Izre'el 2005; Izre'el and Silber-Varod
forthcoming). A small sample of texts has been published (in transcription
only) in Izre'el 2002a. A sample of texts of ca. 42,000 words
in preliminary transcription (with no prosodic marking) has been used as the
basis for a tagged corpus by Dalia Bojan on the MILA
site of the Technion.
CoSIH was initiated and designed by a team of
Israeli and international scholars:
Core team: Shlomo Izre'el, Tel-Aviv University (director);
Benjamin Hary, Emory University (principal investigator); John Du Bois,
University of California at Santa Barbara (corpus
analyst); Mira Ariel, Tel-Aviv University (discourse analysis and pragmatics); Giora Rahav, Tel-Aviv University
(statistics and sociology).
Advisory board: Eliezer Ben-Rafael, Tel
Aviv University (sociolinguistics – sociological aspects); Yaakov Bentolila, Ben Gurion University
(sociolinguistics – linguistic aspects); Otto Jastrow,
Universität Erlangen-Nürnberg
(transcription, phonology, dialectology); Shmuel
Bolozky, University of Massachusetts at Amherst (phonology, morphology);
Geoffrey Khan, Cambridge University (syntax); Elana Shohamy, Tel Aviv University (language education).
For contact click here
Bibliography
(including studies that have been based – partly or wholly – on CoSIH texts)
■ Ozerov, Pavel. 2010. Accent placement and message structure in spontaneous discourse in Hebrew. MA thesis, The Hebrew University of Jerusalem. (In Hebrew.)
■ Gonen, Ilan. 2009. Morphophonology of the root in the verb in spoken Israeli Hebrew. MA thesis, Tel-Aviv University. (In Hebrew.)
■ Gonen, Einat. 2006. Noun inflection in spoken Hebrew:
vowel-reduction (ḥiṭṭuf) processes. PhD dissertation,
The Hebrew University of Jerusalem. (In Hebrew.)
■ Silber-Varod, Vered. 2005. Characteristics of
boundaries of prosodic units in spoken Hebrew:
perceptual and acoustic analysis. MA thesis, Tel-Aviv
University. (In Hebrew.)
■ Izre'el, Shlomo. 2002a. The Corpus of Spoken Israeli
Hebrew (CoSIH): Text samples. Leshonenu 64:
289-314. (In Hebrew. This article also includes several samples of texts in narrow transcription; these transcriptions
have been improved and posted on the internet together with the recordings themselves:
http://www.tau.ac.il/humanities/semitic/meeting.pdf;
http://www.tau.ac.il/humanities/semitic/meeting.wav;
http://www.tau.ac.il/humanities/semitic/cardrive.pdf; http://www.tau.ac.il/humanities/semitic/cardrive.wav; http://www.tau.ac.il/humanities/semitic/folkstory.pdf; http://www.tau.ac.il/humanities/semitic/folkstory.wav)
■ Izre'el, Shlomo (ed.), with the assistance of Margalit
Mendelson. 2002b. Speaking Hebrew: Studies in the Spoken Language and in Linguistic
Variation in Israel. (Te'uda 18.) Tel Aviv: Tel Aviv
University. (In Hebrew.)
Papers in this volume dealing directly with the compilation of CoSIH:
Rahav, Giora, Population sampling
for the establishment of a representative corpus, pp. 439-445;
Hary, Benjamin
and Shlomo Izre'el, The preparatory
model of The Corpus of Spoken Israeli Hebrew (CoSIH),
pp. 447-458
(abstract; for the full article see
http://www.tau.ac.il/humanities/semitic/maamad2000.html and, in English, Hary and
Izre'el 2003 below);
Werum,
Regina, Methodological remarks
on the compilation of The Corpus of Spoken Israeli Hebrew (CoSIH), pp.
459-477.
■ Izre'el, Shlomo. 2003-2004. Research on spoken Hebrew:
The first step - on the notation of speech for research purposes. Leshonenu La'am 54:
601-911. (In Hebrew.)
■ Izre'el, Shlomo. In press. From speech
to syntax – from theory to transcript. In: Moshe Bar-Asher and Chaim Cohen (eds.). Jubilee
volume in honor of Aron Dotan. (In Hebrew.)
■ Izre'el, Shlomo. 2003. The Corpus of Spoken Israeli
Hebrew (CoSIH); Phase I: Pilot study -
preliminary report. In: Daniel Sivan and Pablo-Isaac
Halevy-Kirtchuk (eds.). Kol le-Yaakov: A collection of
articles in honor of Prof. Yaakov Bentolila.
(Eshel Beer-Sheva: Studies in Jewish Studies, 8.) Beer-Sheva: Ben-Gurion University of the Negev. 211-222. (In Hebrew.)
■ Izre'el, Shlomo, Benjamin Hary and Giora Rahav. 2002. Towards the compilation
of The Corpus of Spoken Israeli Hebrew. Leshonenu 64 (2002): 265-287. (In Hebrew.)
■ Izre'el, Shlomo and Vered Silber-Varod. In press. Omer lenateaḥ lenateaḥ: On the perception of the prosodic group in spoken Hebrew. Hebrew
Linguistics 63-64. (In Hebrew.)
■ Cohen, Smadar. 2004. "And the one who does not know how to ask - what does he say?" – Ways of asking questions
in spoken Hebrew. MA thesis, Tel-Aviv University. (In Hebrew.)
■ Amir, Noam, Vered Silber-Varod and Shlomo Izre'el. 2004. Characteristics of
Intonation Unit Boundaries in Spontaneous Spoken Hebrew: Perception and
Acoustic Correlates. In: Bernard Bel and Isabelle
Marlien (eds.). Speech
Prosody 2004, Nara, Japan, March 23-26, 2004: Proceedings. 677-680.
■ Dekel, Nurit. 2010. A Matter of Time: Tense, Mood and Aspect in Spontaneous Spoken Israeli Hebrew.
PhD dissertation, The University of Amsterdam. Amsterdam: LOT.
■ Hary, Benjamin H. (ed.). 2003. Corpus
Linguistics and Modern Hebrew: Towards the Compilation of The Corpus of Spoken Israeli Hebrew (CoSIH).
Tel Aviv: Tel Aviv University, The Chaim
Rosenberg School of Jewish Studies.
Papers in this volume dealing directly with
the compilation of CoSIH:
Rahav, Giora, Population Sampling for the Establishment of a Representative Corpus,
pp. 181-188.
Hary, Benjamin and Shlomo
Izre'el, The Preparatory Model of The Corpus of Spoken Israeli Hebrew
(CoSIH), pp. 189-219.
Werum, Regina E., Methodological
Remarks on Creating the Corpus of Spoken Israeli Hebrew (CoSIH),
pp. 221-241.
■ Izre'el, Shlomo, Benjamin Hary and Giora
Rahav. 2001. Designing CoSIH: The Corpus of Spoken Israeli Hebrew. International
Journal of Corpus
Linguistics 6: 171-197.
■ Izre'el, Shlomo. 2004. Transcribing
Spoken Israeli Hebrew: Preliminary Notes. In: Dorit
Diskin Ravid and Hava Bat-Zeev Shyldkrot
(Eds.). Perspectives on Language and Language Development: Essays in
Honor of Ruth A. Berman. Dordrecht: Kluwer. 61-72.
■ Izre'el, Shlomo. 2005. Intonation
Units and the Structure of Spontaneous Spoken Language: A View from Hebrew.
In: Cyril Auran, Roxanne Bertrand, Catherine Chanet, Annie Colas, Albert Di Cristo, Cristel
Portes, Alain Reynier and
Monique Vion (eds.). Proceedings
of the IDP05 International Symposium on Discourse-Prosody Interfaces.
CD ROM.
■ Izre'el, Shlomo and Giora Rahav. 2004. The Corpus of
Spoken Israeli Hebrew (CoSIH); Phase I: The Pilot
Study. In: Nelleke Oostdijk,
Gjert Kristoffersen, and
Geoffrey Sampson (eds.). LREC 2004 Satellite
Workshop; Fourth International Conference on Language Resources and Evaluation:
Compiling and Processing Spoken Language Corpora (Lisbon, Portugal). Paris:
ELRA - European Language Resources Association. 1-7.
■ Mettouchi, Amina,
Anne Lacheret-Dujour, Vered
Silber-Varod & Shlomo Izre'el. 2007. Only Prosody? Perception of speech segmentation.
Nouveaux cahiers de linguistique française 28: Interfaces
discours – prosodie : actes du 2ème Symposium
international & Colloque Charles Bally, 207-218.
Sound files and transcriptions: http://clf.unige.ch/annexe.php?article=108.
■ Ozerov, Pavel. 2010. Accent and information
structure in spontaneous Modern Hebrew conversation. MA
thesis, The Hebrew University of Jerusalem. (Hebrew; English
summary.)