Från dialektinspelning till talspråkskorpus – beskrivning av ett korpusbygge

dc.contributor.author: Lisa Södergård
dc.contributor.author: Therese Leinonen
dc.contributor.organization: fi=kieli- ja käännöstieteiden laitos|en=School of Languages and Translation Studies|
dc.contributor.organization-code: 1.2.246.10.2458963.20.56461112866
dc.converis.publication-id: 2315952
dc.converis.url: https://research.utu.fi/converis/portal/Publication/2315952
dc.date.accessioned: 2022-10-28T13:45:19Z
dc.date.available: 2022-10-28T13:45:19Z
dc.description.abstract: The Talko corpus of Swedish spoken in Finland is a new research tool consisting of audio files linked to annotation, i.e., transcriptions on two parallel levels and part-of-speech tagging. The corpus is searchable through a web-based interface. The recordings were made in 2005–2008 in all parts of Swedish-language Finland. They have been transcribed in a broad phonetic transcription as well as in a standard orthographic transcription. The part-of-speech tagging is done with TreeTagger, trained on the Stockholm-Umeå Corpus of written Swedish. The automatically produced part-of-speech tags are manually corrected for subsets of the data, and the manually corrected data are subsequently added to the training data. This will gradually improve the result of the automatic tagging and compensate for differences between spoken and written Swedish and between Finland-Swedish and Sweden-Swedish.
dc.identifier.isbn: 978-951-51-2996-3
dc.identifier.issn: 1795-4428
dc.identifier.olddbid: 184093
dc.identifier.oldhandle: 10024/167187
dc.identifier.uri: https://www.utupub.fi/handle/11111/45800
dc.identifier.urn: URN:NBN:fi-fe2021042714566
dc.language.iso: sv
dc.okm.affiliatedauthor: Leinonen, Therese
dc.okm.discipline: 6121 Languages (en_GB)
dc.okm.discipline: 6121 Kielitieteet (fi_FI)
dc.okm.internationalcopublication: not an international co-publication
dc.okm.internationality: Domestic publication
dc.okm.type: A3 Book
dc.publisher: Helsingin yliopisto
dc.publisher.country: Finland (en_GB)
dc.publisher.country: Suomi (fi_FI)
dc.publisher.country-code: FI
dc.publisher.isbn: 978-951-45; 978-951-51; 978-952-10; 978-952-84
dc.publisher.place: Helsinki
dc.relation.ispartofseries: Nordica Helsingiensia
dc.relation.volume: 48
dc.source.identifier: https://www.utupub.fi/handle/10024/167187
dc.title: Från dialektinspelning till talspråkskorpus – beskrivning av ett korpusbygge
dc.title.book: Ideologi, identitet, intervention. Tionde nordiska dialektologkonferensen
dc.year.issued: 2017

Files

Name: Från dialektinspelning till talspråkskorpus.pdf
Size: 198.91 KB
Format: Adobe Portable Document Format
Description: Pre-print