Towards Universal Web Parsebanks

dc.contributor.author: Juhani Luotolahti
dc.contributor.author: Jenna Kanerva
dc.contributor.author: Veronika Laippala
dc.contributor.author: Sampo Pyysalo
dc.contributor.author: Filip Ginter
dc.contributor.organization: Digital Language Studies, Chinese, French, German, Italian, Spanish
dc.contributor.organization: Language and Speech Technology
dc.contributor.organization: Computer Science
dc.contributor.organization: Department of Computing
dc.contributor.organization-code: 1.2.246.10.2458963.20.36764574459
dc.contributor.organization-code: 1.2.246.10.2458963.20.47465613983
dc.contributor.organization-code: 1.2.246.10.2458963.20.85312822902
dc.contributor.organization-code: 2606803
dc.converis.publication-id: 3174454
dc.converis.url: https://research.utu.fi/converis/portal/Publication/3174454
dc.date.accessioned: 2025-08-27T23:55:21Z
dc.date.available: 2025-08-27T23:55:21Z
dc.description.abstract: Recently, there has been great interest both in the development of cross-linguistically applicable annotation schemes and in the application of syntactic parsers at web scale to create parsebanks of online texts. The combination of these two trends to create massive, consistently annotated parsebanks in many languages holds enormous potential for the quantitative study of many linguistic phenomena, but these opportunities have been only partially realized in previous work. In this work, we take a key step toward universal web parsebanks through a single-language case study introducing the first retrainable parser applied to the Universal Dependencies representation and its application to create a Finnish web-scale parsebank. We further integrate this data into an online dependency search system and demonstrate its applicability by showing linguistically motivated search examples and by using the dependency syntax information to analyze the language of the web corpus. We conclude with a discussion of the requirements of extending from this case study on Finnish to create consistently annotated web-scale parsebanks for a large number of languages.
dc.format.pagerange: 211-220
dc.identifier.isbn: 978-1-5108-0816-4
dc.identifier.olddbid: 204872
dc.identifier.oldhandle: 10024/187899
dc.identifier.uri: https://www.utupub.fi/handle/11111/53624
dc.identifier.urn: URN:NBN:fi-fe2021042715097
dc.language.iso: en
dc.okm.affiliatedauthor: Luotolahti, Matti
dc.okm.affiliatedauthor: Kanerva, Jenna
dc.okm.affiliatedauthor: Laippala, Veronika
dc.okm.affiliatedauthor: Pyysalo, Sampo
dc.okm.affiliatedauthor: Ginter, Filip
dc.okm.discipline: 113 Computer and information sciences [en_GB]
dc.okm.internationalcopublication: not an international co-publication
dc.okm.internationality: International publication
dc.okm.type: A4 Conference Article
dc.publisher.country: United States [en_GB]
dc.publisher.country-code: US
dc.relation.conference: International Conference on Dependency Linguistics (Depling)
dc.source.identifier: https://www.utupub.fi/handle/10024/187899
dc.title: Towards Universal Web Parsebanks
dc.title.book: Proceedings of the International Conference on Dependency Linguistics (Depling'15)
dc.year.issued: 2015

Files

Name: depling2015.pdf
Size: 233.54 KB
Format: Adobe Portable Document Format