Towards Universal Web Parsebanks
| dc.contributor.author | Juhani Luotolahti | |
| dc.contributor.author | Jenna Kanerva | |
| dc.contributor.author | Veronika Laippala | |
| dc.contributor.author | Sampo Pyysalo | |
| dc.contributor.author | Filip Ginter | |
| dc.contributor.organization | fi=digitaalinen kielentutkimus, espanja, italia, kiina, ranska, saksa|en=Digital Language Studies, Chinese, French, German, Italian, Spanish| | |
| dc.contributor.organization | fi=kieli- ja puheteknologia|en=Language and Speech Technology| | |
| dc.contributor.organization | fi=tietojenkäsittelytiede|en=Computer Science| | |
| dc.contributor.organization | fi=tietotekniikan laitos|en=Department of Computing| | |
| dc.contributor.organization-code | 1.2.246.10.2458963.20.36764574459 | |
| dc.contributor.organization-code | 1.2.246.10.2458963.20.47465613983 | |
| dc.contributor.organization-code | 1.2.246.10.2458963.20.85312822902 | |
| dc.contributor.organization-code | 2606803 | |
| dc.converis.publication-id | 3174454 | |
| dc.converis.url | https://research.utu.fi/converis/portal/Publication/3174454 | |
| dc.date.accessioned | 2025-08-27T23:55:21Z | |
| dc.date.available | 2025-08-27T23:55:21Z | |
| dc.description.abstract | Recently, there has been great interest both in the development of cross-linguistically applicable annotation schemes and in the application of syntactic parsers at web scale to create parsebanks of online texts. The combination of these two trends to create massive, consistently annotated parsebanks in many languages holds enormous potential for the quantitative study of many linguistic phenomena, but these opportunities have been only partially realized in previous work. In this work, we take a key step toward universal web parsebanks through a single-language case study introducing the first retrainable parser applied to the Universal Dependencies representation and its application to create a Finnish web-scale parsebank. We further integrate this data into an online dependency search system and demonstrate its applicability by showing linguistically motivated search examples and by using the dependency syntax information to analyze the language of the web corpus. We conclude with a discussion of the requirements of extending from this case study on Finnish to create consistently annotated web-scale parsebanks for a large number of languages. | |
| dc.format.pagerange | 211 | |
| dc.format.pagerange | 220 | |
| dc.identifier.isbn | 978-1-5108-0816-4 | |
| dc.identifier.olddbid | 204872 | |
| dc.identifier.oldhandle | 10024/187899 | |
| dc.identifier.uri | https://www.utupub.fi/handle/11111/53624 | |
| dc.identifier.urn | URN:NBN:fi-fe2021042715097 | |
| dc.language.iso | en | |
| dc.okm.affiliatedauthor | Luotolahti, Matti | |
| dc.okm.affiliatedauthor | Kanerva, Jenna | |
| dc.okm.affiliatedauthor | Laippala, Veronika | |
| dc.okm.affiliatedauthor | Pyysalo, Sampo | |
| dc.okm.affiliatedauthor | Ginter, Filip | |
| dc.okm.discipline | 113 Computer and information sciences | en_GB |
| dc.okm.discipline | 113 Tietojenkäsittely ja informaatiotieteet | fi_FI |
| dc.okm.internationalcopublication | not an international co-publication | |
| dc.okm.internationality | International publication | |
| dc.okm.type | A4 Conference Article | |
| dc.publisher.country | United States | en_GB |
| dc.publisher.country | Yhdysvallat (USA) | fi_FI |
| dc.publisher.country-code | US | |
| dc.relation.conference | International Conference on Dependency Linguistics (Depling) | |
| dc.source.identifier | https://www.utupub.fi/handle/10024/187899 | |
| dc.title | Towards Universal Web Parsebanks | |
| dc.title.book | Proceedings of the International Conference on Dependency Linguistics (Depling'15) | |
| dc.year.issued | 2015 |