An Unsupervised Query Rewriting Approach Using N-gram Co-occurrence Statistics to Find Similar Phrases in Large Text Corpora
| dc.contributor.author | Hans Moen | |
| dc.contributor.author | Laura-Maria Peltonen | |
| dc.contributor.author | Henry Suhonen | |
| dc.contributor.author | Hanna-Maria Matinolli | |
| dc.contributor.author | Riitta Mieronkoski | |
| dc.contributor.author | Kirsi Telen | |
| dc.contributor.author | Kirsi Terho | |
| dc.contributor.author | Tapio Salakoski | |
| dc.contributor.author | Sanna Salanterä | |
| dc.contributor.organization | Department of Nursing Science | |
| dc.contributor.organization | Language and Speech Technology | |
| dc.contributor.organization | tyks, varha | |
| dc.contributor.organization-code | 1.2.246.10.2458963.20.27201741504 | |
| dc.contributor.organization-code | 1.2.246.10.2458963.20.47465613983 | |
| dc.contributor.organization-code | 2607400 | |
| dc.converis.publication-id | 44203057 | |
| dc.converis.url | https://research.utu.fi/converis/portal/Publication/44203057 | |
| dc.date.accessioned | 2022-10-28T14:37:05Z | |
| dc.date.available | 2022-10-28T14:37:05Z | |
| dc.description.abstract | We present our work towards developing a system that finds, in a large text corpus, contiguous phrases expressing a meaning similar to that of a query phrase of arbitrary length. Depending on the use case, this task can be seen as a form of (phrase-level) query rewriting. The suggested approach works in a generative manner, is unsupervised, and uses a combination of a semantic word n-gram model, a statistical language model and a document search engine. A central component is a distributional semantic model containing word n-gram vectors (or embeddings), which models semantic similarities between n-grams of different order. As data we use a large corpus of PubMed abstracts. The presented experiment is based on manual evaluation of extracted phrases for arbitrary queries provided by a group of evaluators. The results indicate that the proposed approach is promising and that the use of distributional semantic models trained with uni-, bi- and trigrams seems to work better than a more traditional unigram model. | |
| dc.format.pagerange | 131 | |
| dc.format.pagerange | 139 | |
| dc.identifier.isbn | 978-91-7929-995-8 | |
| dc.identifier.issn | 1650-3686 | |
| dc.identifier.jour-issn | 1650-3686 | |
| dc.identifier.olddbid | 189298 | |
| dc.identifier.oldhandle | 10024/172392 | |
| dc.identifier.uri | https://www.utupub.fi/handle/11111/44347 | |
| dc.identifier.url | https://www.aclweb.org/anthology/W19-6114/ | |
| dc.identifier.urn | URN:NBN:fi-fe2021042827307 | |
| dc.language.iso | en | |
| dc.okm.affiliatedauthor | Moen, Hans | |
| dc.okm.affiliatedauthor | Peltonen, Laura-Maria | |
| dc.okm.affiliatedauthor | Suhonen, Henry | |
| dc.okm.affiliatedauthor | Matinolli, Hanna-Maria | |
| dc.okm.affiliatedauthor | Rosio, Riitta | |
| dc.okm.affiliatedauthor | Telen, Kirsi | |
| dc.okm.affiliatedauthor | Terho, Kirsi | |
| dc.okm.affiliatedauthor | Salakoski, Tapio | |
| dc.okm.affiliatedauthor | Salanterä, Sanna | |
| dc.okm.affiliatedauthor | Dataimport, tyks, vsshp | |
| dc.okm.discipline | 113 Computer and information sciences | en_GB |
| dc.okm.discipline | 113 Tietojenkäsittely ja informaatiotieteet | fi_FI |
| dc.okm.internationalcopublication | not an international co-publication | |
| dc.okm.internationality | International publication | |
| dc.okm.type | A4 Conference Article | |
| dc.publisher.country | Sweden | en_GB |
| dc.publisher.country | Ruotsi | fi_FI |
| dc.publisher.country-code | SE | |
| dc.relation.conference | Nordic Conference on Computational Linguistics | |
| dc.relation.ispartofjournal | Linköping Electronic Conference Proceedings | |
| dc.relation.ispartofseries | NEALT Proceedings Series | |
| dc.relation.volume | 42 | |
| dc.source.identifier | https://www.utupub.fi/handle/10024/172392 | |
| dc.title | An Unsupervised Query Rewriting Approach Using N-gram Co-occurrence Statistics to Find Similar Phrases in Large Text Corpora | |
| dc.title.book | Proceedings of the 22nd Nordic Conference on Computational Linguistics (NoDaLiDa) | |
| dc.year.issued | 2019 |
Files
- Name: W19-6114.pdf
- Size: 163.52 KB
- Format: Adobe Portable Document Format
- Description: Publisher's PDF