Browse by subject in the collection Master's theses, diploma theses, and advanced studies theses (full texts)
Distinguishing Noise and Main Text Content from Web-Sourced Plain Text Documents Using Sequential Neural Networks
(04.05.2022) Boilerplate removal and the identification of the actual textual content are crucial steps in web corpus creation. However, existing methods do not always filter out the noise perfectly and are often not applicable to plain ... (open access)