Proceedings of the Second Workshop on Universal Dependencies (UDW 2018), pages 47–54
Brussels, Belgium, November 1, 2018. ©2018 Association for Computational Linguistics
Mind the Gap: Data Enrichment in Dependency Parsing of Elliptical
Constructions
Kira Droganova∗ Filip Ginter† Jenna Kanerva† Daniel Zeman∗
∗Charles University, Faculty of Mathematics and Physics
†University of Turku, Department of Future Technologies
{droganova,zeman}@ufal.mff.cuni.cz
{figint,jmnybl}@utu.fi
Abstract
In this paper, we focus on parsing rare and
non-trivial constructions, in particular ellip-
sis. We report on several experiments in
enrichment of training data for this specific
construction, evaluated on five languages:
Czech, English, Finnish, Russian and Slovak.
These data enrichment methods draw upon
self-training and tri-training, combined with
a stratified sampling method mimicking the
structural complexity of the original treebank.
In addition, using these same methods, we
also demonstrate small improvements over the
CoNLL-17 parsing shared task winning sys-
tem for four of the five languages, not only re-
stricted to the elliptical constructions.
1 Introduction
Dependency parsing of natural language text may
seem like a solved problem, at least for resource-rich
languages and domains, where state-of-the-art parsers
approach or surpass 90% labeled attachment score
(LAS) (Zeman et al., 2017). However,
certain syntactic phenomena such as coordination
and ellipsis are notoriously hard and even state-
of-the-art parsers could benefit from better mod-
els of these constructions. Our work focuses on
one such construction that combines both coor-
dination and ellipsis: gapping, an omission of a
repeated predicate which can be understood from
context (Coppock, 2001). For example, in Mary
won gold and Peter bronze, the second instance
of the verb is omitted, as the meaning is evident
from the context. In dependency parsing this cre-
ates a situation where the parent node is missing
(omitted verb won) while its dependents are still
present (Peter and bronze). In the Universal De-
pendencies annotation scheme (Nivre et al., 2016)
gapping constructions are analyzed by promoting
one of the orphaned dependents to the position
of its missing parent, and connecting all remain-
ing core arguments to that promoted one with the
orphan relation (see Figure 1). Therefore the de-
pendency parser must learn to predict relations be-
tween words that should not usually be connected.
Gapping has been studied extensively in theoreti-
cal works (Johnson, 2009, 2014; Lakoff and Ross,
1970; Sag, 1976). However, it has received almost no
attention in NLP, neither in parsing nor in corpus
creation. Among recent papers, Kummerfeld and Klein
(2017) proposed a one-endpoint-crossing graph parser
able to recover a range of null elements and trace types,
and Schuster et al. (2018) proposed two
methods to recover elided predicates in sentences
with gapping. The aforementioned lack of corpora
that pay attention to gapping, together with the
natural relative rarity of gapping, leads to its
underrepresentation in training corpora: they do not
provide enough examples for the parser to learn
gapping. Therefore we investigate methods of en-
riching the training data with new material from
large raw corpora.
The present work consists of two parts. In the
first part, we experiment on enriching data in gen-
eral, without a specific focus on gapping construc-
tions. This part builds upon self-training and tri-
training related work known from the literature,
but also develops and tests a stratified approach for
selecting a structurally balanced subcorpus. In the
second part, we focus on elliptical sentences, com-
paring general enrichment of training data with en-
richment using elliptical sentences artificially con-
structed by removal of a coordinated element.
(a) Marie won gold and Peter won bronze
    [dependency relations: nsubj, obj, conj, cc, nsubj, obj]
(b) Marie won gold and Peter bronze
    [dependency relations: nsubj, obj, conj, cc, orphan]
Figure 1: UD representation of a sentence with repeated verb (a), and with an omitted verb in a gapping construction (b).
2 Data
2.1 Languages and treebanks
For the parsing experiments we selected five treebanks
from the Universal Dependencies (UD) collection
(Nivre et al., 2016). We experiment with
the following treebanks: UD Czech, UD English,
UD Finnish, UD Russian-SynTagRus, and
UD Slovak. With the exception of UD Russian-
SynTagRus, all our experiments are based on
UD release 2.0. This UD release was used in the
CoNLL-17 Shared Task on Multilingual Parsing
from Raw Text to Universal Dependencies (Zeman
et al., 2017), giving us a point of comparison to
the state-of-the-art. For UD Russian-SynTagRus,
we use UD release 2.1, which has a considerably
improved annotation of elliptic sentences. For
English, which has only a few elliptical sentences
in the original treebank, we also utilize in testing
a set of elliptical sentences gathered by Schuster
et al. (2018).
This selection of data strives to maximize the
number of elliptical constructions present in the
treebanks (Droganova and Zeman, 2017), while
also covering different modern languages and
providing variation. Decisions are based on the work
by Droganova and Zeman (2017), who collected
statistics on elliptical constructions that are explicitly
marked with the orphan relation within the UD
treebanks. The relatively high number of elliptical
constructions within the chosen treebanks is a property
of the treebanks rather than the languages.
2.2 Additional material
Automatic parses As an additional data source
in our parsing experiments, we use the multilin-
gual raw text collection by Ginter et al. (2017).
This collection includes web crawl data for 45
languages automatically parsed using the UDPipe
parser (Straka and Straková, 2017) trained on the
UD version 2.0 treebanks. For Russian, where we
use a newer version of the treebank, we reparsed the
raw data with a UDPipe model trained on the
corresponding treebank version to agree with the
treebank data in use.
As our goal is to use the web crawled data to
enrich the official training data in the parsing ex-
periments, we want to ensure the quality of the
automatically parsed data. To achieve this, we
apply a method that stands between the standard
self-training and tri-training techniques. In
self-training, the labeled training data ($L$) is iteratively
enriched with unlabeled data ($U$) automatically
labeled with the same learning system ($L = L + U_l$),
whereas in tri-training (Zhou and Li, 2005) there
are three different learning systems, $A$, $B$ and $C$,
and the labeled data for the system $A$ is enriched
with instances from $U$ on which the two other
systems agree, therefore $L_a = L + (U_b \cap U_c)$.
Different variations of these methods have been
successfully applied in dependency parsing, for example
(McClosky et al., 2006; Søgaard and Rishøj, 2010;
Li et al., 2014; Weiss et al., 2015). In this work we
use two parsers ($A$ and $B$) to process the unlabeled
crawl data, and then the sentences where these two
parsers fully agree are used to enrich the training
data for the system $A$, i.e. $L_a = L + (U_a \cap U_b)$.
Therefore the method can be seen as a form of
expanded self-training or limited tri-training. A
similar technique is successfully used for example by
Sagae and Tsujii (2007) in parser domain adaptation
and Björkelund et al. (2014) in general parsing.
The main parser, used both in the final experiments
and for labeling the crawl data,
is the neural graph-based Stanford parser (Dozat
et al., 2017), the winning and state-of-the-art system
from the CoNLL-17 Shared Task (Zeman
et al., 2017). The secondary parser for labeling
the crawl data is UDPipe, a neural transition-based
parser, as these parses are already provided to-
gether with the crawl data. Both of these parsers
include their own part-of-speech tagger, which is
trained together (but not jointly) with the depen-
dency parser in all our experiments. In the fi-
nal self-training web crawl datasets we then keep
only deduplicated sentences with identical part-
of-speech and dependency analyses. All results
reported in this paper are measured on gold to-
kenization, and the parser hyperparameters are
those used for these systems in the CoNLL-17
Shared Task.
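As a rough illustration of the agreement-based selection described above, the following Python sketch keeps only deduplicated sentences on which two parsers produce identical part-of-speech and dependency analyses. The token representation (form, UPOS, head, relation) and the name agreement_filter are assumptions made for this example and are not taken from the actual pipeline.

```python
from typing import Iterable, List, Tuple

# One token analysis: (form, upos, head, deprel).
# This field choice is an assumption made for the sketch.
Token = Tuple[str, str, int, str]
Sentence = List[Token]


def agreement_filter(parses_a: Iterable[Sentence],
                     parses_b: Iterable[Sentence]) -> List[Sentence]:
    """Keep sentences on which both parsers produce identical analyses.

    The two inputs are assumed to be aligned: the i-th item of each is the
    analysis of the same raw sentence by parser A and parser B.
    """
    kept: List[Sentence] = []
    seen = set()
    for sent_a, sent_b in zip(parses_a, parses_b):
        if sent_a != sent_b:
            continue  # the parsers disagree on some tag, head, or label
        key = tuple(tok[0] for tok in sent_a)  # deduplicate on word forms
        if key in seen:
            continue
        seen.add(key)
        kept.append(sent_a)
    return kept
```

In terms of the notation above, the returned list plays the role of $U_a \cap U_b$, which is then added to the labeled data $L$.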
Artificial treebanks on elliptical constructions
For experiments focused specifically on elliptical
constructions, we additionally include data from
the semi-automatically constructed artificial tree-
banks by Droganova et al. (2018). These treebanks
simulate gapping by removing words in particular
coordination constructions, providing data for ex-
perimenting with the otherwise very rare construc-
tion. For English and Finnish the given datasets
are manually curated for grammaticality and flu-
ency, whereas for Czech the quality relies on the
rules developed for the process. For Russian and
Slovak, which are not part of the original artifi-
cial treebank release, we create automatically con-
structed artificial datasets by running the pipeline
developed for the Czech language. The size of the
artificial data is shown in Table 1.
Language   Tokens   Sentences
Czech      50K      2876
English    7.3K     421
Finnish    13K      1000
Russian    87K      5000
Slovak     7.1K     564
Table 1: The size of the artificial data
3 Experiments
First, we set out to evaluate the overall quality of
the trees in the raw enrichment dataset produced
by our self-training variant by parsing and filtering
web crawl data. In our baseline experiments we
train parsers (Dozat et al., 2017) using purely the
new self-training data. From the full self-training
dataset we sample datasets comparable to the sizes
of the original treebanks to train parsers. These
parsers are then evaluated using the original test
set of the corresponding treebank. This gives us
an overall estimate of the self-training data quality
compared to the original treebanks.
3.1 Tree sampling
Predictably, our automatically selected self-
training data is biased towards short, simple sen-
tences where the parsers are more likely to agree.
Long sentences are in turn often composed of sim-
ple coordinated item lists. To rectify this bias, we
employ a sampling method which aims to more
closely follow the distribution of the original tree-
bank compared to randomly sampling sentences
from the full self-training data. We base the sam-
pling on two features of every tree: the number of
tokens, and the number of unique dependency re-
lation types divided by the number of tokens. The
latter accounts for tree complexity, as it penalizes
trees where the same relation type is repeated too
many times, and it specifically allows us to down-
sample the long coordinated item lists where the
ratio drops much lower than average. We of course
take into account that a relation type can naturally
occur more than once in a sentence, and that it is
not ideal to force the ratio close to 1.0. However,
as the sampling method tries to mimic the distribution
from the original treebank, it should pick up
the correct variance while discarding the extremes.
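The two sampling features can be made concrete with a small Python sketch. It assumes that a tree is given simply as the list of its dependency relation labels, one per token; the function name tree_features is hypothetical.

```python
from typing import List, Tuple


def tree_features(deprels: List[str]) -> Tuple[int, float]:
    """Return (number of tokens, unique relation types / number of tokens)."""
    if not deprels:
        raise ValueError("a tree must contain at least one token")
    n_tokens = len(deprels)
    complexity = len(set(deprels)) / n_tokens
    return n_tokens, complexity


# A gapping sentence: every relation type occurs once, so the ratio is 1.0.
gapping = ["nsubj", "root", "obj", "cc", "conj", "orphan"]

# A coordinated item list: repeated conj and punct labels lower the ratio.
item_list = ["root", "punct", "conj", "punct", "conj", "cc", "conj"]

features_gapping = tree_features(gapping)
features_list = tree_features(item_list)
```

The item list has seven tokens but only four distinct relation types, so its ratio drops to 4/7, which is the kind of tree the sampling is meant to downsample.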
The sampling procedure proceeds as follows:
First, we divide the space of the two features,
length and complexity, into buckets and estimate
from the treebank training data the target distri-
bution, and the expected number of trees to be
sampled in each bucket. Then we select from the
full self-training dataset the appropriate number
of trees into each bucket. Since the web crawl
data is heavily skewed, it is not possible to ob-
tain a sufficient number of sampled trees in the ex-
act desired distribution, because many rare length–
complexity combinations are heavily underrepre-
sented in the data. We therefore run the sampling
procedure in several iterations, until the desired
number of trees has been obtained. This results
in a distribution closer to, although not necessarily
fully matching, the original treebank.
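The procedure can be sketched in Python as follows. This is only one possible reading of the description: the bucket widths (len_width, ratio_width), the way the remaining shortfall is redistributed in each iteration, and all function names are assumptions of the sketch, since the exact settings are not specified here. Trees are again represented as lists of relation labels.

```python
import math
import random
from collections import Counter, defaultdict


def bucket(deprels, len_width=5, ratio_width=0.1):
    """Map a tree to a (length bucket, complexity bucket) pair."""
    n = len(deprels)
    ratio = len(set(deprels)) / n
    return (n // len_width, int(ratio / ratio_width))


def stratified_sample(treebank, pool, n_trees, seed=0, max_rounds=10):
    """Sample n_trees from pool, following the bucket distribution of treebank."""
    rng = random.Random(seed)
    target = Counter(bucket(t) for t in treebank)
    total = sum(target.values())

    by_bucket = defaultdict(list)
    for tree in pool:
        by_bucket[bucket(tree)].append(tree)
    for trees in by_bucket.values():
        rng.shuffle(trees)

    sample = []
    for _ in range(max_rounds):
        missing = n_trees - len(sample)
        if missing <= 0:
            break
        before = len(sample)
        for b, count in target.items():
            # Share of the remaining shortfall that this bucket should fill.
            quota = math.ceil(missing * count / total)
            take = min(quota, len(by_bucket[b]), n_trees - len(sample))
            for _ in range(take):
                sample.append(by_bucket[b].pop())
        if len(sample) == before:
            break  # the pool has no trees left in any target bucket
    return sample
```

When some buckets run out of trees, later rounds fill the shortfall from the buckets that still have material, which is why the result approaches but does not necessarily match the treebank distribution.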
To evaluate the impact of this sampling procedure,
which we refer to as Identical, we compare it to two
baselines. RandomS randomly selects the exact same
number of sentences as the Identical sampling
procedure. This results in a dataset which is
considerably smaller in terms of tokens, because the
web crawl data (on which the two parsers agree) is
heavily biased towards short trees. To make sure
our evaluation is not affected by simply using less
data in terms of tokens, we also provide the Ran-
domT baseline, where trees are randomly selected
until the same number of tokens is reached as in
the Identical sample. Here we are able to evaluate
the quality of the sampled data, not its bulk.
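The two baselines differ only in their stopping criterion, as the following Python sketch shows. The function names random_s and random_t, and the representation of a tree as a list with one item per token, are assumptions made for illustration.

```python
import random


def random_s(pool, n_sentences, seed=0):
    """RandomS: draw a fixed number of sentences uniformly at random."""
    return random.Random(seed).sample(pool, n_sentences)


def random_t(pool, n_tokens, seed=0):
    """RandomT: draw random sentences until a token budget is reached."""
    order = list(pool)
    random.Random(seed).shuffle(order)
    picked, total = [], 0
    for tree in order:
        if total >= n_tokens:
            break
        picked.append(tree)
        total += len(tree)  # one list item per token
    return picked
```

Because the filtered web crawl data is dominated by short trees, random_t needs more sentences than random_s to reach the same token count as the Identical sample.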
In Table 2 we see that, as expected, when sampling
the same number of sentences as in the training
section of the original treebank, the RandomS
sampling produces datasets considerably smaller
in terms of tokens, whereas RandomT results in
datasets considerably larger in terms of trees when
the same number of tokens as in the Identical
dataset is sampled. This confirms the assumption
that parsers tend to agree on shorter sentences in
the web crawl data, introducing the bias towards
them. On the other hand, when the same number
of sentences is selected as in the RandomS sampling
and the original treebank, the Identical sampling
strategy results in a dataset much closer to the
original treebank in terms of tokens.
Language   RandomT     RandomS    Identical   TB
Czech      102K/982K   68K/611K   68K/982K    68K/1175K
English    18K/183K    13K/102K   13K/183K    13K/205K
Finnish    17K/144K    12K/92K    12K/144K    12K/163K
Russian    73K/694K    49K/431K   49K/694K    49K/871K
Slovak     11K/83K     8K/58K     8K/83K      8K/81K
Table 2: Training data sizes after each sampling strategy compared to the original treebank training section (TB), sentences/tokens.
Parsing results for the different sampling strate-
gies are shown in Table 3. Except for Slovak, the
results follow an intuitively expected pattern: the
sample with the fewest tokens results in the worst
score, and of the two samples with the same num-
ber of tokens, the one which follows the treebank
distribution receives the better score. Surprisingly,
for Slovak the sampling strategy which mimics
the treebank distribution receives a score almost
3pp lower than the one with random sampling of
the same amount of tokens. A possible explana-
tion is given in the description of the Slovak tree-
bank which mentions that it consists of sentences
on which two annotators agreed, and is biased to-
wards short and simple sentences. The data is
thus not representative of the language use, pos-
sibly causing the effect. Lacking a better explana-
tion for the time being, we also add the RandomT
sampling dataset into our experiments for Slovak.
Overall, the parsing results on the automatically
selected data are surprisingly good, lagging only
several percentage points behind parsers trained on
the manually annotated treebanks.
Language   RandomT   RandomS   Identical   TB
Czech      88.50%    88.18%    88.77%      91.20%
English    83.67%    82.86%    84.18%      86.94%
Finnish    82.67%    80.69%    83.01%      87.89%
Russian    91.28%    90.85%    91.49%      93.35%
Slovak     85.02%    83.67%    82.35%      86.04%
Table 3: Results of the baseline parsing experiments, using only automatically collected data, reported in terms of LAS%. RandomT: random sample, same number of tokens as in the Identical samples; RandomS: random sample, same number of sentences as in the original treebanks; Identical: identical sample, imitates the distribution of trees in the original treebanks. For comparison, the TB column shows the LAS of a parser trained on the original treebank training data.
3.2 Enrichment
In this section, we test the overall suitability of
the sampled trees as additional data for parsing.
We produce training data composed of the
original treebank training section, and a progressively
increasing number of sampled trees: 20%,
100%, and 200% (relative to the treebank training
data size, i.e. the +100% sample doubles the total
amount of training data). The parsing results
are shown in Table 4. Positively, for all languages
except Czech, we can improve the overall parsing
accuracy, for Slovak by as much as 2.7pp
(with the RandomT sample),
which is a rather non-trivial improvement. In general,
the smaller the treebank, the larger the benefit.
With the exception of Slovak, the improvements
are relatively modest, in the less than
half-a-percent range. Nevertheless, since our baseline
is the winning parser of the CoNLL-17 Shared
Task, these constitute improvements over the current
state-of-the-art. Based on these experiments,
we can conclude that self-training data extracted
from web crawl seem to be suitable material for
enriching the training data for parsing, and in the next
section we continue to test whether the same data
and methods can be used to increase occurrences
of a rare linguistic construction to make it more
learnable for parsers.
Language   TB       +20%     +100%    +200%
Czech      91.20%   91.13%   90.98%   90.72%
English    86.94%   87.32%   87.43%   87.29%
Finnish    87.89%   87.83%   88.24%   88.32%
Russian    93.35%   93.38%   93.22%   93.08%
Slovak     86.04%   87.89%   88.36%   88.36%
Slovak T   86.04%   88.14%   88.57%   88.77%
Table 4: Enriching treebank data with identical sample from automatic data, LAS%. TB: original treebank (baseline experiment; the scores are better than reported in the CoNLL-17 Shared Task because we evaluate on gold segmentation while the shared task systems are evaluated on predicted segmentation); +20% – +200%: size of the identical sample used to enrich the treebank data (with respect to the original treebank size). Slovak T: enriching the Slovak treebank with the RandomT sample instead of the Identical one.
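The size of the gains discussed in this section can be checked directly against the LAS values reported in Table 4. The short Python sketch below only restates those reported numbers; the helper name best_gain is hypothetical.

```python
# LAS values copied from Table 4: (treebank baseline, [+20%, +100%, +200%]).
TABLE4 = {
    "Czech": (91.20, [91.13, 90.98, 90.72]),
    "English": (86.94, [87.32, 87.43, 87.29]),
    "Finnish": (87.89, [87.83, 88.24, 88.32]),
    "Russian": (93.35, [93.38, 93.22, 93.08]),
    "Slovak": (86.04, [87.89, 88.36, 88.36]),
    "Slovak T": (86.04, [88.14, 88.57, 88.77]),
}


def best_gain(baseline, enriched):
    """Largest LAS difference (in percentage points) over the baseline."""
    return round(max(enriched) - baseline, 2)


gains = {lang: best_gain(tb, runs) for lang, (tb, runs) in TABLE4.items()}
```

Under this computation Czech is the only language whose best enriched run stays below the baseline, and the best gains for English, Finnish and Russian all stay below half a percentage point.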
3.3 Ellipsis
Our special focus is the parsing of elliptical
constructions. We therefore test whether increasing
the number of elliptical sentences in the training
data improves the parsing accuracy of these
constructions, without sacrificing the overall parsing
accuracy. We follow the same data enrichment
methods as used above for general enrichment and
proceed to select elliptical sentences (recognized
through the orphan relation) from the same self-
training data automatically produced from web
crawl (Section 2.2). We then train parsers using
a combination of the ellipsis subset and the orig-
inal training section for each language. We en-
rich Czech, Russian and Slovak training data with
elliptical sentences, progressively increasing their
size by 5%, 10% and 15%. For Finnish, only enough
elliptical sentences for the 5% setting were available
in the filtered web crawl data, and for English not a
single sentence.
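Selecting elliptical sentences by the orphan relation is straightforward when the parsed data is stored in the CoNLL-U format, where the dependency relation is the eighth tab-separated column. The Python sketch below assumes such input with blank lines between sentences; the function name sentences_with_orphan is hypothetical.

```python
from typing import List


def sentences_with_orphan(conllu_text: str) -> List[str]:
    """Return the CoNLL-U sentence blocks that contain an orphan relation."""
    selected = []
    for block in conllu_text.strip().split("\n\n"):
        for line in block.splitlines():
            if line.startswith("#"):
                continue  # sentence-level comment line
            cols = line.split("\t")
            # DEPREL is the 8th column; a subtype would look like "orphan:xyz".
            if len(cols) == 10 and cols[7].split(":")[0] == "orphan":
                selected.append(block)
                break
    return selected
```

Applied to the agreement-filtered web crawl data, such a filter yields the ellipsis subset that is then combined with the original training section.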
The experiments showed mixed results (Ta-
ble 5). For Russian and Slovak the accuracy of
the dependencies involved in gapping is improved
by web crawl enrichment, whereas the results
for Czech remained largely the same and Finnish
slightly decreased (column Web crawl). Unfortu-
nately, for Slovak and Finnish, we cannot draw
firm conclusions due to the small number of or-
phan relations in the test set. For English, even the
treebank results are very low: the parser predicts
only very few orphan relations (recall 1.71%) and
the web crawl data contains no orphans on which
the two parsers could agree, thus making it impos-
sible to enrich the data using this method. Clearly,
English requires a different strategy, and we will
return to it shortly. Positively, none of the lan-
guages substantially suffered in terms of overall
LAS when adding extra elliptical sentences into
the training data. For Slovak, we can even see a
significant improvement in overall parsing accuracy,
in line with the experiments in Section 3.2.
Increasing the proportion of orphan sentences in
the training data has the predictable effect of in-
creasing the orphan F-score and decreasing the
overall LAS of the parser. These differences are
nevertheless only very minor and can only be
observed for Czech and Russian, which have a
sufficient number of orphan relation examples in the
test set. For Slovak, with 18 examples, we can-
not draw any conclusions, and for English and
Finnish, there is not a sufficient number of orphan
examples in the filtered web crawl data to allow us
to vary the proportion.
For all languages, we also experiment with the
artificial elliptic sentence dataset of Droganova et
al. (2018), described earlier in Section 2.2. For
Czech, English and Finnish, the dataset contains
semi-automatically produced, and in the case of
English and Finnish, also manually validated
instances of elliptic sentences. For Slovak and Russian,
we replicate the procedure of Droganova et
al., sans the manual validation, obtaining artificial
orphan datasets for all the five languages under
study. Subsequently, we train parsers using a
combination of sentences from the artificial treebank
and the original training set. The results of
these experiments are in Table 5, column Artificial.
Compared to web crawl, the artificial data results
in a lower performance on orphans for Czech, Slovak
and Russian, and higher for Finnish, but once
again keeping in mind the small size of the Finnish
and Slovak test sets, it is difficult to come to a firm
conclusion. Clearly, though, the web crawl data
does not perform substantially worse than the ar-
tificial data, even though it is gathered fully au-
tomatically. A very substantial improvement is
achieved on English, where the web crawl data
fails to deliver even a single orphan example,
whereas the artificial data gains recall of 9.62%.
This offers us an opportunity to once again try
to obtain orphan examples for English from the
web crawl data. This time we can train the
parsers on the combination of the original treebank
and the artificial data, hopefully resulting in
parsers which are in fact able to predict at least
some orphan relations, which in turn can result in
new elliptic sentences from the web crawl data. As
seen from Table 5, the artificial data increases the
orphan F-score from 3.36% to 17.18% relative to
training only on the treebank, and we are therefore
able to obtain a parser which is, at least in order
of magnitude, comparable to the other four
languages in parsing accuracy of elliptic constructions.
We observe no loss in terms of the overall
LAS, demonstrating that it is in fact possible
to achieve a substantial improvement in parsing of
a rare, non-trivial construction without sacrificing
the overall performance.
Language  All    Training data    LAS     O Prec   O Rec   O F
Czech     418    Treebank         91.20%  54.84%   56.94%  55.87%
                 Web crawl +5%    91.22%  48.96%   61.72%  54.60%
                 Web crawl +10%   91.11%  49.80%   60.05%  54.45%
                 Web crawl +15%   91.06%  50.19%   62.68%  55.74%
                 Artificial       91.15%  51.79%   58.85%  55.10%
English   2+466  Treebank         86.94%  100.00%  1.71%   3.36%
                 Web crawl        n/a     n/a      n/a     n/a
                 Artificial       86.95%  80.36%   9.62%   17.18%
Finnish   43     Treebank         87.89%  66.67%   32.56%  43.75%
                 Web crawl +5%    87.76%  48.15%   30.23%  37.14%
                 Artificial       88.04%  54.76%   53.49%  54.12%
Russian   138    Treebank         93.35%  44.57%   29.71%  35.65%
                 Web crawl +5%    93.50%  42.86%   39.13%  40.91%
                 Web crawl +10%   93.41%  38.26%   41.30%  39.72%
                 Web crawl +15%   93.42%  40.69%   42.75%  41.70%
                 Artificial       93.20%  33.14%   40.58%  36.48%
Slovak    18     Treebank         86.04%  60.00%   16.67%  26.09%
                 Web crawl +5%    87.90%  36.36%   22.22%  27.59%
                 Web crawl +10%   87.76%  33.33%   16.67%  22.22%
                 Web crawl +15%   87.80%  30.77%   22.22%  25.81%
                 Artificial       87.80%  37.50%   16.67%  23.08%
Table 5: Enriching treebank data with elliptical sentences. All: number of orphan labels in the test data; Treebank: original treebank (baseline experiment); Web crawl: enriching the original treebank with the elliptical sentences extracted from the automatically parsed web crawl data (+5%, +10%, +15%); Artificial: enriching the original treebank with the artificial ellipsis treebank; LAS, %: overall parsing accuracy; O Prec (orphan precision): number of correct orphan nodes divided by the number of all predicted orphan nodes; O Rec (orphan recall): number of correct orphan nodes divided by the number of gold-standard orphan nodes; O F (orphan F-score): F-measure restricted to the nodes that are labeled as orphan: 2PR / (P+R). For English, the orphan P/R/F scores are evaluated on a dataset of the two orphan relations in the original test section, combined with 466 English elliptic sentences of Schuster et al. (2018). The extra sentences are not used in the LAS column, so as to preserve comparability of overall LAS scores across the various runs.
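The orphan-restricted scores defined in the caption of Table 5 can be sketched in Python as follows. The sketch assumes that each token is represented by a (head, relation) pair and that a predicted orphan node counts as correct only if both its head and its label match the gold standard; the caption does not spell out this criterion, so it is an assumption, as are the function names.

```python
from typing import List, Tuple

Arc = Tuple[int, str]  # (head index, relation label) for one token


def f_score(precision: float, recall: float) -> float:
    """F-measure 2PR / (P + R), defined as 0 when both inputs are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def orphan_prf(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float, float]:
    """Precision, recall and F-score restricted to nodes labeled orphan."""
    n_gold = sum(1 for _, rel in gold if rel == "orphan")
    n_pred = sum(1 for _, rel in pred if rel == "orphan")
    # Assumption: correct means that head and label both match the gold arc.
    correct = sum(1 for g, p in zip(gold, pred) if p[1] == "orphan" and g == p)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    return precision, recall, f_score(precision, recall)
```

As a consistency check, calling f_score with the English treebank precision and recall from Table 5 (1.0 and 0.0171) reproduces the reported orphan F-score of 3.36% after rounding.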
Using the web data self-training filtering pro-
cedure with two parsers trained on the tree-
bank+artificial data, we can now repeat the exper-
iment with enriching parser training data with or-
phan relations, results of which are shown in Ta-
ble 6. We test the following models:
• original UD English v.2.0 treebank;
• original UD English v.2.0 treebank com-
bined with the artificial sentences;
• original UD English v.2.0 treebank com-
bined with the artificial sentences and web
crawl dataset; size progressively increased by
5%, 10% and 15%. Here we use the original
UD English v.2.0 treebank extended with the
artificial sentences to train the models (Sec-
tion 2.2) that produce the web crawl data for
English.
The best orphan F-score of 36%, more than ten
times higher compared to using the original treebank,
is obtained by enriching the training data
with 15% elliptic sentences from the artificial and
filtered web data. The orphan F-score of 36% is
on par with the other languages and, positively, the
overall LAS of the parser remains essentially
unchanged: the parser does not sacrifice anything
in order to gain the improvement on orphan relations.
These English results therefore not only
explore the influence of the number of elliptical
sentences on the parsing accuracy, but also test
a method applicable in the case where the treebank
contains almost no elliptical constructions
and results in parsers that only generate
the relation very rarely.
Model          LAS      O Precision   O Recall   O F-score
Treebank       86.94%   100%          1.71%      3.36%
Artificial     86.95%   80.36%        9.62%      17.18%
Art.+Web 5%    86.72%   86.11%        19.87%     32.29%
Art.+Web 10%   86.68%   78.36%        22.44%     34.88%
Art.+Web 15%   87.07%   84.38%        23.08%     36.24%
Table 6: Enriching the English treebank data with elliptical sentences. LAS, %: overall parsing accuracy; O Precision (orphan precision): number of correct orphan labels divided by the number of all predicted orphan nodes; O Recall (orphan recall): number of correct orphan labels divided by the number of gold-standard orphan nodes; O F-score (orphan F-score): F-measure restricted to the nodes that are labeled as orphan: 2PR / (P+R). For English, the orphan P/R/F scores are evaluated on a dataset of the two orphan relations in the original test set, combined with 466 English elliptic sentences of Schuster et al. (2018). The extra sentences are not used in the LAS column, so as to preserve comparability of overall LAS scores across the various runs. This is necessary since elliptic sentences are typically syntactically more complex and would therefore skew overall parser performance evaluation.
4 Conclusions
We have explored several methods of enrich-
ing training data for dependency parsers, with a
specific focus on rare phenomena such as ellip-
sis (gapping). This focused enrichment leads to
mixed results. On one hand, for several languages
we did not obtain a significant improvement of the
parsing accuracy of ellipsis, possibly in part owing
to the small number of testing examples. On the
other hand, though, we have demonstrated that for
English ellipsis parsing accuracy can be improved
from single digit numbers to performance on par
with the other languages. We have also validated
the method of constructing artificial elliptical
examples as a means to enrich parser training data.
Additionally, we have shown that useful training
data can be obtained using web crawl data and
a self-training or tri-training style method, even
though the two parsers in question differ substan-
tially in their overall performance.
Finally, we have shown that this parser train-
ing data enrichment can lead to improvements of
general parser accuracy, improving upon the state
of the art for all but one language. The improvement
was especially notable for Slovak. Czech
was the only treebank not benefiting from this
additional data, likely owing to the fact that it is an
already very large and homogeneous treebank. As
part of these experiments, we have introduced and
demonstrated the effectiveness of a stratified sam-
pling method which corrects for the skewed dis-
tribution of sentences selected in the web filtering
experiments.
Acknowledgments
The work was partially supported by the grant
15-10472S of the Czech Science Foundation
(GAČR), the GA UK grant 794417, Academy of
Finland, and Nokia Foundation. Computational
resources were provided by CSC - IT Center for
Science, Finland.
References
Anders Björkelund, Özlem Çetinoğlu, Agnieszka
Faleńska, Richárd Farkas, Thomas Müller, Wolfgang
Seeker, and Zsolt Szántó. 2014. Self-training
for Swedish Dependency Parsing – Initial Results
and Analysis. In Proceedings of the Fifth Swedish
Language Technology Conference (SLTC 2014).
Elizabeth Coppock. 2001. Gapping: In defense of
deletion. In Proceedings of the Chicago Linguistics
Society, volume 37, pages 133–148.
Timothy Dozat, Peng Qi, and Christopher D Manning.
2017. Stanford’s Graph-based Neural Dependency
Parser at the CoNLL 2017 Shared Task. Proceed-
ings of the CoNLL 2017 Shared Task: Multilingual
Parsing from Raw Text to Universal Dependencies,
pages 20–30.
Kira Droganova and Daniel Zeman. 2017. Elliptic
Constructions: Spotting Patterns in UD Treebanks.
In Proceedings of the NoDaLiDa 2017 Workshop on
Universal Dependencies (UDW 2017), 135, pages
48–57.
Kira Droganova, Daniel Zeman, Jenna Kanerva, and
Filip Ginter. 2018. Parse Me if You Can: Artifi-
cial Treebanks for Parsing Experiments on Ellipti-
cal Constructions. Proceedings of the 11th Interna-
tional Conference on Language Resources and Eval-
uation (LREC 2018).
Filip Ginter, Jan Hajič, Juhani Luotolahti, Milan
Straka, and Daniel Zeman. 2017. CoNLL 2017
Shared Task - Automatically Annotated Raw Texts
and Word Embeddings. LINDAT/CLARIN digi-
tal library at the Institute of Formal and Applied
Linguistics (ÚFAL), Faculty of Mathematics and
Physics, Charles University.
Kyle Johnson. 2009. Gapping is not (VP) ellipsis. Lin-
guistic Inquiry, 40(2):289–328.
Kyle Johnson. 2014. Gapping.
Jonathan K Kummerfeld and Dan Klein. 2017. Parsing
with Traces: An O(n^4) Algorithm and a Structural
Representation. arXiv preprint arXiv:1707.04221.
George Lakoff and John Robert Ross. 1970. Gapping
and the order of constituents. Progress in linguis-
tics: A collection of papers, 43:249.
Zhenghua Li, Min Zhang, and Wenliang Chen.
2014. Ambiguity-aware ensemble training for semi-
supervised dependency parsing. In Proceedings of
the 52nd Annual Meeting of the Association for
Computational Linguistics (Volume 1: Long Pa-
pers), volume 1, pages 457–467.
David McClosky, Eugene Charniak, and Mark John-
son. 2006. Effective self-training for parsing. In
Proceedings of the main conference on human lan-
guage technology conference of the North American
Chapter of the Association of Computational Lin-
guistics, pages 152–159. Association for Computa-
tional Linguistics.
Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter,
Yoav Goldberg, Jan Hajič, Christopher Manning,
Ryan McDonald, Slav Petrov, Sampo Pyysalo,
Natalia Silveira, Reut Tsarfaty, and Daniel Zeman.
2016. Universal Dependencies v1: A Multilingual
Treebank Collection. In Proceedings of the 10th In-
ternational Conference on Language Resources and
Evaluation (LREC 2016), pages 1659–1666, Portorož,
Slovenia.
Ivan Sag. 1976. Deletion and Logical Form. MIT.
PhD dissertation.
Kenji Sagae and Junichi Tsujii. 2007. Dependency
parsing and domain adaptation with LR models and
parser ensembles. In Proceedings of the 2007 Joint
Conference on Empirical Methods in Natural Lan-
guage Processing and Computational Natural Lan-
guage Learning (EMNLP-CoNLL).
Sebastian Schuster, Joakim Nivre, and Christopher D.
Manning. 2018. Sentences with Gapping: Parsing
and Reconstructing Elided Predicates. In Proceed-
ings of the 16th Annual Conference of the North
American Chapter of the Association for Computa-
tional Linguistics: Human Language Technologies
(NAACL 2018).
Anders Søgaard and Christian Rishøj. 2010. Semi-
supervised dependency parsing using generalized
tri-training. In Proceedings of the 23rd Interna-
tional Conference on Computational Linguistics,
pages 1065–1073. Association for Computational
Linguistics.
Milan Straka and Jana Straková. 2017. Tokenizing,
POS Tagging, Lemmatizing and Parsing UD 2.0
with UDPipe. In Proceedings of the CoNLL 2017
Shared Task: Multilingual Parsing from Raw Text to
Universal Dependencies, pages 88–99, Vancouver,
Canada. Association for Computational Linguistics.
David Weiss, Chris Alberti, Michael Collins, and Slav
Petrov. 2015. Structured Training for Neural Net-
work Transition-Based Parsing. In Proceedings of
ACL 2015, pages 323–333.
Daniel Zeman, Martin Popel, Milan Straka, Jan
Hajič, Joakim Nivre, Filip Ginter, Juhani Luotolahti,
Sampo Pyysalo, Slav Petrov, Martin Potthast,
Francis Tyers, Elena Badmaeva, Memduh
Gökırmak, Anna Nedoluzhko, Silvie Cinková, Jan
Hajič jr., Jaroslava Hlaváčová, Václava Kettnerová,
Zdeňka Urešová, Jenna Kanerva, Stina Ojala, Anna
Missilä, Christopher Manning, Sebastian Schuster,
Siva Reddy, Dima Taji, Nizar Habash, Herman Leung,
Marie-Catherine de Marneffe, Manuela Sanguinetti,
Maria Simi, Hiroshi Kanayama, Valeria
de Paiva, Kira Droganova, Héctor Martínez Alonso,
Çağrı Çöltekin, Umut Sulubacak, Hans Uszkoreit,
Vivien Macketanz, Aljoscha Burchardt, Kim
Harris, Katrin Marheinecke, Georg Rehm, Tolga
Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran
Yu, Emily Pitler, Saran Lertpradit, Michael Mandl,
Jesse Kirchner, Hector Fernandez Alcalde, Jana
Strnadová, Esha Banerjee, Ruli Manurung, Antonio
Stella, Atsuko Shimada, Sookyoung Kwak, Gustavo
Mendonça, Tatiana Lando, Rattima Nitisaroj, and
Josie Li. 2017. CoNLL 2017 shared task: Mul-
tilingual parsing from raw text to universal depen-
dencies. In Proceedings of the CoNLL 2017 Shared
Task: Multilingual Parsing from Raw Text to Uni-
versal Dependencies, pages 1–19, Stroudsburg, PA,
USA. Charles University, Association for Computa-
tional Linguistics.
Zhi-Hua Zhou and Ming Li. 2005. Tri-training: Ex-
ploiting unlabeled data using three classifiers. IEEE
Transactions on Knowledge and Data Engineering,
17(11):1529–1541.