- In SignBank, there are roughly 220k parallel samples from 141 puddles covering 76 language pairs, yet the distribution is unbalanced.
- Notably, most of the puddles are dictionaries, which we consider less valuable than sentence pairs (instances of continuous signing) for a general MT system. If dictionaries are used as training data, we expect models to memorize word mappings rather than learn to generate sentences.
- Therefore, we treat the four sentence-pair puddles of the relatively high-resource language pairs as primary data and the remaining dictionary puddles as auxiliary data. Note that even the high-resource pairs of SignBank are low-resource compared to the datasets used in mature MT systems for spoken languages, where millions of parallel sentences are commonplace.
- en-us (American English & American Sign Language) has 43,698 pairs.
- For encoding (when FSW is the source), they embed each factor separately and then concatenate these embeddings to the aligned symbol's embedding. For decoding (when FSW is the target), they use only a subset of the factors (the absolute x and y positions), because the others are irrelevant for prediction.
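A minimal PyTorch sketch of this factored-embedding idea for the encoder side; the class name, vocabulary sizes, and dimensions are illustrative assumptions, not the actual implementation:

```python
import torch
import torch.nn as nn

class FactoredSymbolEmbedding(nn.Module):
    """Embed a SignWriting symbol and its factors, then concatenate."""

    def __init__(self, symbol_vocab, factor_vocabs, symbol_dim=256, factor_dim=16):
        super().__init__()
        self.symbol_emb = nn.Embedding(symbol_vocab, symbol_dim)
        # One embedding table per factor (e.g., the x and y positions).
        self.factor_embs = nn.ModuleList(
            nn.Embedding(vocab, factor_dim) for vocab in factor_vocabs
        )

    def forward(self, symbols, factors):
        # symbols: (batch, seq_len); factors: (batch, seq_len, num_factors)
        parts = [self.symbol_emb(symbols)]
        parts.extend(emb(factors[..., i]) for i, emb in enumerate(self.factor_embs))
        # Result: (batch, seq_len, symbol_dim + num_factors * factor_dim)
        return torch.cat(parts, dim=-1)
```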
It is not obvious how best to decode to FSW. We try the following strategies (see the sketch after this list):
- predicting everything (including positional numbers) as target tokens in one long sequence, inspired by Chen et al. (2022)
- predicting symbols only (as a comparative experiment)
- predicting symbols with positional numbers as target factors
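To make the three target formats concrete, here is a minimal sketch that assumes the standard FSW string layout (symbol keys of "S" plus five hex characters, each followed by an XxY coordinate); the regex and token layout are illustrative rather than the exact preprocessing:

```python
import re

# FSW symbol: "S" + 5 hex chars (base, fill, rotation), followed by "XxY".
FSW_SYMBOL = re.compile(r"(S[0-9a-f]{5})(\d{3})x(\d{3})")

def target_sequences(fsw: str):
    matches = FSW_SYMBOL.findall(fsw)
    symbols = [sym for sym, _, _ in matches]
    # Strategy 1: everything in one long sequence of target tokens.
    flat = [tok for sym, x, y in matches for tok in (sym, x, y)]
    # Strategy 2: symbols only; positions are dropped.
    # Strategy 3: symbols as main tokens, (x, y) as target factors.
    position_factors = [(x, y) for _, x, y in matches]
    return flat, symbols, position_factors

flat, syms, factors = target_sequences("M518x529S14c20481x471S27106503x489")
# flat    == ['S14c20', '481', '471', 'S27106', '503', '489']
# syms    == ['S14c20', 'S27106']
# factors == [('481', '471'), ('503', '489')]
```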