What’s behind the arcane research paper that Baidu helped spin into a machine translation breakth...
A story to think about: insignificant results hyped by an optimal industry lab's marketing machine gain too much attention.
Good article. The only part I took issue with was this:
>With that in mind, he noted that STACL’s lower BLEU scores raises deployment issues.
BLEU is not a good metric for drawing strong conclusions from. Human translators and interpreters regularly get quite low BLEU scores.
To elaborate, BLEU scores simply measure n-gram overlap (where n-grams are sequences of words of length n). You can translate something just fine and not use the exact words that someone else used.
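Here's a rough sketch of that point in Python. It only computes clipped n-gram precision against a single reference, which is one ingredient of BLEU; it is not full BLEU (no brevity penalty, no averaging over n-gram orders). The function names and the example sentences are made up for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams in a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of candidate against one reference.

    Whitespace tokenization is an assumption made for simplicity.
    """
    cand = Counter(ngrams(candidate.split(), n))
    ref = Counter(ngrams(reference.split(), n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it appears
    # in the reference.
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total


reference = "the cat sat on the mat"
exact = "the cat sat on the mat"
paraphrase = "a feline was sitting on the rug"  # same meaning, different words

for name, sentence in [("exact", exact), ("paraphrase", paraphrase)]:
    p1 = modified_precision(sentence, reference, 1)
    p2 = modified_precision(sentence, reference, 2)
    print(f"{name}: unigram={p1:.2f} bigram={p2:.2f}")
```

The paraphrase is a perfectly reasonable rendering of the reference, but it shares only "on" and "the" with it, so its overlap score comes out far below the exact copy's.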
I agree 👍 but BLEU has nevertheless become the metric of choice in NLP research on summarisation.