A story to think about - insignificant results hyped by an optimal industry lab's marketing machine gain too much attention.

    • Good article. The only part I took issue with was this:

      >With that in mind, he noted that STACL’s lower BLEU scores raises deployment issues.

      BLEU is not a good metric for saying much of anything. Human translators and interpreters regularly get quite low BLEU scores.

      • To elaborate, BLEU scores simply measure n-gram overlap with one or more reference translations (an n-gram is a sequence of n consecutive words). You can translate something just fine without using the exact words someone else used, and still get a low score.
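
To make the n-gram overlap point concrete, here is a rough sketch of the clipped n-gram precision that BLEU is built on. It is a simplification, not full BLEU: it leaves out the brevity penalty and the geometric mean over n = 1..4, and it tokenizes by whitespace only. The function names and the two example sentences are made up for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, reference, n):
    """Fraction of candidate n-grams that also occur in the reference.

    Each candidate n-gram is credited at most as many times as it
    appears in the reference (the "clipping" step).
    Simplified: one reference, whitespace tokenization, no brevity penalty.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


# Hypothetical example: two sentences with the same meaning, different wording.
reference = "the meeting was postponed until next week"
paraphrase = "they pushed the meeting back by a week"

if __name__ == "__main__":
    for n in (1, 2):
        score = clipped_precision(paraphrase, reference, n)
        print(f"{n}-gram precision: {score:.3f}")
```

The paraphrase shares only 3 of its 8 unigrams and 1 of its 7 bigrams with the reference, so it scores poorly even though a human reader would call it a perfectly acceptable translation.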