GPT-4 vs. Human Translators Cont’d
The already extensive literature comparing human translators with AI continues to grow. A recent study led by researchers from China-based Westlake University, University College London, the University of Cambridge, and the China-based language service provider LanBridge Group benchmarked GPT-4’s machine translation (MT) performance against human translators of varying expertise levels.
According to the researchers, GPT-4 delivers translation quality comparable to junior- and mid-level human translators but falls short when compared to senior professionals, “indicating machine translation is yet [sic] a solved problem.”
The evaluation covered three language pairs (English↔Chinese, English↔Russian, and Chinese↔Hindi) and three domains: News, Technology, and Biomedicine. The researchers asked junior, medium, and senior human translators, ranked by educational background, translation experience, and practical proficiency, to translate the source sentences into the target language; GPT-4 and SeamlessM4T translated the same sentences.
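To make the scale of this setup concrete, the sketch below enumerates the evaluation conditions as a grid. It assumes every language pair was evaluated in every domain by every translator tier and system, which the summary above does not state explicitly; the constant and function names are illustrative.

```python
from itertools import product

# Conditions named in the article; a full crossing of all three lists is an assumption.
LANGUAGE_PAIRS = ["English↔Chinese", "English↔Russian", "Chinese↔Hindi"]
DOMAINS = ["News", "Technology", "Biomedicine"]
TRANSLATORS = ["junior", "medium", "senior", "GPT-4", "SeamlessM4T"]


def evaluation_grid(pairs, domains, translators):
    """Return every (language pair, domain, translator) combination."""
    return list(product(pairs, domains, translators))


grid = evaluation_grid(LANGUAGE_PAIRS, DOMAINS, TRANSLATORS)
print(len(grid))  # 3 pairs x 3 domains x 5 translators = 45
```

Each tuple in the grid is one set of outputs that the annotators would need to assess, which is why human-graded studies of this kind tend to stay small in the number of directions they cover.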
Independent expert annotators then assessed the translations using the Multidimensional Quality Metrics (MQM) schema, which labels individual errors by category and severity, yielding a detailed analysis of translation accuracy and fluency.
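MQM annotations are typically turned into a score by weighting each error by its severity and summing per segment. The sketch below shows that aggregation; the severity weights, the "top/sub" category labels, and the function names are assumptions for illustration, not the values or tooling used in the study.

```python
from dataclasses import dataclass

# Illustrative severity weights (assumption; the study's weights may differ).
SEVERITY_WEIGHTS = {"minor": 1.0, "major": 5.0}


@dataclass
class MQMError:
    category: str  # hypothetical "top/sub" label, e.g. "accuracy/mistranslation"
    severity: str  # "minor" or "major"


def mqm_penalty(errors):
    """Severity-weighted penalty for one segment; 0 means no errors found."""
    return sum(SEVERITY_WEIGHTS[e.severity] for e in errors)


def average_penalty(segments):
    """Mean penalty per segment across a system's output; lower is better."""
    if not segments:
        raise ValueError("need at least one segment")
    return sum(mqm_penalty(errs) for errs in segments) / len(segments)


def penalty_by_category(segments):
    """Total weighted penalty per top-level category, e.g. accuracy vs. fluency."""
    totals = {}
    for errs in segments:
        for e in errs:
            top = e.category.split("/")[0]
            totals[top] = totals.get(top, 0.0) + SEVERITY_WEIGHTS[e.severity]
    return totals


# Toy annotations for three segments (not data from the study).
toy = [
    [MQMError("accuracy/mistranslation", "major")],
    [],
    [MQMError("fluency/grammar", "minor"), MQMError("fluency/grammar", "minor")],
]
print(round(average_penalty(toy), 2))  # 2.33
print(penalty_by_category(toy))  # {'accuracy': 5.0, 'fluency': 2.0}
```

Splitting penalties by top-level category is what lets an analysis say, for example, that a system is strong on accuracy but weaker on fluency, which is the kind of contrast the study reports for GPT-4.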
The researchers noted that their goal was to better understand large language model (LLM) translators by integrating them into the translation industry, so that professionals could evaluate their quality against human translators at various levels. This approach offered deeper insight into the systematic differences between LLM-generated and human translations, and a more comprehensive view of LLM translation quality.
As the researchers noted, they are “the first to evaluate LLMs against various levels of professional human translators and analyze the systematic differences between LLMs and human translators.”
The researchers found that GPT-4 matches junior-level translators in accuracy but lags behind senior professionals in fluency and stylistic adaptation. Although GPT-4 consistently delivered accurate translations without omissions, additions, or hallucinations, its literal translation style often led to unnatural phrasing, particularly in specialized domains such as Technology.
While previous studies raised concerns about hallucinations in large language models, the researchers observed that GPT-4 made almost no hallucination errors across all evaluated directions.
Beyond its literal translation style, GPT-4 showed weaknesses in grammar and named entity recognition, along with lexical inconsistency. “We observe that GPT-4 exhibits two primary limitations: adherence to overly literal translations and lexical inconsistency,” the researchers said.
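Lexical inconsistency means the same source term is rendered in different ways within a document. A simple check, sketched below, groups aligned (source, target) term pairs and flags source terms with more than one rendering. It assumes the term pairs come from a separate alignment or terminology-extraction step that is not shown; the function name and example terms are hypothetical.

```python
from collections import defaultdict


def find_inconsistent_terms(aligned_terms):
    """Return source terms that were rendered in more than one way.

    aligned_terms: iterable of (source_term, target_term) pairs, assumed to come
    from a separate alignment or terminology-extraction step (not shown).
    """
    renderings = defaultdict(set)
    for src, tgt in aligned_terms:
        # Normalize case on the source side so "Dataset" and "dataset" group together.
        renderings[src.lower()].add(tgt.strip())
    return {src: sorted(tgts) for src, tgts in renderings.items() if len(tgts) > 1}


# Hypothetical English->Chinese term pairs from one document.
aligned = [
    ("neural network", "神经网络"),
    ("Neural network", "神经元网络"),
    ("dataset", "数据集"),
    ("dataset", "数据集"),
]
print(find_inconsistent_terms(aligned))
```

Here "dataset" is always rendered the same way and is not flagged, while "neural network" receives two different renderings and is reported as inconsistent.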
Despite these limitations, the researchers found that GPT-4 maintained “consistent translation quality across all evaluated language directions,” including low-resource language pairs. This is a notable strength compared with traditional neural machine translation (NMT) systems such as SeamlessM4T, which often perform worse in such settings. As the researchers put it, “GPT-4 mitigates traditional machine translators’ drawback of significant performance gaps from resource-rich to resource-poor directions.”
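One way to quantify the gap the researchers describe is to compare a system's average quality on resource-rich directions with its average on resource-poor ones. The sketch below assumes higher scores are better and that the caller decides which directions count as high- or low-resource; the article does not specify that split, and the function name and direction labels are hypothetical.

```python
from statistics import mean


def resource_gap(scores, high_resource, low_resource):
    """Mean score on high-resource directions minus mean on low-resource ones.

    scores: dict mapping a direction label to a quality score (higher is better).
    A gap close to zero means quality holds up in low-resource directions.
    """
    high = mean(scores[d] for d in high_resource)
    low = mean(scores[d] for d in low_resource)
    return high - low


# Toy scores for illustration only; not results from the study.
toy_scores = {"en-zh": 80.0, "en-ru": 78.0, "zh-hi": 70.0}
print(resource_gap(toy_scores, ["en-zh", "en-ru"], ["zh-hi"]))  # 9.0
```

Computing the same gap for each system (for example, GPT-4 and SeamlessM4T) over the same directions would show which one degrades less as resources shrink, which is the comparison the researchers' statement implies.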
The researchers concluded that “GPT-4 represents a significant milestone in neural machine translation” and emphasized that “LLMs have the potential to replace human translators, especially junior and medium ones, feasibly.”
Authors: Jianhao Yan, Pingchuan Yan, Yulong Chen, Jing Li, Xianchao Zhu, Yue Zhang