Why copying Arabic from browser-based PDF viewers misleads Google Translate

Introduction

The Department of Social Services (DSS) reported an issue involving Arabic text in their PDFs.

Text copied from a PDF in Adobe Acrobat and pasted into Google Translate translated accurately and reflected the text’s original meaning (see Figure 1).

However, the result was not the same when the PDF was opened in a browser-based viewer, such as Google Chrome. When this text was pasted into Google Translate, the translation was completely inaccurate. It even referenced unrelated topics, like Islamic law (see Figure 2).

This discrepancy raised concerns about how Arabic text is handled across different platforms, and about how translation tools interpret corrupted input.

Following DSS’s report, we investigated the issue to understand its root cause and to provide practical recommendations for avoiding misleading translations.

Observations

To understand the issue, we tested copying Arabic text from various platforms and pasting it into Google Translate. The phrase used for testing was:

أصل بيانات الإعاقة الوطنية 

In English, this means ‘origin of national disability data’.

Table 1 shows how the results varied depending on the source the text was copied from.

| Source | Input | Result | Translation accuracy |
| --- | --- | --- | --- |
| Adobe Acrobat | أصل بيانات الإعاقة الوطنية | Origin of national disability data | ✅ Accurate |
| Microsoft Word | أصل بيانات الإعاقة الوطنية | Origin of national disability data | ✅ Accurate |
| A webpage | أصل بيانات الإعاقة الوطنية | Origin of national disability data | ✅ Accurate |
| PDF in Firefox | صلبيانات اإلعاقةالوطنية | National Disability Crosses | 🟡 Partial |
| PDF in Google Chrome | ةنيطلو اةإلعاقت ايانابل صأ | I need help with the disability | ❌ Inaccurate |
| PDF in Microsoft Edge | أ ص ل ب اناي ا ت قاعلإ ة ا ول ط ين ة | The origin of the word “I am a woman” | ❌ Inaccurate |

The issue appears when copying from the Google Chrome and Microsoft Edge PDF viewers, and partially when copying from Firefox. Text copied from Adobe Acrobat, Microsoft Word and a webpage translated accurately.
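Readers who want to repeat this test on their own PDFs could automate the comparison between the known-good text and what a viewer places on the clipboard. The Python sketch below is illustrative only and is not the procedure used in this investigation. The function name `classify_copy` and its three labels are hypothetical, and it only recognises an exact match or a clean character-by-character reversal once whitespace is ignored.

```python
import unicodedata


def classify_copy(reference: str, copied: str) -> str:
    """Compare text copied from a viewer with the known-good reference."""

    def canon(s: str) -> str:
        # NFKC folds Arabic presentation forms into ordinary letters;
        # removing whitespace ignores spacing differences between viewers.
        return "".join(unicodedata.normalize("NFKC", s).split())

    ref, got = canon(reference), canon(copied)
    if got == ref:
        return "match"
    if got == ref[::-1]:
        return "reversed"
    return "different"


reference = "أصل بيانات الإعاقة الوطنية"
print(classify_copy(reference, reference))
print(classify_copy(reference, reference[::-1]))
```

Anything that is neither a match nor a clean reversal is reported as "different", so a more scrambled result still needs a manual check.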

Discussion

Arabic is a right-to-left language. Most modern software supports right-to-left text, but copying from browser-based PDF viewers introduces a critical error: the character order is reversed.

| Source | Output (Arabic text) |
| --- | --- |
| Original text in PDF (Arabic) | أصل بيانات الإعاقة الوطنية |
| Copied from Adobe Acrobat (correct) | أصل بيانات الإعاقة الوطنية |
| Copied from Chrome (incorrect) | ةنيطلو اةإلعاقت ايانابل صأ |

This reversal makes the text unreadable and confusing for translation tools. For example, imagine copying the English sentence:

‘Copying text into Google Translate.’

and getting:

‘.etalsnarT elgooG otni txet gniypoC’

This is essentially what happens with Arabic text copied from browser-based PDF viewers. This is an issue as Arabic script and grammar rely heavily on correct word and letter order. Translation tools then try to interpret this incorrect text.
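The effect can be reproduced in a few lines of Python. This is a minimal sketch that assumes a plain character-by-character reversal; the text the browser viewers actually produced (Table 1) is messier, with spaces shifted or inserted, so treat it as an illustration rather than a model of any viewer's behaviour. The function name `reverse_characters` is hypothetical.

```python
def reverse_characters(text: str) -> str:
    """Return the text with its characters in the opposite order."""
    return text[::-1]


english = "Copying text into Google Translate."
arabic = "أصل بيانات الإعاقة الوطنية"

reversed_english = reverse_characters(english)
reversed_arabic = reverse_characters(arabic)

print(reversed_english)  # .etalsnarT elgooG otni txet gniypoC
print(reversed_arabic)

# Reversing a second time restores the original (logical) order.
assert reverse_characters(reversed_arabic) == arabic
```

The reversed Arabic string contains exactly the same letters as the original, which is why it can still look plausible on screen while being meaningless to a translation tool.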

Impact on translation tools

When this reversed text is pasted into translation tools, the output becomes unpredictable.

| Tool | Input | Translation |
| --- | --- | --- |
| Google Translate | ةنيطلو اةإلعاقت ايانابل صأ | “I need help with the disability” |
| QuillBot | ةنيطلو اةإلعاقت ايانابل صأ | “The two have a difficult relationship” |
| EasyArabicTyping | ةنيطلو اةإلعاقت ايانابل صأ | “So, the disabled is a disability” |

These translations are not only inaccurate; they are completely unrelated to the original text.

Why does Google Translate mention Islamic law?

When Google Translate encounters corrupted or ambiguous input, such as reversed Arabic text, it attempts to interpret it based on patterns learned from its training data. For Arabic, much of this data is drawn from publicly available sources like religious texts, legal documents and formal publications. These domains are overrepresented in the Arabic-language texts used to train large language models (LLMs), including translation systems.

As a result, when the input is nonsensical or structurally broken, the model may ‘guess’ meaning based on dominant themes in its training data. This can lead to translations that reference topics like Islamic law, disability services or religious practices, even when the original text has no connection to these subjects.

Recent studies have highlighted this issue. For example, Naous et al. (2024) found that Arabic LLMs often default to culturally loaded or stereotypical content when faced with ambiguous prompts. This happens due to biases in their pre-training data. Similarly, the PALM dataset project (2025) revealed that many Arabic models trained on translated English data exhibit Western-centric biases, sometimes producing culturally inappropriate or irrelevant outputs.

Conclusion

Copying Arabic text from browser-based PDF viewers (such as Google Chrome or Microsoft Edge) reverses the character order. Translation tools then misinterpret this text, leading to inaccurate and sometimes culturally biased results.

What needs fixing

  • Browser-based PDF viewers: improve right-to-left text handling so that logical character order is preserved when copying.
  • Translation tools: detect reversed right-to-left text and either warn users or attempt to correct it automatically.
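
One way a tool could flag suspicious input is sketched below in Python. It relies on a property of standard Arabic spelling: teh marbuta (ة) and alef maksura (ى) are written only at the end of a word, so words that begin with one of them suggest the character order has been reversed. This is an illustrative heuristic, not how Google Translate or any other tool works. The names `looks_reversed`, `FINAL_ONLY` and `threshold` are hypothetical, and the check cannot flag reversed text that contains neither letter.

```python
# Letters that, in standard Arabic spelling, occur only at the end of a word.
FINAL_ONLY = {"\u0629", "\u0649"}  # teh marbuta (ة), alef maksura (ى)


def looks_reversed(text: str, threshold: float = 0.5) -> bool:
    """Heuristically flag Arabic text whose character order seems reversed.

    Counts words that start with a final-only letter and words that end
    with one, then returns True when the word-initial share of those
    occurrences is above `threshold`.
    """
    words = text.split()
    starts = sum(1 for w in words if w[0] in FINAL_ONLY)
    ends = sum(1 for w in words if w[-1] in FINAL_ONLY)
    total = starts + ends
    if total == 0:
        return False  # no evidence either way
    return starts / total > threshold


original = "أصل بيانات الإعاقة الوطنية"
print(looks_reversed(original))        # False
print(looks_reversed(original[::-1]))  # True
```

The text copied from Chrome in our test also begins with ة, so a check of this kind would be expected to flag it. A production check would need more signals than this, for example the position of the definite article ال.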

The need for human translators

While translation tools like Google Translate are useful for quick and general translations, they can’t replace human translators when accuracy and nuance matter.

Human translation involves more than converting words. It requires understanding:

  • cultural context
  • tone
  • ambiguity.

People naturally interpret layered meanings and shifting topics, whether the conversation is about disability services, education or Islamic law. Machines, however, rely on training data and statistical patterns, which can lead to errors or biased output, especially when the input is corrupted or unclear. This is why human translators remain essential: they can adapt meaning to the audience and ensure that the message is both accurate and culturally appropriate.

Recommendations

From our investigation, we recommend:

  • using Adobe Acrobat to view and copy Arabic text from PDFs, or Firefox as a fallback (its results were only partially accurate in our test, see Table 1)
  • avoiding Google Chrome or Microsoft Edge PDF viewers to copy Arabic text for translation purposes
  • teaching users about this issue to prevent further miscommunication and misinterpretation.
