August 13, 2025
Why copying Arabic from browser-based PDF viewers misleads Google Translate
Introduction
The Department of Social Services (DSS) reported an issue involving Arabic text in their PDFs.
Text copied from a PDF in Adobe Acrobat and pasted into Google Translate translated accurately and reflected the text’s original meaning (see Figure 1).
Figure 1: Text copied from Adobe Acrobat

However, the result was different when the PDF was opened in a browser-based viewer, such as Google Chrome. When text copied from that viewer was pasted into Google Translate, the translation was completely inaccurate. It even referenced unrelated topics, like Islamic law (see Figure 2).
Figure 2: Text copied from Google Chrome PDF Viewer

This discrepancy raised concerns about how Arabic text is handled across different platforms, and about how translation tools interpret corrupted input.
Following DSS’s report, we investigated the issue to understand its root cause and to provide practical recommendations for avoiding misleading translations.
Observations
To understand the issue, we tested copying Arabic text from various platforms and pasting it into Google Translate. The phrase used for testing was:
أصل بيانات الإعاقة الوطنية
In English, this means ‘origin of national disability data’.
Table 1 shows how the results varied depending on the source the text was copied from.
Table 1: Translating Arabic text copied from different platforms with Google Translate
| Source | Input | Result | Translation accuracy |
| --- | --- | --- | --- |
| Adobe Acrobat | أصل بيانات الإعاقة الوطنية | Origin of national disability data | ✅ Accurate |
| Microsoft Word | أصل بيانات الإعاقة الوطنية | Origin of national disability data | ✅ Accurate |
| A webpage | أصل بيانات الإعاقة الوطنية | Origin of national disability data | ✅ Accurate |
| PDF in Firefox | صلبيانات اإلعاقةالوطنية | National Disability Crosses | 🟡 Partial |
| PDF in Google Chrome | ةنيطلو اةإلعاقت ايانابل صأ | I need help with the disability | ❌ Inaccurate |
| PDF in Microsoft Edge | أ ص ل ب اناي ا ت قاعلإ ة ا ول ط ين ة | The origin of the word “I am a woman” | ❌ Inaccurate |
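The issue appears when copying from the Google Chrome and Microsoft Edge PDF viewers and, to a lesser extent, when copying from the Firefox PDF viewer. Text copied from Adobe Acrobat, Microsoft Word and a webpage translated accurately.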
Discussion
Arabic is a right-to-left language, and most modern software supports right-to-left text. But copying from browser-based PDF viewers introduces a critical error: the character order is reversed.
Table 2: Example of correct vs. incorrect Arabic text copying from a PDF
| Source | Output (Arabic text) |
| --- | --- |
| Original text in PDF (Arabic) | أصل بيانات الإعاقة الوطنية |
| Copied from Adobe Acrobat (correct) | أصل بيانات الإعاقة الوطنية |
| Copied from Chrome (incorrect) | ةنيطلو اةإلعاقت ايانابل صأ |
This reversal makes the text unreadable and confusing for translation tools. For example, imagine copying the English sentence:
‘Copying text into Google Translate.’
and getting:
‘.etalsnarT elgooG otni txet gniypoC’
This is essentially what happens with Arabic text copied from browser-based PDF viewers. This is an issue as Arabic script and grammar rely heavily on correct word and letter order. Translation tools then try to interpret this incorrect text.
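The same effect can be sketched in a few lines of Python. This is a minimal illustration, not a reproduction of any browser's behaviour: it assumes a pure character-by-character reversal, whereas the text actually copied from Chrome (Table 2) also has misplaced spaces. The helper name `describe` is made up for this example.

```python
import unicodedata

# The test phrase from this article, written as escapes so that an
# editor's own right-to-left rendering cannot disguise the stored order.
PHRASE = (
    "\u0623\u0635\u0644 "                          # first word
    "\u0628\u064a\u0627\u0646\u0627\u062a "        # second word
    "\u0627\u0644\u0625\u0639\u0627\u0642\u0629 "  # third word
    "\u0627\u0644\u0648\u0637\u0646\u064a\u0629"   # fourth word
)


def describe(text, limit=4):
    """Return the Unicode names of the first `limit` stored characters."""
    return [unicodedata.name(ch) for ch in text[:limit]]


logical = PHRASE              # the order in which a reader spells the phrase
reversed_copy = PHRASE[::-1]  # simulated faulty copy: same characters, stored backwards

print(describe(logical))
print(describe(reversed_copy))
```

The first list starts with ALEF WITH HAMZA ABOVE, the opening letter of the phrase. The second starts with TEH MARBUTA, a letter that normally closes a word. Both strings contain exactly the same characters, so a translation tool receives valid Arabic letters arranged into words that do not exist.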
Impact on translation tools
When this reversed text is pasted into translation tools, the output becomes unpredictable.
Table 3: Translated reversed text from different translation tools
| Tool | Input | Translation |
| --- | --- | --- |
| Google Translate | ةنيطلو اةإلعاقت ايانابل صأ | “I need help with the disability” |
| QuillBot | ةنيطلو اةإلعاقت ايانابل صأ | “The two have a difficult relationship” |
| EasyArabicTyping | ةنيطلو اةإلعاقت ايانابل صأ | “So, the disabled is a disability” |
These translations are not only inaccurate; they’re completely unrelated to the original text.
Why does Google Translate mention Islamic law?
When Google Translate encounters corrupted or ambiguous input, such as reversed Arabic text, it attempts to interpret it based on patterns learned from its training data. For Arabic, much of this data is drawn from publicly available sources like religious texts, legal documents and formal publications. These domains are overrepresented in Arabic-language texts used to train large language models (LLMs), including translation systems.
As a result, when the input is nonsensical or structurally broken, the model may ‘guess’ meaning based on dominant themes in its training data. This can lead to translations that reference topics like Islamic law, disability services or religious practices, even when the original text has no connection to these subjects.
Recent studies have highlighted this issue. For example, Naous et al. (2024) found that Arabic LLMs often default to culturally loaded or stereotypical content when faced with ambiguous prompts. This happens due to biases in their pre-training data. Similarly, the PALM dataset project (2025) revealed that many Arabic models trained on translated English data exhibit Western-centric biases, sometimes producing culturally inappropriate or irrelevant outputs.
Conclusion
Copying Arabic text from browser-based PDF viewers (such as Google Chrome or Microsoft Edge) reverses the character order. Translation tools then misinterpret this text, leading to inaccurate and sometimes culturally biased results.
What needs fixing
- Browser-based PDF viewers: Improve right-to-left text handling so that the logical character order is preserved when copying.
- Translation tools: Detect reversed right-to-left text and either warn users or attempt to correct it automatically.
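As a rough idea of what such detection could look like, the sketch below flags Arabic text in which a word begins with a letter that standard spelling only places at the end of a word. It is a heuristic under stated assumptions, not how any existing translation tool works: the function names `looks_reversed` and `suggest_fix` and the choice of the two letters are ours, and the attempted correction only handles a pure character reversal.

```python
# Letters that standard Arabic spelling places at the end of a word,
# not at its start: teh marbuta (U+0629) and alef maksura (U+0649).
FINAL_ONLY = {"\u0629", "\u0649"}


def looks_reversed(text):
    """Heuristic: True if any word starts with a normally word-final letter."""
    return any(word[0] in FINAL_ONLY for word in text.split())


def suggest_fix(text):
    """Return the character-reversed text if it looks reversed, else the text unchanged."""
    return text[::-1] if looks_reversed(text) else text


# The test phrase from this article, written as escapes to keep the
# stored order unambiguous, and a simulated reversed copy of it.
PHRASE = (
    "\u0623\u0635\u0644 \u0628\u064a\u0627\u0646\u0627\u062a "
    "\u0627\u0644\u0625\u0639\u0627\u0642\u0629 "
    "\u0627\u0644\u0648\u0637\u0646\u064a\u0629"
)
reversed_copy = PHRASE[::-1]

print(looks_reversed(PHRASE))                # False
print(looks_reversed(reversed_copy))         # True
print(suggest_fix(reversed_copy) == PHRASE)  # True
```

A check like this would miss reversed text that happens to contain neither letter, and text copied from real browser viewers can also have displaced or extra spaces (see Table 1), so warning the user is likely to be safer than silently correcting.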
The need for human translators
While translation tools like Google Translate are useful for quick and general translations, they can’t replace human translators when accuracy and nuance matter.
Human translation involves more than converting words. It requires understanding:
- cultural context
- tone
- ambiguity.
People naturally interpret layered meanings and shifting topics, whether the conversation is about disability services, education or Islamic law. Machines, however, rely on training data and statistical patterns. This can lead to errors or biased output, especially when the input is corrupted or unclear. This is why human translators remain essential: they can adapt meaning to the audience and ensure that the message is both accurate and culturally appropriate.
Recommendations
From our investigation, we recommend:
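- using Adobe Acrobat to view and copy Arabic text from PDFs, or Firefox if Acrobat is not available, keeping in mind that text copied from Firefox was only partially accurate in our test (see Table 1)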
- avoiding Google Chrome or Microsoft Edge PDF viewers to copy Arabic text for translation purposes
- teaching users about this issue to prevent further miscommunication and misinterpretation.