problems and shortcomings of the Perso-Arabic script:
1. A student interested in mastering
Perso-Arabic writing needs to memorize the convoluted
spellings of almost all words, along with their complex rules
and many exceptions. Mastering reading and writing in
Perso-Arabic takes at least 9 years of dedicated daily
practice. Yet a large percentage of the educated adult
population of Iran has difficulty correctly reading the
literary works of the great writers and poets, such as the
Golestan and Boustan of Sa'di, the Masnavi of Balkhi (aka
Rumi), or the Shahnameh of Ferdowsi. Another significant percentage
of the educated adult population has difficulty reading
through a newspaper article without pronunciation errors
or writing essays without making spelling mistakes.
2. Officially there are 34 primary letters
in the Perso-Arabic alphabet and 9 secondary symbols.
The 34 letters can take on a total of 118 different shapes,
depending on their location inside a word.
Adding to this confusing multitude of letters and symbols,
there are several renegade letters and
symbols that are not officially counted but do exist
in some Arabic loan words. So the total number of letters
and symbols can be counted as anywhere from 42 to 44 to 46 to 131.
Examples of the renegade letters are:
short alef (or alefe maqsure), as in "Isā"
or "Musā", and round t,
as in "dâeratolmaâref".
3. Lack of short vowels: the short vowels a,
e, and o are not part of the body of a word.
On rare occasions, usually reserved for beginners, three
floating symbols are used in their place. These, in addition
to 6 other symbols, make up the 9 secondary symbols of
the Perso-Arabic alphabet. Without the short vowels, exact
or correct pronunciation of the words is difficult and
only possible with prior knowledge. New or unfamiliar
words such as foreign ones are impossible to pronounce
correctly. The lack of these vowels results in many examples
of words with different pronunciations sharing the same
spelling. The only way to distinguish these words is by
the context of the sentence in which they appear.
Examples of words with equivalent spellings but differing
or ser or sor), (dar
or dorr), (rastam
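
The effect of dropping the short vowels can be sketched in a few lines of Python. This is only an illustration, not part of the original argument: it assumes the three floating symbols correspond to the Unicode combining marks U+064E, U+064F and U+0650, and the three readings "sar", "ser" and "sor" of the same two-letter skeleton are chosen here for the example.

```python
import unicodedata

# Assumption: the three floating short-vowel symbols map to these
# Unicode combining marks.
FATHA = "\u064E"  # short "a"
DAMMA = "\u064F"  # short "o"
KASRA = "\u0650"  # short "e"

SIN, REH = "\u0633", "\u0631"  # the two consonants of the skeleton "s-r"


def strip_marks(word: str) -> str:
    """Drop every combining mark, keeping only the letters of the word body."""
    return "".join(ch for ch in word if unicodedata.combining(ch) == 0)


# Three fully vocalized spellings (illustrative readings).
vocalized = {
    "sar": SIN + FATHA + REH,
    "ser": SIN + KASRA + REH,
    "sor": SIN + DAMMA + REH,
}

# What is normally written: the same skeleton for all three readings.
unvocalized = {reading: strip_marks(word) for reading, word in vocalized.items()}
```

With the marks present, the three spellings are distinct strings. Once the marks are stripped, which is how text is usually written, all three collapse to one string, and only context can tell them apart.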
4. The existence of exclusively Arabic consonants.
In Arabic these consonants have distinct sounds,
but in Persian no such difference exists. This
makes the spelling of words a challenge for one's memory.
The consonants that share the same sounds are:
Examples of words that use exclusively Arabic letters
5. Out of the total of 33 letters of the alphabet there
are only 17 distinguishable shapes. The complete letters
of the alphabet are created by adding dots and lines as
diacritics to the different shapes. Dots and lines are
thus the only differentiating symbols between many consonants.
This is a source of many errors.
6. Some letters of the alphabet take on different shapes
depending on their occurrence at the beginning, middle,
or the end of a word or if they occur separately.
The number of different shapes varies from a maximum
of 4 for some letters like
to 2, like in
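
As a rough illustration of the positional shapes, Unicode happens to list them as separate "presentation form" code points, and Python's standard `unicodedata` module can look them up. The helper name `contextual_forms` and the two letters queried (beh, U+0628, and alef, U+0627) are choices made for this sketch, not examples taken from the text above.

```python
import unicodedata


def contextual_forms(letter: str) -> dict:
    """Map each positional form name to its presentation-form character.

    Scans the Arabic Presentation Forms-B block (U+FE70..U+FEFF) for
    characters whose compatibility decomposition is exactly
    "<form> XXXX", where XXXX is the code point of `letter`.
    """
    target = f"{ord(letter):04X}"
    forms = {}
    for cp in range(0xFE70, 0xFF00):
        parts = unicodedata.decomposition(chr(cp)).split()
        # Ligatures and unassigned code points do not have exactly two parts.
        if len(parts) == 2 and parts[1] == target:
            forms[parts[0].strip("<>")] = chr(cp)
    return forms


beh_forms = contextual_forms("\u0628")   # a letter that joins on both sides
alef_forms = contextual_forms("\u0627")  # a letter that joins only to the right
```

Here `beh_forms` has four entries (isolated, final, initial, medial), while `alef_forms` has only two (isolated, final), matching the range of 4 down to 2 shapes described in this item.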
7. Within a word, some letters have to be attached
to their preceding and/or following letters while others
8. The frequently occurring letters
(vâv) and (yeh) represent several sounds.
(vâv) can represent the sounds "o", "u",
"v", "vu", and "ow" or fall
silent altogether. This characteristic alone is the source
of many pronunciation errors. An ancient word such as
"Faravahr" is commonly mispronounced as "Foruhar",
"mozdvar" has become "mozdur", and
"ranjvar" has turned into "ranjur".
A dispute has recently arisen over whether words like "xosrow"
or "now" ought to be pronounced with a "w"
or not, since "ow" and "o" are represented
by the same single letter "vâv". As expected,
one letter representing many sounds can contribute to
many words with different pronunciations sharing the same
or "tu"), ("jow"
or "ju"), ("jur"
or "jowr"), ("shur"
or "showr"), etc.
An example of the silent "vâv" is:
pronounced "xâb") and many others.
(yeh) can represent the sounds "y", "i"
and "iy". This multiple representation has further
exacerbated the problem of words with different pronunciations
sharing the same spelling.
("dir" or "deyr"), etc.
9. A great number of Persian words end with the letter
"h". The sound of this letter remains the
same, regardless of its location within a word, unless
it occurs at the end of a word. The consonant "h"
could then retain its normal sound or convert into the
vowels "e" or "a". Yet again we
are confronted with a phenomenon that results in many
different sounding words sharing a common spelling.
or "beh" or "bah"),
("na" or "noh" or "neh"),
or "keh" or "kah" or "koh"),
10. The conjunction "-e" is one of the most
frequently occurring sounds in the Persian language. Yet,
it is either not written, because of the absence of the
short vowels, or can appear as a (yeh)
following a word that ends with an "h" that
is pronounced as "e" or "a".
As in two ways of writing
man - my house).
11. The existence of many very different styles of writing,
such as Shekasteh, Naskh, and Nasta'liq. One should emphasize
that these are not names of different fonts, as in Latin-based
alphabets, but different styles of writing altogether.
The Shekasteh or Nasta'liq styles cannot be rendered as
text on the WWW, but have to appear as images.
12. Unlike words written in the Latin alphabet, which
appear as one integrated grouping of letters, words written
in Perso-Arabic can appear as a scattered, uncorrelated
set of letters. This is due to reason 7 above, where some
letters can be attached to others and some cannot.
13. In Perso-Arabic, words are written from right to
left, while numbers are written from left to right. This
inconsistency can be as small as a mere irritation, or
as large as a source of errors and unnecessary complications
in the case of scientific, mathematical, or other documents
composed of words and numbers.
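
The mixed direction is visible even at the character level. As a small sketch (the sample phrase, the word "sāl" followed by the number 1380 in Persian digits, is an example chosen here), Python's standard `unicodedata.bidirectional` reports the direction class that Unicode assigns to each character:

```python
import unicodedata


def bidi_classes(text: str) -> list:
    """Return the Unicode bidirectional class of every character in `text`."""
    return [unicodedata.bidirectional(ch) for ch in text]


# Illustrative sample: three Perso-Arabic letters, a space, four Persian digits.
sample = "\u0633\u0627\u0644 \u06F1\u06F3\u06F8\u06F0"
classes = bidi_classes(sample)

letters = classes[:3]  # classes of the three letters
digits = classes[4:]   # classes of the four digits
```

The letters come back as "AL" (a right-to-left class), while the digits come back as "EN" (European number), which is laid out left to right. Any software that displays or edits such a line has to reorder the two kinds of runs.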
14. E-mail communications using Perso-Arabic require
special keyboards, and text editors or word processors
that can be cumbersome to use. One cannot conveniently
use any standard keyboard and compose an e-mail message
using one's favorite web-based e-mail service such as HOTMAIL
15. Perso-Arabic cannot be used with the enormous number
of software applications being developed daily that can
only work with the Latin alphabet. These include Optical
Character Recognition (OCR) packages. OCR software packages
for Perso-Arabic script are extremely rare, inaccurate,
and expensive. Also included are text-to-speech and speech-to-text
software, which are utterly useless with the Perso-Arabic
16. Lack of a standard in electronic representation
of Perso-Arabic text.
At least 4 different ways exist in digitally inscribing
2) Windows Code 1256 (Arabic) (The XP version is not
backward compatible) - Not available on Netscape
3) A multitude of distinct web fonts
4) Unix fonts.
In addition, the absence of a single standard for database
storage and display of Persian text has created a multitude
of puzzles for engineers and developers, and has become
a nuisance for ordinary users.
- The need to download fonts in order to read the
Perso-Arabic text on many web sites. Such online text
would most likely not get stored correctly by major
- Dependence on Unicode and Windows code characters
primarily used for Arabic, with minor additions for
Persian. The resulting online text has the appearance
of Arabic since the focus has been on developing letters
closely resembling fonts common to that language and
not Persian. The Persian-specific letters can be difficult
to distinguish when using smaller size fonts.
- The task of database and search engine development
for Persian text has become enormously more complex
without such a standard.
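
To make the consequence of competing encodings concrete, here is a minimal sketch that stores the same two letters under two of the encodings mentioned above, Unicode (as UTF-8) and Windows code page 1256, and then reads the UTF-8 bytes back with the wrong one. The two-letter sample (peh followed by alef) is an assumption made for the example; `cp1256` is the name Python's standard codecs use for Windows code page 1256.

```python
# Illustrative sample: the Persian-specific letter peh followed by alef.
text = "\u067E\u0627"

as_utf8 = text.encode("utf-8")     # four bytes
as_cp1256 = text.encode("cp1256")  # two bytes

# Reading each byte string with the encoding it was written in works.
round_trip_utf8 = as_utf8.decode("utf-8")
round_trip_cp1256 = as_cp1256.decode("cp1256")

# Reading UTF-8 bytes as if they were code page 1256 yields different,
# unrelated characters instead of the original text.
misread = as_utf8.decode("cp1256", errors="replace")
```

The byte sequences differ in both length and content, so a database or search engine that does not know which encoding a stored string uses cannot compare or display it reliably.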
17. The complex and convoluted rules regarding Perso-Arabic
spacers. These rules have presented, and will continue
to present, major challenges to users interested in correctly
typing or displaying Persian words.
In the digital orthography of Perso-Arabic words, 3 different
types of special spacers are needed in addition to the normal
word spacer. These are:
- Zero-Width Non-Joiner (ZWNJ):
In Persian, many of the letters of the alphabet naturally
connect with the following letter when written in
a word. However, when writing certain prefixes, suffixes
and compound words, we override this natural behavior
of joining letters, by inserting a Unicode ZWNJ character,
and prevent them from joining the following letter
but without adding a space between the two.
[Table: the same word written with a normal space (incorrect), with no space (incorrect), and with a ZWNJ (correct).]
- Zero-Width Joiner (ZWJ):
In Persian, a letter written in isolation (not joined
with the preceding or following letter) should revert
to its "isolated" form. There are certain situations
where it is desirable to override this natural behavior.
Of course, this discussion only concerns those letters
which have different forms when connected and standing
Example (Abbreviation of "Hejriye Shamsi"):
- Narrow No-Break Space (NNBS):
It has been suggested that it would be desirable to
have a space for Perso-Arabic compound words, wider
than the Zero-Width Non-Joiner but narrower than the
normal word space. If such compound words occur at
the end of a line they would resist word-wrap and
would not break up. Unfortunately, at the present
time there is no font that contains this space.
|Normal Word Space
The rules regarding using ZWNJ versus normal word spacers
have to be strictly obeyed in both digital and regular
orthography. If not followed, in addition to the problems
indicated above, a word could be mistaken as two or
more words, or several words could appear as a single
compound word. The inconsistency of spacing is a source
of parsing problems that lead to many inaccuracies in
software technologies such as OCR and machine translation.
A Latin-based alphabet will do away with all these
complexities by requiring only the normal word spacer.
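
The parsing problem described above can be sketched with Python's standard library. The code points used are the Unicode characters that carry these names (U+200C, U+200D, U+202F); the sample word, the verb prefix "mi-" followed by "ravam", is an assumption chosen for illustration, since any prefix that must not join would do.

```python
import unicodedata

ZWNJ = "\u200C"   # Zero-Width Non-Joiner
ZWJ = "\u200D"    # Zero-Width Joiner
NNBSP = "\u202F"  # Narrow No-Break Space

# Official Unicode names of the three special spacers.
spacer_names = {ch: unicodedata.name(ch) for ch in (ZWNJ, ZWJ, NNBSP)}

# Illustrative word parts: prefix "mi-" and verb stem "ravam".
MI = "\u0645\u06CC"
RAVAM = "\u0631\u0648\u0645"

with_space = MI + " " + RAVAM  # incorrect: normal space
no_space = MI + RAVAM          # incorrect: letters join
with_zwnj = MI + ZWNJ + RAVAM  # correct: no joining, no gap


def word_count(text: str) -> int:
    """Count whitespace-separated tokens, as a naive parser would."""
    return len(text.split())
```

A naive whitespace tokenizer sees `with_space` as two words but `with_zwnj` as one, while `with_zwnj` and `no_space` are different strings even though neither contains a visible gap. Software that compares, searches, or parses such text has to account for all three spellings.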
18. Mastering the spelling rules of Perso-Arabic
requires practical knowledge of a very large number of
complex rules. Almost every spelling rule has one or more
exceptions. It is therefore impossible for any one person
to know, let alone obey, all these rules. Enumerating
most of the Perso-Arabic spelling rules and exceptions
requires a 35-page booklet. Such a booklet was created after
years of debate by experts, but has produced more questions
than solutions. The rest of the items on this page
are taken directly from this booklet.
19. The rules governing the use of the "hamze" symbol,
borrowed from Arabic, alone take up a total of 5 out of
The regulations that are supposed to manage the use of
"hamze" are the most complex set of laws. This
symbol can be written in 5 different ways and still take
different shapes depending on its location within a word.
That is why some words like or
(mas'ul-responsible) can be written in 2 different ways.
As stated above "hamze" and "eyn"
can sometimes share the same sound like in (ra's),
20. Many borrowed Arabic words must follow Arabic spelling,
which can differ from Persian rules. This adds
yet another dimension of complexity to Perso-Arabic
The occurrence of a
(round t, pronounced just like t) in
(saqatoleslâm) which must be converted to regular
(meshkât). Or the letter "yeh" with
a short aleph inside which occurs only at the end of
a word and is pronounced as "â". Sometimes the
short aleph is dropped for various reasons. This can
cause errors in distinguishing words like
The tashdid symbol if not put can result in a double
21. The "e'rāb" (the floating short vowel
symbols) should be placed where and when needed.
This is not always possible, especially in Internet communication,
where they are rarely used. Although they can help
the reader pronounce the words, they add clutter,
and several "e'rāb" signs can get mixed up with one another
or with the ever-present dots and lines. That is why they
are almost never used in books and newspapers with small
22. When "v" occurs before "u",
the spelling rules require certain exceptions.
The source of this confusion is that a single letter,
"vâv", is used to represent both "v"
and "u". Should 2 vâvs be written,
or will 1 suffice? The answer is that most of the time both
"vâv"s must be written, as in
(tāvus-peacock), and other times only one will do, as in
23. Suffixes may or may not be attached to their preceding word,
depending on their first letters and the ending
letters of the word they attach to. For example, the word "arjomand
(arj+omand)" must be written as attached, while "āb
bān" and "barq āsā" must not. Sometimes
the suffix "vār" is attached, as in (bozorgvār),
"sugvār", "xānevār", and other times
it is not, as in (tuti
vār), "ferdowsi vār", or "pari vār".
The logic governing these laws is based purely
on aesthetic reasons!
More aesthetically inspired spelling rules:
"-tar" and "-tarin" should be separate
except for: behtar, mehtar, kehtar, bitar.
And "-ce" is always written separate except
in "ānce", and "cenānce".
The plural sign "-hā" should always be separate
except in "inhā" and "ānhā".
24. The prefix "bol", borrowed from Arabic, can
be written in two ways: the simplified Persian way, or
according to Arabic spelling rules, but not always. Both
(bolhavas, which follows the simplified Persian rule)
(bu'lhavas, which follows the complicated Arabic rule)
are correct. On the other hand the simplified
(bolqāsem) is incorrect, and has to follow Arabic spelling
rules as (bualqāsem).
The prefix "be-" poses a major difficulty. It should
be attached to the verb that follows like in "begoftam",
"beravam", "benamāyad", "bexrad",
"bejā", etc., but not when the following word
is a noun, adverb, or adjective like "be sarbordan",
"be āvāze boland", "be dast āvard",
"be saxti", "be haq".
Other prefix rules with exceptions:
"Bi-" is always written separately except for:
"bihude", "bixod", "birāh",
"Ham-" should always be attached like in "hamāye",
"hamsang", "hamsāye" except in words
that include a "hamze" like: "ham ārzu",
and "ham ārmān".
"Ce-" should be separate other than in "cerā"
25. Combination words may or may not have to be attached. For example,
the words: "hoshyār", "sekanjabin",
and "behnām" should be conjoined, but "del
angiz", "aqab oftādegi", "kam ehsās",
have to be written separately because of the occurrence
of the letters a, o, or e. Words like "āyin nāme",
"pāk kon", "cub bori" must appear
separate because of the occurrence of the same letters.
The combination words that are repetitions of the same
word like, "kam kam", and "tak tak",
should not be conjoined. Same with "sangin rangin".
Also, combination words that are made up of one or more
European words combined with Persian words, like "xosh
poz", "shik push", must be written
separately. Arabic combo words made up of 2 or more words,
like "ma' zālek", "men ba'd", "en
shā allāh", "ma' hāzā", must also appear
separate. The same rule applies to combination words which
contain a number, as in "panj tan", "haft
gonbad", "hasht behesht", "noh
falak". Exceptions to this rule are "yeksuye",
"yekshanbe", "yekshabe", "yeksare",
which may be conjoined.
26. Due to the existence of several consonants
that share the same sound, or consonants that are exclusive
to Arabic but occur in the Perso-Arabic alphabet without
their original sounds, there are a huge number of words
that can be spelled in more than one way. This lack of
standard spelling has contributed to a chaos in spelling
Examples of words with more than one allowed spelling:
āzuqe, āqā, establ, emperātur, belit, boqce, tanbur, tāq,
tabar, tapidan, tarāz, tufān, qaltidan, qāti, qobād, howle,
luti, estaxr, tāb, toxārestān, tus, tahmāsb (or sometimes
tahmāseb), moqān, etc.
27. There is ample confusion surrounding
using the letter "y" as a buffer, as in "zibāyi
(with a y)" or not using it as in "zibāi (with
A huge number of examples exist, e.g., tanhāyi, Orupāyi,
28. A symbol has to be used for the conjunction "-e"
Again, such a rule has proven impractical for both electronic
and print media. The conjunction symbol is used only in
whether it is "asbsavāri" or "asbe savāri"
cannot be deduced without knowing the context in which
29. The Arabic tanvin is one of the 9 symbols. It can
be written with unique characters like "an",
"on" or spelled out as "-an" and "-on".
This results in duplicate spelling problems.
See http://students.washington.edu/irina/persianword/zwnj.htm
Excerpts from a booklet published (2000) by the Iranian
Academy of Persian Language and Literature on spelling rules
and conventions of the current Persian script. (The purpose of
this booklet was to bring some order to the chaos of Perso-Arabic
spelling rules, but all it reveals, ironically, is the extent
of the existing anarchy.)