UniPers versus Perso-Arabic - A Collection of Various Viewpoints:
Major problems and shortcomings of the Perso-Arabic script:
1. A student interested in mastering the Perso-Arabic writing will need to memorize the convoluted spellings of almost all words, and their complex rules and many exceptions. To master reading and writing in Perso-Arabic takes at least 9 years of dedicated daily practice. Yet, a large percentage of the educated adult population of Iran has difficulty correctly reading the literary works of the great writers and poets such as Golestan and Boustan of Sa'di, Masnavi of Balkhi (aka Rumi), or Shahnameh of Ferdowsi. Another significant percentage of the educated adult population has difficulty reading through a newspaper article without pronunciation errors or writing essays without making spelling mistakes.
2. Officially there are 34 primary letters in the Perso-Arabic alphabet and 9 secondary symbols. The 34 letters can take on a total of 118 different shapes, depending on their location inside a word. Adding to this confusing multitude of letters and symbols, several renegade letters and symbols are not officially counted but do occur in some Arabic loan words. Depending on what is counted, the total number of letters and symbols can therefore be given as 42, 44, 46, or as many as 131.

Examples of the renegade letters are:
Short alef (or alefe maqsure), as in "Isā" or "Musā". Round t "ة", as in "dâeratolmaâref".
3. Lack of short vowels: the short vowels a, e, and o are not written as part of the body of a word. On rare occasions, usually reserved for beginners, three floating symbols are used in their place. These, in addition to 6 other symbols, make up the 9 secondary symbols of the Perso-Arabic alphabet. Without the short vowels, correct pronunciation of a word is difficult and only possible with prior knowledge of it. New or unfamiliar words, such as foreign ones, are impossible to pronounce correctly. The lack of these vowels results in many words with different pronunciations sharing the same spelling. The only way to distinguish these words is by the context of the sentence in which they appear.

Examples of words with equivalent spellings but differing pronunciations:
سر (sar or ser or sor), در (dar or dorr), رستم (rastam or rostam).
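The effect can be reproduced with a few lines of Python. This is only a sketch: the vocalized spellings below are built by hand from the Unicode code points for the letters sin and re plus the three short-vowel marks, and the helper name `strip_marks` is made up for this illustration.

```python
import unicodedata

# Short-vowel marks (assumed mapping): fatha = a, kasra = e, damma = o.
FATHA, KASRA, DAMMA = "\u064E", "\u0650", "\u064F"
SIN, RE = "\u0633", "\u0631"

def strip_marks(word: str) -> str:
    """Drop combining marks, as everyday Persian writing does."""
    return "".join(ch for ch in word if not unicodedata.combining(ch))

# Fully vocalized spellings of the words transliterated sar, ser, sor.
vocalized = {
    "sar": SIN + FATHA + RE,
    "ser": SIN + KASRA + RE,
    "sor": SIN + DAMMA + RE,
}

# Without the marks, all three collapse into one written form.
unvocalized = {name: strip_marks(w) for name, w in vocalized.items()}
distinct_spellings = set(unvocalized.values())
print(len(set(vocalized.values())), "vocalized forms ->",
      len(distinct_spellings), "everyday spelling")
```

Three distinct vocalized words go in and a single two-letter spelling comes out, which is the ambiguity a reader has to resolve from context.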

4. The existence of exclusively Arabic consonants. In Arabic these consonants have distinct sounds, but in Persian no such difference exists. This makes the spelling of words a challenge for one's memory. The consonants that share the same sounds are:

s: س, ص, ث
h: ه, ح
z: ز, ذ, ض, ظ
t: ت, ط
': ع, ء

Examples of words that use exclusively Arabic letters are:

صد (sad), راضی (râzi), طور (towr), رعد (ra'd).

5. Out of the total of 33 letters of the alphabet there are only 17 distinguishable shapes. The complete letters of the alphabet are created by adding dots and lines as diacritics to the different shapes. Dots and lines are thus the only differentiating symbols between many consonants. This is a source of many errors.

6. Some letters of the alphabet take on different shapes depending on their occurrence at the beginning, middle, or the end of a word or if they occur separately.

The number of different shapes varies from a maximum of 4 for some letters to 2 for others.

7. Within a word, some letters have to be attached to their preceding and/or following letters while others do not.
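Points 6 and 7 can be checked against the Unicode character database. The sketch below assumes that the legacy "Arabic Presentation Forms" blocks are a fair proxy for the positional shapes of each letter; the function name `contextual_forms` is hypothetical. Letters that join on both sides show four forms there, while letters that only join to the preceding letter show two.

```python
import unicodedata
from collections import defaultdict

POSITIONS = ("<isolated>", "<final>", "<initial>", "<medial>")

def contextual_forms() -> dict:
    """Map each base letter to the positional forms Unicode encodes for it."""
    forms = defaultdict(set)
    # Arabic Presentation Forms-A and Forms-B hold the positional glyphs.
    for block in (range(0xFB50, 0xFE00), range(0xFE70, 0xFF00)):
        for cp in block:
            parts = unicodedata.decomposition(chr(cp)).split()
            # Keep entries such as "<initial> 0628": one tag, one base letter.
            if len(parts) == 2 and parts[0] in POSITIONS:
                base = chr(int(parts[1], 16))
                forms[base].add(parts[0].strip("<>"))
    return forms

forms = contextual_forms()
beh, alef = "\u0628", "\u0627"
print("beh :", sorted(forms[beh]))    # joins on both sides
print("alef:", sorted(forms[alef]))   # never joins to the following letter
```

Modern fonts select these shapes automatically from the plain letter codes, so the presentation-form code points are used here only as a convenient way to count shapes.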

8. The frequently occurring letters (vāv) and (yeh) represent several sounds.

The letter (vâv) can represent the sounds "o", "u", "v", "vu", and "ow" or fall silent altogether. This characteristic alone is the source of many pronunciation errors. An ancient word such as "Faravahr" is commonly mispronounced as "Foruhar", "mozdvar" has become "mozdur", and "ranjvar" has turned into "ranjur". A dispute has recently arisen whether words like "xosrow" or "now" ought to be pronounced with a "w" or not, since "ow" and "o" are represented by the same single letter "vâv". As expected, one letter representing many sounds can contribute to many words with different pronunciations sharing the same spelling.

Examples: ("to" or "tu"), ("jow" or "ju"), ("jur" or "jowr"), ("shur" or "showr"), etc.

An example of the silent "vâv" is:
("xvâb" pronounced "xâb") and many others.

The letter (yeh) can represent the sounds "y", "i" and "iy". This multiple representation has further exacerbated the problem of words with different pronunciations sharing the same spelling.

("sir" or "seyr"), ("dir" or "deyr"), etc.

9. A great number of Persian words end with the letter "h". The sound of this letter remains the same, regardless of its location within a word, unless it occurs at the end of a word. The consonant "h" could then retain its normal sound or convert into the vowels "e" or "a". Yet again we are confronted with a phenomenon that results in many different sounding words sharing a common spelling.

("be" or "beh" or "bah"), ("na" or "noh" or "neh"), ("ke" or "keh" or "kah" or "koh"), etc.

10. The conjunction "-e" is one of the most frequently occurring sounds in the Persian language. Yet, it is either not written, because of the absence of the short vowels, or can appear as a (yeh) or (hamze), following a word that ends with an "h" that is pronounced as "e" or "a".

As in the two ways of writing خانهٔ من or خانه‌ی من (xāneye man - my house).
11. Existence of many very different styles of writing, such as Shekaste, Naskh, Nasta'liq, etc. One should emphasize that these are not names of different fonts, as in Latin-based alphabets, but different styles of writing altogether. The Shekaste or Nasta'liq styles cannot be rendered as text on the WWW, but have to appear as images.
12. Unlike words written in the Latin alphabet, which appear as one integrated grouping of letters, words written in Perso-Arabic can appear as a scattered, uncorrelated set of letters. This is due to reason 7 above: some letters can be attached to others and some cannot.

(Ābādān), (kashāle), (adab), etc.
13. In Perso-Arabic, words are written from right to left, while numbers are written from left to right. This inconsistency can be as small as a mere irritation, or as large as a source of errors and unnecessary complications in case of scientific, mathematical, or other documents composed of words and numbers.
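The mixed direction is visible in the character properties that software uses to lay out text. The following sketch simply reads the bidirectional class that Unicode assigns to a few sample characters; the sample labels are chosen for this illustration only.

```python
import unicodedata

samples = {
    "Persian letter alef": "\u0627",
    "Persian digit one": "\u06F1",   # Extended Arabic-Indic digit
    "Latin letter a": "a",
    "European digit one": "1",
}

# AL = right-to-left Arabic letter, L = left-to-right letter,
# EN = "European number", which keeps left-to-right digit order.
bidi = {label: unicodedata.bidirectional(ch) for label, ch in samples.items()}
for label, cls in bidi.items():
    print(f"{label}: {cls}")
```

The Persian letter and the Persian digit fall into different classes, so every line that mixes words and numbers has to be reordered by the display software, whereas Latin letters and digits already run in the same direction.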
14. E-mail communications using Perso-Arabic require special keyboards, and text editors or word processors that can be cumbersome to use. One cannot conveniently use any standard keyboard to write an e-mail message using one's favorite web-based e-mail service such as HOTMAIL or YAHOO!.
15. Perso-Arabic cannot be used with the enormous number of software applications being developed daily that work only with the Latin alphabet. These include Optical Character Recognition (OCR) packages. OCR software packages for Perso-Arabic script are extremely rare, inaccurate, and expensive. Also included are text-to-speech and speech-to-text software, which are utterly useless with the Perso-Arabic script.

16. Lack of a standard in electronic representation of Perso-Arabic text.

At least 4 different ways exist of digitally encoding Perso-Arabic text, using:

1) Unicode
2) Windows Code 1256 (Arabic) (The XP version is not backward compatible) - Not available on Netscape
3) A multitude of distinct web fonts
4) Unix fonts.

In addition, the absence of a single standard for database storage and display of Persian text has created a multitude of puzzles for engineers and developers, and become a nuisance for ordinary users.


  • The need to download fonts in order to read the Perso-Arabic text on many web sites. Such online text would most likely not get stored correctly by major search engines.
  • Dependence on Unicode and Windows code characters primarily used for Arabic, with minor additions for Persian. The resulting online text has the appearance of Arabic since the focus has been on developing letters closely resembling fonts common to that language and not Persian. The Persian-specific letters can be difficult to distinguish when using smaller size fonts.
  • The task of database and search engine development for Persian text has become enormously more complex without such a standard.
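A small Python sketch makes the storage problem concrete. It encodes one word, chosen here purely as an example, under two of the schemes listed above, Unicode (as UTF-8) and Windows code page 1256, and then tries to read one as the other.

```python
# The word transliterated "salām", written with explicit code points.
word = "\u0633\u0644\u0627\u0645"

utf8_bytes = word.encode("utf-8")
cp1256_bytes = word.encode("cp1256")   # Windows code page 1256

print("UTF-8  :", utf8_bytes.hex(" "))
print("CP1256 :", cp1256_bytes.hex(" "))

# The same text is stored as two unrelated byte sequences, so a byte-level
# search or comparison across the two stores will not match.
same_bytes = utf8_bytes == cp1256_bytes

# Reading bytes with the wrong codec fails outright in this case.
try:
    misread = cp1256_bytes.decode("utf-8")
except UnicodeDecodeError:
    misread = None
print("bytes equal:", same_bytes, "| misread:", misread)
```

A database or search engine therefore has to know, or guess, which encoding each document uses before it can index or compare Persian text.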

17. The complex and convoluted rules regarding Perso-Arabic spacers. These rules have presented, and will continue to present major challenges to users interested in correctly typing or displaying Persian words.

In digital orthography of Perso-Arabic words, 3 different types of special spacers in addition to the normal word spacer are needed. These are [1]:

  • Zero-Width Non-Joiner (ZWNJ):
    In Persian, many of the letters of the alphabet naturally connect with the following letter when written in a word. However, when writing certain prefixes, suffixes and compound words, we override this natural behavior of joining letters, by inserting a Unicode ZWNJ character, and prevent them from joining the following letter but without adding a space between the two.

    Incorrect (normal space)
    Incorrect (no space)
    Correct (ZWNJ space)

  • Zero-Width Joiner (ZWJ):
    In Persian, a letter written in isolation (not joined with the preceding or following letter) should revert to its "isolated" form. There are certain situations where it is desirable to override this natural behavior. Of course, this discussion only concerns those letters which have different forms when connected and standing in isolation.

    Example (Abbreviation of "Hejriye Shamsi"):
    Incorrect "heh" (normal space)
    Correct "heh" (ZWJ space)

  • Narrow No-Break Space (NNBS):
    It has been suggested that it would be desirable to have a space for Perso-Arabic compound words, wider than the Zero-Width Non-Joiner but narrower than the normal word space. If such compound words occur at the end of a line they would resist word-wrap and would not break up. Unfortunately, at the present time there is no font that contains this space.

    Normal Word Space
    Zero-Width Non-Joiner
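For reference, the spacers can be identified by their Unicode code points. The sketch below assumes that the "Narrow No-Break Space" mentioned above corresponds to the Unicode character of that name (U+202F); the source does not give a code point for it.

```python
import unicodedata

SPACERS = {
    "word space": "\u0020",
    "ZWNJ": "\u200C",
    "ZWJ": "\u200D",
    "NNBS": "\u202F",   # assumption: the Unicode character of the same name
}

# Collect code point, official name, and general category for each spacer.
info = {
    label: (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
    for label, ch in SPACERS.items()
}
for label, (code, name, category) in info.items():
    print(f"{label:10} {code}  {name}  ({category})")
```

The two zero-width characters are classed as invisible format controls (category Cf) rather than as spaces (category Zs), which is part of why they are easy to omit and hard to notice when typing.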

The rules regarding the use of ZWNJ versus normal word spacers have to be strictly obeyed in both digital and regular orthography. If they are not followed, then in addition to the problems indicated above, a word could be mistaken for two or more words, or several words could appear as a single compound word. The inconsistency of spacing is a source of parsing problems that lead to many inaccuracies in software technologies such as OCR and machine translation.

(dāruxāne), (pordarāmad).
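The parsing problem can be sketched in Python with a naive whitespace tokenizer. The example word, a verb with the prefix "mi-", is an assumption chosen for this illustration and is not taken from the text above.

```python
ZWNJ = "\u200C"

# Prefix "mi" and stem "xāham", written with explicit code points.
prefix = "\u0645\u06CC"
stem = "\u062E\u0648\u0627\u0647\u0645"

variants = {
    "normal space": prefix + " " + stem,
    "no space": prefix + stem,
    "ZWNJ": prefix + ZWNJ + stem,
}

# str.split() breaks on whitespace only; ZWNJ is not whitespace.
token_counts = {label: len(text.split()) for label, text in variants.items()}
print(token_counts)  # {'normal space': 2, 'no space': 1, 'ZWNJ': 1}

# The two single-token spellings still differ as strings, so a plain
# text search for one will not find the other.
print(variants["ZWNJ"] == variants["no space"])  # False
```

One word thus reaches the software as two tokens or as one of two different strings, depending only on which spacer the typist used.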

A Latin-based alphabet will do away with all these complexities by requiring only the normal word spacer.

18. Mastering the spelling of Perso-Arabic requires practical knowledge of a very large number of complex rules. Almost every spelling rule has one or more exceptions. It is therefore impossible for any one person to know, let alone obey, all these rules. To enumerate most of the Perso-Arabic spelling rules and exceptions requires a 35-page booklet. Such a booklet was created after years of debate by experts, but has produced more questions than solutions [2]. The rest of the items on this page are taken directly from this booklet.
19. The rules governing the use of the "hamze" symbol, borrowed from Arabic, alone take up 5 of the 35 pages.

The regulations that are supposed to manage the use of "hamze" are the most complex set of rules. This symbol can be written in 5 different ways and still take different shapes depending on its location within a word. That is why some words, like مسأله or مسئله (mas'ale-problem) and مسؤول or مسئول (mas'ul-responsible), can be written in 2 different ways. As stated above, "hamze" and "eyn" can sometimes share the same sound, as in (ra's) and (ra'd).

20. Many borrowed Arabic words must follow Arabic spelling, which can differ from Persian rules, adding yet another dimension of complexity to Perso-Arabic spelling.

One example is the round t (pronounced just like t) in (saqatoleslâm), which must be converted to a regular "t" in (zakât) or (meshkât). Another is the letter "yeh" with a short alef inside, which occurs only at the end of a word and is pronounced as "â". Sometimes the short alef is dropped for various reasons. This can cause errors in distinguishing words like (Ali) and (Alā) or (avvali) and (owlā).

Omitting the tashdid symbol can leave a word with two possible pronunciations.

(moayyan) and (moin).

21. The "e'rāb" (the floating short vowel symbols) should be placed where and when needed.
This is not always possible, especially in Internet communication, where they are rarely used. Although they can help the reader pronounce the words, they add to clutter, and several "e'rāb" signs can get mixed up with each other or with the ever-present dots and lines. That is why they are almost never used in books and newspapers set in small print.
22. When "v" occurs before "u", the spelling rules require certain exceptions.
The source of this confusion is that a single letter, "vâv", is used to represent both "v" and "u". Should 2 vâvs be written, or will 1 suffice? The answer is that most of the time both "vâv"s must be written, as in (tāvus-peacock), and at other times only one will do, as in (Dāvud-David).

23. Suffixes may or may not be attached to their preceding word, depending on their first letters and the ending letters of the word they attach to. For instance, the word "arjomand (arj+omand)" must be written attached, while "āb bān" and "barq āsā" must not. Sometimes the suffix "vār" is attached, as in (bozorgvār), "sugvār", "xānevār", and at other times it is not, as in (tuti vār), "ferdowsi vār", or "pari vār".

The logic governing these rules is based purely on aesthetic reasons!

More aesthetically inspired spelling rules:

"-tar" and "-tarin" should be separate except for: behtar, mehtar, kehtar, bištar.
And "-ce" is always written separate except in "ānce", and "cenānce".
The plural sign "-hā" should always be separate except in "inhā" and "ānhā".

24. The prefix "bol", borrowed from the Arabic, can be written in two ways. The simplified Persian way, or according to Arabic spelling rules, but not always. Both (bolhavas, which follows the simplified Persian rule) and (bu'lhavas, which follows the complicated Arabic rule) are correct. On the other hand the simplified (bolqāsem) is incorrect, and has to follow Arabic spelling rules as (bualqāsem).

The prefix "be-" poses a major difficulty. It should be attached to the verb that follows, as in "begoftam", "beravam", "benamāyad", "bexrad", "bejā", etc., but not when the following word is a noun, adverb, or adjective, as in "be sarbordan", "be āvāze boland", "be dast āvard", "be saxti", "be haq".

Other prefix rules with exceptions:

"Bi-" is always written separately except for: "bihude", "bixod", "birāh", "bicāre", "binavā".

"Ham-" should always be attached like in "hamāyeš", "hamsang", "hamsāye" except in words that include a "hamze" like: "ham ārzu", and "ham ārmān".

"Ce-" should be separate other than in "cerā" and "cegune".
25. Combination words must be attached or not. For example, the words: "hoshyār", "sekanjabin", and "behnām" should be conjoined, but "del angiz", "aqab oftādegi", "kam ehsās", have to be written separately because of the occurrence of the letters a, o, or e. Words like "āyin nāme", "pāk kon", "cub bori" must appear separate because of the occurrence of the same letters. The combination words that are repetitions of the same word like, "kam kam", and "tak tak", should not be conjoined. Same with "sangin rangin". Also combination words that are made up of one or more European words combined with Persian words, like "xoš poz", "šik puš" must be written separately. Arabic combo words made up of 2 or more words like "ma' zālek", "men ba'd", "en šā allāh", "ma' hāzā" must also appear separate. Same rule applies to combination words which contain a number, like in "panj tan", "haft gonbad", "hašt behešt", "noh falak". Exceptions to this rule are "yeksuye", "Yekšanbe", "yekšabe", "yeksare" which may be conjoined.
26. Due to the existence of several consonants that share the same sound, or consonants that are exclusive to Arabic but occur in the Perso-Arabic alphabet without their original sounds, there are a huge number of words that can be spelled in more than one way. This lack of standard spelling has contributed to a chaos in spelling such words.

Examples of words with more than one allowed spelling:

āzuqe, āqā, establ, emperātur, belit, boqce, tanbur, tāq, tabar, tapidan, tarāz, tufān, qaltidan, qāti, qobād, howle, luti, estaxr, tāb, toxārestān, tus, tahmāsb (or sometimes tahmāseb), moqān, etc.
27. There is ample confusion surrounding the use of the letter "y" as a buffer, as in "zibāyi (with a y)", or not using it, as in "zibāi (with a hamze)".

A huge number of examples exist, e.g., tanhāyi, Orupāyi, ....
28. A symbol has to be used for the conjunction "-e" where needed.

Again, such a rule has proven impractical for both electronic and print media. The conjunction symbol is used only on rare occasions.

Whether it is "asbsavāri" or "asbe savāri" cannot be deduced without knowing the context in which they occur.
29. The Arabic tanvin is one of the 9 symbols. It can be written with unique characters like "an", "on" or spelled out as "-an" and "-on".

This results in duplicate spelling problems.

[1] See http://students.washington.edu/irina/persianword/zwnj.htm
[2] Excerpts from a booklet published (2000) by the Iranian Academy of Persian Language and Literature on spelling rules and conventions of the current Persian script-(The purpose of this booklet was to bring some order to the chaos of Perso-Arabic spelling rules, but all it reveals, ironically, is the extent of the existing anarchy).
©2003-2005 UniPers.com. All rights reserved.