The nukta ( ़ , U+093C) is a diacritic attached to a few letters in Hindi written in Devanagari. Unicode also encodes the nukta as a separate combining character, so it can be added to almost any letter. This is probably meant to allow a nukta even on letters such as य and व. Hindi does not need these, but they may be needed to represent sounds from other languages, including several unwritten languages for which the Government of India recommends the Devanagari script with the necessary variations.

There are two ways to write the prominent valid nukta-based letters in Hindi. One way is to combine two characters, e.g. क + ़ = क़. The other is to use the single code point character that Unicode provides, क़. The latter has an advantage (and is preferable) because it uses only one code point where the former uses two. Below is the list of all the nukta-based characters used in Hindi, written in both ways.


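Character with separate nukta (two code points) = Character with embedded nukta (one code point)

क + ़ (U+0915 + U+093C) = क़ (U+0958)
ख + ़ (U+0916 + U+093C) = ख़ (U+0959)
ग + ़ (U+0917 + U+093C) = ग़ (U+095A)
ज + ़ (U+091C + U+093C) = ज़ (U+095B)
फ + ़ (U+092B + U+093C) = फ़ (U+095E)
ड + ़ (U+0921 + U+093C) = ड़ (U+095C)
ढ + ़ (U+0922 + U+093C) = ढ़ (U+095D)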
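
The list above can also be inspected programmatically. The following is a minimal Python sketch, not a definitive implementation, that prints the code points of both forms for each letter. The names `EMBEDDED` and `code_points` are made up for this illustration; the code point values are the standard Unicode ones for these letters.

```python
import unicodedata

NUKTA = "\u093c"  # DEVANAGARI SIGN NUKTA

# Base letter -> single-code-point (embedded nukta) letter, in the order of the list.
# Hypothetical helper table for this sketch.
EMBEDDED = {
    "\u0915": "\u0958",  # KA   -> QA
    "\u0916": "\u0959",  # KHA  -> KHHA
    "\u0917": "\u095a",  # GA   -> GHHA
    "\u091c": "\u095b",  # JA   -> ZA
    "\u092b": "\u095e",  # PHA  -> FA
    "\u0921": "\u095c",  # DDA  -> DDDHA
    "\u0922": "\u095d",  # DDHA -> RHA
}


def code_points(text):
    """Return the code points of text in U+XXXX notation, separated by spaces."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


for base, embedded in EMBEDDED.items():
    separate = base + NUKTA  # two code points
    print(f"{code_points(separate)} = {code_points(embedded)} ({unicodedata.name(embedded)})")
```

Both forms render the same on screen; only the code point listing shows which one a piece of text actually contains.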

How to find whether nukta is separate or embedded?
Because the two ways of writing a nukta character look identical, it is impossible to tell with the naked eye whether a character carries a separate nukta or an embedded one. The easiest way to find out is to copy the character and paste it into a plain text editor, preferably Notepad++ or Notepad if you are on a Windows machine (MS Word, RTF editors and OpenOffice do not work for this).
After pasting, move the cursor one position at a time and insert a space. If the cursor lands inside a nukta-based character, the nukta is a separate character, and you will see it come apart from the letter. Another way is to press Ctrl+F in any editor or application and search for the nukta character, i.e. ़ . Every word that matches contains a separately inserted nukta.
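
The same check can be done in code instead of by hand. Below is a small Python sketch, assuming only the seven nukta letters listed earlier matter; `nukta_forms` and `EMBEDDED_LETTERS` are hypothetical names chosen for this example.

```python
NUKTA = "\u093c"  # the separate (combining) nukta character

# The seven single-code-point nukta letters used in Hindi (U+0958 to U+095E).
EMBEDDED_LETTERS = set("\u0958\u0959\u095a\u095b\u095c\u095d\u095e")


def nukta_forms(word):
    """Return the set of nukta encodings found in word.

    The result may contain "separate" (a standalone U+093C is present),
    "embedded" (a single-code-point nukta letter is present), both, or neither.
    """
    forms = set()
    if NUKTA in word:
        forms.add("separate")
    if any(ch in EMBEDDED_LETTERS for ch in word):
        forms.add("embedded")
    return forms


# The word sirf written in both ways, spelled with escapes so the encoding is explicit.
sirf_separate = "\u0938\u093f\u0930\u094d\u092b\u093c"  # ends in PHA + NUKTA
sirf_embedded = "\u0938\u093f\u0930\u094d\u095e"        # ends in FA (U+095E)

print(nukta_forms(sirf_separate))
print(nukta_forms(sirf_embedded))
```

Writing the strings with `\u` escapes avoids any doubt about which form the source file itself contains.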

Rules for a Spell Checker to check spelling errors of Nukta
A spell checker that works from a lexicon of valid Hindi words will often not contain both variants of a word. For example, a lexicon may have the word सिर्फ़ written with the single code point character फ़ (U+095E) but not the same word written with the two characters फ + ़ (U+092B + U+093C). The reverse may also be the case. It is therefore imperative that a rule match the single code point nukta character with the two code point nukta sequence.

The rule can therefore be stated simply: the two forms of each letter shown in the table above must be treated as matching each other.
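
One possible way to implement such a rule is to convert every word to a single form before comparing. The sketch below is an illustration under that assumption, not the method of any particular spell checker; `normalize_nukta`, `build_lexicon` and `is_known` are hypothetical names.

```python
NUKTA = "\u093c"

# Single-code-point nukta letter -> base letter followed by a separate nukta.
TO_SEPARATE = {
    "\u0958": "\u0915" + NUKTA,
    "\u0959": "\u0916" + NUKTA,
    "\u095a": "\u0917" + NUKTA,
    "\u095b": "\u091c" + NUKTA,
    "\u095c": "\u0921" + NUKTA,
    "\u095d": "\u0922" + NUKTA,
    "\u095e": "\u092b" + NUKTA,
}
_TABLE = str.maketrans(TO_SEPARATE)


def normalize_nukta(word):
    """Rewrite every embedded nukta letter as base letter + separate nukta."""
    return word.translate(_TABLE)


def build_lexicon(words):
    """Normalize the lexicon once so that lookups compare like with like."""
    return {normalize_nukta(w) for w in words}


def is_known(word, lexicon):
    """Look up word in a lexicon produced by build_lexicon."""
    return normalize_nukta(word) in lexicon


# The lexicon holds only the single-code-point spelling of sirf.
lexicon = build_lexicon(["\u0938\u093f\u0930\u094d\u095e"])
print(is_known("\u0938\u093f\u0930\u094d\u092b\u093c", lexicon))  # two-code-point spelling
print(is_known("\u0938\u093f\u0930\u094d\u095e", lexicon))        # single-code-point spelling
```

The sketch normalizes towards the two code point form because Unicode's own normalization forms (for example `unicodedata.normalize("NFC", word)` in Python) also decompose these seven letters, so that standard call could be used in place of the hand-written table.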

Examples of nukta-based words that can be used as test cases to check whether the spell checker is working correctly, each to be tested in both its single code point form and its two code point form: काग़ज़, ख़िलाफ़, ग़ज़ल, अल्फ़ाज़, इत्तेफ़ाक़, कर्ज़, फ़र्ज़
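
Because the two spellings cannot be told apart on screen, it may be safer to generate both encodings of each test word in code rather than rely on how the words were typed. The following Python sketch does that for the words above; `PAIRS`, `to_separate`, `to_embedded` and `both_encodings` are hypothetical names used only for this illustration.

```python
NUKTA = "\u093c"

# Base letter -> single-code-point nukta letter.
PAIRS = {
    "\u0915": "\u0958", "\u0916": "\u0959", "\u0917": "\u095a",
    "\u091c": "\u095b", "\u0921": "\u095c", "\u0922": "\u095d",
    "\u092b": "\u095e",
}


def to_separate(word):
    """Rewrite embedded nukta letters as base letter + separate nukta."""
    for base, embedded in PAIRS.items():
        word = word.replace(embedded, base + NUKTA)
    return word


def to_embedded(word):
    """Rewrite base letter + separate nukta as the single-code-point letter."""
    for base, embedded in PAIRS.items():
        word = word.replace(base + NUKTA, embedded)
    return word


def both_encodings(word):
    """Return (separate form, embedded form) whichever way word was typed."""
    separate = to_separate(word)
    return separate, to_embedded(separate)


TEST_WORDS = ["काग़ज़", "ख़िलाफ़", "ग़ज़ल", "अल्फ़ाज़", "इत्तेफ़ाक़", "कर्ज़", "फ़र्ज़"]

for word in TEST_WORDS:
    separate, embedded = both_encodings(word)
    # Both variants should be accepted by a spell checker that applies the rule.
    print(word, len(separate), len(embedded))
```

Each pair can then be fed to the spell checker; if the matching rule is in place, either both variants of a word are accepted or both are rejected.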
