Identifying whether a character is a digit or a Unicode character within a word in Python

I want to check whether a word contains both digits and other characters and, if so, separate the digit part from the character part. I want to do this for Tamil words, e.g. ரூ.100 or ரூ100: I want to separate ரூ. and 100, and ரூ and 100. How do I do it in Python? I tried this:

    for word in 
      for word1, word2, word3 in zip(word,word[1:],word[2:]): 
        if word1 == "ர" and word2 == "ூ " and word3.isdigit(): 
           print word1 
           print word2 
        if word1.decode('utf-8') == unichr(0xbb0) and word2.decode('utf-8') == unichr(0xbc2): 
           print word1
           print word2


You can use the regular expression (.*?)(\d+)(.*), which captures three groups: everything before the digits, the digits themselves, and everything after them:

>>> import re
>>> pattern = ur'(.*?)(\d+)(.*)'
>>> s = u"ரூ.100"
>>> match = re.match(pattern, s, re.UNICODE)
>>> print match.group(1)
ரூ.
>>> print match.group(2)
100

Or, you can unpack matched groups into variables, like this:

>>> s = u"100ஆம்"
>>> match = re.match(pattern, s, re.UNICODE)
>>> before, digits, after = match.groups()
>>> print before

>>> print digits
100
>>> print after
ஆம்
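
One caveat with the examples above: re.match returns None when the word contains no digits, so calling match.groups() would fail. Below is a minimal sketch of a wrapper that handles that case. It assumes Python 3 (where str patterns are Unicode-aware without the re.UNICODE flag), and the name split_around_digits is hypothetical, not part of any library.

```python
import re

# Same three-group pattern as in the answer above. In Python 3, \d already
# matches Unicode decimal digits for str patterns.
PATTERN = re.compile(r"(.*?)(\d+)(.*)")


def split_around_digits(word):
    """Return (before, digits, after), or None if the word has no digits."""
    match = PATTERN.match(word)
    if match is None:
        return None
    return match.groups()


print(split_around_digits("ரூ.100"))  # ('ரூ.', '100', '')
print(split_around_digits("ரூ100"))   # ('ரூ', '100', '')
print(split_around_digits("100ஆம்"))  # ('', '100', 'ஆம்')
print(split_around_digits("ரூ"))      # None
```

Note that only the first run of digits is captured; any later digits end up in the third group.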

Hope that helps.

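Use Unicode properties (Python's built-in re module does not support \p classes; the third-party regex module supports the \p{L} form):

\pL matches a letter in any language
\pN matches a numeric character, digits included, in any language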

In your case it could be:
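
A minimal sketch of the same idea using only the standard library, assuming Python 3: instead of \p classes in a regex, it looks up each character's Unicode general category with unicodedata.category and groups consecutive characters by whether that category is numeric (N*). The names is_number and split_numbers are hypothetical. Grouping on "numeric or not" also sidesteps a pitfall of matching letters alone: in ரூ, the vowel sign ூ (U+0BC2) is a combining mark (category Mc), not a letter.

```python
import unicodedata
from itertools import groupby


def is_number(ch):
    """True if the character's Unicode general category is numeric (Nd, Nl, No)."""
    return unicodedata.category(ch).startswith("N")


def split_numbers(word):
    """Split a word into alternating runs of numeric and non-numeric characters."""
    return ["".join(run) for _, run in groupby(word, key=is_number)]


print(split_numbers("ரூ.100"))  # ['ரூ.', '100']
print(split_numbers("ரூ100"))   # ['ரூ', '100']
print(split_numbers("100ஆம்"))  # ['100', 'ஆம்']
```

Unlike the three-group regular expression, this returns every run, so a word with several digit sequences is split at each boundary.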


