str() method adds whitespace characters

How do you prevent Python from adding whitespace characters when calling str()? I have a screen scraper/web crawler that uses urllib.request, and I'm calling str() on the downloaded content. Here is the code I have.

req = urllib.request.Request(national_url, headers={'User-Agent' : "Magic Browser"})
con = urllib.request.urlopen( req )

#grab html
html =
my_str = str(html)

The problem is that I use regex to parse this HTML for some patterns, and str() adds all the whitespace characters like \n and \t.

My question is: how do I prevent the str() function from adding these additional character literals?


I was using urllib2 before in a Python 2.7 script I wrote. I brought it over to a new PC but started using Python 3.6 on that PC, and these regular expressions no longer worked. I was getting an error when I passed the content to this function, so I wrapped it in a call to str() as shown above. I then noticed that in 3.6 the function added a whole bunch of \t's and \n's. My question is: how do I either make my expressions work, or prevent all of the "character literals", otherwise known as '\t\n'? (I will admit that I'm probably using the wrong term for those characters.)

This was working in Python 2.7. I switched to Python 3.6.

def parse_html_doc(str='', poke_id = 0):
    if len(str) > 0:

        poke = MyClass()
        poke.dex_num ='\d+(?=<\/strong>)', str).group(0) ='[A-Za-z]+(?=<\/h1>)', str).group(0)
        poke.hp = re.search('\d+', re.search('<th>HP<\/th>\s+<td class="num">\d+<\/td>', str).group(0)).group(0)
        poke.atk = re.search('\d+', re.search('<th>Attack<\/th>\s+<td class="num">\d+<\/td>', str).group(0)).group(0)
        poke.bdef = re.search('\d+', re.search('<th>Defense<\/th>\s+<td class="num">\d+<\/td>', str).group(0)).group(0)
        poke.spatk = re.search('\d+', re.search('<th>Sp\. Atk<\/th>\s+<td class="num">\d+<\/td>', str).group(0)).group(0)
        poke.spatk = re.search('\d+', re.search('<th>Sp\. Def<\/th>\s+<td class="num">\d+<\/td>', str).group(0)).group(0)
        poke.spd = re.search('\d+', re.search('<th>Speed<\/th>\s+<td class="num">\d+<\/td>', str).group(0)).group(0)
        poke.des ='<p>.*<\/p>', str).group(0).replace('"', '""') = poke_id
        return poke


str() doesn't create those characters; they were already there.
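
To see where the backslash sequences come from: in Python 3, reading a urlopen() response gives a bytes object, and calling str() on bytes without an encoding returns its printable representation, in which a real newline byte shows up as the two characters \ and n. A minimal sketch, using a made-up byte string in place of the downloaded page and assuming it is UTF-8:

```python
# Hypothetical stand-in for the bytes read from the response
raw = b"<h1>Title</h1>\n\t<p>text</p>"

as_repr = str(raw)              # printable representation of the bytes object
as_text = raw.decode("utf-8")   # real text; UTF-8 is an assumption here

print(as_repr)                  # begins with b' and shows backslash sequences
print("\n" in as_repr)          # False: no real newline, only backslash + n
print("\n" in as_text)          # True: the newline that was in the page
```

So the newline and tab were in the page all along; str() only changed how they are written out.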

If you want to remove \n and other whitespace at the beginning and the end of a string, then you can just use strip():

s = '\n     bla 123\n 1235\n ...\n'
result = s.strip()

> 'bla 123\n 1235\n ...'

Looks like you want to do this:

req = urllib.request.Request(national_url, headers={'User-Agent' : "Magic Browser"})
con = urllib.request.urlopen( req )

#grab html
html =
my_str = str(html).replace("\n", "")
my_str = my_str.replace("\t", "")

This should remove the newline and tab characters from your string.
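
If the goal is to get the regular expressions from the question working again, another option is to decode the downloaded bytes instead of wrapping them in str(), so that \s+ sees real whitespace. This is only a sketch: html_to_text is a hypothetical helper, the sample bytes are made up, and UTF-8 is an assumed encoding that the real page may not use.

```python
import re

def html_to_text(raw, encoding="utf-8"):
    """Decode downloaded bytes into a real str (hypothetical helper).

    The default encoding is an assumption; the page may declare another one.
    """
    return raw.decode(encoding)

# Made-up sample standing in for the downloaded bytes, not the real page
sample = b'<th>HP</th>\n\t<td class="num">45</td>'
pattern = r'<th>HP</th>\s+<td class="num">\d+</td>'

text = html_to_text(sample)
row = re.search(pattern, text)                  # \s+ matches the real "\n\t"
hp = re.search(r'\d+', row.group(0)).group(0)
print(hp)                                       # 45

# With str(sample), \s+ meets a literal backslash and the match fails
print(re.search(pattern, str(sample)))          # None
```

In the code above, that would mean passing the decoded text to the parsing function rather than str(html), assuming html holds the bytes read from the response.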

