Google Search Result Encoding in Chinese

Hi all,

I've been working on an R program for Google search data mining.
So far my code runs well, except for a Traditional Chinese encoding problem. I'm working in a Linux environment...

Google <- function(input)
{
  require(XML)
  require(stringr)
  require(RCurl)

  # GoogleHits() is defined elsewhere (not shown here)
  hits <- GoogleHits(input)

  # request at most 10 pages of 100 results each
  if (hits >= 1000) {
    start.num <- seq(0, 900, 100)
  } else {
    start.num <- seq(0, hits, 100)
  }

  # CA bundle shipped with RCurl, used for the https request
  CAINFO <- paste(system.file(package = "RCurl"), "/CurlSSL/ca-bundle.crt", sep = "")

  for (i in seq_along(start.num)) {
    start <- start.num[i]
    url <- paste("https://www.google.com/search?as_epq=", input,
                 "&as_occt=title&num=100&ie=UTF-8&start=", start, sep = "")

    script <- getURL(url, followlocation = TRUE, cainfo = CAINFO)

    # using htmlParse() to re-organize the whole structure
    # the Chinese encoding shows quite well here
    # (but the result is not a character vector)
    doc <- htmlParse(script)

    # I want to extract the searched keyword,
    # which is tagged as "<b>keyword</b>"
    # here, I take the keyword "統計" as an example
    # here is the problem: str_extract_all() only takes a character vector,
    # so passing the parsed document returns an error
    extract <- str_extract_all(doc, "<b>統計</b>")

    print(extract)
  }
}

So, the problems I encountered are all described in the comments.

1) Without htmlParse(), the extracted data is not displayed as recognizable Chinese characters.

2) If I convert the data into a vector (by applying script <- lapply(url, getURL)), str_extract_all() can be used, but the encoding problem arises...

In addition, by Chinese I mean Traditional Chinese.
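
The second problem can be reproduced without a live request. The following is only a minimal base-R sketch under stated assumptions: the page is a small in-memory stand-in rather than a real Google response, regmatches() and gregexpr() are swapped in for str_extract_all() to avoid extra packages, and it assumes the downloaded bytes really are UTF-8 (which depends on what the server sends). It shows a character vector whose bytes are declared as UTF-8 before the keyword is matched.

```r
# Stand-in for a downloaded page (assumption: not a real Google response).
page <- "<p><b>\u7d71\u8a08</b> text <b>\u7d71\u8a08</b></p>"

# Simulate a raw download: UTF-8 bytes turned back into a string
# that carries no encoding mark.
unmarked <- rawToChar(charToRaw(enc2utf8(page)))

# Declare the bytes as UTF-8 so R treats them as Chinese characters.
Encoding(unmarked) <- "UTF-8"

# Extract every occurrence of the bold keyword from the character vector.
pattern <- "<b>\u7d71\u8a08</b>"
matches <- regmatches(unmarked, gregexpr(pattern, unmarked, fixed = TRUE))[[1]]
print(length(matches))
```

If the server returns the page in another encoding, declaring it as UTF-8 will not help, so this only covers the case where the bytes are UTF-8 but unmarked.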

Any comments or suggestions are truly appreciated!
Thanks in advance.


ANSWERS:


I found the bug myself, so I am answering my own question.

The problem is in the parameters of the URL.

Since the Chinese text is encoded in UTF-8,
both of the following parameters are needed:
oe=UTF-8&ie=UTF-8
Here ie sets the encoding of the query sent to Google, and oe sets the encoding of the results that come back.
(Google's developer website says that oe=UTF-8
does not need to be specified,
which is why I had skipped that part...)
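
A minimal sketch of a request URL that carries both parameters, using base R only. The helper name build_search_url is hypothetical and not part of the original code, and percent-encoding the keyword with URLencode() is an added assumption; the original simply pasted the keyword into the URL.

```r
# Hypothetical helper (not from the original post): builds one search URL.
build_search_url <- function(input, start = 0) {
  # Assumption: percent-encode the keyword from its UTF-8 bytes
  # so the resulting URL contains only ASCII characters.
  keyword <- utils::URLencode(enc2utf8(input), reserved = TRUE)
  paste0("https://www.google.com/search?as_epq=", keyword,
         "&as_occt=title&num=100",
         "&ie=UTF-8",   # encoding of the query being sent
         "&oe=UTF-8",   # encoding requested for the returned page
         "&start=", start)
}

# "\u7d71\u8a08" is the keyword from the question, written with escapes.
url <- build_search_url("\u7d71\u8a08", start = 100)
print(url)
```

The resulting string could then be passed to getURL() in place of the url built inside the loop of Google().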


