Error trying to read a file in scrapy

I am trying to read a file that contains a list of domains to crawl. The results are written to a file (with a filename corresponding to the domain). When I try to run the code with scrapy crawl apple

it throws an error:

def __init__(self)                 
SyntaxError: invalid syntax

domain.txt contains: anything.com hello.com

This is my code.

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import HtmlXPathSelector
from scrapy.item import Item, Field
from scrapy.utils.response import get_base_url
from scrapy.utils.url import urljoin_rfc


class MySpider(CrawlSpider):

    name = 'crawl'
    allowed_domains = []
    start_urls = []

    def __init__(self):
        for line in open('domain.txt', 'r').readlines():
            self.allowed_domains.append(line)
            self.start_urls.append(line)
            rules = [Rule(SgmlLinkExtractor(allow=()), follow=True, callback='parse_item')]
            super(MySpider, self).__init__()


    def parse_item(self, response):
        sel = HtmlXPathSelector(response)
        links = sel.select('//a[contains(@href, "anything")]/@href').extract()
        items = []
        for link in links:
            item = AnythingItem()
            item['reference_link'] = response.url
            yield item

Update 0: I forgot to add the colon after the __init__ function. Sorry. But it still doesn't work.


ANSWERS:


You forgot the colon (:) after the function definition, which is needed to start a new block.

 def __init__(self):

A SyntaxError indicates that the Python interpreter could not parse the line of code you've written. It simply means the line is not valid Python.
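To see the difference for yourself, here is a small sketch that compiles both versions of the definition without running them. The helper name parses and the two source strings are made up for illustration; compile() is the standard built-in.

```python
# Two versions of the same method header: one without the colon, one with it.
broken = "def __init__(self)\n    pass\n"
fixed = "def __init__(self):\n    pass\n"


def parses(source):
    """Return True if Python can parse `source`, False on a SyntaxError."""
    try:
        # compile() only parses and byte-compiles; nothing is executed.
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


print(parses(broken))
print(parses(fixed))
```

The first call hits the same SyntaxError the question shows; the second parses cleanly.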


First, does this code really work?
It should not, since you are effectively setting start_urls = ['http://www.apple.com\n'], which is not a valid link. You should strip the newline character from the end of each line.

Second, are you sure domain.txt contains apple.com?
If so, Scrapy should complain that there is no scheme in the URL.

You probably have http://www.apple.com. In that case, you are doing this:

allowed_domains = ['http://www.apple.com']
start_urls = ['http://www.apple.com']

And you want this:

allowed_domains = ['apple.com']
start_urls = ['http://www.apple.com']
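One way to get from the raw lines to those two lists is sketched below. It is only an illustration: the function name load_targets is hypothetical, it assumes one entry per line, it assumes http:// when a line has no scheme, and it drops a leading www. for allowed_domains. It uses urllib.parse from Python 3; on Python 2 the same urlparse function lives in the urlparse module.

```python
from urllib.parse import urlparse


def load_targets(lines):
    """Split raw lines into (allowed_domains, start_urls).

    Hypothetical helper: each line may be a bare domain or a full URL.
    """
    allowed_domains = []
    start_urls = []
    for raw in lines:
        entry = raw.strip()  # removes the trailing newline
        if not entry:
            continue  # skip blank lines
        # Assumption: entries without a scheme are plain http sites.
        url = entry if "://" in entry else "http://" + entry
        host = urlparse(url).netloc
        # allowed_domains takes bare domains, not URLs.
        if host.startswith("www."):
            host = host[4:]
        allowed_domains.append(host)
        start_urls.append(url)
    return allowed_domains, start_urls


domains, urls = load_targets(["http://www.apple.com\n", "hello.com\n"])
print(domains)
print(urls)
```

In the spider you would call this once on the opened domain.txt inside __init__, assign the two results to self.allowed_domains and self.start_urls, and only then call the parent __init__ (a single time, not once per line).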

