Calling Scrapy from another file without threading

I have to call a crawler from another Python file, for which I use:

from twisted.internet import reactor
from scrapy import signals
from scrapy.crawler import Crawler
from scrapy.utils.project import get_project_settings

def crawl_koovs():
    spider = SomeSpider()
    settings = get_project_settings()
    crawler = Crawler(settings)
    crawler.signals.connect(reactor.stop, signal=signals.spider_closed)

On running this, I get the error:

exceptions.ValueError: signal only works in main thread

The only workaround I could find is to use

which I don't want to use, as I want to call this method multiple times and want the reactor to be stopped before the next call. What can I do to make this work (maybe force the crawler to start in the same 'main' thread)?
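For reference, the ValueError itself comes from CPython's signal module, not from Scrapy: signal handlers can only be installed from the main thread, and Twisted's reactor tries to install them when it starts. A minimal stdlib-only reproduction (no Scrapy involved):

```python
import signal
import threading

# signal.signal() is only allowed from the main thread; Twisted's
# reactor hits the same restriction when it installs its handlers
# at reactor.run().
result = {}

def install_handler():
    try:
        signal.signal(signal.SIGINT, signal.default_int_handler)
        result["error"] = None
    except ValueError as exc:
        result["error"] = str(exc)

worker = threading.Thread(target=install_handler)
worker.start()
worker.join()

print(result["error"])  # e.g. "signal only works in main thread ..."
```

So a fix has to arrange for the reactor to start in the main thread, rather than just suppressing the signal handlers.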


The first thing I would say is that when you're executing Scrapy from an external file, the log level is set to INFO; you should change it to DEBUG to see what's happening if your code doesn't work.

You should change the line to:

log.start(loglevel=log.DEBUG)
To store everything in the log and generate a text file (for debugging purposes) you can do:

log.start(logfile="file.log", loglevel=log.DEBUG, crawler=crawler, logstdout=False)

About the signals issue: with the log level changed to DEBUG, maybe you can see some output that helps you fix it. You can also try to put your script into the Scrapy project folder to see if it still crashes.
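If it isn't obvious where the script ends up off the main thread, a small guard (an illustrative helper, not part of Scrapy's API) fails fast with a clearer message than the ValueError raised deep inside reactor.run():

```python
import threading

def ensure_main_thread():
    # Twisted installs signal handlers when the reactor starts, and
    # CPython only allows that from the main thread.
    if threading.current_thread() is not threading.main_thread():
        raise RuntimeError("start the crawler from the main thread")

# Call this right before crawler.start() / reactor.run().
ensure_main_thread()
```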

If you change the line:

crawler.signals.connect(reactor.stop, signal=signals.spider_closed)

to:

dispatcher.connect(reactor.stop, signals.spider_closed)

(where dispatcher comes from `from scrapy.xlib.pydispatch import dispatcher`), what does it say?

Depending on your Scrapy version, it may be deprecated.
