{"id":7834,"date":"2015-10-31T01:25:48","date_gmt":"2015-10-31T01:25:48","guid":{"rendered":"https:\/\/unknownerror.org\/index.php\/2015\/10\/31\/how-can-i-use-different-pipelines-for-different-spiders-in-a-single-scrapy-project-open-source-projects-scrapy-scrapy\/"},"modified":"2015-10-31T01:25:48","modified_gmt":"2015-10-31T01:25:48","slug":"how-can-i-use-different-pipelines-for-different-spiders-in-a-single-scrapy-project-open-source-projects-scrapy-scrapy","status":"publish","type":"post","link":"https:\/\/unknownerror.org\/index.php\/2015\/10\/31\/how-can-i-use-different-pipelines-for-different-spiders-in-a-single-scrapy-project-open-source-projects-scrapy-scrapy\/","title":{"rendered":"How can I use different pipelines for different spiders in a single Scrapy project-open source projects scrapy\/scrapy"},"content":{"rendered":"<p>Building on the solution from Pablo Hoffman, you can apply the following decorator to the <code>process_item<\/code> method of a Pipeline class so that the step runs only if that class is listed in the spider's <code>pipeline<\/code> attribute; otherwise the item is passed through untouched. 
For example:<\/p>\n<pre><code>import functools\nimport logging\n\ndef check_spider_pipeline(process_item_method):\n\n    @functools.wraps(process_item_method)\n    def wrapper(self, item, spider):\n\n        # message template for debugging\n        msg = '%%s %s pipeline step' % (self.__class__.__name__,)\n\n        # if class is in the spider's pipeline, then use the\n        # process_item method normally.\n        if self.__class__ in spider.pipeline:\n            spider.log(msg % 'executing', level=logging.DEBUG)\n            return process_item_method(self, item, spider)\n\n        # otherwise, just return the untouched item (skip this step in\n        # the pipeline)\n        else:\n            spider.log(msg % 'skipping', level=logging.DEBUG)\n            return item\n\n    return wrapper\n<\/code><\/pre>\n<p>For this decorator to work correctly, the spider must have a <code>pipeline<\/code> attribute holding a container of the Pipeline classes that should process its items, for example:<\/p>\n<pre><code>class MySpider(BaseSpider):\n\n    pipeline = set([\n        pipelines.Save,\n        pipelines.Validate,\n    ])\n\n    def parse(self, response):\n        # insert scrapy goodness here\n        return item\n<\/code><\/pre>\n<p>And then in a <code>pipelines.py<\/code> file:<\/p>\n<pre><code># check_spider_pipeline is the decorator defined above\n\nclass Save(object):\n\n    @check_spider_pipeline\n    def process_item(self, item, spider):\n        # do saving here\n        return item\n\nclass Validate(object):\n\n    @check_spider_pipeline\n    def process_item(self, item, spider):\n        # do validating here\n        return item\n<\/code><\/pre>\n<p>All Pipeline classes must still be listed in <code>ITEM_PIPELINES<\/code> in the settings, in the correct order. The decorator only skips steps and does not reorder them; it would be nice if the order could be specified on the spider, too.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Building on the solution from Pablo Hoffman, you can use the following decorator on the process_item method of a Pipeline object so that it checks the pipeline attribute of 
your spider for whether or not it should be executed. For example: def check_spider_pipeline(process_item_method): @functools.wraps(process_item_method) def wrapper(self, item, spider): # message template for debugging msg = [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-7834","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/posts\/7834","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/comments?post=7834"}],"version-history":[{"count":0,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/posts\/7834\/revisions"}],"wp:attachment":[{"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/media?parent=7834"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/categories?post=7834"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/tags?post=7834"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}