Python Scrapy: How to get CSVItemExporter to write columns in a specific order

gvb

This is related to the question "Modifiying CSV export in scrapy".

The problem is that the exporter is instantiated without any keyword arguments, so settings such as EXPORT_FIELDS are ignored. The solution is the same as in that question: subclass the CSV item exporter so that it supplies those keyword arguments itself.
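
To make the failure mode concrete, here is a minimal sketch in plain Python with no Scrapy dependency. Exporter, build_exporter and SETTINGS are hypothetical stand-ins, not Scrapy's API: the factory passes only the file, so the base class keeps its defaults, while the subclass fills the keyword arguments in from the settings before calling the parent constructor.

```python
class Exporter:
    """Hypothetical stand-in for an item exporter with optional keyword options."""

    def __init__(self, file, fields_to_export=None, encoding="utf-8"):
        self.file = file
        self.fields_to_export = fields_to_export  # None means "export every field"
        self.encoding = encoding


def build_exporter(cls, file):
    # Mimics the problem: the caller passes only the file, never any kwargs.
    return cls(file)


# Hypothetical settings, standing in for the project's settings.py.
SETTINGS = {"EXPORT_FIELDS": ["field1", "field2", "field3"], "EXPORT_ENCODING": "latin-1"}


class KwExporter(Exporter):
    def __init__(self, *args, **kwargs):
        # Fill in the options the caller never supplies, then defer to the parent.
        kwargs["fields_to_export"] = SETTINGS.get("EXPORT_FIELDS") or None
        kwargs["encoding"] = SETTINGS.get("EXPORT_ENCODING", "utf-8")
        super().__init__(*args, **kwargs)


plain = build_exporter(Exporter, file=None)
custom = build_exporter(KwExporter, file=None)
print(plain.fields_to_export)   # None: the settings were never consulted
print(custom.fields_to_export)  # ['field1', 'field2', 'field3']
```

The caller does not change at all; only the class it is handed does, which is why registering a subclass is enough.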

Following that recipe, I created a new file xyzzy/feedexport.py (replace “xyzzy” with the name of your Scrapy project):

"""
The standard CSVItemExporter class does not pass the kwargs through to the
CSV writer, resulting in EXPORT_FIELDS and EXPORT_ENCODING being ignored
(EXPORT_EMPTY is not used by CSV).
"""

from scrapy.conf import settings
from scrapy.contrib.exporter import CsvItemExporter

class CSVkwItemExporter(CsvItemExporter):

    def __init__(self, *args, **kwargs):
        kwargs['fields_to_export'] = settings.getlist('EXPORT_FIELDS') or None
        kwargs['encoding'] = settings.get('EXPORT_ENCODING', 'utf-8')

        super(CSVkwItemExporter, self).__init__(*args, **kwargs)
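
The "or None" on the fields line matters. As I understand Scrapy's Settings.getlist, it splits a comma-separated string on commas, returns a copy of a list value, and returns an empty list when the setting is missing; "or None" turns that empty list into None, which the exporter treats as "export all fields". The getlist helper below is a hypothetical stand-in that mimics this on a plain dict, not Scrapy's own code.

```python
def getlist(settings, name, default=None):
    """Hypothetical stand-in mimicking how Scrapy's Settings.getlist is understood to behave."""
    value = settings.get(name, default or [])
    if isinstance(value, str):
        # A string such as "field1,field2" (e.g. passed on the command line) is split.
        value = value.split(",")
    return list(value)


# Comma-separated string: becomes an ordered list of field names.
fields = getlist({"EXPORT_FIELDS": "field1,field2,field3"}, "EXPORT_FIELDS") or None
# Setting absent: empty list, which "or None" turns into None (export everything).
missing = getlist({}, "EXPORT_FIELDS") or None
print(fields)   # ['field1', 'field2', 'field3']
print(missing)  # None
```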

Then I registered the exporter in xyzzy/settings.py:

FEED_EXPORTERS = {
    'csv': 'xyzzy.feedexport.CSVkwItemExporter'
}
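
The FEED_EXPORTERS value is just a dotted import path. Scrapy resolves such paths with its own helper (scrapy.utils.misc.load_object); the function below is a simplified, stdlib-only sketch of that idea, not Scrapy's implementation. It uses collections.OrderedDict as a stand-in target because the xyzzy package only exists inside your project.

```python
from importlib import import_module


def load_object(path):
    """Simplified sketch: resolve 'package.module.ClassName' to the object it names."""
    module_path, _, name = path.rpartition(".")
    if not module_path:
        raise ValueError(f"Not a full dotted path: {path!r}")
    module = import_module(module_path)  # e.g. imports xyzzy.feedexport
    return getattr(module, name)         # e.g. fetches CSVkwItemExporter


# Stand-in for {'csv': 'xyzzy.feedexport.CSVkwItemExporter'}.
FEED_EXPORTERS = {"csv": "collections.OrderedDict"}
exporter_cls = load_object(FEED_EXPORTERS["csv"])
print(exporter_cls.__name__)  # OrderedDict
```

A typo in the path therefore surfaces as an ImportError or AttributeError when the feed exporter starts, not when settings.py is loaded.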

Now the CSV exporter will honor the EXPORT_FIELDS setting, so add that to xyzzy/settings.py as well:

# Listing the fields explicitly makes the CSV export write the columns
# in this order instead of an arbitrary one.
EXPORT_FIELDS = [
    'field1',
    'field2',
    'field3',
]
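
Why an explicit field list fixes the column order can be shown with the standard library alone. This sketch swaps in csv.DictWriter for Scrapy's exporter, purely for illustration: the fieldnames list decides the column order no matter how each item's keys are ordered, restval="" fills in missing fields, and extrasaction="ignore" drops fields that are not listed (an assumed policy; adjust to taste).

```python
import csv
import io

EXPORT_FIELDS = ["field1", "field2", "field3"]


def write_rows(items, fields):
    """Write dict items as CSV with columns in exactly the order given by fields."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=fields,
        restval="",              # value used when an item lacks a field
        extrasaction="ignore",   # silently skip keys not in fields
        lineterminator="\n",
    )
    writer.writeheader()
    for item in items:
        writer.writerow(item)
    return buf.getvalue()


# Keys deliberately out of order, and the second item is missing field3.
items = [{"field3": "c", "field1": "a", "field2": "b"}, {"field2": "y", "field1": "x"}]
print(write_rows(items, EXPORT_FIELDS))
```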