Memory error when using pandas read_csv (pydata/pandas)
I am trying to do something fairly simple, reading a large csv file into a pandas dataframe.
This is what I am using to do this:
data = pandas.read_csv(filepath, header=0, sep=DELIMITER, skiprows=2)
The code behaves quite erratically: it either fails with a memory error (detailed error message in the P.S.), or simply never finishes (memory usage in Task Manager stopped at 506 MB, and after 5 minutes with no change and no CPU activity in the process I killed it).
I am using pandas version 0.11.0. I am aware that there used to be a memory problem with the file parser, but according to http://wesmckinney.com/blog/?p=543 it should have been fixed. The file I am trying to read is 366 MB; the code above works if I cut the file down to something small (25 MB). I have also had a pop-up telling me that it can’t write to address 0x1e0baf93…
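One workaround I may try is reading the file in chunks and concatenating, which should bound peak memory. A sketch (the tiny in-memory CSV, separator, and chunk size here are placeholders standing in for my real file and `DELIMITER`):

```python
import io
import pandas

# Small stand-in for the real 366 MB file: two junk lines, a header, 10 data rows.
csv_text = "skip1\nskip2\na;b\n" + "\n".join(f"{i};{i * 2}" for i in range(10))

# Read in chunks (chunksize rows at a time) instead of one big allocation,
# then concatenate the pieces into a single DataFrame.
chunks = pandas.read_csv(io.StringIO(csv_text), header=0, sep=";",
                         skiprows=2, chunksize=4)
data = pandas.concat(chunks, ignore_index=True)
print(len(data))  # 10 rows reassembled from 3 chunks
```

Whether this avoids the crash on the full file is exactly what I am unsure about.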
I am running the code in debug mode in Visual Studio, using Anaconda and PTVS (step-by-step debugging, F5).
A bit of background: I am trying to convince people that Python can do the same as R. To that end, I am trying to replicate an R script that does
data