
Read CSV Files in Pandas with Apache Arrow: 5x Faster!



5 Tips for Reading Big CSV Files in Python with Pandas and pyarrow

In this video, we’ll see how you can efficiently read large CSV files in Python with pandas and the pyarrow library. Pandas 2.0 added support for Apache Arrow and the pyarrow library.

I compare the C engine with the pyarrow engine for the read_csv function in pandas, and I also compare using the three dtype-backends: numpy, numpy_nullable, and pyarrow.

In my example, a few tweaks cut the time to read my sample 4 GB file from 28 seconds down to 6 seconds. Finally, I show how to use the Parquet file format and how this can further decrease the reading time to less than 1 second.


For written instructions and additional details, check out my blog post:

Chapters:
00:00 Intro
00:40 Apache Arrow
02:12 Large CSV file
03:10 Tip 1: Keep your CSV files…
