Using sqlite to query CSVs

CSV is a fun format, but more importantly—it's ubiquitous. Almost every system known to man can export to this format, including petulant legacy databases like InterSystems Caché (if you know, you know).

So when I had a recent project that spat out a 28MB CSV file, I was unsurprised to see that my IDE choked while trying to do anything useful with it.

I needed to analyze the data because I was mapping and migrating that data to a new database with a new format, which included cleaning the existing data as it came into the new system.

A brief internet search informed me that I could use sqlite for this task!

You can load the CSV in-memory to perform quick one-liners on it, such as this:

sqlite3 :memory: -cmd \
   '.import -csv members.csv members' \
   'SELECT DISTINCT MembershipType FROM members'

You can perform any sort of SQL operations on your CSV:

# Duplicate emails, anyone?
sqlite3 :memory: -cmd \
   '.import -csv members.csv members' \
   'SELECT email, COUNT(*) as count FROM members GROUP BY email HAVING COUNT(*) > 1'

Of course, it's sqlite so it's pretty damn fast (pending the specs of your machine of course).

The utility of just being able to write SQL to query and work with a CSV cannot be overstated.