SQL queries on JSON files with DuckDB
A long time ago, I was impressed with harelba/q and how it made it easy to run SQL queries against CSV files. To my surprise, I recently learned that one can do the same with DuckDB, running SQL queries against JSON. There’s no need to remember the not-so-intuitive jq
syntax, just plain old SQL:
$ wget -O repos.json https://api.github.com/users/myhro/repos
$ duckdb -c "SELECT language, count(*) FROM 'repos.json' WHERE language IS NOT NULL GROUP BY 1 ORDER BY count_star() DESC"
┌──────────┬──────────────┐
│ language │ count_star() │
│ varchar │ int64 │
├──────────┼──────────────┤
│ Python │ 8 │
│ Shell │ 5 │
│ C │ 3 │
│ Ruby │ 2 │
│ HTML │ 2 │
│ Makefile │ 1 │
│ Go │ 1 │
│ Lua │ 1 │
│ PHP │ 1 │
│ CSS │ 1 │
├──────────┴──────────────┤
│ 10 rows 2 columns │
└─────────────────────────┘