Normalizing Repeated JSON Fields in FDA Drug Data Using DuckDB
A few weeks ago, I started digging into the FDA’s drug event dataset, curious to see what insights I could uncover. It didn’t take long to hit a wall. Healthcare datasets like this one come with a hidden performance cost: moderate- to high-cardinality nested fields buried in JSON. Take pharm_class_epc as an example: each record can …