David J. Lee

Research Engineer at Scale • GitHub / LinkedIn / Email • New York, NY

I'm an incoming ML research engineer on the MLDG team at Scale.

Prior to Scale, I was a graduate student at Cornell, where I worked with Kevin Ellis on code generation, focusing on synthetic data generation for LLM coding applications. In particular, I designed algorithms to generate diverse training datasets for code generation models, using ideas from novelty search. I also worked on probabilistic program synthesis for the ARC-AGI benchmark.

As an undergrad at Williams, I worked on adaptive quotient filters, concurrent program analysis, and knot theory. My thesis was advised by Shikha Singh and Sam McCauley.