Get in touch at email@example.com
Health data pipeline
HealthRex, Stanford Department of Medicine (2020 - 2021)
HealthRex's goal is to build a better clinical decision support system by training models on as much real clinical data as possible. Health data is notoriously difficult to obtain, but nascent techniques like multi-party computation (MPC) make it possible to train a model without direct access to the data used to train it.
As part of a larger experimental MPC platform, I built a Python package for transforming medical data into a format usable by models. The package lets researchers declare what data they want and where it comes from, and leaves the data importer to worry about how to get it. It also ensures that multiple researchers running it in tandem end up with consistently formatted data. I deployed it along with the MPC-based ML framework on a few Google Cloud virtual machines, kicked the tires, and wrote up documentation for the researchers.
Great Firewall Watch: censorship monitoring
Firewall Cafe (2020 - 2021)
Firewall Cafe began as a gallery installation that let people run the same image search on Baidu and Google simultaneously as a way of exploring censorship of the results. Every search, along with the images it turns up, is saved into a dataset. But a dataset collected systematically, tracking terms of interest over time, could be more useful to researchers and journalists looking into censorship. So I built Great Firewall Watch, which runs searches at a regular interval on a specified set of terms. I prototyped the process in a series of notebooks and built a portal for researchers to browse its results.
Naturalization data + analysis
FiveThirtyEight article, freelance data journalism (2020)
I was a contributing researcher to an article that appeared in FiveThirtyEight on COVID's effects on naturalized citizen voting, and published the data I collected as a GitHub repository. I created the dataset itself with a series of Jupyter notebooks, outputting to Excel for easier analysis. We also looked at what, if anything, we could confidently say about how the closures of naturalization offices (and the consequent prevention of immigrants becoming US citizens) might plausibly affect the outcome of elections in states and districts. But we came to realize that we couldn't say anything conclusive given the data we had.
Theater recommendation systems
Thespie (2019 - 2020)
We wanted to sort shows on Thespie by popularity, based on how often they were being clicked on. So I built a recommendation system on an AWS stack, with a Lambda invoked daily to aggregate the user analytics per show and cache the results in the API's database. I also built a similarity ranking system, which we used to recommend shows similar to the one the user is browsing. It ran as a daily Django job that computed a cosine similarity matrix based on genre tags. Thespie has hundreds of these genre tags, which meant that the system could recommend fairly specific types of shows.
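To illustrate the idea behind the similarity ranking, here is a minimal sketch in plain Python. It is not Thespie's actual code: the show titles, tags, and function names are all hypothetical, and it assumes each show is treated as a binary vector over the genre tags, so the cosine similarity reduces to the number of shared tags divided by the geometric mean of the two tag counts.

```python
import math

# Hypothetical catalogue: show title -> set of genre tags (illustrative data only).
SHOWS = {
    "Show A": {"musical", "comedy", "family"},
    "Show B": {"musical", "comedy"},
    "Show C": {"drama", "historical"},
}


def cosine_similarity(tags_a, tags_b):
    """Cosine similarity of two shows viewed as binary genre-tag vectors."""
    if not tags_a or not tags_b:
        return 0.0  # a show with no tags is similar to nothing
    shared = len(tags_a & tags_b)
    return shared / math.sqrt(len(tags_a) * len(tags_b))


def similarity_matrix(shows):
    """Pairwise similarities as a nested dict: matrix[a][b] -> score."""
    titles = sorted(shows)
    return {
        a: {b: cosine_similarity(shows[a], shows[b]) for b in titles}
        for a in titles
    }


def most_similar(shows, title, limit=5):
    """Titles most similar to `title`, best first, skipping zero-score shows."""
    scores = similarity_matrix(shows)[title]
    ranked = sorted(
        ((score, other) for other, score in scores.items() if other != title),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return [other for score, other in ranked if score > 0][:limit]
```

With hundreds of tags, two shows only score highly when they share several fairly specific tags, which is what makes the recommendations specific. In a daily job, the matrix would be computed once and cached rather than rebuilt per request as `most_similar` does here.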
Mapping the metabolome
Rheos Medicines (2018)
Tests on the human metabolome create lots of data about tens of thousands of metabolites, which means we want to visualize it somehow. Rheos Rx approached me to design and build a version of ReconMap, which they could use to visualize information from metabolomics tests they were performing.
Drug discovery dashboard
NuMedii (2017 - 2018)
Scientists at NuMedii needed a way to explore relationships between drugs, genes, and diseases at a high level, informed by NuMedii's drug enrichment analysis software. Testing drugs for treating diseases is expensive and time-consuming, and existing drugs that have already undergone some testing can potentially be repurposed for treating another disease.
Evergreens of Washington map
personal project (2016)
This map project is a love letter both to D3's wonderful mapping tools and to the biodiversity of the Pacific Northwest. Read the longer making-of here, but in short, I built a dashboard that uses tree range maps to indicate the ecosystems where various evergreen species grow in Washington state.
Nanocrafter, a synthetic biology game
Center for Game Science, University of Washington (2012 - 2015)
I was part of a small team of research engineers at the Center for Game Science that built Nanocrafter, a game funded by DARPA and the National Science Foundation that challenged players to build machines out of RNA to solve puzzles. I also spearheaded the design of its peer scoring, which later won the game an award at the Serious Games Showcase and Challenge. Sadly, the game was shut down a few years ago, but it is hosted by the Internet Archive.