Research-oriented data science
I am Wei Dai, a statistician and data scientist with experience in feature selection, mixture models, stochastic simulation optimization, and applied work on electronic health records. This site brings together selected projects, lightweight research notes, and interactive explainers built to clarify difficult ideas.
Selected work
The project list is intentionally compact: one public package, one applied data artifact, and one durable archive for reusable research materials.
subsampwinner: feature selection tooling built around the Subsampling Winner Algorithm, with an emphasis on stability under repeated resampling.
A compact walkthrough of data preprocessing decisions for heart-transplant studies, focused on cohort construction and modeling-ready tables.
A public repository for reading notes, slide materials, and small teaching artifacts across statistics and machine learning.
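The stability idea behind the first project can be illustrated with a small sketch: score features on repeated random subsamples and record how often each one "wins" a place in the top set. This is a generic selection-frequency proxy under an assumed per-feature correlation score, not the published Subsampling Winner Algorithm itself; the function name and parameters are hypothetical.

```python
import numpy as np

def selection_frequencies(X, y, n_subsamples=200, frac=0.5, top_k=5, seed=0):
    """Hypothetical sketch of subsample-and-vote feature selection:
    on each subsample, rank features by a simple score and count how
    often each lands in the top-k. High frequency = stable winner."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        # Draw a random subsample of the rows without replacement.
        idx = rng.choice(n, size=int(frac * n), replace=False)
        Xs, ys = X[idx], y[idx]
        # Absolute correlation with the response as a stand-in score;
        # the real algorithm's scoring rule may differ.
        scores = np.abs([np.corrcoef(Xs[:, j], ys)[0, 1] for j in range(p)])
        counts[np.argsort(scores)[-top_k:]] += 1
    return counts / n_subsamples
```

Features whose frequency stays near 1.0 across many subsamples are the ones whose selection does not hinge on a particular draw of the data.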
Research map
I tend to return to the same questions across different projects: how assumptions are encoded, how uncertainty is surfaced, and how workflows stay usable once they leave the whiteboard.
Kernel choice, resampling behavior, and selection rules all encode preferences that should be inspectable rather than implicit.
EHR and observational settings demand careful preprocessing, explicit limitations, and tools that respect imperfect measurement.
Explanatory software should do more than execute an algorithm. It should help others see what the algorithm is assuming and where it can fail.
Interactive lab
The lab is where I turn abstract modeling ideas into compact visual tools. The first two demos focus on kernels and Gaussian-process posteriors because they reward visual exploration and benefit from direct manipulation.
Interactive explainer
Compare how different kernels change covariance structure and prior sample paths before fitting any data.
Kernels are often introduced abstractly, but the modeling consequences become clearer once you can see the covariance matrix and sampled functions change together.
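What the explainer animates can be sketched in a few lines: build the covariance matrix a kernel induces on a grid and draw prior sample paths from it. A minimal sketch with two standard kernels (squared-exponential and Matérn-1/2); the function names and defaults are illustrative, not the demo's actual code.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel: very smooth sample paths."""
    d2 = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def matern12_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Matern-1/2 (exponential) kernel: rough, non-differentiable paths."""
    d = np.abs(x1[:, None] - x2[None, :])
    return variance * np.exp(-d / lengthscale)

def sample_prior(kernel, x, n_samples=3, jitter=1e-8, seed=0):
    """Draw GP prior sample paths f ~ N(0, K(x, x)) on a grid x."""
    K = kernel(x, x) + jitter * np.eye(len(x))  # jitter for numerical PSD
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(np.zeros(len(x)), K, size=n_samples)
```

Plotting `sample_prior(rbf_kernel, x)` next to `sample_prior(matern12_kernel, x)` on the same grid makes the point of the demo concrete: the kernel alone, before any data, already fixes how wiggly plausible functions are.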
Interactive explainer
Place noisy observations directly on the plot and watch the posterior mean and uncertainty band update in real time.
Posterior intuition is easiest to build when the model reacts immediately to new observations and hyperparameter changes.
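The update the demo performs on every click is the closed-form GP regression posterior. A minimal sketch assuming a zero prior mean, Gaussian observation noise, and an illustrative RBF kernel; the real explainer's internals may be organized differently.

```python
import numpy as np

def rbf(x1, x2, lengthscale=1.0):
    """Illustrative squared-exponential kernel."""
    return np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2 / lengthscale**2)

def gp_posterior(x_train, y_train, x_test, kernel, noise=0.1, jitter=1e-8):
    """Posterior mean and pointwise variance of a zero-mean GP
    given noisy observations y = f(x) + eps, eps ~ N(0, noise^2)."""
    K = kernel(x_train, x_train) + (noise**2 + jitter) * np.eye(len(x_train))
    K_s = kernel(x_train, x_test)
    K_ss = kernel(x_test, x_test)
    # Cholesky solve instead of an explicit inverse, for stability.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, K_s)
    mean = K_s.T @ alpha
    var = np.diag(K_ss) - np.sum(v**2, axis=0)
    return mean, var
```

Because both quantities are closed-form, each new observation or hyperparameter change only costs one small linear solve, which is what makes the real-time interaction in the demo feasible.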
Open to new collaborations