RESEARCH STATEMENT
Susan Wei (susanwe@live.unc.edu)
Ninety percent of the data stored in the world today has been created in the past two years. This data revolution, which some have dubbed Big Data, has opened up many exciting areas in statistics. The challenge of Big Data is not simply that it is big. While classical statistics deals with lists of numbers, modern data comes to us as texts, videos, images, and tweets. My research develops statistical methodology that brings together techniques from machine learning, high-dimensional low-sample-size asymptotics, and empirical processes to tackle the challenges presented by modern data types. Below I discuss my dissertation research and ongoing work, followed by an outline of my longer-term agenda.
http://www.swei.web.unc.edu/
and estimation of treatment effect heterogeneity in personalized medicine. This is the common phenomenon in medicine where the same treatment can have different effects on different patients. To discover subgroups of patients who respond differently to treatment, I considered a model in which a hyperplane in the covariate space separates the subjects into two groups with different treatment responses (Wei and Kosorok 2013c). For example, suppose the outcome of interest is survival and the covariates are gene expressions. The method I developed can be applied to discover a classification rule based on gene expression that divides the population into two subgroups for whom treatment has different effects on survival. The hyperplane gives a clear classification rule that can be used in future studies. Furthermore, the coefficients of the linear classifier roughly indicate each gene's importance.

The work on discovering and estimating treatment effect heterogeneity is an extension of a machine learning concept I developed called Latent Supervised Learning (Wei and Kosorok 2013a). Latent Supervised Learning is the task of inferring a binary classification rule from input data where only a continuous surrogate variable is available for training. This type of learning falls in the middle of the machine learning spectrum that has clustering (unsupervised learning) at one end and classification (supervised learning) at the other. I first developed Latent Supervised Learning in the context of continuous surrogate variables that are Gaussian distributed (Wei and Kosorok 2013a). A follow-up paper extends the original methodology to the context of censored survival outcomes (Wei and Kosorok 2013b). The method for estimating treatment effect heterogeneity builds upon the tools used in these previous papers and further extends the possible distributions of the outcome variable to include any distribution in the exponential family.
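As a rough illustration of the latent hyperplane idea described above, the following Python sketch simulates subjects whose unobserved group is determined by which side of a hyperplane their covariates fall on, and whose observed continuous surrogate is Gaussian with a group-specific mean. It then recovers a separating rule using only the covariates and the surrogate. The grid search over directions and thresholds is a naive stand-in chosen for brevity, not the estimation procedure of Wei and Kosorok (2013a). The function names, the two-dimensional Gaussian covariates, and all parameter values are assumptions made for this example.

```python
import math
import random


def simulate(n, w, gamma, mu0, mu1, sigma, rng):
    """Draw 2-D covariates and a Gaussian surrogate whose mean depends on
    the (latent) side of the hyperplane w.x = gamma."""
    xs, ys, groups = [], [], []
    for _ in range(n):
        x = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        g = 1 if w[0] * x[0] + w[1] * x[1] >= gamma else 0
        xs.append(x)
        ys.append(rng.gauss(mu1 if g else mu0, sigma))
        groups.append(g)  # kept only to check the fit; never used for fitting
    return xs, ys, groups


def fit_hyperplane(xs, ys, n_angles=90, min_group=5):
    """Naive grid search (an assumption of this sketch): choose the unit
    direction and threshold whose two-group split minimizes the
    within-group sum of squares of the surrogate."""
    n = len(ys)
    total_s = sum(ys)
    total_q = sum(y * y for y in ys)
    best_ss, best_w, best_gamma = math.inf, None, None
    for k in range(n_angles):
        # Half a circle suffices: w and -w induce the same partition.
        theta = math.pi * k / n_angles
        w = (math.cos(theta), math.sin(theta))
        proj = [w[0] * x[0] + w[1] * x[1] for x in xs]
        order = sorted(range(n), key=lambda i: proj[i])
        s = q = 0.0
        for c in range(1, n):  # c = size of the low-projection group
            y = ys[order[c - 1]]
            s += y
            q += y * y
            if c < min_group or n - c < min_group:
                continue
            ss = (q - s * s / c) + ((total_q - q) - (total_s - s) ** 2 / (n - c))
            if ss < best_ss:
                best_ss, best_w = ss, w
                # Place the threshold midway between neighboring projections.
                best_gamma = 0.5 * (proj[order[c - 1]] + proj[order[c]])
    return best_w, best_gamma


rng = random.Random(0)
true_w, true_gamma = (math.cos(0.6), math.sin(0.6)), 0.2
xs, ys, groups = simulate(300, true_w, true_gamma, 0.0, 3.0, 1.0, rng)
w_hat, gamma_hat = fit_hyperplane(xs, ys)

pred = [1 if w_hat[0] * x[0] + w_hat[1] * x[1] >= gamma_hat else 0 for x in xs]
agree = sum(p == g for p, g in zip(pred, groups)) / len(groups)
accuracy = max(agree, 1.0 - agree)  # group labels are only defined up to a swap
print("estimated direction:", w_hat, "threshold:", gamma_hat)
print("agreement with latent groups:", accuracy)
```

The latent group labels are generated but never passed to the fitting step, which mirrors the setting in which only a continuous surrogate is available for training. An exhaustive search of this kind is feasible only in very low dimensions.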
Tools from empirical processes provide an elegant approach for studying the theoretical properties of general machine learning objects. Despite their wide applicability, there remains a high barrier to entry to this theoretical branch of statistics. In my work on Latent Supervised Learning, these tools have proved crucial in establishing the consistency of various estimators. Work on establishing weak convergence results is currently in progress. Extending Latent Supervised Learning to ultra-high-dimensional settings is another future research goal.
Long-term goals
My long-term research goals, beyond continuing to work on the aforementioned extensions and applications, are to develop statistical models for modern data types. Thus far statisticians have been very successful at modeling entities on the real line. However, salient models for modern data types, such as medical images, are still lacking. This is not surprising given that we have few distributions to describe even multivariate data besides the multivariate Gaussian distribution. My plan is to invent new paradigms of statistical modeling, either mathematical or simulated in nature, that can be used for a wide range of modern scientific inquiries.

Another long-term goal is to contribute to the advancement of statistics by elevating the statistical rigor of scientific applications. I will accomplish this through better statistical outreach and dissemination of research results. For the former, I will offer consulting not only on statistical data analysis but also on how to use statistics to plan studies. For the latter, I will focus on developing good strategies for data management and study reproducibility in my collaborations. This includes making my software freely available and publishing all code needed to reproduce results.
References
Bradford, Blair U, Lock, Eric F, Kosyk, Oksana, Kim, Sungkyoon, Uehara, Takeki, Harbourt, David, DeSimone, Michelle, Threadgill, David W, Tryndyak, Volodymyr, Pogribny, Igor P, Bleyle, Lisa, Koop, Dennis R, and Rusyn, Ivan (2011). Interstrain differences in the liver effects of trichloroethylene in a multistrain panel of inbred mice. Toxicological Sciences 120 (1), pp. 206-17.
Clement, Meagan E. (2012). Analysis Techniques for Diffusion Tensor Imaging Data. PhD thesis. The University of North Carolina at Chapel Hill.
Miedema, Jayson, Marron, James Stephen, Niethammer, Marc, Borland, David, Woosley, John, Coposky, Jason, Wei, Susan, Reisner, Howard, and Thomas, Nancy E (2012). Image and statistical analysis of melanocytic histology. Histopathology 61 (3), pp. 436-444.
Segall, S.K., Nackley, A.G., Diatchenko, L., Lariviere, W.R., Lu, X., Marron, J.S., Grabowski-Boase, L., Walker, J.R., Slade, G., Gauthier, J., Bailey, J.S., Steffy, B.M., Maynard, T.M., Tarantino, L.M., and Wiltshire, T. (2010). Comt1 genotype and expression predicts anxiety and nociceptive sensitivity in inbred strains of mice. Genes, Brain, and Behavior 9 (8), pp. 933-46.
Shen, Dan (2012). Sparse PCA asymptotics and analysis of tree data. PhD thesis. University of North Carolina.
Wei, Susan and Kosorok, Michael R. (2013a). Latent Supervised Learning. Journal of the American Statistical Association 108 (503), pp. 957-970.
Wei, Susan and Kosorok, Michael R. (2013b). Latent Supervised Learning for Survival Data. Submitted. url: http://biostats.bepress.com/uncbiostat/art38.
Wei, Susan and Kosorok, Michael R. (2013c). Latent Supervised Learning for Treatment Effect Heterogeneity.
Wei, Susan, Lee, Chihoon, Wichers, Lindsay, and Marron, J.S. (2013). Direction-Projection-Permutation for High Dimensional Hypothesis Tests. url: http://arxiv.org/abs/1304.0796.