Traditional data analysis methods organize data into matrix form: a two-dimensional (2D) grid of numbers in which each column is a measurement and each row is an observation (e.g., genes by subjects). However, this approach overlooks how measurements are often systematically collected in biology. For example, measurements to understand the molecular response of cells to therapy might be collected across drug concentrations, time points, cell sources, and molecular features. In these cases, the data can be organized into a multidimensional (e.g., 4D) form. Generalizations of statistical tools to these multidimensional, or tensor, forms exist, but their use has only begun to catch on in studies of biology and medicine because of a lack of (1) knowledge about their benefits, (2) practical and useful implementations, and (3) algorithms for the specific challenges that arise with biological data. By applying these techniques, developing new algorithms, and providing accessible implementations, we are making these tools available to biomedical research.
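As a rough illustration of the difference between the two layouts, the sketch below stores a simulated experiment both as a 4D array and as a flattened matrix. The dimension sizes and variable names are hypothetical and chosen only for this example; the values are random numbers, not real measurements.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical experiment dimensions (illustrative only).
n_doses, n_times, n_cell_sources, n_features = 4, 3, 5, 6

# Tensor form: one axis per experimental parameter.
tensor = rng.normal(size=(n_doses, n_times, n_cell_sources, n_features))

# Matrix form: every dose/time/source combination becomes one row.
matrix = tensor.reshape(-1, n_features)

# With the tensor, averaging each feature per dose is a single axis operation.
dose_by_feature = tensor.mean(axis=(1, 2))

# With the matrix, the dose of each row must be recovered by index bookkeeping
# that depends on how the rows happened to be ordered when flattening.
row_dose = np.repeat(np.arange(n_doses), n_times * n_cell_sources)
dose_by_feature_flat = np.stack(
    [matrix[row_dose == d].mean(axis=0) for d in range(n_doses)]
)
```

Both routes give the same numbers, but only the tensor layout carries the experimental structure in the data itself rather than in side information about row order.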
Abstract: Effective exploration and analysis tools are vital for the extraction of insights from
single-cell data. However, current techniques for modeling single-cell studies performed
across experimental conditions (e.g., samples) require restrictive assumptions or do not
adequately deconvolute condition-to-condition variation from cell-to-cell variation. Here,
we report that Reduction and Insight in Single-cell Exploration (RISE), an adaptation of the
tensor decomposition method PARAFAC2, enables the dimensionality reduction and analysis of
single-cell data across conditions. We demonstrate the benefits of RISE across distinct
examples of single-cell RNA-sequencing experiments of peripheral immune cells: pharmacologic
drug perturbations and systemic lupus erythematosus patient samples. RISE enables
associations of gene variation patterns with patients or perturbations, while connecting
each coordinated change to single cells without requiring cell type annotations. The
theoretical grounding of RISE suggests a unified framework for many single-cell data
modeling tasks, while providing an intuitive dimensionality reduction approach for
multi-sample single-cell studies across biological contexts.
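To make the PARAFAC2 structure concrete, the sketch below builds simulated data that follows the generic PARAFAC2 model, in which each condition has its own number of cells but shares gene factors. It only illustrates the model form; it is not the RISE implementation, it does not fit anything, and all sizes and variable names are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

rank, n_genes = 3, 20
cells_per_condition = [50, 80, 65]  # hypothetical; differs between conditions

# Shared pieces of the model.
C = rng.normal(size=(n_genes, rank))  # gene factors
A = rng.uniform(0.5, 2.0, size=(len(cells_per_condition), rank))  # condition weights
B = rng.normal(size=(rank, rank))  # shared latent matrix

slices, projections = [], []
for k, n_cells in enumerate(cells_per_condition):
    # P_k has orthonormal columns and maps this condition's cells onto the
    # shared latent space; reduced QR yields an (n_cells x rank) matrix.
    P_k, _ = np.linalg.qr(rng.normal(size=(n_cells, rank)))
    X_k = P_k @ B @ np.diag(A[k]) @ C.T  # cells x genes for condition k
    projections.append(P_k)
    slices.append(X_k)

# Projecting each slice removes the condition-specific cell dimension, so the
# results stack into a regular conditions x rank x genes array.
projected = np.stack([P.T @ X for P, X in zip(projections, slices)])
```

In this form, the condition weights and gene factors are shared across all slices, while each projection matrix ties the same components back to individual cells.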
Abstract: Cytokines mediate cell-to-cell communication across the immune system and therefore are
critical to immunosurveillance in cancer and other diseases. Several cytokines show
dysregulated abundance or signaling responses in breast cancer, associated with the disease
and differences in survival and progression. Cytokines operate in a coordinated manner to
affect immune surveillance and regulate one another, necessitating a systems approach for a
complete picture of this dysregulation. Here, we profiled cytokine signaling responses of
peripheral immune cells from breast cancer patients as compared to healthy controls in a
multidimensional manner across ligands, cell populations, and responsive pathways. We find
alterations in cytokine responsiveness across pathways and cell types that are best defined
by integrated signatures across dimensions. Alterations in the abundance of a cytokine’s
cognate receptor do not explain differences in responsiveness. Rather, alterations in
baseline signaling and receptor abundance, suggesting immune cell reprogramming, are
associated with altered responses. These integrated features suggest a global reprogramming
of immune cell communication in breast cancer.
Abstract: Recent biological studies have been revolutionized in scale and granularity by multiplex
and high-throughput assays. Profiling cell responses across several experimental parameters,
such as perturbations, time, and genetic contexts, leads to richer and more generalizable
findings. However, these multidimensional datasets necessitate a reevaluation of the
conventional methods for their representation and analysis. Traditionally, experimental
parameters are merged to flatten the data into a two-dimensional matrix, sacrificing crucial
experiment context reflected by the structure. As Marshall McLuhan famously stated, “the
medium is the message.” In this work, we propose that the experiment structure is the
medium in which subsequent analysis is performed, and the optimal choice of data
representation must reflect the experiment structure. We review how tensor-structured
analyses and decompositions can preserve this information. We contend that tensor methods
are poised to become integral to the biomedical data sciences toolkit.
Censored Least Squares for Imputing Missing Values in PARAFAC Tensor Factorization. E. C. Hung, E. Hodzic, Z. C. Tan, & A. S. Meyer. (2024). BioRxiv [Preprint].
Abstract: Tensor factorization is a dimensionality reduction method applied to multidimensional
arrays. These methods are useful for identifying patterns within a variety of biomedical
datasets due to their ability to preserve the organizational structure of experiments and
therefore aid in generating meaningful insights. However, missing data in the datasets being
analyzed can impose challenges. Tensor factorization can be performed with some level of
missing data and still reconstruct a complete tensor. Although tensor methods can thereby
impute the missing values, the choice of fitting algorithm may influence the fidelity of
these imputations. Previous approaches, based on alternating least squares with prefilled values
or direct optimization, suffer from introduced bias or slow computational performance. In
this study, we propose that censored least squares can better handle missing values with
data structured in tensor form. We ran censored least squares on four different biological
datasets and compared its performance against alternating least squares with prefilled
values and direct optimization. We used the error of imputation and the ability to infer
masked values to benchmark their missing data performance. Censored least squares appeared
best suited for the analysis of high-dimensional biological data by accuracy and convergence
metrics across several studies.
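The idea of censoring can be sketched as follows: within each alternating least squares update, every row of a factor matrix is solved using only the entries that were actually observed, so missing values are neither prefilled nor allowed to influence the fit. This is a minimal NumPy sketch for a three-way array under assumed settings (function names, rank, iteration count, and simulated data are all hypothetical); it is not the implementation benchmarked in the preprint.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker product; U's row index varies slowest."""
    return np.einsum("ir,jr->ijr", U, V).reshape(-1, U.shape[1])

def censored_als(X, rank, n_iter=300, seed=0):
    """CP factorization of a 3-way array in which NaN marks missing entries."""
    rng = np.random.default_rng(seed)
    factors = [rng.normal(size=(dim, rank)) for dim in X.shape]
    for _ in range(n_iter):
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            design = khatri_rao(others[0], others[1])
            # Unfold so that row i holds every entry belonging to index i of `mode`.
            unfolded = np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)
            for i, row in enumerate(unfolded):
                observed = ~np.isnan(row)
                # Censoring: drop the equations that correspond to missing entries.
                factors[mode][i] = np.linalg.lstsq(
                    design[observed], row[observed], rcond=None
                )[0]
    return factors

# Simulated rank-2 data with roughly 20% of entries masked (illustrative only).
rng = np.random.default_rng(1)
true_factors = [rng.normal(size=(dim, 2)) for dim in (10, 8, 6)]
X_true = np.einsum("ir,jr,kr->ijk", *true_factors)
mask = rng.random(X_true.shape) < 0.2
X_missing = X_true.copy()
X_missing[mask] = np.nan

fitted = censored_als(X_missing, rank=2)
X_hat = np.einsum("ir,jr,kr->ijk", *fitted)

# Relative error on the masked entries, which the fit never saw.
imputation_error = np.linalg.norm(X_hat[mask] - X_true[mask]) / np.linalg.norm(X_true[mask])
```

Masking known values and scoring their reconstruction, as in the last line, mirrors the benchmarking idea described in the abstract of inferring masked values.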
Abstract: The cytokine interleukin-2 (IL-2) has the potential to treat autoimmune disease but is
limited by its modest specificity toward immunosuppressive regulatory T (Treg) cells. IL-2
receptors consist of combinations of α, β, and γ chains of variable affinity and cell
specificity. Engineering IL-2 to treat autoimmunity has primarily focused on retaining
binding to the relatively Treg-selective, high-affinity receptor while reducing binding to
the less selective, low-affinity receptor. However, we found that refining the designs to
focus on targeting the high-affinity receptor through avidity effects is key to optimizing
Treg selectivity. We profiled the dynamics and dose dependency of signaling responses in
primary human immune cells induced by engineered fusions composed of either wild-type IL-2
or mutant forms with altered affinity, valency, and fusion to the antibody Fc region for
stability. Treg selectivity and signaling response variations were explained by a model of
multivalent binding and dimer-enhanced avidity—a combined measure of the strength, number,
and conformation of interaction sites—from which we designed tetravalent IL-2–Fc fusions
that had greater Treg selectivity in culture than do current designs. Biasing avidity toward
IL2Rα with an asymmetrical multivalent design consisting of one α/β chain–binding and
one α chain–binding mutant further enhanced Treg selectivity. Comparative analysis
revealed that IL2Rα was the optimal cell surface target for Treg selectivity, indicating
that avidity for IL2Rα may be the optimal route to producing IL-2 variants that selectively
target Tregs.