Traditional data analysis methods organize data into matrix form: a two-dimensional (2D) grid of numbers wherein each column is a measurement and each row is an observation (e.g., genes by subjects). However, this approach overlooks how measurements are often systematically collected in biology. For example, measurements to understand the molecular response of cells to therapy might be collected over concentrations of drug, time, different sources of cells, and molecular features. In these cases, the data can be organized into a multidimensional (e.g., 4D) form. Generalizations of statistical tools to these multidimensional (tensor) forms exist, but their use has only begun to catch on in studies of biology and medicine because of a lack of (1) knowledge about their benefits, (2) practical and useful implementations, and (3) algorithms for specific challenges that arise with biological data. By applying these techniques, developing new algorithms, and providing accessible implementations, we are making these tools available to biomedical research.
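To make the contrast between matrix and tensor form concrete, here is a minimal NumPy sketch. The dimension sizes and the random values are illustrative assumptions, not data from any of the studies listed here. It builds a 4D array over drug concentration, time, cell source, and molecular feature, then flattens it into the conventional observations-by-features matrix.

```python
import numpy as np

# Hypothetical experiment dimensions (illustrative only).
n_doses, n_times, n_sources, n_features = 4, 3, 2, 5

rng = np.random.default_rng(0)
# 4D tensor: drug concentration x time x cell source x molecular feature.
tensor = rng.normal(size=(n_doses, n_times, n_sources, n_features))

# Conventional matrix form: merge the first three modes into "observations"
# and keep molecular features as the columns.
matrix = tensor.reshape(n_doses * n_times * n_sources, n_features)

# The numbers are unchanged, but the matrix itself no longer records which
# rows share a dose, a time point, or a cell source. Recovering the tensor
# requires knowing the original dimensions.
restored = matrix.reshape(n_doses, n_times, n_sources, n_features)

# Row of the flattened matrix that holds (dose=2, time=1, source=0).
row = np.ravel_multi_index((2, 1, 0), (n_doses, n_times, n_sources))
```

The flattened matrix treats every row as an unrelated observation, whereas the tensor keeps the shared dose, time, and source labels explicit. That shared structure is what tensor methods can exploit.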
Abstract: Cytokines mediate cell-to-cell communication across the immune system and therefore are critical to immunosurveillance in cancer and other diseases. Several cytokines show dysregulated abundance or signaling responses in breast cancer, associated with the disease and differences in survival and progression. Cytokines operate in a coordinated manner to affect immune surveillance and regulate one another, necessitating a systems approach for a complete picture of this dysregulation. Here, we profiled cytokine signaling responses of peripheral immune cells from breast cancer patients as compared to healthy controls in a multidimensional manner across ligands, cell populations, and responsive pathways. We find alterations in cytokine responsiveness across pathways and cell types that are best defined by integrated signatures across dimensions. Alterations in the abundance of a cytokine's cognate receptor do not explain differences in responsiveness. Rather, alterations in baseline signaling and receptor abundance suggesting immune cell reprogramming are associated with altered responses. These integrated features suggest a global reprogramming of immune cell communication in breast cancer.
Abstract: Recent biological studies have been revolutionized in scale and granularity by multiplex and high-throughput assays. Profiling cell responses across several experimental parameters, such as perturbations, time, and genetic contexts, leads to richer and more generalizable findings. However, these multidimensional datasets necessitate a reevaluation of the conventional methods for their representation and analysis. Traditionally, experimental parameters are merged to flatten the data into a two-dimensional matrix, sacrificing crucial experiment context reflected by the structure. As Marshall McLuhan famously stated, "the medium is the message." In this work, we propose that the experiment structure is the medium in which subsequent analysis is performed, and the optimal choice of data representation must reflect the experiment structure. We review how tensor-structured analyses and decompositions can preserve this information. We contend that tensor methods are poised to become integral to the biomedical data sciences toolkit.
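As one illustration of a tensor decomposition that keeps the experiment structure, the sketch below fits a canonical polyadic (CP, also called PARAFAC) model to a three-way array with plain alternating least squares. It is a minimal NumPy sketch written for this page, not the implementation used in the work above. The function names `cp_als` and `cp_reconstruct` and the example dimensions are hypothetical.

```python
import numpy as np

def cp_als(X, rank, n_iter=100, seed=0):
    """Fit a rank-`rank` CP model to a 3-way array by alternating least squares."""
    rng = np.random.default_rng(seed)
    factors = [rng.normal(size=(dim, rank)) for dim in X.shape]
    for _ in range(n_iter):
        for mode in range(3):
            # Unfold X so that the current mode indexes the rows.
            unfolded = np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)
            # Khatri-Rao product of the other two factor matrices, with rows
            # ordered to match the columns of the unfolding.
            a, b = [factors[m] for m in range(3) if m != mode]
            kr = np.einsum("ir,jr->ijr", a, b).reshape(-1, rank)
            # Least-squares update: unfolded is approximately factors[mode] @ kr.T
            factors[mode] = np.linalg.lstsq(kr, unfolded.T, rcond=None)[0].T
    return factors

def cp_reconstruct(factors):
    """Rebuild the 3-way array as a sum of rank-one components."""
    return np.einsum("ir,jr,kr->ijk", *factors)

# Hypothetical example: a subjects x time x genes tensor made of two patterns.
rng = np.random.default_rng(1)
true_factors = [rng.normal(size=(dim, 2)) for dim in (6, 5, 4)]
X = cp_reconstruct(true_factors)

factors = cp_als(X, rank=2)
rel_error = np.linalg.norm(cp_reconstruct(factors) - X) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_error:.2e}")
```

Each component comes back as one vector per mode (here subjects, time, and genes), so a pattern can be read along each experimental axis separately instead of along a single merged axis.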
Censored least squares for imputing missing values in PARAFAC tensor factorization.
Hung E., Hodzic E., Tan Z., Meyer A.
bioRxiv [Preprint], 2024.
Abstract: Tensor factorization is a dimensionality reduction method applied to multidimensional arrays. These methods are useful for identifying patterns within a variety of biomedical datasets due to their ability to preserve the organizational structure of experiments and therefore aid in generating meaningful insights. However, missing data in the datasets being analyzed can impose challenges. Tensor factorization can be performed with some level of missing data and reconstruct a complete tensor. However, while tensor methods may impute these missing values, the choice of fitting algorithm may influence the fidelity of these imputations. Previous approaches, based on alternating least squares with prefilled values or direct optimization, suffer from introduced bias or slow computational performance. In this study, we propose that censored least squares can better handle missing values with data structured in tensor form. We ran censored least squares on four different biological datasets and compared its performance against alternating least squares with prefilled values and direct optimization. We used the error of imputation and the ability to infer masked values to benchmark their missing data performance. Censored least squares appeared best suited for the analysis of high-dimensional biological data by accuracy and convergence metrics across several studies.
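The difference between prefilling and censoring can be sketched on a single least-squares update. The code below is a minimal NumPy illustration under assumed, synthetic data. It is not the preprint's implementation, and the helper name `censored_lstsq` is hypothetical. Within a PARAFAC fit, `A` would play the role of the Khatri-Rao product of the fixed factors and `B` the transposed unfolding of the data.

```python
import numpy as np

def censored_lstsq(A, B):
    """Solve A @ X ~ B column by column, skipping rows where B is missing (NaN)."""
    X = np.empty((A.shape[1], B.shape[1]))
    for j in range(B.shape[1]):
        observed = np.isfinite(B[:, j])
        # Only the observed rows of this column enter the fit.
        X[:, j] = np.linalg.lstsq(A[observed], B[observed, j], rcond=None)[0]
    return X

# Synthetic, noise-free problem with roughly 30% of the entries masked.
rng = np.random.default_rng(0)
A = rng.normal(size=(30, 3))
X_true = rng.normal(size=(3, 4))
B = A @ X_true
B[rng.random(B.shape) < 0.3] = np.nan

# Censored fit: missing entries are left out of the objective.
X_censored = censored_lstsq(A, B)

# Prefilled fit: missing entries are replaced by zeros and treated as data.
B_prefilled = np.where(np.isfinite(B), B, 0.0)
X_prefilled = np.linalg.lstsq(A, B_prefilled, rcond=None)[0]

err_censored = np.linalg.norm(X_censored - X_true)
err_prefilled = np.linalg.norm(X_prefilled - X_true)
```

In this noise-free setting the censored solve recovers `X_true`, while the prefilled solve is pulled toward the fill value. This shows the kind of bias the abstract attributes to prefilling, though only on a toy problem.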
Abstract: The cytokine interleukin-2 (IL-2) has the potential to treat autoimmune disease but is limited by its modest specificity toward immunosuppressive regulatory T (Treg) cells. IL-2 receptors consist of combinations of α, β, and γ chains of variable affinity and cell specificity. Engineering IL-2 to treat autoimmunity has primarily focused on retaining binding to the relatively Treg-selective, high-affinity receptor while reducing binding to the less selective, low-affinity receptor. However, we found that refining the designs to focus on targeting the high-affinity receptor through avidity effects is key to optimizing Treg selectivity. We profiled the dynamics and dose dependency of signaling responses in primary human immune cells induced by engineered fusions composed of either wild-type IL-2 or mutant forms with altered affinity, valency, and fusion to the antibody Fc region for stability. Treg selectivity and signaling response variations were explained by a model of multivalent binding and dimer-enhanced avidity (a combined measure of the strength, number, and conformation of interaction sites), from which we designed tetravalent IL-2–Fc fusions that had greater Treg selectivity in culture than do current designs. Biasing avidity toward IL2Rα with an asymmetrical multivalent design consisting of one α/β chain–binding and one α chain–binding mutant further enhanced Treg selectivity. Comparative analysis revealed that IL2Rα was the optimal cell surface target for Treg selectivity, indicating that avidity for IL2Rα may be the optimal route to producing IL-2 variants that selectively target Tregs.
Abstract: Cell-cell communication (CCC) mediates coordinated cellular activities that vary dynamically across time, location, and biological context. While various tools exist to infer CCC, they typically aggregate data according to pre-defined cell types, obscuring critical single-cell heterogeneity. Furthermore, because signaling pathways and cell populations operate in a coordinated manner, an integrative analytical approach is essential. To address these challenges, we developed CCC-RISE, an extension of the tensor-based method Reduction and Insight in Single-cell Exploration (RISE). CCC-RISE identifies integrative patterns of single-cell variation by deconvolving communication into interpretable modules defined by unique sender cells, receiver cells, ligands, and condition associations. We applied this framework to a COVID-19 cohort with varying disease severity and a lung transplant cohort with acute allograft dysfunction. In both contexts, CCC-RISE successfully identified disease-relevant communication programs and traced them to specific cellular subpopulations, often crossing conventional cell-type boundaries. This approach offers a robust pipeline enabling the identification of disease-relevant signaling subpopulations that are invisible to aggregate methods.
Abstract: Effective exploration and analysis tools are vital for the extraction of insights from single-cell data. However, current techniques for modeling single-cell studies performed across experimental conditions (e.g. samples) require restrictive assumptions or do not adequately deconvolute condition-to-condition variation from cell-to-cell variation. Here, we report that Reduction and Insight in Single-cell Exploration (RISE), an adaptation of the tensor decomposition method PARAFAC2, enables the dimensionality reduction and analysis of single-cell data across conditions. We demonstrate the benefits of RISE across distinct examples of single-cell RNA-sequencing experiments of peripheral immune cells: pharmacologic drug perturbations and systemic lupus erythematosus patient samples. RISE enables associations of gene variation patterns with patients or perturbations, while connecting each coordinated change to single cells without requiring cell type annotations. The theoretical grounding of RISE suggests a unified framework for many single-cell data modeling tasks, while providing an intuitive dimensionality reduction approach for multi-sample single-cell studies across biological contexts.