Research in data science /

Bibliographic Details
Author / Creator:Gasparovic, Ellen.
Imprint:Cham : Springer, 2019.
Description:1 online resource (302 pages)
Language:English
Series:Association for women in mathematics series ; v. 17
Association for Women in Mathematics Series.
Format: E-Resource Book
URL for this record:http://pi.lib.uchicago.edu/1001/cat/bib/11913430
Hidden Bibliographic Details
Other authors / contributors:Domeniconi, Carlotta.
ISBN:9783030115661
3030115666
3030115658
9783030115654
9783030115678
3030115674
Digital file characteristics:text file PDF
Notes:Includes bibliographical references.
Print version record.
Summary:This edited volume on data science features a variety of research ranging from theoretical to applied and computational topics. Aiming to establish the important connection between mathematics and data science, this book addresses cutting-edge problems in predictive modeling, multi-scale representation and feature selection, statistical and topological learning, and related areas. Contributions study topics such as the hubness phenomenon in high-dimensional spaces, the use of a heuristic framework for testing the multi-manifold hypothesis for high-dimensional data, the investigation of interdisciplinary approaches to multi-dimensional obstructive sleep apnea patient data, and the inference of a dyadic measure and its simplicial geometry from binary feature data. Based on the first Women in Data Science and Mathematics (WiSDM) Research Collaboration Workshop that took place in 2017 at the Institute for Computational and Experimental Research in Mathematics (ICERM) in Providence, Rhode Island, this volume features submissions from several of the working groups as well as contributions from the wider community. The volume is suitable for researchers in data science in industry and academia.
Other form:Print version: Gasparovic, Ellen. Research in Data Science. Cham : Springer, ©2019 9783030115654
Standard no.:10.1007/978-3-030-11566-1
Table of Contents:
  • Intro; Preface; The Cross-Disciplinary Field of Data Science; Project Descriptions; Contributed Papers; Acknowledgments; Contents; Sparse Randomized Kaczmarz for Support Recovery of Jointly Sparse Corrupted Multiple Measurement Vectors; 1 Introduction; 1.1 Problem Formulation; 2 Related and Existing Work; 2.1 Sparse Randomized Kaczmarz; 2.2 SRK for MMV; 3 Main Results; 3.1 Corrupted MMV; 4 Experiments; 5 Conclusion; References; The Hubness Phenomenon in High-Dimensional Spaces; 1 Introduction; 2 Background and Related Work; 2.1 The Hubness Phenomenon; 2.2 Intrinsic Dimensionality; 3 Datasets
  • 3.1 Synthetic Data; 3.1.1 Data in the Global Space; 3.1.2 Data in Subspaces; 3.2 Real Data; 4 Intrinsic Dimensionality via Hubness; 4.1 Skewness vs. Feature Ranking: How to Rank Features?; 4.2 Hubs and Subspaces; 5 Hubs, Density, and Clustering; 5.1 Hubness and Data Density; 5.2 Distances Between Points; 5.2.1 Results on Synthetic Data; 5.2.2 Results on Real Data; 5.2.3 Class Separation of Histograms; 5.3 Hubness and Purity; 5.3.1 Density vs. Purity; 5.4 Hubs and Seed Subspace Samples; 6 Conclusion and Proposed Research Directions; References
  • Heuristic Framework for Multiscale Testing of the Multi-Manifold Hypothesis; 1 Introduction; 1.1 Contributions; 1.2 Outline; 2 Related Work; 2.1 Manifold Learning; 2.2 The (Multi-)Manifold Hypothesis; 2.3 Quantitative Rectifiability; 2.4 Stratified Space Construction; 2.5 Intrinsic Dimension; 3 Methodology; 4 Implementation; 4.1 Variance-Based Local Intrinsic Dimension; 4.2 Nearest Neighbors-Based Methods: Local GMST; 4.3 Dyadic Linear Multi-Manifolds; 4.4 Estimating the Sum of Squared Distances Function: SQD; 5 Experimental Validation; 5.1 Use Case: Sphere-Line; 5.2 Use Case: LiDAR Data
  • 3.4 Prediction, Feature Selection, and Classification; 4 Statistical Analysis of Survey Data; 4.1 Data Pre-processing; 4.2 Clustering Analysis; 4.3 Identify Important Variables; 5 Conclusion and Future Research; Appendix; References; The ∞-Cophenetic Metric for Phylogenetic Trees As an Interleaving Distance; 1 Introduction; 2 Categorical Structures; 2.1 Categories with a Flow; 2.2 The Interleaving Distance Associated to a Category with a Flow; 2.3 Interleaving Distances on Thin Categories; 2.4 The ∞-Distance on Rn Is an Interleaving Distance; 3 Combinatorial Structures; 3.1 Merge Trees; 3.2 Merge Trees As Posets