Librería Portfolio

STATISTICAL FOUNDATIONS OF DATA SCIENCE
Title:
STATISTICAL FOUNDATIONS OF DATA SCIENCE
Author:
FAN, J
Publisher:
CRC PRESS
Year of publication:
2020
Subject:
DATABASES - OTHER TOPICS
ISBN:
978-1-4665-1084-5
Pages:
774
123,76 €

 

Synopsis

Statistical Foundations of Data Science gives a thorough introduction to commonly used statistical models and contemporary statistical machine learning techniques and algorithms, along with their mathematical insights and statistical theories. It aims to serve as a graduate-level textbook and a research monograph on high-dimensional statistics, sparsity and covariance learning, machine learning, and statistical inference. It includes ample exercises that involve both theoretical studies and empirical applications.

The book begins with an introduction to the stylized features of big data and their impact on statistical analysis. It then introduces multiple linear regression and extends the techniques of model building via nonparametric regression and kernel tricks. It provides a comprehensive account of sparsity exploration and model selection for multiple regression, generalized linear models, quantile regression, robust regression, and hazards regression, among others. High-dimensional inference and feature screening are also thoroughly addressed. The book further provides a comprehensive account of high-dimensional covariance estimation and of learning latent factors and hidden structures, as well as their applications to statistical estimation, inference, prediction, and machine learning problems. It also thoroughly introduces statistical machine learning theory and methods for classification, clustering, and prediction. These include CART, random forests, boosting, support vector machines, clustering algorithms, sparse PCA, and deep learning.



Table of Contents

1. Introduction

Rise of Big Data and Dimensionality

Biological Sciences

Health Sciences

Computer and Information Sciences

Economics and Finance

Business and Program Evaluation

Earth Sciences and Astronomy

Impact of Big Data

Impact of Dimensionality

Computation

Noise Accumulation

Spurious Correlation

Statistical theory

Aim of High-dimensional Statistical Learning

What big data can do

Scope of the book

2. Multiple and Nonparametric Regression

Introduction

Multiple Linear Regression

The Gauss-Markov Theorem

Statistical Tests

Weighted Least-Squares

Box-Cox Transformation

Model Building and Basis Expansions

Polynomial Regression

Spline Regression

Multiple Covariates

Ridge Regression

Bias-Variance Tradeoff

Penalized Least Squares

Bayesian Interpretation

Ridge Regression Solution Path

Kernel Ridge Regression

Regression in Reproducing Kernel Hilbert Space

Leave-one-out and Generalized Cross-validation

Exercises

3. Introduction to Penalized Least-Squares

Classical Variable Selection Criteria

Subset selection

Relation with penalized regression

Selection of regularization parameters

Folded-concave Penalized Least Squares

Orthonormal designs

Penalty functions

Thresholding by SCAD and MCP

Risk properties

Characterization of folded-concave PLS

Lasso and L1 Regularization

Nonnegative garrote

Lasso

Adaptive Lasso

Elastic Net

Dantzig selector

SLOPE and Sorted Penalties

Concentration inequalities and uniform convergence

A brief history of model selection

Bayesian Variable Selection

Bayesian view of the PLS

A Bayesian framework for selection

Numerical Algorithms

Quadratic programs

Least angle regression

Local quadratic approximations

Local linear algorithm

Penalized linear unbiased selection

Cyclic coordinate descent algorithms

Iterative shrinkage-thresholding algorithms

Projected proximal gradient method

ADMM

Iterative Local Adaptive Majorization and Minimization

Other Methods and Timeline

Regularization parameters for PLS

Degrees of freedom

Extension of information criteria

Application to PLS estimators

Residual variance and refitted cross-validation

Residual variance of Lasso

Refitted cross-validation

Extensions to Nonparametric Modeling

Structured nonparametric models

Group penalty

Applications

Bibliographical notes

Exercises

4. Penalized Least Squares: Properties

Performance Benchmarks

Performance measures

Impact of model uncertainty

Bayes lower bounds for orthogonal design

Minimax lower bounds for general design

Performance goals, sparsity and sub-Gaussian noise

Penalized L0 Selection

Lasso and Dantzig Selector

Selection consistency

Prediction and coefficient estimation errors

Model size and least squares after selection

Properties of the Dantzig selector

Regularity conditions on the design matrix

Properties of Concave PLS

Properties of penalty functions

Local and oracle solutions

Properties of local solutions

Global and approximate global solutions

Smaller and Sorted Penalties

Sorted concave penalties and their local approximation

Approximate PLS with smaller and sorted penalties

Properties of LLA and LCA

Bibliographical notes

Exercises

5. Generalized Linear Models and Penalized Likelihood

Generalized Linear Models

Exponential family

Elements of generalized linear models

Maximum likelihood

Computing MLE: Iteratively reweighted least squares

Deviance and Analysis of Deviance

Residuals

Examples

Bernoulli and binomial models

Models for count responses

Models for nonnegative continuous responses

Normal error models

Sparsest solution in high confidence set

A general setup

Examples

Properties

Variable Selection via Penalized Likelihood

Algorithms

Local quadratic approximation

Local linear approximation

Coordinate descent

Iterative Local Adaptive Majorization and Minimization

Tuning parameter selection

An Application

Sampling Properties in low-dimension

Notation and regularity conditions

The oracle property

Sampling Properties with Diverging Dimensions

Asymptotic properties of GIC selectors

Properties under Ultrahigh Dimensions

The Lasso penalized estimator and its risk property

Strong oracle property

Numeric studies

Risk properties

Bibliographical notes

Exercises

6. Penalized M-estimators

Penalized quantile regression

Quantile regression

Variable selection in quantile regression

A fast algorithm for penalized quantile regression

Penalized composite quantile regression

Variable selection in robust regression

Robust regression

Variable selection in Huber regression

Rank regression and its variable selection

Rank regression

Penalized weighted rank regression

Variable Selection for Survival Data

Partial likelihood

Variable selection via penalized partial likelihood and its properties

Theory of folded-concave penalized M-estimator

Conditions on penalty and restricted strong convexity

Statistical accuracy of penalized M-estimator with folded concave penalties

Computational accuracy

Bibliographical notes

Exercises

7. High Dimensio