Variable Selection for Semi-Parametric Models with Interaction Under High Dimensional Data
Issue: Volume 9, Issue 2, June 2023
Pages: 49-61
Received: 9 March 2023
Accepted: 2 April 2023
Published: 15 April 2023
Abstract: Advances in science and in data collection technology allow researchers in many fields to gather large amounts of high-dimensional data. Variable selection for high-dimensional data has seen some development, but most of these studies consider only the selection of main effects. In many important practical problems, however, main effects alone may not be enough to describe the relationship between the response variable and the predictor variables, so variable selection with interaction terms under high-dimensional data is more meaningful. This article therefore focuses on robust estimation for semi-parametric models with interactions in high-dimensional data under the framework of mode regression, and applies a two-stage regularization method to perform variable selection. At Stage 1, the non-parametric function is approximated with B-spline basis functions, and the parametric and non-parametric components are selected simultaneously using mode regression and adaptive least absolute shrinkage and selection operator (LASSO) estimation. At Stage 2, the model consists of the variables selected at Stage 1 together with interaction terms derived from the main effects. To maintain the heredity structure between the main effects of the linear part and the interaction effects, selection at this stage is applied only to the interaction terms, which identifies the important interaction effects. Under proper regularity conditions, the oracle properties of the variable selection and the consistency of the hierarchical structure are proved. Numerical results are also presented to demonstrate the performance of the method.
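The way Stage 2 builds its candidate set from the Stage 1 output can be illustrated with a minimal sketch. It assumes strong heredity (an interaction is a candidate only if both of its parent main effects were selected), which the abstract does not state explicitly. The function names and the selected indices are hypothetical, and the sketch covers only the construction of candidate interactions, not the mode regression or adaptive LASSO fitting.

```python
from itertools import combinations
from math import comb


def candidate_interactions(selected_main_effects):
    """Return all pairs (j, k) with j < k whose parents were both selected.

    Assumption: strong heredity. The indices are column positions of the
    linear-part predictors that survived Stage 1.
    """
    parents = sorted(set(selected_main_effects))
    return list(combinations(parents, 2))


def interaction_columns(X, pairs):
    """Build the interaction design: one column x_j * x_k per pair.

    X is a list of rows (one list of predictor values per observation).
    """
    return [[row[j] * row[k] for (j, k) in pairs] for row in X]


# Hypothetical Stage 1 outcome: 3 of p = 1000 linear predictors were kept.
p = 1000
selected = [7, 2, 41]
pairs = candidate_interactions(selected)

# Without the heredity restriction every pair would be a candidate.
n_all_pairs = comb(p, 2)
n_candidates = len(pairs)
```

In this illustration the restriction reduces the candidate interactions from `comb(1000, 2)` pairs to the three pairs formed by the selected main effects, which is what makes a penalized fit on the interaction terms alone tractable at Stage 2.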