Scholar Data

Inference for Low-Rank Models Without Estimating the Rank

This article studies inference on linear functionals of high-dimensional low-rank matrices. While most existing inference methods require consistent estimation of the true rank, our procedure is robust to rank misspecification, making it a promising approach in applications where rank estimation can be unreliable. We estimate the low-rank spaces using pre-specified weighting matrices, known as diversified projections. A novel statistical insight is that, contrary to the usual statistical wisdom that overfitting mainly introduces additional variance, the over-estimated low-rank space also gives rise to a non-negligible bias due to an implicit ridge-type regularization. We develop a new inference procedure and show that the central limit theorem holds as long as the pre-specified rank is no smaller than the true rank. In one of our applications, we study multiple testing with incomplete data in the presence of confounding factors and show that our method remains valid as long as the number of controlled confounding factors is at least as large as the true number, even when no confounding factors are present. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
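The idea of estimating a low-rank space from a pre-specified weighting matrix can be illustrated with a small sketch. The code below is not the authors' procedure: it omits their bias correction and inference steps, and the function name `diversified_projection`, the Gaussian weights `W`, and the dimensions are assumptions chosen for illustration. It only shows that projecting onto the span of `Y @ W` preserves a rank-`r` signal whenever the working rank `K` is at least `r`.

```python
import numpy as np

def diversified_projection(Y, W):
    """Project Y (n1 x n2) onto the column space spanned by Y @ W.

    W is an n2 x K weighting matrix fixed in advance (hypothetical choice);
    K is the working rank and may exceed the true rank.
    """
    Q, _ = np.linalg.qr(Y @ W)      # orthonormal basis, n1 x K
    return Q @ (Q.T @ Y)            # projection of Y onto span(Q)

rng = np.random.default_rng(0)
n1, n2, r, K = 60, 50, 2, 5         # true rank r, over-specified rank K >= r

# Rank-r signal matrix M = L R^T
L = rng.standard_normal((n1, r))
R = rng.standard_normal((n2, r))
M = L @ R.T

# Pre-specified weights, generated independently of the data
W = rng.standard_normal((n2, K))

# Noise-free check: the over-specified projection still contains col(M)
M_hat_clean = diversified_projection(M, W)
recovery_error = np.max(np.abs(M_hat_clean - M))

# Noisy case: the extra K - r directions pick up noise, which is one reason
# an over-specified rank calls for a dedicated inference procedure
Y = M + 0.1 * rng.standard_normal((n1, n2))
M_hat = diversified_projection(Y, W)
```

In the noise-free case the projection reproduces `M` up to floating-point error, because the column space of `M @ W` generically equals that of `M` when `K >= r`.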

Authors

Choi, Jungjun; Kwon, Hyukjun; Liao, Yuan

1 Citation, 0 Mentions, 15% FAIR, 0.7 Dataset Index

10.6084/m9.figshare.29723099, January 2025

Inference for Low-rank Models without Estimating the Rank

This paper studies inference on linear functionals of high-dimensional low-rank matrices. While most existing inference methods require consistent estimation of the true rank, our procedure is robust to rank misspecification, making it a promising approach in applications where rank estimation can be unreliable. We estimate the low-rank spaces using pre-specified weighting matrices, known as diversified projections. A novel statistical insight is that, contrary to the usual statistical wisdom that overfitting mainly introduces additional variance, the over-estimated low-rank space also gives rise to a non-negligible bias due to an implicit ridge-type regularization. We develop a new inference procedure and show that the central limit theorem holds as long as the pre-specified rank is no smaller than the true rank. In one of our applications, we study multiple testing with incomplete data in the presence of confounding factors and show that our method remains valid as long as the number of controlled confounding factors is at least as large as the true number, even when no confounding factors are present.

Authors

Choi, Jungjun; Kwon, Hyukjun; Liao, Yuan

1 Citation, 0 Mentions, 13% FAIR, 0.7 Dataset Index

10.6084/m9.figshare.29723099.v1, January 2025

Matrix Completion When Missing Is Not at Random and Its Applications in Causal Panel Data Models

This article develops an inferential framework for matrix completion when entries are missing not at random and without the requirement of strong signals. Our development is based on the observation that if the number of missing entries is small enough compared to the panel size, then they can be estimated well even when they are missing not at random. Taking advantage of this fact, we divide the missing entries into smaller groups and estimate each group via nuclear norm regularization. In addition, we show that with appropriate debiasing, our proposed estimate is asymptotically normal even for fairly weak signals. Our work is motivated by recent research on the Tick Size Pilot Program, an experiment conducted by the Securities and Exchange Commission (SEC) to evaluate the impact of widening the tick size on the market quality of stocks from 2016 to 2018. While previous studies were based on traditional regression or difference-in-differences methods that assume the treatment effect is invariant with respect to time and unit, our analyses suggest significant heterogeneity across units and intriguing dynamics over time during the pilot program. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
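Nuclear norm regularization, the building block named above, can be sketched with a generic soft-impute iteration, which repeatedly fills the missing entries with the current fit and soft-thresholds the singular values. This is a swapped-in textbook technique, not the authors' estimator: their grouping of missing entries and the debiasing step are not shown, and the helper names `svt` and `soft_impute`, the penalty `lam`, and the simulated data (with entries missing uniformly at random, purely for simplicity) are assumptions.

```python
import numpy as np

def svt(A, lam):
    """Singular value soft-thresholding: shrink each singular value by lam."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - lam, 0.0)) @ Vt

def soft_impute(Y, observed, lam, n_iter=200):
    """Nuclear-norm-regularized completion via a generic soft-impute loop.

    Y        : data matrix (values at unobserved positions are ignored)
    observed : boolean mask, True where an entry of Y is observed
    lam      : nuclear norm penalty level (hypothetical tuning value)
    """
    X = np.zeros_like(Y, dtype=float)
    for _ in range(n_iter):
        # Keep observed data, fill missing entries with the current estimate
        filled = np.where(observed, Y, X)
        X = svt(filled, lam)
    return X

rng = np.random.default_rng(0)
M = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 30))  # rank-2 signal
observed = rng.random(M.shape) > 0.1                             # about 10% missing
Y = np.where(observed, M + 0.1 * rng.standard_normal(M.shape), 0.0)

M_hat = soft_impute(Y, observed, lam=1.0)
```

The shrinkage in `svt` is what makes the estimate biased toward zero, which is why a debiasing step is typically needed before asymptotic normality can be claimed.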

Authors

Choi, Jungjun; Yuan, Ming

1 Citation, 0 Mentions, 13% FAIR, 0.5 Dataset Index

10.6084/m9.figshare.26319010, January 2024

Matrix Completion When Missing Is Not at Random and Its Applications in Causal Panel Data Models*

This paper develops an inferential framework for matrix completion when entries are missing not at random and without the requirement of strong signals. Our development is based on the observation that if the number of missing entries is small enough compared to the panel size, then they can be estimated well even when they are missing not at random. Taking advantage of this fact, we divide the missing entries into smaller groups and estimate each group via nuclear norm regularization. In addition, we show that with appropriate debiasing, our proposed estimate is asymptotically normal even for fairly weak signals. Our work is motivated by recent research on the Tick Size Pilot Program, an experiment conducted by the Securities and Exchange Commission (SEC) to evaluate the impact of widening the tick size on the market quality of stocks from 2016 to 2018. While previous studies were based on traditional regression or difference-in-differences methods that assume the treatment effect is invariant with respect to time and unit, our analyses suggest significant heterogeneity across units and intriguing dynamics over time during the pilot program.

Authors

Choi, Jungjun; Yuan, Ming

1 Citation, 0 Mentions, 13% FAIR, 0.7 Dataset Index

10.6084/m9.figshare.26319010.v1, January 2024

Matrix Completion When Missing Is Not at Random and Its Applications in Causal Panel Data Models

Authors

Choi, Jungjun; Yuan, Ming

1 Citation, 0 Mentions, 13% FAIR, 0.7 Dataset Index

10.6084/m9.figshare.26319010.v2, January 2024
