Research & Code

**Manifold Learning for Data-Driven Risk Management**

New supervised learning framework for high-dimensional, nonlinear processes that does not impose restrictive parametric assumptions.

**Graph Machine Learning for Asset Pricing**

Graph Neural Network formulation to incorporate supply chain information into asset pricing and stock return prediction. 

**Non-Stationarity in Financial Return Prediction**

New model selection methodology for return prediction in non-stationary environments and introduction of the fundamental non-stationarity-complexity tradeoff.

**Graph Machine Learning for Asset Pricing**

(with Agostino Capponi & Jiacheng Zou)
Revise and Resubmit, Journal of Financial Economics

We propose a nonparametric method to aggregate rich firm characteristics over a large supply chain network to explain the cross-section of expected returns. Each target firm receives a nonlinearly constructed pricing signal passed from neighboring firms that are within d hops on the supply chain network. Analyzing all US-listed stocks with supply chain data, our model achieves over 50% higher out-of-sample Sharpe ratios compared to models using only direct suppliers and consumers, outperforming Fama-French five-factor and principal component models. Through a graph-Monte Carlo experiment, we demonstrate the interplay between d and degree centrality, showing that the most central firms are twice as sensitive as peripheral firms. Our recommended d = 6 balances bias and variance and ensures robustness.
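As a rough illustration of what "within d hops" means, the sketch below collects a firm's d-hop neighborhood with a breadth-first search and averages one characteristic over it. This is not the paper's method: the plain average stands in for the nonlinear signal construction, and the names `d_hop_neighbors`, `mean_neighbor_signal`, the toy graph, and the feature values are all hypothetical.

```python
from collections import deque


def d_hop_neighbors(adj, source, d):
    """Return {node: hop distance} for every node within d hops of source.

    adj is a dict mapping each node to an iterable of its neighbors.
    The source itself is excluded from the result.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == d:
            continue  # do not expand beyond d hops
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    del dist[source]
    return dist


def mean_neighbor_signal(adj, features, source, d):
    """Average a scalar feature over the d-hop neighborhood (illustrative stand-in)."""
    hops = d_hop_neighbors(adj, source, d)
    if not hops:
        return None
    return sum(features[n] for n in hops) / len(hops)


# Hypothetical toy supply chain: A - B - C - D (undirected).
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
features = {"A": 1.0, "B": 2.0, "C": 4.0, "D": 8.0}

one_hop = mean_neighbor_signal(adj, features, "A", 1)
two_hop = mean_neighbor_signal(adj, features, "A", 2)
```

Increasing d widens the set of firms that contribute to the target firm's signal, which is the quantity the bias-variance discussion above refers to.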

Paper
Code

**Data-Driven Dynamic Factor Modeling via Manifold Learning**

(with Graeme Baker & Agostino Capponi)

We propose a data-driven dynamic factor framework where a response variable depends on a high-dimensional set of covariates, without imposing any parametric model on the joint dynamics. Leveraging Anisotropic Diffusion Maps, a nonlinear manifold learning technique introduced by Singer and Coifman, our framework uncovers the joint dynamics of the covariates and responses in a purely data-driven way. We approximate the embedding dynamics using linear diffusions, and exploit Kalman filtering to predict the evolution of the covariates and response variables directly from the diffusion map embedding space. We generalize Singer's convergence rate analysis of the graph Laplacian from the case of independent uniform samples on a compact manifold to the case of time series arising from Langevin diffusions in Euclidean space. Furthermore, we provide rigorous justification for our procedure by showing the robustness of approximations of the diffusion map coordinates by linear diffusions, and the convergence of ergodic averages under standard spectral assumptions on the underlying dynamics. We apply our method to the stress testing of equity portfolios using a combination of financial and macroeconomic factors from the Federal Reserve's supervisory scenarios. We demonstrate that our data-driven stress testing method outperforms standard scenario analysis and Principal Component Analysis benchmarks through historical backtests spanning three major financial crises, achieving reductions in mean absolute error of up to 55% and 39% for scenario-based portfolio return prediction, respectively.
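To give a sense of the embedding step, here is a minimal sketch of a standard diffusion map with an isotropic Gaussian kernel. It is a simplification under stated assumptions: it is not the Anisotropic Diffusion Maps variant used in the paper, it omits the linear-diffusion approximation and the Kalman filtering step, and the function name `diffusion_map`, the bandwidth `epsilon`, and the toy data are hypothetical.

```python
import numpy as np


def diffusion_map(X, epsilon, n_coords=2):
    """Standard diffusion map embedding of the rows of X.

    Returns the leading nontrivial eigenvalues of the Markov matrix
    P = D^{-1} K and the corresponding diffusion coordinates.
    """
    # Pairwise squared Euclidean distances.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

    # Gaussian kernel and its row sums (degrees).
    K = np.exp(-d2 / epsilon)
    deg = K.sum(axis=1)

    # S = D^{-1/2} K D^{-1/2} is symmetric and shares eigenvalues with P.
    S = K / np.sqrt(np.outer(deg, deg))
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]

    # Right eigenvectors of P are D^{-1/2} times the eigenvectors of S.
    psi = vecs / np.sqrt(deg)[:, None]

    # Skip the trivial pair (eigenvalue 1, constant eigenvector).
    lam = vals[1:n_coords + 1]
    return lam, psi[:, 1:n_coords + 1] * lam


# Hypothetical toy data: noisy-free samples on a unit circle.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 60)
X = np.column_stack([np.cos(theta), np.sin(theta)])
eigvals, embedding = diffusion_map(X, epsilon=0.5)
```

In the framework described above, the dynamics of coordinates like these would then be approximated and filtered; that part is not shown here.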

Paper
Code

**The Nonstationarity-Complexity Tradeoff in Return Prediction**

(with Agostino Capponi, Chengpiao Huang, Kaizheng Wang & Jiacheng Zou)

We investigate machine learning models for stock return prediction in non-stationary environments, revealing a fundamental nonstationarity-complexity tradeoff: complex models reduce misspecification error but require longer training windows that introduce stronger nonstationarity. We resolve this tension with a novel model selection method that jointly optimizes model class and training window size using a tournament procedure that adaptively evaluates candidates on non-stationary validation data. Our theoretical analysis demonstrates that this approach balances misspecification error, estimation variance, and non-stationarity, performing close to the best model in hindsight. Applying our method to 17 industry portfolio returns, we consistently outperform standard rolling-window benchmarks, improving out-of-sample R^2 by 14–23% on average. During NBER-designated recessions, improvements are substantial: our method achieves positive R^2 during the Gulf War recession while benchmarks are negative, improves R^2 in absolute terms by at least 80bps during the 2001 recession, and delivers superior performance during the 2008 Financial Crisis. Economically, a trading strategy based on our selected model generates 31% higher cumulative returns averaged across the industries.
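To make the idea of jointly choosing model class and training window concrete, the sketch below runs a plain grid search over (polynomial degree, window length) pairs and scores each on the most recent observations. This is a deliberately simpler stand-in, not the tournament procedure from the paper; the function name `select_model_and_window`, the polynomial model classes, and the simulated structural break are all assumptions made for illustration.

```python
import numpy as np


def select_model_and_window(x, y, degrees, windows, n_val):
    """Return (val_mse, degree, window) minimizing error on the last n_val points.

    Each candidate fits a polynomial of the given degree to the most
    recent `window` training observations (those preceding validation).
    """
    x_tr, y_tr = x[:-n_val], y[:-n_val]
    x_val, y_val = x[-n_val:], y[-n_val:]
    best = None
    for deg in degrees:
        for w in windows:
            if w <= deg or w > len(x_tr):
                continue  # too few points to fit, or window exceeds the data
            coef = np.polyfit(x_tr[-w:], y_tr[-w:], deg)
            mse = float(np.mean((np.polyval(coef, x_val) - y_val) ** 2))
            if best is None or mse < best[0]:
                best = (mse, deg, w)
    return best


# Hypothetical data with a structural break at t = 200:
# the slope flips from +2 to -2, so long windows mix two regimes.
rng = np.random.default_rng(1)
x = rng.normal(size=300)
slope = np.where(np.arange(300) < 200, 2.0, -2.0)
y = slope * x + 0.1 * rng.normal(size=300)

best_mse, best_degree, best_window = select_model_and_window(
    x, y, degrees=[1, 3], windows=[50, 250], n_val=20
)
```

In this toy setting the short window avoids the pre-break regime, which is the nonstationarity side of the tradeoff; a richer model class would need more data than the short window provides, which is the complexity side.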

Paper
Code (Coming Soon)

Contact Info

  • Email: j.sidaoui@columbia.edu
  • Address: 500 W 120th St, New York, NY 10027
  • © 2025 J. Antonio Sidaoui
