Research & Code

**Manifold Learning for Data-Driven Risk Management**

A supervised learning framework for high-dimensional, nonlinear processes that imposes no restrictive parametric assumptions.

**Graph Machine Learning for Asset Pricing**

A graph neural network formulation that incorporates supply chain information into asset pricing and stock return prediction.

**Data-Driven Dynamic Factor Modeling via Manifold Learning**

(with Graeme Baker & Agostino Capponi)

We propose a data-driven dynamic factor framework in which a response variable depends on a high-dimensional set of covariates, without imposing any parametric model on the joint dynamics. Leveraging Anisotropic Diffusion Maps, a nonlinear manifold learning technique introduced by Singer and Coifman, our framework uncovers the joint dynamics of the covariates and responses in a purely data-driven way. We approximate the embedding dynamics using linear diffusions and exploit Kalman filtering to predict the evolution of the covariates and response variables directly from the diffusion map embedding space. We generalize Singer's convergence rate analysis of the graph Laplacian from independent uniform samples on a compact manifold to time series arising from Langevin diffusions in Euclidean space. Furthermore, we rigorously justify our procedure by showing that approximations of the diffusion map coordinates by linear diffusions are robust and that ergodic averages converge under standard spectral assumptions on the underlying dynamics. We apply our method to the stress testing of equity portfolios using a combination of financial and macroeconomic factors from the Federal Reserve's supervisory scenarios. In historical backtests spanning three major financial crises, our data-driven stress testing method outperforms standard scenario analysis and Principal Component Analysis benchmarks, reducing mean absolute error in scenario-based portfolio return prediction by up to 55% and 39%, respectively.
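To make the embedding step concrete, here is a minimal sketch of a diffusion-map embedding in NumPy. It is not the paper's implementation: it uses the standard isotropic Gaussian kernel with density normalization, whereas the Anisotropic Diffusion Maps of Singer and Coifman replace the Euclidean distance with a locally estimated Mahalanobis distance. The function name `diffusion_map` and the parameters `epsilon` and `n_components` are hypothetical and chosen only for illustration.

```python
import numpy as np


def diffusion_map(X, epsilon, n_components=2):
    """Standard (isotropic) diffusion-map embedding of the rows of X.

    Assumption: this is the plain Gaussian-kernel variant, not the
    anisotropic variant used in the paper.
    Returns (eigenvalues, coordinates), excluding the trivial constant mode.
    """
    # Pairwise squared Euclidean distances, clipped at zero for round-off.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

    # Gaussian kernel with bandwidth epsilon.
    K = np.exp(-d2 / epsilon)

    # Density normalization (alpha = 1) to reduce the effect of
    # non-uniform sampling.
    q = K.sum(axis=1)
    K = K / np.outer(q, q)

    # Symmetric conjugate S = D^{-1/2} K D^{-1/2} of the Markov matrix
    # P = D^{-1} K; S and P share eigenvalues.
    d = K.sum(axis=1)
    S = K / np.sqrt(np.outer(d, d))

    # eigh returns eigenvalues in ascending order; sort descending.
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]

    # Right eigenvectors of P are D^{-1/2} times the eigenvectors of S.
    psi = vecs / np.sqrt(d)[:, None]

    # Drop the first (constant) eigenvector, whose eigenvalue is 1.
    return vals[1:n_components + 1], psi[:, 1:n_components + 1]
```

In a pipeline like the one described above, the returned coordinates would play the role of the low-dimensional state on which a linear model and a Kalman filter are fitted; that step is not shown here.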

Paper
Code

**Graph Machine Learning for Asset Pricing**

(with Agostino Capponi & Jiacheng Zou)

We propose a nonparametric method to aggregate rich firm characteristics over a large supply chain network to explain the cross-section of expected returns. Each target firm receives a nonlinearly constructed pricing signal passed from neighboring firms that are within d hops on the supply chain network. Analyzing all US-listed stocks with supply chain data, our model achieves over 50% higher out-of-sample Sharpe ratios than models using only direct suppliers and consumers, and it outperforms Fama-French five-factor and principal component models. Through a graph-Monte Carlo experiment, we demonstrate the interplay between d and degree centrality, showing that the most central firms are twice as sensitive as peripheral firms. Our recommended d = 6 balances the bias-variance trade-off and ensures robustness.
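As an illustration of what "within d hops" means, the following sketch finds every firm reachable from a target firm in at most d steps using breadth-first search. It covers only the neighborhood construction, not the nonlinear signal aggregation of the paper. The adjacency-dictionary representation, the function name `within_d_hops`, and the toy firm labels are assumptions made for this example.

```python
from collections import deque


def within_d_hops(adj, source, d):
    """Return {firm: hop distance} for firms within d hops of `source`.

    `adj` maps each firm to an iterable of its direct supply chain
    neighbors (hypothetical representation). The source itself is excluded.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        # Do not expand beyond the d-hop frontier.
        if dist[u] == d:
            continue
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    del dist[source]
    return dist


# Toy undirected supply chain: A - B - C - D
toy_adj = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
}
neighbors = within_d_hops(toy_adj, "A", 2)  # {"B": 1, "C": 2}
```

With d = 1 the function returns only direct neighbors, which corresponds to the "direct suppliers and consumers" baseline mentioned above; larger d widens the set of firms whose characteristics can inform the target firm's signal.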

Paper
Code

Contact Info

  • Email: j.sidaoui@columbia.edu
  • Address: 500 W 120th St, New York, NY 10027
  • © 2025 J. Antonio Sidaoui
