Tomasz Kacprzak
I am a Humboldt Experienced Research Fellow at the University Observatory Munich, Faculty of Physics, Ludwig-Maximilians-University. Previously, I was a Senior Data Scientist at the Swiss Data Science Center at the Paul Scherrer Institute and a Senior Scientist at ETH Zurich. I obtained my PhD in Physics and Astronomy from University College London, having previously obtained an MSc in Machine Learning from the same university.
Scalable Approximate Algorithms for Optimal Transport Linear Models, under review in the Journal of Machine Learning Research, arXiv:2504.04609.
SHAM-OT: Rapid Subhalo Abundance Matching with Optimal Transport, Monthly Notices of the Royal Astronomical Society Letters, Volume 542, Issue 1, arXiv:2502.17553.
DeepLSS: breaking parameter degeneracies in large scale structure with deep learning analysis of combined probes, Phys. Rev. X 12, 031029, 2022, promoted in the APS Physics Magazine.
Optimal transport (OT) is a powerful framework for solving different types of data science problems, such as matching, domain shift, generative modelling, and comparison of shapes. Despite its complexity, simple and scalable algorithms exist for solving it, which makes OT suitable for large-scale problems and modern HPC architectures. In my work, I introduced OT to several application areas in applied physics and proposed a new model for OT-based regression.
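To illustrate the kind of simple, scalable algorithm referred to above, here is a minimal sketch of the entropic-regularized Sinkhorn iteration, a standard workhorse for large-scale OT. This is a generic textbook illustration, not the specific solver used in my papers; all names and parameters are illustrative.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.5, n_iter=200):
    """Entropic-regularized OT between histograms a and b with cost matrix C."""
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)                # rescale to match column marginal b
        u = a / (K @ v)                  # rescale to match row marginal a
    return u[:, None] * K * v[None, :]   # transport plan

# toy example: transport between two shifted 1-D point clouds
x = np.linspace(0.0, 1.0, 5)
y = np.linspace(0.0, 1.0, 5) + 0.2
C = (x[:, None] - y[None, :]) ** 2       # squared-distance cost
a = np.full(5, 0.2)                      # uniform source weights
b = np.full(5, 0.2)                      # uniform target weights
P = sinkhorn(a, b, C)                    # P's marginals match a and b
```

Each iteration is just two matrix-vector products, which is what makes the method easy to scale on modern hardware.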
I lead the collaborative project “Robust and scalable Machine Learning algorithms for Laue 3-Dimensional Neutron Diffraction Tomography” at the PSI. Our novel approach to indexing polycrystalline diffraction patterns from neutron tomography is based on optimal transport. It can analyze 10x larger samples with a 100x gain in speed compared to previous methods. The paper Laue Indexing with Optimal Transport is under review in IEEE PAMI. The upcoming package LaueOT will be available on GitHub.
I am the lead data scientist for the collaborative project between SDSC and the PSI Center for Neutron and Muon Research, called “Smart Analysis of MUonic x-Rays with Artificial Intelligence”. This project analyses muonic X-ray spectra obtained with the state-of-the-art muon-induced X-ray emission instrument MIXE to infer the chemical composition of various samples, such as alloys, batteries, and archaeological artifacts. As part of this project, I developed novel scalable Sinkhorn-like algorithms for linear regression with optimal transport cost functions (arXiv:2504.04609).
I proposed optimal transport algorithms for matching galaxies and dark matter halos in cosmological simulations (SHAM-OT). In this work, we re-formulated the Subhalo Abundance Matching problem as optimal transport and solved it using fast and scalable OT solvers.
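The core idea of abundance matching can be sketched in a few lines: with a cost that is monotone in rank, the optimal coupling between halos and galaxies reduces to matching their sorted orders (brightest galaxy to most massive halo). This is a hypothetical toy illustration, not the SHAM-OT implementation; all variable names are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
halo_mass = rng.lognormal(mean=12, sigma=0.5, size=8)   # toy subhalo masses
galaxy_lum = rng.lognormal(mean=10, sigma=0.4, size=8)  # toy galaxy luminosities

# Rank-order matching: the k-th most massive halo receives
# the k-th most luminous galaxy.
order_h = np.argsort(-halo_mass)                 # halos, most massive first
order_g = np.argsort(-galaxy_lum)                # galaxies, brightest first
assignment = np.empty(8, dtype=int)
assignment[order_h] = order_g                    # galaxy index for each halo
```

The OT formulation generalizes this deterministic sorting to probabilistic couplings (e.g. with scatter between mass and luminosity), while remaining solvable with fast standard solvers.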
Artificial Intelligence methods, such as deep convolutional neural networks, have the capacity to model the complex patterns contained in the cosmic web. I have introduced deep learning approaches to constraining cosmological parameters and to generating large-scale structure simulations. I demonstrated that AI-based analysis can achieve a 40% improvement in measurement precision, a gain equivalent to using 2x more survey data with conventional methods.
I performed the first cosmological analysis with deep learning, on the KiDS-450 dataset (Phys. Rev. D 2019, 100, 063514, with Janis Fluri), which was promoted by ETH News (18/09/2019) and MIT Technology Review (19/09/2019).
I was the PI of the production project “Measuring Dark Energy with Deep Learning” at the Swiss Supercomputing Center (CSCS), producing the CosmoGrid simulations, available at www.cosmogrid.ai.
I was the PI of the “Deep Learning for Observational Cosmology” programme at the Swiss Data Science Center (SDSC).
I was the lead organizer for the workshop “Artificial Intelligence Methods in Cosmology”, held in Monte Verita, Ascona, 9-12 June 2019.
I am a Builder of the Dark Energy Survey, the largest ground-based cosmological observational survey to date. This program has delivered the most precise cosmological parameter measurements from large scale structure of the universe to date. I have been involved in DES since 2012, with the following contributions:
As the Simulations Working Group coordinator and a member of the Science Committee (2022-), I organize new projects and collaborations in the area of simulation-based inference and provide the CosmoGridV1 simulations. We aim to further increase the precision of measurements from the expensive DES dataset with simulation-based inference, as well as to make it more robust to systematic errors.
I led the first simulation-based inference cosmology measurement with DES, using shear peaks and the Science Verification dataset (MNRAS 2016, 463, 4), followed by the full survey area analysis in DES Year 3 (MNRAS 2022, 511, 2). This analysis improved on the main DES measurement by 30%, while using low-resolution maps.
I worked extensively on galaxy shape measurements for weak gravitational lensing, image simulations, and noise biases in shear calibrations, and contributed significantly to DES Science Verification (SV) weak lensing analysis (MNRAS 2016, 460, 2). This work enabled reliable shape measurements for DES-SV cosmology.
In photometric surveys, the distances to galaxies are inferred from galaxy colors by matching them to galaxies found in previous spectroscopic surveys. While this approach has had many successes for nearby galaxies, where spectroscopic data is available in abundance, it can be difficult to apply reliably to far-away galaxies. This is due to our limited understanding of the population of these high-redshift galaxies, as well as of their evolution over cosmic time. Difficulties with modelling the selection functions of spectroscopic surveys further complicate this problem. An alternative is the Monte Carlo Control Loop (MCCL) approach, inspired by methods in particle physics. MCCL uses physically-motivated parametric models for the evolution of galaxy properties, together with very precise simulations of the telescope and its selection functions. This allows us to achieve the same precision of redshift measurement without using spectroscopy of high-redshift galaxies.
I developed the first forward-modelling, simulation-based approach to measuring redshifts of galaxy samples from wide-band photometry alone (JCAP 2017, 08, 035, with Joerg Herbel), which uses Approximate Bayesian Computation (ABC) and utilizes high-performance computing platforms.
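The logic of ABC is simple to state: draw parameters from the prior, forward-simulate data, and keep parameters whose simulated summary statistic lands close to the observed one. Here is a minimal rejection-ABC sketch on a toy Gaussian model; it only illustrates the generic method, not the actual pipeline, which forward-simulates survey images at scale.

```python
import numpy as np

rng = np.random.default_rng(42)

# "Observed" toy data: Gaussian with unknown mean (true value 1.0)
observed = rng.normal(loc=1.0, scale=1.0, size=100)
obs_stat = observed.mean()                            # summary statistic

accepted = []
for _ in range(5000):
    theta = rng.uniform(-5.0, 5.0)                    # draw from the prior
    sim = rng.normal(loc=theta, scale=1.0, size=100)  # forward-simulate
    if abs(sim.mean() - obs_stat) < 0.1:              # accept if close
        accepted.append(theta)

posterior_mean = np.mean(accepted)  # approximate posterior mean of theta
```

The accepted draws approximate the posterior without ever evaluating a likelihood, which is what makes the approach applicable when the simulator is the only available model of the data.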
I led the team at ETH Zurich that applied the simulation-based inference methodology to Dark Energy Survey Year 1 cosmology. We performed the first simulation-based joint measurement of cosmological lensing shear power spectra and the redshift distributions of the galaxy samples. The ETH team created a full analysis pipeline, from image pixels to cosmology parameter constraints (Phys. Rev. D 2020, 101, 082003).
I gave >20 invited talks at international conferences, workshops, and university colloquia. Some of them are available online.
Invited keynote talk at Bayesian deep learning for cosmology and time domain astrophysics, Paris, France, June 20-24 2022 (AstroDeep22)
Other recorded talks include:
I produce and maintain a number of datasets and software packages.
I co-created DeepSphere, a graph-based neural network architecture for analysing data on the sphere (Astr. Comp. 2019, 27).
I published the DeepLSS code from the PRX paper Phys. Rev. X 12 031029, 2022.
I am developing the LaueOT package, a toolbox for fast analysis of tomographic diffraction patterns from polycrystalline samples.
The code for optimal transport linear models is available in the OTLM repository.
If you are a masters student in cosmology, computer science or statistics, and are interested in a project, please send me an email.