Laboratoire des Méthodes Formelles (LMF)
Inria team QuaCS
Email: marc.baboulin [at] upsaclay.fr
Research Interests
 Quantum computing and simulation
 High-performance scientific computing
 Numerical algorithms and software
Papers

O. Koska, M. Baboulin, A. Gazda
A tree-approach Pauli decomposition algorithm with application to quantum computing.
To appear in the Proceedings of the International Supercomputing Conference (ISC 2024).

T. Goubault de Brugière, M. Baboulin, B. Valiron, S. Martiel, C. Allouche
Decoding techniques applied to the compilation of CNOT circuits for NISQ architectures.
Science of Computer Programming, Vol. 214:102726 (2022).
 G. He, S. Vialle, M. Baboulin
Parallel and accurate k-means algorithm on CPU-GPU architectures for spectral clustering.
Concurrency and Computation: Practice and Experience, Vol. 34, No 14 (2022),
PDF File.

T. Goubault de Brugière, M. Baboulin, B. Valiron, S. Martiel, C. Allouche
Gaussian elimination versus greedy methods for the synthesis of linear reversible circuits.
ACM Transactions on Quantum Computing, Vol. 2, No 3, pp. 1-26 (2021).

T. Goubault de Brugière, M. Baboulin, B. Valiron, S. Martiel, C. Allouche
Reducing the depth of linear reversible quantum circuits.
IEEE Transactions on Quantum Engineering, Vol. 2, pp. 1-22 (2021).
 G. He, S. Vialle, N. Sylvestre, M. Baboulin
Scalable Algorithms Using Sparse Storage for Parallel Spectral Clustering on GPU.
Lecture Notes in Computer Science, Springer-Verlag, Vol. 13152, pp. 40-52 (2021).
 G. He, S. Vialle, M. Baboulin
Parallelization of the k-means Algorithm in a Spectral Clustering Chain on CPU-GPU Platforms.
Lecture Notes in Computer Science, Springer-Verlag, Vol. 12480, pp. 135-147 (2020),
PDF File.

T. Goubault de Brugière, M. Baboulin, B. Valiron, S. Martiel, C. Allouche
Quantum CNOT circuits synthesis for NISQ architectures using the syndrome decoding problem.
Proceedings of the 12th Conference on Reversible Computation (RC 2020).
Lecture Notes in Computer Science, Springer, Vol. 12227, pp. 189-205 (2020), PDF File.

T. Goubault de Brugière, M. Baboulin, B. Valiron, C. Allouche
Quantum circuits synthesis using Householder transformations.
Computer Physics Communications, Vol. 248, p. 107001 (2020),
PDF File.

T. Goubault de Brugière, M. Baboulin, B. Valiron, C. Allouche
Synthesizing quantum circuits via numerical optimization.
Proceedings of the International Conference on Computational Science (ICCS 2019).
Lecture Notes in Computer Science, Springer, Vol. 11537, pp. 3-16 (2019), PDF File.
 I. Masliah, A. Abdelfattah, A. Haidar, S. Tomov, M. Baboulin, J. Falcou, J. Dongarra
Algorithms and optimization techniques for high-performance matrix-matrix multiplications of very small matrices.
Parallel Computing, Vol. 81, pp. 1-21 (2019),
PDF File.
 Gary W. Howell and Marc Baboulin
Iterative Solution of Sparse Linear Least Squares using LU Factorization.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region (HPC Asia 2018).
ACM digital library, pp. 47-53 (2018),
PDF File.

C. Allouche, M. Baboulin, T. Goubault de Brugière, B. Valiron
Reuse method for quantum circuit synthesis.
Recent Advances in Mathematical and Statistical Methods, Springer-Verlag, Vol. 259, pp. 3-12 (2018),
PDF File.
 Evan Coleman, Aygul Jamal, M. Baboulin, Amal Khabou, Masha Sosonkina
A Comparison of Soft-Fault Error Models in the Parallel Preconditioned Flexible GMRES.
Proceedings of the 12th International Conference on Parallel Processing and Applied Mathematics (PPAM 2017).
Lecture Notes in Computer Science, Springer-Verlag, Vol. 10777, pp. 36-46 (2017),
PDF File.
 M. Baboulin, J. Dongarra, A. Rémy, S. Tomov, I. Yamazaki
Solving Dense Symmetric Indefinite Systems using GPUs.
Concurrency and Computation: Practice and Experience, Vol. 29, No 9 (2017),
PDF File.
 H. Anzt, M. Baboulin, J. Dongarra, Y. Fournier, F. Hulsemann, A. Khabou, Y. Wang
Accelerating the conjugate gradient algorithm with GPU in CFD Simulations.
Proceedings of the International Conference on Vector and Parallel Processing (VecPar 2016).
Lecture Notes in Computer Science, Springer-Verlag, Vol. 10150, pp. 35-43 (2016),
PDF File.
 I. Masliah, M. Baboulin, J. Falcou
Metaprogramming and multistage programming for GPGPUs.
Proceedings of the 10th IEEE International Symposium on Embedded Multicore/Manycore Systems-on-Chip (MCSOC 2016).
IEEE Xplore Digital Library, pp. 369-376 (2016),
PDF File.
 A. Jamal, M. Baboulin, A. Khabou, M. Sosonkina
A hybrid CPU/GPU approach for the parallel algebraic recursive multilevel solver pARMS.
Proceedings of the 18th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC 2016).
IEEE Xplore Digital Library, pp. 411-416 (2016),
PDF File.
 I. Masliah, A. Abdelfattah, A. Haidar, S. Tomov, M. Baboulin, J. Falcou, J. Dongarra
High-Performance Matrix-Matrix Multiplications of Very Small Matrices.
Proceedings of Euro-Par 2016.
Lecture Notes in Computer Science, Springer-Verlag, Vol. 9833, pp. 659-671 (08/2016), PDF File.

A. Abdelfattah, M. Baboulin, V. Dobrev, J. Dongarra, C. Earl, J. Falcou, A. Haidar, I. Karlin, T. Kolev, I. Masliah, S. Tomov
High-Performance Tensor Contractions for GPUs.
Proceedings of the International Conference on Computational Science, ICCS 2016.
Procedia Computer Science, Elsevier, Vol. 80, pp. 108-118 (06/2016),
PDF File.
 M. Baboulin, J. Dongarra, A. Rémy, S. Tomov, I. Yamazaki
Dense Symmetric Indefinite Factorization on GPU Accelerated Architectures.
Proceedings of the 11th International Conference on Parallel Processing and Applied Mathematics (PPAM 2015).
Lecture Notes in Computer Science, Springer-Verlag, Vol. 9573, pp. 86-95 (2016),
PDF File.
 G. W. Howell, M. Baboulin
LU Preconditioning for Overdetermined Sparse Least Squares Problems.
Proceedings of the 11th International Conference on Parallel Processing and Applied Mathematics (PPAM 2015).
Lecture Notes in Computer Science, Springer-Verlag, Vol. 9573, pp. 128-137 (2016),
PDF File.
 M. Baboulin, A. Jamal, M. Sosonkina
Using Random Butterfly Transformations in Parallel Schur Complement-Based Preconditioning.
Proceedings of the 2015 Federated Conference on Computer Science and Information Systems (FedCSIS 2015).
Vol. 5, pp. 649-654 (2015),
PDF File.
 M. Baboulin, A. Khabou, A. Rémy
A randomized LU-based solver using GPU and Intel Xeon Phi accelerators.
Proceedings of the Euro-Par 2015 workshop "HeteroPar - Algorithms, Models, and Tools for Parallel Computing on Heterogeneous Platforms".
Lecture Notes in Computer Science, Springer-Verlag, Vol. 9523, pp. 175-184 (2015),
PDF File.
 I. Masliah, M. Baboulin, J. Falcou
Metaprogramming dense linear algebra solvers.
Applications to multi and manycore architectures.
Proceedings of the 13th IEEE International Symposium on Parallel and Distributed Processing with Applications (IEEE ISPA15).
IEEE Xplore Digital Library, Vol. 3, pp. 69-76 (2015),
PDF File.
 M. Baboulin, J. Dongarra, R. Lacroix
Computing least squares condition numbers on hybrid multicore/GPU systems.
Interdisciplinary Topics in Applied Mathematics, Modeling and Computational Science, Vol. 117, pp. 35-41 (2015),
PDF File.
 M. Baboulin, X. S. Li, F.-H. Rouet
Using Random Butterfly Transformations to Avoid Pivoting in Sparse Direct Methods.
Proceedings of the International Conference on Vector and Parallel Processing (VecPar 2014).
Lecture Notes in Computer Science, Springer-Verlag, Vol. 8969, pp. 135-144 (2014),
PDF File.
 G. Fursin, R. Miceli, A. Lokhmotov, M. Gerndt, M. Baboulin, A. Malony, Z. Chamski, D. Novillo, D. Del Vento
Collective mind: Towards practical and collaborative autotuning.
Scientific Programming, IOS Press, Vol. 22, No 4, pp. 309-329 (2014),
PDF File.
 M. Baboulin, D. Becker, G. Bosilca, A. Danalis, J. Dongarra
An efficient distributed randomized algorithm for solving large dense symmetric indefinite linear systems.
Parallel Computing, Vol. 40, No 7, pp. 213-223 (2014),
PDF File.
 Y. Wang, M. Baboulin, K. Rupp, O. Le Maître, Y. Fraigneau
Solving 3D incompressible Navier-Stokes equations on hybrid CPU/GPU systems.
Proceedings of the 22nd High Performance Computing Symposium (HPC'14).
ACM digital library, article 12 (2014),
PDF File.
 Adrien Rémy, M. Baboulin, M. Sosonkina, B. Rozoy
Locality optimization on a NUMA architecture for hybrid LU factorization.
Proceedings of the International Conference on Parallel Computing, PARCO 2013.
Advances in Parallel Computing, IOS Press, Vol. 25, pp. 153-162 (2014),
PDF File.
 M. Baboulin, S. Gratton, R. Lacroix, A. J. Laub
Statistical estimates for the conditioning of linear least squares problems.
Proceedings of the 10th International Conference on Parallel Processing and Applied Mathematics, PPAM 2013.
Lecture Notes in Computer Science, Springer-Verlag, Vol. 8384, pp. 124-133 (2014),
PDF File.
 Y. Wang, M. Baboulin, J. Dongarra, J. Falcou, Y. Fraigneau, O. Le Maître
A parallel solver for incompressible fluid flows.
Proceedings of the International Conference on Computational Science, ICCS 2013.
Procedia Computer Science, Elsevier, Vol. 18, pp. 439-448 (06/2013),
PDF File.
 M. Baboulin, J. Dongarra, J. Herrmann, S. Tomov
Accelerating linear system solutions using randomization techniques.
ACM Transactions on Mathematical Software (TOMS), Vol. 39, No 2 (2013),
PDF File.
 M. Baboulin, S. Donfack, J. Dongarra, L. Grigori, A. Rémy, S. Tomov
A class of communication-avoiding algorithms for solving general dense linear systems on CPU/GPU parallel machines.
Inria Research Report 7854 (02/2012).
Proceedings of the International Conference on Computational Science, ICCS 2012.
Procedia Computer Science, Elsevier, Vol. 9, pp. 17-26 (2012),
PDF File.
 M. Baboulin, D. Becker, J. Dongarra
A parallel tiled solver for dense symmetric indefinite systems on multicore architectures.
Inria Research Report 7762 (12/2011),
also appeared as LAPACK Working Note 261.
Proceedings of IEEE International Parallel & Distributed Processing Symposium, IPDPS 2012,
PDF File.
 D. Becker, M. Baboulin, J. Dongarra
Reducing the amount of pivoting in symmetric indefinite systems.
Inria Research Report 7621 (05/2011),
University of Tennessee Technical Report ICL-UT-11-06.
Proceedings of the 9th International Conference on Parallel Processing and Applied Mathematics, PPAM 2011.
Lecture Notes in Computer Science, Springer-Verlag, Vol. 7203, pp. 133-142 (2012),
PDF File.
 M. Baboulin, S. Gratton
A contribution to the conditioning of the total least squares problem.
Inria Research Report 7488 (12/2010),
also appeared as LAPACK Working Note 236.
SIAM Journal on Matrix Analysis and Applications, Vol. 32, No 3, pp. 685-699 (2011),
PDF File.
 S. Tomov, J. Dongarra, M. Baboulin
Towards dense linear algebra for hybrid GPU accelerated manycore systems.
Parallel Computing, Vol. 36, No 5&6, pp. 232-240 (2010),
PDF File.
 M. Baboulin, A. Buttari, J. Dongarra, J. Kurzak, J. Langou, J. Langou,
P. Luszczek, S. Tomov
Accelerating scientific computations with mixed precision algorithms.
Computer Physics Communications, Vol. 180, No 12, pp. 2526-2533 (2009),
PDF File.
 M. Baboulin, J. Dongarra, S. Gratton, J. Langou
Computing the conditioning of the components of a linear least squares solution.
Numerical Linear Algebra with Applications, Vol. 16, No 7, pp. 517-533 (2009), PDF File.
 M. Baboulin, S. Gratton
Using dual techniques to derive componentwise and mixed condition numbers
for a linear function of a linear least squares solution.
BIT Numerical Mathematics, Vol. 49, No 1, pp. 3-19 (2009),
PDF File.
 M. Baboulin, J. Dongarra, S. Tomov
Some issues in dense linear algebra for multicore and special purpose architectures.
Proceedings of the 9th International Workshop on State-of-the-Art in Scientific and Parallel Computing (PARA'08).
Lecture Notes in Computer Science, vol. 6126-6127, Springer-Verlag (2008),
PDF File.
 M. Baboulin, L. Giraud, S. Gratton, J. Langou
Parallel tools for solving incremental dense least squares problems. Application to space geodesy.
Journal of Algorithms and Computational Technology, Vol. 3, No 1, pp. 117-133 (2009),
PDF File.
 M. Baboulin, L. Giraud, S. Gratton, J. Langou
A distributed packed storage for large dense parallel in-core calculations.
Concurrency and Computation: Practice and Experience, Vol. 19, No 4, pp. 483-502 (2007),
PDF File.
 M. Arioli, M. Baboulin, S. Gratton
A partial condition number for linear least squares problems.
SIAM Journal on Matrix Analysis and Applications, Vol. 29, No 2, pp. 413-433 (2007),
PDF File.
 M. Baboulin, L. Giraud, S. Gratton
A parallel distributed solver for large dense symmetric systems:
applications to geodesy and electromagnetism problems.
International Journal of High Performance Computing Applications, Vol. 19, No 4, pp. 353-363 (2005),
PDF File.
Theses
Title: Fast and reliable solutions for numerical linear algebra solvers in high-performance computing.
Habilitation à Diriger des Recherches (HDR) from University Paris-Sud, defended December 5, 2012.
Committee: J.C. Bajard (Université Paris 6), P. Dague (Université Paris-Sud), F. Desprez (Inria/Ecole Normale Supérieure de Lyon, referee), Jack Dongarra (University of Tennessee, USA), S. Gratton (ENSEEIHT), P. Langlois (Université de Perpignan, referee), J. Roman (Universitat Politècnica de València, Spain, referee), B. Rozoy (Université Paris-Sud).
HDR dissertation
Title: Solving large dense linear least squares problems on parallel
distributed computers. Application to the Earth's gravity field computation.
Ph.D. in Computer Science from Institut National Polytechnique de Toulouse, defended March 21, 2006.
Committee: G. Balmino (CNES/CNRS), J. Dongarra (University of Tennessee, USA, referee), I.S. Duff (RAL/CERFACS), L. Giraud (ENSEEIHT), S. Gratton (CERFACS), N.J. Higham (University of Manchester, UK, referee), J. Noailles (ENSEEIHT).
Ph.D. dissertation
This thesis was awarded the Léopold Escande Prize by
Institut National Polytechnique de Toulouse.
Conferences
 Optimizing quantum algorithms using matrix factorizations.
Sparse Days, Saint-Girons, France, June 20, 2022.
 Optimizing quantum circuits via matrix factorizations.
SIAM Conference on Applied Linear Algebra (LA'21), Online, May 19, 2021.
 Quantum circuit synthesis using linear algebra and optimization algorithms.
SIAM Conference on Parallel Processing for Scientific Computing (PP'20), Seattle, USA, Feb. 15, 2020.
 Iterative Solution of Sparse Linear Least Squares using LU Factorization.
6th IMA Conference on Numerical Linear Algebra and Optimization, Birmingham, UK, June 27-29, 2018.
 Enhancing a Parallel Iterative Solver Through Randomization and GPU Computing.
SIAM Conference on Parallel Processing for Scientific Computing (PP'18), Tokyo, Japan, Mar. 9, 2018.
 Using randomization in the solution of sparse linear systems.
Workshop on Recent Topics in High Performance Computing, Kagaku Kaikan, Tokyo, Japan, Sept. 21, 2017.
 The story of the butterflies.
5th IMA Conference on Numerical Linear Algebra and Optimization, Birmingham, UK, Sept. 7-9, 2016.
 LU preconditioning for overdetermined sparse least squares problems.
20th International Linear Algebra Society Conference (ILAS 2016), Leuven, Belgium, Jul. 11-15, 2016.
 Dense Symmetric Indefinite Factorization on GPU Accelerated Architectures.
SIAM Conference on Applied Linear Algebra, Atlanta, USA, Oct. 26-30, 2015.
 Dense Symmetric Indefinite Factorization on GPU Accelerated Architectures.
11th International Conference on Parallel Processing and Applied Mathematics (PPAM 2015), Krakow, Poland, Sept. 6-9, 2015.
 Invited plenary speaker: The story of the butterflies.
High Performance Computing in Science and Engineering (HPCSE 2015), Solan, Czech Republic, May 25-28, 2015.
 Using condition numbers to assess numerical quality in least squares HPC applications.
5th International Conference on Numerical Algebra and Scientific Computing (NASC 2014), Tongji University, Shanghai, P.R. China, Oct. 25-29, 2014.
 Using condition numbers to assess numerical quality in least squares HPC applications.
Minisymposium: Linear least squares and applications. Co-organizer with Yimin Wei (Fudan University, Shanghai, P.R. China).
19th International Linear Algebra Society Conference (ILAS 2014), Seoul, Korea, Aug. 06-09, 2014.
 Randomized Algorithms for Dense Linear Algebra.
Minisymposium: Randomized algorithms in parallel matrix computations. Co-organizer with Sherry Li (Lawrence Berkeley National Laboratory, USA).
SIAM Conference on Parallel Processing for Scientific Computing, Portland (OR), USA, Feb. 18-21, 2014.
 Statistical estimates for the conditioning of linear least squares problems.
10th International Conference on Parallel Processing and Applied Mathematics (PPAM 2013), Warsaw, Poland, Sept. 8-11, 2013.
 Computing least squares condition numbers on hybrid multicore/GPU systems.
International Conference: Applied Mathematics, Modeling and Computational Science (AMMCS 2013), Waterloo (Ontario), Canada, Aug. 26-30, 2013.
 Accelerating linear system solutions using randomization.
The 18th Conference of the International Linear Algebra Society, Providence (RI), USA, June 3-7, 2013.
 Fast and reliable linear system solutions on new parallel architectures.
Séminaire Aristote - Ecole Polytechnique, Palaiseau, France, May 15, 2013.
 Computing least squares condition numbers.
Minisymposium: Numerical and reliability issues in high performance computing. Co-organizer with Ilse Ipsen (NC State).
SIAM Conference on Computational Science and Engineering, Boston, USA, Feb. 25 - March 1, 2013.
 Fast linear system solvers based on randomization techniques.
Minisymposium: Application of statistics to linear algebra algorithms. Co-organizer with Haim Avron (IBM Watson, USA).
SIAM Conference on Applied Linear Algebra, Valencia, Spain, June 18-22, 2012.
 A class of communication-avoiding algorithms for solving general dense linear systems on CPU/GPU parallel machines.
International Conference on Computational Science, Omaha (NE), USA, June 4-6, 2012.
 A parallel tiled solver for dense symmetric indefinite systems on multicore architectures.
26th IEEE International Parallel & Distributed Processing Symposium, Shanghai, China, May 21-25, 2012.
 Invited plenary speaker: A parallel tiled solver for dense symmetric indefinite systems on multicore architectures.
Workshop on "Recent developments in the solution of indefinite systems", Eindhoven, Netherlands, Apr. 17, 2012.
 Invited plenary speaker: Accelerating linear system solutions on new parallel architectures.
20th ACM High Performance Computing Symposium (HPC 2012), Orlando (FL), USA, March 26-29, 2012.
 A class of fast solvers for dense linear systems on hybrid GPU-multicore machines.
SIAM Conference on Parallel Processing for Scientific Computing, Savannah (GA), USA, Feb. 15-17, 2012.
 A parallel tiled solver for dense symmetric indefinite systems on multicore architectures.
The sixth workshop of the INRIA-Illinois Joint Laboratory for Petascale Computing, Urbana-Champaign (IL), USA, Nov. 21-23, 2011.
 Getting fast linear system solutions on new parallel architectures.
The fifth workshop of the INRIA-Illinois Joint Laboratory for Petascale Computing, Grenoble, France, June 27-29, 2011.
 Accelerating linear algebra calculations using statistical techniques.
Minisymposium: Innovative algorithms for dense linear algebra. Co-organizer with Azzam Haidar (University of Tennessee).
SIAM Conference on Computational Science and Engineering, Reno (NV), USA, Feb. 28 - March 4, 2011.
 Accelerating linear algebra computations with hybrid GPU-multicore systems.
The fourth workshop of the INRIA-Illinois Joint Laboratory for Petascale Computing, Urbana-Champaign (IL), USA, Nov. 22-24, 2010.
 Computational issues in least squares conditioning.
Parallel Matrix Algorithms and Applications (PMAA'10), Basel, Switzerland, June 29 - July 2, 2010.
 Invited speaker: Summer school on e-science with manycore CPU/GPU processors.
Lecture on "Dense linear algebra for hybrid GPU-Multicore systems", Braga, Portugal, June 14-18, 2010.
 Dense linear algebra for hybrid GPU accelerated manycore systems.
Numerical Methods in Engineering (METNUM 09), Barcelona, Spain, June 29 - July 2, 2009.
 Deriving componentwise condition numbers using dual techniques.
Application to linear least squares.
SIAM Annual Meeting, San Diego, USA, July 7-11, 2008.
 Computing the conditioning of the components of a linear least squares solution.
VECPAR'08, Toulouse, France, June 24-27, 2008.
 Some issues in dense linear algebra for multicore.
Minisymposium: Recent developments in dense linear algebra
Organizers: Marc Baboulin and Jack Dongarra
SIAM Conference on Parallel Processing for Scientific Computing, Atlanta, USA, March 12-14, 2008.
 Computing the conditioning of dense linear least squares with (Sca)LAPACK.
SIAM Conference on Parallel Processing for Scientific Computing, Atlanta, USA, March 12-14, 2008.
 Very large least-squares for parameter estimation: Algorithm and application.
SciDAC workshop on libraries and algorithms, Snowbird (Utah), USA, July 30 - Aug. 2, 2007.
 HPC tools for solving accurately the large dense linear least squares problems arising in gravity field calculations.
PARA'06, Workshop on State-of-the-Art in Scientific and Parallel Computing, Umeå, Sweden, June 18-21, 2006.
 A distributed packed storage for large parallel calculations.
SIAM Conference on Parallel Processing for Scientific Computing, San Francisco, USA, Feb. 22-24, 2006.
 Parallel distributed solvers for accurate and efficient gravity field computation.
SIAM Conference on Mathematical and Computational Issues in the Geosciences, Avignon, France, June 7-10, 2005.
 Solveur parallèle pour moindres carrés.
Séminaire Mécanique Orbitale, Centre National d'Etudes Spatiales, Toulouse, France, Sept. 30, 2004.
 Partial condition number for linear least squares problems.
International Congress on Computational and Applied Mathematics, Katholieke Universiteit Leuven, Belgium, July 26-30, 2004.
 Parallel distributed Cholesky factorization for in-core large dense problems.
SIAM Conference on Parallel Processing for Scientific Computing, San Francisco, USA, Feb. 25-27, 2004.
