For a few weeks now I've been chasing a bug where particles in a simulation “germinate” out of nowhere, where “nowhere” actually means: two devices in a multi-GPU context suddenly start disagreeing about the velocity/position of a particle, so they each let it evolve its own way.

(Why are particles managed by multiple devices at the same time? Because of a thing called the “halo”: the parts of the domain that sit at the boundary between two different devices.)

I've finally bitten the bullet and started printing the values each device sees for the specific particle, and indeed something very weird happens: at one point, one of the devices starts seeing zeroes instead of the last value it itself wrote.

This is a relatively well-tested area of the #GPUSPH codebase. While I can't rule out a subtle bug, I'm starting to suspect that the issue may lie elsewhere. And if I'm hitting a hardware issue, I'm not going to be happy …

The number of people who don't know about #GPUSPH within #INGV is too damn high (.jpg).

Memes aside, I've had several opportunities these days to talk with people both within the Osservatorio Etneo and other branches of the Institute, and most of them had no idea something like that was being developed within INGV.

On the one hand, this is understandable, especially for teams that have never had a direct need to even look for #CFD code because of the focus of their research.

On the other hand, this also shows that I should have been much more aggressive with marketing the project internally. (And don't even get me started on who had the actual managerial power to do so before me, but that would put me on a rant that I'd rather avoid for now.)

I'm glad I've finally started working on this aspect, but also I can't say I'm too happy about having to do so.

Hopefully this is something that will help the project gain critical mass.

Our most recent paper on #SPH / #FEM coupling for offshore structures modeling with #GPUSPH has been published:

https://authors.elsevier.com/c/1m3VB_hNWk2tT

This kind of work, with validation against experimental results, is always challenging, even for the simpler problems. Lab experiments and numerical simulations each have their own set of problems that need to be addressed, and the people working on the two sides of the fence often have very different perspectives on what should be considered trivial and not worth measuring, and what is instead crucial to the success of the experiment.

Getting these two sides to talk to each other successfully is no walk in the park, and I wish to extend my deepest gratitude to Vito Zago, who has gone to incredible lengths both during the “science making” to make things work out, and during the manuscript submission and review process, a nearly Sisyphean task in itself.

#SmoothedParticleHydrodynamics #FiniteElements #FiniteElementMethods

Today I introduced a much-needed feature to #GPUSPH.

Our code supports multi-GPU and even multi-node, so in general if you have a large simulation you'll want to distribute it over all your GPUs using our internal support for it.

However, in some cases, you need to run a battery of simulations and your problem size isn't large enough to justify the use of more than a couple of GPUs for each simulation.

In this case, rather than running the simulations in your set serially (one after the other) using all GPUs for each, you'll want to run them in parallel, potentially even each on a single GPU.

The idea is to find the next available (set of) GPU(s) and launch a simulation on them while there are still available sets, then wait until a “slot” frees up and start the next one(s) as slots get freed.
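In plain Python, the slot logic amounts to something like this (an illustrative sketch, not GPUSPH tooling — `launch` and the GPU-set lists are placeholders):

```python
# One worker per GPU set: a simulation grabs the next free set, runs,
# and returns the set to the pool so the next queued simulation can start.
from concurrent.futures import ThreadPoolExecutor
import queue

def run_all(simulations, gpu_sets, launch):
    """launch(sim, gpus) runs one simulation on one set of GPUs."""
    free = queue.Queue()
    for gpus in gpu_sets:
        free.put(gpus)

    def worker(sim):
        gpus = free.get()          # take the next available slot
        try:
            return launch(sim, gpus)
        finally:
            free.put(gpus)         # slot freed: next simulation may start

    with ThreadPoolExecutor(max_workers=len(gpu_sets)) as pool:
        return list(pool.map(worker, simulations))
```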

Until now, we've been doing this manually, by partitioning the set of simulations to run and starting them in different shells.

There is actually a very powerful tool to achieve this on the command line: GNU Parallel. As with all powerful tools, however, it is somewhat cumbersome to configure to get the intended result. And after Doing It Right™ one must remember the invocation magic …

So today I found some time to write a wrapper around GNU Parallel that basically (1) enumerates the available GPUs and (2) appends the appropriate --device command-line option to the invocation of GPUSPH, based on the slot number.
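A minimal sketch of the idea (the case names and the nvidia-smi-based GPU count are illustrative; `--device` is the GPUSPH option mentioned above, the rest is not the actual wrapper):

```shell
#!/bin/sh
# One simulation per GPU via GNU Parallel (illustrative sketch).

# Enumerate the available GPUs (assumes NVIDIA hardware with nvidia-smi).
NGPUS=$(nvidia-smi --list-gpus | wc -l)

# {%} is GNU Parallel's 1-based job-slot number; the command below maps it
# to a 0-based device index, so each slot stays pinned to its own GPU.
parallel -j "$NGPUS" \
    './GPUSPH --device $(( {%} - 1 )) {}' ::: caseA caseB caseC caseD
```

Because {%} is substituted by Parallel before the job shell runs, the arithmetic is evaluated per job, once the slot is known.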

#GPGPU #ParallelComputing #DistributedComputing #GNUParallel

I just realized that in my quest to port #GPUSPH to other POSIX-like OSes, I've never actually tried something like Alpine or other non-glibc Linux systems.

Talking about dependencies: one thing we did *not* reimplement in #GPUSPH is rigid body motion. GPUSPH is intended to be code for #CFD, and while I do dream about making it a general-purpose code for #ContinuumMechanics, at the moment anything pertaining to solids is “delegated”.

When a (solid) object is added to a test case in GPUSPH, it can be classified as either a “moving” or a “floating” object. The main difference is that a “moving” object is assumed to have a prescribed motion, which effectively means the user has to also define how the object moves, while a “floating” object is assumed to move according to the standard equations of motion, with the forces and torques exerted on the body by the fluid provided by GPUSPH.
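In pseudocode terms (an illustrative sketch, not GPUSPH's actual API), the distinction boils down to who supplies the motion:

```python
# "moving" objects follow a user-prescribed motion law; "floating" objects
# integrate the standard equations of motion from the fluid forces the
# solver computes. Everything here is a 1-D stand-in for brevity.

def advance_velocity(kind, v, dt, *, prescribed=None, force=0.0, mass=1.0):
    """Return the object's velocity after one time step."""
    if kind == "moving":
        # The user defines how the object moves; the fluid does not drive it.
        return prescribed(dt)
    # "floating": v' = v + (F/m) dt, with F the force exerted by the fluid.
    return v + force / mass * dt

# A paddle with a user-defined motion law vs. a buoy pushed by the fluid:
paddle_v = advance_velocity("moving", 0.0, 0.5, prescribed=lambda t: 3.0 * t)
buoy_v = advance_velocity("floating", 0.0, 0.5, force=2.0, mass=1.0)
```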

For floating objects, we delegate the rigid body motion computation to the well-established simulation engine #ProjectChrono
https://projectchrono.org/

Chrono is a “soft dependency” of GPUSPH: you do not need it to build a generic test case, but you do need it if you want floating objects without having to write the entire rigid body solver yourself.

1/n

#SmoothedParticleHydrodynamics #SPH #ComputationalFluidDynamics


That first implementation didn't even support the multi-GPU and multi-node features of #GPUSPH (it could only run on a single GPU), but it paved the way for the full version, which took advantage of the whole GPUSPH infrastructure in multiple ways.

First of all, we didn't have to worry about how to encode the matrix and its sparseness, because we could compute the coefficients on the fly, and operate with the same neighbors-list traversal logic used in the rest of the code; this allowed us to minimize memory use and increase code reuse.
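As a toy illustration of the “compute the coefficients on the fly” idea (a sketch only, with a made-up coefficient function — nothing here is GPUSPH code):

```python
def matvec_on_the_fly(x, neighbors, coeff):
    """y = A*x without ever storing A: each off-diagonal coefficient is
    recomputed from the neighbors list, and the diagonal is implicit
    (rows sum to zero, as in Laplacian-like operators)."""
    y = [0.0] * len(x)
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            a_ij = coeff(i, j)             # computed on the fly, never stored
            y[i] += a_ij * (x[j] - x[i])   # implicit diagonal: -sum_j a_ij
    return y

# Three particles in a chain, unit coefficients:
chain = [[1], [0, 2], [1]]
y = matvec_on_the_fly([0.0, 1.0, 2.0], chain, lambda i, j: 1.0)
```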

Secondly, we gained control over the accuracy of intermediate operations, allowing us to use compensated sums wherever needed.
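For reference, a compensated (Kahan) sum looks like this — a generic textbook sketch, not the GPUSPH implementation:

```python
def kahan_sum(values):
    """Compensated summation: the rounding error of each addition is kept
    in a separate compensation term instead of being silently dropped."""
    total = 0.0
    comp = 0.0                  # running compensation (lost low-order bits)
    for v in values:
        y = v - comp            # re-inject the error from the previous step
        t = total + y           # low-order bits of y may be lost here...
        comp = (t - total) - y  # ...but are recovered algebraically
        total = t
    return total

# A plain sum loses the two small terms entirely; the compensated one doesn't:
naive = sum([1e16, 1.0, 1.0])
compensated = kahan_sum([1e16, 1.0, 1.0])
```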

Thirdly, we could leverage the multi-GPU and multi-node capabilities already present in GPUSPH to distribute computations across all available devices.

And last but not least, we actually found ways to improve the classic #CG and #BiCGSTAB linear solvers to achieve excellent accuracy and convergence even without preconditioners, while making the algorithms themselves more parallel-friendly:

https://doi.org/10.1016/j.jcp.2022.111413
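The improved variants are in the paper; for context, this is the classic CG skeleton they start from, written matrix-free in the same spirit as above (a generic textbook sketch, not our implementation):

```python
def cg(apply_A, b, x0=None, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A, where A is only
    available as a function apply_A(x) -> A @ x (matrix-free)."""
    n = len(b)
    x = list(x0) if x0 is not None else [0.0] * n
    Ax = apply_A(x)
    r = [b[i] - Ax[i] for i in range(n)]   # initial residual
    p = list(r)                            # initial search direction
    rs = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs < tol * tol:                 # converged: ||r|| < tol
            break
        Ap = apply_A(p)
        alpha = rs / sum(p[i] * Ap[i] for i in range(n))
        for i in range(n):
            x[i] += alpha * p[i]
            r[i] -= alpha * Ap[i]
        rs_new = sum(ri * ri for ri in r)
        p = [r[i] + (rs_new / rs) * p[i] for i in range(n)]
        rs = rs_new
    return x
```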

4/n

#LinearAlgebra #NumericalAnalysis

I've just reviewed a manuscript about the recent progress made in introducing #GPU support in a classic, large #CFD code with existing good support for massive simulations in traditional #HPC settings (CPU clusters).

I'm always fascinated by the stark difference between the kind of work that goes into this process, and what went into the *reverse* process that we followed for #GPUSPH, which was developed for GPU from the start and was only ported to CPU recently, through the approach described in this paper, which I'm sure I've already mentioned here:

https://doi.org/10.1002/cpe.8313

When I review this kind of article, I always feel the urge to start a dialogue with the authors about these differences, but that's not really my place as the reviewer, so I have to hold back and limit my comments to what the review actually requires.

So I guess you get to read about the stuff I couldn't write in my reviewer comments.

1/n

By Tesler's law of conservation of complexity
https://en.wikipedia.org/wiki/Law_of_conservation_of_complexity
there's a lower bound on how far you can reduce complexity. Beyond that, you're only moving complexity from one aspect to another.

In the case of #GPUSPH, this has materialized in the fact that the exponential complexity of variant support has been converted into what is largely a *linear* complexity of interaction functions. You can find an example in my #SPHERIC2019 presentation:
https://www.gpusph.org/presentations/spheric/2019/bilotta-spheric2019/#9.0

Those slides (if you want you can start at the beginning here <https://www.gpusph.org/presentations/spheric/2019/bilotta-spheric2019/>) also give you an idea of what happens to the code. And probably also give you a hint about what the issue is.
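The exponential-to-linear point can be sketched in a few lines (option names made up for illustration, not GPUSPH's actual options):

```python
# With k independent options there are 2**k possible variants, but the
# source only needs one small contribution per option: linear in k.

TERMS = {
    "viscosity":         lambda a, b: 0.10 * (b - a),
    "xsph":              lambda a, b: 0.05 * (b + a),
    "density_diffusion": lambda a, b: 0.01 * (b - a),
}

def make_interaction(enabled):
    """Compose the enabled per-option contributions into one interaction."""
    active = [f for name, f in TERMS.items() if name in enabled]
    return lambda a, b: sum(f(a, b) for f in active)

n_variants = 2 ** len(TERMS)   # 8 combinations from only 3 code paths
interact = make_interaction({"viscosity", "density_diffusion"})
```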

10/


I'm not going to claim that we found the perfect balance in #GPUSPH, but one thing I can say is that I often find myself thanking my past self for insisting on pushing for this or that abstraction over more ad hoc solutions, because it has made a lot of later development easier *and* more robust.

AFAIK, our software is the SPH implementation that supports the widest range of SPH formulations. The last time I tried counting how many variants were theoretically possible with all option combinations, it was somewhere in the neighborhood of 9 *billion*, taking into account the combinatorial explosion of numerical and physical modeling options. Even if not all of them are actually supported in the code, the actual number is still huge, and it's the main reason why we switched from trying to compile all of them and letting the user choose whatever they wanted at runtime, to forcing the user to make some compile-time choices when defining a test case.

8/