Remember when I mentioned we had ported our #fire propagation #cellularAutomaton from #Python to #Julia, gaining performance and the ability to parallelize more easily and efficiently?

A couple of days ago we had to run another big batch of simulations and while things progressed well at the beginning, we saw the parallel threads apparently hanging one by one until the whole process sat there doing who knows what.

Our initial suspicion was that we had come across some weird #JuliaLang issue with #multithreading, which seemed to be confirmed by some posts we found on the Julia forums. We tried the workarounds suggested there, to no avail. We tried a different number of threads, and this led to the hang occurring after a different percent completion. We tried restarting the simulations skipping the ones already done. It always got stuck at the same place (for the same number of threads).

So, what was the problem?

1/n

The simulations were of the kind “try every starting point and see what happens”: we launch them all together because most of the initial data is the same (topography, fuel distribution, etc.) and loading it only once is more efficient. The parallelization is done trivially (one work-item per ignition point). Of course, even if the domain is something like 500x500 cells, not every cell is a valid ignition point: for example, you don't want to start a fire from a sea cell (we're simulating volcanic fires, not oil spills catching fire …), so we filter “unburnable” cells out of the initial starting grid. So far, so good.
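To make the setup concrete, here is a minimal sketch of "filter the grid, then one work-item per ignition point". Everything in it (`BURNABLE`, `UNBURNABLE`, `simulate`, the tiny 4x4 grid) is a hypothetical stand-in, not the actual code:

```julia
# Hypothetical sketch: names and codes are illustrative, not taken from the real model.
const BURNABLE = 0
const UNBURNABLE = 1

# Stand-in for a full fire-spread run from one ignition point.
simulate(state, start::CartesianIndex{2}) = start[1] + start[2]

state = fill(BURNABLE, 4, 4)
state[1, :] .= UNBURNABLE            # e.g. a row of sea cells

# Keep only the cells that are valid ignition points.
ignition_points = findall(==(BURNABLE), state)

# One work-item per ignition point; each iteration writes only to its own slot.
results = Vector{Int}(undef, length(ignition_points))
Threads.@threads for k in eachindex(ignition_points)
    results[k] = simulate(state, ignition_points[k])
end
```

The shared input (`state` here) is loaded once and only read inside the loop, which is what makes this kind of parallelization trivial.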

Still, I noticed that there were a number of cells that were being ignited with a warning about them not being burnable cells. Even worse, some of them had NaN fuel! How could this be possible?

The NaN actually gave me a hint about why the threads were getting stuck, and it had nothing to do with deadlocks in Julia: as part of the loop, a burning cell will decrease its fuel and become completely burned out when the fuel drops to 0 (or below). But if the fuel is NaN, NaN minus a constant is still NaN, and NaN <= 0 is never satisfied (NaNs are unordered with respect to every floating-point value, so all ordering comparisons are false).
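The effect is easy to reproduce in isolation. This is only a sketch of the mechanism: the function, `burn_rate`, and the `max_steps` cap are made up for illustration (the real loop had no such cap, which is exactly why it never ended):

```julia
# Hypothetical burn-down loop; `burn_rate` and `max_steps` are illustrative only.
function steps_to_burn_out(fuel::Float64; burn_rate = 1.0, max_steps = 1_000)
    for step in 1:max_steps
        fuel -= burn_rate             # NaN - constant is still NaN
        fuel <= 0 && return step      # never true when fuel is NaN
    end
    return nothing                    # gave up: the cell never burned out
end

steps_to_burn_out(3.0)   # 3
steps_to_burn_out(NaN)   # nothing
```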

2/n

Once the deadlock mystery was solved (it wasn't deadlocking: simply put, the fire from those starting points never went out, so the simulation continued forever!), the question remained of *why* those starting points (1) were being picked even if they were unburnable and (2) had NaN fuel.

The NaN fuel was actually easy to explain: as part of the initialization routine, the points outside of the island topography were not only marked as UNBURNABLE, but their fuel was also set to NaN in case a “freak accident” (read: bug in the code) led to their ignition. But if these points had been marked unburnable, why were they being picked as starting points in the first place?

As it turns out, the issue was with a new feature we had implemented recently: firebreaks. (In fact, these simulations were being run *specifically* to show the effects of firebreaks on the volcanic fire hazard on the island.)

3/n

Firebreaks are specified separately from the other input data, and can be realized in many different forms: linear firebreaks, annular firebreaks, or even arbitrarily-shaped firebreaks loaded from a raster file. In this specific case, we were testing the last option, where the firebreaks were obtained from the hike trails on the island. Firebreaks are implemented by marking the relevant cells as unburnable too, but to differentiate them from other unburnable cells (for visualization purposes) we used a different code for them. And while we had tested the firebreaks for single runs, we hadn't yet tested them for “grid” runs, like the one we were doing now.

As it happens, the filtering code for the initial grid had *not* been updated to take into account the new mark, so firebreak cells were being considered as starting points for the fires, *and* some of them were actually right where the topography ends, leading to the NaN fuel and subsequent stalls.

The fix for this was trivial once the issue was discovered, but it took *a lot* of time, because the stalls always happened beyond the 80% mark for the (filtered) grid.
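For illustration, the bug has roughly this shape. This is a sketch under assumptions: the codes, their values, and both filter functions are hypothetical, and the real filtering code may look quite different:

```julia
# Hypothetical cell codes; the real model's values and names may differ.
const BURNABLE = 0
const UNBURNABLE = 1
const FIREBREAK = 2      # the newer mark, kept distinct for visualization

# Filter written before firebreaks existed: it only knows about one mark.
old_filter(code) = code != UNBURNABLE

# Updated filter: anything that is not explicitly burnable is excluded.
is_valid_start(code) = code == BURNABLE

codes = [BURNABLE, UNBURNABLE, FIREBREAK, BURNABLE]
count(old_filter, codes)      # 3: the firebreak cell slips through
count(is_valid_start, codes)  # 2
```

Testing for the positive condition (is it burnable?) instead of the negative one (is it this particular kind of unburnable?) means a future mark cannot reintroduce the same mistake.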

4/n

What fascinates me the most about this is that even the discovery of the bug was serendipitous. If we hadn't chosen to invalidate the fuel for cells where the topography was not positive, or there hadn't been firebreak points just at the border of the valid cells, we might never have spotted the grid cell filtering bug, meaning that the full-grid simulations with the firebreak would have been *slightly* wrong, leading to a slight overestimate for the probability of fire on the burnable cells in the domain. Possibly it wouldn't even have mattered in the grand scheme of things, since the difference would have still been less than the uncertainty that is coming from other sources. But still, there's some satisfaction in knowing that with this issue fixed we're at least not introducing additional sources of error.

5/5

@giuseppebilotta sounds cool. What are you trying to accomplish with these computations?
@tnoisu we're working on volcanic fire hazard assessment on Stromboli island. These simulations specifically were done to assess the hazard mitigation potential of hike trails if properly maintained.

@giuseppebilotta I wish I had your job 

So you take the map of a real place and then run simulations? How do you determine if a given cell is burnable or not? Is that a yes or no or a ranged value?

@tnoisu we use topographic information about the place, and vegetation and humidity indices for the fuel and burnability. The results are probabilistic. For each ignition point we run multiple (probabilistic) fire spread scenarios, and then compute the probability for each cell to catch fire as a ratio between the number of times it got burned and the total runs.
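As a toy illustration of that ratio (made-up data and array layout, not the model's actual data structures):

```julia
# burned[:, :, r] is true where a cell burned in run r (made-up data).
burned = falses(2, 2, 4)
burned[1, 1, :] .= true         # burned in all 4 runs
burned[2, 1, 1:2] .= true       # burned in 2 of 4 runs

n_runs = size(burned, 3)

# Per-cell probability: number of runs in which the cell burned / total runs.
probability = dropdims(sum(burned; dims = 3); dims = 3) ./ n_runs
```

Here `probability[1, 1]` is 1.0, `probability[2, 1]` is 0.5, and the two cells that never burned get 0.0.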

You can find details about how the model works in this paper:

https://www.mdpi.com/2571-6255/7/3/70

Modeling Fire Hazards Induced by Volcanic Eruptions: The Case of Stromboli (Italy)

We hereby present VolcFire, a new cellular automaton model for fire propagation aimed at the creation of fire hazard maps for fires of volcanic origin. The new model relies on satellite-derived input data for the topography, land-use, fuel, and humidity information, and produces probabilistic maps of fire propagation simulating fire spread. The model contains several simplifications compared to the current state-of-the-art, limiting its usability to plan fire-fighting interventions during an event in favour of a reduced computational load. The accuracy and reliability of the model are also discussed by presenting its ability to reproduce two recent fires on Stromboli island, with good spatial fit (Brier score of 0.146±0.002 for the 3 July 2019 volcanic fire, and of 0.073±0.001 for the 25 May 2022 anthropogenic fire) and less than 1.5% variation across multiple simulations for the same event.

MDPI
@giuseppebilotta thank you for your time and information
@giuseppebilotta Interesting read. It sounds like it might have been even better to set the fuel to missing instead of NaN because then it wouldn't have led to a deadlock but an error in the comparison code. And is the fire propagation model openly available somewhere?

@felixcremer thanks for the comment. my understanding is that using missing would change the data arrays type, and may have a (however small) performance impact. Still, it would be interesting to see how expensive it would be compared to e.g. a better refactoring of the ignition that just checked for NaN fuel when a cell is marked for ignition.
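A quick sketch of both options, to make the comparison concrete. The first part only uses standard Julia behaviour for `missing`; the `ignite!` function and its arguments are hypothetical, not part of the actual code:

```julia
# With `missing`, the comparison itself yields `missing`, and using that in an
# `if` throws a TypeError instead of silently evaluating to false.
threw = try
    if missing <= 0
        nothing
    end
    false
catch err
    err isa TypeError
end

# Alternative: keep NaN in the array but refuse to ignite such cells.
# `ignite!` and its arguments are hypothetical.
function ignite!(burning::AbstractMatrix{Bool}, fuel::AbstractMatrix{Float64}, I)
    isnan(fuel[I]) && error("refusing to ignite cell $I: fuel is NaN")
    burning[I] = true
    return burning
end

fuel = [1.0 NaN]
burning = falses(1, 2)
ignite!(burning, fuel, 1)      # fine
# ignite!(burning, fuel, 2)    # would throw: fuel is NaN
```

The second variant keeps the array as plain `Float64` and pays for one `isnan` check per ignition, rather than changing the element type to `Union{Missing, Float64}`.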

We do intend to release it as open source, unless there is some objection from the stakeholders that commissioned the work.

@giuseppebilotta True, you would change the eltype of your array but that might not be that much of an impact.

@giuseppebilotta Encoding more logic into the type system should help with that situation. I would create an abstract type Cell and then have two types of Cell, one that burns and one that does not:

abstract type Cell end

struct UnburnableCell <: Cell end

struct BurnableCell <: Cell
    fuelAmount::Float64  # concrete field type avoids boxing the value as Any
end

@giuseppebilotta This totally removes the possibility of this kind of error and is super cheap (no overhead) in Julia. It also makes some parts of the code cleaner as you can dispatch on the types of cell and you don't have to rely on if statements (which are easy to omit).
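A minimal sketch of what dispatching on the cell type could look like. The `can_ignite` function and the `Union` element type are assumptions added for illustration, and how cheap this is in practice depends on how the cells are stored, so it is worth benchmarking:

```julia
abstract type Cell end

struct UnburnableCell <: Cell end

struct BurnableCell <: Cell
    fuelAmount::Float64
end

# Dispatch replaces the `if` checks: an unburnable cell can never ignite.
can_ignite(::UnburnableCell) = false
can_ignite(c::BurnableCell) = c.fuelAmount > 0

# A small Union element type (assumption: only two cell kinds) rather than
# the abstract `Cell`, so the compiler knows every possible concrete type.
grid = Union{UnburnableCell, BurnableCell}[
    BurnableCell(2.0), UnburnableCell(), BurnableCell(0.0),
]

starts = findall(can_ignite, grid)   # [1]
```

An `UnburnableCell` has no fuel field at all, so there is nothing to set to NaN and nothing to decrement by accident.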
@owiecc thank you very much for the precious suggestions. We are just getting started with Julia and any recommendation on how to best leverage the language features is very welcome. The code is currently an almost direct port from a Python script, so there's definitely still a lot of room to improve on.