#SwiftLang #concurrency program stall question:

Given an array of file URLs, the code does this:

// some prep code
await withTaskGroup(of: TypeInit.self) { group in
    var limit = min(urls.count, maxConcurrentTasks)
    // Start the first batch of child tasks.
    for ix in 0..<limit {
        group.addTask { TypeInit(url: urls[ix]) }
    }
    // Each time a child task finishes, consume its result and start the next URL.
    for await type in group {
        await doSomethingWith(type)
        if limit < urls.count {
            let url = urls[limit]
            limit += 1
            group.addTask { TypeInit(url: url) }
        }
    }
}
// some other code never reached when the hang occurs

The issue is that, during testing, the code sometimes just stops. There is no processor activity; the program is blocked waiting for something, and I don't know what.

maxConcurrentTasks is currently 128, a value picked to get reasonable UI feedback. doSomethingWith(type) is not async, but it is isolated to the MainActor. I've also tried an explicit hop:

await MainActor.run { doSomethingWith(type) }

No difference. (1/2)
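One way to check whether the task-group bookkeeping itself can stall is to run the same sliding-window pattern with a trivial stand-in for TypeInit. This is a minimal sketch: `Item` and `processAll` are hypothetical names, and the stub does no I/O. If this version always completes, the hang is more likely in TypeInit's blocking work or in the MainActor call than in the group logic.

```swift
// Hypothetical stand-in for TypeInit: cheap, synchronous, no I/O.
struct Item: Sendable {
    let index: Int
}

// Same pattern as above: keep at most `maxConcurrentTasks` children in flight,
// starting a new one each time a result is consumed.
func processAll(count: Int, maxConcurrentTasks: Int) async -> [Int] {
    var results: [Int] = []
    await withTaskGroup(of: Item.self) { group in
        var next = min(count, maxConcurrentTasks)
        for ix in 0..<next {
            group.addTask { Item(index: ix) }
        }
        for await item in group {
            results.append(item.index)
            if next < count {
                let ix = next
                next += 1
                group.addTask { Item(index: ix) }
            }
        }
    }
    return results
}
```

Results arrive in completion order, not index order, so checking the set of indices (rather than their order) is the right way to confirm every item was processed.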

Adding debug output inside the withTaskGroup closure either hides the problem or doesn't help find the cause. Adding debug output elsewhere to check progress shows that the blockage occurs before all items have been processed, but never at the same spot. It is as if the `for await type…` loop sometimes waits for a value that never appears. Note that TypeInit() is not async, but it may involve file I/O or exec'ing an external process.

Any hints on how to track down the cause? I’m tired of throwing random bits of code around and hoping for a change. (2/2)
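Since TypeInit() is synchronous and may do file I/O or run an external process, one thing worth ruling out is blocking inside child tasks. Swift concurrency runs tasks on a cooperative thread pool sized roughly to the CPU core count, and a blocking call inside a task holds one of those threads until it returns; if the blocked code ever waits on something that itself needs a pool thread, the program can stall with no CPU use. A common workaround is to move the blocking work onto a Dispatch queue and await it through a continuation. `runBlocking` below is a hypothetical helper, not part of any library.

```swift
import Dispatch

// Hypothetical helper: runs synchronous, possibly blocking work on a GCD
// global queue so it does not occupy a cooperative-pool thread, then resumes
// the awaiting task with the result.
func runBlocking<T: Sendable>(_ work: @escaping @Sendable () -> T) async -> T {
    await withCheckedContinuation { continuation in
        DispatchQueue.global().async {
            continuation.resume(returning: work())
        }
    }
}
```

In the group this would read `group.addTask { await runBlocking { TypeInit(url: url) } }`, which still assumes TypeInit is Sendable. It only moves where the blocking happens; a call that never returns will still hang. With 128 tasks in flight it can also create many GCD threads, so a smaller limit may be a better fit.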

@marchyman what does “doSomethingWith” do? I bet that’s it.

@mattiem It appends the item to an array that is a property of a struct owned by an Observable. That’s why it runs on the MainActor.

I’ve reduced the delay in the item constructor by no longer running an external process. The problem hasn't recurred yet, but I think I've only changed the timing rather than addressed the issue. The code I removed never signaled any failures.
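The original process-launching code isn't shown, so this is only a guess, but a classic way for an external-process call to hang silently, with no CPU use and no error, is a pipe deadlock: the parent calls waitUntilExit() before draining the child's output, the child fills the pipe buffer and blocks on write, and each waits for the other forever. A minimal sketch of the safe ordering, using a hypothetical `runProcess` helper built on Foundation's `Process` and `Pipe`:

```swift
import Foundation

// Hypothetical helper: runs an executable and returns its exit status and stdout.
// The pipe is read to end-of-file *before* waiting for exit; waiting first can
// deadlock if the child writes more output than the pipe buffer holds.
func runProcess(_ path: String, _ args: [String]) throws -> (status: Int32, output: String) {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: path)
    process.arguments = args
    let pipe = Pipe()
    process.standardOutput = pipe
    try process.run()
    let data = pipe.fileHandleForReading.readDataToEndOfFile()
    process.waitUntilExit()
    return (process.terminationStatus, String(decoding: data, as: UTF8.self))
}
```

If standardError is also redirected to a pipe, it needs to be drained concurrently as well, or the same deadlock can occur on that pipe while stdout is being read.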

@marchyman Normally when you see behavior like you're describing, it’s an await that never returns. That can happen when a callback hasn’t been called, and I could believe such a thing happens with an external process involved.

@mattiem Agreed. But…

I finally got the hang to occur with signposts in the code called by addTask. The signposts show that every start had a matching end. I’ll move the signposting code around and see if I can learn anything new.

The only thing I know for sure is that the failure is non-deterministic.
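If every child task's start has a matching end, the next question is which side of the loop is stuck. One low-tech option is to record what is in flight and inspect it once the program stalls. `InFlightTracker` below is a hypothetical debugging aid using a lock-protected set, so it can be called from synchronous code inside child tasks as well as from the consuming loop.

```swift
import Foundation

// Hypothetical debugging aid: records items that have started but not finished,
// so a stall can be narrowed to specific inputs or to a specific stage.
final class InFlightTracker: @unchecked Sendable {
    private let lock = NSLock()
    private var pending: Set<String> = []

    func begin(_ id: String) {
        lock.lock(); defer { lock.unlock() }
        pending.insert(id)
    }

    func end(_ id: String) {
        lock.lock(); defer { lock.unlock() }
        pending.remove(id)
    }

    // Sorted snapshot of everything still outstanding.
    var outstanding: [String] {
        lock.lock(); defer { lock.unlock() }
        return pending.sorted()
    }
}
```

One way to use it: wrap each addTask body with `tracker.begin("init " + url.path)` and a `defer` calling `end`, and do the same with a different prefix around `await doSomethingWith(type)`. When the program stalls, pause it in the debugger and print `tracker.outstanding`; an empty list alongside a stuck `for await` would point away from TypeInit, while a lingering consumer entry would point at the MainActor hop.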