@mattiem what is the ideal way to do file I/O with NSFileManager and Swift Concurrency?

An actor doesn’t seem correct.

I’ve seen using withCheckedThrowingContinuation wrapping a DispatchQueue. This seems… good? I’m not sure.

How would you approach this? I guess I’m looking for “NSFileManager, but with async/await”
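
The continuation-over-DispatchQueue pattern mentioned in the question can be sketched roughly like this. It is a minimal illustration, not official guidance: the `FileIO` type, the queue label, and the `read` helper are made-up names, and the only real APIs used are `DispatchQueue`, `withCheckedThrowingContinuation`, and `Data(contentsOf:)`.

```swift
import Foundation

// Hypothetical namespace for this sketch; not part of Foundation.
enum FileIO {
    // A dedicated serial queue, so the blocking read runs on a GCD thread
    // rather than on the cooperative thread pool. The label is an assumption.
    static let queue = DispatchQueue(label: "example.file-io", qos: .utility)

    /// Reads the file at `url` on `queue` and resumes the awaiting task
    /// exactly once, with either the data or the thrown error.
    static func read(_ url: URL) async throws -> Data {
        try await withCheckedThrowingContinuation { continuation in
            queue.async {
                do {
                    continuation.resume(returning: try Data(contentsOf: url))
                } catch {
                    continuation.resume(throwing: error)
                }
            }
        }
    }
}
```

A caller would then write `let data = try await FileIO.read(url)`. Because the queue is serial, reads are processed one at a time; whether that is enough depends on the workload.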

@jsq @mattiem You could use an actor that is backed by its own queue. You should however "never" do I/O on the shared thread pool (because forward progress cannot be guaranteed).
For guidance on Concurrency, I always find it worth looking at what SwiftNIO is doing, in this case: https://swiftpackageindex.com/apple/swift-nio/main/documentation/_NIOFileSystem
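
One way to read "an actor that is backed by its own queue" is an actor with a custom serial executor. The sketch below is an assumption about what that could look like, not the code NIOFileSystem uses. `DispatchQueueExecutor` and `FileReader` are hypothetical names, and the `ExecutorJob`-based `enqueue` requires Swift 5.9 or later (macOS 14 / iOS 17 on Apple platforms).

```swift
import Foundation

/// Hypothetical serial executor that runs actor jobs on a private dispatch queue.
final class DispatchQueueExecutor: SerialExecutor {
    private let queue: DispatchQueue

    init(label: String) {
        queue = DispatchQueue(label: label)
    }

    // Called by the runtime whenever the actor has work to run.
    func enqueue(_ job: consuming ExecutorJob) {
        let unownedJob = UnownedJob(job)
        let executor = asUnownedSerialExecutor()
        queue.async {
            unownedJob.runSynchronously(on: executor)
        }
    }

    func asUnownedSerialExecutor() -> UnownedSerialExecutor {
        UnownedSerialExecutor(ordinary: self)
    }
}

/// Hypothetical actor whose methods run on the queue above instead of
/// the shared cooperative pool.
actor FileReader {
    private let executor = DispatchQueueExecutor(label: "example.file-reader")

    // Tells the runtime to schedule this actor on the custom executor.
    nonisolated var unownedExecutor: UnownedSerialExecutor {
        executor.asUnownedSerialExecutor()
    }

    func contents(of url: URL) throws -> Data {
        try Data(contentsOf: url)
    }
}
```

Callers use it like any other actor: `let data = try await FileReader().contents(of: url)`. The blocking read then occupies the actor's own queue rather than one of the pool threads.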

@helge @mattiem great example of why I hate how the concurrency system works.

Background file I/O was a solved problem and now we have to reinvent obtuse solutions.

Sounds like I should just use GCD?

@jsq @mattiem That’s what I usually use. I wouldn’t say that background I/O is a solved problem; it’s surprisingly hard to get right, and it’s a little disappointing that Swift doesn’t come with something built in for this.
@helge @jsq I suspect that very few projects actually are doing greater than ~ one I/O operation at a time. And the ones that are have solved it with some kind of fancy async I/O library and now moved on. I can only think of one person that has consistently mentioned this and they are in this thread 😀
@mattiem @jsq I don’t think hitting something like a database concurrently is that unusual. This was more about files, but even there it seems pretty common to me. Like for example loading avatar images used in say a chat client, or images in some picture-like app. All kinds of image or file caches.
@helge @jsq If this really is a common problem (deadlock due to thread pool exhaustion by doing I/O), then I do not understand why it does not come up more often.

@mattiem @jsq That’s not what I said, I said that having to do more than one I/O op at a time is very common. You suggested that it’s usually only one at a time.

It’s not a common issue in the happy path but the unhappy path can and will reliably deadlock the whole process for no reason. You only need n-core concurrent I/O for this, which is little.
Since, unlike in GCD, threads do not overcommit in the async pool, you must guarantee forward progress.

@mattiem @jsq And it does come up often, it’s the reason NIOFileSystem exists 🙃
@helge @jsq well then I’m wrong!
@mattiem @jsq If you can guarantee that your app only ever uses 1 blocking I/O at a time (e.g. by using an actor), that should technically be fine. But keeping that guarantee can be difficult for anything non-trivial, e.g. involving multiple packages.
Or in other words: If you only do that in your app level code, it may be ok. If you do this in packages/libs, it is bound to become an issue.
In any case: There is just no reason to run blocking I/O on the concurrent thread pool and risk it.
@mattiem @helge @jsq well iOS apps might not be doing much more than one I/O operation at a time. Server apps are almost guaranteed to be running many I/O operations concurrently.
@opticalaberration @helge @jsq no argument. Do you think the language can/should add affordances here?
@mattiem @opticalaberration @jsq Not the language, either stdlib or Foundation should provide an API covering the common setups. Which I think is more difficult than it sounds, but even just async Data(contentsOfURL:) would be helpful.
@mattiem @helge @jsq non blocking file IO should be something the stdlib/foundation provide. We shouldn’t need to rely on the SwiftNIO team for that.
@opticalaberration @helge @jsq I’m absolutely not opposed. But I still think more evidence is needed to demonstrate this is a material, non-theoretical problem in client apps. I would love to see some, I think it would be super interesting.
@mattiem @opticalaberration @jsq The thumbnail cache loading example is not good enough for you? Listing a folder w/ more than 100 files, stat'ing them?
I don't actually know, but what happens if you fopen/fread/Data read an iCloud Drive file that is not cached yet? Presumably that will block? Do that 8 times and your concurrent thread pool is fully saturated doing ... nothing.
@helge @opticalaberration @jsq i’d have to think harder about the factors that would cause these situations to deadlock. You could be right.
@mattiem @opticalaberration @jsq TBF it's not really an issue produced by Swift Concurrency, the actual issue is that there are no POSIX APIs for non-blocking file I/O. AFAIK Windows has such. But it still has to deal with it, and that "dealing with it" is currently completely custom code per app/framework.
@mattiem @opticalaberration @jsq I think something I would blame on Swift in the context is that it tries to establish cooperative multitasking as a general purpose concept again.
We've been there (several times, Java did that in the beginning too!) and decided it's *not* a good option due to many of the issues mentioned.
If the kernel sees a process being blocked by I/O, it puts it to sleep, avoiding those issues (I think that's the reason why POSIX doesn't have that, it's built into the sys).
@helge @opticalaberration @jsq on this topic I don’t feel like I can contribute in any meaningful way. The cooperative system could possibly be changeable without affecting language semantics but I don’t see it happening.
@helge @mattiem @opticalaberration @jsq this is actually my exact example of “did you check that parallel was actually needed vs serially loading the attributes.” My team is right now doing a new version of this precise problem and I’m saying “just do it serially until you can prove it’s not good enough, and if serial is too slow, let’s consider a database rather than getting clever.”
@cocoaphony @mattiem @opticalaberration @jsq But why? Resolving the avatars (loading multiple files) in parallel is not conceptually difficult (NIOFileSystem does exactly this). It's also what every web browser does.
The main point of this thread is that you shouldn't do this on the concurrent thread pool. Nothing more.
@helge @mattiem @opticalaberration @jsq I am probably over-focused on obsolete history, but my experience reading from (spinning) disk has often been that trying to access a ton of stuff at the same time can often saturate the I/O system in non-obvious ways. So, I tend to reach for a simple for-loop and not create separate threads for each. If you do that, you won't have any problem, as long as it's not MainActor, right? I haven't had trouble so far on iOS, though on servers, I do other things.
@helge @mattiem @opticalaberration @jsq I don't find Swift a particularly good language for non-Apple, and particularly non-iOS problems. I know it wants to be; I just don't think it's there yet. So I'm just talking about iOS here, and maybe, occasionally, Mac.
@cocoaphony the language itself, the tooling, the standard library? Some combination?

@mattiem Mostly stdlib. A little bit of structured concurrency (but even that would probably not be a problem if stdlib had the right tools).

We're (painfully) slowly getting basic, table-stakes functionality like "run a command and get the output." Subprocess finally exists (v0.4), but…how can I use it in a simple script? Can't. Need a whole package.

You can't throw normal errors simply because there are no "normal Error" types. Few things "click together." 1/

@mattiem Compare what I believe is possibly the best stdlib ever written, one so good it covers over a mountain of … interesting language choices: Go. You want to build a small network server? Every tool you want is in the standard library. And you want to hook more stuff into it? Implement some tiny interfaces like Reader and instantly plug-into a huge ecosystem.

Swift-log is such a great counter-example. Cool. A pluggable interface. So my program can log out of the box? Uh…no, not actually.

@cocoaphony @helge @mattiem @opticalaberration @jsq Reading from spinning rust is never an issue on iOS and network I/O is definitely a candidate for simultaneous I/O, although there are different limitations such as server capacity and/or network reliability. I don't think clients can blindly create a thread for every item, otherwise they could easily become a tool for denial-of-service.

All of that is to say it seems to me I/O is a problem of tailoring to specific usage, and to be honest I've never seen any language provide a fully-baked solution that works for everything.

@mattiem @opticalaberration @helge @jsq I’m aware of an app that uses WebSockets for streaming responses, multiple HTTP requests, and local persistence.

Right now this app doesn’t even share actor executors, much less opt anything into (at)concurrent. But I feel like there’s probably some performance gains to be had given that a great deal of async calls could probably happen away from MainActor.

@pixelscience @mattiem @opticalaberration @jsq This is not about saturating the main actor, but about saturating the fixed size concurrent thread pool.
@helge @pixelscience @opticalaberration @jsq I thought we were talking not just about saturation, but deadlock specifically?
@mattiem @pixelscience @opticalaberration @jsq Deadlock may not be the right word. Nothing will execute for the foreseeable future if blocking I/O consumes the threads. That’s why they insist on guaranteed forward progress for on-pool traffic.
Notably the same can happen for compute, but at least the CPUs won’t be idle during that 🙃
@opticalaberration @mattiem @jsq I already gave examples why this is very common in client apps as well.
Generally I think the diff between proper client apps and server apps is much less than what people make it.
@helge @mattiem @jsq I don’t disagree there. There will always be client apps that do a lot more I/O and need to manage it correctly. But you’d generally hit those limits a lot quicker in a server app.
@jsq @helge @mattiem what is the problem you're solving here, given nsfilemanager is thread safe. asking out of curiosity.
@krzyzanowskim @jsq @mattiem I think Jesse's original point was that Swift Concurrency comes with 0 affordances for I/O, while previous efforts like GCD and OperationQueue did work for that.
That's indeed an issue, even for async things like networking I/O - you can't really integrate those w/ the system right now w/o major performance implications. (GCD had all that)
Is this the SwiftScript I'm waiting for? Sounds a little like it? https://github.com/Cocoanetics/SwiftScript

@helge Ooooh I didn't fully grasp the gravitas of this. But I have at least two projects that are compiling user Swift code. Have to try this out as soon as possible.

> the interpreter is what lets you run the same code inside an app without a toolchain, without compilation, and without violating App Store rules

@krzyzanowskim @helge @mattiem

Sorry, I should have provided this snippet to begin with.

Here’s the typical pattern with GCD. Just simple FileManager operations off the main thread.

My question is, what is the “correct” way to make this work with a Swift Concurrency app? Convert to Tasks? Wrap in ‘withContinuation’? Make an Actor?

There doesn’t seem to be any official guidance.

@jsq @krzyzanowskim @helge @mattiem I would put your file stuff on an async concurrent helper function and call it with await wherever you need it
@emix @jsq @krzyzanowskim @mattiem That’s what you shouldn’t usually do as discussed at length in the thread. (I/O on the concurrent thread pool)
How to run blocking I/O asynchronously is a surprisingly difficult topic (also discussed in thread), nothing is built into Foundation/Swift Concurrency, but the naive variant is something like this: https://gist.github.com/helje5/0a53bdd084131f82fdcb5f6d382aa8fa
So it's not difficult to add at all, but not builtin either (maybe due to the complexity of the topic).
@helge @jsq @krzyzanowskim @mattiem very interesting. I guess it depends on the use case.

@jsq Reading/writing from disk is one of the areas where Dispatch and Swift deviate considerably.

*In the face of long, blocking operations* naive use of dispatch global queues will just keep starting threads, which will cause very poor perf. Eventually, it will stop creating threads, and then you may deadlock. Concurrency will deadlock much sooner, but will not cause thread explosion.

@jsq So, I think this really comes down to what trade-offs you want to make.

If you just want to get a single bit of work off the main thread, it's very easy with concurrency (`@concurrent`, `async let`, or even actor). And I think this is often a totally fine thing to do.
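
A minimal sketch of the `async let` option, assuming the goal is only to get a couple of blocking reads off the calling actor. `loadPair` is a hypothetical helper. The child tasks here run on the cooperative pool, which is exactly what other participants in this thread advise against for blocking I/O, so treat it as an illustration of this post's position rather than a recommendation. `@concurrent` needs Swift 6.2 and is not shown.

```swift
import Foundation

/// Hypothetical helper: starts two synchronous reads as child tasks and
/// waits for both. Errors from either read surface at the `try await`.
func loadPair(_ first: URL, _ second: URL) async throws -> (Data, Data) {
    async let a = Data(contentsOf: first)
    async let b = Data(contentsOf: second)
    return try await (a, b)
}
```

Called from the main actor as `let (x, y) = try await loadPair(urlA, urlB)`, the main thread stays free while both files load.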

But if the amount of work you want to do is unbounded, neither system has a true solution you can use without any thought. Wrapping a global queue will be less likely to deadlock, but more likely to perform poorly.

@jsq if you do not care about going too deep, I think wrapping up a dispatch queue in a continuation is fine. You are throwing threads at the problem, and often that's ok.

@mattiem @jsq if you have unbounded I/O, I would think the tool you’d reach for is DispatchIO, not DispatchQueue.

IMO, Swift concurrency still lacks any useful *parallelization* tools. There’s no way to queue things meaningfully, and especially no tools to manage n-way queuing.

But also, the vast majority of Swift is iOS, and it’s challenging to get a non-contrived iOS problem that exhausts the thread pool. Huge amounts of effort go into preventing problems that are hard to cause on purpose.

@mattiem @jsq by “contrived,” I’m including situations where you’ve outsmarted yourself, using complicated solutions that you didn’t actually check were faster than the simplest, non-parallel solution. For example, it’s easy to exhaust threads if you spawn one for every file (though, *still* not trivial on iOS), but I’m betting you didn’t actually demonstrate that it was a better solution for “n” small enough not to exhaust your threads.

@cocoaphony @mattiem @jsq

From time to time I am getting nostalgic about NSOperation & friends…

@tuparev @cocoaphony @jsq they all still work just fine

@mattiem @cocoaphony @jsq

I know, but they feel like an ugly duckling within the rest of the source code.

@tuparev @mattiem @jsq And do not play nice at all with Swift 6, so "still work just fine" doesn't feel true. I struggle quite a lot with how to integrate our Operation-based code into the rest of the system.

@cocoaphony @tuparev I interpreted the message to mean they were deprecated or something like that, sorry.

You can model how queues work with the type system, but it really isn't nice to work with. It's easier to just make those parts unchecked.

@jsq see this is what you get for asking a question

@mattiem lol that whole thread got out of control and I’m just like

But how do you make NSFileManager work with Swift Concurrency
🥺👉🏼👈🏼

@jsq but Jesse have you thought about all the spacing of silicon wafers in your NVMe SSD and how they interlock in a quantum field that is saturated on I/O in the cooperative pool of angel tears! @mattiem

@jsq I've been thinking about it, and I am unable to come up with a situation where deadlock is a concern. That doesn't mean impossible, but I just can't cook one up.

Given that, I think you are safe to treat FileManager operations as regular synchronous work and everything in this post applies. Normally I'm into async let, but it might be slightly awkward because not all FileManager functions produce a value. Still possible to use though.

https://www.massicotte.org/synchronous-work/

@jsq if you just want to move work off the main thread, that'll be fine. But if you are doing multi-step file system mutations and need them to appear atomic to the rest of your system, packaging up the work into an actor could work well.
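
A sketch of the "package multi-step mutations into an actor" idea. `DocumentStore` and its methods are hypothetical names. Because `save` contains no `await`, no other call on the same actor can interleave with its steps. That makes the sequence appear atomic only to callers that go through this actor, not to other processes or to code touching the directory directly. With the default executor, the blocking calls still run on the cooperative pool.

```swift
import Foundation

/// Hypothetical actor that serializes multi-step changes to one directory.
actor DocumentStore {
    private let directory: URL

    init(directory: URL) {
        self.directory = directory
    }

    /// Writes to a temporary name, removes any old file, then moves the new
    /// file into place. The steps run without suspension, so other calls on
    /// this actor never observe a half-finished save.
    func save(_ data: Data, as name: String) throws {
        let fileManager = FileManager.default
        let destination = directory.appendingPathComponent(name)
        let temporary = directory.appendingPathComponent(name + ".tmp")

        try data.write(to: temporary)
        if fileManager.fileExists(atPath: destination.path) {
            try fileManager.removeItem(at: destination)
        }
        try fileManager.moveItem(at: temporary, to: destination)
    }

    func load(_ name: String) throws -> Data {
        try Data(contentsOf: directory.appendingPathComponent(name))
    }
}
```

Usage would look like `try await store.save(data, as: "avatar.png")` followed later by `try await store.load("avatar.png")`.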
@mattiem @jsq An actor w/o a custom executor has the same issues.
The example for getting the system to lock up (it's not a deadlock technically) is simple: Just load Core-Count pictures (like 8) that block. The whole concurrent system will be dead until they recover (because it doesn't grow).
BTW: The OperationQueue example I gave will *not* create new threads in an unbounded manner. Only as many as you assign it. Since it's so easy to do, I see no reason to ever do I/O on the shared pool 🤷‍♀️
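
The bounded `OperationQueue` approach can be sketched as follows. This is not the code from the linked gist, only an assumption about its general shape. `BlockingIO`, the queue name, and the limit of 4 are made up for illustration.

```swift
import Foundation

// Hypothetical namespace for this sketch; not part of Foundation.
enum BlockingIO {
    // At most 4 operations run at once; further work waits in the queue
    // instead of spawning more threads. The limit is an arbitrary choice.
    static let queue: OperationQueue = {
        let queue = OperationQueue()
        queue.name = "example.blocking-io"
        queue.maxConcurrentOperationCount = 4
        return queue
    }()

    /// Runs blocking `work` on the operation queue and resumes the caller
    /// with its result or error, keeping the cooperative pool free.
    static func run<T: Sendable>(
        _ work: @escaping @Sendable () throws -> T
    ) async throws -> T {
        try await withCheckedThrowingContinuation { continuation in
            queue.addOperation {
                do {
                    continuation.resume(returning: try work())
                } catch {
                    continuation.resume(throwing: error)
                }
            }
        }
    }
}
```

A call site would be `let data = try await BlockingIO.run { try Data(contentsOf: url) }`. The awaiting task suspends without holding a pool thread while the read blocks on one of the queue's threads.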