hot take: people vastly overestimate how "intuitive" GUIs are because that's what they're used to
consider that the whole desktop paradigm (you can put files onto your desktop or into directories represented by folders, each file opens as a separate window, etc.) was built on the assumption that the computer user was a white-collar worker with an office job, and many stylistic decisions (for example, representing on/off options as boxes that can have a checkmark in them) relied on people's prior experience of filling in paper forms
and even that paradigm constantly gets broken, either to fit users' needs better or to fit the available hardware
on smartphones, apps no longer open in windows, and they're typically restricted to operating on a single document at a time for performance reasons. except some aren't, like web browsers, where tabs are expected (and even on desktop, tabs quietly break the one-window-per-document idea)
speaking of, what's the deal with the types of motions that smartphones and apps can recognize? some of them make sense, like swiping to switch images in a gallery or pinching to resize the view, but what about long-tapping or edge-swiping? the "language" of touchscreen user interfaces has a lot of verbs that neither the OS nor the app typically ever tells the end user about
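to make that concrete, here's roughly what the developer side of one of those verbs looks like on Android. this is a minimal Kotlin sketch, not any particular app's code; LongPressView is a made-up name. the app just registers a callback for the long-press gesture via the platform's GestureDetector, and nothing in the UI ever advertises that the gesture exists:

```kotlin
import android.content.Context
import android.view.GestureDetector
import android.view.MotionEvent
import android.view.View

// a view that silently opts into the long-press "verb":
// the user gets no visual hint that holding a finger down does anything
class LongPressView(context: Context) : View(context) {

    private val detector = GestureDetector(context,
        object : GestureDetector.SimpleOnGestureListener() {
            // must return true here, or the detector never sees
            // the rest of the gesture
            override fun onDown(e: MotionEvent): Boolean = true

            // fires after the system long-press timeout
            // (ViewConfiguration.getLongPressTimeout())
            override fun onLongPress(e: MotionEvent) {
                // e.g. pop up a context menu here
            }
        })

    override fun onTouchEvent(event: MotionEvent): Boolean =
        detector.onTouchEvent(event) || super.onTouchEvent(event)
}
```

the gesture only becomes discoverable if the app deliberately surfaces it (a tooltip, an onboarding hint), and most apps don't bother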