Let's talk about some of the recent improvements to Unity! I'm not talking about the big, flashy things you already know about, but about the small changes we were able to sneak into the changelog. I'll look at 2023.1 beta changes, but most of them were also backported as far back as possible :)
2023.1.0b11 - Animation: Reduced the cost of building muscle clips
This cuts out a little bit of time here and there when working with animations. The change itself is tiny: the code sorts some data, and I noticed that the sort failed to inline the comparison function. That's essentially an artifact of how C++ templates work: if you pass in a function pointer, the comparison won't be inlined, but if you create a type with operator() or pass in a lambda, it will be inlined just fine.
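Here is a minimal C++ sketch of the difference. The Key type is hypothetical and stands in for whatever the animation code actually sorts.

With a function pointer, the comparator's type is only bool(*)(const Key&, const Key&). The std::sort instantiation therefore doesn't know which function it will call, and it typically makes an indirect call per comparison. An optimizer can sometimes see through this, so treat it as a strong tendency rather than a guarantee.

A functor or lambda has its own unique type. The call target is known at compile time, which makes it easy to inline.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical element type; the real muscle-clip code sorts its own data.
struct Key {
    int curveIndex;
    float time;
};

// Free function: passing &lessByIndex hands std::sort a function pointer.
bool lessByIndex(const Key& a, const Key& b) { return a.curveIndex < b.curveIndex; }

// Function object: the comparison is part of the type LessByIndex itself.
struct LessByIndex {
    bool operator()(const Key& a, const Key& b) const { return a.curveIndex < b.curveIndex; }
};

// Comparator type is bool(*)(const Key&, const Key&): the target is a runtime value.
void sortWithPointer(std::vector<Key>& keys) {
    std::sort(keys.begin(), keys.end(), &lessByIndex);
}

// Comparator type is LessByIndex: the target is known when std::sort is instantiated.
void sortWithFunctor(std::vector<Key>& keys) {
    std::sort(keys.begin(), keys.end(), LessByIndex{});
}

// Every lambda has its own closure type, so this behaves like the functor.
void sortWithLambda(std::vector<Key>& keys) {
    std::sort(keys.begin(), keys.end(),
              [](const Key& a, const Key& b) { return a.curveIndex < b.curveIndex; });
}
```

All three produce the same order. The only difference is how much the compiler knows about the comparison when it instantiates std::sort.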

2023.1.0b11 - Editor: Reduced cost of outline rendering, which improves the frame rate of the editor when many objects are selected.

Again, a tiny change: I switched some allocations to use a temp allocator. It's nothing world-changing, but it's noticeable, and there's more to do here.
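Unity's temp allocator is internal, so the sketch below is not its API. It is a generic linear (bump) allocator, which is the usual idea behind temp allocators; the TempArena name is hypothetical.

Allocating is an alignment step plus a pointer increment, and reset() frees everything at once. Short-lived allocations therefore never touch the general-purpose heap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical linear "temp" allocator (not Unity's internal API).
class TempArena {
public:
    explicit TempArena(std::size_t capacity)
        : buffer_(new unsigned char[capacity]), capacity_(capacity) {}
    ~TempArena() { delete[] buffer_; }
    TempArena(const TempArena&) = delete;
    TempArena& operator=(const TempArena&) = delete;

    // Bumps the offset; returns nullptr when full. A production allocator
    // would fall back to the general-purpose heap instead.
    void* allocate(std::size_t size, std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0 && "alignment must be a power of two");
        const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(buffer_);
        const std::uintptr_t current = base + offset_;
        const std::uintptr_t aligned =
            (current + (align - 1)) & ~static_cast<std::uintptr_t>(align - 1);
        const std::size_t end = static_cast<std::size_t>(aligned - base) + size;
        if (end > capacity_) return nullptr;
        offset_ = end;
        return reinterpret_cast<void*>(aligned);
    }

    // Releases every allocation at once, e.g. at the end of a frame.
    void reset() { offset_ = 0; }
    std::size_t used() const { return offset_; }

private:
    unsigned char* buffer_;
    std::size_t capacity_;
    std::size_t offset_ = 0;
};
```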

2023.1.0b11 - Kernel: Performance in heavily run code paths for NativeArray, UnsafeUtility, and AtomicSafetyHandle improved through inlining

This one isn't mine :) Scott Williams did all the work here. But it's a good change! Scott took a close look at how NativeArray performs when used with Mono (in Release mode in particular). Mono isn't great at inlining, but you can help it by using the MethodImpl attribute to force inlining.

The changes ended up improving not just Mono performance, but also Burst performance! Some operations got 2x faster. It's also worth realizing that writing to a NativeArray in Mono is still slower than writing through a pointer or to a regular array of value types: Mono has some intrinsic support for those.

2023.1.0b11 - Scripting: Switched some path sorting during compilation from an invariant culture compare to an ordinal compare, speeding up C# compilation when scripts are changed.

Another tiny one! I'm always surprised by how much we pay because code that works with strings doesn't specify ordinal comparison. Ordinal isn't always the right choice either, of course, but when you aren't dealing with user input and simply want to sort some paths, ordinal is just fine.
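In C#, the fix is to pass StringComparison.Ordinal or StringComparer.Ordinal. Here is a rough C++ analogue with hypothetical helper names:

- std::string's own operator< compares raw char values, which is ordinal.
- Passing a std::locale as the comparator routes every comparison through that locale's std::collate facet, which is the culture-aware path.

In the classic "C" locale, the two orders coincide. With a linguistic locale, the collation rules do more work per comparison, and that per-comparison overhead is what the changelog items here remove.

```cpp
#include <algorithm>
#include <cassert>
#include <locale>
#include <string>
#include <vector>

// Hypothetical helper: ordinal sort. std::string's operator< compares the
// raw char values one by one, like StringComparison.Ordinal in C#.
void sortPathsOrdinal(std::vector<std::string>& paths) {
    std::sort(paths.begin(), paths.end());
}

// Hypothetical helper: culture-aware sort. std::locale is itself a string
// comparator and delegates each comparison to the locale's std::collate facet.
void sortPathsWithLocale(std::vector<std::string>& paths, const std::locale& loc) {
    std::sort(paths.begin(), paths.end(), loc);
}
```

The ordinal order puts "A" before "a" because it compares character codes. That's perfectly fine when all you need is a deterministic order for paths.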

2023.1.0b11 - Shaders: Reduced the time spent in the asset post processing code for shader assets, which speeds-up the import of shaders.

Same story! More paths, more culture-sensitive string comparisons. This quickly piles up: it costs seconds when importing many shaders.

2023.1.0b12 - VFX Graph: Greatly reduced the import cost of VFX Graph objects, especially when importing many at once.

These are two changes in one changelog item :) The first may sound counter-intuitive: some shader generation code uses a StringBuilder, and I made it faster by allocating a bunch of strings instead. Huh? It turns out that we only used the StringBuilder to replace text... and replacing text in a StringBuilder doesn't use ordinal comparison, and there's no way to change that. :(
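The thread doesn't show the shader generation code, so this is only a C++ sketch of the idea, using a hypothetical replaceAllOrdinal helper. It finds each occurrence by exact character match and builds a new string from the pieces, instead of editing a builder in place with a comparison mode you can't control.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: replaces every occurrence of `from` with `to`, matching
// characters exactly (an ordinal search), and returns a freshly built string.
std::string replaceAllOrdinal(const std::string& text, const std::string& from, const std::string& to) {
    if (from.empty()) return text;  // nothing to search for; avoids an endless loop
    std::string result;
    result.reserve(text.size());
    std::size_t start = 0;
    for (std::size_t hit = text.find(from); hit != std::string::npos; hit = text.find(from, start)) {
        result.append(text, start, hit - start);  // copy the unmatched run
        result += to;                             // emit the replacement
        start = hit + from.size();                // continue after the match
    }
    result.append(text, start, std::string::npos);  // copy the tail
    return result;
}
```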

In my test cases, this was the bottleneck for imports, and the change gave a nice 3x speed-up. The second change affects a post-processor from the VFX Graph package. Occasionally after imports, it touches all imported VFX graphs, and this ends up triggering asset database refreshes. Yes, plural: one per VFX graph asset. Luckily, you can use AssetDatabase.StartAssetEditing/StopAssetEditing to merge all of them into one.
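In C#, the Unity-side fix brackets the work with AssetDatabase.StartAssetEditing() and AssetDatabase.StopAssetEditing(), usually with the Stop call in a finally block. To keep the examples in one language, here is a C++ model of the batching mechanism with a hypothetical AssetDb class (not Unity's API):

- Touches inside a batch only mark the database dirty.
- A single refresh runs when the outermost batch closes.
- The EditingScope guard plays the role of try/finally.

```cpp
#include <cassert>

// Hypothetical stand-in for an asset database (not Unity's API). Outside a
// batch, every touch triggers a refresh; inside a batch, touches only mark the
// database dirty, and one refresh runs when the outermost batch ends.
class AssetDb {
public:
    void startEditing() { ++batchDepth_; }

    void stopEditing() {
        assert(batchDepth_ > 0 && "stopEditing without matching startEditing");
        if (batchDepth_ > 0 && --batchDepth_ == 0 && dirty_) refresh();
    }

    void touchAsset() {
        if (batchDepth_ > 0) dirty_ = true;  // defer until the batch closes
        else refresh();                      // unbatched: refresh right away
    }

    int refreshCount() const { return refreshes_; }

private:
    void refresh() {
        ++refreshes_;
        dirty_ = false;
    }

    int batchDepth_ = 0;
    bool dirty_ = false;
    int refreshes_ = 0;
};

// RAII guard that always closes the batch, like try/finally around
// StartAssetEditing/StopAssetEditing in C#.
class EditingScope {
public:
    explicit EditingScope(AssetDb& db) : db_(db) { db_.startEditing(); }
    ~EditingScope() { db_.stopEditing(); }
    EditingScope(const EditingScope&) = delete;
    EditingScope& operator=(const EditingScope&) = delete;

private:
    AssetDb& db_;
};
```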
There are more issues with VFX Graph imports, but a fix is already in flight. Importing VFX graphs leaks ScriptableObjects, so importing many VFX graphs in one session can create tens of thousands of objects that all need to be serialized and deserialized during domain reloads. In other words, importing many VFX graphs can cripple your editor until you restart it. We've found the leak, and the fix is coming. In the meantime, I used the leak to look at the performance of serialization!
2023.1.0b12 - Serialization: Improved performance of restoring managed objects during a domain reload.
This note appears twice in the changelog, with some variations, because I landed two separate changes. Both follow the same pattern: adding caches. First, there is some validation code that does some work for every type it sees.
It checks whether it can find a certain attribute on the type and emits an error if it does... but looking up attributes in C# is slow. So let's cache the result, keyed by type.
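The validation code is C#, where the expensive part is reflecting over attributes. Below is the same memoization pattern in C++. It uses std::type_index as the key and a caller-supplied slowCheck function that stands in for the attribute lookup; both the class and the check are hypothetical.

The slow path runs once per type. Every later call is a single hash lookup.

```cpp
#include <cassert>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>

// Hypothetical per-type cache for the result of an expensive check.
class PerTypeCache {
public:
    // Returns the cached result for `type`, calling slowCheck(type) only on a miss.
    template <typename SlowCheck>
    bool get(std::type_index type, SlowCheck&& slowCheck) {
        auto it = results_.find(type);
        if (it != results_.end()) return it->second;  // hit: no slow work
        const bool result = slowCheck(type);
        results_.emplace(type, result);
        return result;
    }

private:
    std::unordered_map<std::type_index, bool> results_;
};
```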
Second, when we deserialize objects, we need to construct them, which means calling their default constructor. For every object, we checked its type, fetched all of its methods, compared each name to ".ctor", and called the right one. I hope I don't need to emphasize just how slow this can be, or how unnecessary it is to repeatedly search the same methods on the same types with linear searches involving string comparisons. So I added a cache :)
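Here is a C++ model of that lookup. The ReflectedType and Method records are hypothetical and stand in for the C# reflection data. findDefaultCtorSlow does the per-object linear scan with string comparisons; CtorCache runs that scan once per type and then hands back the stored factory.

The cache assumes type records live at least as long as the cache, so their addresses can serve as keys.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical reflection records: each type lists its methods by name.
using Factory = void* (*)();
struct Method {
    std::string name;
    Factory invoke;
};
struct ReflectedType {
    std::string name;
    std::vector<Method> methods;
};

// Uncached: a linear scan with a string comparison per method, per object.
Factory findDefaultCtorSlow(const ReflectedType& type) {
    for (const Method& m : type.methods)
        if (m.name == ".ctor") return m.invoke;
    return nullptr;
}

// Cached: the scan runs once per type; later lookups are one hash probe.
// Assumes type records outlive the cache, so their addresses are stable keys.
class CtorCache {
public:
    Factory get(const ReflectedType& type) {
        auto it = cache_.find(&type);
        if (it != cache_.end()) return it->second;
        const Factory ctor = findDefaultCtorSlow(type);
        cache_.emplace(&type, ctor);
        return ctor;
    }
    std::size_t size() const { return cache_.size(); }

private:
    std::unordered_map<const ReflectedType*, Factory> cache_;
};
```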
With the cache, that cost basically disappears. My changes shaved 40% off deserialization time in my test cases.
2023.1.0b12 - Animation: Reduced the number of GC allocations when calling Animator.GetParameter(int index) and generally made it faster
This one was pointed out by someone on Twitter, so I fixed it! GetParameter(...) uses an array-typed property internally, and every access to that property creates a new copy of the array, because the property's getter calls into native code and then copies the data from the native side to the managed side.
It used this property three times. Three times! That's no longer the case, and getting a single parameter no longer allocates copies of all parameters multiple times. Unfortunately, it still allocates, because the type it returns is a class, and I couldn't change that without breaking the API. It was still worth doing, in my opinion.
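The thread doesn't show the original GetParameter body, so the three-access shape below is an illustrative guess. It is modeled in C++ with a hypothetical AnimatorModel whose parameters() getter returns a fresh copy on every call, like a property that marshals an array from native code.

Reading the property once into a local cuts the full-array copies down to one. The returned Parameter is still a copy, which loosely mirrors the allocation that remains.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical parameter record (in Unity, the returned type is a C# class).
struct Parameter {
    std::string name;
    int type;
};

// Hypothetical model of a property whose getter marshals data from native
// code: every call to parameters() returns a brand-new copy of the array.
class AnimatorModel {
public:
    explicit AnimatorModel(std::vector<Parameter> params) : native_(std::move(params)) {}

    std::vector<Parameter> parameters() const {
        ++copies_;  // count how many full copies callers trigger
        return native_;
    }

    int copies() const { return copies_; }

private:
    std::vector<Parameter> native_;
    mutable int copies_ = 0;
};

// Before (illustrative): three property reads, so three copies of every parameter.
Parameter getParameterBefore(const AnimatorModel& a, std::size_t index) {
    if (index >= a.parameters().size()) return Parameter{};  // copy 1
    return Parameter{a.parameters()[index].name,              // copy 2
                     a.parameters()[index].type};             // copy 3
}

// After: read the property once into a local and reuse it.
Parameter getParameterAfter(const AnimatorModel& a, std::size_t index) {
    const std::vector<Parameter> params = a.parameters();    // the only copy
    if (index >= params.size()) return Parameter{};
    return params[index];
}
```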
2023.1.0b12 - Graphics: Reduced the time the render thread spends on Profiler.FlushRenderCounters
This might just be my favorite of the bunch. I was looking at a project that contains some logic that keeps the editor ticking even when it's out of focus. Switching back to the editor would sometimes cause 15-minute (!) stalls. The longer the editor was in the background, the longer the stall. It turns out that the ticking caused frames to queue up on the render thread.
The frames were empty, but we collect profiler data every frame, and when I switched back to the editor, that collection ran for every frame we had enqueued. The profiler counters in question are stats for compute buffers. At first I thought we were leaking compute buffers (58,000 seemed like a lot?), but no, that wasn't the problem. The problem was that the profiler data was collected by walking a linked list (!) of all compute buffers.
Through some code archaeology, I determined that this linked list once had a (good?) reason to exist, but by now it was used exclusively for these profiler counters. So I deleted it and replaced it with two counters: one for the total size of compute buffers and one for their count. I later checked: in that project, collecting the compute buffer stats took 4ms every frame.
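Here is a C++ sketch of the replacement, using a hypothetical ComputeBufferStats class (not Unity's actual code). Creating and destroying a buffer updates two running totals, so reading the stats each frame costs the same whether there are 5 buffers or 58,000.

The counters are relaxed atomics, on the assumption that buffers may be created and destroyed on different threads from the one reading the stats. The two values are updated independently, so a concurrent reader can briefly see one updated before the other. That is acceptable for profiler statistics.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical replacement for "walk every buffer in a linked list":
// keep running totals that are updated at create/destroy time.
class ComputeBufferStats {
public:
    void onCreate(std::size_t bytes) {
        count_.fetch_add(1, std::memory_order_relaxed);
        totalBytes_.fetch_add(bytes, std::memory_order_relaxed);
    }

    void onDestroy(std::size_t bytes) {
        const std::size_t previousCount = count_.fetch_sub(1, std::memory_order_relaxed);
        const std::size_t previousBytes = totalBytes_.fetch_sub(bytes, std::memory_order_relaxed);
        // Debug-only sanity checks: more destroys than creates would wrap around.
        assert(previousCount > 0);
        assert(previousBytes >= bytes);
        (void)previousCount;
        (void)previousBytes;
    }

    // O(1) per frame, independent of how many buffers exist.
    std::size_t count() const { return count_.load(std::memory_order_relaxed); }
    std::size_t totalBytes() const { return totalBytes_.load(std::memory_order_relaxed); }

private:
    std::atomic<std::size_t> count_{0};
    std::atomic<std::size_t> totalBytes_{0};
};
```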
It was a good day when I got to delete this code. It wasn't just slow; it also turned out to have a race condition if a user called an API at the wrong time, which could lead to a use-after-free. Fun.
There's still more to do, and I have more PRs in flight as we speak. Coming soon!
This thread really reads like a performance bingo card: don't use strings, don't compare strings, don't use linked lists, don't allocate, cache if a computation is expensive, manually inline if necessary, check your assumptions, and look at the generated code. Go forth and measure!

@sschoener Amazing work! Thanks so much for caring about this stuff :-)

Not sure if Visual Scripting is within your reach, but there's a doozy in the AOT generation where it loads every prefab in the game (often running out of memory in the process) when it doesn't need to: https://forum.unity.com/threads/fix-for-aot-prebuilds-insane-memory-usage.1329639/

(I did put in a bug report as well, but I can't find it now, so it may already be getting addressed)

@RobD I think Visual Scripting has a lot of low-hanging fruit! I'm mostly not doing anything there because I don't have a real-world project that uses Visual Scripting, and I work almost exclusively in real projects :)
@RobD oh, you even have a fix! I'll get that in then!