@foone the main thing that comes to mind is doing this as a keyframe animation or something similar: chart each person's location along with the times they leave one room and arrive at the next, so you can either follow a pre-determined path between the two rooms or just interpolate between them.
I could imagine doing this in Unreal Engine, setting it up so that each "person" is an object on its own timeline within the animation, synced to a media playback component. Basically you rig the whole thing up as one big "animation" so it stays matched to the video's current time, and you can jump around without causing any issues (just make sure any forced update to the playback time also sets the animation's current time).
But that's obviously a bit overkill. Unfortunately, I really don't know of any software that would be better suited to something like this. I think the concept of making a list of "keyframes" for each person's position is probably the right way to go, at least.
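
To make the keyframe idea a bit more concrete, here's a rough Python sketch (nothing to do with Unreal or any real tool). The room names, coordinates, the person `alice`, and the `position_at` helper are all made up for illustration, and it assumes plain straight-line interpolation between rooms rather than a pre-determined path:

```python
from bisect import bisect_right

# Hypothetical room coordinates on a floor plan (made-up values).
ROOMS = {"kitchen": (0.0, 0.0), "hall": (10.0, 0.0), "study": (10.0, 8.0)}

# Per-person keyframes: (time in seconds, room), sorted by time.
# Two consecutive keyframes in the same room mean "stays put";
# a change of room means "moves between them over that time span".
KEYFRAMES = {
    "alice": [(0.0, "kitchen"), (30.0, "kitchen"), (40.0, "hall"), (90.0, "hall")],
}


def position_at(keys, t):
    """Return (x, y) at time t by linear interpolation between keyframes."""
    times = [time for time, _room in keys]
    i = bisect_right(times, t)  # index of the first keyframe strictly after t
    if i == 0:
        return ROOMS[keys[0][1]]   # before the first keyframe: hold first room
    if i == len(keys):
        return ROOMS[keys[-1][1]]  # after the last keyframe: hold last room
    (t0, r0), (t1, r1) = keys[i - 1], keys[i]
    f = (t - t0) / (t1 - t0)       # fraction of the way from t0 to t1
    (x0, y0), (x1, y1) = ROOMS[r0], ROOMS[r1]
    return (x0 + (x1 - x0) * f, y0 + (y1 - y0) * f)


# Halfway through the 30s-40s walk from the kitchen to the hall.
print(position_at(KEYFRAMES["alice"], 35.0))
```

Since the position only depends on the time you pass in, jumping around in the video is just a matter of calling `position_at` again with the new playback time. There's no state to get out of sync.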