We've hit the same problem in the #ATGeo group when it comes to location data. There are multiple (open) sources to pull data from, and the consensus was that it doesn't make sense to replicate off-protocol data collections in #ATProto, but rather to strive for an easily replicable way to fire up a ...

RE: https://bsky.app/profile/did:plc:enu2j5xjlqsjaylv3du4myh4/post/3ly3dkwe7ec23
...drop-in instance of that off-protocol data source, and then use references (not strongRefs) into that external data collection in your lexicon. That said, I'm totally happy to discuss this further, as I haven't quite made up my mind about this practice either.
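
To make the reference-vs-strongRef distinction concrete, here is a rough Python sketch. The `StrongRef` shape (an `at://` URI plus a CID) mirrors `com.atproto.repo.strongRef`. Everything else is an assumption for illustration only: the `ExternalRef` shape, the `example.geo.checkin` NSID, and the `places.example.org` instance are hypothetical, not anything #ATGeo has settled on.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class StrongRef:
    """Shape of com.atproto.repo.strongRef: pins one exact record version."""
    uri: str  # at:// URI of a record inside an ATProto repo
    cid: str  # content hash of that specific record version


@dataclass(frozen=True)
class ExternalRef:
    """Hypothetical plain reference into an off-protocol dataset."""
    source: str  # base URL of a drop-in instance of the dataset (assumed)
    id: str      # identifier of the entry within that dataset (assumed)

    def resolve_url(self) -> str:
        # Join base URL and identifier without doubling the slash.
        return f"{self.source.rstrip('/')}/{self.id}"


def checkin_record(place: ExternalRef, text: str) -> dict:
    """Build a record that points at external data instead of copying it."""
    return {
        "$type": "example.geo.checkin",  # hypothetical NSID
        "text": text,
        "place": asdict(place),  # no CID: the external entry is not pinned
    }


record = checkin_record(
    ExternalRef("https://places.example.org", "node/12345"),
    "Coffee stop",
)
```

The trade-off this is meant to show: the plain reference keeps the record small and avoids mirroring the dataset into a repo, but unlike a strongRef it carries no content hash, so nothing in the record says which version of the external entry was meant.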