@david my first guess is that you have a lot of small files and something is causing zfs to insert a lot of padding.
Is ashift the same on both pools (zpool get ashift, I think)? My guess is the source may be 9 (512 byte minimum block size) and the destination is 12 (4k min block).
Is the source not raidz and destination is raidz?
How are you looking at total space? The zpool and zfs commands report different things.
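For the ashift question, a quick way to compare both pools (the pool names 'src' and 'dst' are placeholders, and the fallback branch is just so the snippet degrades gracefully on a host without ZFS):

```shell
# Print ashift for one or more pools; 'src' and 'dst' are placeholder names.
check_ashift() {
  if command -v zpool >/dev/null 2>&1; then
    # -H: no header, -o: select columns
    zpool get -H -o name,value ashift "$@"
  else
    echo "zpool-not-available"
  fi
}
check_ashift src dst
```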
@mgerdts I am looking at it via 'zfs list', and 'df' shows consistent information. The main difference seems to be in the REFER column (I am redoing the receive right now, so I am going from memory); it appears that the receive has multiple full copies.
And zfs receive seems to corroborate that by reporting multiple 'full' streams ... maybe? The receive will be finished in ~7 more hours.
@david I'm not sure what to make of ashift=0: that's surely not the real value of ashift. Based on https://openzfs.github.io/openzfs-docs/man/7/zpoolprops.7.html?highlight=ashift saying that ashift can be changed, there have been changes in this area since I last used zfs a lot.
If ashift is the same between the two pools, that points us back to the question of whether you are using raidz or draid and, if so, whether both pools have the same number of disks per raidz vdev.
@david 8k recordsize + compression could lead to a poor interaction with ashift=12 as well. Suppose an 8k block would compress to 4200 bytes. With ashift=12, the compressed 8k block will consume 2 x 4k sectors (8k total). With ashift=9, the compressed 8k block will consume 9 x 512b sectors (4.5k total).
With raidz the overhead varies by number of drives in a raidz vdev. See my explanation here:
https://github.com/openzfs/zfs/blob/master/lib/libzfs/libzfs_dataset.c#L5340-L5426
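To make that concrete, here's a sketch of the arithmetic. The 4200-byte figure is the example above; the raidz function is a simplified model of the allocation behavior described in the linked code (data sectors plus parity per stripe row, rounded up to a multiple of nparity + 1 sectors), not the real implementation:

```shell
# Allocated size of a compressed block on a plain (non-raidz) vdev:
# round the physical size up to the next sector.
alloc_plain() {
  psize=$1; ashift=$2
  sector=$((1 << ashift))
  echo $(( (psize + sector - 1) / sector * sector ))
}

# Simplified raidz model: data sectors plus nparity parity sectors per
# stripe row, rounded up to a multiple of (nparity + 1) sectors.
alloc_raidz() {
  psize=$1; ashift=$2; ndisks=$3; nparity=$4
  sector=$((1 << ashift))
  dsec=$(( (psize + sector - 1) / sector ))
  rows=$(( (dsec + ndisks - nparity - 1) / (ndisks - nparity) ))
  asec=$(( dsec + rows * nparity ))
  rem=$(( asec % (nparity + 1) ))
  if [ "$rem" -ne 0 ]; then
    asec=$(( asec + nparity + 1 - rem ))
  fi
  echo $(( asec * sector ))
}

alloc_plain 4200 9        # 4608  (9 x 512b sectors, ~4.5k)
alloc_plain 4200 12       # 8192  (2 x 4k sectors)
alloc_raidz 8192 12 5 1   # 16384 (8k block, raidz1 over 5 disks)
```

Note how the raidz1 case turns an 8k block into 16k on disk: 8k data + 4k parity, then one more 4k sector of padding to reach a multiple of two sectors.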
@david while we have concluded raidz is not to blame here, I figured it may be worth mentioning that I did a talk on this work while at #Joyent.
Slides: https://us-east.manta.joyent.com/Joyent_Dev/public/docs/2019-06-RAIDZ_on_small_blocks.pdf
Video: https://youtu.be/sTvVIF5v2dw
Contrary to what I predicted back then, today’s NVMe SSDs pretty much all present as 512n, not as 4Kn.
@javierk4jh My understanding from reading the zfs-send and zfs-receive man pages and from online searches is that you actually cannot change recordsize that way, as the stream itself is deltas.
That is, if the incremental says to "set block 15 to 0xfeedface", the receiver doesn't have the context of the rest of a larger record to fill in.
Granted, this is a solvable problem (just read the original record and write out the whole thing), but they opted not to take on that complexity.
I did check anyway, and recordsizes look good
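A check along these lines would do it (the dataset names are placeholders; the fallback branch is just so the snippet degrades gracefully on a host without ZFS):

```shell
# Compare recordsize across datasets, recursively; names are placeholders.
check_recordsize() {
  if command -v zfs >/dev/null 2>&1; then
    # -r: recurse into children, -H: no header, -o: select columns
    zfs get -r -H -o name,value recordsize "$@"
  else
    echo "zfs-not-available"
  fi
}
check_recordsize srcpool/data dstpool/data
```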
@david I am not a #ZFS expert, but a lot of reasons come to my mind:
* No compression on the target file-system
* Different allocation sizes
* Maybe some sort of automatic snapshot generation on the target file-system
I faintly remember having problems when using send/receive with script-generated snapshots.
@elliot The goal here is to actually have all of the snapshots mirrored, and I zpool destroy and zpool create between each attempt, so any mystery snapshots would have to be coming from the original machine, which only has 4T.
That said, the latest experiment was a success, and ashift was the culprit.