#ActivityPub developers only please: how many items should be in a full collection page?

#EvanPoll #poll

Around 12 or fewer
10.7%
Around 20
30.4%
Around 50
32.1%
Around 100 or more
26.8%
If you implement ActivityPub on the server side, feel free to reply with your page size policy.

So, here's the trade-off: adding embedded objects can reduce the number of extra HTTP requests required to render the page of objects. For example, if showing a `followers` collection, adding each actor's name, avatar, and so on can be a real savings.

However, it puts a lot of costs on the server -- looking up cached or local data about each object.

Long story short: adding embedded objects is a pressure towards having smaller page sizes.

If you're not showing embedded objects, then filling up a collection page is usually just a couple of database queries. And adding more items to the page has very little extra cost.

The bigger your pages are, the fewer requests a client has to make to get all the data.

So, I think if you're not doing embedded objects, the pressure is towards bigger pages.

There are a couple of other confounding factors.

Adding embedded objects makes supporting HTTP Caching harder. The `ETag` header isn't too hard, but `Last-Modified` is difficult. You need to check not only what the collection page modification date is, but also each of the embedded objects (and take the max date!). It's a pain and most folks don't even implement it.
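
A minimal sketch of the "take the max date" step, assuming the server already has a modification timestamp for the page and for each embedded object. The function name and the sample timestamps are made up for illustration.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def page_last_modified(page_updated, embedded_updated):
    """Return the newest timestamp among the page and its embedded objects."""
    return max([page_updated, *embedded_updated])


# Hypothetical timestamps: the page itself, then two embedded actors.
page_updated = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
embedded_updated = [
    datetime(2024, 1, 5, 8, 30, tzinfo=timezone.utc),
    datetime(2024, 2, 1, 9, 15, tzinfo=timezone.utc),  # e.g. an avatar change
]

last_modified = page_last_modified(page_updated, embedded_updated)

# Format as an HTTP date for the Last-Modified header.
header_value = format_datetime(last_modified, usegmt=True)
print(header_value)
```

The cost is in gathering `embedded_updated`: with references only, the page's own timestamp is enough.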

The other thing is collection filtering. This is where you check each item in a page to see if the client can actually read it, and leave it out if not. It's very important if you include embedded objects, and not that important if not. If you only include references, they can be checked when the client tries to fetch them.
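
A rough sketch of filtering a page of embedded objects, assuming read access can be decided from the `to` and `cc` addressing alone and that both are lists. Real servers usually need more than this (follower collections, blocks, and so on), and the helper names and example.com IDs are hypothetical.

```python
PUBLIC = "https://www.w3.org/ns/activitystreams#Public"


def can_read(item, viewer_id):
    """Assumed rule: public items are readable by anyone; otherwise the
    viewer must be addressed directly or be the author."""
    audience = set(item.get("to", [])) | set(item.get("cc", []))
    if PUBLIC in audience:
        return True
    return viewer_id is not None and (
        viewer_id in audience or viewer_id == item.get("attributedTo")
    )


def filter_page_items(items, viewer_id):
    """Drop every item the viewer is not allowed to read."""
    return [item for item in items if can_read(item, viewer_id)]


alice = "https://example.com/users/alice"
items = [
    {"id": "https://example.com/notes/1", "to": [PUBLIC]},
    {"id": "https://example.com/notes/2", "to": [alice]},
    {"id": "https://example.com/notes/3", "cc": ["https://example.com/users/carol"]},
]

visible_to_alice = [i["id"] for i in filter_page_items(items, alice)]
visible_to_anonymous = [i["id"] for i in filter_page_items(items, None)]
```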
Another thing is whether "pages" in your collection are real objects -- buckets that fill up with items as time goes on -- or just fixed-length offsets from the most recent item. I think having real pages is much better for caching and synchronization.

Anyway, here's my thought: I think the advantages of embedded objects are offset by the problems with caching. I think we should make collection pages real, stable objects, with fixed contents and real modification dates. Return only references, not embedded objects. Do filtering, though. And make pages big -- 100 items or more.
@evan also, HTTP/3, which is widely used now, has multiplexing, so making multiple requests no longer comes with a penalty (if the objects are on the same host)

@[email protected] I don't support polls on NodeBB yet so can't vote, but...

Default perPage is 20*. This is configurable in the admin control panel.

* except for user outboxes. That uses a cursor, not pages.

@julian @evan

There is no "it depends..." option. Shouldn't this depend on what you intend to use the collection for? It's a design choice that depends on your use case and domain model.

Or is the missing context to the poll's question "...in a typical fediverse microblogging environment"?

@julian @evan

Well.. atm all options have 25% of the vote 😅

Poll FAQ

I do a lot of polls on my account at Mastodon. I get the same questions or requests multiple times, so I made this FAQ to make it easier to reply. Q: Why do you do so many polls? A: I like to think…

Evan Prodromou's Blog
@evan Totally depends on the collection, shouldn't it?
@evan My vote was "around 20" but I went to check my code and it's set to 5. Oops.
@evan configurable 😀
@naturzukunft2026 great. What's the default? And what *should* the value be? It's an opinion poll; have an opinion.
@evan Haha, it depends....
in changinggraph.org it is currently 20

@[email protected] As an opinion that is likely to be very unpopular... the page size ideally should be set by the client. Only they know their resource makeup and ability to process the returned information.

Otherwise, we tend to run page size between 60 and 100 depending on the content. And if it's less than 100 entries (and especially when returning an id-list rather than a list of activities), we'll usually just send them out without paging.

The desire is to balance resource usage to get the highest rate of information transfer, and those are the only levers we have available, and they (currently) aren't settable by clients, so it seems the best we can do is default to "large chunks".

I admit that I'm not fond of the page size of 12 that I found in Eugen's followers list of over half a million entries (some years ago). That's a lot of network requests and makes their clients work a lot harder than they need to.
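
To put a number on that, here is a small calculation of how many page fetches a full traversal takes. It assumes a round 500,000 followers (the post only says "over half a million") and ignores the initial request for the collection itself.

```python
import math


def requests_needed(total_items, page_size):
    """Number of page fetches required to read every item once."""
    return math.ceil(total_items / page_size)


TOTAL_FOLLOWERS = 500_000  # assumed round figure

small_pages = requests_needed(TOTAL_FOLLOWERS, 12)
large_pages = requests_needed(TOTAL_FOLLOWERS, 100)
print(small_pages, large_pages)
```

Under that assumption, 12 items per page means 41,667 requests, against 5,000 at 100 items per page.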

@macgirvin I'm okay with that. I just wasn't really aware of any perPage style value that could be passed to servers.
feps/9f9f/fep-9f9f.md at main

feps - My FEPs

Codeberg.org
@macgirvin @general @silverpill @julian that is a very useful thing for clients, thanks for that FEP!
@macgirvin the 12-item page size is a real kick in the teeth.
@[email protected] remind me to change pagination size to 1 on April 1st
@macgirvin oh, but: I disagree about configurable page sizes. I think pages should have stable contents, have last-modified dates, and be easily cacheable. It makes traversal and synchronization much better.
@macgirvin like, only the most recent page should be volatile. Except for deletions, older pages should not change.

@[email protected] in reverse chron every page changes when a new item is pushed to the stack.

Even in chronological sets if you need to account for deletions you've basically given up already because there's no immutability guarantee! This is why caching headers exist, no?

@julian that's the way to do reverse chron that messes with caches!

Instead, number your pages from oldest to newest. So page 1 was the first page created. After PAGE_SIZE items have been added, create page 2, make it the `first` page, and now page 1 never changes (unless one of its items gets `Remove`d). All your volatility is in the most recent page, and older pages rarely change.

You can also use UUIDs or other IDs for pages.
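
A minimal in-memory sketch of that numbering scheme, assuming a tiny `PAGE_SIZE` for readability. The class and method names are hypothetical, and this version creates the next page lazily, when the first item that does not fit arrives.

```python
PAGE_SIZE = 3  # deliberately tiny for the example


class PagedCollection:
    """Pages are numbered from oldest (1) to newest; only the newest grows."""

    def __init__(self, page_size=PAGE_SIZE):
        self.page_size = page_size
        self.pages = [[]]  # pages[0] is page 1, stored oldest item first

    def add(self, item):
        # Start a new page once the newest one is full.
        if len(self.pages[-1]) >= self.page_size:
            self.pages.append([])
        self.pages[-1].append(item)

    @property
    def first_page_number(self):
        """The `first` page is always the most recently created one."""
        return len(self.pages)

    def page(self, number):
        """Return a page's items, newest first (reverse chronological)."""
        return list(reversed(self.pages[number - 1]))


collection = PagedCollection()
for n in range(1, 8):
    collection.add(n)

page_one_before = collection.page(1)
collection.add(8)  # only the newest page is touched
page_one_after = collection.page(1)
```

Here page 1 stays `[3, 2, 1]` no matter how many items are added later, so its cache entry stays valid.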

@julian the downside is that your `first` page is rarely full. You can get around this by having the `first` page be up to 2 * PAGE_SIZE in length, and shifting PAGE_SIZE items to a new page when the `first` page hits its max.
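
One possible reading of that workaround, sketched in memory: the `first` page grows to 2 * PAGE_SIZE, then its oldest PAGE_SIZE items are moved into a new page that never grows again. The names are hypothetical, and which half gets shifted is an assumption here.

```python
class OverflowCollection:
    """The `first` page holds up to 2 * page_size items before splitting."""

    def __init__(self, page_size):
        self.page_size = page_size
        self.frozen = []  # full, stable pages, oldest page first
        self.first = []   # volatile newest page, oldest item first

    def add(self, item):
        self.first.append(item)
        if len(self.first) == 2 * self.page_size:
            # Shift the oldest page_size items into a new stable page.
            self.frozen.append(self.first[: self.page_size])
            self.first = self.first[self.page_size:]


overflow = OverflowCollection(page_size=3)
for n in range(1, 8):
    overflow.add(n)

frozen_after_seven = [list(p) for p in overflow.frozen]
first_after_seven = list(overflow.first)
```

After the first split, the `first` page always holds at least PAGE_SIZE items, and every older page holds exactly PAGE_SIZE until something is removed.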
@[email protected] -- Thanks. Don't know how I missed that, but I'll start using it right away. Cheers.

@[email protected] @[email protected] -- Now I see why I missed it. These additional properties aren't actually defined in the AS spec, only in the FEP(?), which introduces them immediately after implying that the pagination properties were all specified in the AS spec. I get it now, but maybe a wee bit of word-smithing in the FEP could make this a bit clearer.

I guess the question remains how to determine if a site offers cursor-based pagination or recognises 'maxItems' per FEP-9f9f. Guess you need to just try it, maybe with an odd number like 19, and see if what you get back is "consistent" with your request, to see if you should continue trying to use client defined pagination for this site. (This might be another good use case for the server 'capability' mechanism.)
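
A sketch of the "consistent with your request" check, operating on an already-fetched page. How the requested size is actually sent to the server is left out, since that is defined by the FEP and not reproduced here. The function name is hypothetical, and the page is assumed to be a parsed JSON object using `orderedItems` or `items`.

```python
def looks_honored(page, requested):
    """Heuristic: does this page look like the server respected our size?"""
    items = page.get("orderedItems", page.get("items", []))
    if len(items) == requested:
        return True
    # A shorter page is still consistent if it is the last one.
    return len(items) < requested and "next" not in page


probe_size = 19  # an odd number unlikely to be a server default

honored = looks_honored(
    {"orderedItems": list(range(19)), "next": "https://example.com/page/2"},
    probe_size,
)
ignored = looks_honored(
    {"orderedItems": list(range(20)), "next": "https://example.com/page/2"},
    probe_size,
)
```

A short final page passes the check but proves nothing, so a collection smaller than the probe size leaves the question open.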