@dredmorbius On the subject of "versioned documents" (as in, Wiki or source-control type systems that publish all previous editions of a document or a set of files, annotated with discussion)

1. I'm still not sure that this is a completely new thing in history. Religious and legal communities (in ancient times, the same groups) have had extensive traditions of "texts, and commentaries on the texts" for millennia. Eg Judaism's Midrash https://en.m.wikipedia.org/wiki/Midrash

@natecull Wikipedia: 5.4 million English articles, 40 million overall, 500 million unique monthly readers, 18 billion pageviews, 40k high-quality articles (about the same as Britannica's total), 68m registered users, 600k active (I'm presuming "editors" here), 3,500 editors with >100 edits/mo.

I challenge you to match this with /any/ other published work, particularly over the timescale (16 years).

https://en.m.wikipedia.org/wiki/Wikipedia

@dredmorbius I'm not entirely disagreeing with you.

Scale *does* matter.

But I think perhaps you're conflating two separate things here - at least your initial argument did, when it ascribed the consequences of one thing (mass public collaboration enabled by electronic communication) to a specific *form* of communication (the "versioned document").

I argue that the second has in fact been among us for millennia.

It's *electronic computers* which have enabled us to scale this up.

@dredmorbius I mean, sure, if you choose to define "versioned document" literally AS "Wikipedia, with its huge number of articles and editors and readers".... then yes, I suppose you could combine scale and versioning into one thing.

But there are many wikis - even Ward's Wiki, the one that *invented* the concept - that are much smaller and did not scale as Wikipedia did.

And there are many web-scale comms systems (Facebook, Amazon) that aren't especially Wiki-like.

@natecull I hope it's abundantly clear that I am not /equating/ Wikipedia to a versioned document.

But it is an /exemplar/ (and almost certainly the prime one) of the class.

What did Diderot do? Was that, or was that not, notable?

http://historyofinformation.com/expanded.php?id=2876

@dredmorbius Wikipedia is *an* exemplar, yes. But I'm sure you know that it's hardly the "prime" example, because it's not the first. This is: http://wiki.c2.com/?WardsWiki

@dredmorbius So: to me this shows that Wikipedia's scale *is not directly the result of it being a Wiki* but comes from some other shared goals of the community that created it.

@dredmorbius Eg, one of the things that made Wikipedia work was a deliberate commitment to "being an encyclopedia", which narrowed its scope, gave it an immediate useful purpose (which precursors like C2, H2G2, or Everything2 didn't have), and allowed for community judgements on what was or wasn't "in the house style" and "notable".

If we compare Wikipedia to its non-wiki precursors and rivals (whose names I've forgotten), yes the open-editing concept enabled scale...

@dredmorbius ... and versioning (and particularly, the ability to scan the history looking for vandalism, and instantly recover vandalised pages) is *one* of the tools that enabled open editing to scale to an increasingly hostile web.

It wasn't the only thing that safeguarded Wikipedia, though. I think having lots of bots monitoring it, having a foundation backing it, and the support of Google also helped navigate the transition to scale.

@natecull There are two ways to approach critical success.

One is to look at what /contributed to success/.

Another is looking at what /didn't get the fuck in the way of it/.

In /any/ mass-adoption phenomenon, there's going to be a pretty significant degree of blind luck and path dependency. But not getting in your own damned way is also hugely useful.

There are plenty of failures to consider -- most of them involve some degree of self-sabotage.

@dredmorbius Right.

I think what I'm trying to say is that the concept you keep calling 'versioning' is actually 'massive easy-access collaboration'.

With 'versioning', or some other form of safeguarding against bad changes, being one of several necessary *but not sufficient* enabling techniques.

Because versioning has been around for millennia, but massive public collaboration hasn't.

@natecull How would you organise Wikipedia /without/ basing it on a version-control system?

What would that do to the project?

https://en.m.wikipedia.org/wiki/Nupedia

@dredmorbius Yes, I've stated several times that versioning is *one* of the things that allowed Wikipedia.

But so is electricity.

@natecull Electricity has been part of publishing since the late 19th century. Electronic data systems have existed since the 1950s. Personal and business-use systems since the 1970s. Networked since the 1980s. Public internet, since the 1990s.

And yet until 2001, putting together the Web, version control, wiki markup, and hypertext, /and/ the appropriate social context and goal, you didn't have a Wikipedia.

@natecull You didn't have anything /close/ in scale to Wikipedia.

Yes, there were precursors, but ... how to say? Nothing with the social and cultural significance.

Hell, I've been reading a 2006 systems admin guide which still feels /it/ has to make the case for using Wikis /in a technical, professional workplace/.

(O'Reilly, "Time Management for System Administrators".)

@dredmorbius Sure! All of these are small incremental technological changes that together led to something big.

But Wikipedia isn't the only large Internet-age data project, is it?

What about the Internet Archive? Facebook? Amazon? Google itself?

What about data stored on web forums, and used to coordinate large projects?

Yes, I think versioned documents are one useful tool, but so too are threaded discussions and many others.

@dredmorbius

And in fact I think MediaWiki, Wikipedia's engine, does quite a poor job of even versioning - eg, if you delete a page it's just gone - and it also doesn't do threaded conversations AT ALL. People cope, but they'd cope better I think if they had threads available as well.

Wikipedia's also moving into Linked Data, which I think is maybe as significant as wiki pages, maybe more so.

@dredmorbius So what I'd like - what I've wanted for over ten years now - is a Weblike environment that offers:

* versioning
* linking
* threading
* realtime notification
* authentication and access control
* semantic data
* import and export
* search

all in one unified system. So we don't have to, eg, fire up a CMS, a Wiki, a web forum, a social media or chat channel, a database, and a source control system, just to manage one project.
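To make that concrete, here is a minimal sketch of what a single unified record type might look like - every name in it is hypothetical, invented for illustration, not drawn from any existing system:

```python
from dataclasses import dataclass, field
from typing import Optional

# One record type that could serve pages, posts, replies, and rows alike.
@dataclass
class Node:
    node_id: str                        # stable identity, for linking
    author: str                         # hook for authentication / access control
    parent_id: Optional[str] = None     # threading: a reply points at its parent
    prev_version: Optional[str] = None  # versioning: id of the prior revision
    body: str = ""                      # content, prose or structured
    tags: dict = field(default_factory=dict)  # semantic key/value data
```

Search, notification, and import/export would then operate over one kind of thing, rather than over six different schemas.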

@dredmorbius This set of requirements goes *beyond* just 'versioning', is what I'm saying.

Maybe it's just my particular view. Certainly, storing all past versions and having some kind of diff format (which, by the way, Wikipedia DOESN'T have but git DOES, so they're a fair bit different in this regard) is one of the important steps forward, as it's a great buffer against accidents and arguments and other social forces that can derail a large project.

@natecull So, Wikipedia /uses/ diffs but doesn't /expose/ them.

There's a blogging tool on Github which goes all the way there. Your blog /is/ a Git repo.
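For what it's worth, deriving a reader-facing diff from two stored revisions is cheap; a Python sketch, assuming an engine that keeps full revision texts:

```python
import difflib

old = "Paris is the capital of France.\n"
new = "Paris is the capital and largest city of France.\n"

# Revisions can be stored as full texts; the diff is computed on demand.
for line in difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile="rev1", tofile="rev2"):
    print(line, end="")
```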

@natecull Instead of "it is! it isn't!", how about a compelling list of elements that you think /are/ salient to Wikipedia, or even a Wikipedia successor?

I keep adding it up and finding version control (and history) as a critical path.

@dredmorbius Hmm, good idea.

Required:
* widespread access to the Web (so not possible before about 2000)
* an Internet not yet jaded by troll culture to the point of turning off anonymous edits (so maybe not after 2015)
* a useful existing model to target (eg, Encyclopaedia Britannica / Nupedia)
* free content to scrape and copy (eg CIA World Factbook, out-of-copyright encyclopedias)
* Creative Commons licensing
* Linux and IETF as social organising models
* Wiki as a technology

@natecull A huge part of this is that /social/ projects require /social/ organisation, /not/ just technology.

The good news is that social organisation has a strong tendency to emerge, despite best efforts to suppress it.

A key part of Wikipedia's success (and a large source of its frictions) has been that social component. Tuning and dialing in on that has been critical.

@dredmorbius What could be required to advance the concept further:

* An equivalent to Creative Commons that covers fine-grained data
* fully distributed, fine-grained version control and authentication that works with data nodes and is much easier to use than git
* a peer-to-peer non-Web hosting technology that allows people to clone repos and share from mobile devices
* a programming language that's safe, portable, pure-functional/declarative, and fine-grained
* a GUI for it

@dredmorbius and unfortunately a lot of this, if implemented correctly (ie with cryptography for authentication) will hard fail against our current Western government regimes which have outlawed all crypto and communications they can't control.

(This is also where my earlier qualms about how crypto will work with fine-grained multi-repository data come in. It may be that we *can't* use crypto safely in such a system, only one with minimum-sized documents.)

@dredmorbius Eg: could I encrypt my individual toots? Probably, as long as they were padded with random data up to, idk, what *would* be a sensible minimum?

So there would probably be a fair bit of network overhead for security.
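A rough sketch of the padding idea - the minimum size here is a made-up illustration, not an analysed figure:

```python
import os

MIN_SIZE = 4096  # assumed "sensible minimum"; a real design would need analysis

def pad(message: bytes) -> bytes:
    """Length-prefix the real message, then pad with random bytes to MIN_SIZE."""
    body = len(message).to_bytes(4, "big") + message
    return body + os.urandom(max(0, MIN_SIZE - len(body)))

def unpad(blob: bytes) -> bytes:
    length = int.from_bytes(blob[:4], "big")
    return blob[4:4 + length]

padded = pad(b"my toot")        # every toot now costs at least MIN_SIZE bytes
assert unpad(padded) == b"my toot"
```

Every one-line toot costing 4 KB on the wire is exactly the kind of overhead I mean.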

Named Data, I think, requires mandatory signing of all packets. Gonna be interesting to see if that one ever gets off the ground. https://named-data.net/

@dredmorbius
Ted Nelson's Xanadu vision, I think, included some kind of DRM built in, so that copyrighted data could be marked and then you'd automatically get billed each time you viewed it.

I don't like that. But we're going to end up with a corporate net and a hacker net probably because of things like copyright and anti-crypto laws.

(Xanadu was also patented; one of the reasons why I've been extremely un-motivated to study it closely.)

@dredmorbius It would be an interesting experiment, actually, to see if version control *is* the informational equivalent of fire: an enabling tool for large breakthroughs.

It might be! Like, I keep orbiting around 'pure functions and immutable data' but that is essentially the same thing, isn't it?

If we had an OS that was pervasively version-controlled in everything it did, from transactions to files to apps...

Trick is how to get 1) privacy and 2) performance.

@dredmorbius ie: when you share data, sometimes it is VERY important that you DON'T share your full change history. Collaborative projects like Wikipedia can get by with making everything open but that's because all the information is public and of narrowly defined scope to keep it legal. And even then, there have been some cases involving living celebrities with extremely alive lawyers.
@dredmorbius @natecull nixos is versioned

@alanz @natecull Not familiar with it.

VMS also had the concept of version-numbered (but not diffed) files.

@natecull @dredmorbius see nixos.org. Essentially everything in the operating system is versioned via a hash computed over it and its inputs. So it allows exact reproduction of, say, a dev environment, or the state of a server at a particular time. And you can roll back and forward between them.
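A toy illustration of the idea (this is not Nix's actual store format, just the shape of it): an item's identity is a hash over its own definition plus the identities of its inputs, so a change anywhere below it produces a new version while the old one survives:

```python
import hashlib

def store_hash(definition: str, input_hashes: list) -> str:
    """Content address = hash of the definition plus all input hashes."""
    h = hashlib.sha256()
    h.update(definition.encode())
    for ih in sorted(input_hashes):
        h.update(ih.encode())
    return h.hexdigest()

libc = store_hash("glibc-2.35 build recipe", [])
app = store_hash("myapp build recipe", [libc])
# Rebuilding glibc changes libc's hash, which changes app's hash.
# Old and new trees coexist, so you can roll back and forward.
```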

@natecull @dredmorbius Xanadu’s patents may be an important reason why it never really took off.

That said, would whoever borrowed my copy of _Computer Lib / Dream Machines_ years ago please return it?

@natecull My view is that current copyright is fundamentally incompatible with free-flow of information. There's a substantial literature on this.

@natecull We've discussed this already. Yes. Padding.

@dredmorbius That we've discussed something doesn't mean we've rendered it irrelevant.

If, say, a door lock transmitting a one-byte status update requires, say, a 1 megabyte minimum data cost because of padding - that may have important consequences for the design of a network protocol.

For instance, pervasive crypto may be a force on a system tending towards large units of data granularity rather than small, and duplication may be safer *on a network* than structure sharing.

@dredmorbius The other thing that crypto can fight against is caching. I like pervasive caching, but as soon as you turn it on, you can now identify *what* document (or rather, collection of nodes) someone accessed, even if you can't see inside.

Once you start adding diffs or deltas into the data protocol, crypto starts looking increasingly unfeasible.

This might turn out to be a major problem.

@natecull On caching: I need more clarity on what you're thinking here, methods, mechanisms, threat models.

Variously:

* The entire cache store might be encrypted.
* There's the option of distributed cleartext caches. Roughly how today's Internet / Web works.
* In a distributed cache: indirection

There's also the question of cache /integrity/.

@natecull Keep in mind that padding itself can be variable (and works far better when it is).

So: 1 byte of signal + some n bytes of noise.

How much noise do you need to add to your signal to provide sufficient entropy?

And again: we've discussed this and there's ample prior art, particularly in SSH, which is built on small encrypted messages.
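A quick sketch of variable padding - the bounds are illustrative, not vetted values:

```python
import os, secrets

def pad_variable(message: bytes, lo: int = 256, hi: int = 1024) -> bytes:
    """Length-prefix the signal, then append a *random amount* of noise.

    The 4-byte prefix lets the receiver recover the signal; lo and hi
    are placeholder bounds, not the result of any traffic analysis.
    """
    header = len(message).to_bytes(4, "big")
    noise = os.urandom(lo + secrets.randbelow(hi - lo))
    return header + message + noise
```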

@dredmorbius Sorry, but no. *You've* discussed this, *you* think you've come to a solution, but just 'padding, done' is not the solution because you haven't understood the actual problem yet.

My apologies for not being clearer: I'm trying to articulate a network design *which does not yet exist*. So some parts may be vague.

@dredmorbius Okay. So we have a vast sea of node structure (eg: JSON objects, perhaps). We want to cache this node structure pervasively across the network (eg the entire Internet). We want to broadcast updates to this node structure, to those servers and devices interested in (ie: subscribed to) parts of it. We want to keep such broadcasts to a minimum, sending only changes.

We also want to encrypt the node structure to prevent eavesdropping.

This last may be impossible.

@dredmorbius Ie: to encrypt a collection of node structure (again, think: JSON object, perhaps very large JSON object) we have to render it opaque.

We can cache it as an opaque blob.

But to access and update parts of it we have to render it transparent.

There is a natural conflict here which may be impossible to fully resolve.

@dredmorbius The problem - to put it in the 'legibility' framework - is that *encryption forces illegibility*; I mean, that's its feature, right? While caching and diffing and random-access key lookup require *legibility*.

So these two forces are at odds.

What this means is that we could (pervasively, in the protocol) encrypt *parts* of such a network, but not *the whole*. The more you let people search it the more data you reveal.

We might be able to *sign* the whole though.
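A sketch of that 'encrypt the parts, sign the whole' shape - Fernet stands in for whatever cipher a real design would pick, and all the field names are invented:

```python
import json, hmac, hashlib
from cryptography.fernet import Fernet  # pip install cryptography

enc_key = Fernet.generate_key()
sign_key = b"shared-integrity-key"      # illustrative only
f = Fernet(enc_key)

node = {
    "id": "node-42",                    # legible: cacheable, searchable
    "links": ["node-7", "node-99"],     # legible: the graph stays navigable
    "body": f.encrypt(b"private text").decode(),  # opaque: encrypted part
}

# Sign the *whole* serialised structure, opaque parts included.
serialised = json.dumps(node, sort_keys=True).encode()
signature = hmac.new(sign_key, serialised, hashlib.sha256).hexdigest()
# Anyone can verify integrity of the whole; only key-holders read "body".
```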

@natecull One key point about caching is that, depending on how you do it, the index of the object /is/ its identity. If your cache index is a hash, then, if you've got the hash, you're a long ways to knowing the object itself. /Or at least recognising it if you see it./

I've been thinking myself about what this might imply for a distributed / webfs type system, and especially what information it might leak.

There's public vs. private information.
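To illustrate the 'recognising it if you see it' point - with a content-addressed cache, anyone holding a suspected document can confirm that a cache key matches it, no decryption required:

```python
import hashlib

def cache_key(content: bytes) -> str:
    """The index of an object in a content-addressed cache."""
    return hashlib.sha256(content).hexdigest()

observed = cache_key(b"the text of some contested pamphlet")  # seen on the wire

# An observer never sees the payload, but can test any guess against it:
assert cache_key(b"the text of some contested pamphlet") == observed
```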

@natecull KFC + webfs.

Keep in mind that the tendency to hubs is in part what gave us what we've got. Market towns (or booksellers' shops, or libraries) have value. Signifiers of reputation and value, /as well as/ discoverability.

If HTML + HTTP had a few more features, we could eliminate much:

* A native ID assertion capability (/with/ the ability to log out goddamnit!).
* Threading as a native construct.
* Dynamic content control: sorting, expanding, collapsing threads.

@natecull Also: tables. Why HTML has tables *but no native capability to work with tabular content* is something ... I don't understand.

* Notifications. RSS/Atom goes a long way to providing that.

I'd also like to see independence between /content/ and /framing/. Site-specific crud. In many cases I'd like to see the concept of a /site/ fall away virtually completely.

Not sure where you're going w/ import/export.

@dredmorbius Agree on all of this too.

The import/export thing is about how to fork or merge a project, really.

So many web technologies (including wikis!) use SQL RDBMSes to store their stuff that it's quite problematic to transfer content from one site to another.

If we didn't have a concept of a 'site', and if 'files' and 'tables' were all collections of the same kind of data (with versioning, yes), and you could fork and revert sites as easily as files, that'd be useful.
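A sketch of the kind of portable dump that would make forking feasible - the SQL schema and the output format here are both invented for illustration:

```python
import json, sqlite3

conn = sqlite3.connect("wiki.db")  # hypothetical site database
rows = conn.execute(               # hypothetical schema: one row per revision
    "SELECT title, rev_id, author, body FROM revisions ORDER BY rev_id")

# One self-describing JSON line per revision; another site, or a plain
# directory of files, could replay these to fork the whole project.
with open("site-export.jsonl", "w") as out:
    for title, rev_id, author, body in rows:
        out.write(json.dumps({"title": title, "rev": rev_id,
                              "author": author, "body": body}) + "\n")
```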

@natecull The real sticking point on stuff like this is probably a combination of rights and finance.

* Rights: ensuring that some asshat can't shut you down with legal threats.

* Finance: finding /either/ a way to make the activity profitable, or make it an adjunct of a beneficial activity, such that it's self-sustaining.

The flipside of rights is that if a bunch of asshats try to make others' lives miserable through publishing stuff, well, that might not be so good either.

@dredmorbius Yep! Agree with all of this.

One thing I notice with systems like Mastodon is how the natural unit of text is NOT a 'page' but a sub-page unit.

In fact the tendency is to use HTML's 'page' just as a modern VT-100 talking to a modern mainframe.

It might be nice if HTTP/HTML naturally had some kind of 'collection of sub-page units' ability.

@natecull On threaded discussions, @woozle keeps trying to convince me to use various Wiki tools on his constellation of sites. Which ... pretty much always leave me 1) confused and 2) screaming from styling conventions.

I'm fairly happy with how my re-styled subreddit appears, generally. Though I'm still partial to Scoop's dynamic thread expansion/collapse. @rustyk5 FTW.

@dredmorbius (cc: @natecull @rustyk5 ) You need to *tell* me if there are issues.

MediaWiki can be restyled either on a per-user basis or globally by editing a page.

I'm not sure what to do about the

@rustyk5 @natecull @dredmorbius

(somehow my toot didn't get finished before sending)

[I'm not sure what to do about the] confusion; I guess I need to know how it's confusing.

@woozle @rustyk5 @natecull I've had an occasional tootaculation problem. I generally copy the bits I wanted, delete the original, and re-post, if I catch that immediately.

Life /is/ edits.

@dredmorbius @rustyk5 @natecull If I'd caught it quickly, I woulda.

Didn't see it until about half an hour later (according to timestamps, which matches my untrustworthy memory of my untrustworthy time-sense).

@natecull @dredmorbius Actually, deleted pages are visible to admins and can be recovered with a couple of clicks.

Agreed on threaded convos -- with two caveats:

1. Look on the talk page for any popular article, and you'll see threaded discussions with essentially hand-made threads.

2. I use an extension called LiquidThreads for threaded discussions. There are issues with the UI, but the underlying design seems sound. There's another I haven't tried yet.

@woozle @dredmorbius

Yes, 'essentially hand-made threads' is what I'm referring to when I say that MediaWiki doesn't have *actual* threads.

People sort of cope doing it manually instead of having the real thing, yes. But it would be nice to not set our goals at 'sort of coping around the edges of our technology'.

Not a gripe at MediaWiki especially, but at the fact that a thing doesn't yet exist which combines Wiki/Threads/Blogs/Toots into a single tech.

@dredmorbius @woozle And I wasn't aware that there was a separate admin-only interface for deleted pages.

That just kind of highlights my point, though: *why* are pages in a wiki (any wiki) treated so differently than normal edits? Why can anyone create a page, but only admins can delete/undelete? Why not just version *page create/delete events themselves*?

Cf a source control system, where file create/deletes *are* just ordinary changes. At least I assume they are.
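In git they are: deleting a file is just another commit, and reverting that commit restores the file. A toy event-log sketch of the same idea for wiki pages (names invented, nothing here is MediaWiki's actual model):

```python
# Create, edit, and delete are all ordinary events in one shared history.
history = []   # append-only log of (action, title, body) tuples

def replay(log):
    pages = {}
    for action, title, body in log:
        if action == "delete":
            pages.pop(title, None)
        else:              # "create" and "edit" are the same operation
            pages[title] = body
    return pages

history.append(("create", "Fire", "Fire is hot."))
history.append(("delete", "Fire", None))
replay(history)       # {} - the page is gone...
replay(history[:-1])  # ...but *anyone* can rewind, not just admins
```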

@woozle @dredmorbius I do agree though that MediaWiki's 'Talk' page was a huge step forward and (along with authenticated users) is a large part of what was missing in Ward Cunningham's original Wiki concept and early similar systems.

Taking the meta-discussions out of the page itself very much improved the quality of the product. As, I think, did the 'encyclopedia' concept, which focused a community around *being* a product (which, eg, H2G2 and Everything2 didn't have).

@natecull @dredmorbius All that said, I bumped into some design-level MediaWiki limitations a few years ago and have been putting my wiki-design-improvement efforts into writing a new set of tools for managing content.