@dredmorbius On the subject of "versioned documents" (as in, Wiki or source-control type systems that publish all previous editions of a document or a set of files, annotated with discussion)

1. I'm still not sure that this is a completely new thing in history. Religious and legal communities (in ancient times, the same groups) have had extensive traditions of "texts, and commentaries on the texts" for millennia. Eg Judaism's Midrash https://en.m.wikipedia.org/wiki/Midrash

@natecull Wikipedia: 5.4 million English articles, 40 million overall, 500 million unique monthly readers, 18 billion pageviews, 40k high-quality articles (about the same as Britannica's total), 68m registered users, 600k active (I'm presuming "editors" here), 3,500 editors with >100 edits/mo.

I challenge you to match this with /any/ other published work, particularly over the timescale (16 years).

https://en.m.wikipedia.org/wiki/Wikipedia

@dredmorbius I'm not entirely disagreeing with you.

Scale *does* matter.

But I think perhaps you're confusing two separate things here - at least your initial argument did, when you ascribed the consequences of one thing (mass public collaboration enabled by electronic communication) to a specific *form* of communication ("versioned document").

I argue that the second has in fact been among us for millennia.

It's *electronic computers* which have enabled us to scale this up.

@dredmorbius I mean, sure, if you choose to define "versioned document" literally AS "Wikipedia, with its huge number of articles and editors and readers".... then yes, I suppose you could combine scale and versioning into one thing.

But there are many wikis - even Ward's Wiki, the one that *invented* the concept - that are much smaller and did not scale as Wikipedia did.

And there are many web-scale comms systems (Facebook, Amazon) that aren't especially Wiki-like.

@natecull I hope it's abundantly clear that I am not /equating/ Wikipedia to a versioned document.

But it is an /exemplar/ (and almost certainly the prime one) of the class.

What did Diderot do? Was that, or was that not, notable?

http://historyofinformation.com/expanded.php?id=2876

@dredmorbius Wikipedia is *an* exemplar, yes. But I'm sure you know that it's hardly the "prime" example, because it's not the first. This is: http://wiki.c2.com/?WardsWiki
@dredmorbius So: to me this shows that Wikipedia's scale *is not directly the result of it being a Wiki* but comes from some other shared goals of the community that created it.

@natecull How would you organise Wikipedia /without/ basing it on a version-control system?

What would that do to the project?

https://en.m.wikipedia.org/wiki/Nupedia

@dredmorbius Yes, I've stated several times that versioning is *one* of the things that allowed Wikipedia.

But so is electricity.

@natecull Electricity has been part of publishing since the late 19th century. Electronic data systems have existed since the 1950s. Personal and business-use systems since the 1970s. Networked since the 1980s. Public internet, since the 1990s.

And yet until 2001, putting together the Web, version control, wiki markup, and hypertext, /and/ the appropriate social context and goal, you didn't have a Wikipedia.

@dredmorbius Sure! All of these are small incremental technological changes that together led to something big.

But Wikipedia isn't the only large Internet-age data project, is it?

What about the Internet Archive? Facebook? Amazon? Google itself?

What about data stored on web forums, and used to coordinate large projects?

Yes, I think versioned documents are one useful tool, but so too are threaded discussions and many others.

@dredmorbius

And in fact I think MediaWiki, Wikipedia's engine, does quite a poor job of even versioning - eg, if you delete a page it's just gone - and it also doesn't do threaded conversations AT ALL. People cope, but they'd cope better I think if they had threads available as well.

Wikipedia's also moving into Linked Data, which I think is maybe as significant as wiki pages, maybe more so.

@dredmorbius So what I'd like - what I've wanted for over ten years now - is a Weblike environment that offers:

* versioning
* linking
* threading
* realtime notification
* authentication and access control
* semantic data
* import and export
* search

all in one unified system. So we don't have to, eg, fire up a CMS, a Wiki, a web forum, a social media or chat channel, a database, and a source control system, just to manage one project.

@dredmorbius This set of requirements goes *beyond* just 'versioning', is what I'm saying.

Maybe it's just my particular view. Certainly storing all past versions, and having some kind of diff format (which, by the way, MediaWiki doesn't use for its storage but git does, so they're a fair bit different in this regard), is one of the important steps forward, as it's a great buffer against accidents, arguments, and other social forces that can derail a large project.
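For what it's worth, the line-diff idea is easy to sketch with Python's standard difflib; the revision text here is invented for illustration:

```python
import difflib

# Two hypothetical revisions of a wiki paragraph.
rev1 = ["Wikipedia has 5.4 million English articles.\n",
        "It is written by volunteer editors.\n"]
rev2 = ["Wikipedia has 5.5 million English articles.\n",
        "It is written by volunteer editors.\n",
        "It launched in January 2001.\n"]

# A unified diff records only the changed lines plus context, so a
# history kept as deltas is far smaller than full copies, and each
# edit can be reviewed (or argued over) in isolation.
delta = list(difflib.unified_diff(rev1, rev2, fromfile="rev1", tofile="rev2"))
print("".join(delta), end="")
```

(git's internal storage is actually snapshots plus pack-file deltas, but the reviewable line diff it presents has this shape.)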

@natecull Instead of "it is! it isn't!", how about a compelling list of elements that you think /are/ salient to Wikipedia, or even a Wikipedia successor?

I keep adding it up and finding version control (and history) as a critical path.

@dredmorbius Hmm, good idea.

Required:
* widespread access to the Web (so not possible before about 2000)
* an Internet not yet jaded by troll culture to the point of turning off anonymous edits (so maybe not after 2015)
* a useful existing model to target (eg, Encyclopaedia Britannica / Nupedia)
* free content to scrape and copy (eg CIA World Factbook, out-of-copyright encyclopedias)
* Creative Commons licensing
* Linux and IETF as social organising models
* Wiki as a technology

@dredmorbius What could be required to advance the concept further:

* An equivalent to Creative Commons that covers fine-grained data
* fully distributed, fine-grained version control and authentication that works with data nodes and is much easier to use than git
* a peer-to-peer non-Web hosting technology that allows people to clone repos and share from mobile devices
* a programming language that's safe, portable, pure-functional / declarative, and fine-grained
* a GUI for it

@dredmorbius and unfortunately a lot of this, if implemented correctly (ie with cryptography for authentication), will hard-fail against our current Western government regimes, which have outlawed all crypto and communications they can't control.

(This is also where my earlier qualms about how crypto will work with fine-grained multi-repository data come in. It may be that we *can't* use crypto safely in such a system, only one with minimum-sized documents.)

@dredmorbius Eg: could I encrypt my individual toots? Probably, as long as they were padded with random data up to, idk, what *would* be a sensible minimum?

So there would probably be a fair bit of network overhead for security.
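The pad-to-a-minimum idea can be sketched as a fixed-size "bucket" scheme; the 4 KB bucket here is an arbitrary assumption on my part, and this shows only the padding, not the encryption around it:

```python
import os

BUCKET = 4096  # hypothetical fixed on-the-wire message size

def pad(message: bytes, bucket: int = BUCKET) -> bytes:
    """Pad to a fixed bucket: 2-byte length prefix + message + random fill.

    Every padded message is exactly `bucket` bytes, so an observer of
    the (encrypted) traffic learns only the bucket size, never the
    true length of the toot.
    """
    if len(message) > bucket - 2:
        raise ValueError("message too long for bucket")
    prefix = len(message).to_bytes(2, "big")
    fill = os.urandom(bucket - 2 - len(message))
    return prefix + message + fill

def unpad(padded: bytes) -> bytes:
    n = int.from_bytes(padded[:2], "big")
    return padded[2:2 + n]

toot = b"door lock: status OK"
wire = pad(toot)
assert len(wire) == BUCKET   # constant size on the network
assert unpad(wire) == toot   # round-trips losslessly
```

The overhead is `bucket - len(message) - 2` bytes per message, which is exactly the network cost being worried about here: a 20-byte status update costs a full 4 KB on the wire.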

Named Data, I think, requires mandatory encryption of all packets. Gonna be interesting to see if that one ever gets off the ground. https://named-data.net/

@natecull We've discussed this already. Yes. Padding.

@dredmorbius That we've discussed something doesn't mean we've rendered it irrelevant.

If, say, a door lock transmitting a one-byte status update incurs, say, a 1-megabyte minimum data cost because of padding, this may have important consequences for the design of a network protocol.

For instance, pervasive crypto may be a force pushing a system towards large units of data granularity rather than small ones, and duplication may be safer *on a network* than structure sharing.

@dredmorbius The other thing that crypto can fight against is caching. I like pervasive caching, but as soon as you turn it on, you can now identify *what* document (or rather, collection of nodes) someone accessed, even if you can't see inside.

Once you start adding diffs or deltas into the data protocol, crypto starts looking increasingly infeasible.

This might turn out to be a major problem.
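Here's a toy demonstration of that tension, using a throwaway XOR stream cipher (illustration only, not remotely secure): with fresh nonces, two nearly identical revisions encrypt to ciphertexts that agree only at chance level, so a byte-level delta between them saves nothing.

```python
import hashlib
import os

def toy_stream_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    """Toy XOR stream cipher: keystream blocks are
    SHA-256(key || nonce || counter). NOT a real cipher."""
    keystream = bytearray()
    counter = 0
    while len(keystream) < len(plaintext):
        keystream += hashlib.sha256(
            key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(p ^ k for p, k in zip(plaintext, keystream))

key = os.urandom(16)
rev1 = b"The quick brown fox jumps over the lazy dog. " * 50
rev2 = rev1.replace(b"lazy", b"busy", 1)   # a one-word edit

# Fresh nonce per message, as any sane protocol requires.
ct1 = toy_stream_encrypt(key, os.urandom(8), rev1)
ct2 = toy_stream_encrypt(key, os.urandom(8), rev2)

# The plaintext revisions differ in only 3 byte positions...
same_plain = sum(a == b for a, b in zip(rev1, rev2))
# ...but the ciphertexts match only at chance level (~1 in 256),
# so a delta between them is as large as the whole document.
same_cipher = sum(a == b for a, b in zip(ct1, ct2))
print(same_plain, same_cipher, len(rev1))
```

So a protocol that ships encrypted blobs can't benefit from deltas; and if it encrypted each small node separately so deltas *were* computable, it would leak the edit structure instead. That's the dilemma.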

@natecull On caching: I need more clarity on what you're thinking here, methods, mechanisms, threat models.

Variously:

* The entire cache store might be encrypted.
* There's the option of distributed cleartext caches. Roughly how today's Internet / Web works.
* In a distributed cache: indirection

There's also the question of cache /integrity/.