@dredmorbius On the subject of "versioned documents" (as in, Wiki or source-control type systems that publish all previous editions of a document or a set of files, annotated with discussion)

1. I'm still not sure that this is a completely new thing in history. Religious and legal communities (in ancient times, the same groups) have had extensive traditions of "texts, and commentaries on the texts" for millennia. Eg Judaism's Midrash https://en.m.wikipedia.org/wiki/Midrash

@dredmorbius

2. To the extent that electronic publishing is new, the newness resides in a matter of scale, as you say.

But a particular complication that the "versioned document" (or as I'd rather describe it, document *archive*) presents is that it's NOT in fact a document (a fixed thing) - it's a PUBLISHER. Or a broadcaster.

The point being that you can take Wikipedia-the-publisher offline and lose a large part of what makes it different from ordinary documents.

@dredmorbius Git trees are theoretically more robust to central attacks, as theoretically anyone can pull and push from anywhere. In theory.

In practice, though, everyone uses GitHub, which has taken on the "publisher" role for a format that was designed not to require or tend towards central publishers. Which seems a bit of a problem to me.

@natecull The tendency to centralise is problematic. A multi-hubbed Git would help.

@dredmorbius The thing is that git as a protocol doesn't have the concept of 'hub' at all - every user is their own hub. At least as far as I understand git, which is not very much.

(Sidebar: Why is git so complicated to understand? Shouldn't 'apply change to document' be something roughly as understandable as 'add two integers'? If not, why not?)

So git should not have led to Github, but it did. Much like Bitcoin should not have centralised processing in China, but it did.
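(On the sidebar above: the core operation really is about that simple. Here's a minimal Python sketch of "apply change to document" as a single replace-a-span primitive - illustrative only, not git's actual patch format; git's difficulty comes from history, merging, and distribution layered on top of this, not from the primitive itself.)

```python
def apply_patch(text, start, old, new):
    """Replace `old` with `new` at offset `start`, refusing a stale patch."""
    if text[start:start + len(old)] != old:
        raise ValueError("patch does not apply: document has diverged")
    return text[:start] + new + text[start + len(old):]

doc = "the quick brown fox"
doc = apply_patch(doc, 4, "quick", "slow")
print(doc)  # the slow brown fox
```

The "has diverged" check is the seed of all the complexity: once two people patch the same base text, you need merging, and merging needs history.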

@natecull Git doesn't have the notion of a hub, but the concept of curating, finding, searching, and sharing content tends strongly toward that. See what I wrote earlier about large libraries and search costs.

@natecull Scale *MATTERS*.

A jet airplane is an oxcart. At scale.

A hydrogen bomb is Greek Fire. At scale.

Amazon is the corner hardware store. At scale.

Yes, Wikipedia is, in one sense of the word, a publisher. But it's a publisher like none other in the history of the world, in terms of:

* Who can participate.
* Who can access.
* How much is written.
* On what topics.
* With what update frequency.

Earlier encyclopedias might be updated every few _decades_.

@natecull How many such works? How many communities? How many authors? With what update frequency? What degree of access?

The total number of /books/ (not individual titles) in Europe as of 1800 was < 1 billion (~970 million or so). Call it, at 5 MB per book, 5 petabytes.

All of Wikipedia is 6 TB. So 1,000 copies of Wikipedia would exceed /all printed knowledge in Europe as of 1800/.

And that's updatable. By pretty much anyone, anywhere, at any time.
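(The arithmetic above checks out; a quick sanity-check, using the thread's own estimates - ~970 million books, 5 MB per book, 6 TB for Wikipedia - all of which are assumptions, not measured figures:)

```python
# Back-of-envelope check of the figures in this thread.
books_1800 = 970_000_000            # ~970 million books in Europe, 1800
bytes_per_book = 5 * 10**6          # 5 MB per book
europe_1800 = books_1800 * bytes_per_book
wikipedia = 6 * 10**12              # 6 TB for all of Wikipedia

print(europe_1800 / 10**15)            # 4.85 -- about 5 petabytes
print(1000 * wikipedia > europe_1800)  # True: 1,000 Wikipedias exceed it
```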

@natecull There were serial and annual publications. There's a Library of Congress classification for these: yearbooks. Which gives you a sense of the update frequency.

What evidence would change your mind on this?

@natecull Encyclopedia Britannica, /through all history/, (1768 - present, 249 years) has had only 15 editions (though the 15th, and current, sees annual revisions).

It's simply not comparable.

https://en.m.wikipedia.org/wiki/Encyclop%C3%A6dia_Britannica#History

@natecull Wikipedia: 5.4 million English articles, 40 million overall, 500 million unique monthly readers, 18 billion pageviews, 40k high-quality articles (about the same as Britannica's total), 68m registered users, 600k active (I'm presuming "editors" here), 3,500 editors with >100 edits/mo.

I challenge you to match this with /any/ other published work, particularly over the timescale (16 years).

https://en.m.wikipedia.org/wiki/Wikipedia

@dredmorbius I'm not entirely disagreeing with you.

Scale *does* matter.

But I think perhaps you're confusing two separate things here - at least your initial argument did, when you ascribed the consequences of one thing (mass public collaboration enabled by electronic communication) to a specific *form* of communication ("versioned document").

I argue that the second has in fact been among us for millennia.

It's *electronic computers* which have enabled us to scale this up.

@dredmorbius I mean, sure, if you choose to define "versioned document" literally AS "Wikipedia, with its huge number of articles and editors and readers".... then yes, I suppose you could combine scale and versioning into one thing.

But there are many wikis - even Ward's Wiki, the one that *invented* the concept - that are much smaller and did not scale as Wikipedia did.

And there are many web-scale comms systems (Facebook, Amazon) that aren't especially Wiki-like.

@natecull I hope it's abundantly clear that I am not /equating/ Wikipedia to a versioned document.

But it is an /exemplar/ (and almost certainly the prime one) of the class.

What did Diderot do? Was that or was that not notable?

http://historyofinformation.com/expanded.php?id=2876

@dredmorbius Wikipedia is *an* exemplar, yes. But I'm sure you know that it's hardly the "prime" example, because it's not the first. This is: http://wiki.c2.com/?WardsWiki

@dredmorbius So: to me this shows that Wikipedia's scale *is not directly the result of it being a Wiki* but comes from some other shared goals of the community that created it.

@dredmorbius Eg, one of the things that made Wikipedia work was a deliberate commitment to "being an encyclopedia", which narrowed its scope, gave it an immediate useful purpose (which precursors like C2, H2G2, and Everything2 didn't have), and allowed for community judgements on what was or wasn't "in the house style" and "notable".

If we compare Wikipedia to its non-wiki precursors and rivals (whose names I've forgotten), yes the open-editing concept enabled scale...

@dredmorbius ... and versioning (and particularly, the ability to scan the history looking for vandalism, and instantly recover vandalised pages) is *one* of the tools that enabled open editing to scale to an increasingly hostile web.

It wasn't the only thing that safeguarded Wikipedia, though. I think having lots of bots monitoring it, having a foundation backing it, and the support of Google also helped navigate the transition to scale.
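(The recovery mechanism described here can be sketched as an append-only revision history, where a revert is just another revision. Names below are hypothetical, not MediaWiki's actual schema:)

```python
class Page:
    """Toy model of a wiki page with full revision history."""

    def __init__(self, text):
        self.history = [text]       # every saved revision, oldest first

    def edit(self, text):
        self.history.append(text)

    def current(self):
        return self.history[-1]

    def revert_to(self, index):
        # Reverting appends an old revision as the newest one;
        # nothing is ever destroyed, so vandalism stays on record.
        self.edit(self.history[index])

page = Page("Cats are mammals.")
page.edit("CATS ARE LIZARDS lol")   # vandalism
page.revert_to(0)                   # instant recovery
print(page.current())               # Cats are mammals.
print(len(page.history))            # 3 -- the vandalism remains visible
```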

@natecull There are two ways to approach critical success.

One is to look at what /contributed to success/.

Another is looking at what /didn't get the fuck in the way of it/.

In /any/ mass-adoption phenomenon, there's going to be a pretty significant degree of blind luck and path dependency. But not getting in your own damned way is also hugely useful.

There are plenty of failures to consider -- most of them involve some degree of self-sabotage.

@dredmorbius Right.

I think what I'm trying to say is that the concept you keep calling 'versioning' is actually 'massive easy-access collaboration'.

With 'versioning', or some other form of safeguarding against bad changes, being one of several necessary *but not sufficient* enabling techniques.

Because versioning has been around for millennia, but massive public collaboration hasn't.

@natecull How would you organise Wikipedia /without/ basing it on a version-control system?

What would that do to the project?

https://en.m.wikipedia.org/wiki/Nupedia

@dredmorbius Yes, I've stated several times that versioning is *one* of the things that allowed Wikipedia.

But so is electricity.

@natecull Electricity has been part of publishing since the late 19th century. Electronic data systems have existed since the 1950s. Personal and business-use systems since the 1970s. Networked since the 1980s. Public internet, since the 1990s.

And yet until 2001, putting together the Web, version control, wiki markup, and hypertext, /and/ the appropriate social context and goal, you didn't have a Wikipedia.

@natecull You didn't have anything /close/ in scale to Wikipedia.

Yes, there were precursors, but ... how to say? Nothing with the social and cultural significance.

Hell, I've been reading a 2006 systems admin guide which still feels /it/ has to make the case for using Wikis /in a technical, professional workplace/.

(O'Reilly, "Time Management for Systems Administrators".)

@dredmorbius Sure! All of these are small incremental technological changes that together led to something big.

But Wikipedia isn't the only large Internet-age data project, is it?

What about the Internet Archive? Facebook? Amazon? Google itself?

What about data stored on web forums, and used to coordinate large projects?

Yes, I think versioned documents are one useful tool, but so too are threaded discussions and many others.

@dredmorbius

And in fact I think MediaWiki, Wikipedia's engine, does quite a poor job of even versioning - eg, if you delete a page it's just gone - and it also doesn't do threaded conversations AT ALL. People cope, but they'd cope better I think if they had threads available as well.

Wikipedia's also moving into Linked Data, which I think is maybe as significant as wiki pages, maybe more so.

@dredmorbius So what I'd like - what I've wanted for over ten years now - is a Weblike environment that offers:

* versioning
* linking
* threading
* realtime notification
* authentication and access control
* semantic data
* import and export
* search

all in one unified system. So we don't have to, eg, fire up a CMS, a Wiki, a web forum, a social media or chat channel, a database, and a source control system, just to manage one project.

@dredmorbius This set of requirements goes *beyond* just 'versioning', is what I'm saying.

Maybe it's just my particular view. Certainly storing all past versions, and having some kind of diff format (which, by the way, Wikipedia DOESN'T have but git DOES, so they're a fair bit different in this regard) is one of the important steps forward, as it's a great buffer against accidents and arguments and other social forces that can derail a large project.
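(The diff idea can be illustrated with Python's standard library: a unified diff is a compact record of exactly what changed between two revisions, the kind of artifact git exchanges:)

```python
import difflib

# Two revisions of a page.
old = ["Cats are mammals.\n", "They purr.\n"]
new = ["Cats are mammals.\n", "They purr loudly.\n"]

for line in difflib.unified_diff(old, new, fromfile="rev1", tofile="rev2"):
    print(line, end="")
```

This prints the familiar `---`/`+++`/`@@` hunk format, with `-` and `+` lines for the removed and added text.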

@natecull So, Wikipedia /uses/ diffs but doesn't /expose/ them.

There's a blogging tool on Github which goes all the way there. Your blog /is/ a Git repo.

@natecull Instead of "it is! it isn't!", how about a compelling list of elements that you think /are/ salient to Wikipedia, or even a Wikipedia successor?

I keep adding it up and finding version control (and history) as a critical path.

@dredmorbius Hmm, good idea.

Required:
* widespread access to the Web (so not possible before about 2000)
* an Internet not yet jaded by troll culture to the point of turning off anonymous edits (so maybe not after 2015)
* a useful existing model to target (eg, Encyclopedia Britannica / Nupedia)
* free content to scrape and copy (eg CIA World Factbook, out-of-copyright encyclopedias)
* Creative Commons licensing
* Linux and IETF as social organising models
* Wiki as a technology

@natecull A huge part of this is that /social/ projects require /social/ organisation, /not/ just technology.

The good news is that social organisation has a strong tendency to emerge, despite best efforts to suppress it.

A key part of Wikipedia's success (and a large source of its frictions) has been that social component. Tuning and dialing in on that has been critical.

@dredmorbius What could be required to advance the concept further:

* An equivalent to Creative Commons that covers fine-grained data
* fully distributed, fine-grained version control and authentication that works with data nodes and is much easier to use than git
* a peer-to-peer non-Web hosting technology that allows people to clone repos and share from mobile devices
* a programming language that's safe, portable, pure-functional/declarative, and fine-grained
* a GUI for it

@dredmorbius and unfortunately a lot of this, if implemented correctly (ie, with cryptography for authentication), will hard-fail against our current Western government regimes, which have outlawed all crypto and communications they can't control.

(This is also where my earlier qualms about how crypto will work with fine-grained multi-repository data come in. It may be that we *can't* use crypto safely in such a system, only one with minimum-sized documents.)

@dredmorbius Eg: could I encrypt my individual toots? Probably, as long as they were padded with random data up to, idk, what *would* be a sensible minimum?

So there would probably be a fair bit of network overhead for security.

Named Data, I think, requires mandatory encryption of all packets. Gonna be interesting to see if that one ever gets off the ground. https://named-data.net/
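(The toot-padding idea can be sketched like this. The bucket size and the length-prefix framing are assumptions for illustration, not any real protocol, and a production scheme would put the length inside the encrypted envelope:)

```python
import os

PAD_TO = 4096   # hypothetical minimum message size, in bytes

def pad(message: bytes) -> bytes:
    """Pad a message to a fixed size so its length leaks nothing."""
    if len(message) > PAD_TO - 4:
        raise ValueError("message too large for this bucket")
    header = len(message).to_bytes(4, "big")      # real length, 4 bytes
    filler = os.urandom(PAD_TO - 4 - len(message))  # random padding
    return header + message + filler

def unpad(padded: bytes) -> bytes:
    n = int.from_bytes(padded[:4], "big")
    return padded[4:4 + n]

toot = "just setting up my fedi".encode()
assert unpad(pad(toot)) == toot
print(len(pad(toot)))   # 4096, whatever the toot's real length
```

Every message costs 4096 bytes on the wire regardless of content - which is exactly the network overhead being discussed.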

@dredmorbius
Ted Nelson's Xanadu vision, I think, included some kind of DRM built in, so that copyrighted data could be marked and then you'd automatically get billed each time you viewed it.

I don't like that. But we're going to end up with a corporate net and a hacker net probably because of things like copyright and anti-crypto laws.

(Xanadu was also patented; one of the reasons why I've been extremely un-motivated to study it closely.)

@dredmorbius It would be an interesting experiment, actually, to see if version control *is* like the information equivalent of fire: an enabling tool for large breakthroughs.

It might be! Like, I keep orbiting around 'pure functions and immutable data' but that is essentially the same thing, isn't it?

If we had an OS that was pervasively version-controlled in everything it did, from transactions to files to apps...

Trick is how to get 1) privacy and 2) performance.

@natecull @dredmorbius Xanadu’s patents may be an important reason why it never really took off.

That said, would whoever borrowed my copy of _Computer Lib / Dream Machines_ years ago please return it?

@natecull My view is that current copyright is fundamentally incompatible with free-flow of information. There's a substantial literature on this.
@natecull We've discussed this already. Yes. Padding.

@dredmorbius That we've discussed something doesn't mean we've rendered it irrelevant.

If, say, a door lock transmitting a one-byte status update requires, say, a 1-megabyte minimum data cost because of padding - this may have important consequences for the design of a network protocol.

For instance, pervasive crypto may be a force on a system tending towards large units of data granularity rather than small, and duplication may be safer *on a network* than structure sharing.
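(The door-lock hypothetical, in numbers - both figures are the thread's hypotheticals, not any real protocol's:)

```python
payload = 1            # one-byte door-lock status update
padded = 1 * 2**20     # hypothetical 1 MiB minimum message size

print(padded // payload)              # 1048576 -- a million-fold overhead
print(padded * 60 * 24 / 2**30)       # ~1.41 GiB/day at one update/minute
```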

@natecull KFC + webfs.

Keep in mind that the tendency to hubs is in part what gave us what we've got. Market towns (or booksellers' shops, or libraries) have value. Signifiers of reputation and value, /as well as/ discoverability.

If HTML + HTTP had a few more features, we could eliminate much:

* A native ID assertion capability (/with/ the ability to log out goddamnit!).
* Threading as a native construct.
* Dynamic content control: sorting, expanding, collapsing threads.

@natecull Also: tables. Why HTML has tables *but no native capability to work with tabular content* is something ... I don't understand.

* Notifications. RSS/Atom goes a long way to providing that.

I'd also like to see independence between /content/ and /framing/. Site-specific crud. In many cases I'd like to see the concept of a /site/ fall away virtually completely.

Not sure where you're going w/ import/export.

@dredmorbius Agree on all of this too.

The import/export thing is about how to fork or merge a project, really.

So many web technologies (including wikis!) use SQL RDBMSes to store their stuff that it's quite problematic to transfer content from one site to another.

If we didn't have a concept of a 'site' and if 'files' and 'tables' were all instances of collections of the same data (with versioning, yes) and you could fork and revert sites as easily as files, that'd be useful.

@natecull The real sticker on stuff like this is probably a combination of rights and finance.

* Rights: ensuring that some asshat can't shut you down with legal threats.

* Finance: finding /either/ a way to make the activity profitable, or make it an adjunct of a beneficial activity, such that it's self-sustaining.

The flipside of rights is that if a bunch of asshats try to make others' lives miserable through publishing stuff, well, that might not be so good either.

@dredmorbius Yep! Agree with all of this.

One thing I notice with systems like Mastodon is how the natural unit of text is NOT a 'page' but a sub-page unit.

In fact the tendency is to use HTML's 'page' just as a modern VT-100 talking to a modern mainframe.

It might be nice if HTTP/HTML naturally had some kind of 'collection of sub-page units' ability.

@natecull On threaded discussions, @woozle keeps trying to convince me to use various Wiki tools on his constellation of sites. Which ... pretty much always leave me 1) confused and 2) screaming from styling conventions.

I'm fairly happy with how my re-styled subreddit appears, generally. Though I'm still partial to Scoop's dynamic thread expansion/collapse. @rustyk5 FTW.

@dredmorbius (cc: @natecull @rustyk5 ) You need to *tell* me if there are issues.

MediaWiki can be restyled either on a per-user basis or globally by editing a page.

I'm not sure what to do about the

@rustyk5 @natecull @dredmorbius

(somehow my toot didn't get finished before sending)

[I'm not sure what to do about the] confusion; I guess I need to know how it's confusing.

@woozle @rustyk5 @natecull I've had an occasional tootaculation problem. I generally copy the bits I wanted, delete the original, and re-post, if I catch that immediately.

Life /is/ edits.

@dredmorbius @rustyk5 @natecull If I'd caught it quickly, I woulda.

Didn't see it until about half an hour later (according to timestamps, which matches my untrustworthy memory of my untrustworthy time-sense).

@natecull @dredmorbius Actually, deleted pages are visible to admins and can be recovered with a couple of clicks.

Agreed on threaded convos -- with two caveats:

1. Look on the talk page for any popular article, and you'll see threaded discussions with essentially hand-made threads.

2. I use an extension called LiquidThreads for threaded discussions. There are issues with the UI, but the underlying design seems sound. There's another I haven't tried yet.

@woozle @dredmorbius

Yes, 'essentially hand-made threads' is what I'm referring to when I say that MediaWiki doesn't have *actual* threads.

People sort of cope doing it manually instead of having the real thing, yes. But it would be nice to not set our goals at 'sort of coping around the edges of our technology'.

Not a gripe at MediaWiki especially, but that a thing doesn't yet exist which combines Wiki/Threads/Blogs/Toots into a single tech.

@dredmorbius @woozle Ah, I wasn't aware that there was a separate admin-only interface for deleted pages.

That just kind of highlights my point, though: *why* are pages in a wiki (any wiki) treated so differently than normal edits? Why can anyone create a page, but only admins can delete/undelete? Why not just version *page create/delete events themselves*?

Cf a source control system, where file creates and deletes *are* just ordinary changes. At least I assume they are.
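(In git, file additions and removals are indeed just ordinary commit content. Here's a sketch of the proposal above - page create/delete as ordinary revision events in one replayable history. This is a hypothetical design, not MediaWiki's actual model:)

```python
CREATE, EDIT, DELETE = "create", "edit", "delete"

def replay(events):
    """Rebuild the current wiki state from its full event history."""
    pages = {}
    for kind, title, text in events:
        if kind == DELETE:
            pages.pop(title, None)
        else:                   # CREATE and EDIT have the same shape
            pages[title] = text
    return pages

history = [
    (CREATE, "Cats", "Cats are mammals."),
    (DELETE, "Cats", None),
]
assert "Cats" not in replay(history)

# "Undelete" is just appending the last pre-delete revision again --
# an ordinary edit anyone could make, no admin-only machinery needed.
history.append((CREATE, "Cats", "Cats are mammals."))
assert replay(history) == {"Cats": "Cats are mammals."}
```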

@woozle @dredmorbius I do agree though that MediaWiki's 'Talk' page was a huge step forward and (along with authenticated users) is a large part of what was missing in Ward Cunningham's original Wiki concept and early similar systems.

Taking the meta-discussions out of the page itself very much improved the quality of the product. As, I think, did the 'encyclopedia' concept, which focused a community around *being* a product (which, eg, H2G2 and Everything2 didn't have).

@natecull @dredmorbius All that said, I bumped into some design-level MediaWiki limitations a few years ago and have been putting my wiki-design-improvement efforts into writing a new set of tools for managing content.