
2024-03-15 14:52:38 +08:00
<div id="readability-page-1" class="page">
<section>
<p> I saw <a href="https://www.youtube.com/watch?v=x7drE24geUw">Martin Kleppmann's talk</a> a few weeks ago about his approach to realtime editing with <a href="https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type">CRDTs</a>, and I felt a deep sense of despair. Maybe all the work I've been doing for the past decade won't be part of the future after all, because Martin's work will supersede it. It's really good. </p>
<p> Let's back up a little. </p>
<p> Around 2010 I worked on Google Wave. Wave was an attempt to make collaboratively editable spaces to replace email, Google Docs, web forums, instant messaging and a hundred other small single purpose applications. Wave had a property I love in my tools that I haven't seen articulated anywhere: It was a general purpose medium (like paper). Unlike a lot of other tools, it doesn't force you into its own workflow. You could use it to do anything from plan holidays, make a wiki, play D&amp;D with your friends, schedule a meeting, etc. </p>
<p> Internally, Wave's collaborative editing was built on top of Operational Transform (OT). OT has been around for a while - the algorithm we used was based on the original <a href="https://www.google.com/url?sa=t&amp;rct=j&amp;q=&amp;esrc=s&amp;source=web&amp;cd=&amp;ved=2ahUKEwi3mr6CivnrAhXEfd4KHcAyBe4QFjAAegQIBBAB&amp;url=http%3A%2F%2Flively-kernel.org%2Frepository%2Fwebwerkstatt%2Fprojects%2FCollaboration%2Fpaper%2FJupiter.pdf&amp;usg=AOvVaw0HmIhcn7_VKk2h1bEeAOJS">Jupiter paper</a> from 1995. It works by storing, for each document, a chronological list of every change. “Type an <em>H</em> at position 0”. “Type an <em>i</em> at position 1”. Etc. Most of the time, users are editing the latest version of the document and the operation log is just a list of all the changes. But if users are collaboratively editing, we get concurrent edits. When this happens, the first edit to arrive at the server gets recorded as usual. If the second edit is out of date, we use the log of operations as a reference to figure out what the user really intended. (Usually this just means updating character positions.) Then we pretend that's what the user meant all along and append the new (edited) operation. It's like realtime git-rebase. </p>
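<p> To make "updating character positions" concrete, here is a minimal sketch of transforming one insert against another. It is not Wave's or Jupiter's actual code: the <code>Insert</code> type, the <code>transform</code> and <code>apply</code> names, and the tie-breaking rule (the edit the server saw first stays on the left) are all assumptions made for illustration, and deletes are left out entirely. </p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int   # character offset the author saw when typing
    text: str  # characters to insert at that offset


def transform(op: Insert, applied: Insert) -> Insert:
    """Rewrite `op` so it means the same thing after `applied` has already run."""
    if applied.pos <= op.pos:
        # Text landed at or before our position, so our target slid right.
        # (Assumed tie-break: the edit that arrived first stays on the left.)
        return Insert(op.pos + len(applied.text), op.text)
    return op


def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


doc = "Hi"
first = Insert(0, "Oh! ")      # reaches the server first
second = Insert(2, " there")   # written concurrently, against "Hi"

doc = apply(doc, first)                      # "Oh! Hi"
doc = apply(doc, transform(second, first))   # position 2 becomes position 6
print(doc)  # Oh! Hi there
```

<p> Without the transform, the second edit would land at position 2 of the already-changed document and split "Oh! " in half. </p>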
<p> Once Wave died, I reimplemented the OT model in <a href="https://github.com/josephg/sharejs">ShareJS</a>. This was back when node was new and weird. I think I had ShareJS working before npm launched. It only took about 1000 lines of code to get a simple collaborative editor working, and when I first demoed it I collaboratively edited a document in a browser and from a native application. </p>
<p> At its heart, OT is a <a href="https://github.com/share/sharedb/blob/c711cfcb777213d193b1f4a101125e8f6e8e6864/lib/submit-request.js#L194-L212">glorified for() loop</a> with <a href="https://github.com/ottypes/text-unicode/blob/bdcfc545c1a2eda48fe5968ae2ce80cf743b9c08/lib/type.ts#L304-L380">some helper functions</a> to update character offsets. In practice, this works great. OT is simple and understandable. Implementations are fast. (10k-100k operations per second in unoptimized JavaScript. 1-20M ops/sec in <a href="https://github.com/ottypes/libot">optimized C</a>.) The only storage overhead is the operation log, and you can trim that down if you want to. (Though you can't merge super old edits if you do.) You need a centralized server to globally order operations, but most systems have a centralized server / database anyway, right? </p>
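<p> Roughly, the for loop looks like the sketch below. This is not ShareDB's code; the <code>Server</code> class, its <code>submit</code> method and the insert-only operation format are hypothetical, chosen to show the shape of the thing: the server keeps the log, and any op that arrives out of date is walked forward past every op it missed. </p>

```python
class Server:
    """Hypothetical single-document OT server that only understands inserts."""

    def __init__(self):
        self.doc = ""
        self.log = []  # every accepted (pos, text) insert, in server order

    def submit(self, pos, text, base_version):
        """Accept an insert written against `base_version`; return the new version."""
        # The "glorified for loop": catch the op up on everything it has not seen.
        for seen_pos, seen_text in self.log[base_version:]:
            if seen_pos <= pos:
                pos += len(seen_text)  # the helper that updates character offsets
        self.doc = self.doc[:pos] + text + self.doc[pos:]
        self.log.append((pos, text))
        return len(self.log)


server = Server()
v = server.submit(0, "Hello", base_version=0)  # client A, up to date
server.submit(5, " world", base_version=v)     # client A again
server.submit(5, "!", base_version=v)          # client B, one op behind
print(server.doc)  # Hello world!
```

<p> Trimming the log means throwing away the front of <code>self.log</code>, which is why a client whose <code>base_version</code> is older than the trim point can no longer be merged. </p>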
<h2 id="centralizedservers"> Centralized servers </h2>
<p> The big problem with OT is that dependency on a centralized server. Have you ever wondered why Google Docs shows you that weird “This document is overloaded so editing is disabled” thing when a document is shared to social media? The reason (I think) is that when you open a Google Doc, one server is picked as the computer all the edits run through. When the mob descends, Google needs to pull out a bunch of tricks so that computer doesn't become overwhelmed. </p>
<p> There are some workarounds they could use to fix this. Aside from sharding by document (like Google Docs), you could edit via a retry loop around a database transaction. This pushes the serialization problem to your database. (<a href="https://firepad.io/">Firepad</a> and <a href="https://github.com/share/sharedb/">ShareDB</a> work this way). </p>
<p> It's not perfect though. We wanted Wave to replace email. Email is federated. An email thread can span multiple companies and it all just works. And unlike Facebook Messenger, emails are only sent to the companies that are CC'ed. If I email my coworker, my email doesn't leave the building. For Wave to replace email, we needed the same functionality. But how can that work on top of OT? We got it working, kinda, but it was complex and buggy. We ended up with <a href="https://web.archive.org/web/20180112171345/http://www.waveprotocol.org/protocol/draft-protocol-specs/draft-protocol-spec">a scheme</a> where every wave would arrange a tree of wave servers and operations were passed up and down the tree. But it never really worked. <a href="https://www.youtube.com/watch?v=AyvQYCv6j34">I gave a talk</a> at the Wave Protocol Summit just shy of 10 years ago explaining how to get on the network. I practiced that talk, and did a full runthrough. I followed it literally step by step on the day and the version I made live didn't work. I still have no idea why. Whatever the bugs are, I don't think they were ever fixed in the opensource version. It's all just too complicated. </p>
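<p> A retry loop of that kind might look something like this. It is a sketch under loud assumptions: <code>FakeStore</code> is an in-memory stand-in for a database with an atomic "append only if the version still matches" write, and <code>submit_with_retry</code> is a made-up name, not Firepad's or ShareDB's API. </p>

```python
class VersionConflict(Exception):
    pass


class FakeStore:
    """In-memory stand-in (an assumption) for a database with compare-and-append."""

    def __init__(self):
        self.ops = []  # committed (pos, text) inserts; list index == version

    def ops_since(self, version):
        return self.ops[version:]

    def commit(self, expected_version, op):
        # A real database would do this check and write inside one transaction.
        if expected_version != len(self.ops):
            raise VersionConflict
        self.ops.append(op)


def submit_with_retry(store, op, base_version, max_attempts=10):
    pos, text = op
    for _ in range(max_attempts):
        missed = store.ops_since(base_version)
        for seen_pos, seen_text in missed:
            if seen_pos <= pos:
                pos += len(seen_text)  # transform past ops we had not seen
        base_version += len(missed)
        try:
            store.commit(base_version, (pos, text))
            return base_version + 1
        except VersionConflict:
            continue  # someone else won the race: fetch, transform, try again
    raise RuntimeError("gave up after too many conflicts")


store = FakeStore()
submit_with_retry(store, (0, "ab"), base_version=0)
submit_with_retry(store, (0, "X"), base_version=0)  # stale client
print(store.ops)  # [(0, 'ab'), (2, 'X')]
```

<p> Any number of stateless frontends can run this loop; the only place operations get put in order is the database's commit. </p>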
<h2 id="theriseofcrdts"> The rise of CRDTs </h2>
<p> Remember, the algorithm Wave used was invented in 1995. That's a pretty long time ago. I don't think I even had the internet at home back in 1995. Since then, researchers have been busy trying to make OT work better. The most promising work uses CRDTs (Conflict-free Replicated Data Types). CRDTs approach the problem slightly differently to allow realtime editing without needing a central source of truth. Martin lays out how they work in his talk better than I can, so I'll skip the details. </p>
<p> People have been asking me what I think of them for many years, and my answer was always something like this: </p>
<blockquote>
<p> They're neat and I'm glad people are working on them <em>but</em>: </p>
</blockquote>
<ul>
<li>They're slow. Like, really slow. E.g. Delta-CRDTs takes nearly 6 hours to process a real world editing session with a single user typing a 100KB academic paper. (<a href="https://github.com/dmonad/crdt-benchmarks/tree/d7f4d774a302f13f26cded6e614d44e0b5e496c9">Benchmarks - look for B4</a>.) </li>
<li>Because of how CRDTs work, documents grow without bound. The current automerge master takes 83MB to represent that 100KB document on disk. Can you ever delete that data? Probably not. And that data can't just sit on disk. It needs to be loaded into memory to handle edits. (Automerge currently grows to 1.1GB in memory for that.) </li>
<li>CRDTs are missing features that OT has had for years. For example, nobody has yet made a CRDT that supports <em>object move</em> (move something from one part of a JSON tree to another). You need this for applications like Workflowy. OT <a href="https://github.com/ottypes/json1/">handles this fine</a>. </li>
<li>CRDTs are complicated and hard to reason about. </li>
<li>You probably have a centralized server / database anyway. </li>
</ul>
<p> I made all those criticisms and dismissed CRDTs. But in doing so I stopped keeping track of the literature. And - surprise! CRDTs went and quietly got better. <a href="https://www.youtube.com/watch?v=x7drE24geUw">Martin's talk</a> (which is well worth a watch) addressed the main points: </p>
<ul>
<li>
<strong>Speed:</strong> Using modern CRDTs (Automerge / RGA or Y.js / YATA), applying operations should be possible with just a log(n) lookup. (More on this below).
</li>
<li>
<strong>Size:</strong> Martin's columnar encoding can store a text document with only about a 1.5x-2x size overhead compared to the contents themselves. Martin talks about this <a href="https://youtu.be/x7drE24geUw?t=3273">54 minutes into his talk</a>. The code to make this work in automerge hasn't merged yet, but Yjs implemented Martin's ideas. And in doing so, Yjs can store that same 100KB document in 160KB on disk, or 3MB in memory. Much better.
</li>
<li>
<strong>Features:</strong> There's at least a theoretical way to add all the features using rewinding and replaying, though nobody's implemented this stuff yet.
</li>
<li>
<strong>Complexity:</strong> I think a decent CRDT will be bigger than the equivalent OT implementation, but not by much. Martin managed to make a tiny, slow <a href="https://github.com/automerge/automerge/blob/a8d8b602ec273aaa48679e251de8829f3ce5ad41/test/fuzz_test.js">implementation of automerge in only about 100 lines of code</a>.
</li>
</ul>
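<p> To give a feel for how small a (slow) list CRDT can be, here is a sketch in the RGA style. It is not Automerge's or Yjs's code. The <code>Replica</code> class and its op format are invented for illustration, it assumes ops are delivered in causal order (an insert never arrives before its parent), it has no deletes, and it finds positions with a linear scan rather than the log(n) lookup mentioned above. </p>

```python
class Replica:
    """Insert-only, RGA-style text CRDT sketch. Assumes causal delivery of ops."""

    def __init__(self, agent):
        self.agent = agent
        self.clock = 0   # Lamport clock: always ahead of every id seen so far
        self.items = []  # (id, char) in document order; id = (counter, agent)

    def text(self):
        return "".join(ch for _, ch in self.items)

    def local_insert(self, index, ch):
        """Insert `ch` at `index` locally and return the op to send to peers."""
        self.clock += 1
        parent = self.items[index - 1][0] if index > 0 else None
        op = ((self.clock, self.agent), parent, ch)
        self.apply(op)
        return op

    def apply(self, op):
        new_id, parent, ch = op
        self.clock = max(self.clock, new_id[0])
        i = 0 if parent is None else self._index_of(parent) + 1
        # Concurrent inserts after the same parent: larger ids go first,
        # so skip right past anything that outranks the new item.
        while i < len(self.items) and self.items[i][0] > new_id:
            i += 1
        self.items.insert(i, (new_id, ch))

    def _index_of(self, item_id):
        for i, (existing_id, _) in enumerate(self.items):
            if existing_id == item_id:
                return i
        raise KeyError(item_id)


a, b = Replica("A"), Replica("B")
op_h = a.local_insert(0, "H")
op_i = a.local_insert(1, "i")
op_bang = b.local_insert(0, "!")  # concurrent with both of A's edits

a.apply(op_bang)  # each side now receives the other's ops,
b.apply(op_h)     # in a different order...
b.apply(op_i)
print(a.text(), b.text())  # ...and both end up with: !Hi !Hi
```

<p> There is no server and no transform step. Every op names the item it was inserted after instead of a numeric position, which is also why the metadata has to stick around. </p>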
<p> I still wasn't completely convinced by the speed argument, so I made a <a href="https://github.com/josephg/text-crdt-rust">simple proof of concept CRDT implementation in Rust</a>, built on a B-tree using ideas from automerge, and benchmarked it. It's missing features (deleting characters, conflicts). But it can handle <a href="https://home.seph.codes/public/crdt1/user%20pair%20append%20end/report/index.html">6 million edits per second</a>. (Each <a href="https://github.com/josephg/text-crdt-rust/blob/cc3325019887ad03e89f27e26b4295d1fb2048c9/benches/benchmark.rs#L29-L42">iteration</a> does 2000 edits to an empty document by an alternating pair of users, and that takes 330µs. So, 6.06 million inserts / second.) So that means we've made CRDTs good enough that the difference in speed between CRDTs and OT is smaller than the speed difference between Rust and JavaScript. </p>
<p> All these improvements have been “coming soon” in automerge's performance branch for a really long time now. But automerge isn't the only decent CRDT out there. <a href="https://github.com/yjs/yjs">Y.js</a> works well and kicks the pants off automerge's current implementation <a href="https://github.com/dmonad/crdt-benchmarks">in the Y.js benchmarks</a>. It's missing some features I want, but it's generally easier to fix an implementation than invent a new algorithm. </p>
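<p> The arithmetic behind that number, using only the figures quoted in this post (the variable names are mine): </p>

```python
edits_per_iteration = 2000
seconds_per_iteration = 330e-6  # 330µs per benchmark iteration

edits_per_second = edits_per_iteration / seconds_per_iteration
print(f"{edits_per_second / 1e6:.2f} million inserts / second")  # 6.06

# OT throughput ranges quoted earlier in the post, in ops/sec.
ot_javascript = (10e3, 100e3)
ot_optimized_c = (1e6, 20e6)

# The Rust CRDT proof of concept lands inside the optimized C range for OT.
print(ot_optimized_c[0] <= edits_per_second <= ot_optimized_c[1])  # True
```

<p> The two OT ranges are about 100x apart, so the language matters more here than the choice between OT and a CRDT. </p>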
<h2 id="inventingthefuture"> Inventing the future </h2>
<p> I care a lot about inventing the future. What would it be ridiculous not to have in 100 years? Obviously we'll have realtime editing. But I'm no longer convinced OT - and all the work I've done on it - will still be around. I feel really sad about that. </p>
<p> JSON and REST are used everywhere these days. Let's say in 15 years realtime collaborative editing is everywhere. What's the JSON equivalent for realtime editing that anyone can just drop into their project? In the glorious future we'll need high quality CRDT implementations, because OT just won't work for some applications. You couldn't make a realtime version of Git, or a simple remake of Google Wave with OT. But if we have good CRDTs, do we need good OT implementations too? I'm not convinced we do. Every feature OT has can be put into a CRDT. (Including trimming operations, by the way). But the reverse is not true. Smart people disagree with me, but if we had a good, fast CRDT available from every language, with integration on the web, I don't think we need OT at all. </p>
<p> OT's one advantage is that it fits well in centralized software - which is most software today. But distributed algorithms work great in centralized software too. (E.g. look at GitHub). And I think a really high quality CRDT running in wasm would be faster than an OT implementation in JS. And even if you only care about centralized systems, remember - Google runs into scaling problems with Google Docs because of OT's limitations. </p>
<p> So I think it's about time we made a lean and fast CRDT. The academic work has been mostly done. We need more kick-ass implementations. </p>
<h2 id="whatsnext"> What's next </h2>
<p> I increasingly don't care for the world of centralized software.<br /> Software interacts with my data, on my computers. It's about time my software reflected that relationship. I want my laptop and my phone to share my files over my wifi. Not by uploading all my data to servers in another country. Especially if those servers are <a href="https://www.thesocialdilemma.com/">financed by advertisers bidding for my eyeballs</a>. </p>
<p> Philosophically, if I modify a Google Doc my computer is asking Google for <em>permission</em> to edit the file. (You can tell because if Google's servers say no, I lose my changes.) In comparison, if I <code>git push</code> to GitHub, I'm only <em>notifying</em> GitHub about the change to my code. My repository is mine. I own all the bits, and all the hardware that houses them. This is how I want all my software to work. Thanks to people like Martin, we now know <em>how</em> to make good CRDTs. But there's still a lot of code to write before <a href="https://www.inkandswitch.com/local-first.html">local first software</a> can become the default. </p>
<p> So Operational Transform, I think this is goodbye from me. We had some great times. Some of the most challenging, fun code I've ever written was <a href="https://github.com/josephg/sharejs">operational</a> <a href="https://github.com/share/sharedb/">transform</a> <a href="https://github.com/ottypes/json1/">code</a>. OT - you're clever and fascinating, but CRDTs can do things you were never capable of. And CRDTs need me. With some good implementations, I think we can make something really special. </p>
<p> I mourn all the work I've done on OT over the years. But OT no longer fits into the vision I have for the future. CRDTs would let us remake Wave, but simpler and better. And they would let us write software that treats users as digital citizens, not digital serfs. <a href="https://josephg.com/blog/home-is-where-the-bits-flow/">And that matters.</a>
</p>
<p> The time to build is now. </p>
<hr />
<p>
<a href="https://news.ycombinator.com/item?id=24617542#24621238">Discussion on HN</a>
</p>
</section>
</div>