<div id="readability-page-1" class="page">
    <div>
        <p> My digital life in a nutshell: I discover relevant content I don’t have time to consume, I find time and become overwhelmed with my scattered backlog, I wish the content were in a different format, and then I’m unable to find something again once I’ve consumed it. <a href="https://andymatuschak.org/books/">Not retaining enough</a> is a valid problem but we’ll tackle that one later. </p>
        <p> There’s a lot of generalization in my summary but <strong>the core issue is an extraordinarily high level of friction in the process of finding, organizing, and sharing digital content</strong>. During the past few years I’ve noticed: </p>
        <ul>
            <li>
                <p> The more seamless the acquisition &amp; ingestion, the more engaged I am with the content </p>
            </li>
            <li>
                <p> Insights are just as likely to be found in a 400-page book as in a 40-minute podcast </p>
            </li>
            <li>
                <p> Notes and their subsequent review are essential for long-term retention </p>
            </li>
            <li>
                <p> Recommendations from other humans are as good as, if not better than, algorithmic suggestions </p>
            </li>
        </ul>
        <p> In the rest of this post I attempt to explain the digital tools I wish existed, and how the currently available tools do not suffice. My <a href="https://www.buildingasecondbrain.com/">habits and workflows</a> around this are probably also lacking - but here I’m looking specifically at tools. </p>
        <h2 id="queue-management-for-inbound-digital-content"> Queue management for inbound digital content
        </h2>
        <p> Where to begin? Probably the most common problem I see myself and other people dealing with is processing the incoming deluge of articles to read and videos to watch. This isn’t all personal recommendations - it encompasses any and all content I think my future self would appreciate me consuming. A list of issues, roughly by order of appearance: </p>
        <ul>
            <li>
                <p> Content (or links to it) arrives from a variety of sources including text messages from friends, email conversations, tweetstorms and replies, references in books, suggestions in real-world conversations, and more. </p>
            </li>
            <li>
                <p> Every book, article, post, or tweet has the potential to lead to more content. </p>
            </li>
            <li>
                <p> Content is published in a variety of formats including but not limited to images, sound files, videos, Google Drive docs, diagrams, long-form paywalled articles, PDFs, PowerPoint presentations, and base64-encoded blobs. </p>
            </li>
            <li>
                <p> I have little visibility into the time investment and background knowledge a piece requires until I’ve opened it and started thinking about it. Should I sit down with a pen and paper to read this or can I skim it while waiting for my coffee? </p>
            </li>
            <li>
                <p> Learning, work, news, and entertainment all have different priorities in my life (roughly in that order). </p>
            </li>
            <li>
                <p> I would like to batch process content in different “streams” regardless of where they are stored. For example: I have two hours, let me work through interesting text content my friends sent me last week. Or: show me all the interesting/relevant videos I’ve queued over the past month. </p>
            </li>
            <li>
                <p> I don’t always have a stable internet connection. </p>
            </li>
            <li>
                <p> If it’s a long piece of content I want my position saved reliably so I can resume at a later point. </p>
            </li>
            <li>
                <p> I often want it in a different format than the one it was originally published in (audio → text, text → audio, pdf → ebook). Automated conversion works but is cumbersome. Listening to text articles requires sending them to a special app and converting articles to ebooks is annoying and loses a lot of formatting and navigation. </p>
            </li>
            <li>
                <p> I love to respond to a person’s recommendation - preferably before they’ve forgotten why they sent it to me in the first place. </p>
            </li>
            <li>
                <p> I’d like a centralized history of content tied to my notes and annotations in case I want to find it again later. It feels like every week I’m speaking with someone and I remember a blog post I read a few months ago they might find relevant … or was it a Reddit post? Can I find it in my history? Oh no, it’s been replaced with <code>[deleted]</code> … find an archived copy… rinse and repeat. </p>
            </li>
        </ul>
        <figure>
            <img src="https://imgs.xkcd.com/comics/icon_swap.png">
            <figcaption> Relevant XKCD, as is tradition </figcaption>
        </figure>
        <p> Following my curiosity feels like chasing a caffeinated bunny around while real understanding requires time, perspective, and reflection. The internet makes the former much easier - so I find myself constantly balancing the two. Additionally, my energy and attention levels vary throughout the day and it’s far easier to just open Twitter rather than continue reading a long-form article I started on my laptop two days ago. Too often I default to the lower-friction one. </p>
        <p> Honorable Mentions: <a href="https://getpocket.com/">Pocket</a>, <a href="https://www.instapaper.com/">Instapaper</a>
        </p>
        <h2 id="a-universal-book-log-recommendation-sharing-system"> A universal book log, recommendation &amp; sharing system
        </h2>
        <p> I love exploring other people’s reading lists. Here’s <a href="http://fakehost/books">my own</a>. I find everyone keeps their reading lists in different formats on different platforms. Plaintext lists are nice but hard to parse. Spreadsheets are easy to parse but a pain to manage. Third-party services aren’t interoperable, require logins, and are not future-proof. </p>
        <p> Part of the problem here is <a href="https://people.well.com/user/doctorow/metacrap.htm">metadata is hard</a>. Someone has to sit there and fill out the author, title, subtitle, summary, page count - and they’re probably not going to do it for free. Amazon is good at it but <a href="https://stallman.org/amazon.html#publishing">is hostile to publishers</a>. Goodreads has much potential but <a href="https://onezero.medium.com/almost-everything-about-goodreads-is-broken-662e424244d5">seems to have stagnated</a>. Linking to the book’s Wikipedia entry would be my preference but very few books have an entry. </p>
        <p> Whatever this tool for managing my ever-growing reading list will be, it should: </p>
        <ul>
            <li>
                <p> Let me compare my reading list with another to see overlap. I find this a wonderful way to spark conversation and find common interests. </p>
            </li>
            <li>
                <p> Allow me to tag books instead of placing them into static lists (think clusters or tag clouds). </p>
            </li>
            <li>
                <p> Be tied to my highlights, annotations, and bookmarks in a non-proprietary, searchable, and shareable format. Make them public if I want to. </p>
            </li>
            <li>
                <p> Save context on where and when I found this book: why I thought it was important to read, when I read it, what I wrote down while reading it, and what other content I discovered through it. </p>
            </li>
            <li>
                <p> Let me query this tool like a relational database. For example: show me all books about scaling startups recommended by people I follow on Twitter or by people they follow. The current Twitter search makes me feel like I’m using a government site created before I myself even knew what a computer was. </p>
            </li>
            <li>
                <p> Help me <a href="https://www.lesswrong.com/posts/Kmch6T2YscMyLFJD9/rational-reading-thoughts-on-prioritizing-books">deal with prioritization</a>. My reading list is a mess and I can’t be alone. Are certain books better read before others? Prerequisites? Could three of them be replaced with one? What are the other books by this author? Are they worth reading too? Why exactly did I think reading this 800-page book was relevant when I added it? <a href="https://www.samuelthomasdavies.com/book-summaries/health-fitness/the-checklist-manifesto/">Is 80% of the content attainable from a blog post?</a> Where is that post? Has someone in my network written a rebuttal to the ideas in this book? The list goes on and on. </p>
            </li>
            <li>
                <p> Provide relevant suggestions with the typical recommender approach based on what people interested in the same topics also enjoyed reading and learning from. </p>
            </li>
        </ul>
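        <p> Comparing two reading lists can start as a set intersection once titles are matched up. Below is a minimal Python sketch, assuming each list is just a collection of title strings: <code>reading_overlap</code> and <code>normalize</code> are hypothetical helpers, Jaccard similarity is an assumed choice of overlap score (nothing above prescribes one), and matching on case-folded titles is a crude stand-in for the metadata problem described earlier. </p>
        <pre><code class="language-python">
def normalize(title):
    """Crude title normalization (assumption): case-fold and collapse whitespace."""
    return " ".join(title.casefold().split())


def reading_overlap(mine, theirs):
    """Return (shared titles, Jaccard similarity) for two reading lists."""
    a = {normalize(t) for t in mine}
    b = {normalize(t) for t in theirs}
    shared = a.intersection(b)
    union = a.union(b)
    # Jaccard similarity: shared titles divided by all distinct titles
    score = len(shared) / len(union) if union else 0.0
    return shared, score


shared, score = reading_overlap(
    ["The Checklist Manifesto", "Uninhabitable Earth", "Deep Work"],
    ["uninhabitable  earth", "The Checklist Manifesto", "Range"],
)
print(sorted(shared))  # ['the checklist manifesto', 'uninhabitable earth']
print(score)           # 0.5
</code></pre>
        <p> Title matching breaks down quickly (editions, subtitles, translations), so a real tool would more likely key books on a canonical identifier such as an ISBN or a Wikipedia entry. </p>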
        <p> Honorable Mentions: None :( </p>
        <h2 id="intelligent-pdf-viewers-ebook-readers-audiobook-podcast-players"> Intelligent PDF viewers, eBook readers, audiobook &amp; podcast players
        </h2>
        <figure>
            <img src="https://d33wubrfki0l68.cloudfront.net/5aa98bca712e33bc729c180c9a588f4d5ac7e9af/c5011/digital-tools/ebook-concept.png">
            <figcaption> Functionality I want in my document reader </figcaption>
        </figure>
        <p> Reading is incredible and I love my Kindle. But eBooks today are just a step above OCR’ing a book and slapping on a few basic features which have existed for 30+ years. While I’m reading an eBook I want to: </p>
        <ul>
            <li>
                <p> Have relevant illustrations, graphs, and tables appear for the duration of their mentions so I don’t have to flip back and forth between them. </p>
            </li>
            <li>
                <p> See the glossary terms that appear on the current page, along with their definitions. Highlighting and searching a term is great but the author may have added important context to the glossary definition. </p>
            </li>
            <li>
                <p> View popular annotations and highlights across <strong>all</strong> mediums - not just from other readers who own an Amazon Kindle, purchased this edition of the book, and happened to highlight the same passage enough times. A quote was referenced in 300 blog articles? A two-sentence excerpt retweeted 50,000 times? You bet I want to know! </p>
            </li>
            <li>
                <p> Follow referenced information easily. You cited a paper - great, let’s look at the footnotes. Oh, the full reference is in the back of the book. Online list of citations? Of course not! Drop a bookmark, navigate to the back of the book, pull out my laptop, find the paper. Of course, a paywall. Grab a snack. Acquire the PDF. Search for keywords to try to find the referenced information. Sigh, 2019. </p>
            </li>
            <li>
                <p> Not be hindered by the DRM system. Copyright is important and I want to support authors, but it’s insane to me that all these content licenses I’m acquiring can’t be donated to a library when I close my account. Yes, legal DRM-free eBooks exist but they aren’t without their own issues. </p>
            </li>
            <li>
                <p> Seamlessly switch between devices and formats while retaining my position. Something like Whispersync (<a href="https://www.amazon.com/gp/feature.html?ie=UTF8&amp;docId=1000827761">a neat idea</a> but come on, I’m not made of money. Also, see above points). </p>
            </li>
            <li>
                <p> Let me use a digital or physical keyboard instead of an e-ink keyboard to type my annotations. A possibility here is a companion app, which feels like a notes app but ties my notes to their location/text in the book I’m reading. </p>
            </li>
        </ul>
        <figure>
            <img src="https://d33wubrfki0l68.cloudfront.net/7bc95c0c51628e5ba06ddb7e688a58c600e503c8/81599/digital-tools/audiobook-player.png">
            <figcaption> What I want my audiobook player to look like </figcaption>
        </figure>
        <p> Most of the points above also apply to listening to podcasts and audiobooks, and to watching YouTube videos and interviews. I find myself wishing I could: </p>
        <ul>
            <li>
                <p> Navigate them more comfortably. Both Libby and Audible leave much to be desired in terms of navigation. Finding a quote I remember hearing three days ago is basically blindly stumbling around - and I lose my current spot too. Seeing a list of chapter numbers for the book I’m <strong>listening to</strong> has been helpful a grand total of 0 times. And how cool would it be to drop a bookmark from my Bluetooth-connected headphones as I’m biking down a street? </p>
            </li>
            <li>
                <p> View an auto-generated transcript of a podcast as I’m listening to it. It should have easy-to-follow links to references to other podcasts, media, books and support searching for key terms. YouTube already transcribes all of their videos and Google Meet now generates live captions as we’re talking - why can’t we do something similar with podcast apps? </p>
            </li>
        </ul>
        <p> Honorable Mentions: <a href="https://readwise.io/">Readwise</a>, <a href="https://www.weavatools.com/">Weava</a>, <a href="https://www.descript.com/">Descript</a>, <a href="https://otter.ai/">Otter.ai</a>, <a href="https://getpolarized.io/#features">Polar</a>
        </p>
        <h2 id="a-centralized-search-interface-for-my-digital-brain-memex"> A centralized search interface for my digital brain (memex)
        </h2>
        <p> I want to be able to open an interface, type three words, and instantly see results from everything my digital self has interacted with. Emails, years of full-text browsing history, text messages, Slack messages across <strong>all</strong> my organizations, calendar invites and events, books, podcast transcripts I’ve consumed, Twitter and Instagram DMs, PDFs I’ve downloaded, bash commands, videos I’ve seen, my online and offline files, notes, blog post drafts - I really do mean everything. </p>
        <p> I acutely feel the need for this when I’m trying to find something I know I’ve seen online but can’t remember where I saw it. Google is wonderful for finding new information, but poor at re-finding things. Chrome’s history has so much potential - but I suspect Google would much rather have us look at their ads a few additional times than go directly to the source. I accept I might be in the minority on this one. Regardless, this tool should: </p>
        <ul>
            <li>
                <p> Accept and parse the following queries: </p>
                <ul>
                    <li>
                        <p> spacex announcement type:video 2016 </p>
                    </li>
                    <li>
                        <p> links from:jon@test.org topic:python </p>
                    </li>
                    <li>
                        <p> paper on temperature, productivity referenced in book:Uninhabitable Earth </p>
                    </li>
                    <li>
                        <p> type:pdf habits digital interfaces </p>
                    </li>
                    <li>
                        <p> reading comprehension type:blog post </p>
                    </li>
                    <li>
                        <p> printer ink receipt </p>
                    </li>
                    <li>
                        <p> type:book read:2017 finance </p>
                    </li>
                    <li>
                        <p> file:py datetime parse </p>
                    </li>
                </ul>
            </li>
            <li>
                <p> Respect my privacy: hosted on something I control and never mined for ads. </p>
            </li>
            <li>
                <p> Support all my devices with two-way sync so I can search and add to it wherever I am. </p>
            </li>
            <li>
                <p> Be extensible: allow me to easily ingest my own information and extend with desired functionality. </p>
            </li>
            <li>
                <p> Cluster information based on content, tags, geo-location, connected people, conversations, source, and other factors I’m not even aware of. </p>
            </li>
            <li>
                <p> Notify me about changes to documents and webpages I’ve visited. </p>
            </li>
            <li>
                <p> Allow a rough export of my research on a topic (like a knowledge dump of everything I’ve consumed on pandas) with the ability to easily share it. </p>
            </li>
        </ul>
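        <p> The example queries above mix free-text words with <code>key:value</code> filters. A minimal Python sketch of the first step such a tool might take, splitting a query into the two, follows. It is not an existing tool: <code>parse_query</code> is a hypothetical name, and it assumes multi-word values are quoted (e.g. <code>book:"Uninhabitable Earth"</code>), because an unquoted value like <code>type:blog post</code> is ambiguous to a parser. Quote handling comes from Python’s standard <code>shlex</code> module. </p>
        <pre><code class="language-python">
import shlex


def parse_query(query: str):
    """Split a search query into (filters, terms).

    Hypothetical sketch: filters maps each key to a list of values, and
    quoting follows shlex (POSIX shell style), so multi-word values must be
    quoted, e.g. book:"Uninhabitable Earth".
    """
    filters = {}
    terms = []
    for token in shlex.split(query):
        key, sep, value = token.partition(":")
        # Only alphabetic keys count as filters, so tokens like "12:30" stay free text
        if sep and key.isalpha() and value:
            filters.setdefault(key.lower(), []).append(value)
        else:
            terms.append(token)
    return filters, terms


filters, terms = parse_query("links from:jon@test.org topic:python")
print(filters)  # {'from': ['jon@test.org'], 'topic': ['python']}
print(terms)    # ['links']
</code></pre>
        <p> Keeping a list per key means repeated filters (say, two <code>from:</code> values) are both preserved; whether they combine as AND or OR is left to whatever search backend sits behind the parser. </p>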
        <p> Honorable Mentions: <a href="https://worldbrain.io/">Memex by Worldbrain.io</a>, <a href="https://roamresearch.com/">Roam Research</a>, <a href="https://www.notion.so/">Notion</a>, <a href="https://coda.io/welcome">Coda.io</a>, <a href="https://www.alfredapp.com/">Alfred</a>, <a href="https://trovenow.com/">Trove</a>, <a href="https://localnative.app/">Local Native</a>, <a href="https://github.com/pirate/ArchiveBox">ArchiveBox</a>, <a href="https://raindrop.io/">Raindrop</a>
        </p>
        <h2 id="parting-thoughts"> Parting Thoughts
        </h2>
        <p> I’m fascinated with a better bridge between our minds and our digital devices. A well-designed tool should disappear and allow complete attention to the task at hand, but digital devices today are far from this ideal - often due to arcane copyright laws or profit-seeking. These aren’t new ideas by any means. See Vannevar Bush’s <a href="https://en.wikipedia.org/wiki/As_We_May_Think">original conception</a> of a memex over 70 years ago. We are way overdue for this. I see enormous potential in combining a true memex with all of our personal data (health, fitness, biometrics) along with our habits, goals, tasks, reflections, and communication tools. </p>
        <p> It seems to me that as information becomes more abundant, the connections drawn between disparate pieces are becoming increasingly important. The easier it is to share that graph with other people, the faster we can learn from each other and understand complex relationships. I’m excited for a world where knowledge is easier to discover, validate, dispute, understand, retain, and share. </p>
        <p> I hope to cover my thoughts on processes, note-taking apps, and knowledge graphs next. Stay tuned <a href="https://mailchi.mp/0e81591ed912/jborichevskiy">here</a>. My thanks to Arthur Tyukayev, Alex Ly, <a href="https://twitter.com/davidmeh">David Heimann</a>, <a href="https://twitter.com/ylimedeg">Em deGrandpré</a>, <a href="https://twitter.com/alexeyguzey">Alexey Guzey</a>, Sam Tkachuk, and <a href="https://twitter.com/briantimar">Brian Timar</a> for reading drafts of this and providing wonderful feedback. </p>
        <p>
            <a href="https://news.ycombinator.com/item?id=21659876">HN Discussion</a>
        </p>
        <h2 id="appendix"> Appendix
        </h2>
        <ul>
            <li>
                <p>
                    <a href="https://beepb00p.xyz/sad-infra.html">The sad state of personal data and infrastructure (beepb00p.xyz)</a>
                </p>
            </li>
            <li>
                <p>
                    <a href="https://zettelkasten.de/posts/reading-web-rss-note-taking">Note-Taking when Reading the Web and RSS</a>
                </p>
            </li>
        </ul>
    </div>
</div>