Knowledge Compression

Loss and Efficiency

Knowledge must be compressed to be shared efficiently. Depending on the compression, different amounts of information are lost. Information loss happens at two stages: the compression from source to encoding, and the instantiation from encoding to target. How much is lost depends on the nature of the knowledge itself, the efficacy of the encoding, and the accuracy of the instantiation.

For example, an experienced dog trainer wants to share how to train dogs. She writes her knowledge (source) down in a book (encoding) and sells the book to eager readers. A reader then reads the book (encoding) to gain the knowledge himself (target). How much was lost between the dog trainer's knowledge and the knowledge the reader ends up with?

  • Nature of knowledge: the subject matter is hard to convey. Dog training is pretty tactile. It's hard to explain exactly how to interpret dog behavior and signals without seeing the dogs in action.
  • Efficacy of encoding: let's say our dog trainer isn't a great writer. She can express her knowledge in jargon, but isn't great at communicating her ideas simply.
  • Instantiation accuracy: our reader didn't understand several topics in the book, but feels he got the general gist. If asked, he could recall a few of the book's general ideas but not much else.

Without assigning dummy numbers, the general sense is that information loss in this example is high. But does it matter?

The extent to which knowledge compression is successful depends on how much more efficient it is than the alternative.

If the reader in our example trains his dog better after reading the book than he would have without it, then the information loss is acceptable. The efficiency he gains from reading the book outweighs the information lost in the transfer process. The compression was arguably a success. However, if he trains his dog just as well or worse after the book, then the efficiency gain is not enough to compensate for the information loss. The compression was not a success.

Software is Lossless

The amount of software knowledge that has been compressed into software itself is astounding.

Knowledge compression for software is lossless because software is designed to encode itself and instantiate identically on every machine. At the same time that software is eating the world, software is eating software.
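
One way to see what "lossless" means here, as a rough sketch and nothing more (the file names are placeholders, and this is an illustration rather than a claim about how software is actually distributed): copy a program and the copy is bit-for-bit identical to the original, which is never true of a book and a reader's memory of it.

    // Toy illustration of "lossless": copy a program and verify the copy is
    // bit-for-bit identical to the original. File names are placeholders.
    import { createHash } from "crypto";
    import { readFileSync, copyFileSync } from "fs";

    const sha256 = (path: string): string =>
      createHash("sha256").update(readFileSync(path)).digest("hex");

    copyFileSync("trainer.js", "trainer-copy.js"); // "instantiate" a second copy
    console.log(sha256("trainer.js") === sha256("trainer-copy.js")); // true: zero loss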

The amount of knowledge you needed to set up a website back in 1995 was extremely high. Now, you can create a website the same way you create a slide deck. This makes the extent to which one needs to learn to code to accomplish something a rather dynamic problem. Fundamental and scale-based engineering problems aside, the difficulty of building basic software is dropping drastically. Better tools are being built for both developers and consumers.

Today, you can code your own blog in minutes by using Gatsby, grabbing an open source starter, installing a UI library, and deploying and hosting your site on Netlify. The developer experience for building a blog is not unlike the one consumers have for setting one up on WordPress or Wix. Developer productivity software is compressing programming knowledge into simpler and more approachable abstractions.
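
To make the "minutes" claim a bit more concrete, here is a hedged sketch of roughly all the configuration a starter-based Gatsby blog asks of you, assuming a recent Gatsby version with TypeScript config support. The plugin names are the common ones starters ship with; the metadata and paths are placeholders. The starter and Netlify handle essentially everything else.

    // gatsby-config.ts (sketch): point the starter at your posts and name the site.
    import type { GatsbyConfig } from "gatsby";

    const config: GatsbyConfig = {
      siteMetadata: {
        title: "My Blog",               // placeholder
        siteUrl: "https://example.com", // placeholder
      },
      plugins: [
        {
          resolve: "gatsby-source-filesystem", // read Markdown posts from disk
          options: { name: "posts", path: "./content/posts" },
        },
        "gatsby-transformer-remark", // turn the Markdown into pages
      ],
    };

    export default config;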

On the consumer end, no-code software has also become popular. Most of these tools expose some layer of programming logic to the end user. Zapier exposes the logic of APIs to analysts, Webflow exposes the logic of HTML/CSS to designers, and Bubble exposes the logic of reading/writing to a database to end users. If any consumers ever decide to transition away from these tools into actual code, they'll likely find they somewhat intuitively understand the logic of APIs, the CSS box model, and the difference between front-end and back-end. Consumer build-your-own software is traveling down the stack, compressing programming knowledge into more powerful tools.
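
For a sense of what "the logic of APIs" looks like once it's written out, here is a rough sketch of the fetch-transform-post loop that a tool like Zapier wraps in a visual editor. The URLs, fields, and function name are hypothetical placeholders, not any real service's API.

    // A "zap" by hand: pull new records from one service, reshape them,
    // and push them to another. URLs and fields are made up for illustration.
    async function syncNewSignups(): Promise<void> {
      const res = await fetch("https://api.example-crm.com/signups?since=yesterday");
      const signups: { email: string; plan: string }[] = await res.json();

      for (const s of signups) {
        await fetch("https://hooks.example-chat.com/notify", {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify({ text: `New ${s.plan} signup: ${s.email}` }),
        });
      }
    }

    syncNewSignups().catch(console.error);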

A future newbie developer may be able to construct relatively complex apps within a few days with the aid of new developer tools, frameworks, and libraries, something that would take a person starting to learn today a few months to manage. A future consumer may use a set of tools that vaguely resemble some level of coding to build simple apps on the fly rather than seeking a developer for assistance. Knowledge compression moves fast when it's lossless. It's important to note that while software is lossless to itself, the knowledge of programming is not lossless to a person. Future developers will still need to learn programming fundamentals, but they'll need to learn less to do more.

Communication Mediums

The medium of communication affects both encoding and instantiation. How well you can encode something is a skill, and those with that skill can reduce the amount of loss when encoding knowledge. For example, a good writer will encode a message more effectively than a poor one. However, information loss can still be large if the subject in question isn't suited to the medium. Video, images, drawings, sounds, chat, etc. are all mediums of encoding. The proliferation of podcasts, videos, games, essays, books, live streams, and chat channels is a testament to how producers are experimenting with different forms of encoding.

The diversity of encodings will likely only increase. Most knowledge compression is lossy, with math and programming as the notable exceptions.

We will always be on the hunt for better knowledge compression. There isn't one "best" medium for a given topic. Everyone instantiates knowledge from encodings differently.

Two people can read the same book and walk away with different understandings. Some people will learn better with one medium for one topic and another medium for a different topic. To each their own.

Practitioner Knowledge

Universities have been the institutional provider of choice for knowledge compression as a service. They specialize in collecting knowledge, compressing it, and instantiating it in students. This, however, is changing. The proliferation of activity on the internet has created a decentralized alternative, especially for practitioner knowledge. With an accessible comparison available for the first time, it is becoming increasingly clear how inefficient universities are at compressing practitioner knowledge. Aside from those who want to become researchers, most students go to university to obtain knowledge for an industry job. Information loss has direct consequences when students find they are not prepared for the job market. For more academic knowledge, universities continue to provide relatively efficient knowledge compression.

Most of the information loss in the practitioner area happens because of a discrepancy between source and target. The difference is often implicit, which makes it harder to identify.

The crux of the issue is that professors are often not practitioners themselves, yet their students wish to be.

Professors (depending on the type) are often quite good at encoding. However, encoding ability doesn't matter if the source knowledge isn't what the target was looking for. Unfortunately, the target is often unable to identify this until much later.

Let's take an oversimplified example to illustrate: you are seeking to learn how to train cats. You find a very respected dog trainer, perhaps the same one from earlier who has been writing books. She's good, but she's only ever trained dogs. You reason that both cats and dogs are pets, so it shouldn't matter much. So you learn from the dog trainer, and you're a great student. You soak up everything she tells you about training dogs. But until you try to apply dog training to a cat, you won't know which differences matter.

Feedback as an Alternative

Access to feedback makes knowledge compression less relevant. The faster, cheaper, and better the feedback, the less need there is for compression. Reading books and watching videos on how to give a presentation is going to be less effective than actually giving a presentation and getting feedback. Doing followed by corrective feedback is often more effective than knowledge compression in general, but it's costlier to implement.

Doing with corrective feedback eliminates the information loss that is commonly present in knowledge compression. Instead of trying to transfer knowledge from source to encoding and then to target, this method starts from a base and builds knowledge up iteratively. The resulting knowledge is deep because it is constructed firsthand rather than decoded from someone else's encoding. The practical reality of this method is that it is hard to scale. Knowledge compression is the more common method because it is cheaper.