• Proverbs Are Lossy Compression
  • Proverbs have inverses
  • But then, why does it work sometimes?
  • You aren’t supposed to know everything.
  • So what exactly is a Preference?
  • Can we do better?
  • Will we ever find an AGI that can solve all problems in an instant?
  • So what is a good proverb?
Proverbs Are Lossy Compression


Don’t judge a book by its cover, but also, how you do one thing is how you do everything.
Alright, so what’s the point of a proverb if there is always an “anti-proverb” that contradicts it?


Proverbs have inverses


You might have heard it before:

Don’t judge a book by its cover.


Of course, it makes sense. You can’t know the content of a book just by looking at its cover.


But then, which of these two books would you pick?

Book 1
Book 2

There is no correct answer there, because judging from my viewership, you guys are nuts.

But at least you have one in mind, right?


There is also a saying that how you do one thing is how you do everything.

Dress for the job you want, not for the job you have.

Would you hire someone who dresses like a slob? This guy doesn’t even care about his appearance, how can he care about his work?


Every proverb seems to always have an “anti-proverb” that contradicts it.


Done is better than perfect.

Yeah, you should just start and get things done!

A house built on sand cannot stand

Or… you should take the time to plan big, build a strong foundation, and do things right the first time.


Many hands make light work.

Let’s parallelize the algorithm and make every thread share the memory.

Too many cooks spoil the broth.

Cool. Deadlocks. Race conditions. Starvation. Let’s not do that.


The early bird catches the worm.

First-mover advantage is real.

The second mouse gets the cheese.

Or… maybe we should wait and see how the first mouse got eaten by the cat, and then learn from its mistakes.


So on and so forth. What is the point? Do we not learn anything from these proverbs?


Are these just some cool speech patterns that make those who say them sound smart? Why do millionaires in interviews always recite these lessons as if they weren’t just hindsight bias?


Sure “WhAt I wIsH I kNeW wHeN I wAs 20”

Had you done those things differently, the canon event might not have happened, and you wouldn’t stand where you are today.


Or… maybe it doesn’t matter at all, and you would have inherited the same amount of money from your parents. Sorry if that sounds cynical, but I’m just playing devil’s advocate here.


The point is:

Should we just ignore these proverbs?


But then, why does it work sometimes?


Let’s play a simple game! Try to guess where the hidden ball is. Try as many times as you want, but please try at least 10-20 times!


See how well you did.


Isn’t it weird?

I didn’t say how I chose the hidden ball’s position.

But you can still guess it correctly more often than pure random guessing would, which is about 11% (1 in 9) for 9 boxes.

(Or you didn’t. I wouldn’t know.)


It might be overkill to build an entire game to state the obvious, but the point is, we naturally have pattern-seeking tendencies.

We don’t like when the odds are against us, so we try to gain the edge by looking for patterns.


Hey, every time I put all my eggs in one basket, I break the basket and lose all my eggs. Maybe I should stop doing that.


Hey, every time I put all my eggs in one basket, I can easily carry all my eggs and take care of them, instead of having to carry a dozen baskets. Maybe I should keep doing that.


Isn’t it strange? If there are two sides to the coin every time, does it even matter?


In a tic-tac-toe game, you can easily predict your opponent’s best move that responds to your move, that responds to your opponent’s move, and so on.

(Figure: a tic-tac-toe board partway through a game, with four X marks and three O marks.)
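
To make "easily predict" concrete, here is a minimal sketch of an exhaustive minimax search for tic-tac-toe. The board encoding (a 9-character string, read row by row) and the names `winner` and `best_value` are choices made for this example; they are not taken from the interactive board above.

```python
from functools import lru_cache

# The eight winning lines on a 3x3 board, cells indexed 0..8 row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def best_value(board, player):
    """Value of the position for X under perfect play: +1 win, 0 draw, -1 loss."""
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    if " " not in board:
        return 0  # board is full and nobody won
    other = "O" if player == "X" else "X"
    values = [best_value(board[:i] + player + board[i + 1:], other)
              for i, cell in enumerate(board) if cell == " "]
    # X picks the highest value, O picks the lowest.
    return max(values) if player == "X" else min(values)

print(best_value(" " * 9, "X"))  # prints 0: perfect play from an empty board is a draw
```

The whole game tree is small enough to search completely, which is why nobody needs a rule of thumb here.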

The best move is so obvious that nobody needs a proverb like:

Block your opponent’s move when you see two in a row.

(Figure: the same tic-tac-toe board, with four X marks and three O marks.)

Chess, on the other hand, is far more complex. Just two moves ahead, the combinations already explode exponentially.


Because of that, we have heuristics and rules of thumb such as:

Control the center of the board.

Don’t bring your queen out too early.

Keep your king safe.


To create a chess bot, there are two approaches:

  1. Use the same strategies that human players use, and try to formalize them into algorithms.

Or…

  2. Let the bot discover its own strategies by formalizing the implicit rewards (or punishments) after making a certain move.

But both of these approaches eventually lose to subverted expectations.


Subversion makes you question your assumptions and worldview.


And that might have brought you to this article in the first place.


To find out that you don’t know as much as you think you do, and that there are always two sides to every story.

Hell, even three sides to every story. Or maybe even more.


But you shouldn’t be worried.


You aren’t supposed to know everything.


Knowing everything is impossible: figuratively, but also mathematically and physically.


In decision theory, there is Risk and there is Uncertainty.

If we say certainty is when we know the only possible outcome of an event,

then a risk is when we know the possible outcomes, and the probability of each outcome (even if it’s just by intuition and not mathematically calculated).


Uncertainty, on the other hand, is when we don’t even know the possible outcomes, let alone their probabilities.


In some cases, we can gather more and more information, recognize the patterns, and eventually make predictions.


In other cases, there are too many variables and factors that we can’t compress into a simple heuristic.


And that’s when we have the Belief System.


We learn that religions are born out of the unknown, and the fear of the unknown.

So, this kinda checks out.


“Judging a book by its cover” is a belief system. It is a “lossy compression” of the complex reality of how much information is actually contained in the entire book, and how much of it can be inferred from the cover.


Your belief system might as well be “Never judge a book by its cover”.

Or maybe “Always judge a book by its spoiler”.


Something that is objectively right could just mean statistically right.


We just have enough empirical evidence to support it.

Doesn’t mean it is an absolute truth, but it just doesn’t make sense to ignore it.


That makes some proverbs more popular and widely accepted than their contradictory anti-proverbs.


Some anti-proverbs are made up by people who got hurt by the subverted expectations of the original proverb, and they want to warn others about it.


Or they were lucky to find a counterexample that opened their eyes to a newfound truth that they want to share with the world.


Either way, it is just a belief system that you can choose to adopt or not.


So what exactly is a Preference?


How come we like one thing over another? Is there a reason?


Our belief system and our preferences are self-reinforcing.


Most people tend to prefer things that align with their belief system if their belief system also “prefers things” that align with their preferences.


what?


The former part is a common psychological phenomenon called confirmation bias that just makes sense. You might also agree that it’s kinda intuitive.

The latter part serves as an explanation for why we change in a non-continuous, contradictory way.


why?


The universe is lazy. Not just you.


Biologically, we are wired to spend as little energy as possible to achieve the most reward.


But did you know this is also true physically?

Something that is not alive also tends to be… lazy.


In physics, there is a family of principles called the Action Principles.


If you came back (or didn’t leave) after reading a few sections, you might still be wondering…

What even is Action?


Action is not energy.

It’s not even force.

It’s not time either.


Why is it an integral—a sum—of the Lagrangian over time?

(at least in Lagrangian mechanics, there are a few different religions of physics that have different definitions of action)

$$S = \int_{t_1}^{t_2} L \, dt$$

The Lagrangian is the difference between kinetic energy and potential energy.

$$L = T - V$$

(given that the energy of the system is conserved).


But why?

$T$? $V$? One subtracts the other? Why?


Why does the universe care about two arbitrary quantities that we defined?


And it also happens to trust our definition of arithmetic enough to say that

uhh, you should just subtract one from the other


It’s not even trying to minimize the action: the accurate name is the Principle of Stationary Action. The stationary point could be a minimum, it could be a maximum, it could be a saddle point.

But infinitesimal variations around the true path leave the action unchanged to first order.
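
Here is a rough numerical sketch of what "stationary" means. The setup is a toy assumption, not something from this article: a unit mass thrown straight up under constant gravity that lands after one second. The names `action` and `perturbed_path`, and the sine-shaped bump, are made up for the example.

```python
import math

def action(path, dt, m=1.0, g=9.8):
    """Discretized action: sum of (kinetic - potential) * dt over each time step."""
    total = 0.0
    for x0, x1 in zip(path, path[1:]):
        v = (x1 - x0) / dt        # velocity during this step
        x_mid = 0.5 * (x0 + x1)   # height at the middle of the step
        total += (0.5 * m * v * v - m * g * x_mid) * dt
    return total

def perturbed_path(eps, n=1000, T=1.0, g=9.8):
    """The true free-fall path from x(0)=0 to x(T)=0, plus a bump of size eps."""
    ts = [T * i / n for i in range(n + 1)]
    return [0.5 * g * t * (T - t) + eps * math.sin(math.pi * t / T) for t in ts]

n, T = 1000, 1.0
s0 = action(perturbed_path(0.0, n, T), T / n)
for eps in (0.1, 0.01, 0.001):
    s = action(perturbed_path(eps, n, T), T / n)
    print(eps, s - s0)  # shrinks like eps**2, not like eps
```

Making the bump ten times smaller makes the change in action about a hundred times smaller. That missing first-order term is all "stationary" claims; in this particular toy case the stationary point happens to be a minimum.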


The reason it doesn’t make much sense is that we are stating the law as if it were a prescriptive law.


It’s like saying that English speakers should say “An umbrella” instead of “A umbrella” because it is the law.

They should say “A university” instead of “An university” because it is the law.


But native English speakers don’t even notice that they are following this law. It’s just that phonetically, “n” creates a discrete boundary between the article and the noun.


Saying “a umbrella” simply mushes the two words together and makes it harder to understand. This doesn’t apply to “a university” because the “y” sound already creates a boundary between the article and the noun.


Back to physics, the principle of stationary action does not dictate that the universe must follow the path of stationary action.


It just says that “this is what we observe. It describes every other thing. It is going to stay here and describe everything else until we find a counterexample”

or something that subverts our expectations.


Usually the most law-breaking counterexamples are those that are really really fast, or really really small.


Light.


When you see a beam of light bending in a cup of water, you might not be surprised.

But have you ever thought: “it doesn’t have to do that?”


Sure, everybody explains it with an analogy.

A lifeguard on the beach can run faster on the sand than in the water, so when he sees a drowning person, he will run on the sand and then dive into the water at an angle that minimizes his time to reach the drowning person.
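
The lifeguard analogy can be checked with a few lines of arithmetic. This is a sketch under made-up numbers: the positions, the running and swimming speeds, and the names `travel_time` and `fastest_entry` are all assumptions for illustration. The shoreline is the line y = 0.

```python
import math

def travel_time(x, guard=(0.0, 10.0), swimmer=(20.0, -10.0), v_sand=7.0, v_water=2.0):
    """Time to run from the guard to the shoreline point (x, 0), then swim to the swimmer."""
    run = math.hypot(x - guard[0], guard[1])
    swim = math.hypot(swimmer[0] - x, swimmer[1])
    return run / v_sand + swim / v_water

def fastest_entry(lo=0.0, hi=20.0, iters=200):
    """Ternary search for the shoreline point that minimizes the travel time."""
    for _ in range(iters):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if travel_time(a) < travel_time(b):
            hi = b
        else:
            lo = a
    return 0.5 * (lo + hi)

x = fastest_entry()
# Sines of the angles measured from the line perpendicular to the shore.
sin_sand = x / math.hypot(x, 10.0)
sin_water = (20.0 - x) / math.hypot(20.0 - x, 10.0)
print(sin_sand / 7.0, sin_water / 2.0)  # the two ratios agree (Snell's law)
```

The fastest entry point is exactly where the sine of each angle divided by the speed in that medium matches on both sides, which is the same rule that light follows when it bends.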


But light is pretty chill? It doesn’t have to save anyone. It could just go straight and even if it travels slower in the water, it just doesn’t have to care.

Why does it care? Why does it follow the path of least time?


It is merely an illusion of observation.


You are right! Light doesn’t have to care.

But when it doesn’t care, you just won’t… see it.


A more accurate analogy would be a marching band line going through a boundary of concrete road and mud.

Marching band

(Yeah, typical rookie mistake to march into mud. They are just not very good at marching.)


The band practices countless times. It doesn’t have to care about the time, so it simply tries marching at every possible angle.

In reality, this is not what superposition means, but this is the limit of analogy too, so just go with it.

Intuitively, it tries to go in a straight line.


Unfortunately, the moment the leftmost line (from our perspective) steps into the mud, the entire line will be dragged down and slowed down by the mud.

Marching band going through the boundary in a straight line

The rightmost line is still on solid ground, so it keeps its pace while the leftmost line lags behind.


When the leftmost line stomps slower, it goes out of sync. Left, Right, Left, Right coincide with Right, Left, Right, Left.

Marching band going through the boundary in a straight line

This angle would make them cancel each other out so it never makes it out of the practice session.


If the rightmost line curves into the mud to meet the leftmost line earlier, the rows will be in sync again.

Marching band going through the boundary in a curve

There is no motive for this brainless marching band to do that.

Marching band going in sync through the boundary

It just so happens that for every other angle they tried, the phases got out of sync, which made the probability of observing those angles lower and lower, until it was vanishingly small.


It’s not as if the marching band is trying to save the drowning person. They are not targeting anything. They are already marching at that angle.

Marching band going in sync through the boundary

Eventually, the angle that keeps them in sync has the highest probability of being observed.


So now we know!

Light is not consciously trying to minimize the time; the minimum is the result of the interference of all the possible paths.


But why minimum? You mentioned before that it could also be a maximum or a saddle point?

Again, it is not because it wants to be a stationary point; it is rather a consequence of the properties a stationary point has.


Let’s first assume my audience knows a bit of calculus.

When we say a point is “stationary”, we mean that the rate of change at that point is zero.

Oh boy, how do I connect this back to proverbs?


Meaning that if we nudge the path a little bit to the left, a teeny tiny amount, there would be no change to the action (to first order).


All the waves that are out of sync will cancel each other out, and the waves that are in sync will reinforce each other.
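
The cancellation can be seen in a toy calculation. This is only a sketch: each "path" is labeled by a single number x, and its phase is assumed to be k * x**2, so the phase is stationary at x = 0. The function name `summed_amplitude` and the values of `k` and the two windows are invented for the example.

```python
import cmath

def summed_amplitude(x_start, x_end, k=50.0, samples=2000):
    """Add up unit phasors exp(i * phase) for paths labeled x, with phase = k * x**2.

    The phase is stationary at x = 0, which stands in for the 'true path'.
    """
    dx = (x_end - x_start) / samples
    total = 0j
    for i in range(samples):
        x = x_start + (i + 0.5) * dx
        total += cmath.exp(1j * k * x * x) * dx
    return abs(total)

near = summed_amplitude(-0.2, 0.2)  # paths around the stationary point
far = summed_amplitude(1.0, 1.4)    # equally many paths far away from it
print(near, far)  # near is much larger: those phasors mostly point the same way
```

Both windows contain the same number of paths. Around the stationary point the phase barely changes, so the contributions add up; far from it the phase spins quickly and neighboring contributions cancel.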


And then the “least action” path is merely the outcome, the descriptive law that we observe, not the prescriptive rule that the universe must follow.


So to answer the question, there is no such thing as “Preference” in the lower level of reality. It’s just that stationary points are more likely to be observed.


Physics is simply a world model. It is a belief system that we choose to adopt because it has a lot of empirical evidence to support it, and it is useful for making predictions and building things.


Classical mechanics is a belief system that is a lossy compression of the complex reality that something might be really really small, or really really fast, and then we need quantum mechanics or relativity to describe it.


But classical mechanics is “good enough” for most of the things we do in our daily life.


Proverbs, of course, are world models.

We observe patterns, and we try to compress them into a simple, catchy phrase that we can easily remember and share with others.


Can we do better?

Can’t we just formalize these patterns into a more accurate, less lossy compression?

Oh, you would be surprised by how many fields of study we’re trying to connect here.


In the world of complexity theory, there is a concept called Approximation Ratio.

Uhh, I might have skipped a few steps here, but let’s first talk about how we can measure the hardness of a problem.


Problems that are easy are those that can be solved in polynomial time. That means the time it takes to solve the problem grows at most as a polynomial function of the input size.


For example, you can find a specific item in a list of items by checking each item one by one. This is called linear time, and it is considered easy.
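
A minimal sketch of that linear scan, with a comparison counter added so the growth is visible. The function name `linear_search` and the counter are illustrative choices.

```python
def linear_search(items, target):
    """Return (index, comparisons); index is -1 when the target is absent."""
    comparisons = 0
    for index, item in enumerate(items):
        comparisons += 1
        if item == target:
            return index, comparisons
    return -1, comparisons

for n in (10, 100, 1000):
    # Worst case: the target is missing, so every item gets checked.
    print(n, linear_search(list(range(n)), -1)[1])
```

Ten times more items means ten times more comparisons in the worst case, and that is all "linear time" means.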


However, some problems are harder. For example, given a number of items with different sizes, and bins with a fixed capacity, how many bins do you need to fit all the items?


This is called the bin packing problem, and it is considered hard because there is no (known) algorithm that can solve it in polynomial time.


Nevertheless, we have found a way to approximate the solution!
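
As one example of such an approximation, here is a sketch of the first-fit decreasing heuristic. This article does not say which approximation it has in mind, so treat this as one well-known option rather than the method; the function name and the integer sizes are assumptions for the example.

```python
def first_fit_decreasing(sizes, capacity):
    """Pack items greedily: largest first, each into the first bin with room."""
    bins = []  # each bin is a list of item sizes
    for size in sorted(sizes, reverse=True):
        if size > capacity:
            raise ValueError("item larger than a bin")
        for b in bins:
            if sum(b) + size <= capacity:
                b.append(size)
                break
        else:
            bins.append([size])  # no existing bin had room, so open a new one
    return bins

# Sizes are integers (tenths of a bin) to avoid floating-point rounding.
print(first_fit_decreasing([1, 5, 5, 9], capacity=10))  # [[9, 1], [5, 5]]
```

It runs in polynomial time and never looks back at a decision, which is exactly why it is fast and also why it is not always optimal.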


So the question is: can’t we find an algorithm that runs really fast, yet gives a solution that is really close to the optimal solution?


Where is the wall?


In complexity theory, there is a class of problems called NP-hard problems.


These are problems that are at least as hard as the hardest problems in NP.


If we can find a polynomial-time algorithm for any NP-hard problem, then we can solve all problems in NP in polynomial time.


That would imply that P = NP, which is one of the biggest open questions in computer science.

So everything we’re saying here is based on the thin straw assumption that P != NP.


To say that a problem is “at least as hard as” another problem, we can use the concept of reduction.


Reduction means there is a way to transform a problem A into another problem B, such that if we can solve B, then we can also solve A.


One way is to come up with a polynomial-time algorithm that transforms any instance of problem A into an instance of problem B, and then use the solution of B to solve A.


Oh you lost me at “NP”. What are you trying to say?


The takeaway is, we can say that “Bin packing problem” is at least as hard as another problem called “Partition problem”.


The partition problem is to determine whether a given collection of numbers (integers, in the textbook version) can be split into two subsets such that the sums of the numbers in the two subsets are equal.


For example, given the set $\{0.1, 0.5, 0.5, 0.9\}$, we can partition it into two subsets $\{0.1, 0.9\}$ and $\{0.5, 0.5\}$, both of which sum to $1$.


We can say that for a partition problem where the sum of the numbers is $2$, the partition problem is to determine whether there is a subset of the numbers that sums to $1$.
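
For small inputs, the "is there a subset that sums to half" question can be answered directly. A minimal sketch, assuming non-negative integer inputs (the example above scaled by 10); the name `can_partition` is made up here.

```python
def can_partition(numbers):
    """Decide whether non-negative integers split into two equal-sum groups."""
    total = sum(numbers)
    if total % 2 != 0:
        return False
    target = total // 2
    reachable = {0}  # subset sums that can be formed so far
    for n in numbers:
        reachable |= {s + n for s in reachable if s + n <= target}
    return target in reachable

# The example above, scaled by 10 so that everything is an integer.
print(can_partition([1, 5, 5, 9]))  # True: {1, 9} and {5, 5}
print(can_partition([2, 2, 2, 4]))  # False: no subset sums to 5
```

This does not contradict the hardness of the problem: the set of reachable sums can grow with the size of the numbers themselves, not just with how many numbers there are.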


Solving the partition problem is the same as deciding whether all the items fit into two bins, each with a capacity of half the sum of the numbers.

It’s all equivalent!


If the partition problem has a solution, then we can put the items in the two bins according to the partition.

If the partition problem does not have a solution, then we need at least three bins to fit all the items.
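
The two cases above can be sketched as a tiny reduction: answer the partition question by asking a bin packing question. The exhaustive `fits_in_bins` helper is a stand-in for "some bin packing solver" and is exponential in the worst case; both function names are hypothetical.

```python
def fits_in_bins(sizes, capacity, num_bins):
    """Exhaustive search: can all items fit into num_bins bins of this capacity?"""
    loads = [0] * num_bins

    def place(i):
        if i == len(sizes):
            return True  # every item has been placed
        for b in range(num_bins):
            if loads[b] + sizes[i] <= capacity:
                loads[b] += sizes[i]
                if place(i + 1):
                    return True
                loads[b] -= sizes[i]  # undo and try the next bin
        return False

    return place(0)

def partition_via_bin_packing(numbers):
    """Answer the partition question by asking a bin packing question instead."""
    total = sum(numbers)
    if total % 2 != 0:
        return False
    # Two bins, each holding half the total, are enough iff a partition exists.
    return fits_in_bins(numbers, total // 2, num_bins=2)

print(partition_via_bin_packing([1, 5, 5, 9]))  # True
print(partition_via_bin_packing([2, 2, 2, 4]))  # False, so three bins are needed
```

The transformation itself (sum the numbers, halve, ask for two bins) is cheap. All the difficulty lives inside the bin packing question, which is what "at least as hard as" means.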


We can say that, assuming P != NP, no polynomial-time algorithm for the bin packing problem can guarantee an approximation ratio better than 3/2.

Meaning that there will always be some instance where the optimal solution is 2 bins, but the algorithm ends up using at least 3 bins.


The reason is that a polynomial-time algorithm with a guarantee better than 3/2 would have to use fewer than 3 bins, that is exactly 2, whenever 2 bins are enough. We could then use that algorithm to solve the partition problem in polynomial time, which would imply that P = NP.


And that is something we don’t have an answer to yet, and would break our worldview if it turns out to be true.


So yeah, you can always be good enough, but there is a wall to perfection.


Will we ever find an AGI that can solve all problems in an instant?


And that’s the point!

An AGI capable of solving all problems perfectly would itself require intractable computation.


In graphics programming, sure we can just generate a ray-traced image by simulating the physics of light, hitting every single particle in the scene, and calculating the color of each pixel based on that.


But that would be really really slow.


You see, the thing is, a model (not just a machine learning model) is a lossy compression of the reality. Not being perfect means it is doing something else reasonably well, such as being efficient, or being interpretable.


Our (current) large language models are approximately solving the problem. If a model is too good, then it’s breaking the wall of computation. If it’s good and fast, then it’s too big, essentially memorizing every single possible input and output pair.


So what is a good proverb?


A good proverb doesn’t mean that it has enough empirical evidence to support it.


Hell yeah, “Opportunity favors the prepared” and “The early bird catches the worm” are pretty good proverbs.

Not to offend the ornithologists, but nobody is really keeping track of whether the early bird actually catches more worms.


But what is the measure here?


A good proverb is one that is descriptive, not prescriptive.


It describes the patterns that we observe, rather than dictating how we should behave.


Sure, reality is complex and will subvert our expectations, but proverbs have approximation ratios.

Knowing everything with a small set of rules, with efficient computation, is going to break our understanding of the world.


Not just proverbs, but also your belief system.


Making mistakes is inevitable, because being perfect means your computation is intractable. You were not meant to find the optimal solution.


But you can still be good enough, and that is what a good proverb is.


I’m not saying that you should learn from mistakes by describing the patterns that you observe, instead of developing a prescriptive rule that “I should never do this again”.

Saying that would contradict the very point of this article: there is no such thing as “should”.


The takeaway is, there are infinitely many paths to take. There is no “right” path, but there are paths that are more constructive and more likely to be observed than others.

There are rules that work for you, and there are rules that interfere with you.


Book covers.


Thank you for reading! If you have any feedback or suggestions for future topics, please let me know in the comments or reach out to me on social media.
