1 / 147

Proverbs Are Lossy Compression

Don’t judge a book by its cover, but also, how you do one thing is how you do everything.
Alright, so what’s the point of a proverb if there is always an “anti-proverb” that contradicts it?

2 / 147

Proverbs have inverses

3 / 147

You might have heard it before:

Don’t judge a book by its cover.

4 / 147

Of course, it makes sense. You can’t know what a book contains just by looking at its cover.

5 / 147

But then, which of these two books would you pick?

6 / 147

There is no correct answer there, because judging from my viewership, you guys are nuts.

But at least you have one in mind, right?

7 / 147

There is also a saying that how you do one thing is how you do everything.

Dress for the job you want, not for the job you have.

Would you hire someone who dresses like a slob? This guy doesn’t even care about his appearance, how can he care about his work?

8 / 147

Every proverb seems to always have an “anti-proverb” that contradicts it.

9 / 147

Done is better than perfect.

Yeah, you should just start and get things done!

A house built on sand cannot stand.

Or… you should take the time to plan big, build a strong foundation, and do things right the first time.

10 / 147

Many hands make light work.

Let’s parallelize the algorithm and make every thread share the memory.

Too many cooks spoil the broth.

Cool. Deadlocks. Race conditions. Starvation. Let’s not do that.

11 / 147

The early bird catches the worm.

First-mover advantage is real.

The second mouse gets the cheese.

Or… maybe we should wait and see how the first mouse got eaten by the cat, and then learn from its mistakes.

12 / 147

So on and so forth. What is the point? Do we not learn anything from these proverbs?

13 / 147

Are these just cool speech patterns that make those who say them sound smart? Why do millionaires in interviews always recite these lessons as if they weren’t just hindsight bias?

14 / 147

Sure “WhAt I wIsH I kNeW wHeN I wAs 20”

Had you done those things differently, the canon event might not have happened, and you wouldn’t stand where you are today.

15 / 147

Or… maybe it doesn’t matter at all, and you would have inherited the same amount of money from your parents. Sorry if that sounds cynical, but I’m just playing devil’s advocate here.

16 / 147

The point is:

Should we just ignore these proverbs?

17 / 147

But then, why do they work sometimes?

18 / 147

Let’s play a simple game! Try to guess where the hidden ball is. Try as many times as you want, but please try at least 10-20 times!

19 / 147

See how well you did.

20 / 147

Isn’t it weird?

I didn’t say how I chose the hidden ball’s position.

But you can still guess it correctly more often than pure random guessing, which would be right about 11% of the time (1 in 9) for 9 boxes.

(Or you didn’t. I wouldn’t know.)
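Here is a minimal sketch of why spotting a pattern beats blind guessing. The biased hider below is purely hypothetical (the game never reveals how the ball is placed), and `hit_rate` and the probability lists are names made up for illustration.

```python
def hit_rate(hider, guesser):
    """Chance that an independent guess lands on the hidden box."""
    return sum(h * g for h, g in zip(hider, guesser))

BOXES = 9
uniform = [1 / BOXES] * BOXES

# Hypothetical hider who favors the center box (probabilities sum to 1).
biased_hider = [0.05, 0.10, 0.05, 0.10, 0.40, 0.10, 0.05, 0.10, 0.05]

# A guesser who noticed the pattern and always picks the center box.
always_center = [0, 0, 0, 0, 1, 0, 0, 0, 0]

print(hit_rate(uniform, always_center))       # fair hider: any strategy gets 1/9
print(hit_rate(biased_hider, uniform))        # blind guessing: still 1/9
print(hit_rate(biased_hider, always_center))  # pattern spotted: 0.4
```

Against a truly uniform hider no strategy can beat 1 in 9. Any edge has to come from a pattern in how the ball is hidden.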

21 / 147

It might be overkill to build an entire game to state the obvious, but the point is, we naturally have pattern-seeking tendencies.

We don’t like when the odds are against us, so we try to gain the edge by looking for patterns.

22 / 147

Hey, every time I put all my eggs in one basket, I break the basket and lose all my eggs. Maybe I should stop doing that.

23 / 147

Hey, every time I put all my eggs in one basket, I can easily carry all my eggs and take care of them, instead of having to carry a dozen baskets. Maybe I should keep doing that.

24 / 147

Isn’t it strange? If there are two sides to the coin every time, does it even matter?

25 / 147

In a tic-tac-toe game, you can easily predict your opponent’s best move that responds to your move, that responds to your opponent’s move, and so on.

[Interactive tic-tac-toe board]
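To make "easily predict" concrete, here is a minimal sketch that searches the whole tic-tac-toe game tree with plain minimax. The board encoding (a 9-character string, row by row) and the function names are assumptions chosen for illustration.

```python
from functools import lru_cache

# All eight winning lines on a board indexed 0..8, row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return "X" or "O" if someone has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def value(board, player):
    """+1 if X can force a win, -1 if O can, 0 for a draw (player moves next)."""
    won = winner(board)
    if won is not None:
        return 1 if won == "X" else -1
    if " " not in board:
        return 0
    other = "O" if player == "X" else "X"
    outcomes = [value(board[:i] + player + board[i + 1:], other)
                for i, cell in enumerate(board) if cell == " "]
    return max(outcomes) if player == "X" else min(outcomes)

def best_move(board, player):
    """Pick the empty cell with the best minimax value for the player."""
    other = "O" if player == "X" else "X"
    moves = [i for i, cell in enumerate(board) if cell == " "]
    score = lambda i: value(board[:i] + player + board[i + 1:], other)
    return max(moves, key=score) if player == "X" else min(moves, key=score)

print(value(" " * 9, "X"))            # 0: perfect play ends in a draw
print(best_move("XX  O    ", "O"))    # 2: O blocks the top row
```

The whole game fits in a few thousand positions, so the search finishes instantly. That is exactly why nobody needs a proverb for it.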

26 / 147

The best move is so obvious that there is no proverb like:

Block your opponent’s move when you see two in a row.

[Interactive tic-tac-toe board]

27 / 147

Chess, on the other hand, is far more complex. Just two moves ahead, the combinations already explode exponentially.

28 / 147

Because of that, we have heuristics and rules of thumb such as:

Control the center of the board.

Don’t bring your queen out too early.

Keep your king safe.

29 / 147

To create a chess bot, there are two approaches:

Use the same strategies that human players use, and try to formalize them into algorithms.

Or…

Let the bot discover its own strategies by formalizing the implicit rewards (or punishments) after making a certain move.

30 / 147

But both of these approaches eventually lose to subverted expectations.

31 / 147

Subversion makes you question your assumptions and worldview.

32 / 147

And that might have brought you to this article in the first place.

33 / 147

To find out that you don’t know as much as you think you do, and that there are always two sides to every story.

Hell, even three sides to every story. Or maybe even more.

34 / 147

But you shouldn’t be worried.

35 / 147

You aren’t supposed to know everything.

36 / 147

Knowing everything is impossible. Not just figuratively, but also mathematically and physically.

37 / 147

In decision theory, there is Risk and there is Uncertainty.

If we say certainty is when we know the only possible outcome of an event,

then a risk is when we know the possible outcomes, and the probability of each outcome (even if it’s just by intuition and not mathematically calculated).

38 / 147

Uncertainty, on the other hand, is when we don’t even know the possible outcomes, let alone their probabilities.

39 / 147

In some cases, we can gather more and more information, recognize the patterns, and eventually make predictions.

40 / 147

In other cases, there are too many variables and factors that we can’t compress into a simple heuristic.

41 / 147

And that’s when we have the Belief System.

42 / 147

We learn that religions are born out of the unknown, and the fear of the unknown.

So, this kinda checks out.

43 / 147

“Judging a book by its cover” is a belief system. It is a “lossy compression” of the complex reality of how much information is actually contained in the entire book, and how much of it can be inferred from the cover.

44 / 147

Your belief system might as well be “Never judge a book by its cover”.

Or maybe “Always judge a book by its spoiler”.

45 / 147

Something that is objectively right could just mean statistically right.

46 / 147

We just have enough empirical evidence to support it.

Doesn’t mean it is an absolute truth, but it just doesn’t make sense to ignore it.

47 / 147

That makes some proverbs more popular and widely accepted than their contradictory anti-proverbs.

48 / 147

Some anti-proverbs are made up by people who got hurt by the subverted expectations of the original proverb, and they want to warn others about it.

49 / 147

Or they were lucky to find a counterexample that opened their eyes to a newfound truth that they want to share with the world.

50 / 147

Either way, it is just a belief system that you can choose to adopt or not.

51 / 147

So what exactly is a Preference?

52 / 147

How come we like one thing over another? Is there a reason?

53 / 147

Our belief system and our preferences are self-reinforcing.

54 / 147

Most people tend to prefer things that align with their belief system if their belief system also “prefers things” that align with their preferences.

55 / 147

what?

56 / 147

The former part is a common psychological phenomenon called confirmation bias that just makes sense. You might also agree that it’s kinda intuitive.

The latter part serves as an explanation for why we change in a non-continuous, contradictory way.

57 / 147

why?

58 / 147

The universe is lazy. Not just you.

59 / 147

Biologically, we are wired to spend as little energy as possible to achieve the most reward.

60 / 147

But did you know this is also true physically?

Something that is not alive also tends to be… lazy.

61 / 147

In physics, there is a family of principles called action principles.

62 / 147

If you came back (or didn’t leave) after reading a few sections, you might still be wondering…

What even is Action?

63 / 147

Action is not energy.

It’s not even force.

It’s not time either.

64 / 147

Why is it an integral—a sum—of the Lagrangian over time?

(at least in Lagrangian mechanics, there are a few different religions of physics that have different definitions of action)

65 / 147

$$S = \int_{t_1}^{t_2} L \, dt$$

The Lagrangian is the difference between kinetic energy and potential energy.

$$L = T - V$$

(given that the energy of the system is conserved).

66 / 147

But why?

$T$? $V$? One subtracted from the other? Why?

67 / 147

Why does the universe care about two arbitrary quantities that we defined?

68 / 147

And it also happens to trust our definition of arithmetic enough to say that

uhh, you should just subtract one from the other

69 / 147

It’s not even trying to minimize the action: the more accurate name is the Principle of Stationary Action. It could be a minimum, it could be a maximum, it could be a saddle point.

But infinitesimal variations around the true path leave the action unchanged to first order.
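A quick numerical sketch of what "stationary" means here. It assumes a ball of mass 1 thrown straight up and caught 2 seconds later under constant gravity, then bends the true path with a small sine-shaped bump. The setup, the function names, and the bump shape are all illustrative choices, not something from the slides.

```python
import math

def action(path, dt, m=1.0, g=9.81):
    """Discretized action: sum of (kinetic - potential) * dt over each step."""
    total = 0.0
    for y0, y1 in zip(path, path[1:]):
        v = (y1 - y0) / dt            # average velocity on this step
        y_mid = 0.5 * (y0 + y1)       # average height on this step
        total += (0.5 * m * v * v - m * g * y_mid) * dt
    return total

def thrown_ball(eps, T=2.0, n=1000, g=9.81):
    """Heights of a ball thrown at t=0 and caught at t=T, plus a bump of size eps."""
    dt = T / n
    v0 = 0.5 * g * T                  # launch speed that lands back at t=T
    path = []
    for i in range(n + 1):
        t = i * dt
        true_y = v0 * t - 0.5 * g * t * t
        path.append(true_y + eps * math.sin(math.pi * t / T))
    return path, dt

base = action(*thrown_ball(0.0))
for eps in (0.01, 0.1, -0.1):
    print(eps, action(*thrown_ball(eps)) - base)
```

Making the bump 10 times bigger changes the action about 100 times more, and flipping its sign changes nothing. The difference grows with the square of the bump, so there is no first-order change at all.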

70 / 147

The reason it doesn’t make much sense is that we are stating the law as if it were a prescriptive law.

71 / 147

It’s like saying that English speakers should say “An umbrella” instead of “A umbrella” because it is the law.

They should say “A university” instead of “An university” because it is the law.

72 / 147

But native English speakers don’t even notice that they are following this law. It’s just that phonetically, “n” creates a discrete boundary between the article and the noun.

73 / 147

Saying “a umbrella” simply mushes the two words together and makes it harder to understand. This doesn’t apply to “a university” because the “y” sound already creates a boundary between the article and the noun.

74 / 147

Back to physics, the principle of stationary action does not dictate that the universe must follow the path of stationary action.

75 / 147

It just says that “this is what we observe. It describes every other thing. It is going to stay here and describe everything else until we find a counterexample”

or something that subverts our expectations.

76 / 147

Usually the most law-breaking counterexamples are those that are really really fast, or really really small.

77 / 147

Light.

78 / 147

When you see a beam of light bending in a cup of water, you might not be surprised.

But have you ever thought: “it doesn’t have to do that?”

79 / 147

Sure, everybody explains it with an analogy.

A lifeguard on the beach can run faster on the sand than in the water, so when he sees a drowning person, he will run on the sand and then dive into the water at an angle that minimizes his time to reach the drowning person.
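The lifeguard's choice can be sketched numerically. The positions and speeds below are made up for illustration: the lifeguard starts at (0, 1) on the sand, the swimmer is at (1, -1) in the water, and the shoreline is the line y = 0. The function names are assumptions too.

```python
import math

def travel_time(x, v_sand, v_water):
    """Total time when the lifeguard enters the water at the point (x, 0)."""
    return math.hypot(x, 1.0) / v_sand + math.hypot(1.0 - x, 1.0) / v_water

def fastest_crossing(v_sand, v_water, lo=0.0, hi=1.0, iters=100):
    """Ternary search for the entry point with the least travel time.

    This works because travel_time is convex in x (one minimum, no bumps).
    """
    for _ in range(iters):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if travel_time(a, v_sand, v_water) < travel_time(b, v_sand, v_water):
            hi = b
        else:
            lo = a
    return 0.5 * (lo + hi)

v_sand, v_water = 7.0, 2.0            # hypothetical running and swimming speeds
x = fastest_crossing(v_sand, v_water)

# Sines of the angles between each leg of the trip and the shoreline's normal.
sin_sand = x / math.hypot(x, 1.0)
sin_water = (1.0 - x) / math.hypot(1.0 - x, 1.0)

print(x)
print(sin_sand / sin_water, v_sand / v_water)
```

The two printed ratios agree to several decimal places. That is Snell's law falling out of nothing more than "take the fastest route".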

80 / 147

But light is pretty chill? It doesn’t have to save anyone. It could just go straight and even if it travels slower in the water, it just doesn’t have to care.

Why does it care? Why does it follow the path of least time?

81 / 147

It is merely an illusion of observation.

82 / 147

You are right! Light doesn’t have to care.

But when it doesn’t care, you just won’t… see it.

83 / 147

A more accurate analogy would be a marching band line going through a boundary of concrete road and mud.

(Yeah, typical rookie mistake to march into the mud. They are just not very good at marching.)

84 / 147

The band practices countless times. It doesn’t have to care about the time, so it simply tries marching at every possible angle.

In reality, this is not what superposition means, but this is the limit of analogy too, so just go with it.

Intuitively, it tries to go in a straight line.

85 / 147

Unfortunately, the moment the leftmost line (from our perspective) steps into the mud, the entire line will be dragged down and slowed down by the mud.

Marching band going through the boundary in a straight line

The rightmost line is still on the concrete, so the leftmost line lags behind

86 / 147

When the leftmost line stomps slower, it goes out of sync. Left, Right, Left, Right coincides with Right, Left, Right, Left.

This angle would make them cancel each other out so it never makes it out of the practice session.

87 / 147

If the rightmost line curves into the mud to meet the leftmost line earlier, the rows will be in sync again.

Marching band going through the boundary in a curve

88 / 147

There is no motive for this brainless marching band to do that.

Marching band going in sync through the boundary

It just so happens that for every other angle they tried, the phases got out of sync, which made the probability of observing those angles lower and lower, until it was vanishingly small.

89 / 147

It’s not as if the marching band is trying to save the drowning person. They are not targeting anything. They are already marching at that angle.

Eventually, the angle that keeps them in sync has the highest probability of being observed.

90 / 147

So now we know!

Light is not consciously trying to minimize the time; the minimum is the result of the interference of all the possible paths.

91 / 147

But why minimum? You mentioned before that it could also be a maximum or a saddle point?

Again, it is not because it wants to be a stationary point; it is rather what properties a stationary point has.

92 / 147

Let’s first assume my audience knows a bit of calculus.

When we say a point is “stationary”, we mean that the rate of change at that point is zero.

Oh boy, how do I connect this back to proverbs?

93 / 147

Meaning that if we nudge the path a teeny tiny amount in any direction, there would be no change to the action (to first order).

94 / 147

All the waves that are out of sync will cancel each other out, and the waves that are in sync will reinforce each other.
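A rough sketch of that cancellation, under made-up assumptions: light goes from (0, 1) in air to (1, -1) in water, crossing the surface y = 0 at some point x, with the speed of light in air set to 1 and a refractive index of 1.33 for water. Each crossing point contributes a little arrow whose angle is the frequency times the travel time. `OMEGA`, the window sizes, and the function names are arbitrary illustrative choices.

```python
import cmath
import math

N_WATER = 1.33     # refractive index of water
OMEGA = 500.0      # angular frequency in arbitrary units (assumption)

def travel_time(x):
    """Travel time via the surface point (x, 0), with speed 1 in air."""
    return math.hypot(x, 1.0) + N_WATER * math.hypot(1.0 - x, 1.0)

def least_time_crossing(lo=0.0, hi=1.0, iters=100):
    """Ternary search for the crossing point where travel time is stationary."""
    for _ in range(iters):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if travel_time(a) < travel_time(b):
            hi = b
        else:
            lo = a
    return 0.5 * (lo + hi)

def phasor_sum(center, half_width, samples=4000):
    """Add up the arrows exp(i * OMEGA * time) for crossings near `center`."""
    dx = 2 * half_width / samples
    total = 0j
    for k in range(samples):
        x = center - half_width + (k + 0.5) * dx
        total += cmath.exp(1j * OMEGA * travel_time(x)) * dx
    return abs(total)

x_star = least_time_crossing()
near = phasor_sum(x_star, 0.2)   # paths around the stationary point
far = phasor_sum(-1.0, 0.2)      # an equally wide bundle of paths elsewhere

print(near, far, near / far)
```

Both bundles contain the same number of paths, but the one around the stationary point adds up to a far larger total, because neighboring travel times there are nearly equal and the arrows point the same way.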

95 / 147

And then the “least action” path is merely the outcome, the descriptive law that we observe, not the prescriptive rule that the universe must follow.

96 / 147

So to answer the question, there is no such thing as “Preference” in the lower level of reality. It’s just that stationary points are more likely to be observed.

97 / 147

Physics is simply a world model. It is a belief system that we choose to adopt because it has a lot of empirical evidence to support it, and it is useful for making predictions and building things.

98 / 147

Classical mechanics is a belief system that is a lossy compression of the complex reality that something might be really really small, or really really fast, and then we need quantum mechanics or relativity to describe it.

99 / 147

But classical mechanics is “good enough” for most of the things we do in our daily life.

100 / 147

Proverbs, of course, are world models.

We observe patterns, and we try to compress them into a simple, catchy phrase that we can easily remember and share with others.

101 / 147

Can we do better?

Can’t we just formalize these patterns into a more accurate, less lossy compression?

Oh, you would be surprised by how many fields of study we’re trying to connect here.

102 / 147

In the world of complexity theory, there is a concept called Approximation Ratio.

Uhh, I might have skipped a few steps here, but let’s first talk about how we can measure the hardness of a problem.

103 / 147

Problems that are easy are those that can be solved in polynomial time. That means the time it takes to solve the problem grows at most as a polynomial function of the input size.

104 / 147

For example, you can find a specific item in a list of items by checking each item one by one. This is called linear time, and it is considered easy.
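As a minimal sketch (the function name and sample list are made up), checking items one by one looks like this:

```python
def linear_search(items, target):
    """Return the index of the first match, or None if it is absent."""
    for index, item in enumerate(items):
        if item == target:
            return index
    return None

books = ["dune", "emma", "ulysses", "beloved"]
print(linear_search(books, "ulysses"))
print(linear_search(books, "hamlet"))
```

In the worst case the loop looks at every item once, so doubling the list doubles the work.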

105 / 147

However, some problems are harder. For example, given a number of items with different sizes, and bins with a fixed capacity, how many bins do you need to fit all the items?

106 / 147

This is called the bin packing problem, and it is considered hard because there is no (known) algorithm that can solve it in polynomial time.

107 / 147

Nevertheless, we have found a way to approximate the solution!
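One well-known heuristic is First-Fit Decreasing: sort the items from largest to smallest and drop each into the first bin with room. This is a sketch of that one heuristic, not necessarily the approximation the slides have in mind, and the item sizes below are made up.

```python
def first_fit_decreasing(sizes, capacity):
    """Greedy bin packing: largest items first, each into the first bin that fits."""
    bins = []
    for size in sorted(sizes, reverse=True):
        if size > capacity:
            raise ValueError("item does not fit in any bin")
        for contents in bins:
            if sum(contents) + size <= capacity:
                contents.append(size)
                break
        else:
            # No existing bin had room, so open a new one.
            bins.append([size])
    return bins

# Here the greedy answer happens to be optimal: two bins.
print(first_fit_decreasing([1, 5, 5, 9], capacity=10))

# Here it is not: [3, 2, 2] and [3, 2, 2] would fill two bins exactly,
# but the greedy rule pairs the two 3s first and ends up needing three.
print(first_fit_decreasing([3, 3, 2, 2, 2, 2], capacity=7))
```

It runs fast and is often right, but the second example shows it landing on 3 bins when 2 would do.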

108 / 147

So the question: can’t we find an algorithm that runs really fast, yet gives a solution that is really close to the optimal solution?

109 / 147

Where is the wall?

110 / 147

In complexity theory, there is a class of problems called NP-hard problems.

111 / 147

These are problems that are at least as hard as the hardest problems in NP.

112 / 147

If we can find a polynomial-time algorithm for any NP-hard problem, then we can solve all problems in NP in polynomial time.

113 / 147

That would imply that P = NP, which is one of the biggest open questions in computer science.

So everything we’re saying here is based on the thin straw assumption that P != NP.

114 / 147

To say that a problem is “at least as hard as” another problem, we can use the concept of reduction.

115 / 147

Reduction means there is a way to transform a problem A into another problem B, such that if we can solve B, then we can also solve A.

116 / 147

One way is to come up with a polynomial-time algorithm that transforms any instance of problem A into an instance of problem B, and then use the solution of B to solve A.

117 / 147

Oh you lost me at “NP”. What are you trying to say?

118 / 147

The takeaway is, we can say that “Bin packing problem” is at least as hard as another problem called “Partition problem”.

119 / 147

The partition problem is to determine whether a given collection of positive numbers (integers, in the classic formulation) can be split into two subsets such that the sum of the numbers in each subset is equal.

120 / 147

For example, given the set $\{0.1, 0.5, 0.5, 0.9\}$, we can partition it into two subsets $\{0.1, 0.9\}$ and $\{0.5, 0.5\}$, both of which sum to $1$.

121 / 147

Put differently, when the numbers sum to $2$, the partition problem asks whether some subset of them sums to $1$.

122 / 147

If we can solve the bin packing problem, then we can also solve the partition problem: just ask whether all the numbers fit into two bins, each with a capacity of half the total sum.

For this special case, the two questions are equivalent!

123 / 147

If the partition problem has a solution, then we can put the items in the two bins according to the partition.

If the partition problem does not have a solution, then we need at least three bins to fit all the items.
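The reduction can be sketched in a few lines. The exact solver below is a brute-force search that only works for tiny inputs, and both function names are made up. The example numbers are the earlier set scaled by 10 so that everything stays an integer.

```python
def min_bins(items, capacity):
    """Exact bin packing by backtracking (exponential time, tiny inputs only)."""
    if any(item > capacity for item in items):
        raise ValueError("item does not fit in any bin")
    best = len(items)          # one bin per item always works
    loads = []                 # current total in each open bin

    def place(i):
        nonlocal best
        if len(loads) >= best:
            return             # cannot beat the best packing found so far
        if i == len(items):
            best = len(loads)
            return
        for b in range(len(loads)):
            if loads[b] + items[i] <= capacity:
                loads[b] += items[i]
                place(i + 1)
                loads[b] -= items[i]
        loads.append(items[i])  # or open a new bin for this item
        place(i + 1)
        loads.pop()

    place(0)
    return best

def can_partition(numbers):
    """Answer the partition question by asking a bin packing question."""
    total = sum(numbers)
    if total % 2 or max(numbers) > total // 2:
        return False
    return min_bins(numbers, total // 2) <= 2

print(can_partition([1, 5, 5, 9]))   # splits into [1, 9] and [5, 5]
print(can_partition([2, 2, 2]))      # no even split exists
print(min_bins([2, 2, 2], 3))        # so three bins are needed
```

Note that `can_partition` does no real work of its own. All the difficulty lives inside `min_bins`, which is the point of a reduction.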

124 / 147

We can say that no polynomial-time algorithm that approximates the bin packing problem can guarantee a ratio better than 3/2 (assuming P != NP).

Meaning that there will always be some inputs where the optimal solution is 2 bins, but the algorithm ends up using at least 3 bins.
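For reference, a sketch of the usual definition behind that number. The symbols $A$, $I$, and $\rho$ are notation introduced here for illustration. A bin packing algorithm $A$ has approximation ratio $\rho$ if, for every input $I$,

$$A(I) \le \rho \cdot \mathrm{OPT}(I)$$

where $A(I)$ is the number of bins the algorithm uses and $\mathrm{OPT}(I)$ is the fewest bins possible.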

125 / 147

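The reason is that if a polynomial-time algorithm guaranteed a ratio better than 3/2, it would have to return exactly 2 bins whenever 2 bins are enough (anything under 3 bins means 2). We could then use it to solve the partition problem in polynomial time, which would imply that P = NP.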

126 / 147

And that is something we don’t have an answer to yet, and would break our worldview if it turns out to be true.

127 / 147

So yeah, you can always be good enough, but there is a wall to perfection.

128 / 147

Will we ever find an AGI that can solve all problems in an instant?

129 / 147

And that’s the point!

An AGI capable of solving all problems perfectly would itself require intractable computation.

130 / 147

In graphics programming, sure, we can just generate a ray-traced image by simulating the physics of light, hitting every single particle in the scene, and calculating the color of each pixel based on that.

131 / 147

But that would be really really slow.

132 / 147

You see, the thing is, a model (not just a machine learning model) is a lossy compression of reality. Not being perfect means it is doing something else reasonably well, such as being efficient, or being interpretable.

133 / 147

Our (current) large language models are approximately solving the problem. If a model is too good, then it’s breaking the wall of computation. If it’s good and fast, then it’s too big, essentially memorizing every single possible input and output pair.

134 / 147

So what is a good proverb?

135 / 147

A good proverb isn’t necessarily one that has enough empirical evidence to support it.

136 / 147

Hell yeah, “Opportunity favors the prepared” and “The early bird catches the worm” are pretty good proverbs.

Not to offend the ornithologists, but nobody is really keeping track of whether the early bird actually catches more worms.

137 / 147

But what is the measure here?

138 / 147

A good proverb is one that is descriptive, not prescriptive.

139 / 147

It describes the patterns that we observe, rather than dictating how we should behave.

140 / 147

Sure, reality is complex and will subvert our expectations, but proverbs have approximation ratios.

Knowing everything with a small set of rules and efficient computation would break our understanding of the world.

141 / 147

Not just proverbs, but also your belief system.

142 / 147

Making mistakes is inevitable, because being perfect means your computation is intractable. You were not meant to find the optimal solution.

143 / 147

But you can still be good enough, and that is what a good proverb is.

144 / 147

I’m not saying that you should learn from mistakes by describing the patterns that you observe, instead of developing a prescriptive rule that “I should never do this again”.

Saying that would contradict the very point of this article: there is no such thing as “should”.

145 / 147

The takeaway is, there are infinitely many paths to take. There is no “right” path, but there are paths that are more constructive and likely to be observed than others.

There are rules that work for you, and there are rules that interfere with you.

146 / 147

Book covers.

147 / 147

Thank you for reading! If you have any feedback or suggestions for future topics, please let me know in the comments or reach out to me on social media.
