• The Cost of Forgetting
  • What is remember?
  • You don’t store information.
  • Predictive Coding and “Surprise”
  • Your memory is your best guess of the reality
  • Humans subconsciously don’t like surprises.
  • The actual cost of forgetting
  • Forgetting pointlessly is even more dangerous.
  • What does it mean when we “want” to forget things?
  • Can we forget faster?
  • Take your time.

The Cost of Forgetting

Why is it hard to forget something? Why is it easier to hold on to the weight than to let go?


Worrying. Suffering. Trauma. Pain. Loss. Guilt. Shame. Regret.

Honestly, if we are so smart, why don’t we just forget about all these things?


We learned in science that physical pain makes us avoid certain things that are harmful to us.

But emotional pain is only as harmful as we imagine it to be. So why do we make our lives harder?


Oh you’re depressed?

Just cheer up?


It just… doesn’t make any sense.

Why can’t we just not suffer from negative emotions?


Why can’t we only remember what we want to remember?


I’ll do you one better.


What is remember?


So what is remembering?


In a strict sense, can we say remembering is the ability to recall a piece of information after having experienced it once?


And now we made it even more complicated… what is experience, and what is recall?

But first, what is information?


Given two coins, each coin has two sides: Head ($H$) or Tail ($T$), so the sample space for each coin is $\{H, T\}$.


The coins were tossed 30 cm up, landed on the table at a volume of 80 dB, and spun for 8.8 seconds before lying flat on the table, 3 cm and 5 cm away from the contact point.

What is the result of this coin toss?


We don’t know lol.


Sorry to disappoint, but this example is to demonstrate what happens when you don’t know something.

There is an uncertainty.


We have a measurement of how much we don’t know about the system.

This is called entropy. It describes how messy the system is.


I could just give it away and say that the information function is defined as

$$I = -\log_2 P(x)$$

where $P$ is the probability function that describes how likely an event is to happen.


But then… why log? why base 2? why negative?

Sure, so this function is merely a “model” of what it means to be information.


First, information describes how much the probability collapses once you know the event certainly happened.

Certainty means it has a probability of 1.


It used to be uncertain. Now it’s certain. So you must know something. You collapsed the probability!


The result of a coin toss is $T$ out of $\{H, T\}$? Then we collapsed the probability from $\frac{1}{2}$ to $1$.

This is a ratio of 2.


The result of two coin tosses was $TH$ out of $\{HH, HT, TH, TT\}$? Then we collapsed the probability from $\frac{1}{4}$ to $1$.

This is a ratio of 4.


Next, because our computers process information in bits (0s and 1s), we want confirming one of 2 equally likely possibilities, say 0 or 1, to be worth exactly 1 bit of information.


Knowing one more bit of information means we collapsed an event that is half as probable.

This is why we use a logarithmic function: adding its outputs corresponds to multiplying its inputs.
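
To make that concrete, here is the two-coin case worked out, assuming the tosses are fair and independent. Multiplying the two probabilities corresponds to adding the two bit counts:

$$\log_2 \frac{1}{\frac{1}{2} \cdot \frac{1}{2}} = \log_2 2 + \log_2 2 = 1 + 1 = 2 \text{ bits}$$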


This gives

$$I = \log_2 \frac{1}{P(x)}$$

which is essentially

$$I = -\log_2 P(x)$$
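
If numbers help more than symbols, here is a minimal Python sketch of that formula. The function name `self_information` is my own label for illustration, not something from the slides.

```python
import math

def self_information(p: float) -> float:
    """Bits of information gained when an event of probability p is confirmed."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)

print(self_information(1 / 2))  # one fair coin toss: 1.0
print(self_information(1 / 4))  # two fair coin tosses: 2.0
```

Halving the probability adds exactly one bit, which is the property we asked for.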


The math is there for people who want to see math so you can forget it now :) poof.

The only thing you need to know is that information means the capability to collapse uncertainty.


Entropy is the measurement of uncertainty. If an event with a probability $P(x)$ happens, you would know $-\log_2 P(x)$ bits of information. Summing that up for all events, weighted by how likely each event is, you have entropy:

$$H = \sum_{x} P(x) \log_2 \frac{1}{P(x)}$$


For two coin tosses, the information you gain is 2 bits: there are 4 possibilities, each with a probability of $\frac{1}{4}$.

So the entropy of this system is 2 bits! To collapse the entropy to be 0, you would need to know exactly 2 bits. The math checks out!
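
A small sketch to check that claim, assuming the two tosses are fair and independent. The helper name `entropy` is hypothetical; it just spells out the sum of $P(x) \log_2 \frac{1}{P(x)}$ over all outcomes.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; outcomes with probability 0 contribute nothing."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

two_tosses = [1 / 4] * 4    # HH, HT, TH, TT
print(entropy(two_tosses))  # 2.0
print(entropy([1 / 2] * 2)) # a single toss: 1.0
```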


Cool! but what does it mean to know that information? Where do you store it?


You don’t store information.


Do you know what you ate yesterday?


Oh?


Hmm…


How do you know that?


Let’s throw you a more abstract question.


Do you know an AND gate?


An AND gate receives two inputs of 0 or 1 and determines “are both inputs 1?”


0, 0 -> are both inputs 1? No! so it returns 0.


0, 1 -> are both inputs 1? No! so it returns 0.


1, 1 -> are both inputs 1? Yes! so it returns 1.


Ok you kind of understand the idea.

How would you describe the rule?

Recall your first class on Logic…


Yes! with a truth table.


| $A$ | $B$ | $A \land B$ |
|-----|-----|-------------|
| 0   | 0   | 0           |
| 0   | 1   | 0           |
| 1   | 0   | 0           |
| 1   | 1   | 1           |


Nice, so you store this cheat sheet and bring it to the exam.

Knowing an AND gate only costs you 12 bits… right?


000010100111…?

What is this gibberish lmao

Ohh, to read a cheat sheet, you need to know how to read it!
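
Here is a rough sketch of where those 12 bits come from, assuming the cheat sheet is just the truth table read row by row. Notice how much the code already has to "know" (row order, column order, what counts as a bit) before the string means anything. The names `and_gate` and `cheat_sheet` are made up for this example.

```python
from itertools import product

def and_gate(a: int, b: int) -> int:
    """Return 1 only when both inputs are 1."""
    return 1 if (a == 1 and b == 1) else 0

# One row per input pair, in the order (0,0), (0,1), (1,0), (1,1).
rows = [(a, b, and_gate(a, b)) for a, b in product((0, 1), repeat=2)]

# Flatten the rows into one string: A, B, A AND B, repeated four times.
cheat_sheet = "".join(str(bit) for row in rows for bit in row)

print(cheat_sheet)       # 000010100111
print(len(cheat_sheet))  # 12
```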


Here’s how to read this cheat sheet:

  1. Take the first number-

what is first

  1. To understand first, first is-

What is understand? How do I know I’m reading in the right order? Wait, what are these texts? Do I understand English?


This is an example of Lewis Carroll’s paradox, the infinite regress of logical foundations. For every rule-processing device, there must be a foundation that exists beyond that rule.


Therefore, we can hypothesize that you must already have a foundation of how you process your sensory input.

(How true this is, is really hard to prove rigorously. We will present this as “why it makes sense to say this is true”, not “this is the absolute truth”.)


Well, yeah obviously, otherwise what you see, hear, smell, feel, or taste would mean nothing to you.

But the paradox was important to bring up too! Right?


Please just let me say I didn’t yap random facts for nothing…


Predictive Coding and “Surprise”

I’m not gonna cite any paper, so we’ll just have one Wikipedia page here: Predictive coding


In neural networks, we have this concept of “backpropagation”.


To keep you on the same page, neural networks are a type of machine learning model, inspired by the structure and function of the human brain.


And… because we don’t fully understand how the brain works, a more accurate description is that they are inspired by our best guess of how the human brain works — then generalized to be versatile and computationally tractable.


A node in neural networks consists of a function that takes an input, and yields an output.


Another node then takes that output as its input, performs another computation, and sends it down the line.


Eventually, it reaches an observable output layer: the result of the prediction.
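
As a toy illustration of nodes feeding nodes, here is a sketch with two made-up node functions. Real networks use many nodes per layer with learned weights; `node_a`, `node_b`, and their constants are assumptions chosen only to keep the arithmetic visible.

```python
def node_a(x: float) -> float:
    # Hypothetical first node: scale the input.
    return 2.0 * x

def node_b(x: float) -> float:
    # Hypothetical second node: shift, then clip negatives to zero.
    return max(0.0, x - 1.0)

def forward(x: float) -> float:
    """Pass the input down the line: node_a first, then node_b."""
    return node_b(node_a(x))

print(forward(3.0))   # 2 * 3 = 6, then 6 - 1 = 5.0
print(forward(-1.0))  # 2 * -1 = -2, then clipped to 0.0
```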


For example, detecting a number 3 would be a series of nodes that ask questions such as: “Does it have a top curve?”, “Does it have a bottom curve?”, “Is it open on the side?”, and so on.


But instead of hard-coding the questions to answer, we just say:

You are responsible for detecting features that will help your subordinates pick the right answers


So if I see my junior being confused between “3” and “8”, my work is to measure the openness of the left side.


Your junior might be confused between “1” and “7”, so your work is length and angle detection.


Together we make up the neural network. We don’t assign ourselves restricted sets of tasks; we see the problems at hand and ask what we can do.


You said you were going to explain what backpropagation is

Right, right. Please be patient!


Let’s say we have a unit of a job: “predict what this number is: 9”

The image is broken down into pixels, each layer detects a feature, sends it down the line, and eventually we have…


8


Oh that’s wrong.


How do you know who’s responsible for this????

Someone has to be responsible for this.


First we would have a measurement of how far off our prediction is.

This is called a loss function.


Hehe. Again, let’s assume my audience knows calculus.

Plugging in our goal, we can see from the loss curve which path we can take to minimize the loss.


It might not be zero. It might not be the absolute minimum. But we’re finding the location along the loss curve where we can’t optimize further.
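
A minimal sketch of that idea, using plain gradient descent on a made-up loss curve with two valleys. The curve, the starting point, and the learning rate are all assumptions for illustration. Starting from `w = 2.0`, the walk settles in the shallower valley near `w ≈ 1.13`, where the loss is neither zero nor the lowest possible (the deeper valley sits near `w ≈ -1.30`).

```python
def loss(w: float) -> float:
    # Made-up loss curve with two valleys (assumption for illustration).
    return w**4 - 3 * w**2 + w

def slope(w: float) -> float:
    # Derivative of the loss curve above.
    return 4 * w**3 - 6 * w + 1

w = 2.0               # arbitrary starting guess
learning_rate = 0.01  # small step size so the walk stays stable

for _ in range(2000):
    w -= learning_rate * slope(w)  # take a small step downhill

print(round(w, 3), round(loss(w), 3))
```

Once the slope is flat, the steps shrink to nothing, so the walk stops even though a better valley exists elsewhere.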


We choose the next goal. Then, every department asks themselves

How would I change my contribution in order to reach that goal?


We then send this OKR (objectives and key results) back up to upper and upper management, readjust our system, and then move on to the next job.

This is backpropagation!
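
Here is a tiny sketch of that loop, assuming a chain of just two "departments", each with a single weight. It is nowhere near a real digit classifier. The starting weights are picked so the first prediction is 8 when the target is 9, and all names are hypothetical.

```python
def forward(w1: float, w2: float, x: float):
    hidden = w1 * x            # first department's output
    prediction = w2 * hidden   # second department's output
    return hidden, prediction

def backprop_step(w1, w2, x, target, lr=0.01):
    hidden, prediction = forward(w1, w2, x)
    loss = (prediction - target) ** 2
    # Backward pass: apply the chain rule from the output back to w1.
    d_prediction = 2 * (prediction - target)
    d_w2 = d_prediction * hidden
    d_hidden = d_prediction * w2
    d_w1 = d_hidden * x
    # Each weight moves a little against its own share of the blame.
    return w1 - lr * d_w1, w2 - lr * d_w2, loss

w1, w2 = 2.0, 4.0      # assumed starting weights: 2 * 4 * 1 = 8
x, target = 1.0, 9.0

for _ in range(200):
    w1, w2, loss = backprop_step(w1, w2, x, target)

print(forward(w1, w2, x)[1])  # very close to 9 after training
```

Note the strict order inside `backprop_step`: a full forward pass first, then a full backward pass. That ordering is the "director" the later slides complain about.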


In neuroscience, we can’t say with 100% certainty that the brain works the same way, because the brain is literally studying itself.


But here’s the thing. Scientists argue that human brains are unlikely to behave that way.

We don’t do back-propagation in a traditional sense.


Backpropagation requires a director to switch between “forward” and “backward” passes. But there is no evidence that our brain pauses to reflect, then predict, then reflect again.


And your brain doesn’t seem to have an explicit loss function to compute error, nor a way to evaluate derivatives with the chain rule. Brain signals are all-or-nothing, not precise floating-point numbers.

…at least by evidence.


In reality, my boss doesn’t wait until we see the final user interface to say that there is a bug in the server implementation.


We have CI/CD. We have unit tests. All “performance reviews” are done continuously, in a tighter loop.


Did I accidentally expose secrets into the version control history?

Of course, I’m going to screw this up. And my boss doesn’t have to wait and see the final result.


This is called “predictive coding” — there is no hard boundary between predicting and learning. The signals are asynchronous and local. Each layer predicts what the next layer will do, and readjusts continuously.
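
To get a feel for "local and continuous", here is a heavily simplified toy in the spirit of predictive coding. It is not a model of real neurons and not how any particular paper does it. One unit holds a belief `mu` and a weight `w`, predicts the incoming signal as `w * mu`, and both values are nudged using only the local prediction error. Every name and number here is an assumption.

```python
def predictive_coding_step(mu: float, w: float, x: float, lr: float = 0.1):
    prediction = w * mu      # what this unit expects to sense
    error = x - prediction   # the "surprise", available locally
    # Inferring (update the belief) and learning (update the weight)
    # happen in the same step, driven by the same local error.
    new_mu = mu + lr * error * w
    new_w = w + lr * error * mu
    return new_mu, new_w, error

mu, w = 0.1, 0.5   # arbitrary starting belief and weight
x = 1.0            # the sensory signal that keeps arriving

errors = []
for _ in range(300):
    mu, w, error = predictive_coding_step(mu, w, x)
    errors.append(abs(error))

print(errors[0], errors[-1])  # surprise starts large and shrinks toward 0
```

There is no separate backward pass here: predicting and adjusting are the same step, which is the contrast with the backpropagation loop described earlier.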


As a matter of fact, in recent years, scientists have tried to remodel deep learning with this concept.


We learn as we predict. We learn as we make errors.

And that’s when we remember not to make mistakes again.


Your memory is your best guess of the reality


From previous sections, we have our hypotheses that

  1. Information is the ability to collapse the uncertainty.
  2. There is a foundational rule processing device that makes sense of the reality.
  3. Learning is a continuous process of minimizing sensory surprise by adjusting the rules.

So, building on these foundations, “memory” can’t really physically exist, the same way information doesn’t physically exist.


Knowing or not knowing the result of a coin toss doesn’t affect the actual coin. It doesn’t go back to spinning just because you don’t know the result.


Memory can only be your own ability to collapse the countless possibilities of a query.

Memory is your learned set of rules!


Of all the countless possibilities, there is a coffee mug on the table.

You predict: “there is still going to be a coffee mug on the table if I reach my hand out for it”


You grab the coffee mug. Your sense of touch confirms your prediction was accurate.


How about a memory test game?


Do you know that pairing game where the cards are placed face down and you have to pick two cards that match?


That is also testing your ability to collapse the uncertainty!

Picking 2 out of 16 cards at random and getting a match has a $\frac{1}{15}$ chance, and you collapse that to $1$.
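
You can check that number by brute force. This sketch assumes the 16 cards form 8 matching pairs, which the slide does not state explicitly, and simply counts every possible pick of two cards.

```python
import math
from fractions import Fraction
from itertools import combinations

# 16 cards: each of 8 images appears exactly twice (assumed layout).
cards = [image for image in range(8) for _ in range(2)]

picks = list(combinations(range(len(cards)), 2))
matches = sum(1 for i, j in picks if cards[i] == cards[j])

p_match = Fraction(matches, len(picks))
print(p_match)  # 1/15

# Information gained by knowing a matching pair instead of guessing:
bits = math.log2(p_match.denominator / p_match.numerator)
print(bits)     # log2(15), a bit under 4 bits
```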


Okay! So before this I told you the result of two coin tosses.

What was it?


Hey, what was it?


So now the probability of $1$ exploded back to $\frac{1}{4}$.

Forgetting recreates surprise!
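
In entropy terms, a sketch under the same fair-coin assumption as before: while you remember the result, your uncertainty about the two tosses is 0 bits, and once you forget, it is back to 2 bits. The helper `entropy` and the two lists are illustrative names only.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; outcomes with probability 0 contribute nothing."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Your belief over the outcomes HH, HT, TH, TT.
remembered = [0, 0, 1, 0]   # you are certain which outcome it was
forgotten = [1 / 4] * 4     # every outcome looks equally likely again

print(entropy(remembered))  # 0.0
print(entropy(forgotten))   # 2.0
```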


Humans subconsciously don’t like surprises.


Some people say, “What? Speak for yourself! I do like surprises.”


But that’s just because, linguistically, “surprise” carries more positive connotations than negative ones.

Doctors don’t tell you “surprise! you have 3 months to live.” That’d be cruel.


You just like the positive elements of surprise which usually outweigh the unexpectedness of it.


You like the change of pace because it breaks the monotony. But you don’t really choose to like the change itself — you just like what comes with it.


Subverted expectations are still a big enemy of something that makes predictions all the time.


Think about the wildest dream you’ve ever had. Your sensory perception is completely blacked out — you can’t feel the ground while running. Your memory fills in the gaps, trying to make sense of the chaos.


That’s why in the dream you think “of course that should happen — it happened last time”, but you wake up completely confused because it clearly shouldn’t have.


The changes of neural activity in your brain are subconsciously minimizing the error of your prediction. Good or bad, it makes our expectations match reality.


The actual cost of forgetting


We’ve spent a long time discussing what it means to remember: to adjust your rule-processing device and make sense of reality.


In the dream example, getting facts wrong is just a goofy, funny moment. It doesn’t hurt you, right?


Well, it does. A dream also has to keep you asleep and not too excited, so that your body can properly stay in maintenance mode.


And in general,

Forgetting pointlessly is even more dangerous.


Because knowledge doesn’t directly equate to behavior; it simply passes information to the next “layer” to decide.


Suppose there is a mushroom which can be poisonous or not. The probability that eating this species of mushroom harms your body is 50%.


50% is simply the share of poisonous mushrooms in this hypothetical universe, because that is how we defined it.

Reading this, you will never make the simple mistake of confusing probability with possibility. Just because there are two possibilities doesn’t mean the probability distribution is uniform.


Would you eat it?


Your body would want to actively avoid the damage. It doesn’t consider dying with coin-toss probability a good deal.


However, good news (for some of you): the mushroom is brown!

Suppose you know that only 10% of brown mushrooms are poisonous!


Would you still eat it?


Hey, what if it gives you jump boost for 10 minutes?

Would you eat it?


As you can see, knowledge doesn’t directly control behavior. If you saw my proverbs book, you would still judge a book by its cover, and say “147 f**king pages, I’m not gonna read it”


Some say, 10% is not worth the risk, regardless of whatever positive effects you’re going to throw at me.


According to your experience, you might consider the cost of dying to be $\infty$, so as long as the probability is non-zero, there is no way you’re eating it.
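
One way to picture this "next layer" decision is as an expected-value calculation. This is only a sketch: the costs and benefits below are invented numbers, and nobody is claiming your brain literally runs this formula.

```python
import math

def expected_value(p_poison: float, cost_poison: float, benefit: float) -> float:
    """Expected payoff of eating: the benefit if safe, minus the cost if poisonous."""
    return (1 - p_poison) * benefit - p_poison * cost_poison

# Hypothetical units: the cost of being poisoned is 100, the benefit is 10.
print(expected_value(0.5, 100.0, 10.0))     # coin-toss mushroom: clearly negative
print(expected_value(0.1, 100.0, 10.0))     # brown mushroom: still slightly negative
print(expected_value(0.1, 100.0, 50.0))     # brown mushroom plus jump boost: positive
print(expected_value(0.1, math.inf, 50.0))  # infinite cost of dying: -inf
```

The same knowledge (10% poisonous) leads to different behavior depending on how the cost and benefit are weighed, and with an infinite cost no benefit can ever tip the balance.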


To evaluate whether an event is beneficial to you or not, your brain listens to the sensory and chemical signals released by your body.

The “two” systems work in tandem.

(Saying “two” is a big oversimplification, but eh, anyway.)


You can trace your chain of thoughts, but not why you had that first thought in the first place. It was sent from your subconscious knowledge layer.


So… according to this model, we can’t really actively forget things. It would mean carelessly maximizing your prediction error, and lowering your chance of surviving.


(Again, not saying that this is true. It’s just a hypothesis built on top of another hypothesis that is likely true… Half of science is like that anyway, and the other half is empirically proving or disproving it.

In the same way, when we believed our brains should be able to “back-propagate”, we still correctly predicted various phenomena, even though there is newer evidence against back-propagation in the human brain.

So just have fun hehe.)


Past knowledge gets updated through new events that you survive. If rewriting a rule improves your survival odds, your brain is happy to forget the old one, or even hallucinate a new one.


What does it mean when we “want” to forget things?


The system glitch is that a high-stress environment — a toxic relationship, overwhelming workload, or failed social interaction — can make you over-prioritize fixing a behavior.


You re-predict the memory over and over to prioritize fixing your behavior. Your body then reacts to the replayed memory, so that your conscious behavior can be readjusted.


But again, it’s your conscious behavior. You know you’re replaying this memory just to inflict pain. What is this nonsense! Some part of you is gonna say

I want to forget it…


Forgetting here doesn’t mean erasing your memory.

It doesn’t mean overwriting it with new knowledge.

Your brain doesn’t allow invasive rewrites like that.


It just means becoming indifferent to the thoughts, and stopping your overcorrection.


It means consciously calming your nerves and telling yourself that you are safe.


It’s not easy because that’s what you predicted, and maximizing error is not the goal.

It’s not easy because it saves your life from danger, and maximizing the risk of not surviving is not the goal.


So the life hack (kinda?) is to replay those memories while you’re safe — to tell yourself: “that rule isn’t relevant anymore. It’s causing me pain. Let’s let it go.”


Can we forget faster?


So until now, we have found one other thing:

Forgetting is just relearning that a rule is wrong. Emotional pain synthetically creates a comparison: “not knowing this would be way better.”


Asking “can we forget faster?” is at least as hard as asking “can we learn faster?”


You can’t learn faster because your brain needs enough examples before it generalizes a rule. Rushing it means drawing the wrong conclusions.


Saying “just cheer up” is like someone handing you a physiology textbook and asking why you’re not a doctor yet.

Learning takes time. You can’t skip the repetitions.


Take your time.


Go create that false danger and subvert that expectation by telling yourself you’re safe.


At some point, you will stop reacting to it. That’s when we forget.


And who knows, that knowledge might be rewritten entirely when you don’t need it to predict the danger anymore.


Sometimes, an event is only worth “the result of a coin toss”. It’s just one bit of information that doesn’t make your life easier.


Remembering or forgetting it doesn’t inject the uncertainty back into the coin.


The world revolves around you, but also around something else when you’re not there.