/braindump/
TODOs
- learn: Lean 4.
- read: “Functional Programming in Lean”
- read: “Theorem Proving in Lean 4”
- read: “Theorem Proving in Lean”
- learn: more cryptography.
- read: “Moonmath Manual”
- read: “Proofs, Arguments, and Zero Knowledge”
- read: “An Introduction to Mathematical Cryptography”
- read: Shannon’s papers.
blurbs
I can make an argument that the most important theorem in the world is Stokes’ theorem. Let me first give the intuitive statement, followed by a formal enough statement, and explain with examples. Since this is just a short form, I probably won’t stand on the soapbox to make the “greatest theorem” argument today.
Theorem (Stokes’, intuitively): Given an object of arbitrary dimension, and a vector-valued function defined on said object, it must be that summing up interior values of the changes of the vector-valued function inside the object gives the same output as if you summed up the values themselves on the edge of the object.
The formal statement is of course more precise, but at a cost of more hours of mathematical decryption. Nonetheless: Theorem (Stokes’): Let $M$ be a smooth (enough) $n$-dimensional manifold with boundary $\partial M$. Let $\omega$ be an $(n-1)$-form and $d$ the exterior derivative. Then, $$ \int_{M} d\omega = \int_{\partial M} \omega $$
Some quick remarks:
- Manifolds are objects we can make out of potentially higher dimensional playdough; dimension is just the number of unique directions you can walk on the object.
- Differential forms are a bit more than just “vector-valued functions”, though I never said what vector spaces I was referring to and any other equipment it carried. Sometimes to be intuitive to newcomers, you have to lie or withhold information a bit – sorry.
- The exterior derivative is the correct generalization of the derivative you learn about in Calc I and gradient/divergence/curl you learn in III.
- $M$ probably has to be second countable or something, but I truly don’t care to add even more stipulation here. These days I find myself thinking less like a mathematician, more like a physicist.
Okay, so what the hell does any of this mean? What are some examples of this in everyday life? “I already closed this blurb and left angry.”
Let’s think of real world examples and come to a conclusion on what this theorem tells us. Consider a line segment $[a,b]$ and consider a function $f\colon [a,b] \to \mathbb{R}$ with derivative $df = \frac{\partial f}{\partial x}dx$. For the sake of grounding ourselves in reality, imagine this function $f$ describes a fluid pressure at a given point $x\in [a,b]$ where $[a,b]$ itself could be thought of as a 1-dimensional tube. The derivative of $f$, $df$, then tells us the rate of change of $f$ (pressure) per unit length (the $\partial f / \partial x$) times unit length (times $dx$). If we then accumulate the changes of $f$ relative to the length $dx$ along all of $[a,b]$, we should know the total change in pressure ($f$) from one end to the other.
Symbolically we compute the above as: $$ \int_{[a,b]}df = \int_a^b \frac{\partial f}{\partial x}dx. $$
There’s a simpler way to do this though, I’d argue. Some of you just realized it from the symbols above, a handful of you from the description I provided above, and I’ve likely confused the rest. Nonetheless, let’s ignore the symbols.
Physically, we know something. If the interfaces of this tube are solely at $a$ and $b$, then the flow in the tube is completely defined by the pressure differences at that interface. (For those of you with background in electromagnetism, consider the sentence I just said but with voltage and current replacing pressure and flow respectively.) That is, fluid flows from interfaces of higher pressure to lower – if you’ve ever used a straw or a vacuum cleaner, you know this. If this is to be true, then adding up the internal pressure changes from $a$ to $b$ cannot possibly deviate in result. That is, unless some magical force existed inside the tube somewhere.
If all I did was add the changes of $f$ from $a$ until I got to $b$, certainly I could just have looked at $f(a)$ and $f(b)$ and found their difference to know how $f$ changed over this whole tube. Symbolically: $$ \int_{\partial [a,b]} f = \int_{\{a,b\}}f = f\vert_a^b = f(b) - f(a) $$ and we realize: $$ \int_{[a,b]}df = \int_{\partial [a,b]} f $$ or in other words: $$ \int_a^b \frac{\partial f}{\partial x}dx = f(b)-f(a) $$ the fundamental theorem of calculus.
The magical force? It can only exist if $df$ wasn’t the derivative of a function $f$.
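For a concrete sanity check, take $f(x) = x^2$ on $[1,3]$, so that $df = 2x\,dx$: $$ \int_{[1,3]} df = \int_1^3 2x\,dx = 9 - 1 = 8 = f(3) - f(1) = \int_{\partial [1,3]} f. $$ Both sides agree, just as the theorem demands.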
going up a dimension
One dimension??? Boring! Two dimensions? Now we’re talking!
Consider the same scenario, but now instead of $M=[a,b]$, we take $M=\mathbb{B}^2$, the 2-dimensional unit disk. In this case, consider $\mathbf{v}$ to be a fluid velocity (vector field). If I have this $\mathbf{v}$ defined in the disk $\mathbb{B}^2$, I can compute how this field diverges $\nabla \cdot \mathbf{v}$. Briefly, divergence $\nabla \cdot \mathbf{v}$ tells me a value at each point $x\in \mathbb{B}^2$ for how quickly a fluid particle placed at $x$ would accelerate radially away from that point on average. Said differently, a positive value of divergence indicates a point where there is a source of fluid incoming to the region whereas a negative divergence indicates a sink of fluid (i.e., where fluid drains from the region).
Adding up this average radial (outward) acceleration or the sources/sinks of fluid in $\mathbb{B}^2$ can be done like so: $$ \int_{\mathbb{B}^2}\nabla \cdot \mathbf{v}. $$ However, if we knew all the amount of the fluid entering and leaving in the inside of the disk, we know that in a closed system, that much fluid must also pass through the boundary of the disk as well.
Play the experiment in your head with a simple setup. Consider two parallel glass plates held next to each other with just a small air gap between them. Cut a hole in one plate, and pump in air through that hole. On the outside of these plates, you would feel air escaping outward, right? Certainly, and I mean certainly, the amount of air coming out the outer edge of the plates must be the same that you pump in to the center, right? If not, where is it going?
Alas, this is true. Not just physically, but mathematically so. Stokes’ theorem tells us! It must be that $$ \int_{\mathbb{B}^2}\nabla \cdot \mathbf{v} = \int_{S^1} \mathbf{v}\cdot \mathbf{n} $$ where $S^1$ is the boundary circle of the disk $\mathbb{B}^2$ and $\mathbf{n}$ is the outward unit vector perpendicular to $S^1$. That is, the right hand side integral counts how much fluid is flowing out of the disk through the edge. If there are sources in the interior, then a positive amount of fluid flows out the edges, and vice versa.
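To see the two sides agree in a concrete case, take the radial field $\mathbf{v}(x,y) = (x, y)$ on the disk. Then $\nabla \cdot \mathbf{v} = 2$, so $$ \int_{\mathbb{B}^2} \nabla \cdot \mathbf{v} = 2\,\operatorname{area}(\mathbb{B}^2) = 2\pi, $$ while on the boundary $\mathbf{v} \cdot \mathbf{n} = x^2 + y^2 = 1$, so $$ \int_{S^1} \mathbf{v} \cdot \mathbf{n} = \operatorname{length}(S^1) = 2\pi. $$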
The general result is massive. Stokes’ theorem doesn’t just give us the fundamental theorem of calculus in dimension one and the divergence theorem in dimension two; it gives us a result that is true in arbitrary dimension. It also encodes high-caliber relationships such as Poincaré(-Lefschetz) duality and Taylor’s theorem, provides us a way to write down global gauge symmetries, rears its head in elliptic theory to provide insight on fundamental mathematical physics concepts like fluid dynamics, electromagnetism, and heat flow, and is immensely useful in geometric inverse problems as it provides a means to extract interior data from non-invasive boundary data.
Let this just be step one of this discussion. There’s a lot more to say, and I probably didn’t even give this the best explanation or glory in this 20 minute speedrun blurb.
The Fourier transform is such a general and ubiquitous entity in mathematics. One of my goals is to tie up some loose ends I had in my understanding of all of it. So, we’ll start with at least something here. As always, there’s a tendency to lack a bit of rigor and devout formalism in these blurbs, so take what I’m saying with a grain of salt and DYOR (I hope this pushes you to DYOR!).
What is perhaps the most basic property of the Fourier transform (or anything like it)? The key concept is algebraic: the Fourier transform is an isomorphism of algebras that maps convolution to pointwise multiplication, which is the content of the convolution theorem. In precise settings, you may get more from this property than you might expect.
Let $G$ be a group, then the Fourier transform is an isomorphism of algebras $\mathcal{F} : L^2_\mathbb{F}(G) \to L^2_\mathbb{F}(\widehat{G_\mathbb{F}})$ where $\widehat{G_\mathbb{F}}$ is the dual group of $G$. Briefly, I am obsessively tagging these items with subscripted $\mathbb{F}$ whereas you may see nothing of the sort elsewhere. What are all of these things?
- $L^2_\mathbb{F}(G)$ is the Hilbert space of square-integrable $\mathbb{F}$-valued functions on $G$ where $\mathbb{F}$ is some field (some of this can be done with a ring $R$ instead and some of this is easiest with $\mathbb{F}=\mathbb{C}$) – note that I am not really assuming that $G$ is some smooth group or anything as I will mean integration in a very general sense (e.g., sums in discrete land).
- The dual group $\widehat{G}_\mathbb{F}$ is the set of all (continuous) characters $$\chi : G \to \mathbb{F}.$$ The additional important restriction here is that characters are invariant over conjugacy classes in $G$, that is $\chi(hgh^{-1})=\chi(g)$. We will get to details of characters later on.
First, we see that $L^2_\mathbb{F}(G)$ is an algebra under convolution (as multiplication) and pointwise addition. Let $u, v \in L^2_\mathbb{F}(G)$, then the convolution of $u$ and $v$ written as $u \ast v$ is defined by: $$(u \ast v)(g) = \int_{h \in G} u(h) v(gh^{-1}) d\lambda(h)$$ where concatenation is the multiplication in $\mathbb{F}$, and we write the group operation in the group $G$ as concatenation with inverse as the superscript $-1$. Further, I have appended a measure on $G$ via $d\lambda$ which is a translation invariant measure on $G$ called the Haar measure. This means $d\lambda(gh) = d\lambda(g)$ for all $h \in G$ which, in a sense, means that $\lambda$ sees all of $G$ as homogeneous (there’s no lumpiness or concentration of mass in $G$ that $\lambda$ sees).
Now, the convolution theorem states that the Fourier transform $\mathcal{F}$ is an isomorphism of algebras in that $$ \mathcal{F}(u\ast v) = \mathcal{F}(u) \cdot \mathcal{F}(v) $$ where $\cdot$ is pointwise multiplication in $L^2(\widehat{G}_\mathbb{F})$. This is the property we want to build toward, since it is immensely powerful.
Also very important is that the mapping $L^2_\mathbb{F}(G) \times L^2_\mathbb{F}(G) \to \mathbb{F}$ via: $$ \langle u, v\rangle \coloneqq \frac{1}{|G|}\int_{g \in G} u(g)^{\textrm{op}} v(g) d\lambda (g) $$ is an inner product on $L^2_\mathbb{F}(G)$ and $\textrm{op}$ signifies a relevant adjoint used if necessary (we can always construct such an inner product, and any two choices are proportional; see here, which will be helpful later too). This is where the conjugacy class invariance will really shine.
Here’s where things get amazing. Let’s use the following theorem which is, in a sense, a generalization of Maschke’s theorem for finite groups which can be proven using the Wedderburn-Artin theorem.
Theorem (Peter-Weyl): We have that $$ L^2_\mathbb{F}(G) = \overline{\bigoplus_{\pi \in \Sigma}} E_\pi^{\oplus \dim E_\pi} $$ where $\Sigma$ is the set of isomorphism classes of irreducible representations of $G$ and the overline denotes the closure in $L^2_\mathbb{F}(G)$.
Now, if we can just find a way to project onto the irreducible representations, we can decompose any function in $L^2_\mathbb{F}(G)$ into its characteristic components and, with any luck, tie up some loose ends in my above ramblings. The means to do so is another immense result. First, recall our notion of characters as conjugacy-class-invariant maps $\chi \colon G \to \mathbb{F}$; these (when they are morphisms) pair perfectly with the translation-invariant measure in our inner product and, together with the following result, give us the power we need:
Lemma (Schur): Let $R$ be a ring and $M, N$ be simple $R$-modules. Let $\varphi \in \text{Hom}_R(M, N)$, then $\varphi$ is either an isomorphism or the zero map.
If we look above, we see that $L^2_\mathbb{F}(G)$ is split into non-isomorphic irreducible representations which just so happen to be simple $\mathbb{F}[G]$-modules (a brief on the relationship here).
Then, we can project onto these irreducible representations using the irreducible characters of $G$ since these characters are exactly the functions that split $L^2_\mathbb{F}(G)$ as:
Corollary: The projection of $u \in L^2_\mathbb{F}(G)$ onto the irreducible representation $E_\pi$ is possible by virtue of the orthogonality of characters: $$ \langle \chi_\pi, \chi_{\pi'} \rangle = \frac{1}{|G|}\int_{g \in G} \chi_\pi(g)^{\textrm{op}} \chi_{\pi'}(g) d\lambda(g) = \begin{cases} d_{\pi \pi'} & \pi = \pi' \\ 0 & \pi \neq \pi' \end{cases} $$ where $d_{\pi \pi'}\neq 0$ for all $\pi, \pi'$.
Hence, we now have a mapping $\mathcal{F} : L^2_\mathbb{F}(G) \to L^2_\mathbb{F}(\widehat{G}_\mathbb{F})$ which is an isomorphism of algebras and we can project onto the irreducible representations of $G$ using the characters of $G$. In particular, let $u\in L^2_\mathbb{F}(G)$ then the map $\mathcal{F}$ is given by: $$ \mathcal{F}(u) = \int_{\pi \in \Sigma} \langle u, \chi_\pi \rangle \chi_\pi $$ which now is a sum of $\mathbb{F}$-linear combinations of the irreducible characters of $G$ which are exactly elements of $\widehat{G}_\mathbb{F}$ which form a basis for $L^2_\mathbb{F}(\widehat{G}_\mathbb{F})$.
The inverse Fourier transform is given by: $$ \mathcal{F}^{-1}(\mathcal{F}(u)) = \int_{\pi' \in \Sigma} d_{\pi \pi'}^{-1}\left\langle \int_{\pi \in \Sigma} \langle u, \chi_\pi \rangle \chi_\pi, \chi_{\pi'} \right\rangle \chi_{\pi'} = u $$ where we have used the orthogonality of characters in the inner product.
This shows that the Fourier transform is an isomorphism of $\mathbb{F}$-vector spaces, but we need to show that it is an isomorphism of algebras. Now, if we take specifically the component of $u \ast v$ in the irreducible representation $E_\pi$, we have that: $$ \begin{align*} \mathcal{F}_{\chi_\pi}(u \ast v) &= \langle u \ast v, \chi_\pi \rangle \\ &= \left \langle \int_{h \in G} u(h) v(gh^{-1}) d\lambda(h), \chi_\pi\right \rangle \\ &= \int_{l \in G} \int_{h \in G} u(h) v(l h^{-1}) \chi_\pi(l) d\lambda(h) d\lambda(l) \end{align*} $$ and we realize that, to get to our intended answer, we need something slightly stronger. In particular, if $\chi \colon G \to \mathbb{F}$ is multiplicative (i.e., $\chi(gh)=\chi(g)\chi(h)$), then we can do a change of variables $l \mapsto k = lh^{-1}$ (so that $l = kh$) and we get: $$ \begin{align*} \mathcal{F}_{\chi_\pi}(u \ast v) &= \int_{l \in G} \int_{h \in G} u(h) v(l h^{-1}) \chi_\pi(l) d\lambda(h) d\lambda(l) \\ &=\int_{k \in G}\int_{h\in G} u(h) v(k) \chi_{\pi}(kh) d\lambda(h) d\lambda(k) & \text{by translation invariance of $\lambda$}\\ &= \int_{k \in G} \int_{h \in G} u(h) v(k) \chi_\pi(h)\chi_\pi(k) d\lambda(h) d\lambda(k) & \text{by multiplicativity} \\ &= \int_{h \in G} u(h) \chi_\pi(h) d\lambda(h) \cdot \ \int_{k \in G} v(k) \chi_\pi(k) d\lambda(k) \\ &= \mathcal{F}_{\chi_\pi}(u) \cdot \mathcal{F}_{\chi_\pi}(v) \end{align*} $$
Now, we have a nice result that:
Theorem: Let $G$ be an abelian group. Then the irreducible representations are 1-dimensional and the Fourier transform is an isomorphism of algebras. That is, all characters of irreducible representations are multiplicative. Lastly, $d_{\pi \pi'} = 1$ for all $\pi, \pi'$.
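As a concrete instance (a sketch using the standard conventions for $\mathbb{F} = \mathbb{C}$, which may differ from the normalizations above), take $G = \mathbb{Z}/n\mathbb{Z}$. The characters are $\chi_k(g) = e^{2\pi i k g / n}$ for $k = 0, \ldots, n-1$, and the transform specializes to the discrete Fourier transform $$ \mathcal{F}(u)(k) = \sum_{g=0}^{n-1} u(g)\, e^{-2\pi i k g / n}, $$ so the convolution theorem becomes the familiar statement that cyclic convolution of sequences corresponds to pointwise multiplication of their DFTs.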
Let’s try to think of a function $\delta$ where we can compute the convolution with any function $u$ and get back $u$, i.e., $u \ast \delta = u$, where $\delta$ is the multiplicative identity element of the convolution algebra. By definition, we have: $$ (u \ast \delta)(g) = \int_{h \in G} u(h) \delta(gh^{-1}) d\lambda(h) = u(g). $$ Note that we can also write $\delta_g(h) = \delta(gh^{-1})$ by which we see that $\delta_e(h)=\delta(h)$ where $e\in G$ is the identity of $G$. Our question now is, given some group $G$, what is this $\delta$?
Given now that we have the Fourier transform as an algebra isomorphism $\mathcal{F} \colon L^2_\mathbb{F}(G) \to L^2_\mathbb{F}(\widehat{G_\mathbb{F}})$, then we have: $$ \mathcal{F}(u \ast v) = \mathcal{F}(u) \cdot \mathcal{F}(v) $$ and: $$ \mathcal{F}(u) = \mathcal{F}(u \ast \delta) = \mathcal{F}(u) \cdot \mathcal{F}(\delta) = \mathcal{F}(u) $$ and we see that the isomorphism (as is required) drags the identity in $L^2_\mathbb{F}(G)$ to the identity in $L^2_\mathbb{F}(\widehat{G_\mathbb{F}})$.
Given our results, we can use the inverse Fourier transform to find this $\delta$. In particular, we have that the multiplicative identity in $L^2_\mathbb{F}(\widehat{G_\mathbb{F}})$ is $1$ and hence we have: $$ \begin{align*} \mathcal{F}^{-1}(1) &= \int_{\pi \in \Sigma} \langle 1, \chi_\pi \rangle \chi_\pi \\ \end{align*} $$
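For a concrete case, take $G = \mathbb{Z}/n\mathbb{Z}$ with the counting measure and the standard DFT normalization (again, a sketch whose conventions may differ from those above). Then $$ \mathcal{F}^{-1}(1)(g) = \frac{1}{n}\sum_{k=0}^{n-1} e^{2\pi i k g / n} = \begin{cases} 1 & g = 0 \\ 0 & g \neq 0, \end{cases} $$ i.e., the Kronecker delta supported at the identity, which is exactly the convolution identity $\delta$ we were after.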
This is where I’ll leave off, but we have set the stage for a lot more. I’ll point to Pontryagin duality, which we have almost come to here, and which is closely related to space and quantity duality. For the latter point, we see that in a space of functions (quantities) with rich algebraic structure, like the multiplicative characters $\chi \colon G \to \mathbb{F}$, we can identify points of the space itself, such as $g \in G$, with quantities $\delta_g \in L^2_\mathbb{F}(G)$. In the best cases, we can topologize the set of such $\delta_g$ and recover the group $G$ itself.
It is possible to do this same sort of thing for non-group algebras and the like, but I will leave that for another time. In case you’re interested, see the Gelfand representation for a generalization of this to non-group algebras (and note, you don’t really need to use $\mathbb{C}$ in the statements you see there).
I have been working with Lean 4 for a few days now and I have been enjoying it quite a bit. My current goal is to extend some of the very recent work that proves the group law on elliptic curves. It may be a bit ambitious, but I like the idea of a challenge and the upshot is well worth it in my opinion.
Essentially, what has been shown now is that the points on an elliptic curve form a group, and this is defined in `Mathlib/AlgebraicGeometry/EllipticCurve/Group.lean`, specifically:
noncomputable instance instAddCommGroupPoint : AddCommGroup W.Point where
nsmul := nsmulRec
zsmul := zsmulRec
zero_add := zero_add
add_zero := add_zero
add_left_neg := add_left_neg
add_comm := add_comm
add_assoc := add_assoc
So, we know that the points on an elliptic curve form an abelian (additive commutative) group.
From here, if we wanted to use this for elliptic curve cryptography (ECC) we would specifically want to restrict to the case where the field for the curve is a finite field.
On my end, I’ve worked on this a bit to get an instance of a `Group` from the points on an elliptic curve over a finite field.
See this snippet:
def ECPoints {k : Type} [Field k] [Fintype k] (E : EllipticCurve k) : Type := E.toAffine.Point
noncomputable instance ECPointsCommGroup {k : Type} [Field k] [Fintype k] (E : EllipticCurve k) : AddCommGroup (ECPoints E) :=
WeierstrassCurve.Affine.Point.instAddCommGroupPoint
noncomputable instance ECPointsGroup {k : Type} [Field k] [Fintype k] (E : EllipticCurve k) : Group (ECPoints E) :=
{
one := AddCommGroup.toAddGroup.zero,
mul := AddCommGroup.toAddGroup.add,
inv := AddCommGroup.toAddGroup.neg,
one_mul := AddCommGroup.toAddGroup.zero_add,
mul_assoc := AddCommGroup.toAddGroup.add_assoc,
mul_one := AddCommGroup.toAddGroup.add_zero,
mul_left_inv := AddCommGroup.toAddGroup.add_left_neg,
}
One useful fact for us if we want to use this for cryptography is the following theorem:
Theorem: The group of points on an elliptic curve over a finite field is cyclic or the product of two cyclic groups.
One can find a reference here for this theorem. Anyway, it took me a bit, but we had to create a new proposition in Lean which defines the notion that a group is the product of two cyclic groups:
def IsProductOfTwoCyclicGroups (G : Type*) [Group G] : Prop :=
∃ (H K : Subgroup G), IsCyclic H ∧ IsCyclic K ∧ Nontrivial H ∧ Nontrivial K ∧ Nonempty (G ≃* H.prod K)
but once we had this, we were ready to state our theorem:
theorem point_group_is_cyclic_or_product_of_cyclic {k : Type} [Field k] [Fintype k] (E : EllipticCurve k) [Group (ECPoints E)] :
IsCyclic (ECPoints E) ∨ IsProductOfTwoCyclicGroups (ECPoints E) := by
sorry
Right now, this remains unproven in Lean (the `sorry` keyword is a placeholder for a proof), but I am excited to work on this and see where it goes.
Once this is proven, it would be nice to refine this with other results in order to be able to classify which curves are cyclic groups and which are a product.
Further, we would then want to be able to construct the generators for those groups in a formal way.
This would allow us to then do some cryptography!
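As a tiny, concrete illustration of “points on a curve over a finite field” on the programming side (a hedged Rust sketch with made-up toy parameters, entirely separate from the Lean development above), we can brute-force count the points of a small Weierstrass curve:
// Count the affine points of y^2 = x^3 + ax + b over F_p, plus the point at
// infinity; the parameters below are toy values chosen only for illustration.
fn main() {
    let (p, a, b): (i64, i64, i64) = (17, 2, 2);
    let mut count = 1; // the point at infinity
    for x in 0..p {
        for y in 0..p {
            if (y * y - (x * x * x + a * x + b)).rem_euclid(p) == 0 {
                count += 1;
            }
        }
    }
    // By the structure theorem above, this group of `count` points is cyclic
    // or a product of two cyclic groups.
    println!("#E(F_{}) = {}", p, count);
}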
I will continue to work in my `Autoparallel/shred` repository, so if you want to see progress or contribute, feel free to check it out!
Today I made my first attempts to work with Lean 4. I wanted to define a type and prove some properties about it. Here are some of my thoughts about it so far.
First, I had some gripes trying to work with the Lean 4 environment itself.
I followed the instructions from the Lean 4 manual exactly.
For one, opening up a `.lean` file and typing `import` sometimes yields a menu that shows me available imports, though usually I am immediately met with a red underline and an error message.
Sometimes I am able to see the imports, e.g., I can type `import Mathlib` and, when lucky, I see a list of available imports.
Yet past that, I hoped that trying something like `import Mathlib.Top` would filter and show me imports with this prefix, but it did not seem to work for me.
Nonetheless, I am sure I can fix this problem and the Zulip chat seems to be extremely active!
Next, I have found the filesystem organization to be a bit confusing (though maybe it’s just a skill issue).
I have become so spoiled with Rust and Cargo that I may just find it difficult to work with a new system.
Not having the ease of a package manager like Cargo is a bit of a bummer as you have to stub something like `require mathlib from git "https://github.com/leanprover-community/mathlib4"` in your lakefile.
Lastly, I had some kind of issue where I could write a `theorem` using some syntax, but not a `lemma` with the same syntax.
I’m not quite sure why that’s the case, but I’ll have to read more.
Please don’t take any of this as a shot at Lean itself; I’m just finding this to be a bit more challenging than I was hoping, but I am sure I will get the hang of it!
I look forward to this challenge and learning with the community!
Now, onto the good stuff. Defining things like types and functions is a breeze. For instance, I defined the following:
def PosNat := { n : Nat // n > 0 }
def prev (n : PosNat) : Nat := n.val - 1
which is the type of positive natural numbers and a function that returns the predecessor of a positive natural number as a natural number. From this, I was challenging myself to prove some statements about this type with this function.
This felt like I was catapulted back to my undergraduate days when I was first learning about proofs. Because I was too eager, I wanted to immediately jump to proving that this map was injective and surjective, but that was a bit too much for me. Instead, I took a step back and wanted to prove a helpful statement that would ideally help me out with the injective and surjective proofs. Let’s take a look.
Let’s write the statement formally here:
Theorem: For a positive natural number $n$, the previous value $n-1$ is equal to 0 if and only if $n$ is equal to 1.
Now, let’s see how we write this in Lean:
theorem prev_eq_zero_iff_eq_one (n : PosNat) : prev n = 0 ↔ n.val = 1 := by
apply Iff.intro
{ intro h
have h1 : n.val - 1 = 0 := h
have h2 : n.val = 1 := by
calc
n.val = n.val - 1 + 1 := by rw [Nat.sub_add_cancel n.property]
_ = 1 := by rw [h1]
exact h2
}
{ intro h
have h1 : prev n = 0 := by
calc
prev n = n.val - 1 := by rfl
_ = 1 - 1 := by rw [h]
_ = 0 := by rfl
exact h1
}
Now, let’s break this down:
- We start by applying `Iff.intro` to split the proof into two directions. The first set of brackets needs to prove that `prev n = 0` implies `n.val = 1`, and the second set of brackets needs to prove that `n.val = 1` implies `prev n = 0`.
- (Prove first direction) We can note that `n.val - 1 = 0` is granted to us by our initial assumption here and by the definition of `prev`. We follow now by showing that `n.val = 1` by doing a calculation. Specifically, this calculation is the math: $$ n = n - 1 + 1 = 1 $$ where we can cancel $-1 + 1$ using the result `Nat.sub_add_cancel` given the property of `n`, which is that `n.val > 0`. All of this yields the first direction.
- (Prove second direction) This one is a bit easier as we are showing that `n.val = 1` implies `prev n = 0`. We have our initial assumption and then we want to show `prev n = 0` by calculation again. This time, we have: $$ n - 1 = 1 - 1 = 0 $$ with the final equality given to us by `rfl`, which closes the goal since $1 - 1$ reduces to $0$ definitionally.
We have now proven something, albeit very simple! What I find really interesting about this is that I can now take this type I’ve just defined, and work with it programmatically knowing that these features about the type are provably true. Being able to both prove theorems formally and use them in programs is insanely powerful!
I have touched on the topic of space and quantity in both sheaves are probes and deep isometry to an extent, but I want to revisit it. The idea is remarkably beautiful: in essence, we can think of spaces as presheaves (contravariant functors) on a category $C$ (into the category of sets). Succinctly, this is because all it takes to define a space is the means by which you can probe it, and this is exactly what a (pre)sheaf does. For this, we say that a space is modeled on the category $C$. In order to not lose concreteness, we can still think of spaces as objects like topological spaces – this is strictly more general. Also, we should think of probes as a means of mapping into a space.
On the other hand, quantity should reflect some kind of measurement out of a space into some set we know how to interpret measurements in. For instance, if I have the set of all line segments $\mathcal{S}$, then a valid measurement (or quantity) is a function $f \colon \mathcal{S} \to \mathbb{R}$ which assigns a real number to each line segment. An example of a valid quantity would be $f([a,b]) = b - a$. Categorically, if a space is modeled on $C$, a quantity is just a covariant functor from $C$ into the category of sets.
But how is this useful? What the hell is going on? Let’s take it just one more step. It turns out that there is a notion of duality called Isbell duality which gives us this extremely powerful and beautiful relationship between spaces and quantities. Namely, there is a duality between spaces themselves and quantities on those spaces that allows one to determine the other (at least in certain cases!).
A concrete example for us would be the following. Consider a smooth compact surface $X$ as a space. On $X$, we can consider all of the holomorphic functions on that surface $H(X)$ as a set of $\mathbb{C}$-valued quantities. Now, it turns out that $H(X)$ itself is a ring. We can compute the maximal ideals of this ring and put them into the set $\mathcal{I}(X)$.
Real fast, for concreteness, these ideals in $\mathcal{I}(X)$ are just the set of all functions that vanish at a single point. Note a holomorphic function on a compact surface may only vanish at finitely many points since zeros of holomorphic functions are isolated and we have the Bolzano-Weierstrass theorem. Then, for an element to be a maximal ideal, it must only vanish at a single point, not two or more.
Amazingly, there is a holomorphic function $f \colon X \to \mathbb{C}$ that vanishes at any given point $x \in X$ and so the set of maximal ideals $\mathcal{I}(X)$ is in bijection with the points of $X$. Finally, given the right topology on $\mathcal{I}(X)$, we can recover the topology of $X$ from the set of maximal ideals. That is to say, $\mathcal{I}(X) \cong X$ (homeomorphism or even conformal equivalence) and we have realized that quantities and spaces are dual to each other and can even be used to recover one another.
The concept of Isbell duality or this general duality between space and quantity is rather new to me and I wish I would have known it when I was trying to do inverse problems. Put this tool in your toolbox.
I wrote a blog on algebra and computation that took a deep dive into the sum and product types in programming languages. Briefly, I want to show you a corollary from this discussion that yields some insight for how you can structure some code.
Something that commonly happens is that we have two data types that have shared components that we want to work with. Let’s take an example here:
struct Home {
home_type: String,
street_address: String,
city: String,
state: String,
zip: String,
}
struct Office {
business_name: String,
street_address: String,
city: String,
state: String,
zip: String,
}
These are product types in Rust. Briefly, product types are a means of enforcing that all the data is present as it is required for construction and they give us a way to have the data come in “chunks” that we can work with. The structure of products is such that they can be split and therefore reused, nested, and extended for convenience and modularity. Those are powerful perks that we want.
For instance, above, we can see that `Home` and `Office` share some components, so given these are product types, we can refactor into the following:
struct Address {
street_address: String,
city: String,
state: String,
zip: String,
}
struct Home {
home_type: String,
address: Address,
}
struct Office {
business_name: String,
address: Address,
}
What we can notice here is that since we are nesting a product type in another product type, we can now build extensible structures that can be easily manipulated. If need be, we can handle addresses separately, or construct new types from here that contain an address, e.g.:
pub struct Apartment {
unit_number: String,
address: Address,
}
pub struct School {
school_name: String,
address: Address,
}
One can then know that functionality can be yielded from product types that nest the `Address` type, like so:
trait Addressable {
fn get_address(&self) -> Address;
}
In summary, if you see shared context in products, it can be excised and made into its own product type, then it can be nested in the original product types.
It points to the fact that some things may just need to have the `Address` type attached to them, and now this can be enforced with something like the trait above.
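For instance, a minimal sketch of what those impls could look like (assuming `Address` derives `Clone`, which the snippets above don’t show):
impl Addressable for Home {
    fn get_address(&self) -> Address {
        // Assumes Address derives Clone (not shown in the structs above).
        self.address.clone()
    }
}

impl Addressable for Office {
    fn get_address(&self) -> Address {
        self.address.clone()
    }
}

// Any function can now ask generically for "something with an address".
fn city_of(place: &impl Addressable) -> String {
    place.get_address().city
}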
On the other hand, what if we had two sum types that shared some components? Let’s take an example here:
enum SharedSpaces {
School(School),
Office(Office),
Apartment(Apartment),
}
enum Residential {
Home(Home),
Apartment(Apartment),
}
These are coproduct (sum) types (really, enumerations) in Rust. These in particular have an almost opposite data structure to products. Instead of all components being occupied, these enforce that only a single variant is occupied. If that is the case, it tells us that these types should not be split up: combining these enumerations yields a strictly more flexible and general type where, instead of the extra structures being reused, the extra structures can simply be avoided when we don’t need them. Sums, in essence, want to be combined and not split, as we get a means of ducking out of the extra data if we don’t need it. For example, we combine like so:
enum Building {
Home(Home),
Office(Office),
Apartment(Apartment),
School(School),
}
and if we wanted to match only on the residential types, we can do so:
fn is_residential(building: Building) -> bool {
match building {
Building::Home(_) | Building::Apartment(_) => true,
_ => false,
}
}
where the `_` allows us to ignore the extra data that we don’t need, which is a powerful feature of sum types.
In short:
- product types with shared context should be broken into shared factors and nested for extensibility and modularity since they can be easily attached to one another.
- sum types with shared context should be combined into a single instance for generality and flexibility since they can easily duck out of extra data.
When we first learn linear algebra we are (hopefully) taught how to find eigenvalues and eigenvectors for a matrix so that we can “diagonalize” it. But have we been shown the other places where we do this?
Diagonalization in finite dimensional inner product spaces
Let’s remind ourselves of the concept and its utility in finite dimensional land. Let $L \colon V \to V$ be a linear operator on a finite dimensional vector space $V$ over a field $\mathbb{F}$ with an inner product $$ \langle \_ , \_ \rangle \colon V \times V \to \mathbb{F}. $$ An example inner product is the Euclidean dot product $\cdot$ which for $\R^n$ is defined as: $$ \langle \boldsymbol{x}, \boldsymbol{y} \rangle = \boldsymbol{x} \cdot \boldsymbol{y} = x_1 y_1 + \cdots + x_n y_n. $$
We say $\boldsymbol{e}$ is an eigenvector of $L$ with eigenvalue $\lambda \in \mathbb{F}$ if: $$ L \boldsymbol{e} = \lambda \boldsymbol{e}. $$ If there exists a basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ of $V$ such that each $\boldsymbol{e}_i$ is an eigenvector of $L$ with eigenvalue $\lambda_i$, then we say $L$ is diagonalizable.
Perhaps most familiarly, we can represent $L$ as an $n \times n$ matrix $[L]$ and when it is diagonalizable, we can define $[P]$ to be the change-of-basis matrix mapping the assumed basis we built $[L]$ from into the eigen-basis (basis of the eigenvectors of $L$). When done properly, we get $[D] = [P]^{-1} [L] [P]$ where $[D]$ looks like: $$ \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} $$ hence the name “diagonalizable” for $L$.
Why was this nice? First a theorem:
Theorem: If $L$ is self-adjoint with respect to the inner product (which is the situation we care about here), then $L$ is diagonalizable and the basis of eigenvectors of $L$ can be chosen to be orthogonal.
So, without loss of generality, we can assume that $\boldsymbol{e}_i$ are chosen to be normalized to length 1 and orthogonal to each other, i.e., in symbols: $$ \langle \boldsymbol{e}_i, \boldsymbol{e}_j \rangle = \delta_{ij}. $$
Now, with all of this, if we get a problem like this: $$ L \boldsymbol{x} = \boldsymbol{y} $$ for some $\boldsymbol{y} \in V$, then writing this out in terms of the eigen-basis: $$ L (x_1 \boldsymbol{e}_1 + \cdots + x_n \boldsymbol{e}_n) = y_1 \boldsymbol{e}_1 + \cdots + y_n \boldsymbol{e}_n \\ \Rightarrow \lambda_1 x_1 \boldsymbol{e}_1 + \cdots + \lambda_n x_n \boldsymbol{e}_n = y_1 \boldsymbol{e}_1 + \cdots + y_n \boldsymbol{e}_n $$ where, critically, we have this identity due to our theorem: $$ y_i = \langle \boldsymbol{y}, \boldsymbol{e}_i \rangle. $$ This means that our solution is just: $x_i = y_i/\lambda_i$.
Notice that if $\lambda_i = 0$, then there is no $x_i$, but this is expected. Namely, if $\lambda_i = 0$, then $\boldsymbol{e}_i$ is in the kernel of $L$ and so $L \boldsymbol{e}_i = 0$ and we could not have found an element of the domain that maps to $\boldsymbol{y}$. We can get rid of this problem if we assume that $L$ is definite (or, even weaker, invertible).
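A tiny worked example: take $V = \R^2$ with the dot product and $$ [L] = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}, \qquad \lambda_1 = 3,\ \boldsymbol{e}_1 = \tfrac{1}{\sqrt{2}}(1,1), \qquad \lambda_2 = 1,\ \boldsymbol{e}_2 = \tfrac{1}{\sqrt{2}}(1,-1). $$ For $\boldsymbol{y} = (3,1)$ we get $y_1 = \langle \boldsymbol{y}, \boldsymbol{e}_1 \rangle = 2\sqrt{2}$ and $y_2 = \langle \boldsymbol{y}, \boldsymbol{e}_2 \rangle = \sqrt{2}$, so $x_1 = y_1/\lambda_1 = \tfrac{2\sqrt{2}}{3}$ and $x_2 = y_2/\lambda_2 = \sqrt{2}$, giving $\boldsymbol{x} = x_1\boldsymbol{e}_1 + x_2\boldsymbol{e}_2 = (\tfrac{5}{3}, -\tfrac{1}{3})$, which indeed satisfies $[L]\boldsymbol{x} = (3,1)$.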
Diagonalization in infinite dimensional inner product spaces
What if we ignore the finite dimensional restriction? Let’s take an example. Let $I = [0,1] \subset \R$ be the unit interval and consider the set of real valued smooth functions: $$ C^\infty(I)= \{ f \colon I \to \R ~\vert~ f \textrm{ is smooth} \}. $$ Note that $C^\infty(I)$ is an (infinite dimensional) vector space over $\R$.
Now, let’s let $L= \frac{d^2}{dx^2}$ be the second derivative operator (also called the Laplacian). This operator is indeed linear and furthermore it is a diagonalizable (or, self adjoint) and (negative) definite operator on $C^\infty(I)$. Now, how can I show you this? We just consider the eigenvalue problem: $$ \frac{d^2}{dx^2}f = \lambda^2 f. $$ which is often called the Helmholtz equation. Note that $\varphi_\lambda(x)= e^{\lambda x}$ solves this expression for any $\lambda$. Hence $\varphi_\lambda(x)$ is an eigen“function“ of $L$ with eigenvalue $\lambda^2$.
But I’ve dropped two key ingredients that we need right now:
- We need an inner product on $C^\infty(I)$.
- We need boundary conditions since it doesn’t even make sense to consider $\frac{d^2}{dx^2}$ on the endpoints $\{0,1\}$.
For the first, we can define the inner product in the same way we define the Euclidean dot product. That is, we just multiply each component of each vector together, then add them up. In continuum land, this is just this integral: $$ \langle f, g \rangle = \int_0^1 f(x) g(x) dx. $$ Nice, right?
Let’s now consider the second point. On the endpoints of $I$, we can consider the following boundary conditions:
- Periodic boundary conditions;
- Dirichlet boundary conditions;
- Neumann boundary conditions;
- Robin boundary conditions.
To keep things simple, let’s use periodic boundary conditions which you can imagine take the endpoints $\{0,1\}$ of our line segment $I$, and glue them together making a circle. Formulaically, this just implies that we are looking at the space: $$ C_\textrm{p}^\infty(I) = \{ f \colon I \to \R ~\vert~ f \textrm{ is smooth and } f(0) = f(1) \}. $$ Coming back to our previous answer where we found the eigenfunctions to be $\varphi_\lambda(x) = e^{\lambda x}$, we can now see that the eigenfunctions with the new boundary conditions are: $$ \varphi_{n}(x) = e^{2\pi i n x} $$ for $n \in \Z$. Feel free to check this work and be skeptical that we now have complex eigenfunctions. Ignore the complex numbers for now, they are just a technicality (you sometimes add them in to diagonalize matrices too). I will just adjust our inner product slightly so that we take the complex conjugate $*$ of the first function, i.e.: $$ \langle f, g \rangle = \int_0^1 f(x)^* g(x) dx. $$ Okay great. Now, can we do the same trick we did before to solve problems?
Absolutely. Note that just as before $\{\varphi_n\}$ is an orthonormal basis of $C_\textrm{p}^\infty(I)$. Hence, if we have a problem like this: $$ L f = g $$ then we can write this out in terms of the eigenbasis: $$ L \sum_{n \in \Z} f_n \varphi_n(x) = \sum_{n \in \Z} g_n \varphi_n(x)\\ \Rightarrow \sum_{n \in \Z} \lambda_n^2 f_n \varphi_n(x) = \sum_{n \in \Z} g_n \varphi_n(x) $$ where again $g_n = \langle g, \varphi_n \rangle$. Lo and behold we should probably write this last bit out: $$ g_n = \int_0^1 g(x) e^{2 \pi i n x} dx $$ which is often called the Fourier transform of $g$. Just to be clear, because we’re on a compact set, the $g_n$ are a discrete set of numbers. All in all, since $\lambda_n^2 = (2\pi i n)^2 = -4\pi^2 n^2$, this means our solution to the problem is: $$ f_n = -\frac{g_n}{4 \pi^2 n^2} $$ for $n \neq 0$ (the $n = 0$ mode is exactly the kernel issue we flagged earlier). Incredible, really!
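As a quick check, take $g(x) = \cos(2\pi x) = \tfrac{1}{2}(\varphi_1 + \varphi_{-1})$, so $g_{\pm 1} = \tfrac{1}{2}$ and all other $g_n$ vanish. The recipe gives $$ f(x) = -\frac{1}{8\pi^2}\left(\varphi_1(x) + \varphi_{-1}(x)\right) = -\frac{1}{4\pi^2}\cos(2\pi x), $$ and differentiating twice indeed returns $\cos(2\pi x)$.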
There’s a lot more to be said here. We can do this sort of thing for any (definite) linear operator on any (Hilbert) space. I’ll come back around at some point to show the extension of this and how you can use this same concept to solve problems like those in electromagnetism.
When I was working on my dissertation (I’m still angry one of my figures was not rendered properly on this) I was captivated by the question of what measurements of a space $X$ are needed for you to recover the space. In particular, I was curious about whether or not you could recover a Riemannian manifold $(M, g)$ from the Dirichlet-to-Neumann operator (more generally, a Poincaré-Steklov operator) on the boundary of $M$, $\partial M$. This general problem is often called the Calderón problem. Physically, this is realized in the problem of asking: “Can we determine what the inside of a body looks like by applying a voltage to the surface, and measuring the outgoing current?” This physical problem is called the electrical impedance tomography problem and the generalization to Riemannian manifolds of arbitrary dimension was what I wanted to solve, given it has not been solved in any dimension higher than two (for non-analytic manifolds, that is).
Over the years, I made headway on the problem but it still remains unsolved. The main approach I took was to generalize (part of) the Boundary Control (BC) method which can be summarized in the following bullet points (WARNING: Lots of jargon ahead, feel free to skip the bullet points and go to $\star$):
- Take the space of (real) spinor fields $C(M; \mathcal{G}^+)$ and note that this is a $C^*$-algebra.
- Take a $(M, g)$ and consider the space of monogenic spinor fields $\mathcal{M}(M) \subset C(M; \mathcal{G}^+)$. (Think of these as a higher dimensional generalization of complex holomorphic functions.)
- Construct subsurface spinors which are commutative unital Banach algebras inside of $\mathcal{M}(M)$.
- Define their dual, the spinor currents $$ C_{\mathcal{G}^+}(M;\mathcal{G})' = \{ T\colon C(M;\mathcal{G}) \to \mathcal{G} ~\vert~ T\textrm{ is continuous}\} $$ which are $\mathcal{G}^+$-dual to $C(M;\mathcal{G}^+)$ (these are $\mathcal{G}^+$-valued Radon measures of spinor fields on $M$).
- Define the spinor spectrum $\mathfrak{M}(M) \subset C_{\mathcal{G}^+}(M;\mathcal{G})'$ to (essentially) be algebra morphisms on each of the subsurface spinor subalgebras of $\mathcal{M}(M)$. Call the elements $\delta \in \mathfrak{M}(M)$ spin characters.
$\star$ After this long winded effort and some essential lemmas, we prove:
Theorem: For $M$ a compact submanifold of $\R^n$, then the space $\mathfrak{M}(M)$ with the weak-$\ast$ topology is homeomorphic to $M$.
What is the point here? The point is that through a sufficiently rich algebraic structure of functions and functors defined on a space, we can recover the space itself. This is in fact a very general idea in mathematics and is highlighted by the Yoneda lemma in category theory. More on this another time.
At the very end of my Ph.D, I came across sheaves. As it turned out, another proof technique for the Calderón problem was to use sheaf theory but it required the manifold be real-analytic (which is highly restrictive).
Something was nagging at me though. It seemed that the two approaches were the same in dimension two, but diverged in higher dimensions. The BC method was less restrictive on smoothness but failed to reach higher dimensions, while the sheaf theory method was more restrictive on smoothness but could reach higher dimensions. Yet, in dimension two they both essentially relied on the same underlying principles of algebras/continuation/uniqueness. Namely, the BC method relied on isomorphisms between boundary values of holomorphic functions and an internal holomorphic function algebra, from which we can yield an isometry of $(M,g)$, whereas the sheaf theory method relied on the unique continuation of analytic functions to build up a Hausdorff sheaf which identified points of $M$ and used a mapping from the sheaf into open sets of $\R^n$ to yield an isometry of $g$.
What was the relationship here? Is there something worth investigating in the relationship between these two methods? I think so.
Codes are a means of providing redundancy and consistency checks in data transmission. What I am finding interesting is how codes are related to algebraic geometry and how they pop their head up in other protocols. For example, STARKs use Reed-Solomon codes in FRI, SNARKs can use linear codes, and sumcheck uses multilinear encoding. Here’s a bit on both STARKs and SNARKs.
We can quickly write down the Reed-Solomon code process. First, let $\mathbb{F}_q$ be a finite field and let $m=\{m_0, m_1, \ldots, m_{k-1}\}$ be the message we want to encode, written in terms of elements of the field (if written in bits, just combine into field elements). Then we define a polynomial using this message: $$ p_m(a)= \sum_{i=0}^{k-1} m_i a^i. $$ If we then take $n \geq k$ distinct points $\{a_0, a_1, \ldots, a_{n-1}\} \subset \mathbb{F}_q$, we can evaluate the polynomial at these points to get the codeword: $$ \{p_m(a_0), p_m(a_1), \ldots, p_m(a_{n-1})\}. $$ That’s the encoding process.
We decode using Lagrange interpolation and, since this degree $k-1$ polynomial is uniquely determined by $k$ points, we can recover the message from the codeword even when $n-k$ elements are missing.
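Here is a minimal, hedged Rust sketch of that encode/decode round trip over a small prime field (the prime $q = 97$, the message, and the evaluation points are all arbitrary choices for illustration):
// Reed-Solomon over F_q with q prime: encode by evaluating p_m, decode by
// Lagrange interpolation. All parameters here are toy values.
const Q: i64 = 97;

// Modular exponentiation, used for inverses via Fermat's little theorem.
fn pow_mod(mut base: i64, mut exp: i64, q: i64) -> i64 {
    let mut acc = 1;
    base = base.rem_euclid(q);
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % q;
        }
        base = base * base % q;
        exp >>= 1;
    }
    acc
}

// Evaluate p_m(a) = m_0 + m_1 a + ... + m_{k-1} a^{k-1} in F_q (Horner's rule).
fn eval(message: &[i64], a: i64) -> i64 {
    message.iter().rev().fold(0, |acc, &m| (acc * a + m).rem_euclid(Q))
}

// The codeword is the evaluation of p_m at n distinct points.
fn encode(message: &[i64], points: &[i64]) -> Vec<i64> {
    points.iter().map(|&a| eval(message, a)).collect()
}

fn main() {
    let message = vec![3, 1, 4, 1, 5]; // k = 5 field elements
    let points: Vec<i64> = (1..=8).collect(); // n = 8 >= k distinct points
    let codeword = encode(&message, &points);
    println!("codeword = {:?}", codeword);

    // Decode: Lagrange-interpolate p_m(0) = m_0 from the first k symbols only.
    let k = message.len();
    let mut p_at_zero = 0;
    for i in 0..k {
        let mut term = codeword[i];
        for j in 0..k {
            if i != j {
                let num = (-points[j]).rem_euclid(Q);
                let den = (points[i] - points[j]).rem_euclid(Q);
                term = term * num % Q * pow_mod(den, Q - 2, Q) % Q;
            }
        }
        p_at_zero = (p_at_zero + term) % Q;
    }
    assert_eq!(p_at_zero, message[0]);
    println!("recovered m_0 = {}", p_at_zero);
}
Interpolating the full polynomial from any $k$ surviving symbols recovers the whole message the same way; the sketch only checks the constant term $m_0 = p_m(0)$.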
Kicking this up a notch, we can look at Reed-Muller codes which, for all intents and purposes, seem to take Reed-Solomon codes but extend the definition into multivariate polynomials. We encode a message the same way as Reed-Solomon as we can define a Reed-Muller code from a message $m$ by a multivariate polynomial $p_m(x_1, x_2, \ldots, x_r)$ and then evaluating at points in $\mathbb{F}_q^r$.
Where does the algebra or algebraic geometry come in? I started reading this piece, Algebraic Geometry Codes which begins quickly by mentioning this: We take a geometric object $\mathcal{X}$ which is an algebraic variety (zeros of a set of polynomials), get a collection of rational points $\mathcal{P} = \{P_1, P_2, \ldots, P_n\}$ on this variety (lie in $\mathbb{F}_q$), a set of functions $L=\{ f \colon \mathcal{X} \to \mathbb{F}_q\}$, and finally an evaluation map $\operatorname{ev}_\mathcal{P} \colon L \to \mathbb{F}_q^n$ so that $$\operatorname{ev}_\mathcal{P}(f) = \{f(P_1), f(P_2), \ldots, f(P_n)\}.$$
We can see that Reed-Solomon and Reed-Muller codes are examples of algebraic geometry codes. In particular, Reed-Solomon takes $\mathcal{X}$ to be the affine-line over $\mathbb{F}_q$ and Reed-Muller takes $\mathcal{X}$ to be the $r$-dimensional affine space over $\mathbb{F}_q$.
There’s a lot of beauty in algebraic varieties, and the paper goes on to describe how the genus of these varieties relates to curves. Further, it also gets into Riemann-Roch, and I’d be interested in seeing how that relates to codes.
I found this piece today called motivation for sheaves, cohomology and higher stacks which was really an interesting read after spending some time in Sheaf Theory through Examples.
The central idea here is that we often want to try to understand a space by examining its relationship with other spaces. A classic example is the study of a topological space $X$ by looking at the continuous functions from spheres $S^n$ into $X$. Amazingly, this set of continuous mappings $S^n \to X$ with the right notion of equivalence forms a group called the $n$th homotopy group $\pi_n(X)$ which is a topological invariant of $X$. Point is, we can start to see similarities in distinct spaces by looking at how they interact with other spaces.
Sheaves are a generalization of this idea. Succinctly, sheaves are collections of probes that we can use to study a space and they can be used to generalize our notion of a space.
Take a space $U$, then we think of the set of all the ways to map $U$ into $X$ as the set of probes of $X$ by $U$, which we call $X(U)$. In order to really understand $X$, we need to understand how it interacts with all possible probes. For instance, I may also probe $X$ with $V$, and if we have $f \colon U \to V$, then the difference of the probes $X(U)$ and $X(V)$ should be related by the map $f$.
- Consistency is guaranteed: $\operatorname{Id}_U \colon U \to U$ yields $\operatorname{Id}_{X(U)} \colon X(U) \to X(U)$.
- Composability is guaranteed: given $f \colon U \to V$ and $g \colon V \to W$, the induced map $X(W) \to X(U)$ is the same as the composite $X(W) \to X(V) \to X(U)$.
These just show our $X(\_)$ is a functor from the category of spaces to the category of sets – a presheaf.
For our probes to make sense, we have to have the ability to construct probes from other probes. For instance, if $V_1 \cup V_2 = U$ with $V_1 \cap V_2 \neq \emptyset$, then we should be able to construct a probe in $X(U)$ from probes in $X(V_1)$ along with $X(V_2)$ so long as the probes $p_1 \in X(V_1)$ and $p_2 \in X(V_2)$ agree on the overlap, $p_1 = p_2 \in X(V_1 \cap V_2)$. Expressed purely categorically, this condition is that $X(U)$ is exactly the set of pairs in $X(V_1) \times X(V_2)$ on which the parallel morphisms (the individual restrictions) to $X(V_1 \cap V_2)$ agree. This final condition yields the locality and gluing conditions for a sheaf if this is true for all $U$ and $V_1, V_2$.
So sheaves encapsulate the notion of being able to analyze a space with another space. Taking this perspective, it’s pretty reasonable to see why sheaves can have a use in something like signal processing. If your signals live on a space (which they do), then all the ways of analyzing that signal lie inside of all the ways of probing the space!
Encryption is a mapping of a space of messages $\mathcal{M}$ into a space of ciphertexts $\mathcal{C}$ using a set of ciphers $\mathcal{T}$. To decrypt, we need a set of keys $\mathcal{K}$. So, for a cipher $T \in \mathcal{T}$ and a key $K \in \mathcal{K}$, we have: $$ T : \mathcal{M} \to \mathcal{C} $$ $$ K : \mathcal{C} \to \mathcal{M} $$
When do we deem that an encryption scheme gives us perfect secrecy?
Let’s think about this intuitively first. Assume a cryptanalyst has access to a collection of ciphertexts $\{C_1, C_2, \dots, C_n\}$ and knows the encryption system being used. That is, they know the set of ciphers $\mathcal{T}$ and the set of keys $\mathcal{K}$ as well as the probabilities with which they are used. What can they learn about the original messages $\mathcal{M}$ given this information? If the encryption scheme is perfectly secret, then the cryptanalyst should not be able to learn anything about the original messages without any other information entering their mind.
Said another way, an encryption scheme is perfectly secret if the probability of the cryptanalyst determining any message $M$ does not increase no matter how many ciphertexts $C$ they have received. Therefore, determining any $M$ given a list of ciphertexts $C$ has the same likelihood as determining $M$ without having received any ciphertexts whatsoever. The cryptanalyst’s best guess is to decipher by brute force using the keys at their disposal.
We can formulate this mathematically. The probability of $M$ being the message yielding the ciphertext $C$ is $P(M|C)$. Using Bayes’ theorem, we have: $$ P(M|C) = \frac{P(C|M)P(M)}{P(C)}. $$ If the encryption scheme is perfectly secret, then we gain no information by observing ciphertexts and hence the probability of getting a ciphertext $C$ given a message $M$ is just that of knowing the message $M$ outright. That is: $$ P(C|M) = P(C). $$
Two facts follow from the above:
- The number of ciphertexts $|\mathcal{C}|$ must be at least as large as the number of messages $|\mathcal{M}|$.
- The number of possible keys $|\mathcal{K}|$ must be at least as large as the number of messages $|\mathcal{M}|$.
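The classic scheme that meets both bounds with equality is the one-time pad: take $\mathcal{M} = \mathcal{C} = \mathcal{K} = \{0,1\}^n$, draw the key $K$ uniformly at random, and set $C = M \oplus K$. For any fixed message $M$, the ciphertext is then uniform over $\{0,1\}^n$, so $P(C|M) = 2^{-n} = P(C)$: the scheme is perfectly secret, and $|\mathcal{K}| = |\mathcal{C}| = |\mathcal{M}|$ saturates both counting bounds.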
I am rather new to cryptography, so I wanted to try to prove these results myself. I’m not completely sure these proofs are valid and that I have everything right here, but it did seem like it was along the right track. Please let me know if you see any errors or have any suggestions for improvement.
The universe, as far as we know, is a pseudo-Riemannian manifold à-la general relativity. In this context, autoparallel fields are those that do not twist or turn with respect to a connection. This is a broad generalization of Newton’s first law for curved spaces. In particular, autoparallel fields $V$ satisfy the autoparallel equation: $$ \nabla_{V}V = 0. $$ Above, we are taking a directional derivative (really, covariant derivative) of $V$ along $V$ and setting it to zero. Succinctly, we can think: “$V$ does not change in length or direction when you move in the direction of $V$.”
We usually draw pictures of curves $\gamma$ on a manifold $M$ to illustrate the concept:
We also draw pictures of fields $V$ on manifolds like this:
What is the intuition? Autoparallel fields are, in a way, the lowest energy configurations of a system. If we take a step back and look at the autoparallel equation where $V=\dot{\gamma}$ is the tangent vector field to a particle following a curve $\gamma$, it can be unraveled into Newton’s first law (and you can generalize to get Newton’s second law). By the way $\dot{\gamma}$ is the velocity of the particle.
So, autoparallel $\gamma$ moves at constant velocity as it is not being acted on by an external force. Autoparallel fields are those that do not accelerate or decelerate in the direction of the field itself, i.e., they have no force acting on them – hence they are in a low energy configuration. Further, the only reason an autoparallel curve $\gamma$ (or field $V$) may appear to bend, twist, or lengthen, is purely due to the curvature of the space it is living in.
Example (flat space): If we take the case where we have a curve $\gamma$ that lives on a flat manifold such as the plane $\R^2$, then I claim $\nabla_{\dot{\gamma}}\dot{\gamma} = 0$ means $\gamma$ moves at a constant velocity – so it does not have rotational acceleration (i.e., it does not twist or turn and there is no torque applied) and it also does not have linear acceleration (i.e., it does not start to move more quickly or slow down).
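In local coordinates, with Christoffel symbols $\Gamma^k_{ij}$ encoding the connection, the autoparallel equation for $V = \dot{\gamma}$ reads $$ \ddot{\gamma}^k(t) + \Gamma^k_{ij}(\gamma(t))\,\dot{\gamma}^i(t)\,\dot{\gamma}^j(t) = 0, $$ and on flat $\R^2$ with the standard connection every $\Gamma^k_{ij}$ vanishes, so the equation collapses to $\ddot{\gamma} = 0$: straight lines traversed at constant speed, which is exactly Newton’s first law.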
Below is an example of a curve that is not autoparallel:
Notice, due to the bend in the curve, the direction of $\dot{\gamma}$ has to change from point to point. For example, from $\dot{\gamma}(t)$ to $\dot{\gamma}(t')$ there is a change in direction. You can think of this change in direction as due to the force applied to the curve $\gamma$, and this force just so happens to be $\nabla_{\dot{\gamma}}\dot{\gamma}$. If we add up the $\nabla_{\dot{\gamma}}\dot{\gamma}$ at each point in time $t$ to $t'$, we will end up with the new velocity $\dot{\gamma}(t')$. Furthermore, if we add the magnitude of $\nabla_{\dot{\gamma}}\dot{\gamma}$ at every point along the curve, we get a sort of “tension” energy, which is the energy of the configuration of the curve $\gamma$ (or field) I mentioned earlier. This can, at the least, be zero. Hence, autoparallels are the lowest energy configurations of a system.
You can do this all much more generally, you could also take a multivector field (and therefore spinor field) $F$ and claim it is parallel along $V$ if: $$ \nabla_{V}F = 0. $$ This is useful if you need to be able to transport multivectors along a manifold. I used this technique to be able to define subsurface spinor fields in some of my personal work.
Taking a look at paths of paths we could see that there is a geometry invoked by thinking of functions of functions. Considering categories which consist purely of morphisms (functions), we can start to build up their geometric structure.
Take a category with objects $A$ and $B$. Functions/morphisms between these objects are arrows $f_1$ and $f_2$. Morphisms or paths between these functions can sometimes be built as well, call them $H_1$ and $H_2$. We can continue this process and have morphisms between $H_1$ and $H_2$, call it $\Phi$. We are building up a complex of morphisms and we can draw a diagram:
We can think of these mappings as a means of attaching cells together. In particular, there are $0$-cells given by the identity maps $\operatorname{Id}_A$ and $\operatorname{Id}_B$ which you can just picture as points located at $A$ and $B$ respectively. The $1$-cells are the functions $f_1$ and $f_2$ which are just edges connecting the $0$-cells. The $2$-cells are the higher order functions $H_1$ and $H_2$ which are just disks glued together at the $1$-cells. The $3$-cells are the higher order functions $\Phi$ which are just balls glued together at the $2$-cells. We can explode this diagram to see this more clearly:
We have duplicated $A$ and $B$, but we did so by having identity mappings be the arrows between them, which means these are all actually collapsed to a point. I told you this was the exploded view! The single arrows form a 2-dimensional square on all six sides of this cubical diagram. Finally, the 3-cell is the hashed arrow (the $\Phi$ morphism that mapped $H_1$ to $H_2$) which gives this cube a 3-dimensional volume.
What’s beautiful about this is that we can think of this as a topological space and moreover start to study categories, or categories of categories, as topological spaces. This is conceptually the idea behind the Grothendieck topology. Something I would dearly love to be able to construct in Lean 4 or in Rust (or possibly Haskell), to some extent.
Rust allows for `type` and `const` generics, which can nearly be used to create something close to inductive types.
A struct we might want to define may have fields that depend on a predecessor or successor value of the current `const` associated to it. For instance, we may want to have $k$-cells attach to $(k-1)$-cells to build a cell complex. We want to enforce at compile time that we indeed map a specific type `Cell<K>` to a `Cell<K-1>`, as opposed to a runtime check like:
#[derive(Clone)]
struct Cell {
    dim: usize,
    attachments: Vec<Cell>,
}

impl Cell {
    fn new(dim: usize) -> Self {
        Cell { dim, attachments: Vec::new() }
    }

    fn attach(&mut self, other: &Cell) {
        // Runtime check: only a (k-1)-cell may attach to a k-cell.
        if self.dim == other.dim + 1 {
            self.attachments.push(other.clone());
        } else {
            panic!("dimension mismatch");
        }
    }
}

fn main() {
    let k = 2;
    let mut k_cell: Cell = Cell::new(k);
    let k_minus_one_cell: Cell = Cell::new(k - 1);
    let same_dim_cell: Cell = Cell::new(k);
    k_cell.attach(&k_minus_one_cell); // passes
    k_cell.attach(&same_dim_cell); // panics
}
The runtime check is clunky and it should be possible to do this at compile time.
Let’s try to define this as so, with code that may not compile (especially without `#![feature(generic_const_exprs)]`):
const fn predecessor(K: usize) -> usize
where K > 0
{
K - 1
}
struct Cell<const K: usize>
where K > 0
{
attachments: Vec<Cell<predecessor(K)>>
}
struct Cell<const K: usize>
where K == 0 {
attachments: (),
}
impl Cell<const K: usize>
where K > 0
{
fn attach(&mut self, other: &Cell<predecessor(K)>) {
self.attachments.push(other);
}
}
Though the above code does not compile, it shows the direction we want to go in.
Specifically, we can parameterize types via the (almost) natural numbers and enforce relationships between them at compile time.
We see our base case for our “inductive type” `Cell<0>` is defined separately from the inductive case `Cell<K>`. Our base case yields constraints; for instance, trying:
let cell_0: Cell<0> = Cell::new();
cell_0.attach(_); // will not compile for any input
will not compile because `.attach()` is not even defined for `Cell<0>`, as it should not be. Actually, `.attach()` is what you get when you apply the “cell functor” to the `predecessor()` function that defines the inductive step.
We can actually do something similar if you check out this video: Type Theory for the Working Rustacean. I’ll summarize the idea here in code:
use std::marker::PhantomData;

trait Nat {}

struct Zero {}
impl Nat for Zero {}

// Succ<N> records its predecessor purely at the type level.
struct Succ<N: Nat> {
    _predecessor: PhantomData<N>,
}
impl<N: Nat> Nat for Succ<N> {}
With the above trait setup, we have essentially defined a natural number type. For instance, if we instead attached $k$-cells to $(k+1)$-cells, we could define a struct like:
struct OpCell<K: Nat> {
    attachments: Vec<OpCell<Succ<K>>>,
}

impl<K: Nat> OpCell<K> {
    // Take the higher cell by value so it can be stored directly.
    fn attach(&mut self, other: OpCell<Succ<K>>) {
        self.attachments.push(other);
    }
}
Some problems here:
- It is not obvious how to use this for the previous case where $k$-cells attach to $(k-1)$-cells.
- Managing these types is terrible. Imagine having an element where the natural number is $k=8$; then it would be of type `OpCell<Succ<Succ<Succ<Succ<Succ<Succ<Succ<Succ<Zero>>>>>>>>>`.
The Curry-Howard correspondence was something I found out about too late. In essence, it shows that mathematical proofs are equivalent to computer programs. Since then, I’ve been curious about type theory and what it means to use a theorem prover or type checker. In particular, I want to use these tools to write formally verified software.
Curry-Howard gives us the following diagram of equivalences:
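Roughly, the correspondence pairs:
- propositions with types,
- proofs with programs (terms of the corresponding type),
- implication $A \Rightarrow B$ with the function type $A \to B$,
- conjunction $A \wedge B$ with the product type $A \times B$,
- disjunction $A \vee B$ with the sum type $A + B$,
- and proof simplification with program evaluation.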
There’s a nice relationship between higher order functions and higher homotopy theory. Let’s develop a mental picture for what we mean by “paths of paths”.
First, we can consider a path as a (continuous) function from the unit interval to a topological space $X$. We’ll take two paths from point $a \in X$ to $b\in X$: $$ f,g: [0, 1] \to X. $$ Specifically, $f(0)=g(0)=a$ and $f(1)=g(1)=b$. Figure of paths in $X$ below.
Let’s consolidate notation and put $\Hom([0,1], X)_{a,b}$ to denote all paths from $a$ to $b$ in $X$.
From this, we can now consider a path of paths as a (continuous) function: $$ H: [0,1] \to \Hom([0,1], X)_{a,b}. $$ This $H$ can be taken such that $H(0)=f$ and $H(1)=g$. Diagrammatically:
It is not true that such an $H$ exists for all $f,g$, and $X$. The reason is quite geometric and satisfyingly, the diagram above is a good way to see why.
Imagine $H$ performs the action of “dragging” the $f$ arrow to the $g$ arrow. If there were a “singularity” in the diagram, then $H$ would have to pull the arrow across the singularity, which it cannot do.
Amazingly, this becomes concrete if we consider that:
$$\Hom([0,1],\Hom([0,1], X)_{a,b}) \cong \Hom( [0,1]\times[0,1], X)$$ where $\cong$ denotes an equivalence, and $\Hom([0,1] \times [0,1], X)$ is the set of functions that map the unit square $[0,1] \times [0,1]$ into $X$ such that the boundary of the square is glued to $f$ and $g$. (We just discovered a functorial property of this $\Hom$ item.)
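This Hom-set equivalence is just currying, which you can already see at the level of ordinary higher order functions. A small Rust sketch (the `f64` endpoints are just for illustration):
// Hom(A x B, X) versus Hom(A, Hom(B, X)): a two-argument function versus a
// function that returns a function.
fn uncurried(s: f64, t: f64) -> f64 {
    s * t
}

fn curried(s: f64) -> impl Fn(f64) -> f64 {
    move |t| s * t
}

fn main() {
    assert_eq!(uncurried(0.25, 0.5), curried(0.25)(0.5));
}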
By this equivalence, we can now see this as mapping $[0,1]\times[0,1]$ to $X$ which is a map of the unit square into $X$. See below.
Visually, we can see that if we were to “puncture” $X$ between our paths $f$ and $g$, then the image of $H([0,1])$ (the shaded pink region) would be torn. For instance, we can add a void by cutting out a hole in $X$ between $f$ and $g$ shown with the hashed added region below: The tearing is a visual representation of the fact that no such $H$ exists for a given space.
All of this falls under the notion of homotopy in topology. This is the same idea of homotopy in homotopy type theory. In this theory, we can consider types (for which there are function types) and equality between these types as paths. I’m trying to get to a more formal understanding of this, but it’s a bit of a journey.