commit 9070fddd10448ed35fadde4078469b2522a46319
parent d89495bab1c1645e9ddb43a8e2cc1e8143b7fb07
Author: MikoĊaj Lenczewski <mblenczewski@gmail.com>
Date: Mon, 20 Jan 2025 15:30:38 +0000
Backdate original posts
Diffstat:
4 files changed, 64 insertions(+), 81 deletions(-)
diff --git a/date.sh b/date.sh
@@ -0,0 +1,8 @@
+#!/bin/sh
+
+set -ex
+
+touch -d 20200910 src/blog/test.html
+touch -d 20210514 src/blog/cdoku.html
+touch -d 20201112 src/blog/pyecs.html
+touch -d 20201010 src/blog/clfs.html
diff --git a/src/blog/cdoku.html b/src/blog/cdoku.html
@@ -9,8 +9,7 @@
which couldn't be completed after 3 hours of runtime:
</p>
-<pre>
-<code class="language-text">
+<pre><code class="language-text">
. . . | . . . | . . .
. . . | . . 3 | . 8 5
. . 1 | . 2 . | . . .
@@ -22,8 +21,7 @@
5 . . | . . . | . 7 3
. . 2 | . 1 . | . . .
. . . | . 4 . | . . 9
-</code>
-</pre>
+</code></pre>
<p>
This case is pathological, because the initial row consists of the numbers
@@ -43,8 +41,7 @@
in X:
</p>
-<pre>
-<code class="language-text">
+<pre><code class="language-text">
For example:
X = { 0, 1, 2, 3, 4, 5, 6 }
@@ -53,8 +50,7 @@ A = { 0, 1, 3 }
B = { 2, 4, 5 }
C = { 2, 6 }
D = { 6 }
-</code>
-</pre>
+</code></pre>
<p>
In an exact cover problem, you are trying to find a subset of S such that each
@@ -93,8 +89,7 @@ and our possibility set S will be the union of all numer-position pairs.
Algorithm X works as follows:
</p>
-<pre>
-<code class="language-text">
+<pre><code class="language-text">
Let M be our possibility-constraint matrix
Let M(i,j) denote the value in column i, row j
@@ -107,8 +102,7 @@ Let M(i,j) denote the value in column i, row j
Remove row i from the matrix
Remove column j from the matrix
6. Recurse on the new, reduced matrix M
-</code>
-</pre>
+</code></pre>
<p>
Knuth recommends picking the column with the fewest 1s in step 2. Although any
@@ -117,8 +111,7 @@ Let M(i,j) denote the value in column i, row j
our previous example problem, heres how Algorithm X would run through the problem:
</p>
-<pre>
-<code class="language-text">
+<pre><code class="language-text">
X = { 0, 1, 2, 3, 4, 5, 6 }
S = { A, B, C, D }
@@ -183,8 +176,7 @@ M:
19. M has no columns, return true
Therefore, S = { A, B, D}
-</code>
-</pre>
+</code></pre>
<p>
As we can see, the algorithm is quite efficient. We didn't have to make many
@@ -206,8 +198,7 @@ Therefore, S = { A, B, D}
An example of what the linked-list based matrix would look like for our example:
</p>
-<pre>
-<code class="language-text">
+<pre><code class="language-text">
"Dense" Matrix:
0 1 2 3 4 5 6
A [ 1 1 0 1 0 0 0 ]
@@ -222,8 +213,7 @@ A | A0 <> A1 <------> A3
B | B2 <------> B4 <> B5
C | C2 <------------------> C6
D | D6
-</code>
-</pre>
+</code></pre>
<p>
All nodes in a column are doubly-linked to the elements above them and below
@@ -250,15 +240,13 @@ D | D6
Splicing an element out of a linked list:
</p>
-<pre>
-<code class="language-text">
+<pre><code class="language-text">
A <---> B <---> C
After splicing out B:
A <---> C
\_ B _/
-</code>
-</pre>
+</code></pre>
<p>
Here, B has retained its previous pointer to A and its next pointer to C. It can
@@ -267,18 +255,17 @@ A <---> C
itself (B):
</p>
-<pre>
-<code class="language-text">
+<pre><code class="language-text">
A <---> C
\_ B _/
After splicing in B:
A <---> B <---> C
-</code>
-</pre>
+</code></pre>
<h3>Implementation</h3>
<p>
Although this post has mostly been on Algorithm X, and hasn't shown any code,
- my implementation (in C) can be found <a href="https://github.com/mblenczewski/cdoku">here</a>.
+ my implementation (in C) can be found
+ <span><a href="https://git.lenczewski.org/cdoku/log.html">here</a></span>.
</p>
diff --git a/src/blog/clfs.html b/src/blog/clfs.html
@@ -18,16 +18,14 @@
correct sequence of operations:
</p>
-<pre>
- <code class="language-text">
+<pre><code class="language-text">
1) Build barebones binutils for your chosen arch [no libc]
2) Build barebones gcc for your chosen arch [no libc]
3) Compile your libc for your chosen arch
3.1) Compile your libc++ for your chosen arch
4) Recompile binutils for your chosen arch
5) Recompile gcc for your chosen arch
- </code>
-</pre>
+</code></pre>
<aside>Step 3.1 is optional if you don't want to support C++ in your cross-gcc</aside>
@@ -59,5 +57,5 @@
a wall. After hopping onto irc and asking for help, these issues were
eventually resolved and I managed to continue with the project. The source
code for the build script can be found
- <span><a href="https://github.com/mblenczewski/clfs-bootstrap">here</a></span>.
+ <span><a href="https://git.lenczewski.org/clfs/log.html">here</a></span>.
</p>
diff --git a/src/blog/pyecs.html b/src/blog/pyecs.html
@@ -62,18 +62,16 @@
"purity" and simplicity reasons.
</p>
-<pre>
-<code class="language-c">
+<pre><code class="language-c">
typedef unsigned long entity_t; // as simple as a uint64
typedef struct Entity {
- unsigned long id;
- unsigned long flags;
- unsigned long tags;
- // ...
+ unsigned long id;
+ unsigned long flags;
+ unsigned long tags;
+ // ...
} Entity_t; // or something more complex
-</code>
-</pre>
+</code></pre>
<h3>Components</h3>
<p>
@@ -101,8 +99,7 @@ typedef struct Entity {
from an intel pdf are as follows:
</p>
-<pre>
-<code class="language-text">
+<pre><code class="language-text">
Data Source Latency
L3 Cache hit, line unshared between cores ~40 cycles
L3 Cache hit, line shared readonly between cores ~65 cycles
@@ -110,8 +107,7 @@ L3 Cache hit, line shared readwrite between cores ~75 cycles
L3 Cache hit, from other core ~100-300 cycles
Local DRAM ~60ns
Remote DRAM ~100ns
-</code>
-</pre>
+</code></pre>
<p>
For a CPU clocked at 2GHz (slower than modern CPUs), each cycle takes half a
@@ -128,55 +124,53 @@ Remote DRAM ~100ns
Getting back to why components are a thing, with an example:
</p>
-<pre>
-<code class="language-c">
+<pre><code class="language-c">
// clean separation into multiple discrete components
typedef struct Transform {
- double px, py, pz;
+ double px, py, pz;
} Transform_t;
typedef struct Velocity {
- double vx, vy, vz;
+ double vx, vy, vz;
} Velocity_t;
typedef struct Mesh {
- void *mesh_data;
+ void *mesh_data;
} Mesh_t;
// chucking it all into one mixer and pressing "blend"
typedef struct State {
- Transform transform;
- Velocity velocity;
- Mesh mesh;
+ Transform transform;
+ Velocity velocity;
+ Mesh mesh;
} State_t;
// ...
void main(void) {
- // this approach will have worse cacheline efficiency
- State game_objects[] = { ... };
+ // this approach will have worse cacheline efficiency
+ State game_objects[] = { ... };
- for (int i = 0; i < ARRLEN(game_objects); i++) {
- calculate_physics(game_objects[i]);
- render_object(game_objects[i]);
- }
+ for (int i = 0; i < ARRLEN(game_objects); i++) {
+ calculate_physics(game_objects[i]);
+ render_object(game_objects[i]);
+ }
- // this approach will have better cacheline efficiency
- Entity entities[] = { ... };
- Transform transforms[] = { ... };
- Velocity velocities[] = { ... };
- Mesh meshes[] = { ... };
+ // this approach will have better cacheline efficiency
+ Entity entities[] = { ... };
+ Transform transforms[] = { ... };
+ Velocity velocities[] = { ... };
+ Mesh meshes[] = { ... };
- for (int i = 0; i < ARRLEN(entities); i++) {
- process_physics(transforms[i], velocities[i]);
- }
+ for (int i = 0; i < ARRLEN(entities); i++) {
+ process_physics(transforms[i], velocities[i]);
+ }
- for (int i = 0; i < ARRLEN(entities); i++) {
- render(transforms[i], meshes[i]);
- }
+ for (int i = 0; i < ARRLEN(entities); i++) {
+ render(transforms[i], meshes[i]);
+ }
}
-</code>
-</pre>
+</code></pre>
<p>
Why is the second approach better? Don't we iterate twice compared to once? The
@@ -189,8 +183,7 @@ void main(void) {
for Transform, Velocity, and Mesh. The dashes show a simplified cacheline.
</p>
-<pre>
-<code class="language-text">
+<pre><code class="language-text">
Mixed State:
| x0 y0 z0 x1 | y1 z1 x2 y2 | z2 x3 y3 z3 | x4 y4 z4 |
| ----------- | ----------- | ----------- | ----------- |
@@ -201,8 +194,7 @@ Components:
| x0 z0 x1 z1 | x2 z2 x3 z3 | x4 z4 |
| ----------- | ----------- | ----------- |
-</code>
-</pre>
+</code></pre>
<p>
Not only does processing the mixed state do worse on the instruction cache, it
@@ -253,16 +245,14 @@ Components:
as discussed earlier.
</p>
-<pre>
-<code class="language-python">
+<pre><code class="language-python">
class RenderSystem:
def actions():
return {
self.process_active: (Transform2D, ScreenElement),
self.process_stale: (StaleTag,),
}
-</code>
-</pre>
+</code></pre>
<p>
This is the approach I took, where each System has an <code>actions()</code>