website

website.git
git clone git://git.lenczewski.org/website.git
Log | Files | Refs

commit 9070fddd10448ed35fadde4078469b2522a46319
parent d89495bab1c1645e9ddb43a8e2cc1e8143b7fb07
Author: Mikołaj Lenczewski <mblenczewski@gmail.com>
Date:   Mon, 20 Jan 2025 15:30:38 +0000

Backdate original posts

Diffstat:
A  date.sh              |   8 ++++++++
M  src/blog/cdoku.html  |  45 ++++++++++++++++-----------------------------
M  src/blog/clfs.html   |   8 +++-----
M  src/blog/pyecs.html  |  84 +++++++++++++++++++++++++++++++++++---------------------------------------------------------------
4 files changed, 64 insertions(+), 81 deletions(-)

diff --git a/date.sh b/date.sh
@@ -0,0 +1,8 @@
+#!/bin/sh
+
+set -ex
+
+touch -d 20200910 src/blog/test.html
+touch -d 20210514 src/blog/cdoku.html
+touch -d 20201112 src/blog/pyecs.html
+touch -d 20201010 src/blog/clfs.html
diff --git a/src/blog/cdoku.html b/src/blog/cdoku.html
@@ -9,8 +9,7 @@
   which couldn&#39;t be completed after 3 hours of runtime:
 </p>
 
-<pre>
-<code class="language-text">
+<pre><code class="language-text">
 . . . | . . . | . . .
 . . . | . . 3 | . 8 5
 . . 1 | . 2 . | . . .
@@ -22,8 +21,7 @@
 5 . . | . . . | . 7 3
 . . 2 | . 1 . | . . .
 . . . | . 4 . | . . 9
-</code>
-</pre>
+</code></pre>
 
 <p>
   This case is pathological, because the initial row consists of the numbers
@@ -43,8 +41,7 @@
   in X:
 </p>
 
-<pre>
-<code class="language-text">
+<pre><code class="language-text">
 For example:
 
 X = { 0, 1, 2, 3, 4, 5, 6 }
@@ -53,8 +50,7 @@ A = { 0, 1, 3 }
 B = { 2, 4, 5 }
 C = { 2, 6 }
 D = { 6 }
-</code>
-</pre>
+</code></pre>
 
 <p>
   In an exact cover problem, you are trying to find a subset of S such that each
@@ -93,8 +89,7 @@
   and our possibility set S will be the union of all numer-position pairs.
   Algorithm X works as follows:
 </p>
 
-<pre>
-<code class="language-text">
+<pre><code class="language-text">
 Let M be our possibility-constraint matrix
 Let M(i,j) denote the value in column i, row j
@@ -107,8 +102,7 @@ Let M(i,j) denote the value in column i, row j
       Remove row i from the matrix
     Remove column j from the matrix
 6. Recurse on the new, reduced matrix M
-</code>
-</pre>
+</code></pre>
 
 <p>
   Knuth recommends picking the column with the fewest 1s in step 2. Although any
@@ -117,8 +111,7 @@ Let M(i,j) denote the value in column i, row j
   our previous example problem, heres how Algorithm X would run through the
   problem:
 </p>
 
-<pre>
-<code class="language-text">
+<pre><code class="language-text">
 X = { 0, 1, 2, 3, 4, 5, 6 }
 S = { A, B, C, D }
@@ -183,8 +176,7 @@ M:
 19. M has no columns, return true
 
 Therefore, S = { A, B, D}
-</code>
-</pre>
+</code></pre>
 
 <p>
   As we can see, the algorithm is quite efficient. We didn&#39;t have to make many
@@ -206,8 +198,7 @@ Therefore, S = { A, B, D}
   An example of what the linked-list based matrix would look like for our example:
 </p>
 
-<pre>
-<code class="language-text">
+<pre><code class="language-text">
 &quot;Dense&quot; Matrix:
     0 1 2 3 4 5 6
 A [ 1 1 0 1 0 0 0 ]
@@ -222,8 +213,7 @@ A | A0 &lt;&gt; A1 &lt;------&gt; A3
 B | B2 &lt;------&gt; B4 &lt;&gt; B5
 C | C2 &lt;------------------&gt; C6
 D | D6
-</code>
-</pre>
+</code></pre>
 
 <p>
   All nodes in a column are doubly-linked to the elements above them and below
@@ -250,15 +240,13 @@ D | D6
   Splicing an element out of a linked list:
 </p>
 
-<pre>
-<code class="language-text">
+<pre><code class="language-text">
 A &lt;---&gt; B &lt;---&gt; C
 
 After splicing out B:
 A &lt;---&gt; C
  \_ B _/
-</code>
-</pre>
+</code></pre>
 
 <p>
   Here, B has retained its previous pointer to A and its next pointer to C. It can
@@ -267,18 +255,17 @@
   itself (B):
 </p>
 
-<pre>
-<code class="language-text">
+<pre><code class="language-text">
 A &lt;---&gt; C
  \_ B _/
 
 After splicing in B:
 A &lt;---&gt; B &lt;---&gt; C
-</code>
-</pre>
+</code></pre>
 
 <h3>Implementation</h3>
 <p>
   Although this post has mostly been on Algorithm X, and hasn&#39;t shown any code,
-  my implementation (in C) can be found <a href="https://github.com/mblenczewski/cdoku">here</a>.
+  my implementation (in C) can be found
+  <span><a href="https://git.lenczewski.org/cdoku/log.html">here</a></span>.
 </p>
diff --git a/src/blog/clfs.html b/src/blog/clfs.html
@@ -18,16 +18,14 @@
   correct sequence of operations:
 </p>
 
-<pre>
-  <code class="language-text">
+<pre><code class="language-text">
 1) Build barebones binutils for your chosen arch [no libc]
 2) Build barebones gcc for your chosen arch [no libc]
 3) Compile your libc for your chosen arch
 3.1) Compile your libc++ for your chosen arch
 4) Recompile binutils for your chosen arch
 5) Recompile gcc for your chosen arch
-  </code>
-</pre>
+</code></pre>
 
 <aside>Step 3.1 is optional if you don't want to support C++ in your cross-gcc</aside>
 
@@ -59,5 +57,5 @@
   a wall. After hopping onto irc and asking for help, these issues were
   eventually resolved and I managed to continue with the project. The source
   code for the build script can be found
-  <span><a href="https://github.com/mblenczewski/clfs-bootstrap">here</a></span>.
+  <span><a href="https://git.lenczewski.org/clfs/log.html">here</a></span>.
 </p>
diff --git a/src/blog/pyecs.html b/src/blog/pyecs.html
@@ -62,18 +62,16 @@
   "purity" and simplicity reasons.
 </p>
 
-<pre>
-<code class="language-c">
+<pre><code class="language-c">
 typedef unsigned long entity_t; // as simple as a uint64
 
 typedef struct Entity {
-    unsigned long id;
-    unsigned long flags;
-    unsigned long tags;
-    // ...
+    unsigned long id;
+    unsigned long flags;
+    unsigned long tags;
+    // ...
 } Entity_t; // or something more complex
-</code>
-</pre>
+</code></pre>
 
 <h3>Components</h3>
 <p>
@@ -101,8 +99,7 @@ typedef struct Entity {
   from an intel pdf are as follows:
 </p>
 
-<pre>
-<code class="language-text">
+<pre><code class="language-text">
 Data Source                                          Latency
 L3 Cache hit, line unshared between cores            ~40 cycles
 L3 Cache hit, line shared readonly between cores     ~65 cycles
@@ -110,8 +107,7 @@ L3 Cache hit, line shared readwrite between cores ~75 cycles
 L3 Cache hit, from other core                        ~100-300 cycles
 Local DRAM                                           ~60ns
 Remote DRAM                                          ~100ns
-</code>
-</pre>
+</code></pre>
 
 <p>
   For a CPU clocked at 2GHz (slower than modern CPUs), each cycle takes half a
@@ -128,55 +124,53 @@ Remote DRAM ~100ns
   Getting back to why components are a thing, with an example:
 </p>
 
-<pre>
-<code class="language-c">
+<pre><code class="language-c">
 // clean separation into multiple discrete components
 typedef struct Transform {
-    double px, py, pz;
+    double px, py, pz;
 } Transform_t;
 
 typedef struct Velocity {
-    double vx, vy, vz;
+    double vx, vy, vz;
 } Velocity_t;
 
 typedef struct Mesh {
-    void *mesh_data;
+    void *mesh_data;
 } Mesh_t;
 
 // chucking it all into one mixer and pressing "blend"
typedef struct State {
-    Transform transform;
-    Velocity velocity;
-    Mesh mesh;
+    Transform transform;
+    Velocity velocity;
+    Mesh mesh;
 } State_t;
 
 // ...
 
 void main(void) {
-    // this approach will have worse cacheline efficiency
-    State game_objects[] = { ... };
+    // this approach will have worse cacheline efficiency
+    State game_objects[] = { ... };
 
-    for (int i = 0; i < ARRLEN(game_objects); i++) {
-        calculate_physics(game_objects[i]);
-        render_object(game_objects[i]);
-    }
+    for (int i = 0; i < ARRLEN(game_objects); i++) {
+        calculate_physics(game_objects[i]);
+        render_object(game_objects[i]);
+    }
 
-    // this approach will have better cacheline efficiency
-    Entity entities[] = { ... };
-    Transform transforms[] = { ... };
-    Velocity velocities[] = { ... };
-    Mesh meshes[] = { ... };
+    // this approach will have better cacheline efficiency
+    Entity entities[] = { ... };
+    Transform transforms[] = { ... };
+    Velocity velocities[] = { ... };
+    Mesh meshes[] = { ... };
 
-    for (int i = 0; i < ARRLEN(entities); i++) {
-        process_physics(transforms[i], velocities[i]);
-    }
+    for (int i = 0; i < ARRLEN(entities); i++) {
+        process_physics(transforms[i], velocities[i]);
+    }
 
-    for (int i = 0; i < ARRLEN(entities); i++) {
-        render(transforms[i], meshes[i]);
-    }
+    for (int i = 0; i < ARRLEN(entities); i++) {
+        render(transforms[i], meshes[i]);
+    }
 }
-</code>
-</pre>
+</code></pre>
 
 <p>
   Why is the second approach better? Don't we iterate twice compared to once? The
@@ -189,8 +183,7 @@
   for Transform, Velocity, and Mesh. The dashes show a simplified cacheline.
 </p>
 
-<pre>
-<code class="language-text">
+<pre><code class="language-text">
 Mixed State:
 | x0 y0 z0 x1 | y1 z1 x2 y2 | z2 x3 y3 z3 | x4 y4 z4    |
 | ----------- | ----------- | ----------- | ----------- |
@@ -201,8 +194,7 @@
 Components:
 | x0 z0 x1 z1 | x2 z2 x3 z3 | x4 z4       |
 | ----------- | ----------- | ----------- |
-</code>
-</pre>
+</code></pre>
 
 <p>
   Not only does processing the mixed state do worse on the instruction cache, it
@@ -253,16 +245,14 @@
   as discussed earlier.
 </p>
 
-<pre>
-<code class="language-python">
+<pre><code class="language-python">
 class RenderSystem:
     def actions():
         return {
             self.process_active: (Transform2D, ScreenElement),
             self.process_stale: (StaleTag,),
         }
-</code>
-</pre>
+</code></pre>
 
 <p>
   This is the approach I took, where each System has an <code>actions()</code>