August 2024

16 August:

Progress on Data Compression in Depth:

   106     118    2777 ./main.tex
  2632   27535  184506 ./content/chapter3.tex
  3635   37047  273143 ./content/chapter5.tex
    33     321    2255 ./content/appendix.tex
    14     796    5391 ./content/foreword.tex
   484    5366   35317 ./content/chapter7.tex
  1316   11660   77651 ./content/chapter0.tex
  1488   11531   76060 ./content/chapter4.tex
  1460   11414   83695 ./content/chapter1.tex
  3021   27776  186147 ./content/chapter2.tex
  2112   15631  113221 ./content/chapter6.tex
   247    3365   22936 ./auxiliary/scratchpad.tex
 16548  152560 1063099 total

15 August:

Picture an instant messenger (e.g. Discord). Open a random channel in a STEM-related server.

“A: Hey, I need help with problem [X]. Please DM me.” “A: Hey, I need help with problem [X]. B: Sent a DM.”

Sound familiar?

It has always puzzled me why people would rather offer help via direct messages instead of discussing the subject in public. After all, this way the answer can benefit everyone reading who may be asking themselves the same question. Also, there may be some people who will ask follow-up questions, and yet another person to answer them. Perfectly logical.

Now, step back for a second. Depending on how long you have been on the internet, this may annoy you even more than some of the freshly-recruited STEM hobbyists. Since a lot of programming forums are dead, spammed with useless content or poorly moderated, it’s becoming more difficult to just look the question up online. Have you ever found yourself joining a community and immediately after searching for your problem in the message logs? When you really think about it, is it actually true that there will be many people who will benefit from the answer? After all, we can’t efficiently index Discord servers either. And what is the likelihood that two people with the exact same problem will be browsing the channel at the same time?

Someone demonstrates a simple C program along these lines:

#include <stdio.h>
int main(void) {
    char a = 0x12, b = 0xAB;
    printf("%04X %04X", a, b);
}

Contrary to what they expected, this program prints 0012 FFFFFFAB. Why does this happen? Pretty simple - so you start explaining: the %X conversion in printf expects an unsigned int, but the variables you have declared are signed chars. So the signed char has to be widened to an int, but in two’s complement 0xAB is a negative number, hence to preserve the sign it’s padded with 1 bits at the front (i.e. sign extension; as opposed to padding with zero bits - zero extension). You suggest that marking the variables as unsigned char will solve the issue.

Before your suggested solution could even be tested, someone interjects. After all, the C standard does not guarantee the signed number representation. Further, the standard doesn’t specify whether a char is signed or unsigned, so it’s incorrect to assume that it’s signed. You are wrong.

Annoyed yet? And you have been unlucky too, because the pedantic interjection is actually technically correct. These kinds of exchanges stem from the (exceedingly common!) pedantry in educational settings that comes from a complete misunderstanding of didactics. Practically speaking, simple problems that have difficult explanations usually need some degree of simplification to be digestible by a beginner. Unfortunately, such simplification, for a sufficiently simple subject, leads to you technically saying things that are not entirely correct. This makes “safe” explanations really fucking difficult, if not impossible.

Finally, the flip side of this is the fact that pedantry and inability to temporarily accept axioms never actually leads to learning. Learning is a deeply unscientific process, driven by empirical experimentation, simplification and iteratively patching the voluntarily created gaps in one’s knowledge. In other words, you probably want to learn like BFS with pruning, not like DFS.

Takeaway: People would rather explain simple problems in private to avoid having to deal with idiots butting in.

June 2024

1 June:

Really happy with how my book is coming together. I spent a lot of time on it. Maybe publishing it is not as remote as I thought it would be. Also happy pride month.

May 2024

23 May:

Every mathematician believes that they are ahead of others. The reason they never state this belief in public is because they are intelligent people. On the contrary, many programmers do state such a belief in public, because the median programmer is not intelligent.

22 May:

Typesetting my book got me crying a few times.

20 May:

I must be the only person on the planet who eats olives straight out of a jar as a snack. I love being an independent adult who lives alone.

13 May:

The German business model was based on cheap energy from Russia, cheap subcontractors in the eastern EU, and steadily growing exports to China. All three are gone by now, but German politicians are still stuck in a world that doesn’t exist anymore. So now, after the whole country has been turned into a smelly coal-burning pit thanks to fake reports about nuclear and understating coal plant emissions by 200x, there’s no going back, and Germany is sooner or later going to level with eastern European countries.

8 May:

wow, Windows ACLs are real nasty to work with from within C and WinAPI.

5 May:

The mistakes I made when developing bzip3 are as follows:

  • applying LZP front-to-back (mistake shared by most codecs)
  • no end of file marker
  • RLE stage (it looked lucrative at first when i tested it on some corpora, but ultimately it’s a mistake)
  • following LZ4’s idea of a frame and block format (people get confused to hell and back when they learn that bz3_decompress uses a different format than the bzip3 CLI tool; tldr the API function knows more about the input so it can compress more efficiently, while the cli tool has to take pipe input)
  • using a strong statistical model by default without a knob to turn it off and replace it with something faster.
  • not taking advantage of block size reduction on known input sizes

April 2024

29 April:

If you are working on a project, do your best to never burn a bridge. Try not to make limiting design decisions unless you are absolutely sure of them. Document future potential for improvement, but focus on getting an MVP first.

23 April:

I came back home! A few highlights of my trip to Wrocław:

  • horrible weather. i packed short shorts, it was under 5C consistently the whole time, windy and rainy.
  • the organisers were really nice and my talk went great
  • got some free stuff from OVH, including a pizza slicer
  • got confused for a nun once. when i turned around, a guy in his early 40s started apologising.
  • got delayed for 3hrs by border control. i don’t even know what they were doing. with this amount of time they could’ve checked everyone’s rectal cavity for cocaine multiple times over.
  • hotel room was nice as always, managed to book something close to the old town & the university itself.
  • public transport was free on the day of my talk because of elections. i went on morning and afternoon walks, at some point got cold and figured that i need to catch the streetcar. then someone politely informed me that it’s free. it’s only a shame that streetcars are not heated.
  • i explored a bit more of the city and Wrocław actually seems somewhat pretty and clean.

16 April:

Controversial stance: I really hate grug developers. Especially the species that will never learn how anything in computing actually works. Like a hashmap/hashset, vtables, literally anything. They will discourage others from learning it because “it’s useless”, and will not learn it themselves because “if you roll your own hashtable then it will be buggy”.

15 April:

abysmal /əˈbɪzml/

  1. extremely bad; appalling. similar: very bad, dreadful, awful, terrible.
  2. literary: very deep. similar: profound, complete, utter, thorough.

14 April:

My life feels like a kaleidoscope. Every time it twists, it becomes completely different from how it used to be.

13 April:

One thing I feel bad about: none of my ongoing projects are public so far. I feel like the stuff I don’t put on my GitHub gets excessively little publicity. Actually publishing stuff gives me a sense of fulfillment in that I actually made something and it got to a state in which it’s usable - a state usually not reached by the stuff I deliberately choose not to publish. Also, working on my book is a neverending process. I poured so much time into that.

“Don’t take criticism from people whose advice you would not follow” is something i should start living by. It’s difficult to tune out the noise and not be let down by it, and I should actually crack on the things that need to be done.

12 April:

Planning to spend my evening tomorrow with my friends having a board game night together. It kinda sucks that so few people even want to play board games. Almost everyone I know would rather play something on their PC. My jet lag from having been to Canada for a month has still not healed. Hopefully it gets better because I have some errands to run.

6 April:

Benjamin installed more RAM in my computer, now I have 40GB. Relatedly, the amount of Turing machine approximations available on the market is rather disappointing.

5 April:

Favourite song of today: Toshiki Kadomatsu - Fly by day

4 April:

Today we ate dinner in the CN Tower restaurant. It was really good. The restaurant spins around so you can see the whole city. Unfortunately, it was raining during our visit, so you couldn’t see much.

2 April:

Two-day long city break starts today, we plan to catch the train at 12:00 and stay there for ~2 days. Then on the 9th of April I have to go back home to Germany :(.

1 April:

We made pizza for dinner. It was all gone in ten minutes.

March 2024

23 March:

Hexdump of a 526-byte Mersenne Twister program (ELF64, takes decimal seed as input, writes output as a sequence of 4-byte LE fields). Approx. 1.5GiB/s on my machine. I wonder if I can do better.

00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
00000010: 0200 3e00 0100 0000 7800 4000 0000 0000  ..>.....x.@.....
00000020: 4000 0000 0000 0000 0000 0000 0000 0000  @...............
00000030: 0000 0000 4000 3800 0100 4000 0000 0000  ....@.8...@.....
00000040: 0100 0000 0700 0000 0000 0000 0000 0000  ................
00000050: 0000 4000 0000 0000 0000 4000 0000 0000  ..@.......@.....
00000060: 0e02 0000 0000 0000 ce2b 0000 0000 0000  .........+......
00000070: 0010 0000 0000 0000 488d 358f 0100 0048  ........H.5....H
00000080: c7c2 1000 0000 0f05 31c0 4489 c10f b60c  ........1.D.....
00000090: 318d 51c6 83fa f672 0d6b c00a 41ff c001  1.Q....r.k..A...
000000a0: c883 e830 ebe4 8905 6221 0000 4831 c948  ...0....b!..H1.H
000000b0: ffc1 4c8d 0555 2100 0048 81f9 7002 0000  ..L..U!..H..p...
000000c0: 740f 69c0 b517 0000 4189 0488 48ff c1eb  t.i.....A...H...
000000d0: e866 bb70 0231 c041 b900 0000 804c 8d15  .f.p.1.A.....L..
000000e0: 2201 0000 4831 ff48 ffc7 6681 fb70 027c  "...H1.H..f..p.|
000000f0: 3f31 c948 81f9 e300 0000 743e 418b 1488  ?1.H......t>A...
00000100: 4421 ca45 8b5c 8804 4489 db81 e3ff ffff  D!.E.\..D.......
00000110: 7f09 d3d1 eb41 83e3 0143 8b14 9a41 3394  .....A...C...A3.
00000120: 8834 0600 0031 da41 8914 8848 ffc1 ebc3  .4...1.A...H....
00000130: 480f bfcb 418b 0c88 eb7f 31c9 4881 f98c  H...A.....1.H...
00000140: 0100 0074 3b41 8b94 888c 0300 0044 21ca  ...t;A.......D!.
00000150: 458b 9c88 9003 0000 4489 db81 e3ff ffff  E.......D.......
00000160: 7f09 d341 83e3 0143 8b14 9a41 3314 88d1  ...A...C...A3...
00000170: eb31 da41 8994 888c 0300 0048 ffc1 ebbc  .1.A.......H....
00000180: 8b15 442a 0000 4421 ca8b 0d7f 2000 0041  ..D*..D!.... ..A
00000190: 89cb 4181 e3ff ffff 7f41 09d3 89ca 83e2  ..A......A......
000001a0: 0141 8b14 9233 1593 2600 0041 d1eb 4431  .A...3..&..A..D1
000001b0: da89 1513 2a00 0031 dbff c389 cac1 ea0b  ....*..1........
000001c0: 31ca 89d1 c1e1 0781 e180 562c 9d31 d189  1.........V,.1..
000001d0: cac1 e20f 81e2 0000 c6ef 31ca 89d1 c1e9  ..........1.....
000001e0: 1231 d189 c289 0c96 ffc0 3d00 0800 000f  .1........=.....
000001f0: 85f5 feff ffba 0020 0000 4889 f80f 0531  ....... ..H....1
00000200: c0e9 e4fe ffff 0000 0000 dfb0 0899       ..............

20 March:

Applying any sort of entropy coding to LZW output seems like a lot of pain. Nominally you have a lot of invariants, e.g. US-ASCII packets that you could efficiently use FSE on to get a double-digit CR, but the match packets (>256, +2 for table clear and promotion) are just so random. the “simple” scheme of inserting trash into the LZW dictionary on almost every byte is moot, and you end up with some sort of weird 16-bit alphabet to compress, with sparse entries which occur once or twice per block. and then there’s the invariant that all the values are <= the currently seen maximum since the last table clear, which UNIX “compress” seems to use for some sort of primitive entropy coding: you begin with 9-bit codes, promote all the way up to 16-bit codes, and then freeze the dict and scrap it when the CR drops below a delta of 1-3%. for purely technical reasons my program seems faster/stronger than UNIX compress, but it’s a low bar. there’s nothing you can really do with this format; i can see why people ditched it in the 90s and never came back. … but now i’m reading up on some interesting schemes, like LZW with 12-bit dictionaries selected by o0…o3 context, and it clicked in my head that LZP/ROLZ are basically LZW. also looking into some other variations on LZW. probably won’t figure out a nice way to do EC on the LZW output though.

17 March:

weird: fwrite(&x,1,2,out); significantly slower than fputc(x>>8,out) and then fputc(x&0xff,out); - glibc 2.37-15.1. some afterthought: maybe not so weird after all.

14 March:

Hoping to get the very first draft of my whole book printed today. With the font and diagram sizes reduced (to 9pt), it’s 110 pages now.

12 March:

USB-C charging ports = planned obsolescence.

8 March:

Copilot-generated spaghetti left unchecked does almost as much damage to a codebase as an experienced Haskell programmer.

2 March:

The reason why we patch most buffer overflow vulnerabilities is not that they’re a potential RCE. You can’t reliably exploit most of these bugs to get an RCE. The real reason why they’re fixed is that they provide a surface for a DoS attack. There’s negligible difference between a heap buffer overflow leading to a segfault and a panic!("Out of bounds.").

January 2024

14 January:

Bzip3 running on my boyfriend’s PC-98 (486, 66MHz). It seems to work when the block size is set to 1MB and compresses at 22KB/s. The performance can likely be improved.

5 January:

Feelings on GregTech New Horizons so far (got only to around midway LV, may not be representative of the modpack as a whole):

  • I generally dislike the ore chunk mechanism; it makes it either too easy to obtain resources (encouraging manually digging out the chunk once you found the one you need) or too difficult (can’t find the one chunk that you need).
  • The threshold for being able to stop manual labour and start automating things is rather off-putting: the amount of stuff that needs to be done and manually crafted to achieve even elementary automation with steam-age tools is very time-consuming.
  • A lot of things that were familiar to me from mods (BC pipes, IC machines, RP circuits, etc) are no longer available. Me no likey.
  • The amount of grind required to get to Thaumcraft and other fun mods is absurd.

December 2023

12 December:

People who think that

int max(int * array, int length)
{
  int currentMaximum = 0;
  for (int index = 0; index < length; index++)
  {
    if(array[index] > currentMaximum)
    {
      currentMaximum = array[index];
    }
  }
  return currentMaximum;
}

is better than

I max(I*a,I n)_(I t=0;Fi(n,IF(a[i]>t,t=a[i]))t)

cannot be trusted.

10 December:

Getting slowly bored of AoC. Too many uninteresting, cookie cutter imperative problems.

9 December:

I wish mailing lists were more common. E-mail feels a bit like a forgotten medium to me, but it helps me stay organised and safe with OpenPGP signing & encryption.

November 2023

18 November:

(1)  (2×((¯3+25×)×!(+)×!)+.÷!(3×)×2*)

9 November:

Interesting benchmark results for general-purpose data compressor performance on Brainfuck code (compiled tiny_sk from the asm2bf distribution):

% wc -c *
29446 tiny_sk.b
 1188 tiny_sk.b.br
 1405 tiny_sk.b.bz2
 1368 tiny_sk.b.bz3
 1345 tiny_sk.b.gz
 1392 tiny_sk.b.lzma
 1097 tiny_sk.b.paq8l
 1269 tiny_sk.b.zst

Remarkably: While PAQ8L wins, its closest contender is actually Brotli. Then Zstandard, to be followed by gzip, bzip3, lzma and bzip2.

8 November:

Figuratively, trying to bite off twice as much as I can chew sounds like a good way to wrap up the last few years for me. Seemingly, I can chew more and more as time passes. This can’t be sustainable.

July 2023

13 July:

Some musings about register allocation.

Before you read this, remember that this is a highly hypothetical scenario completely disconnected from how CPUs work. Registers do not have to be 8 bytes, there are caches for registers, etc…

Consider a special variant of register allocation where, outside of wanting to minimise the amount of spills interprocedurally, we also want to put additional constraints on how the registers are allocated. For example, instead of using the first available temporary register as many code generators for MIPS do (unfortunately further away from optimal on x86 due to register requirements for div, etc…), we want to pick a register such that the last referenced register is the closest to the currently considered register. In particular, consider some function f(x, n, m) which models the runtime of the two-operand instruction being currently executed. Long division p <- p/q has a computational complexity of O(q^2), hence our function is f(x,n,m)=(xm)^2, where x signifies the cost of loads from p and q. Loading Q after having loaded Q again is cheap (caching), but loading P after having loaded Q or vice versa is more expensive. The cost x is defined as |R_p - R_q| - i.e. the distance between the two registers in the register file. This may come in useful in scenarios where registers are large and the computer has a multi-level cache that quickly evicts unused data and eagerly caches new data.

For example: div r1, r4 has the cost factor |1-4|=3 further applied to the worst-case (r4)^2 amount of operations - the instruction would take 3*(r4)^2 cycles. The cost factor of div r2, r1 would be only |2-1|=1, hence the instruction takes only (r1)^2 cycles.

Hence the question posed is: What is the most efficient algorithm to model this particular register preference? The answer to this question would probably provide answers to other similar questions regarding register preference that are ubiquitous on platforms where the exact register you choose for a particular variable does matter (e.g. x86; due to how certain instructions like to store output/take input from a hardcoded register).

A crucial thing to notice is that the problem of register allocation with a preference on the closest register available is essentially equivalent to a modified variation of the 1-D Travelling Salesman Problem where every node can be (and is encouraged to be) visited multiple times if possible!

It just so happens that compilers appear to emit low-numbered registers, but that’s due to preferential treatment for volatile registers, as used by calling conventions and then coalesced, etc., since using a higher-numbered register (typically where non-volatile/callee-saved values live) amounts to spill + reload insertion. In graph colouring, one could use a heuristic to select free colours closest to an already selected colour. Hence the compiler backend developer’s solution to the problem would be prioritising the colours closest to the already-assigned colours of the direct neighbours, assuming an ordering on the colours (obviously, where colours are numbers), to produce a non-optimal but relatively good result.

Notice how similar this approach is to the nearest neighbour search approximate solution to the Travelling Salesman Problem. Hence, to connect the dots: I think that this particular solution is the best one considering speed and how close the output is to being optimal. An optimal analogue would be the exhaustive search TSP solution, while a considerably less optimal but way faster in terms of computational complexity option could be applying the Christofides algorithm.

If you are still wondering what is the use of it, I have to disappoint you and refer you to reading and comprehending this article: https://esolangs.org/wiki/Brainfuck

6 July:

A good example of this (read: “It’s not that computers are getting slower, it’s that software is becoming less efficient”) on a completely different layer is JS engines. Notice how stuff like V8 is extremely fast and complicated for what it does. If the web ran on QuickJS or PUC-Rio Lua, it would be completely unusable. And all of this is because of how much awful, horrible JS code there is around, so instead of fixing the very obviously wrong code we simply make better interpreters/compilers, which, in the long run, is significantly more complicated.

Instead of putting in the effort to write high quality optimising runtimes for functional or formally verified languages which would actually push computing forward in the long run, we keep trying to make a 30 year old, insignificant or even regressive from a theoretical standpoint, language run fast because the code written in it sucks.

1 July:

Regarding the recent WHO decision to classify aspartame as a potential carcinogen, remember that being a hairdresser or eating pickled food according to the medical literature is also potentially carcinogenic.

https://pubmed.ncbi.nlm.nih.gov/19755396/ & https://aacrjournals.org/cebp/article/21/6/905/69347/Pickled-Food-and-Risk-of-Gastric-Cancer-a

June 2023

30 June:

I have just removed 99% of the JavaScript code that used to be on my website. The remaining 0.5KB is used for the mobile navbar to work. So, technically speaking, my website now behaves the same as if you disabled JS entirely. And it still has syntax highlighting in blog posts and math rendering. The only exceptions are the inherently dynamic subpages of my website (the SDL game ports, etc… - these obviously won’t work well without JS).

26 June:

Most people would never tolerate common scams in a physical setting, but if you make one small change to the technology being used, the mentality in some people changes.

This phenomenon of distancing layers via technology is actually really common; think of how many friends you have that would never fall for a traditional multi-level marketing scheme, a “get rich quick” scam, or a penny stock pump and dump, but then if you change the technology to, say, cryptocurrency, some of those red flags just subconsciously go away. I’ve seen real-life examples of people who on one hand are aware enough to say “all these influencers trying to shill this penny stock just want to pump and dump me”, but then later on they say “Yeah, I really do think that this doggy themed token with no utility whatsoever is going to become 100x more valuable, so I better get in quick!”, and you might even know somebody who went “I’m not going to give this RuneScape scammer my go- oh my goodness Obama’s doubling my Bitcoin on Twitter!!”.

So many new scams are just old scams with new technology, because of these very same psychological distancing barriers that we subconsciously create.

25 June:

It would be nice if IRC hadn’t died and someone came up with ways to extend the protocol to support E2EE and other stuff.

I remember being a pretty happy and frequent IRC dweller back in maybe 2019, but it has only gone downhill since (when it came to activity, quality of discussion, the entire freenode drama, etc…) and because I haven’t made that many particularly good friends, I didn’t end up being invited into mid-sized private networks which to my knowledge still thrive and do surprisingly well considering that they are IRC. I can only imagine a similar fate has met XMPP.

OTOH, most quality internet places are slowly moving away from centralised services and slowly digging themselves underground. It’s getting harder and harder to tell AIs apart from humans, and some of my friends are particularly paranoid about their messages being used to train LLMs. The internet is slowly becoming a useless sea of spam again.

My main issues with Python are the GIL, the mess with venvs and other nonsense, and bad performance. Python itself is not very embeddable and lacks a proper “type” system. It would be nice if we had some sort of unintrusive typing system that would help catch a lot of the embarrassing mistakes we make while writing JS/Python/Lua/other untyped languages; I feel like gradual typing from TS actually solves this problem pretty nicely!

A reasonably fast lisp/scheme built on top of a lua-like VM with a gengc & jit compiler + gradual typing of TS and a ground-up implementation of a rich standard library that doesn’t make the programmer reinvent hashtables, graphs or linked hashsets would also be nice. To me what makes a scripting lang good is reasonable (not C-level but still okayish) performance, a substantial amount of software you can graft code from to speed up development (see: python’s ecosystem), some sort of largely inference-based static verification with minimal amount of decorators and other crust to prevent certain classes of errors in the runtime, etc.

20 June:

A gentle reminder of the still-unsolved issue that I have had with the Linux kernel ever since I started using an M.2 drive:

If you use LUKS to encrypt an M.2 SSD and then perform intensive I/O from within the system, it is going to lock up your entire machine. Impressive, isn’t it?

Debian has dm-crypt worker I/O queues enabled by default and they’re written very poorly, so the kernel waits until they are full or near-full before trying to sync them to the disk, and with multiple queues all fighting for disk access, the disk dies under the load and the system locks up. Now, a linux nerd is going to cry me a bucket of tears that the queues are written perfectly with no flaws whatsoever. The problem is that i don’t care, whatever iotop shows is the truth revealed.

I also can’t run any of my VirtualBox VMs because of this bug in VBox reported 10 years ago: https://www.virtualbox.org/ticket/10031?cversion=0&cnum_hist=14. Obviously, VBox hates I/O latency and eventually gives up if access to the host’s storage takes too long so the hypervisor turns back around to the guest VM and says that the read/write is impossible and the Windows instance in the machine randomly bluescreens.

Situations like these make me miss Windows pretty badly. Shame that W8, W10 and W11 are essentially spyware unless you go to great lengths to debloat them.

19 June:

Calling all C language lawyers for help: I am wondering whether empty structures are allowed or not, assuming either C99 or C11, whichever is more favourable. To quote the standard,

6.2.6.1 Except for bit-fields, objects are composed of contiguous sequences of one or more bytes, the number, order, and encoding of which are either explicitly specified or implementation-defined.

This could imply that an empty struct has a non-zero size (so a C++-like behaviour); however, 6.2.5-20 says: “A structure type describes a sequentially allocated nonempty set of member objects”. So I thought that I could circumvent it the following way: struct { int dummy: 0; } variable;

One would have to remove the declarator, yielding struct { int : 0; } variable;, per 6.7.2.1.4: “The expression that specifies the width of a bit-field shall be an integer constant expression with a nonnegative value that does not exceed the width of an object of the type that would be specified were the colon and expression omitted.) If the value is zero, the declaration shall have no declarator.”

So finally, whether we can have empty structs or not depends on whether int : 0 counts as a member object, but i can’t find anything that would be conclusive on this matter. I have already observed that the C standard treats zero size bit-fields specially, but the only relevant bit of information I could find was 6.7.2.1.12: “As a special case, a bit-field structure member with a width of 0 indicates that no further bit-field is to be packed into the unit in which the previous bit-field, if any, was placed.”

Any ideas?

N.B. the wording in 6.7.2.1.4 says “non-negative”, not “positive”, meaning that the width of 0 is technically allowed as a “normal” width.

18 June:

One thing that I feel very thankful for is my university helping me to completely 180 my opinion of academia: from “It would be cool to be an uni-affiliated researcher after I am done with my studies” to open resentment.

My opinion is rather long and nuanced, but my main issues are with publishing and with how hierarchical academia is, especially when you look at freshly graduated M.Sc.’s and B.Sc.’s who were by chance allowed to TA in B.Sc.-level classes.

In academia, you don’t even get to validate the results of “novel algorithms” yourself, let alone get any timings. Papers usually never publish source code, derivations - nothing. I wonder where the outrage is over the piss-poor state of academic research, where you keep seeing lots of papers just rehashing known ideas, calling them new, presenting claims of state-of-the-art results without even checking what’s out there, and providing no way to reproduce them. In image compression, most are just based on MATLAB simulations and cherry-picked results on 4 or 5 low-resolution images. It doesn’t help that most reviewers for scientific journals are completely illiterate when it comes to data compression.

A lot of the papers are cryptic and seem difficult to implement, and there’s no way to verify the claims, so you have to trust the charts and tables. Even if there were a reviewer motivated by hell knows what (reviewers are, usually, unpaid), the paper authors could corner him by saying that his re-implementation of their highly abstract paper is inefficient, so the timings or memory usage don’t compare. And of course, nobody in academia ever publishes source code, because by doing so you basically hand a loaded gun to the reviewer: it’s simple to go through some ready-made code, compile it and verify it, rather than follow a dense math expression that looks like an 8th grader learning about “complex math notation” and try to reason about whether it’s practical or not - not including the code makes it less likely for your fraud to be discovered. This problem isn’t even exclusive to data compression, but it’s specifically prevalent there. Many medical paper frauds aren’t properly uncovered for decades, even though clinical trials can usually be repeated in just about every hospital around the world. The number of people who want to go through a poorly typeset 70-page paper in some niche journal, full of math gibberish, that promises to turn lead into gold and spaghetti trees is probably close to zero.

When you publish something, what you want on your CV is usually the only thing that matters. The people who will read and attempt to understand your paper will probably be only you, the reviewer, and that one random hobbyist who found it on Google Scholar and gave up when he saw the 3rd page.

May 2023

5 May:

This has to be the most curiosity-inducing error message that I have seen in a long while.

In static member function ‘static constexpr std::char_traits<char>::char_type* std::char_traits<char>::copy(char_type*, const char_type*, std::size_t)’,
   inlined from ‘static constexpr void std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::_S_copy(_CharT*, const _CharT*, size_type) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /usr/include/c++/12/bits/basic_string.h:423:21,
   [...]
/usr/include/c++/12/bits/char_traits.h:431:56: warning: ‘void* __builtin_memcpy(void*, const void*, long unsigned int)’ accessing 9223372036854775810 or more bytes at offsets -4611686018427387902 and [-4611686018427387903, 4611686018427387904] may overlap up to 9223372036854775813 bytes at offset -3 [-Wrestrict]
 431 |         return static_cast<char_type*>(__builtin_memcpy(__s1, __s2, __n));
     |                                        ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~

Turns out that this is actually a compiler bug - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105329 - and the comments on that thread are awesome. Creating a new string instance using the (int n, char c) constructor can break the warning machinery due to some issue with Ranger; it has been a known bug for two minor versions of GCC, and there is no good fix for it. And you need a PhD in compiler design to understand why.

April 2023

18 April:

Realisation: when I was working on my ELF infector post, I once didn’t properly sandbox it from the rest of my system. That way I accidentally infected cargo and a bunch of other binaries; when they started behaving weirdly, I finally figured out what was going on and reinstalled Rust.

14 April:

One way to allow RC without a cycle collector would be to enforce a unidirectional heap. And the actor model will surely help avoid atomics, too.

13 April:

Seems like bzip3 wipes the floor with lzma when it comes to Java class files. What if I told you that the average jar you see online could be 2x-4x smaller?

 94136320 lisp.tar
  6827813 lisp.tar.bz3
 13984737 lisp.tar.gz
  7693715 lisp.tar.lzma
  8292311 lisp.tar.zst
130934896 total

February 2023

16 February:

It’s alive…

--> cas:taylor x 0 (cas:fn x \sin x)
ƒ(x)=(- x (+ (+ (* (/ 1 6) (** x 3)) (* (/ 1 120) (** x 5))) (o (** x 6))))
--> cas:integral (cas:taylor x 0 (cas:fn x \sin x)) dx
ƒ(x)=(- (- (* (/ 1 720) (** x 6)) (+ (* (/ 1 24) (** x 4)) (* (* (/ 1 2) x) x))))

13 February:

I have decided to decommission Lovelace (the i5-7400 16G server) and sell it. Primarily because it’s not very power efficient and I can move my services to VPSes anyway.

9 February:

An interesting method of computing permutation parity using the 3-cycle composition property:

bool parity(int * p, int n) {
    if (n == 1) return 0;
    // C is the working permutation (starts as identity), I its inverse.
    int C[n], I[n];
    for(int i = 0; i < n; i++)
        C[i] = i, I[i] = i;
    // Fix positions 0..n-3 one at a time. Each mismatch is repaired with
    // a 3-cycle, which is even, so the parity of C never changes.
    for (int i = 0; i < n - 2; i++) {
        if (C[i] != p[i]) {
            int j = I[p[i]];     // where p[i] currently sits in C
            int tmp = C[i];
            C[i] = C[j];
            int k = j == n - 1;  // pick a spare slot distinct from j
            C[j] = C[n - 1 - k];
            C[n - 1 - k] = tmp;
            I[C[n - 1 - k]] = n - 1 - k;
            I[C[j]] = j;
            I[C[i]] = i;
        }
    }
    // C now matches p everywhere except possibly the last two slots;
    // p is odd iff one extra transposition is still needed.
    return C[n - 1] != p[n - 1];
}

Unsurprisingly, it translates horribly into array logic languages. I wonder how I would implement it in my Lisp…

January 2023

14 January:

A (hopefully) interesting idea: A virtual machine with a rich standard library and instruction set, procedural, functional, based on the Actor model. I plan to use only reference counting and cycle collection, have it be variable-based (no manual register allocation and no stack to make matters worse). Fully immutable, but it’s possible to implement functional data structures using a cute way built into the interpreter. Likely JIT-compiled using either cranelift or llvm. Can send code over the LAN or even the Internet for transparently parallel execution. Provides some cute utilities for number pushing; completely statically typed and ideally the code is monomorphised before being passed to the VM.

December 2022

20 December:

I hate programmers with very big mouths and tunnel-vision eyes, which together combine into the most radical, nonsensical views I have seen in my life - and nothing to back those opinions with.

8 December:

Having implemented the Lerch transcendent and the Riemann zeta, it’s now time for the Hurwitz zeta. Technically speaking, the Lerch transcendent is a generalisation of the Hurwitz zeta, so that ζ(s,n)=L(0,n,s)=Φ(1,s,n). However, my implementation of the Lerch phi (which is still not as efficient as I’d like…) computes the upper incomplete Gamma function as a factor in the final result, and when z=1 and a != 1 we stumble upon a funny case where the upper incomplete Gamma function has a complex pole, /yet/ the Lerch phi is defined at that point (as, of course, is the Hurwitz zeta).
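For reference, the Lerch transcendent and its reduction to the Hurwitz zeta at z=1:

```latex
\Phi(z, s, a) = \sum_{n=0}^{\infty} \frac{z^n}{(n+a)^s},
\qquad
\Phi(1, s, a) = \sum_{n=0}^{\infty} \frac{1}{(n+a)^s} = \zeta(s, a).
```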

The game plan now is to implement a somewhat general Euler-Maclaurin summation function and derive the formula for the n-th derivative of the Hurwitz zeta function with respect to s (which should obviously be trivial) to speed up the “general” method.
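The two ingredients, spelled out: the Euler-Maclaurin formula (with Bernoulli numbers B_{2j} and a remainder term R_m), and the n-th derivative of the Hurwitz zeta with respect to s, obtained by differentiating the defining series termwise:

```latex
\sum_{k=a}^{b} f(k) = \int_a^b f(x)\,dx + \frac{f(a)+f(b)}{2}
 + \sum_{j=1}^{m} \frac{B_{2j}}{(2j)!}\left(f^{(2j-1)}(b) - f^{(2j-1)}(a)\right) + R_m,

\frac{\partial^n}{\partial s^n}\,\zeta(s, a)
 = \sum_{k=0}^{\infty} \frac{\left(-\ln(k+a)\right)^n}{(k+a)^s}
\qquad (\operatorname{Re} s > 1).
```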

This will have an interesting consequence: we will be able to compute an arbitrary derivative of the Hurwitz zeta at any point we wish, meaning that computing the Glaisher constant, defined in terms of the derivative of the zeta at an integer point, will become attainable.
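Concretely, the Glaisher-Kinkelin constant A is tied to the derivative of the Riemann zeta at −1 by

```latex
\ln A = \frac{1}{12} - \zeta'(-1).
```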

The pieces of the puzzle in SciJava are slowly coming together.

November 2022

11 November:

I tasted vanilla chai tea for the first time in my life! And the first thing I did after coming back home from the uni cafe was to try making it myself. Turns out I had packed some ginger when I moved, I had some cinnamon, and I bought honey on my way home. I made it, but the taste was rather mild, and I realised that the tea I brewed from the leaves was too strong. Better luck tomorrow, I hope.

August 2022

30 August:

Taking another look at it, I feel like MalbolgeLISP (especially v1.2) might be the best thing I made in my life. It’s so weird to think that it’s been a year now…

29 August:

My ThinkPad arrived today! Just an E14 with a Zen 3 Ryzen 5 and 16G of RAM. Initially I had a few issues making it boot off USB and getting networking to work, but now it works pretty well and I’m happy with it.

May 2022

23 May:

Good and bad news!

First, my compressor is asymmetric now. The bad news is that it’s asymmetric the wrong way around - compression is quite a bit faster than decompression…

4 May:

Once in a while, the circumstances allow one to use the goes-to operator…

            uint32_t log2 = (run_length <= 256)
                ? constant::LOG2[run_length - 1]
                : (31 - __builtin_clz(run_length));
            if(dst_idx >= count - log2)
                { res = false; break; }
            while(log2 --> 0) // The famous "goes to" operator.
                dst[dst_idx++] = (run_length >> log2) & 1;
            continue;