The third iteration of Doug Bolden's various thoughts and musings.

Category: Blog Stuff

Inline Substitution Ciphers to Play with Semi-Hidden Text

jLh 903mO moh0 m3 S6 SnE 0so Vhshn0Sh 3SnmsV3 6Y ShFS SL0S 0nh 903mO0MME ‘Lmoohs ms QM0ms 3mSh’ (C2r!) 9tS 0M36 Vhshn0MME nhO6Vsmd09Mh 03 LtU0s-BnmSShs ShFS 9E nhS0msmsV UtOL 6Y SLh QtsOSt0Sm6s, BLmSh3Q0Oh, 0so 6SLhn hMhUhsS3. jLm3 B6tMo hs09Mh Uh, Y6n ms3S0sOh, S6 BnmSh ShFS SL0S O6sS0msho 3Q6mMhn3 6n L0o 6SLhn 03QhOS3 s6S msShsoho S6 9h nh0o 9E jLh 7MV6nmSLU BLmMh 3tnn6tsoho 9E ShFS SL0S m3 QhnYhOSME LtU0s- 0so U0OLmsh-nh0o09Mh. a O6tMo 30E ‘J6t = Q66 Q66 Lh0o’ BmSL6tS SL0S 9hmsV msohFho. uE NhhQmsV mS 0 36UhBL0S 3mUQMh 3t93SmStSm6s OEQLhn, SLm3 Uh0s3 SL0S mS h03E Y6n Qh6QMh S6 Sn0s3M0Sh h1hs BmSL6tS 0sE 3OnmQS 0so 0MM6B3 mS S6 9h nhM0Sm1hME tso6sh 0S 0 M0Shn o0Sh.


If you click the text above, it should “solve out” to a line of text that reads:

The basic idea is to try and generate strings of text that are basically ‘hidden in plain site’ (PUN!) but also generally recognizable as human-written text by retaining much of the punctuation, whitespace, and other elements. This would enable me, for instance, to write text that contained spoilers or had other aspects not intended to be read by The Algorithm while surrounded by text that is perfectly human- and machine-readable. I could say ‘You = poo poo head’ without that being indexed. By keeping it a somewhat simple substitution cypher, this means that it easy for people to translate even without any script and allows it to be relatively undone at a later date.

And then if you click it again (without refreshing the page), it should do essentially nothing. This is my basic first pass on coming up with an idea I have had for Dickens of a Blog since way back. I am unsure when I first posited it but likely around 2006 or 2007.

The idea was simple: set aside some portion of the text in an otherwise open-to-read blog post {e.g., spoilers, info semi-hidden from scrapers, bits that otherwise might be triggers} through a simple enough cipher or baseline encryption that solving it would not become hostile to Doug’s happiness if keys/etc were lost.

The Code Behind It

Version 1 is above. What happens is that I have a fairly simple Python script:

from random import sample

def scramble_AlphaNum(oldAlphaNum):
    return ''.join(sample(oldAlphaNum, len(oldAlphaNum)))

alphaNum = "AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz0123456789"
newAlphaNum = scramble_AlphaNum(alphaNum)

text = "The basic idea is to try and generate strings of text that are basically 'hidden in plain site' (PUN!) but also generally recognizable as human-written text by retaining much of the punctuation, whitespace, and other elements. This would enable me, for instance, to write text that contained spoilers or had other aspects not intended to be read by The Algorithm while surrounded by text that is perfectly human- and machine-readable. I could say 'You = poo poo head' without that being indexed. By keeping it a somewhat simple substitution cypher, this means that it easy for people to translate even without any script and allows it to be relatively undone at a later date."
txet = ""
paraName = "demo01"

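for t in text:
    try:
        # Swap each character for the one at the same position in the scrambled alphabet.
        txet = txet + newAlphaNum[alphaNum.index(t)]
    except ValueError:
        # Characters outside alphaNum (punctuation, whitespace) pass through unchanged.
        txet = txet + t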
        
output = """ <p id=\"""" + paraName + """\" onclick="gentleScramble('""" + newAlphaNum + """', '""" + paraName + """'); this.onclick=null;">""" + txet + """</p>"""
        
print(output)

Right now, I have to manually edit the file to set the paragraph, div, or span ID and then the contents. It would be fairly trivial to generalize this. Running that, it spits out a paragraph tag that looks like:

<p id="demo01" onclick="gentleScramble('70u9wOboihzYgVXLamGHINxMAUrsk6CQfcpnv3jS2tR1DB4FJEZdeyWqP8Tl5K', 'demo01'); this.onclick=null;">jLh 903mO moh0 m3 S6 SnE 0so Vhshn0Sh 3SnmsV3 6Y ShFS SL0S 0nh 903mO0MME 'Lmoohs ms QM0ms 3mSh' (C2r!) 9tS 0M36 Vhshn0MME nhO6Vsmd09Mh 03 LtU0s-BnmSShs ShFS 9E nhS0msmsV UtOL 6Y SLh QtsOSt0Sm6s, BLmSh3Q0Oh, 0so 6SLhn hMhUhsS3. jLm3 B6tMo hs09Mh Uh, Y6n ms3S0sOh, S6 BnmSh ShFS SL0S O6sS0msho 3Q6mMhn3 6n L0o 6SLhn 03QhOS3 s6S msShsoho S6 9h nh0o 9E jLh 7MV6nmSLU BLmMh 3tnn6tsoho 9E ShFS SL0S m3 QhnYhOSME LtU0s- 0so U0OLmsh-nh0o09Mh. a O6tMo 30E 'J6t = Q66 Q66 Lh0o' BmSL6tS SL0S 9hmsV msohFho. uE NhhQmsV mS 0 36UhBL0S 3mUQMh 3t93SmStSm6s OEQLhn, SLm3 Uh0s3 SL0S mS h03E Y6n Qh6QMh S6 Sn0s3M0Sh h1hs BmSL6tS 0sE 3OnmQS 0so 0MM6B3 mS S6 9h nhM0Sm1hME tso6sh 0S 0 M0Shn o0Sh.</p>

I add that to my document via Custom HTML. The first string passed to gentleScramble is the a-z/A-Z/0-9 alphanumeric characters of the common American English alphabet (etc) in scrambled order. It is re-randomized on each run of the script.

Then at the bottom of the page, I insert another Custom HTML section with this Javascript:

<script>
function gentleScramble(newAlpha,para) {
	const AlphaNum = "AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz0123456789";
	const newAlphaNum = newAlpha;
	
	let victim = document.getElementById(para).textContent;
	let solution = "";
		
	for (let v = 0; v < victim.length; v++) {
		// Look up each character in the scrambled alphabet; -1 means it was never enciphered.
		const foundIt = newAlphaNum.indexOf(victim[v]);
		if (foundIt !== -1) {
			solution = solution + AlphaNum[foundIt];
		} else {
			solution = solution + victim[v];
		}
	}
		
	document.getElementById(para).textContent=solution;
}
</script>	

That takes the paragraph ID and the substitution key and runs the swap on the first click. The “this.onclick=null” then removes the handler to stop it from glitching out if a reader spam clicks it.

As it runs through, it checks each character against the scrambled alphabet and leaves alone any that are not in it. Those that are in it get swapped back to their originals.
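To make the round trip concrete, here is a minimal Python sketch of the same swap in both directions. It is not the code running on this page; the function names (make_key, encipher, decipher) are made up for illustration, and it uses str.maketrans rather than the character-by-character loop above.

```python
from random import sample

ALPHA_NUM = "AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz0123456789"


def make_key(alphabet=ALPHA_NUM):
    """Return a random permutation of the alphabet (the per-run key)."""
    return "".join(sample(alphabet, len(alphabet)))


def encipher(text, key, alphabet=ALPHA_NUM):
    """Map each alphabet character to the key character at the same position."""
    return text.translate(str.maketrans(alphabet, key))


def decipher(text, key, alphabet=ALPHA_NUM):
    """Reverse the mapping, mirroring what gentleScramble does in the browser."""
    return text.translate(str.maketrans(key, alphabet))


key = make_key()
plain = "I could say 'You = poo poo head' without that being indexed."
hidden = encipher(plain, key)
restored = decipher(hidden, key)  # same string as plain
```

Anything outside the 62-character alphabet has no entry in the translation table, so punctuation and whitespace fall through untouched in both directions.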

Voila.

Before you say that this is fairly insecure: that is kind of the point. It is not trying to deeply encode the text. It is more just trying to play at gently hiding the text in a somewhat breakable pattern.

Current Issues

The first issue is that it is pretty hands on to generate the content, which is not 100% a problem for me but if I have several of these elements it will start to wear.

The solution I’m going to do is build a quick tool that allows for different element types {div, p, span} and a bit more of a GUI, probably through just a quick HTML page with text areas and buttons.
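Ahead of any GUI, the generator could be wrapped in a function that takes the element type and ID as arguments. This is only a sketch: build_scrambled_element, ALLOWED_TAGS, and the ID pattern are hypothetical names and choices, and the HTML escaping is an addition that the script above does not do.

```python
import re
from html import escape
from random import sample

ALPHA_NUM = "AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz0123456789"
ALLOWED_TAGS = {"div", "p", "span"}


def build_scrambled_element(text, element_id, tag="p", key=None):
    """Return an HTML element holding the enciphered text and its onclick handler."""
    if tag not in ALLOWED_TAGS:
        raise ValueError(f"unsupported element type: {tag}")
    # Assumption: IDs are kept simple so they are safe inside the quoted JS string.
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_-]*", element_id):
        raise ValueError(f"unsupported element id: {element_id}")
    if key is None:
        key = "".join(sample(ALPHA_NUM, len(ALPHA_NUM)))
    elif sorted(key) != sorted(ALPHA_NUM):
        raise ValueError("key must be a permutation of ALPHA_NUM")
    scrambled = text.translate(str.maketrans(ALPHA_NUM, key))
    onclick = f"gentleScramble('{key}', '{element_id}'); this.onclick=null;"
    # Escape after scrambling so entities such as &lt; are not themselves enciphered.
    body = escape(scrambled, quote=False)
    return f'<{tag} id="{element_id}" onclick="{onclick}">{body}</{tag}>'


print(build_scrambled_element("Spoilers go here.", "demo02", tag="span"))
```

Passing key explicitly makes the output repeatable, which is handy for checking the result against a known example.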

The second issue is that it only accepts characters in the a-z/A-Z/0-9 ranges. If I am typing in French and other languages, characters with diacritical marks will be ignored. This means that “ä” will show up as “ä” in the enciphered text. It’s not a deal breaker since the bulk of the text will be gently scrambled, but it can lead to potential weirdness.

The solution to this could be either to scan the contents and generate a shortened “alphaNum” that only includes the characters actually used (while still ignoring all the punctuation) OR to create a new diaAlphaNum that includes a separate list of diacritically marked characters.
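The first option might look something like the sketch below, which builds the alphabet from whatever letters and digits appear in the text. The function names are hypothetical, and the NFC normalization step is an assumption added so that an accented letter is treated as one character. The page script would also need to receive both the alphabet and the key, since the alphabet would no longer be fixed.

```python
import unicodedata
from random import sample


def alphabet_from_text(text):
    """Collect each distinct letter or digit in the text, in sorted order."""
    return "".join(sorted({ch for ch in text if ch.isalnum()}))


def scramble_with_own_alphabet(text):
    """Return (alphabet, key, enciphered text) using only characters the text contains."""
    # Assumption: normalize to NFC so "e" plus a combining accent becomes one code point.
    text = unicodedata.normalize("NFC", text)
    alphabet = alphabet_from_text(text)
    key = "".join(sample(alphabet, len(alphabet)))
    return alphabet, key, text.translate(str.maketrans(alphabet, key))


sample_text = "Ça coûte très cher, n'est-ce pas ?"
alphabet, key, hidden = scramble_with_own_alphabet(sample_text)
restored = hidden.translate(str.maketrans(key, alphabet))
```

One trade-off: a short text yields a short alphabet, which makes the result even easier to solve by hand than the full 62-character version.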

I’m not sure which I prefer. I think I prefer to not worry about that so much.

The final issue at a glance is that any HTML elements inside that element {em, a, strong} would likewise be translated, which at best would simply glitch them and at worst could in theory create broken HTML if the scramble happens to stumble upon a different element than intended.

My solution to this problem is just to not do any of that.
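Should that ever change, one rough approach on the generator side is to split the input on tags and only scramble what sits between them. The sketch below assumes simple, well-formed tags; it does not handle entities such as &amp;amp; or a stray ">" inside an attribute value, and scramble_outside_tags is a hypothetical name. The page script reads textContent, which flattens inner elements, so it would need a matching change before this could be used.

```python
import re

ALPHA_NUM = "AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz0123456789"
# The key printed in the example paragraph earlier in this post.
DEMO_KEY = "70u9wOboihzYgVXLamGHINxMAUrsk6CQfcpnv3jS2tR1DB4FJEZdeyWqP8Tl5K"

TAG_PATTERN = re.compile(r"(<[^>]+>)")


def scramble_outside_tags(html, alphabet=ALPHA_NUM, key=DEMO_KEY):
    """Encipher the text between tags and leave the tags themselves alone."""
    table = str.maketrans(alphabet, key)
    pieces = TAG_PATTERN.split(html)
    # With one capturing group, re.split puts tags at odd indexes and text at even ones.
    return "".join(
        piece if index % 2 else piece.translate(table)
        for index, piece in enumerate(pieces)
    )


print(scramble_outside_tags("The <em>basic</em> idea"))
```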

There is a slight non-issue that feed readers and such will likely break in trying to help, but that’s a bit ok for the moment. Not for driving clicks or any such thing, just in that earlier attempts to build CSS/Javascript spoiler type solutions sometimes resulted in said spoilers being clearly visible to feed readers. It does possibly interfere with screen readers and that is a much bigger problem, but I’ll have to test it.

Possibilities for Expansion

My possible end goal for this would include this as a checklist:

  • Perhaps using a Vigenère cipher instead of a simple substitution one [because I prefer those],
  • Making it at least “smart” enough to ignore interior HTML elements, and
  • Generating a bit of styling that makes it more obvious what the reader is supposed to do, possibly including a failsafe type option if the reader has all javascript blocked, etc.
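For the first item on that list, here is a sketch of a Vigenère-style shift over the same 62-character alphabet. Two things are assumptions on my part: the alphabet keeps its interleaved AaBb ordering (so this is not the classical 26-letter table), and characters outside the alphabet pass through without using up a keyword letter. The function name and keyword are made up, and the page script would need a matching routine.

```python
ALPHA_NUM = "AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz0123456789"


def vigenere(text, keyword, decrypt=False, alphabet=ALPHA_NUM):
    """Shift each alphabet character by the position of the current keyword character."""
    if not keyword:
        raise ValueError("keyword must not be empty")
    # str.index raises ValueError if the keyword has a character outside the alphabet.
    shifts = [alphabet.index(ch) for ch in keyword]
    sign = -1 if decrypt else 1
    out = []
    used = 0  # how many keyword letters have been consumed so far
    for ch in text:
        pos = alphabet.find(ch)
        if pos == -1:
            out.append(ch)  # punctuation and whitespace stay as they are
            continue
        shift = shifts[used % len(shifts)]
        out.append(alphabet[(pos + sign * shift) % len(alphabet)])
        used += 1
    return "".join(out)


hidden = vigenere("You = poo poo head", "Dickens")
restored = vigenere(hidden, "Dickens", decrypt=True)
```

Unlike the simple substitution, the same plaintext letter can come out as different ciphertext letters depending on where it falls, so this would be a good deal less friendly to solve by hand.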

The Reclamation of Dickens of a Blog

In the first post on this blog, I talked about the updates to the “old” wyrmis.com and how I consider this to be a continuation, eight years later, of the initial project which that blog & website represented. Which was, in principle, a continuation of various blogs and websites that I had been working on for years.

BY THE WAY: I plan on having a part two to this post, more or less, more of a trip down memory lane type thing. I have screenshots and everything. I’ll come back and link it when I post it. If I post it, I suppose, but I think I will. I’m old enough now that reminiscing is nice.

Here is what the old site looked like around the time it was first launched (I think this screenshot would be more in the 2007-era after it had already gone through some evolution):

Then, around the time it was abandoned (2016), it looked more like:

In that decade in between, while the general color schemes and rough layout had remained the same, the back-end had grown a lot more complicated — involving multiple custom scripts in Python and PHP and a more complicated file structure — while also growing more out of date with modern web practises.

To put it in perspective, while that version was well after the general “Blogging” trend had started, it was an outgrowth of a website that had actually started back in 1997. One of its core issues was that it was dragging along a lot of content and structure as it became less and less a 90’s style website and more a 2000’s era blog.

If you note, the first image shows the “journal” section was off-site (first on Livejournal and later on Blogger) because it wasn’t until later that an adequate but generally poor blogging “software” was integrated by myself into the existing page. By “integrated,” I mean that I coded it and then spent entirely too long making it act like the rest of the website.

If there is one lesson you take from this: Make your tools work for you, do not work for your tools. I violated that principle. It shows.

The Problem(s) As It Stands

There are a few problems with what to do with the old site. The main ones are:

  • The HTML, CSS, PHP, Python, Images and essentially all the rest are a hodge-podge of 1997-2017. Twenty years of various web eras.
    • Even though the bulk of the site was at least partially updated and badly polished throughout the 2010s, enough issues remain that it is nearly impossible to edit as-is into anything truly fitting a post-2010 website.
    • In fact, some of the multitudinous layers of bandages actually hurt the repair because different eras of pages have different enough code that anything but a hand-coded fix is likely to break other portions.
  • A lot of the content would only be saved for purely archival purposes (we’ll dub this The Librarian Principle). Links are likely broken, and fixing them would likely take more time than the value it would add for anyone. Timely content is no longer near timely. Trends and discussions are rooted in their era, which was not that long ago but is still over a decade back.
  • The contrast of The Librarian Principle is The Embarrassment Principle. Past-Doug was a weird boy. Some of the things typed up because I thought they were funny at the time are decidedly not. A few of the points for which I argued vehemently are no longer anything like a stance I would take. By the time we get to the wyrmis.com-era, that is less true, but…man. I think I might make a third part for this. One where I lecture myself. It’s not quite suited to going into any further in this post.

Is It Worth Solving?

In a word…

I don’t really know. Like I said, I just volunteer here. At least some of it seems worthy. A few bits. Possibly even the majority, really.

I need to write a script that takes all the pages on the site and then just randomly picks five of them to read. See if the Librarian beats out the Embarrassed via dice roll.
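That dice roll could be as small as the sketch below. The folder name old_site and the *.html pattern are placeholders, not the real layout of the old site, and the function names are made up.

```python
import random
from pathlib import Path


def list_pages(site_root, pattern="*.html"):
    """Return every matching page under site_root, sorted so runs are comparable."""
    return sorted(Path(site_root).rglob(pattern))


def pick_to_read(pages, count=5, seed=None):
    """Pick up to `count` distinct pages at random; a seed makes the pick repeatable."""
    rng = random.Random(seed)
    return rng.sample(pages, min(count, len(pages)))


if __name__ == "__main__":
    # "old_site" is a placeholder for wherever the local copy of the site lives.
    for page in pick_to_read(list_pages("old_site")):
        print(page)
```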

The truest answer I can make at this time is that the best solution forward is to “fix” the big pieces and then figure out which of the smaller individual pieces to retain.

It feels dishonest to delete all the portions with which I now disagree or dislike, so I’ll work on something like a balance. A triage. Some will get instantly deleted if they simply do not fit (“fit” is doing some heavy lifting, take it as you will), are too time-locked to be worth saving, or any other heavy complaint I might have. Some will get instantly saved and enshrined into place as a part of my decades online. Some will get updated and possibly ported over here.

I think the lines I’ll draw in the sand are that stuff that is good enough to stay as is will stay where it is (Type A), stuff that could be better might get brought over to this blog and updated (Type B), and stuff that I don’t feel like saving will either join with Type A with minimal fixing or simply disappear. This means the old site/blog will have a mix of highs and lows with the middle joining my new writings.

A hint towards verisimilitude actually masking a large scale “reclamation” project.

The Mechanics of It All

Going back up to The Problem(s) As It Stands, the first bullet point and the sub-bullets are the meat of the mechanical issues. There are two variations of solution:

  1. Develop a new schema and then port the old bits into the new bits.
  2. Strip the old schema off and just retain the core bits.

I am currently opting for #3: a bit of both. The new schema is largely just a minimal CSS and jQuery working frame that delivers the text in a readable — both human-readable and machine-readable — manner (albeit fairly bland) but otherwise ignores much of the intricacies of what came before. Headers {e.g., H1, H2} and body content {e.g., P, LI} items will be mostly HTML-standard with a few variations.

In practical terms, this means I am:

  1. Taking the old page (currently one at a time).
  2. Replacing the HEAD content with a newer, improved version.
  3. Deleting all the old menu, footer, counter, and similar code not in the main body.
  4. Replacing the title/banner portion with a simplified version.
  5. Adding in new DIVs that act as placeholder for repeating content {e.g., menus, site-wide idents} and then using jQuery to handle that.
  6. Generally going through the body and making sure things mostly work. Deleting a few portions that no longer fit the criteria above.
  7. Uploading that to the site.
  8. In some cases, adding redirects to “close” a portion of the site or to make up for things that will now be missing.
  9. Eventually, going through and deleting the pages that fit into neither Type A nor Type B.

I have worked out the stuff to get the new-schema pages and the site as a whole onto HTTPS. And to be more responsive. In theory, I can work on a script that will do #2, #3, and #5 for me though I’ll likely have to do the rest by hand.
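As a rough idea of what the script for #2 might look like, here is a sketch that swaps out the HEAD block while keeping the page title. Everything in NEW_HEAD_TEMPLATE, including the stylesheet path, is a placeholder rather than the actual new head, and a regular expression is only a reasonable fit here because the old pages are hand-coded and fairly predictable.

```python
import re

HEAD_PATTERN = re.compile(r"<head\b[^>]*>.*?</head>", re.IGNORECASE | re.DOTALL)
TITLE_PATTERN = re.compile(r"<title\b[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)

# Placeholder head; the stylesheet path is hypothetical.
NEW_HEAD_TEMPLATE = """<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>{title}</title>
<link rel="stylesheet" href="/css/reclaimed.css">
</head>"""


def replace_head(page_html):
    """Swap the old HEAD block for the new template, carrying the old title across."""
    match = TITLE_PATTERN.search(page_html)
    title = match.group(1).strip() if match else "Untitled"
    new_head = NEW_HEAD_TEMPLATE.format(title=title)
    # A function replacement stops re.sub from treating backslashes in the title as escapes.
    return HEAD_PATTERN.sub(lambda _match: new_head, page_html, count=1)


old_page = "<html><HEAD><title> Old Page </title><script src='counter.js'></script></HEAD><body><p>Hi</p></body></html>"
print(replace_head(old_page))
```

Steps #3 and #5 depend on how consistently the old menus and footers were marked up, so they are left out of this sketch.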

I also will be adding this post as at least a temporary link to show up near the bottom to explain to folks why things are happening. Ironically, this will only show up on pages I have partially fixed but so it goes.

Why “Reclamation”?

Just to wrap this up: why am I calling it a reclamation?

It just feels right as a term. There were years of myself in that website and blog. Lots of memories. Lots of creative output. In theory it could stay as is — online or just on my personal storage devices — but I like the idea of retaining some of it. More than that. Making it usable, again. Giving credit to past-Doug where credit is due. Also holding my past-self to a higher standard.

I am me because of his idiocy. I just wish someone had fussed at him like I am about to fuss at myself.

There is also a complicated side-aspect that some of those posts have been taken a bit out of context or been copied over and all sorts of stuff that can happen to websites across decades. By cleaning it up and improving its general SEO-ness, it helps to establish it more as a part of its own record.

What Kind of Time-line Are We Looking At?

As for the question of how long will this take? I have only one answer…

Hello, is this thing on?

It is nice to talk to you again, Space Pilgrims.

The very last post I made to the old version of Dickens of a Blog was “I, This Thinking Thing”. That was August 2016. That means it has been over nine years since I’ve made a real post under that branding.

Today, I went through and created a new [possibly temporary] front page to the wyrmis.com site that looks a bit like this:

It mostly directs people to here, to The Doug Alone and to the [still very much so being finalized] Doug Talks Weird. Those two and this site are the new “Dougiverse” [pronounced “Dougie Verse”].

While Doug Alone has been brewing for over a year now, and Doug Talks Weird dates back to something like 2014 YouTube videos, I have spent a good amount of the past two weeks sorting and trying to rebuild my online identity so that I can start posting and sharing things without relying on “more traditional” social media. A strange sentence to type.

So Many Words to Say

I reached a point those nine years ago where I wanted to shut up for a minute. Then, around two to three years later, I kind of wanted to take it back. However, the time it would take to rescue the old blog (from younger-Doug’s rambles as much as younger-Doug’s hand-coded functions that had been left behind by something like ten years on a changing web) always made me shy away. I would post online here or there, share pictures here or there, but mostly I just withdrew.

However, I am at a time again where I would like to just have a spot to ramble. So this blog is here, now. It is not a replacement of the old one. It is more a continuation in a way that is a bit more responsive, a bit less intensive (I would sometimes have to go into the Python back-end of the old one and custom tweak things to keep posts working and had to remember dozens of custom commands, tools, and pieces), and hopefully a bit more reader-friendly without so many baked-in Dougisms.

It Will Take Time

That being said, it will probably be a week or two at least before the page even looks like it is going to look. I’m going to try and not sweat it too much.

As for today, I have just spent five hours getting everything set up to hit the point I can post this. I am an hour behind eating lunch and still need to do my daily work out and shower first. Well, maybe not first. I’ll figure it out.

Hopefully, I’ll see you soon.

–Doug Bolden
