Short, random URL strings

maiki · February 25, 2020, 4:10am

I’m generating what amount to micro-blog posts. They have individual URLs, but the site is a feedback campaign, so we won’t run out and I don’t have to future-proof.

Currently I’m creating UUIDs, so an example URL is example.org/f/8152c556-0845-412b-81f7-e4a0721bafb7/, and I was thinking it might be a smidge easier to select and copy URLs if they looked more like example.org/f/bafb7.

It’s a WordPress site, so I’m using wp_generate_uuid4() to grab the UUID. That function is fairly straightforward (wp-includes/functions.php | WordPress Developer Resources):

function wp_generate_uuid4() {
    return sprintf(
        '%04x%04x-%04x-%04x-%04x-%04x%04x%04x',
        mt_rand( 0, 0xffff ),
        mt_rand( 0, 0xffff ),
        mt_rand( 0, 0xffff ),
        mt_rand( 0, 0x0fff ) | 0x4000,
        mt_rand( 0, 0x3fff ) | 0x8000,
        mt_rand( 0, 0xffff ),
        mt_rand( 0, 0xffff ),
        mt_rand( 0, 0xffff )
    );
}

I kinda get what’s happening there, but maybe one of you web commerce folks has already solved this particular issue and has a go-to pattern.

draloff · February 26, 2020, 9:09am

PHP’s built-in uniqid() produces unique-ish strings 13 characters long. You can even add a prefix if you want them to be sorta-prettyish.

$ php -a
php > echo uniqid();
5e5634ca3da69
php > echo uniqid('post-');
post-5e56350105e02

It’s not guaranteed unique but it’s based on the current microsecond when it’s called so it’s going to be pretty unique unless you’re calling it across multiple servers at the same microsecond.

As a safety measure you could check that the uniqid isn’t already used, and just re-roll if it is. If you want it to be even shorter, there’s nothing stopping you from truncating the string, but of course that increases the chance that you’ll have to re-roll.
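A minimal sketch of that check-and-re-roll idea. The function name short_unique_id() and the $isTaken callback are made up for illustration; on a real site the callback would look the ID up in the database.

```php
<?php
// Hypothetical helper: generate a short ID, re-rolling while it collides.
// $isTaken is an assumed callback that reports whether an ID is already used.
function short_unique_id(callable $isTaken, int $length = 5, int $maxTries = 10): string
{
    for ($try = 0; $try < $maxTries; $try++) {
        // Keep the tail of uniqid(): the trailing characters change fastest.
        $candidate = substr(uniqid(), -$length);
        if (!$isTaken($candidate)) {
            return $candidate;
        }
    }
    throw new RuntimeException("No free ID after {$maxTries} tries");
}

// Usage with an in-memory set standing in for the database.
$used = ['abcde' => true];
$id = short_unique_id(function (string $candidate) use ($used): bool {
    return isset($used[$candidate]);
});
echo $id, PHP_EOL;
```

The shorter $length is, the more often the loop will have to go around, so the cap on tries keeps a badly chosen length from spinning forever.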

draloff · February 26, 2020, 9:16am

Also, since they’re generated from the current time down to the microsecond, if you choose to truncate it’d probably be better to cut off the front part rather than the back, since it’s the back part that changes most frequently.
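To see why the back part is the one worth keeping, here is a small sketch that splits a uniqid() into its two pieces. It assumes the default call with no prefix and no extra entropy, which gives 8 hex characters of seconds followed by 5 hex characters of microseconds.

```php
<?php
// uniqid() with no arguments returns 13 hex characters:
// 8 for the current Unix time in seconds, then 5 for the microseconds.
$id = uniqid();

$seconds = hexdec(substr($id, 0, 8)); // slow-moving front part
$micros  = hexdec(substr($id, 8, 5)); // fast-moving back part

echo $id, PHP_EOL;
echo date('c', $seconds), ' + ', $micros, ' microseconds', PHP_EOL;

// Two ways to shorten to five characters:
$front = substr($id, 0, 5); // only changes every 4096 seconds (about 68 minutes)
$back  = substr($id, -5);   // changes with every microsecond
echo $front, ' vs ', $back, PHP_EOL;
```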

Or if you really want to try hard to be less predictable, you could md5 your uniqids and then truncate them:

php > echo uniqid();
5e5636c2af59d
php > echo uniqid();
5e5636c381344
php > echo uniqid();
5e5636c43cfb2
php > echo uniqid();
5e5636c4e4ebf
php > echo uniqid();
5e5636c597134
php > echo uniqid();
5e5636c66248d
php > echo md5(uniqid());
4e04050c907f1ff961f84ab9283b5e3f
php > echo md5(uniqid());
6abe75ac746e8ba980af5d5255b6b921
php > echo md5(uniqid());
ffe3af5ce08bcd7f8439952aa3e77378
php > echo md5(uniqid());
1ecef56624978aa11bc2a06f93d9dfca
php > echo md5(uniqid());
cfc9b52c0dd78edcf8779261078574a4

As you can see, that little change gets you ids that aren’t so sequence-y.

But depending on how quickly these posts are being generated, none of this might matter that much compared to just using uniqid(). You can fiddle all day trying to get different kinds of pseudo-randomish strings.
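For what it’s worth, the md5-and-truncate idea fits in a one-liner. The sketch below wraps it in a made-up helper, hashed_short_id(), and next to it shows a swapped-in alternative that isn’t from the posts above: pulling the characters straight from PHP’s random_bytes() (PHP 7 and later) so they don’t depend on the clock at all.

```php
<?php
// Hypothetical helper: hash a uniqid() and keep the first $length hex characters.
function hashed_short_id(int $length = 8): string
{
    return substr(md5(uniqid()), 0, $length);
}

// Swapped-in alternative (an assumption, not part of the suggestion above):
// take the characters from random_bytes() instead of deriving them from time.
function random_short_id(int $bytes = 4): string
{
    return bin2hex(random_bytes($bytes)); // 2 hex characters per byte
}

$hashed = hashed_short_id();
$random = random_short_id();
echo $hashed, PHP_EOL;
echo $random, PHP_EOL;
```

Either way the result is short enough to collide eventually, so a uniqueness check before saving still applies.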

draloff · February 26, 2020, 9:21am

Another option might be to sha1() the content of the post and refer to it by the first few characters of the sha1 hash, sorta similar to how people refer to git commits using only the first few characters of the hash.

php > echo sha1("What's your favorite food? Mine's pizza!");
a25a2317aa17433c7a5acd71ed5fc490d69e5a20
php > echo sha1("I'mma little teapot, short and thin 'cause I'm workin' out!");
fafd060dc465812ffc6b7106ae918a4e009b84be
php >

If these were the contents of each blog post you could refer to them by a25a2317 and fafd060d respectively. This also has the advantage of being based on content rather than just timestamp, so if multiple people posted at the same microsecond they wouldn’t get the same hash. If it’s likely that posts with the same content might come up, just append more metadata to hash along with the content, like user, publish timestamp, etc. It doesn’t matter how much you concat together; the sha1 hash always comes out to the same length (40 hex characters):

php > echo sha1("danr" . microtime() . "Hey mom, you're so cool!");
faf08261de6e38b007ad6e3298f63836c604856c
php > echo sha1("danr" . microtime() . "Hey mom, you're so cool!");
99a25211994c1ba242f109114295605c0907b2b5
php > echo sha1("maiki" . microtime() . "Hey mom, you're so cool!");
b3f43e13a23294a2250ee0e509e492d0f62fccf2
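A rough sketch of the short-hash idea, borrowing git’s trick of lengthening the prefix when the short one is already in use. The function content_short_id() and the $taken array are invented for the example; a real site would check against stored posts.

```php
<?php
// Hypothetical helper: return the shortest prefix of sha1($content) that is
// not already a key in $taken (an assumed set of IDs in use).
function content_short_id(string $content, array $taken, int $minLength = 8): string
{
    $hash = sha1($content); // always 40 hex characters
    for ($len = $minLength; $len <= strlen($hash); $len++) {
        $prefix = substr($hash, 0, $len);
        if (!isset($taken[$prefix])) {
            return $prefix;
        }
    }
    throw new RuntimeException('Every prefix of this hash is already in use');
}

$post  = "What's your favorite food? Mine's pizza!";
$taken = [];

$first = content_short_id($post, $taken);
$taken[$first] = true;

// Same content again: the 8-character prefix is taken, so the ID grows by one.
$second = content_short_id($post, $taken);

echo $first, PHP_EOL, $second, PHP_EOL;
```

Mixing in the user or a timestamp before hashing, as described above, avoids most of these clashes in the first place; the growing prefix only covers whatever is left.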

maiki · February 26, 2020, 5:50pm

All great ideas, thanks! I’m using WordPress, which has a built-in mechanism to ensure unique URLs (by appending a number at the end, after checking $slug, $title, etc.), so no worries there (and I doubt multiple submissions per minute, let alone microsecond).
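Roughly what that number-appending looks like, as a plain-PHP sketch. This is an illustration of the behavior, not WordPress’s own code; unique_slug_sketch() and the $existing list are made-up names.

```php
<?php
// Illustrative only: NOT WordPress's implementation.
// $existing is an assumed list of slugs already in use.
function unique_slug_sketch(string $slug, array $existing): string
{
    if (!in_array($slug, $existing, true)) {
        return $slug;
    }
    // Try slug-2, slug-3, ... until one is free.
    $suffix = 2;
    while (in_array("{$slug}-{$suffix}", $existing, true)) {
        $suffix++;
    }
    return "{$slug}-{$suffix}";
}

echo unique_slug_sketch('bafb7', ['bafb7', 'bafb7-2']), PHP_EOL; // bafb7-3
```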

draloff · February 27, 2020, 6:08am

Nice! Glad I could help.

maiki wrote: “and I doubt multiple submissions per minute, let alone microsecond”

Ahh, but it doesn’t have to be a constant stream of posts, it only takes one funny happenstance to put a program in a pickle. I’ve found that as soon as I doubt something as being unlikely, it happens.

Then, to quote one of my favorite lovable bears, I find that I have proven myself “Foolish, Deluded, and a Bear of No Brain at All”.
