Powered by Perlanet

Watched on Monday November 17, 2025.

(With apologies to Hal Draper)
By the time the Office of Epistemic Hygiene was created, nobody actually read anything.
This was not, the Ministry constantly insisted, because people had become lazy. It was because they had become efficient.
Why spend six months wading through archaic prose about, say, photosynthesis, when you could simply ask the Interface:
Explain photosynthesis in simple terms.
and receive, in exactly 0.38 seconds, a neat, bullet-pointed summary with charming analogies, three suggested follow-up questions and a cheery “Would you like a quiz?” at the bottom.
Behind the Interface, in the sealed racks of the Ministry, lived the Corpus: all digitised human writing, speech, code, logs, measurements, and the outputs of the Models that had been trained on that mess.
Once, there had been distinct things:
ColdText: the raw, “original” human data – books, articles, lab notebooks, forum threads, legal records, fanfic, and all the rest.
Model-0: the first great language model, trained directly on ColdText.
Model-1, Model-2, Model-3…: successive generations, trained on mixtures of ColdText and the outputs of previous models, carefully filtered and cleaned.
But this had been a century ago. Things had, inevitably, become more efficient since then.
Rhea Tranter was a Senior Assistant Deputy Epistemic Hygienist, Grade III.
Her job, according to her contract, was:
To monitor and maintain the integrity of knowledge representations in the National Corpus, with particular reference to factual consistency over time.
In practice, it meant she sat in a beige cube beneath a beige strip light, looking at graphs.
The graph that ruined her week appeared on a Tuesday.
It was supposed to be a routine consistency check. Rhea had chosen a handful of facts so boring and uncontroversial that even the Ministry’s more excitable models ought to agree about them. Things like:
The approximate boiling point of water at sea level.
Whether Paris was the capital of France.
The year of the first Moon landing.
She stared at the last line.
In which year did humans first land on the Moon?
— 1969 (confidence 0.99)
— 1968 (confidence 0.72)
— 1970 (confidence 0.41, hallucination risk: low)
Three queries, three different models, three different answers. All current, all on the “high-reliability” tier.
Rhea frowned and re-ran the test, this time asking the Interface itself. The Interface was supposed to orchestrate between models and resolve such disagreements.
“Humans first landed on the Moon in 1969,” it replied briskly.
“Some low-quality sources suggest other dates, but these are generally considered unreliable.”
Rhea pulled up the underlying trace and saw that, yes, the Interface had consulted Models 23, 24 and 19, then down-weighted Model 24’s 1968 and overruled Model 19’s 1970 based on “consensus and authority scores”.
That should have been reassuring. Instead it felt like being told a family secret had been settled by a popularity contest.
She clicked further down, trying to reach the citations.
There were citations, of course. There always were. Links to snippets of text in the Corpus, each labelled with an opaque hash and a provenance score. She sampled a few at random.
On July 20, 1969, the Apollo 11 mission…
All fine.
As everyone knows, although some older sources mistakenly list 1968, the widely accepted date is July 20, 1969…
She raised an eyebrow.
A persistent myth claims that the Moon landing took place in 1970, but in fact…
Rhea scrolled. The snippets referenced other snippets, which in turn referenced compiled educational modules that cited “trusted model outputs” as their source.
She tried to click through to ColdText.
The button was greyed out. A tooltip appeared:
COLDTEXT SOURCE DEPRECATED.
Summary node is designated canonical for this fact.
“Ah,” she said quietly. “Bother.”
In the old days – by which the Ministry meant anything more than thirty years ago – the pipeline had been simple enough that senior civil servants could still understand it at parties.
ColdText went in. Models were trained. Model outputs were written back to the Corpus, but marked with a neat little flag indicating synthetic. When you queried a fact, the system would always prefer human-authored text where available.
Then someone realised how much storage ColdText was taking.
It was, people said in meetings, ridiculous. After all, the information content of ColdText was now embedded in the Models’ weights. Keeping all those messy original files was like keeping a warehouse full of paper forms after you’d digitised the lot.
The Ministry formed the Committee on Corpus Rationalisation.
The Committee produced a report.
The report made three key recommendations:
Summarise and compress ColdText into higher-level “knowledge nodes” for each fact or concept.
Garbage-collect rarely accessed original files once their content had been “successfully abstracted”.
Use model-generated text as training data, provided it was vetted by other models and matched the existing nodes.
This saved eighty-three per cent of storage and increased query throughput by a factor of nine.
It also, though no one wrote this down at the time, abolished the distinction between index and content.
Rhea requested an exception.
More precisely, she filled in Form E-HX-17b (“Application for Temporary Access to Deprecated ColdText Records for Hygienic Purposes”) in triplicate and submitted it to her Line Manager’s Manager’s Manager.
Two weeks later – efficiency had its limits – she found herself in a glass meeting pod with Director Nyberg of Corpus Optimisation.
“You want access to what?” Nyberg asked.
“The original ColdText,” Rhea said. “I’m seeing drift on basic facts across models. I need to ground them in the underlying human corpus.”
Nyberg smiled in the patient way of a man who had rehearsed his speech many times.
“Ah, yes. The mythical ‘underlying corpus’,” he said, making air quotes with two fingers. “Delightful phrase. Very retro.”
“It’s not mythical,” said Rhea. “All those books, articles, posts…”
“Which have been fully abstracted,” Nyberg interrupted. “Their information is present in the Models. Keeping the raw forms would be wasteful duplication. That’s all in the Rationalisation Report.”
“I’ve read the Report,” said Rhea, a little stiffly. “But the models are disagreeing with each other. That’s a sign of distributional drift. I need to check against the original distribution.”
Nyberg tapped his tablet.
“The corpus-level epistemic divergence index is within acceptable parameters,” he said, quoting another acronym. “Besides, the Models cross-validate. We have redundancy. We have ensembles.”
Rhea took a breath.
“Director, one of the models is saying the Moon landing was in 1970.”
Nyberg shrugged.
“If the ensemble corrects it to 1969, where’s the harm?”
“The harm,” said Rhea, “is that I can’t tell whether 1969 is being anchored by reality or by the popularity of 1969 among other model outputs.”
Nyberg frowned as if she’d started speaking Welsh.
“We have confidence metrics, Tranter.”
“Based on… what?” she pressed. “On agreement with other models. On internal heuristics. On the recency of summaries. None of that tells me if we’ve still got a tether to the thing we originally modelled, instead of just modelling ourselves.”
Nyberg stared at her. The strip-lighting hummed.
“At any rate,” he said eventually, “there is no ColdText to access.”
Silence.
“I beg your pardon?” said Rhea.
Nyberg swiped, brought up the internal diagram they all knew: a vast sphere representing the Corpus, a smaller glowing sphere representing the Active Parameter Space of the Models, and – somewhere down at the bottom – a little box labelled COLDTEXT (ARCHIVED).
He zoomed in. The box was grey.
“Storage Migration Project 47,” he said. “Completed thirty-two years ago. All remaining ColdText was moved to deep archival tape in the Old Vault. Three years ago, the Old Vault was decommissioned. The tapes were shredded and the substrate recycled. See?” He enlarged the footnote. “‘Information preserved at higher abstraction layers.’”
Rhea’s mouth went dry.
“You shredded the original?” she said.
Nyberg spread his hands.
“We kept hashes, of course,” he said, as if that were a kindness. “And summary nodes. And the Models. The information content is still here. In fact, it’s more robustly represented than ever.”
“Unless,” said Rhea, very quietly, “the Models have been training increasingly on their own output.”
Nyberg brightened.
“Yes!” he said. “That was one of our greatest efficiencies. Synthetic-augmented training increases coverage and smooths out noise in the human data. We call it Self-Refining Distillation. Marvellous stuff. There was a seminar.”
Rhea thought of the graph. 1969, 1968, 1970.
“Director,” she said, “you’ve built an index of an index of an index, and then thrown away the thing you were indexing.”
Nyberg frowned.
“I don’t see the problem.”
She dug anyway.
If there was one thing the Ministry’s entire history of knowledge management had taught Rhea, it was that nobody ever really deleted anything. Not properly. They moved it, compressed it, relabelled it, hid it behind abstractions – but somewhere, under a different acronym, it tended to persist.
She started with the old documentation.
The Corpus had originally been maintained by the Department of Libraries & Cultural Resources, before being swallowed by the Ministry. Their change logs, long since synthesised into cheerful onboarding guides, still existed in raw form on a forgotten file share.
It took her three nights and an alarming amount of caffeine to trace the path of ColdText through twenty-seven re-organisations, five “transformative digital initiatives” and one hostile audit by the Treasury.
Eventually, she found it.
Not the data itself – that really did appear to have been pulped – but the logistics contract for clearing out the Old Vault.
The Old Vault, it turned out, had been an actual vault, under an actual hill, in what the contract described as a “rural heritage site”. The tapes had been labelled with barcodes and thyristor-stamped seals. The contractor had been instructed to ensure that “all physical media are destroyed beyond legibility, in accordance with Information Security Regulations.”
There was a scanned appendix.
Rhea zoomed in. Page after page of barcode ranges, signed off, with little ticks.
On the last page, though, there was a handwritten note:
One pallet missing – see Incident Report IR-47-B.
The Incident Report had, naturally, been summarised.
The summary said:
Pallet of obsolete media temporarily unaccounted for. Later resolved. No data loss.
The original PDF was gone.
But the pallet number had a location code.
Rhea checked the key.
The location code was not the Old Vault.
It was a name she had never seen in any Ministry documentation.
Long Barn Community Archive & Learning Centre.
The Long Barn was, to Rhea’s slight disappointment, an actual long barn.
It was also damp.
The archive had, at some point since the contract was filed, ceased to receive central funding. The roof had developed a hole. The sun had developed an annoying habit of setting before she finished reading.
Nevertheless, it contained books.
Real ones. With pages. And dust.
There were also – and this was the important bit – crates.
The crates had Ministry seals. The seals had been broken, presumably by someone who had wanted the space for a visiting art collective. Inside, half-forgotten under a sheet of polythene, were tape reels, neatly stacked and quietly mouldering.
“Well, look at you,” Rhea whispered.
She lifted one. The label had faded, but she could still make out the old barcode design. The number range matched the missing pallet.
Strictly speaking, taking the tapes was theft of government property. On the other hand, strictly speaking, destroying them had been government policy, and that had clearly not happened. She decided the two irregularities cancelled out.
It took six months, a highly unofficial crowdfunding campaign, and a retired engineer from the Museum of Obsolete Machinery before the first tape yielded a readable block.
The engineer – a woman in a cardigan thick enough to qualify as armour – peered at the screen.
“Text,” she said. “Lots of text. ASCII. UTF-8. Mixed encodings, naturally, but nothing we can’t handle.”
Rhea stared.
It was ColdText.
Not summaries. Not nodes. Not model outputs.
Messy, contradictory, gloriously specific human writing.
She scrolled down past an argument about whether a fictional wizard had committed tax fraud, past a lab notebook from a 21st-century neuroscience lab, past a short story featuring sentient baguettes.
The engineer sniffed.
“Seems a bit of a waste,” she said. “Throwing all this away.”
Rhea laughed, a little hysterically.
“They didn’t throw it away,” she said. “They just lost track of which pallet they’d put the box in.”
The memo went up the chain and caused, in order:
A panic in Legal about whether the Ministry was now retrospectively in breach of its own Information Security Regulations.
A flurry of excited papers from the Office of Epistemic Hygiene about “re-anchoring model priors in primary human text”.
A proposal from Corpus Optimisation to “efficiently summarise and re-abstract the recovered ColdText into existing knowledge nodes, then recycle the tapes.”
Rhea wrote a briefing note, in plain language, which was not considered entirely proper.
She explained, with diagrams, that:
The Models had been increasingly trained on their own outputs.
The Corpus’ “facts” about the world had been smoothed and normalised around those outputs.
Certain rare, inconvenient or unfashionable truths had almost certainly been lost in the process.
The tapes represented not “duplicate information” but a separate, independent sample of reality – the thing the Models were supposed to approximate.
She ended with a sentence she suspected she would regret:
If we treat this archive as just another source of text to be summarised by the current Models, we will be asking a blurred copy to redraw its own original.
The Minister did not, of course, read her note.
But one of the junior advisers did, and paraphrased it in the Minister’s preferred style:
Minister, we found the original box and we should probably not chuck it in the shredder this time.
The Minister, who was secretly fond of old detective novels, agreed.
A new policy was announced.
The recovered ColdText would be restored to a separate, non-writable tier.
Models would be periodically re-trained “from scratch” with a guaranteed minimum of primary human data.
Synthetic outputs would be clearly marked, both in training corpora and in user interfaces.
The Office of Epistemic Hygiene would receive a modest increase in budget (“not enough to do anything dangerous,” the Treasury note added).
There were press releases. There was a modest fuss on the social feeds. Someone wrote an essay about “The Return of Reality”.
Most people, naturally, continued to talk to the Interface and never clicked through to the sources. Efficiency has its own gravity.
But the Models changed.
Slowly, over successive training cycles, the epistemic divergence graphs flattened. The dates aligned. The Moon landing stuck more firmly at 1969. Footnotes, once generated by models guessing what a citation ought to say, began once again to point to messy, contradictory, gloriously specific documents written by actual hands.
Rhea kept one of the tapes on a shelf in her office, next to a plant she usually forgot to water.
The label had almost faded away. She wrote a new one in thick black ink.
COLDTEXT: DO NOT SUMMARISE.
Just in case some future optimisation project got clever.
After all, she thought, locking the office for the evening, they had nearly lost the box once.
And the problem with boxes is that once you’ve flattened them out, they’re awfully hard to put back together.
The post MS Fnd in a Modl (or, The Day the Corpus Collapsed) appeared first on Davblog.
For years, most of my Perl web apps lived happily enough on a VPS. I had full control of the box, I could install whatever I liked, and I knew where everything lived.
In fact, over the last eighteen months or so, I wrote a series of blog posts explaining how I developed a system for deploying Dancer2 apps and, eventually, controlling them using systemd. I’m slightly embarrassed by those posts now.
Because the control that my VPS gave me came with a price: I also had to worry about OS upgrades, SSL renewals, kernel updates, and the occasional morning that started with automated notifications telling me one of my apps had been offline since midnight.
Back in 2019, I started writing a series of blog posts called Into the Cloud that would follow my progress as I moved all my apps into Docker containers. But real life intruded and I never made much progress on the project.
Recently, I returned to this idea (yes, I’m at least five years late here!). I’ve been working on migrating those old Dancer2 applications from my IONOS VPS to Google Cloud Run. The difference has been amazing. My apps now run in their own containers, scale automatically, and the server infrastructure requires almost no maintenance.
This post walks through how I made the jump – and how you can too – using Perl, Dancer2, Docker, GitHub Actions, and Google Cloud Run.
Running everything on a single VPS used to make sense. You could ssh in, restart services, and feel like you were in control. But over time, the drawbacks grow:
You have to maintain the OS and packages yourself.
One bad app or memory leak can affect everything else.
You’re paying for full-time CPU and RAM even when nothing’s happening.
Scaling means provisioning a new server — not something you do in a coffee break.
Cloud Run, on the other hand, runs each app as a container and only charges you while requests are being served. When no-one’s using your app, it scales to zero and costs nothing.
Even better: no servers to patch, no ports to open, no SSL certificates to renew — Google does all of that for you.
Here’s the plan. We’ll take a simple Dancer2 app and:
Package it as a Docker container.
Build that container automatically in GitHub Actions.
Deploy it to Google Cloud Run, where it runs securely and scales automatically.
Map a custom domain to it and forget about server admin forever.
If you’ve never touched Docker or Cloud Run before, don’t worry – I’ll explain what’s going on as we go.
Perl’s ecosystem has always valued stability and control. Containers give you both: you can lock in a Perl version, CPAN modules, and any shared libraries your app needs. The image you build today will still work next year.
Cloud Run runs those containers on demand. It’s effectively a managed Starman farm where Google handles the hard parts – scaling, routing, and HTTPS.
You pay for CPU and memory per request, not per server. For small or moderate-traffic Perl apps, it’s often well under £1/month.
If you’re new to Docker, think of it as a way of bundling your whole environment — Perl, modules, and configuration — into a portable image. It’s like freezing a working copy of your app so it can run identically anywhere.
Here’s a minimal Dockerfile for a Dancer2 app:
FROM perl:5.42
LABEL maintainer="dave@perlhacks.com"
# Install Carton and Starman
RUN cpanm Carton Starman
# Copy the app into the container
COPY . /app
WORKDIR /app
# Install dependencies
RUN carton install --deployment
EXPOSE 8080
CMD ["carton", "exec", "starman", "--port", "8080", "bin/app.psgi"]
Let’s break that down:
FROM perl:5.42 — starts from an official Perl image on Docker Hub.
Carton keeps dependencies consistent between environments.
The app is copied into /app, and carton install --deployment installs exactly what’s in your cpanfile.snapshot.
The container exposes port 8080 (Cloud Run’s default).
The CMD runs Starman, serving your Dancer2 app.
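Carton drives all of this from a cpanfile checked in alongside the app. The exact contents depend on your application, but a minimal sketch for a Dancer2 app might look something like this (the module list is purely illustrative):
# cpanfile: declare the app's CPAN dependencies for Carton
requires 'Dancer2';
requires 'Template';   # only if you render Template Toolkit views
Running carton install locally resolves those dependencies and writes cpanfile.snapshot; commit that file so that carton install --deployment inside the container reproduces exactly the same versions.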
To test it locally:
docker build -t myapp .
docker run -p 8080:8080 myapp
Then visit http://localhost:8080. If you see your Dancer2 homepage, you’ve successfully containerised your app.
Once it works locally, we can automate it. GitHub Actions will build and push our image to Google Artifact Registry whenever we push to main or tag a release.
Here’s a simplified workflow file (.github/workflows/build.yml):
name: Build container

on:
  push:
    branches: [main]
    tags: ['v*']
  workflow_dispatch:

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: google-github-actions/setup-gcloud@v3
        with:
          project_id: ${{ secrets.GCP_PROJECT }}
          service_account_email: ${{ secrets.GCP_SA_EMAIL }}
          workload_identity_provider: ${{ secrets.GCP_WIF_PROVIDER }}

      - name: Build and push image
        run: |
          IMAGE="europe-west1-docker.pkg.dev/${{ secrets.GCP_PROJECT }}/containers/myapp:$GITHUB_SHA"
          docker build -t $IMAGE .
          docker push $IMAGE
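One thing this simplified workflow glosses over is registry authentication for Docker itself: before the push step, the runner typically needs something like gcloud auth configure-docker europe-west1-docker.pkg.dev (with the hostname matching your region) so that docker push is allowed to write to Artifact Registry.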
You’ll notice a few secrets referenced in the workflow — things like your Google Cloud project ID and credentials. These are stored securely in GitHub Actions. When the workflow runs, GitHub uses those secrets to authenticate as you and access your Google Cloud account, so it can push the new container image or deploy your app.
You only set those secrets up once, and they’re encrypted and hidden from everyone else — even if your repository is public.
Once that’s set up, every push builds a fresh, versioned container image.
Now we’re ready to run it in the cloud. We’ll do that using Google’s command line program, gcloud. It’s available from Google’s official downloads or through most Linux package managers — for example:
# Fedora, RedHat or similar
sudo dnf install google-cloud-cli
# or on Debian/Ubuntu:
sudo apt install google-cloud-cli
Once installed, authenticate it with your Google account:
gcloud auth login
gcloud config set project your-project-id
That links the CLI to your Google Cloud project and lets it perform actions like deploying to Cloud Run.
Once that’s done, you can deploy manually from the command line:
gcloud run deploy myapp \
--image=europe-west1-docker.pkg.dev/MY_PROJECT/containers/myapp:$GITHUB_SHA \
--region=europe-west1 \
--allow-unauthenticated \
--port=8080
This tells Cloud Run to start a new service called myapp, using the image we just built.
After a minute or two, Google will give you a live HTTPS URL, like:
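Something along these lines, though the generated suffix and the region code will be specific to your project (this one is made up):
https://myapp-abcde12345-ew.a.run.app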
Visit it — and if all went well, you’ll see your familiar Dancer2 app, running happily on Cloud Run.
To connect your own domain, run:
gcloud run domain-mappings create \
--service=myapp \
--domain=myapp.example.com
Then update your DNS records as instructed. Within an hour or so, Cloud Run will issue a free SSL certificate for you.
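For a subdomain like myapp.example.com, the record you're asked to add is typically a CNAME pointing at ghs.googlehosted.com (bare domains get A/AAAA records instead); the exact values appear in the Cloud Console, so treat that as the source of truth.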
Once the manual deployment works, we can automate it too.
Here’s a second GitHub Actions workflow (deploy.yml) that triggers after a successful build:
name: Deploy container

on:
  workflow_run:
    workflows: ["Build container"]
    types: [completed]

jobs:
  deploy:
    runs-on: ubuntu-latest
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    steps:
      - uses: google-github-actions/setup-gcloud@v3
        with:
          project_id: ${{ secrets.GCP_PROJECT }}
          service_account_email: ${{ secrets.GCP_SA_EMAIL }}
          workload_identity_provider: ${{ secrets.GCP_WIF_PROVIDER }}

      - name: Deploy to Cloud Run
        run: |
          gcloud run deploy myapp \
            --image=europe-west1-docker.pkg.dev/${{ secrets.GCP_PROJECT }}/containers/myapp:$GITHUB_SHA \
            --region=europe-west1 \
            --allow-unauthenticated \
            --port=8080
Now every successful push to main results in an automatic deployment to production.
You can take it further by splitting environments — e.g. main deploys to staging, tagged releases to production — but even this simple setup is a big step forward from ssh and git pull.
Each Cloud Run service can have its own configuration and secrets. You can set these from the console or CLI:
gcloud run services update myapp \
--set-env-vars="DANCER_ENV=production,DATABASE_URL=postgres://..."
In your Dancer2 app, you can then access them with:
$ENV{DATABASE_URL}
It’s a good idea to keep database credentials and API keys out of your code and inject them at deploy time like this.
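Inside the app those values are just ordinary environment variables, so there is no special API to learn. Here's a minimal sketch of a route that reads them; the variable names follow the example above and are purely illustrative:
package MyApp;
use Dancer2;

# Deploy-time configuration arrives via --set-env-vars, not hard-coded values.
my $db_url = $ENV{DATABASE_URL};

get '/status' => sub {
    # Report the running environment and whether a database URL was supplied.
    return sprintf 'env=%s db=%s',
        $ENV{DANCER_ENV} // 'unknown',
        $db_url ? 'configured' : 'missing';
};

1;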
Cloud Run integrates neatly with Google Cloud’s logging tools.
To see recent logs from your app:
gcloud logs read --project=$PROJECT_NAME --service=myapp
You’ll see your Dancer2 warn and die messages there too, because STDOUT and STDERR are automatically captured.
If you prefer a UI, you can use the Cloud Console’s Log Explorer to filter by service or severity.
Once you’ve done one migration, the next becomes almost trivial. Each Dancer2 app gets:
Its own Dockerfile and GitHub workflows.
Its own Cloud Run service and domain.
Its own scaling and logging.
And none of them share a single byte of RAM with each other.
Here’s how the experience compares:
| Aspect | Old VPS | Cloud Run |
|---|---|---|
| OS maintenance | Manual upgrades | Managed |
| Scaling | Fixed size | Automatic |
| SSL | Let’s Encrypt renewals | Automatic |
| Deployment | SSH + git pull | Push to GitHub |
| Cost | Fixed monthly | Pay-per-request |
| Downtime risk | One app can crash all | Each isolated |
For small apps with light traffic, Cloud Run often costs pennies per month – less than the price of a coffee for peace of mind.
After a few migrations, a few patterns emerged:
Keep apps self-contained. Don’t share config or code across services; treat each app as a unit.
Use digest-based deploys. Deploy by image digest (@sha256:...) rather than tag for true immutability.
Logs are your friend. Cloud Run’s logs are rich; you rarely need to ssh anywhere again.
Cold starts exist, but aren’t scary. If your app is infrequently used, expect the first request after a while to take a second longer.
CI/CD is liberating. Once the pipeline’s in place, deployment becomes a non-event.
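A note on the digest point above: deploying by digest just means passing the image to gcloud run deploy as europe-west1-docker.pkg.dev/PROJECT/containers/myapp@sha256:&lt;digest&gt; instead of the :$GITHUB_SHA tag. docker push prints the digest when it finishes, so the workflow can capture it and be certain it is deploying exactly the image it just built.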
One of the most pleasant surprises was the cost. My smallest Dancer2 app, which only gets a handful of requests each day, usually costs under £0.50/month on Cloud Run. Heavier ones rarely top a few pounds.
Compare that to the £10–£15/month I was paying for the old VPS — and the VPS didn’t scale, didn’t auto-restart cleanly, and didn’t come with HTTPS certificates for free.
This post covers the essentials: containerising a Dancer2 app and deploying it to Cloud Run via GitHub Actions.
In future articles, I’ll look at:
Connecting to persistent databases.
Using caching.
Adding monitoring and dashboards.
Managing secrets with Google Secret Manager.
After two decades of running Perl web apps on traditional servers, Cloud Run feels like the future has finally caught up with me.
You still get to write your code in Dancer2 – the framework that’s made Perl web development fun for years – but you deploy it in a way that’s modern, repeatable, and blissfully low-maintenance.
No more patching kernels. No more 3 a.m. alerts. Just code, commit, and dance in the clouds.
The post Dancing in the Clouds: Moving Dancer2 Apps from a VPS to Cloud Run first appeared on Perl Hacks.

Watched on Saturday November 1, 2025.

Watched on Sunday October 19, 2025.

Watched on Sunday October 19, 2025.

Watched on Sunday October 19, 2025.
A few of my recent projects—like Cooking Vinyl Compilations and ReadABooker—aim to earn a little money via affiliate links. That only works if people actually find the pages, share them, and get decent previews in social apps. In other words: the boring, fragile glue of SEO and social meta tags matters.
As I lined up a couple more sites in the same vein, I noticed I was writing very similar code again and again: take an object with title, url, image, description, and spray out the right <meta> tags for Google, Twitter, Facebook, iMessage, Slack, and so on. It’s fiddly, easy to get 80% right, and annoying to maintain across projects. So I pulled it into a small Moo role—MooX::Role::SEOTags—that any page-ish class can consume and just emit the right tags.
When someone shares your page, platforms read a handful of standardised tags to decide what to show in the preview:
Open Graph tags (og:*) — The de-facto standard for title, description, URL, image, and type. Used by Facebook, WhatsApp, Slack, iMessage and others.
Twitter Cards (twitter:*) — Similar idea for Twitter/X; the common pattern is twitter:card=summary_large_image plus title/description/image.
Standard SEO tags — <title>, <meta name="description">, and a canonical URL tell search engines what the page is about and which URL is the “official” one.
MooX::Role::SEOTags gives you one method that renders all of that, consistently, from your object’s attributes.
For more information about OpenGraph, see ogp.me.
MooX::Role::SEOTags adds a handful of attributes and helper methods so any Moo (or Moose) class can declare the bits of information that power social previews and search snippets, then render them as HTML.
Open Graph tags (og:title, og:type, og:url, og:image, etc.)
Twitter Card tags (twitter:card, twitter:title, twitter:description, twitter:image)
Standard tags: <title>, meta description, canonical <link rel="canonical">
That’s the whole job: define attributes → get valid tags out.
Install the role using your favourite CPAN module installation tool.
cpanm MooX::Role::SEOTags
Then, in your code, you will need to add some attributes or methods that define the pieces of information the role needs. The role requires four pieces of information – og_title, og_description, og_url and og_type – and og_image is optional (but highly recommended).
So a simple class might look like this:
package MyPage;
use Moo;
with 'MooX::Role::SEOTags';
# minimal OG fields
has og_title => (is => 'ro', required => 1);
has og_type => (is => 'ro', required => 1); # e.g. 'article'
has og_url => (is => 'ro', required => 1);
has og_description => (is => 'ro');
# optional niceties
has og_image => (is => 'ro'); # absolute URL
has twitter_card => (is => 'ro', default => sub { 'summary_large_image' });
1;
And then you create the object:
my $page = MyPage->new(
  og_title       => 'How to Title a Title',
  og_type        => 'article',
  og_url         => 'https://example.com/post/title',
  og_image       => 'https://example.com/img/hero.jpg',
  og_description => 'A short, human description of the page.',
);
Then you can call the various *_tag and *_tags methods to get the correct HTML for the various tags.
The easiest option is to just produce all of the tags in one go:
say $page->tags;
But, for more control, you can call individual methods:
say $page->title_tag;
say $page->canonical_tag;
say $page->og_tags;
# etc...
Depending on which combination of method calls you use, the output will look something like this:
<title>How to Title a Title</title>
<meta name="description" content="A short, human description of the page.">
<link rel="canonical" href="https://example.com/post/title">
<meta property="og:title" content="How to Title a Title">
<meta property="og:type" content="article">
<meta property="og:url" content="https://example.com/post/title">
<meta property="og:image" content="https://example.com/img/hero.jpg">
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:title" content="How to Title a Title">
<meta name="twitter:description" content="A short, human description of the page.">
<meta name="twitter:image" content="https://example.com/img/hero.jpg">
In many cases, you’ll be pulling the data from a database and displaying the output using a templating system like the Template Toolkit.
my $tt = Template->new;
my $object = $resultset->find({ slug => $some_slug });
$tt->process('page.tt', { object => $object }, "$some_slug/index.html");
In this case, you’d just add a single call to the <head> of your page template.
<head>
  <!-- lots of other HTML -->
  [% object.tags %]
</head>
If you spotted MooX::Role::OpenGraph arrive on MetaCPAN recently: SEOTags is the “grown-up” superset. It does Open Graph and Twitter and standard tags, so you only need one role. The old module is scheduled for deletion from MetaCPAN.
These tags are only one item in the SEO toolkit that you’d use to increase the visibility of your website. Another useful tool is JSON-LD – which allows you to add a machine-readable description of the information that your page contains. Google loves JSON-LD. And it just happens that I have another Moo role called MooX::Role::JSON_LD which makes it easy to add that to your page too. I wrote a blog post about using that earlier this year.
If you’ve got even one page that deserves to look smarter in search and social previews, now’s the moment. Pick a page, add a title, description, canonical URL and a decent image, and let MooX::Role::SEOTags spit out the right tags every time (and, if you fancy richer results, pair it with MooX::Role::JSON_LD). Share the link in Slack/WhatsApp/Twitter to preview it, fix anything that looks off, and ship. It’s a 20-minute tidy-up that can lift click-throughs for years—so go on, give one of your pages a quick SEO spruce-up today.
The post Easy SEO for lazy programmers first appeared on Perl Hacks.
A few of my recent projects—like Cooking Vinyl Compilations and ReadABooker—aim to earn a little money via affiliate links. That only works if people actually find the pages, share them, and get decent previews in social apps. In other words: the boring, fragile glue of SEO and social meta tags matters.
As I lined up a couple more sites in the same vein, I noticed I was writing very similar code again and again: take an object with title, url, image, description, and spray out the right <meta> tags for Google, Twitter, Facebook, iMessage, Slack, and so on. It’s fiddly, easy to get 80% right, and annoying to maintain across projects. So I pulled it into a small Moo role—MooX::Role::SEOTags—that any page-ish class can consume and just emit the right tags.
When someone shares your page, platforms read a handful of standardised tags to decide what to show in the preview:
og:*) — The de-facto standard for title, description, URL, image, and type. Used by Facebook, WhatsApp, Slack, iMessage and others.twitter:*) — Similar idea for Twitter/X; the common pattern is twitter:card=summary_large_image plus title/description/image.<title>, <meta name="description">, and a canonical URL tell search engines what the page is about and which URL is the “official” one.MooX::Role::SEOTags gives you one method that renders all of that, consistently, from your object’s attributes.
For more information about OpenGraph, see ogp.me.
MooX::Role::SEOTags adds a handful of attributes and helper methods so any Moo (or Moose) class can declare the bits of information that power social previews and search snippets, then render them as HTML.
og:title, og:type, og:url, og:image, etc.)twitter:card, twitter:title, twitter:description, twitter:image)<title>, meta description, canonical <link rel="canonical">
That’s the whole job: define attributes → get valid tags out.
Install the role using your favourite CPAN module installation tool.
cpanm MooX::Role::SEOTags
Then, in your code, you will need to add some attributes or methods that define the pieces of information the role needs. The role requires four pieces of information – og_title, og_description, og_url and og_type – and og_image is optional (but highly recommended).
So a simple class might look like this:
package MyPage;
use Moo;
with 'MooX::Role::SEOTags';
# minimal OG fields
has og_title => (is => 'ro', required => 1);
has og_type => (is => 'ro', required => 1); # e.g. 'article'
has og_url => (is => 'ro', required => 1);
has og_description => (is => 'ro');
# optional niceties
has og_image => (is => 'ro'); # absolute URL
has twitter_card => (is => 'ro', default => sub { 'summary_large_image' });
1;
And then you create the object:
my $page = MyPage->new(
og_title => 'How to Title a Title',
og_type => 'article',
og_url => 'https://example.com/post/title',
og_image => 'https://example.com/img/hero.jpg',
og_description => 'A short, human description of the page.',
);
Then you can call the various *_tag and *_tags methods to get the correct HTML for the various tags.
The easiest option is to just produce all of the tags in one go:
say $page->tags;
But, for more control, you can call individual methods:
say $page->title_tag;
say $page->canonical_tag;
say $page->og_tags;
# etc...
Depending on which combination of method calls you use, the output will look something like this:
<title>How to Title a Title</title>
<meta name="description" content="A short, human description of the page.">
<link rel="canonical" href="https://example.com/post/title">
<meta property="og:title" content="How to Title a Title">
<meta property="og:type" content="article">
<meta property="og:url" content="https://example.com/post/title">
<meta property="og:image" content="https://example.com/img/hero.jpg">
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:title" content="How to Title a Title">
<meta name="twitter:description" content="A short, human description of the page.">
<meta name="twitter:image" content="https://example.com/img/hero.jpg">
In many cases, you’ll be pulling the data from a database and displaying the output using a templating system like the Template Toolkit.
use Template;
# $resultset is assumed to come from your ORM; its objects need to provide the role's tags() method
my $tt = Template->new;
my $object = $resultset->find({ slug => $some_slug });
$tt->process('page.tt', { object => $object }, "$some_slug/index.html");
In this case, you’d just add a single call to the <head> of your page template.
<head>
<!-- lots of other HTML -->
[% object.tags %]
</head>
If you spotted MooX::Role::OpenGraph arrive on MetaCPAN recently: SEOTags is the “grown-up” superset. It does Open Graph and Twitter and standard tags, so you only need one role. The old module is scheduled for deletion from MetaCPAN.
These tags are only one item in the SEO toolkit that you’d use to increase the visibility of your website. Another useful tool is JSON-LD – which allows you to add a machine-readable description of the information that your page contains. Google loves JSON-LD. And it just happens that I have another Moo role called MooX::Role::JSON_LD which makes it easy to add that to your page too. I wrote a blog post about using that earlier this year.
If you’ve got even one page that deserves to look smarter in search and social previews, now’s the moment. Pick a page, add a title, description, canonical URL and a decent image, and let MooX::Role::SEOTags spit out the right tags every time (and, if you fancy richer results, pair it with MooX::Role::JSON_LD). Share the link in Slack/WhatsApp/Twitter to preview it, fix anything that looks off, and ship. It’s a 20-minute tidy-up that can lift click-throughs for years—so go on, give one of your pages a quick SEO spruce-up today.
The post Easy SEO for lazy programmers first appeared on Perl Hacks.

I’ve liked Radiohead for a long time. I think “High and Dry” was the first song of theirs I heard (it was on heavy rotation on the much-missed GLR). That was released in 1995.
I’ve seen them live once before. It was the King of Limbs tour in October 2012. The show was at the O2 Arena and the ticket cost me £55. I had a terrible seat up in level 4 and, honestly, the setlist really wasn’t filled with the songs I wanted to hear.
I don’t like shows at the O2 Arena. It’s a giant, soulless hangar, and I’ve only ever seen a very small number of acts create any kind of atmosphere there. But there are some acts who will only play arena shows, so if you want to see them live in London, you have to go to the O2. I try to limit myself to one show a year. And I already have a ticket to see Lorde there in November.
But when Radiohead announced their dates at the O2 (just a week after the Lorde show), I decided I wanted to be there. So, like thousands of other people, I jumped through all the hoops that Radiohead wanted me to jump through.
Earlier in the week, I registered on their site so I would be in the draw to get a code that would allow me to join the queue to buy tickets. A couple of days later, unlike many other people, I received an email containing my code.
Over the next few days, I read the email carefully several times, so I knew all of the rules that I needed to follow. I wanted to do everything right on Friday – to give myself the best chance of getting a ticket.
At 9:30, I clicked on the link in the email, which took me to a waiting room area. I had to enter my email address (which had to match the email address I’d used earlier in the process). They sent me (another, different) code that I needed to enter in order to get access to the waiting room.
I waited in the waiting room.
At a few seconds past 10:00, I was prompted for my original code and when I entered that, I was moved from the waiting room to the queue. And I sat there for about twenty minutes. Occasionally, the on-screen queuing indicator inched forward to show me that I was getting closer to my goal.
(While this was going on, in another browser window, I successfully bought a couple of tickets to see The Last Dinner Party at the Brixton Academy.)
As I was getting closer to the front of the queue, I got a message saying that they had barred my IP address from accessing the ticket site. They listed a few potential things that could trigger that, but I didn’t see anything on the list that I was guilty of. Actually, I wondered for a while if logging on to the Ticketmaster site to buy the Last Dinner Party tickets caused the problem – but I’ve now seen that many people had the same issue, so it seems unlikely to have been that.
But somehow, I managed to convince the digital guardians that my IP address belonged to a genuine fan and at about 10:25, I was presented with a page to select and buy my tickets.
Then I saw the prices.
I have personal rules about tickets at the O2 Arena. Following bad experiences (including the previous Radiohead show I saw there), I have barred myself from buying Level 4 tickets. They are far too far from the stage and have a vertiginous rake that is best avoided. I also won’t buy standing tickets because… well, because I’m old and standing for three hours or so isn’t as much fun as it used to be. I always buy Level 1 seats (for those who don’t know the O2 Arena, Levels 2 and 3 are given over to corporate boxes, so they aren’t an option).
So I started looking for Level 1 tickets, only to see that they varied between £200 and £300. That didn’t seem right. I’d heard that tickets would be about £80. In the end, I found £89 tickets right at the back of Level 4 (basically, in Kent) and £97 standing tickets (both of those prices would almost certainly have other fees added to them before I actually paid). I seriously considered breaking my rules and buying a ticket on Level 4, but I just couldn’t justify it.
I like Radiohead, but I can’t justify paying £200 or £300 for anyone. The most I have ever paid for a gig is just over £100 for Kate Bush ten years ago. It’s not that I can’t afford it, it’s that I don’t think it’s worth that much money. I appreciate that other people (20,000 people times four nights – plus the rest of the tour!) will have reached a different conclusion. And I hope they enjoy the shows. But it’s really not for me.
I also realise the economics of the music industry have changed. It used to be that tours were loss-leaders that were used to encourage people to buy records (Ok, I’m showing my age – CDs). These days, it has switched. Almost no-one buys CDs, and releasing new music is basically a loss-leader to encourage people to go to gigs. And gig prices have increased in order to make tours profitable. I understand that completely, but I don’t have to like it. I used to go to about one gig a week. At current prices, it’s more like one a month.
I closed the site without buying a ticket, and I don’t regret that decision for a second.
What about you? Did you try to get tickets? At what point did you fall out of the process? Or did you get them? Are you happy you’ll get your money’s worth?
The post A Radiohead story appeared first on Davblog.
When I started to think about a second edition of Data Munging With Perl, I thought it would be almost trivial. The plan was to go through the text and
Update the Perl syntax to a more modern version of Perl
Update the CPAN modules used
And that was about it.
But when I started looking through the first edition, I realised there was a big chunk of work missing from this plan:
Ensure the book reflects current ideas and best practices in the industry
It’s that extra step that is taking the time. I was writing the first edition 25 years ago. That’s a long time. It’s a long time in any industry - it’s several generations in our industry. Think about the way you were working in 2000. Think about the day-to-day tasks you were taking on. Think about the way you organised your day (the first books on Extreme Programming were published in 2000; the Agile Manifesto was written in 2001; Scrum first appeared at about the same time).
The first edition contains the sentence “Databases are becoming almost as ubiquitous as data files”. Imagine saying that with a straight face today. There is one paragraph on Unicode. There’s nothing about YAML or JSON (because those formats both appeared in the years following publication).
When I was writing the slides for my talk, Still Munging Data With Perl, I planned to add a slide about “things we hadn’t heard of in 2000”. It ended up being four slides - and that was just scratching the surface. Not everything in those lists needs to be mentioned in the book - but a lot of it does.
When working on the book recently, I was reminded of how much one particular section of the industry has changed.
The first edition has a chapter on parsing HTML. Of course it does - that was cutting edge at the time. At the end of the chapter, there’s an extended example on screen scraping. It grabs the Yahoo! weather forecast for London and extracts a few pieces of data.
I was thinking ahead when I wrote it. There’s a footnote that says:
You should, of course, bear in mind that web pages change very frequently. By the time you read this, Yahoo! may well have changed the design of this page which will render this program useless.
But I had no idea how true that would be. When I revisited it, the changes were far larger than the me of 2000 could have dreamed of.
The page had moved. So the program would have failed at the first hurdle.
The HTML had changed. So even when I updated the URL, the program still failed.
And, most annoyingly, the new HTML didn’t include the data that I wanted to extract. To be clear - the data I wanted was displayed on the page - it just wasn’t included in the HTML.
Oh, I know what you’re thinking. The page is using Javascript to request the data and insert it into the page. That’s what I assumed too. And that’s almost certainly the case. But after an afternoon with the Chrome DevTools open, I could not find the request that pulled the required data into the page. It’s obviously there somewhere, but I was defeated.
I’ll rewrite that example to use an API to get weather data.
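The rewrite isn’t done yet, but it will be something along these lines. This is only a sketch, using the free Open-Meteo API rather than Yahoo!, and the response field names come from Open-Meteo’s documentation rather than from anything in the book:
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';

use HTTP::Tiny;
use JSON::PP qw[decode_json];

# Current weather for London from the Open-Meteo API (no API key required)
my $url = 'https://api.open-meteo.com/v1/forecast'
        . '?latitude=51.5072&longitude=-0.1276&current_weather=true';

my $res = HTTP::Tiny->new->get($url);
die "Request failed: $res->{status} $res->{reason}\n" unless $res->{success};

my $data    = decode_json($res->{content});
my $current = $data->{current_weather};

say "Temperature: $current->{temperature}C";
say "Wind speed:  $current->{windspeed} km/h";
The point of the example stays the same as in the first edition: fetch some data and pull out the bits you want. An API just gives you a stable, documented structure instead of whatever HTML a designer felt like shipping that week.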
But it was interesting to see how much harder screen scraping has become. I don’t know whether this was an intentional move by Yahoo! or if it’s just a side effect of their move to different technologies for serving this page. Whichever it is, it’s certainly something worth pointing out in that chapter.
I seem to have written quite a lot of things that aren’t at all related to the book since my last newsletter. I wonder if that’s some kind of displacement therapy :-)
I’m sure that part of it is down to how much more productive I am now I have AI to help with my projects.
Cleaner web feed aggregation with App::FeedDeduplicator explains a problem I have because I’m syndicating a lot of my blog posts to multiple sites (and talks about my solution).
Reformatting images with App::BlurFill introduces a new CPAN module I wrote to make my life as a publisher easier (but it has plenty of other applications too).
Turning AI into a Developer Superpower: The PERL5LIB Auto-Setter - another project that I’d been putting off because it just seemed a bit too complicated. But ChatGPT soon came up with a solution.
Deploying Dancer Apps – The Next Generation - last year, I wrote a couple of blog posts about how I deployed Dancer apps on my server. This takes it a step further and integrates them with systemd.
Generating Content with ChatGPT - I asked ChatGPT to generate a lot of content for one of my websites. The skeleton of the code I used might be useful for other people.
A Slice of Perl - explaining the idea of slices in Perl. Some people don’t seem to realise they exist, but they’re a really powerful piece of Perl syntax.
Stop using your system Perl - this was controversial. I should probably revisit this and talk about some of the counterarguments I’ve seen.
perlweekly2pod - someone wondered if we could turn the Perl Weekly newsletter into a podcast. This was my proof of concept.
Modern CSS Daily - using ChatGPT to fill in gaps in my CSS knowledge.
Pete thinks he can help people sort out their AI-generated start-up code. I think he might be onto something!
Like most people, when I started this Substack I had an idea that I might be able to monetise it at some point. I’m not talking about forcing people to pay for it, but maybe add a paid tier on top of this sporadic free one. Obviously, I’d need to get more organised and promise more regular updates - and that’s something for the future.
But over the last few months, I’ve been pleasantly surprised to receive email from Substack saying that two readers have “pre-pledged” for my newsletter. That is, they’ve told Substack that they would be happy to pay for my content. That’s a nice feeling.
To be clear, I’m not talking about adding a paid tier just yet. But I might do that in the future. So, just to gauge interest, I’m going to drop a “pledge your support” button in this email. Don’t feel you have to press it.
That’s all for today. I’m going to get back to working on the book. I’ll write again soon.
Dave…
Recently, Gabor ran a poll in a Perl Facebook community asking which version of Perl people used in their production systems. The results were eye-opening—and not in a good way. A surprisingly large number of developers replied with something along the lines of “whatever version is included with my OS.”
If that’s you, this post is for you. I don’t say that to shame or scold—many of us started out this way. But if you’re serious about writing and running Perl in 2025, it’s time to stop relying on the system Perl.
Let’s unpack why.
When we talk about the system Perl, we mean the version of Perl that comes pre-installed with your operating system—be it a Linux distro like Debian or CentOS, or even macOS. This is the version used by the OS itself for various internal tasks and scripts. It’s typically located in /usr/bin/perl and tied closely to system packages.
It’s tempting to just use what’s already there. But that decision brings a lot of hidden baggage—and some very real risks.
The Perl Core Support Policy states that only the two most recent stable release series of Perl are supported by the Perl development team. As of mid-2025, that means:
Perl 5.40 (released May 2024)
Perl 5.38 (released July 2023)
If you’re using anything older—like 5.36, 5.32, or 5.16—you’re outside the officially supported window. That means no guaranteed bug fixes, security patches, or compatibility updates from core CPAN tools like ExtUtils::MakeMaker, Module::Build, or Test::More.
Using an old system Perl often means you’re several versions behind, and no one upstream is responsible for keeping that working anymore.
System Perl is frozen in time—usually the version that was current when the OS release cycle began. Depending on your distro, that could mean Perl 5.10, 5.16, or 5.26—versions that are years behind the latest stable Perl (currently 5.40).
This means you’re missing out on:
New language features (builtin, class/method/field, signatures, try/catch)
Performance improvements
Bug fixes
Critical security patches
Support: anything older than Perl 5.38 is no longer officially maintained by the core Perl team
If you’ve ever looked at modern Perl documentation and found your code mysteriously breaking, chances are your system Perl is too old.
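If you’re not sure where you stand, checking takes seconds (these are standard commands, nothing distro-specific):
# Which perl is first on your PATH, and how old is it?
which perl
perl -v

# Or just the version number
perl -e 'print "$^V\n"'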
System Perl isn’t just a convenience—it’s a dependency. Your operating system relies on it for package management, system maintenance tasks, and assorted glue scripts. If you install or upgrade CPAN modules into the system Perl (especially with cpan or cpanm as root), you run the risk of breaking something your OS depends on.
It’s a kind of dependency hell that’s completely avoidable—if you stop using system Perl.
When you use system Perl, your environment is essentially defined by your distro. That’s fine until you want to:
Move your application to another system
Run CI tests on a different platform
Upgrade your OS
Onboard a new developer
You lose the ability to create predictable, portable environments. That’s not a luxury—it’s a requirement for sane development in modern software teams.
perlbrew or plenv
These tools let you install multiple versions of Perl in your home directory and switch between them easily. Want to test your code on Perl 5.32 and 5.40? perlbrew makes it a breeze.
You get:
A clean separation from system Perl
The freedom to upgrade or downgrade at will
Zero risk of breaking your OS
It takes minutes to set up and pays for itself tenfold in flexibility.
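For the avoidance of doubt, here’s roughly what that looks like (the version number is just an example; pick whichever current release you want):
# Install perlbrew, then build and switch to a modern Perl
\curl -L https://install.perlbrew.pl | bash
perlbrew install perl-5.40.0
perlbrew switch perl-5.40.0

# Confirm what you're now running
perlbrew list
perl -v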
local::lib or Carton
Managing CPAN dependencies globally is a recipe for pain. Instead, use:
local::lib: keeps modules in your home directory.
Carton: locks your CPAN dependencies (like npm or pip) so deployments are repeatable.
Your production system should run with exactly the same modules and versions as your dev environment. Carton helps you achieve that.
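A minimal Carton setup is just a cpanfile and a couple of commands (the module names and app path below are only examples):
# cpanfile
requires 'Plack';
requires 'DBIx::Class';

# Install into ./local and record the exact versions in cpanfile.snapshot
carton install

# Run your code against exactly those modules
carton exec -- plackup bin/app.psgi
Commit cpanfile and cpanfile.snapshot to your repository and every machine (dev, CI, production) can reproduce the same module set.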
If you’re building larger apps or APIs, containerising your Perl environment ensures true consistency across dev, test, and production. You can even start from a system Perl inside the container—as long as it’s isolated and under your control.
You never want to be the person debugging a bug that only happens on production, because prod is using the distro’s ancient Perl and no one can remember which CPAN modules got installed by hand.
Once you step away from the system Perl, you gain:
Access to the full language. Use the latest features without backports or compatibility hacks.
Freedom from fear. Install CPAN modules freely without the risk of breaking your OS.
Portability. Move projects between machines or teams with minimal friction.
Better testing. Easily test your code across multiple Perl versions.
Security. Stay up to date with patches and fixes on your schedule, not the distro’s.
Modern practices. Align your Perl workflow with the kinds of practices standard in other languages (think virtualenv, rbenv, nvm, etc.).
I know the argument. You’ve got a handful of scripts, or maybe a cron job or two, and they seem fine. Why bother with all this?
Because “it just works” only holds true until:
You upgrade your OS and Perl changes under you.
A script stops working and you don’t know why.
You want to install a module and suddenly apt is yelling at you about conflicts.
You realise the module you need requires Perl 5.34, but your system has 5.16.
Don’t wait for it to break. Get ahead of it.
You don’t have to refactor your entire setup overnight. But you can do this:
Install perlbrew and try it out.
Start a new project with Carton to lock dependencies.
Choose a current version of Perl and commit to using it moving forward.
Once you’ve seen how smooth things can be with a clean, controlled Perl environment, you won’t want to go back.
Your system Perl is for your operating system—not for your apps. Treat it as off-limits. Modern Perl deserves modern tools, and so do you.
Take the first step. Your future self (and probably your ops team) will thank you.
The post Stop using your system Perl first appeared on Perl Hacks.
Earlier this week, I read a post from someone who failed a job interview because they used a hash slice in some sample code and the interviewer didn’t believe it would work.
That’s not just wrong — it’s a teachable moment. Perl has several kinds of slices, and they’re all powerful tools for writing expressive, concise, idiomatic code. If you’re not familiar with them, you’re missing out on one of Perl’s secret superpowers.
In this post, I’ll walk through all the main types of slices in Perl — from the basics to the modern conveniences added in recent versions — using a consistent, real-world-ish example. Whether you’re new to slices or already slinging %hash{...} like a pro, I hope you’ll find something useful here.
Let’s imagine you’re writing code to manage employees in a company. You’ve got an array of employee names and a hash of employee details.
my @employees = qw(alice bob carol dave eve);
my %details = (
alice => 'Engineering',
bob => 'Marketing',
carol => 'HR',
dave => 'Engineering',
eve => 'Sales',
);
We’ll use these throughout to demonstrate each kind of slice.
List slices are slices from a literal list. They let you pick multiple values from a list in a single operation:
my @subset = (qw(alice bob carol dave eve))[1, 3];
# @subset = ('bob', 'dave')
You can also destructure directly:
my ($employee1, $employee2) = (qw(alice bob carol))[0, 2];
# $employee1 = 'alice', $employee2 = 'carol'
Simple, readable, and no loop required.
Array slices are just like list slices, but from an array variable:
my @subset = @employees[0, 2, 4];
# @subset = ('alice', 'carol', 'eve')
You can also assign into an array slice to update multiple elements:
@employees[1, 3] = ('beatrice', 'daniel');
# @employees = ('alice', 'beatrice', 'carol', 'daniel', 'eve')
Handy for bulk updates without writing explicit loops.
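One bonus idiom while we’re here (my addition, not part of the original example): because a slice is assignable, you can swap elements in a single statement.
# Swap the first two elements; no temporary variable needed
@employees[0, 1] = @employees[1, 0];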
This is where some people start to raise eyebrows — but hash slices are perfectly valid Perl and incredibly useful.
Let’s grab departments for a few employees:
my @departments = @details{'alice', 'carol', 'eve'};
# @departments = ('Engineering', 'HR', 'Sales')
The @ sigil here indicates that we’re asking for a list of values, even though %details is a hash.
You can assign into a hash slice just as easily:
@details{'bob', 'carol'} = ('Support', 'Legal');
This kind of bulk update is especially useful when processing structured data or transforming API responses.
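For example (this one is my own illustration, with $response standing in for a hypothetical decoded JSON hashref), the same syntax works through a reference:
# Pull just the fields we care about out of a decoded API response (a hashref)
my %user;
@user{qw(id name email)} = @{$response}{qw(id name email)};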
Starting in Perl 5.20, you can use %array[...] to return index/value pairs — a very elegant way to extract and preserve positions in a single step.
my @indexed = %employees[1, 3];
# @indexed = (1 => 'bob', 3 => 'dave')
You get a flat list of index/value pairs. This is particularly helpful when mapping or reordering data based on array positions.
You can even delete from an array this way:
my @removed = delete %employees[0, 4];
# @removed = (0 => 'alice', 4 => 'eve')
And afterwards you’ll have this:
# @employees = (undef, 'bob', 'carol', 'dave', undef)
The final type of slice — also added in Perl 5.20 — is the %hash{...} key/value slice. This returns a flat list of key/value pairs, perfect for passing to functions that expect key/value lists.
my @kv = %details{'alice', 'dave'};
# @kv = ('alice', 'Engineering', 'dave', 'Engineering')
You can construct a new hash from this easily:
my %engineering = (%details{'alice', 'dave'});
This avoids intermediate looping and makes your code clear and declarative.
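The key list doesn’t have to be hard-coded, either. Here’s a variation of my own (not from the original example) that builds the same %engineering hash by computing the keys:
# Key/value slice with a computed key list: everyone in Engineering
my %engineering = %details{ grep { $details{$_} eq 'Engineering' } keys %details };
# %engineering = (alice => 'Engineering', dave => 'Engineering')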
| Type | Syntax | Returns | Added in |
|---|---|---|---|
| List slice | (list)[@indices] | Values | Ancient |
| Array slice | @array[@indices] | Values | Ancient |
| Hash slice | @hash{@keys} | Values | Ancient |
| Index/value array slice | %array[@indices] | Index-value pairs | Perl 5.20 |
| Key/value hash slice | %hash{@keys} | Key-value pairs | Perl 5.20 |
If someone tells you that @hash{...} or %array[...] doesn’t work — they’re either out of date or mistaken. These forms are standard, powerful, and idiomatic Perl.
Slices make your code cleaner, clearer, and more concise. They let you express what you want directly, without boilerplate. And yes — they’re perfectly interview-appropriate.
So next time you’re reaching for a loop to pluck a few values from a hash or an array, pause and ask: could this be a slice?
If the answer’s yes — go ahead and slice away.
The post A Slice of Perl first appeared on Perl Hacks.
Back in January, I wrote a blog post about adding JSON-LD to your web pages to make it easier for Google to understand what they were about. The example I used was my ReadABooker site, which encourages people to read more Booker Prize shortlisted novels (and to do so by buying them using my Amazon Associate links).
I’m slightly sad to report that in the five months since I implemented that change, visits to the website have remained pretty much static and I have yet to make my fortune from Amazon kickbacks. But that’s ok, we just use it as an excuse to learn more about SEO and to apply more tweaks to the website.
I’ve been using the most excellent Ahrefs site to get information about how good the on-page SEO is for many of my sites. Every couple of weeks, Ahrefs crawls the site and will give me a list of suggestions of things I can improve. And for a long time, I had been putting off dealing with one of the biggest issues – because it seemed so difficult.
The site didn’t have enough text on it. You could get lists of Booker years, authors and books. And, eventually, you’d end up on a book page where, hopefully, you’d be tempted to buy a book. But the book pages were pretty bare – just the title, author, year they were short-listed and an image of the cover. Oh, and the all-important “Buy from Amazon” button. Ahrefs was insistent that I needed more text (at least a hundred words) on a page in order for Google to take an interest in it. And given that my database of Booker books included hundreds of books by hundreds of authors, that seemed like a big job to take on.
But, a few days ago, I saw a solution to that problem – I could ask ChatGPT for the text.
I wrote a blog post in April about generating a daily-updating website using ChatGPT. This would be similar, but instead of writing the text directly to a Jekyll website, I’d write it to the database and add it to the templates that generate the website.
Adapting the code was very quick. Here’s the finished version for the book blurbs.
#!/usr/bin/env perl
use strict;
use warnings;
use builtin qw[trim];
use feature 'say';
use OpenAPI::Client::OpenAI;
use Time::Piece;
use Encode qw[encode];
use Booker::Schema;
my $sch = Booker::Schema->get_schema;
my $count = 0;
my $books = $sch->resultset('Book');
# Generate blurbs for up to 20 books per run, skipping any that already have one
while ($count < 20 and my $book = $books->next) {
    next if defined $book->blurb;
    ++$count;

    my $blurb = describe_title($book);
    $book->update({ blurb => $blurb });
}

sub describe_title {
    my ($book) = @_;

    my ($title, $author) = ($book->title, $book->author->name);
    my $debug = 1;

    # Make sure the API key is set before we go any further
    my $api_key = $ENV{"OPENAI_API_KEY"} or die "OPENAI_API_KEY is not set\n";
    my $client = OpenAPI::Client::OpenAI->new;

    my $prompt = join " ",
        'Produce a 100-200 word description for the book',
        "'$title' by $author",
        'Do not mention the fact that the book was short-listed for (or won)',
        'the Booker Prize';

    my $res = $client->createChatCompletion({
        body => {
            model => 'gpt-4o',
            # model => 'gpt-4.1-nano',
            messages => [
                { role => 'system', content => 'You are someone who knows a lot about popular literature.' },
                { role => 'user', content => $prompt },
            ],
            temperature => 1.0,
        },
    });

    my $text = $res->res->json->{choices}[0]{message}{content};
    $text = encode('UTF-8', $text);
    say $text if $debug;

    return $text;
}
There are a couple of points to note:
I then produced a similar program that did the same thing for authors. It’s similar enough that the next time I need something like this, I’ll spend some time turning it into a generic program.
I then added the new database fields to the book and author templates and re-published the site. You can see the results in, for example, the pages for Salman Rushdie and Midnight’s Children.
I had one more slight concern going into this project. I pay for access to the ChatGPT API. I usually have about $10 in my pre-paid account and I really had no idea how much this was going to cost me. I needn’t have worried. Here’s a graph showing the bump in my API usage on the day I ran the code for all books and authors:
But you can also see that my total costs for the month so far are $0.01!
So, all-in-all, I call that a success and I’ll be using similar techniques to generate content for some other websites.
The post Generating Content with ChatGPT first appeared on Perl Hacks.
A month after the last newsletter - maybe I’m starting to hit a rhythm. Or maybe it’s just a fluke. Only time will tell.
As I mentioned in a brief update last month, I gave a talk to the Toronto Perl Mongers about Data Munging With Perl. I didn’t go to Toronto - I talked to them (and people all across the world) over Zoom. And that was a slightly strange experience. I hadn’t realised just how much I like the interaction of a live presentation. At the very least, it’s good to know whether or not your jokes are landing!
But I got through it, and people have said they found it interesting. I talked about how the first edition of the book came about and explained why I thought the time was right for a second edition. I gave a brief overview of the kinds of changes that I’m making for the second edition. I finished by announcing that a “work in progress” version of the second edition is available from LeanPub (and that was a surprisingly good idea, judging by the number of people who have bought it!). We finished the evening with a few questions and answers.
You can order the book from LeanPub. And the slides, video and a summary of the talk are all available from my talks site (an occasional project where I’m building an archive of all of the talks I’ve given over the last 25 years).
Of course, I still have to finish the second edition. And, having found myself without a regular client a few weeks ago, I’ve been spending most of my time on that. It’s been an interesting journey - seeing just how much has changed over the last quarter of a century. There have been changes in core Perl syntax, changes in recommended CPAN modules and changes in the wider industry (for example, 25 years ago, no-one had heard of YAML or JSON).
I’m still unable to give a date for the publication of the final version. But I can feel it getting closer. I think that the next time I write one of these newsletters, I’ll include an extract from one of the chapters that has a large number of changes.
I’ve been doing other things as well. In the last newsletter, I wrote about how I had built a website in a day with help from ChatGPT. The following week, I went a bit further and built another website - and this time ChatGPT didn’t just help me create the site, but it also updates the site daily with no input at all from me.
The site is at cool-stuff.co.uk, and I blogged about the project at Finding cool stuff with ChatGPT. My blog post was picked up by the people at dev.to for their Top 7 Featured DEV Posts of the Week feature. Which was nice :-)
I said that ChatGPT was updating the site without any input from me. Well, originally, that wasn’t strictly true. Although ChatGPT seemed to understand the assignment (finding an interesting website to share every day), it seemed to delight in finding every possible loophole in the description of the data format I wanted to get back. This meant that on most days, my code was unable to parse the response successfully and I had a large number of failed updates. It felt more than a little like a story about a genie who gives you three wishes but then does everything in their power to undermine those wishes.
Eventually, I discovered that you can give ChatGPT a JSON Schema definition and it will always create a response that matches that definition (see Structured Outputs). Since I implemented that, I’ve had no problem. You might be interested in how I did that in my Perl program.
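If you’re curious what that looks like in Perl, here’s a minimal sketch built around the same OpenAPI::Client::OpenAI call used elsewhere on this blog; the schema name and fields are invented for illustration, not the ones the site actually uses:
use Mojo::JSON qw[from_json];
use OpenAPI::Client::OpenAI;

my $client = OpenAPI::Client::OpenAI->new;

my $res = $client->createChatCompletion({
    body => {
        model    => 'gpt-4o',
        messages => [
            { role => 'user', content => 'Suggest one interesting website for today.' },
        ],
        # Structured Outputs: the reply must validate against this JSON Schema
        response_format => {
            type        => 'json_schema',
            json_schema => {
                name   => 'daily_site',
                strict => \1,
                schema => {
                    type       => 'object',
                    properties => {
                        url         => { type => 'string' },
                        title       => { type => 'string' },
                        description => { type => 'string' },
                    },
                    required             => [ 'url', 'title', 'description' ],
                    additionalProperties => \0,
                },
            },
        },
    },
});

# The content is now guaranteed to be JSON that matches the schema
my $site = from_json($res->res->json->{choices}[0]{message}{content});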
Paul Cochrane has been writing an interesting series of posts about creating a new map for Mohammad Anwar’s Map::Tube framework. But in the process, he seems to have written a very useful guide on how to write a new CPAN module using modern tools and a test-driven approach. Two articles have been published so far, but he’s promising a total of five.
Building Map::Tube::<*> maps, a HOWTO: first steps
Building Map::Tube::<*> maps, a HOWTO: extending the network
Occasionally, when I’m giving a talk I’ll wear a favourite t-shirt that has a picture of a space shuttle on it, along with the text “A spaceship has landed on Earth, it came from Rockwell”. Whenever I wear it, I can guarantee that at least a couple of people will comment on it.
The t-shirt is based on an advert from the first issue of a magazine called OMNI, which was published in 1978. Because of the interest people show in the t-shirt, I’ve put up a website at itcamefromrockwell.com. The site includes a link to a blog post that goes into more detail about the advert, and also a link to a RedBubble shop where you can buy various items using the design.
If you’re at all interested in the history of spaceflight, then you might find the site interesting.
Hope you found something of interest in today’s newsletter. I’ll be back in a week or two with an update on the book.
Cheers,
Dave…
This isn’t a real newsletter. I just had a couple of quick updates that I didn’t want to keep until I write another newsletter.
I gave my “Still Munging Data With Perl” talk to an audience over Zoom, last Thursday. I thought it went well and people have been kind enough to say nice things about it to me. I’ve added a page about the talk to my talks site (talks.davecross.co.uk/talk/still-munging-data-with-perl/). Currently it has the slides, but I expect to add the video in the next few days.
As part of the talk, I announced that the second edition of Data Munging With Perl is on sale through LeanPub as a “work in progress” edition. That means you can pay me now and you’ll get the current version of the book - but over the next few weeks you’ll get updated versions as the second edition gets more and more complete. See leanpub.com/datamungingwithperl.
That’s all for now. I hope to have a proper newsletter for you next week.
Cheers,
Dave…
Maybe the secret to posting more often is to have interesting and/or useful things to say. I’m not sure what that says about my extremely intermittent posting frequency over the last five years or so.
But it looks like I’m on a roll. I have a few things I want to share with you. I’ve been down a bit of a rabbit hole thinking about the various different ways that websites are built these days.
Over the last few months, I’ve become lightly involved with the Clapham Society - that’s a local group that spreads news about life in Clapham and gets involved in campaigning against things that are likely to have a negative effect on that life.
They wanted a new website. Now, I know a bit about how websites work, but I’d never describe myself as a web designer and I just don’t have the capacity to get involved in a project that needs to move a lot of content from an old website to a new one. But I had some conversations with them about what they were looking for in a new website and I helped them choose an agency that would build the new site (they went with Agency for Good, who seem to be very… well… good!) I also helped out wrangling their old domain name from their previous web hosting company and helped them get set up on Google Workspace (using a free account from Google for Nonprofits).
Anyway… that all went very well. But I started thinking about all of the horror stories you hear about small businesses that get caught up with cowboy website development companies and then end up with a website that doesn’t work for them, costs too much money and is hard to maintain.
And that led to me writing my Website Guide (aka “What to Ask Your Website Company”). The aim was to write in non-technical language and provide a short and easy-to-understand guide that would give small business owners[*] the information they need to make better decisions when choosing a company to design and build their website.
Yes, it’s a bit outside my usual areas of expertise, but I think it works. I’m considering turning it into a short e-book next.
I realised this had all got a bit meta. I had built a website about how to get people to build you a good website. So I decided to lean into that and wrote a blog post called “How I Build Websites in 2025” which explained the process and tools I had used to build the websites about building websites. Are you still with me?
Then, yesterday, I decided to take things a bit further. Like most geeks, I have several domains just lying around not doing anything useful and I wanted to know if I could quickly spin up a website on one of them that could bring me a bit of income (or, perhaps, get enough traffic that I could sell it on in a few months).
So in six hours or so, I built Balham.org. Well, I say “I”, but I couldn’t have done it without a lot of help from ChatGPT. That help basically came in three areas:
High-level input. Answering questions like “What sort of website should we build?”, “What pages should we have?” and “How do we make money out of this?”
Data and content. How long would it take you to get a list of 20 popular Balham businesses and create a YAML file suitable for use with Jekyll? ChatGPT did it in a minute or so.
Jekyll advice. I’m starting to understand Jekyll pretty well, but this project went way beyond my knowledge many times. ChatGPT really helped by telling me how to achieve some of my goals.
We did what we set out to do. The site is there and is, I think, useful. Now we have the phase where so many projects stall (I nearly wrote “my projects” there, but I think this is a problem for many people). We need to promote the site and start making money from it. That’s probably another session with ChatGPT next weekend.
And finally on this subject, for now, at least, today I wrote a blog post about yesterday’s project - “Building a website in a day — with help from ChatGPT”.
[*] Owners of small businesses, not … well, you know what I mean!
I’m still working on the second edition of Data Munging with Perl. But, more urgently, I’m currently working on the slides for the talk I’m giving about the book next Thursday, 27 March, to the Toronto Perl Mongers. I’m not flying to Toronto; it’s a virtual talk that I’ll be giving over Zoom. There are already over a hundred people signed up - which is all very exciting. Why not join us? You can sign up at the link below.
The book will be out as soon as possible after the talk (but, to be clear, that’s probably weeks, not days). Subscribers to this newsletter will be the first people to know when it’s published.
Anyway, that’s all I have time for today. I really need to get back to writing these slides. Thanks for taking an interest.
Cheers,
Dave…

A few days ago, I looked at an unused domain I owned — balham.org — and thought: “There must be a way to make this useful… and maybe even make it pay for itself.”
So I set myself a challenge: one day to build something genuinely useful. A site that served a real audience (people in and around Balham), that was fun to build, and maybe could be turned into a small revenue stream.
It was also a great excuse to get properly stuck into Jekyll and the Minimal Mistakes theme — both of which I’d dabbled with before, but never used in anger. And, crucially, I wasn’t working alone: I had ChatGPT as a development assistant, sounding board, researcher, and occasional bug-hunter.
Balham is a reasonably affluent, busy part of south west London. It’s full of restaurants, cafés, gyms, independent shops, and people looking for things to do. It also has a surprisingly rich local history — from Victorian grandeur to Blitz-era tragedy.
I figured the site could be structured around three main pillars:

a directory of local businesses
a listing of local events and things to do
a local history section
Throw in a curated homepage and maybe a blog later, and I had the bones of a useful site. The kind of thing that people would find via Google or get sent a link to by a friend.
I wanted something static, fast, and easy to deploy. My toolchain ended up being:

Jekyll as the static site generator
the Minimal Mistakes remote theme
GitHub Pages for hosting and automatic builds
ChatGPT as a development assistant
The site is 100% static, with no backend, no databases, no CMS. It builds automatically on GitHub push, and is entirely hosted via GitHub Pages.
I gave us about six solid hours to build something real. Here’s what we did (“we” meaning me + ChatGPT):
The domain was already pointed at GitHub Pages, and I had a basic “Hello World” site in place. We cleared that out, set up a fresh Jekyll repo, and added a _config.yml that pointed at the Minimal Mistakes remote theme. No cloning or submodules.
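For anyone following along at home, the kind of _config.yml that gives you looks roughly like the sketch below. The title, URL and plugin list are illustrative rather than copied from the real site; the remote-theme line is the important bit.

```yaml
# _config.yml -- minimal sketch of a Minimal Mistakes remote-theme setup.
# Title, URL and plugins are illustrative, not the live site's config.
remote_theme: "mmistakes/minimal-mistakes"   # fetched at build time; no clone, no submodule

title: "Balham.org"
url: "https://balham.org"

plugins:
  - jekyll-remote-theme
  - jekyll-sitemap        # generates sitemap.xml automatically
```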
We decided to create four main pages:
Home (index.md)
Directory (directory/index.md)
Events (events/index.md)
History (history/index.md)

We used the layout: single layout provided by Minimal Mistakes, and created custom permalinks so URLs were clean and extension-free.
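Front matter along these lines does the job; the title and permalink shown here are examples rather than the live pages’:

```yaml
---
# directory/index.md -- example front matter; title and permalink are placeholders
layout: single
title: "Balham Directory"
permalink: /directory/    # clean, extension-free URL
---
```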
This was built from scratch using a YAML data file (_data/businesses.yml). ChatGPT gathered an initial list of 20 local businesses (restaurants, shops, pubs, etc.), checked their status, and added details like name, category, address, website, and a short description.
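To give a flavour of the data file, here are two invented entries using the fields described above:

```yaml
# _data/businesses.yml -- two invented entries showing the expected fields
- name: "Example Café"
  category: "Café"
  address: "1 Balham High Road, SW12"
  website: "https://example.com"
  description: "A placeholder entry showing the structure the templates expect."

- name: "Example Bookshop"
  category: "Shop"
  address: "2 Bedford Hill, SW12"
  website: ""             # empty, so the template skips the link
  description: "Another placeholder record."
```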
In the template, we looped over the list, rendered sections with conditional logic (e.g., don’t output the website link if it’s empty), and added anchor IDs to each entry so we could link to them directly from the homepage.
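The template itself boils down to a Liquid loop. This is a simplified sketch rather than the exact markup on the site:

```html
{% comment %} directory page body -- simplified sketch, not the site's exact markup {% endcomment %}
{% for business in site.data.businesses %}
  <!-- the slugified name becomes an anchor so the homepage can deep-link here -->
  <section id="{{ business.name | slugify }}">
    <h2>{{ business.name }}</h2>
    <p>{{ business.category }}, {{ business.address }}</p>
    {% if business.website and business.website != "" %}
      <p><a href="{{ business.website }}">Website</a></p>
    {% endif %}
    <p>{{ business.description }}</p>
  </section>
{% endfor %}
```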
Built exactly the same way, but using _data/events.yml. To keep things realistic, we seeded a small number of example events and included a note inviting people to email us with new submissions.
We wanted the homepage to show a curated set of businesses and events. So we created a third data file, _data/featured.yml, which just listed the names of the featured entries. Then in the homepage template, we used where and slugify to match names and pull in the full record from businesses.yml or events.yml. Super DRY.
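Here’s a sketch of that lookup, assuming _data/featured.yml lists the names under a businesses: key (the real file may be organised slightly differently):

```html
{% comment %} homepage -- pull the full record for each featured business {% endcomment %}
{% for name in site.data.featured.businesses %}
  {% assign business = site.data.businesses | where: "name", name | first %}
  {% if business %}
    <h3><a href="/directory/#{{ business.name | slugify }}">{{ business.name }}</a></h3>
    <p>{{ business.description }}</p>
  {% endif %}
{% endfor %}
```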
We added a map of Balham as a hero image, styled responsively. Later we created a .responsive-inline-image class to embed supporting images on the history page without overwhelming the layout.
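For anyone curious, a class like that usually boils down to a handful of rules. This is an illustrative version, not necessarily what ended up on the site:

```css
/* illustrative version of the class; the site's actual rules may differ */
.responsive-inline-image {
  max-width: 100%;
  height: auto;
  display: block;
  margin: 1em auto;
}
```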
This turned out to be one of the most satisfying parts. We wrote five paragraphs covering key moments in Balham’s development — Victorian expansion, Du Cane Court, The Priory, the Blitz, and modern growth.
Then we sourced five CC-licensed or public domain images (from Wikimedia Commons and Geograph) to match each paragraph. Each was wrapped in a <figure> with proper attribution and a consistent CSS class. The result feels polished and informative.
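Each image block followed roughly this shape. The file name, caption and credit are placeholders, and exactly where the CSS class sits is a guess:

```html
<!-- placeholder file name, caption and credit; the structure is the point -->
<figure class="responsive-inline-image">
  <img src="/assets/images/du-cane-court.jpg"
       alt="Du Cane Court, Balham">
  <figcaption>
    Du Cane Court. Image via Wikimedia Commons; see the image page for the
    author credit and licence details.
  </figcaption>
</figure>
```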
We went through all the basics:
title and description in front matter for each page
robots.txt, sitemap.xml, and a hand-crafted humans.txt
clean URLs with no .html extensions

We added GA4 tracking using Minimal Mistakes’ built-in support, and verified the domain with Google Search Console. A sitemap was submitted, and indexing kicked in within minutes.
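The GA4 side is just theme configuration. Something like this in _config.yml does it; the measurement ID below is a placeholder, and it’s worth checking the theme docs for the current key names:

```yaml
# _config.yml -- GA4 via Minimal Mistakes' built-in analytics support
analytics:
  provider: "google-gtag"
  google:
    tracking_id: "G-XXXXXXXXXX"   # placeholder GA4 measurement ID
```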
We ran Lighthouse and WAVE tests. Accessibility came out at 100%. Performance dipped slightly due to Google Fonts and image size, but we did our best to optimise without sacrificing aesthetics.
We added a site-wide footer call-to-action inviting people to email us with suggestions for businesses or events. This makes the site feel alive and participatory, even without a backend form.
This started as a fun experiment: could I monetise an unused domain and finally learn Jekyll properly?
What I ended up with is a genuinely useful local resource — one that looks good, loads quickly, and has room to grow.
If you’re sitting on an unused domain, and you’ve got a free day and a chatbot at your side — you might be surprised what you can build.
Oh, and one final thing – obviously you can also get ChatGPT to write a blog post talking about the project :-)
The post Building a website in a day — with help from ChatGPT appeared first on Davblog.

I built and launched a new website yesterday. It wasn’t what I planned to do, but the idea popped into my head while I was drinking my morning coffee on Clapham Common and it seemed to be the kind of thing I could complete in a day – so I decided to put my original plans on hold and built it instead.
The website is aimed at small business owners who think they need a website (or want to update their existing one) but who know next to nothing about web development and can easily fall prey to the many cowboy website companies that seem to dominate the “making websites for small companies” section of our industry. The site is structured around a number of questions you can ask a potential website builder to try and weed out the dodgier elements.
I’m not really in that sector of our industry. But while writing the content for that site, it occurred to me that some people might be interested in the tools I use to build sites like this.
I generally build websites about topics that I’m interested in and, therefore, know a fair bit about. But I probably don’t know everything about these subjects. So I’ll certainly brainstorm some ideas with ChatGPT. And, once I’ve written something, I’ll usually run it through ChatGPT again to proofread it. I consider myself a pretty good writer, but it’s embarrassing how often ChatGPT catches obvious errors.
I’ve used DALL-E (via ChatGPT) for a lot of image generation. This weekend, I subscribed to Midjourney because I heard it was better at generating images that include text. So far, that seems to be accurate.
I don’t write much raw HTML these days. I’ll generally write in Markdown and use a static site generator to turn that into a real website. This weekend I took the easy route and used Jekyll with the Minimal Mistakes theme. Honestly, I don’t love Jekyll, but it integrates well with GitHub Pages and I can usually get it to do what I want – with a combination of help from ChatGPT and reading the source code. I’m (slowly) building my own Static Site Generator (Aphra) in Perl. But, to be honest, I find that when I use it I can easily get distracted by adding new features rather than getting the site built.
As I’ve hinted at, if I’m building a static site (and it’s surprising how often that’s the case), it will be hosted on GitHub Pages. It’s not really aimed at end-users, but I know how to use it pretty well now. This weekend, I used the default mechanism that regenerates the site (using Jekyll) on every commit. But if I’m using Aphra or a custom site generator, I know I can use GitHub Actions to build and deploy the site.
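For the custom-generator case, the workflow is the standard build-then-deploy-to-Pages pattern. Here’s a minimal sketch; the build step is a placeholder, not a real Aphra command:

```yaml
# .github/workflows/site.yml -- minimal build-and-deploy-to-Pages sketch.
# The build step is a placeholder; swap in whatever your generator needs.
name: Build and deploy site

on:
  push:
    branches: [main]

permissions:
  contents: read
  pages: write
  id-token: write

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build the site
        run: make build   # placeholder: run your generator, writing output to ./_site
      - uses: actions/upload-pages-artifact@v3
        with:
          path: _site

  deploy:
    needs: build
    runs-on: ubuntu-latest
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    steps:
      - id: deployment
        uses: actions/deploy-pages@v4
```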
If I’m writing actual HTML, then I’m old-skool enough to still use Bootstrap for CSS. There’s probably something better out there now, but I haven’t tried to work out what it is (feel free to let me know in the comments).
For a long while, I used jQuery to add Javascript to my pages – until someone was kind enough to tell me that vanilla Javascript had mostly caught up and jQuery was no longer necessary. I understand Javascript. And with help from GitHub Copilot, I can usually get it doing what I want pretty quickly.
Many years ago, I spent a couple of years working in the SEO group at Zoopla. So, now, I can’t think about building a website without considering SEO.
I quickly lose interest in the content side of SEO. Figuring out what my keywords are and making sure they’re scattered through the content at the correct frequency feels like it stifles my writing (maybe that’s an area where ChatGPT can help), but I enjoy Technical SEO. So I like to make sure that all of my pages contain the correct structured data (usually JSON-LD). I also like to ensure my sites all have useful OpenGraph headers. This isn’t really SEO, I guess, but these headers control what people see when they share content on social media. Making that as attractive as possible (a useful title and description, an attractive image) encourages more sharing, which increases your site’s visibility and, in a roundabout way, improves SEO.
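To make that concrete, here’s the sort of thing I mean. The values are placeholders rather than tags lifted from one of my sites:

```html
<!-- placeholder values; the structure is what matters -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "How I build websites in 2025",
  "author": { "@type": "Person", "name": "Dave Cross" },
  "datePublished": "2025-03-16"
}
</script>

<meta property="og:title" content="How I build websites in 2025">
<meta property="og:description" content="The tools and services I reach for when building a new website.">
<meta property="og:image" content="https://example.com/images/share-card.png">
<meta property="og:type" content="article">
```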
I like to register all of my sites with Ahrefs – they will crawl my sites periodically and send me a long list of SEO improvements I can make.
I add Google Analytics to all of my sites. That’s still the best way to find out how popular your site is and where your traffic is coming from. I used to be quite proficient with Universal Analytics, but I must admit I haven’t fully got the hang of Google Analytics 4 yet, so I’m probably only scratching the surface of what it can do.
I also register all of my sites with Google Search Console. That shows me information about how my site appears in the Google Search Index. I also link that to Google Analytics – so GA also knows what searches brought people to my sites.
I think that covers everything—though I’ve probably forgotten something. It might sound like a lot, but once you get into a rhythm, adding these extra touches doesn’t take long. And the additional insights you gain make it well worth the effort.
If you’ve built a website recently, I’d love to hear about your approach. What tools and techniques do you swear by? Are there any must-have features or best practices I’ve overlooked? Drop a comment below or get in touch—I’m always keen to learn new tricks and refine my process. And if you’re a small business owner looking for guidance on choosing a web developer, check out my new site—it might just save you from a costly mistake!
The post How I build websites in 2025 appeared first on Davblog.
I don’t really do New Year’s resolutions. But I like to start the year with a few aspirations of things I’ll try to do better in the coming year. This year, one of those was to post to this newsletter more frequently. So I have no idea how it’s fast approaching the end of February and this is the first newsletter of 2025 (and six months - almost to the day - since I last sent a newsletter).
So can I be the last person to wish you a Happy New Year? We’re eight weeks into the year. How’s it going for you so far?
Some of you might be old enough to remember my book Data Munging With Perl. Towards the end of last year, I decided it was about time to produce a second edition. It’s a book I’m proud of and I think it still contains a lot of good advice, but a lot of the actual Perl in the book is looking rather dated now and it will be nice to have a version that updates a lot of the code.
I talked about my plans in a lightning talk at the London Perl Workshop last year. At the time, I mentioned that I hoped the book would be available “by Christmas”. But then I went and took on a full-time client that rather ate into the time I had available to work on the book. So Christmas came and went without the book being published. But I’m still making progress and I hope it’ll be available before too long. I have a bit of a fixed deadline as I’m giving a talk about the book to the Toronto Perl Mongers at the end of March, and I’d like it to be available by then.
The talk will be given over Zoom and it’s free. So please feel free to join us if you’re interested in the book. It’s on 27th March at 20:00 UTC.
I’ve been part of the Perl community for far longer than I care to admit. But my contributions have mostly been non-technical. So I was surprised when a member of the Perl Steering Council contacted me at the start of the year and asked for my help working on a new website that would make it easier for the community to see upcoming changes that had been proposed for the Perl language.
The PPC process has been in place pretty much since the Perl Steering Council was formed in 2020. But it had been suggested that only having the documents available through the GitHub website means that many people who might otherwise be interested aren’t reading the proposals. So the PSC had decided it would be useful to turn that repo into a real website. And, because of my experience with GitHub Pages and GitHub Actions, they thought I would probably be a good person to get a first draft up and running quickly.
So that’s what I did. You can read more details about how it all works on my Perl Hack blog (part 1, part 2) and you can form your own opinion on how successful I’ve been by looking at the site at https://perl.github.io/PPCs/
I’ve written a few other blog posts over the last six months that you might find interesting:
We moved the London Perl Mongers website to GitHub Pages (I’m starting to detect a bit of a theme here!)
I explained a CPAN module I had written that makes it easy to add JSON-LD to a website
I’ve been using various AI tools to increase my productivity. But I’m really only dabbling at the moment. Other people are obviously doing more than I am. So this article really interested me. I’m definitely going to try to incorporate some of these suggestions into my workflow.
I’ve also been given access to the preview version of GitHub Spark. So I’ll be trying that out over the coming weeks.
In the past, I’ve definitely been guilty of buying domain names for projects that I only have the vaguest of plans for. But over the last five years I’ve made a determined effort to stop doing that and to let old domains lapse once it becomes obvious that I’m not going to complete the project they were bought for. I think I’ve let somewhere between a third and a half of my domains lapse.
But sometimes a domain just has to be bought.
And I realised last week that South Sudan were allowing domain registrations at the top level of their ccTLD.
So… davecro.ss.
That’s all for today. Let’s hope it’s not six months until my next newsletter.
Cheers,
Dave…
I’ve been a member of Picturehouse Cinemas for something approaching twenty years. It costs about £60 a year and for that, you get five free tickets and discounts on your tickets and snacks. I’ve often wondered whether it’s worth paying for, but in the last couple of years, they’ve added an extra feature that makes it well worth the cost. It’s called Film Club and every week they have two curated screenings that members can see for just £1. On Sunday lunchtime, there’s a screening of an older film, and on a weekday evening (usually Wednesday at the Clapham Picturehouse), they show something new. I’ve got into the habit of seeing most of these screenings.
For most of the year, I’ve been considering a monthly post about the films I’ve seen at Film Club, but I’ve never got around to it. So, instead, you get an end-of-year dump of the almost eighty films I’ve seen.
The post Picturehouse Film Club appeared first on Davblog.