Dumps with complete page edit history can be downloaded too, as far as I can see, so no need to crawl that.
clb92
I am /u/clb92 on Reddit (and many other places)
valid reasons for not wanting the whole database e.g. storage constraints
If you’re training AI models, surely you have a couple TB to spare. It’s not like Wikipedia takes up petabytes or anything.
clb92@feddit.dk to Technology@lemmy.ml • Top 10 website in the world ( July 2025 ) • English
471 · 4 months ago
Why would anyone crawl Wikipedia when you can freely download the complete databases in one go, likely served on a CDN…
But sure, crawlers, go ahead and spend a week doing the same thing in a much more expensive, disruptive and error-prone way…
clb92@feddit.dk to Self Hosted - Self-hosting your services.@lemmy.ml • My biggest annoyances with NGINX-manager • English
1 · 4 months ago
You don’t have to fully restart Caddy. You can tell it to reload the Caddyfile.
clb92@feddit.dk to Free and Open Source Software@beehaw.org • FOSS Android File Manager with SSH? • English
1 · 5 months ago
Oh, it’s not actually FOSS, I think, sorry. Disregard this.
Total Commander is what I use. It’s a dual-pane file manager that supports SFTP, WebDAV, SMB and more through its official plugins. It sometimes feels a bit dated, but most other file managers I’ve tried felt simplistic and dumb compared to it. It has lots of advanced features too.
Works great for me, thanks.
Added a button on my Stream Deck too, which disables blocking on my two Technitium instances for 5 minutes.
I have it integrated into HomeAssistant so I have a “Disable DNS Blocking” button
I need that. I already have a bunch of physical buttons on my desk, which do things via Home Assistant, so that’d be an obvious one for me to add next.
I don’t believe it
clb92@feddit.dk to blueteamsec@infosec.pub • Polish Train Maker Is Suing the Hackers Who Exposed Its Anti-Repair Tricks • English
11 · 5 months ago
Man, Newag is a scummy company, even more than I previously thought after seeing that talk by the hackers.
clb92@feddit.dk to Technology@lemmy.ml • Bad vibes: How an AI agent coded its way to disaster • English
1 · 5 months ago
This whole marketing campaign has worked better than they could have ever hoped for.
I’m planning to have a NAS at my parents’ place too, and I will probably just set it up with Tailscale.
clb92@feddit.dk to Selfhosted@lemmy.world • Sonarr - How to troubleshoot fake downloads • English
5 · 6 months ago
I use Deluge, where it’s a bit more difficult, because it doesn’t have such filtering built in. I had to use the Execute plugin and have it run a script upon completion that checks the downloaded files and deletes the download if it contains one or more dangerous filetypes.
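For anyone wanting to try the same thing, here is a rough sketch of what such a check script could look like. It is not my actual script. It assumes the Execute plugin calls the script with the torrent ID, torrent name and save path as arguments (check your Deluge version), and the extension list and the `find_dangerous_files` helper are made up for illustration. It only reports matches; actually removing the torrent from Deluge is left out.

```python
import sys
from pathlib import Path

# Hypothetical blocklist of filetypes that should never appear in a media download.
DANGEROUS_EXTENSIONS = {".exe", ".scr", ".lnk", ".bat", ".cmd", ".msi"}


def find_dangerous_files(root, extensions=DANGEROUS_EXTENSIONS):
    """Return a sorted list of files under root whose extension is blocklisted."""
    root = Path(root)
    if not root.exists():
        return []
    # A torrent can be a single file or a folder of files.
    candidates = [root] if root.is_file() else root.rglob("*")
    return sorted(
        p for p in candidates
        if p.is_file() and p.suffix.lower() in extensions
    )


def main(argv):
    # Assumed argument order: script name, torrent ID, torrent name, save path.
    _torrent_id, torrent_name, save_path = argv[1:4]
    hits = find_dangerous_files(Path(save_path) / torrent_name)
    for hit in hits:
        print(f"dangerous file: {hit}")
    # Non-zero exit code signals that the download looks suspicious.
    return 1 if hits else 0


# Only run when called with the three arguments the plugin is assumed to pass.
if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(main(sys.argv))
```

The exit code (or the printed list) can then be used by whatever does the cleanup, for example a follow-up step that removes the torrent and its data.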
Is there any way to have Radarr/Sonarr automatically remove it from the queue if there are no importable files, instead of waiting for manual intervention?
clb92@feddit.dk to Programmer Humor@lemmy.ml • Finally a professional car for programmers !
27 · 6 months ago
What you posted there is a dokker image.
Ask your doctor if Moctopril™ is right for you!
VLC can play from DLNA, so perhaps there’s a way you can use that?
There’s a Java version higher than 8?
clb92@feddit.dk to Selfhosted@lemmy.world • Let’s Encrypt Begins Supporting IP Address Certificates • English
13 · 6 months ago
Pretty sure it remains $1. But it’s specifically only 6-9 digit numeric .xyz domains.
PNG is back!
I didn’t even know it was gone
People were very quick to declare it gone for good.



Yet another useless article that hinges entirely on the word ‘could’. I can make stuff up too:
“1 windmill placed in every front yard could slash power prices by 99.9%”
“Solar panels mounted on every cat and dog could solve all our power needs”
“Painting all buildings in the world with reflective paint could save the polar ice caps by 2027”