talkgroup

A file repository library / how to organize?


I’d like to start building a file repository of archive.org eBooks and other publicly accessible eTexts from the internet. While it certainly would house a lot of open content works, I’m not sure why it shouldn’t also house things that are given away gratis online, even under standard copyright terms.

Subject matter spread should be anything that is relevant or interesting to talkgroup members.

I regularly come across things like this that I feel it’s important to mirror. For example, as I mull over the Inform programming languages, I often refer back to this netbook here: http://adamcadre.ac/gull

It’s possibly the only text I’ve found that explains, in easy-to-read terms and with beginners in mind, how to use the multimedia functions of the Glulx virtual machine or Z-machine; and it doesn’t have a whole lot of mirrors.

I also for example came across a description of “What is to be done?” by Nikolay Gavrilovich Chernyshevsky while reading another book, and I discovered it’s in the public domain on archive.org. It’s landed on my “to read” pile and I think that should be fair game too.

But now to the meat and potatoes.

I’ve been pondering how to organize this before I start asking @tim for space on allthecodes.

I am thinking maybe a directory structure based on the Universal Decimal Classification, with a root index as a kind of card catalog. The index could be an awk-parseable text database, which should make the metadata importable into any long-term data projects.
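To make the card catalog idea concrete, here is a minimal sketch of what such a root index could look like: one tab-separated line per document, so awk or anything else can split it on tabs. The column layout (`udc`, `path`, `title`, `author`), the file paths, and the class codes are all assumptions for illustration, not an agreed format, and the codes have not been checked against the real UDC tables.

```python
# Hypothetical column order for the root index; nothing here is settled.
FIELDS = ["udc", "path", "title", "author"]

# Illustrative records only. Paths and class codes are placeholders.
SAMPLE = [
    {"udc": "004", "path": "0/004/gull.html",
     "title": "Gull", "author": "Adam Cadre"},
    {"udc": "82", "path": "8/82/what-is-to-be-done.txt",
     "title": "What Is to Be Done?", "author": "Nikolay Chernyshevsky"},
]


def format_record(record):
    """Turn one record into a single tab-separated line."""
    values = [record[name] for name in FIELDS]
    for value in values:
        # Tabs or newlines inside a field would break line-oriented tools.
        if "\t" in value or "\n" in value:
            raise ValueError("field contains a tab or newline: %r" % value)
    return "\t".join(values)


def parse_index(text):
    """Read index text back into a list of dicts, skipping blanks and # comments."""
    records = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        records.append(dict(zip(FIELDS, line.split("\t"))))
    return records


def by_class(records, prefix):
    """Select records whose class code starts with the given prefix."""
    return [r for r in records if r["udc"].startswith(prefix)]


if __name__ == "__main__":
    index_text = "\n".join(format_record(r) for r in SAMPLE)
    for record in by_class(parse_index(index_text), "82"):
        print(record["path"], record["title"])
```

Because each record is a single line with tab separators, the same file could be filtered from a shell with something like `awk -F'\t' '$1 ~ /^82/' index.tsv` (the file name is again just a placeholder).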

Does that make sense to people? Would another structure make more sense for a file store of documents?


I haven’t articulated this, since I want to say it in a way that shows evidence of my claims, and not just “information wants to be free, man!” But so much of the last century’s works have been forgotten, helped to fade due to industry practices.

And so… I fully support sharing. Copyright is a blight on our ability to self-organize, etc. I don’t want anyone getting hurt, but I personally am fine walking into grey territories for knowledge. Forgiveness > permission, and all that.

Whoo, that’s a big, cool index system to use!

I’ll think on it. I’m trying to come up with these flat hierarchies for interi.org, so basically things won’t ever get more than two sections deep from root. Instead of depth, I go for broad and try to infer from metadata.

So, I’d have a bunch of documents I found and am sharing. I’d metatate them with all the classifications I could, and then let machines take care of the rest.
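A rough sketch of the "broad, not deep" approach, assuming every document sits in one flat directory and carries a set of free-form tags. The file names and tags below are made up for illustration; the point is only that a machine can build the lookup from the metadata instead of a human building folders.

```python
from collections import defaultdict

# Hypothetical flat store: file name -> set of tags. All values are illustrative.
DOCS = {
    "gull.html": {"inform", "glulx", "programming"},
    "what-is-to-be-done.txt": {"novel", "public-domain"},
    "inform-notes.txt": {"inform", "programming"},
}


def build_tag_index(docs):
    """Invert the mapping: tag -> set of file names carrying that tag."""
    index = defaultdict(set)
    for name, tags in docs.items():
        for tag in tags:
            index[tag].add(name)
    return dict(index)


def find(index, *tags):
    """Return the sorted file names that carry every one of the given tags."""
    if not tags:
        return []
    matches = [index.get(tag, set()) for tag in tags]
    return sorted(set.intersection(*matches))


if __name__ == "__main__":
    tag_index = build_tag_index(DOCS)
    print(find(tag_index, "inform", "programming"))
    print(find(tag_index, "glulx"))
```

A classification code could be treated as just one more tag here, so this does not rule out the decimal scheme; it only moves it out of the directory tree and into the metadata.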

I don’t awk often, but I do search Hugo sites all the time, using my local file index provider (my non-generic example is search in Nautilus). I clone so many folks’ repos to debug, but they keep their content in the same repos, so I get really fascinating (serendipitous!) results back!

I think it works for portability and practical local use. :slight_smile:

My thinking is that the file repo should work both as an individual standalone product and as something easily importable/referenced/integrated by bigger knowledge engines. Which is why I’d like to have some sort of internal hierarchy with a machine- and human-readable index, per se. Keeping it human browseable I think is going to require a structure with some death, if the collection gets any size.

My concern there is that, unlike with Hugo, we could end up with some documents that aren’t easily parseable (image-based PDFs, for example). Thus the need for some sort of minimal index, capturing some searchable metadata.

I think that is “dearth”, but easily one of my favorite sentences! Thrembode, Master Necro, on stand-by!

Dur, I wasn’t done. @trashHeap, gather me six interesting and disparate documents/artifacts, please. That will help get me on your page, and give us something to play with. :slight_smile:


I might try and build a sample of what I’m after to demonstrate, but I’ll let it cook in my noggin for a day or two too, to make sure I’m properly convinced.


Ahem topic timers… :slight_smile:

yall are a good influence on me because my brain goes straight to cloud(butt) services. i just looked at a friend’s demo app for a new service from the company that makes elasticsearch and i forgot the name and can’t find it on their website, weirdly. it crawls your (publicly available) data and lets you search?