torrentfreak.com
c/freepost · Posted by devtrix · 2 votes, 8 comments

Confirmed: https://archive.li/eIajX (zero videos)

I will try to set aside some time to add to this project idea. It could use a subproject to scrape other video libraries for libre content and download it locally. It could also have another feature to upload to archive (the original case), PeerTube, whatever.

Don’t know if a question was implicit or not, but I’m currently doing some work there.

This is rather easy and boring. Filtering videos by license will be tricky, depending on the framework (or no framework) chosen to scrape and crawl. Downloading is easy, and the license could even be embedded in the video's web page or in the video itself (in metadata). Getting enough space and time to let this run while obeying robots.txt will be the most annoying part.
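For the robots part, a minimal sketch in Python using only the standard library. The bot name `libre-video-bot` and the example URLs are made up for illustration, and the robots.txt text is passed in as a string so the sketch runs without network access; a real crawler would fetch it from the site first.

```python
from urllib.robotparser import RobotFileParser


def make_robot_checker(robots_txt, user_agent="libre-video-bot"):
    """Build an allow-check from the text of a robots.txt file.

    `user_agent` is a hypothetical bot name, not something from this thread.
    Returns (allowed, delay): `allowed(url)` says whether the bot may fetch
    the URL, and `delay` is the Crawl-delay in seconds, or None if unset.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    def allowed(url):
        return parser.can_fetch(user_agent, url)

    return allowed, parser.crawl_delay(user_agent)


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /private/\nCrawl-delay: 5\n"
    allowed, delay = make_robot_checker(rules)
    print(allowed("https://example.org/videos/1"))
    print(allowed("https://example.org/private/1"))
    print(delay)
```

Sleeping for `delay` seconds between requests (when it is not None) is the boring part that makes the whole run take so long.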

Oh cool. I figured someone must have started working on something like that.

If it can run as a cron job or in a shell and chunk the work into batches of less than ~40 GB, I could run it on a server.
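The chunking could be planned roughly like this. It is only a sketch: the function name `plan_batches`, the (name, size) input format, and reading "~40 GB" as 40 * 10**9 bytes are my assumptions. It fills batches in order and does not try to pack them optimally.

```python
def plan_batches(items, limit_bytes=40 * 10**9):
    """Split (name, size_in_bytes) pairs into batches no larger than the limit.

    Returns (batches, oversized). Items that alone exceed the limit are
    reported separately instead of being forced into a batch.
    """
    batches = []
    oversized = []
    current = []
    current_size = 0
    for name, size in items:
        if size > limit_bytes:
            oversized.append(name)
            continue
        # Close the current batch when the next item would overflow it.
        if current and current_size + size > limit_bytes:
            batches.append(current)
            current = []
            current_size = 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches, oversized


if __name__ == "__main__":
    videos = [("a.webm", 30), ("b.webm", 20), ("c.webm", 10), ("d.webm", 100)]
    print(plan_batches(videos, limit_bytes=40))
```

A cron job could then process one batch per run and remember which batch it finished last.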

There was an interpretation problem: I am not working on this specifically, but I think in general the way to do it is easy. I’m more or less interested in following this some day, because I want to write what I would call a backup solution. Its main purpose is to batch-share an arbitrary number of files you control on some platform(s) and get them into another platform/network (GNUnet FS in my case). Crawlers and scrapers would be involved.

“Some work there” meant “I’m working on things where I can relate to what you are attempting”.

What I’m interested in is whether there’s anything beyond libmagic and libextractor to get the license of files. Otherwise a database would have to be created and filled, and/or queried. For YouTube and sites mentioning the license on the same page or in a meta tag it will be easy.
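For the "license on the same page" case, a rough sketch with Python's standard `html.parser`. It assumes the page marks its license with `rel="license"` on a link or anchor, or with a `<meta name="license">` tag; the meta name is a guess, sites differ. The `is_libre` check is also my own simplification: it only accepts CC BY, CC BY-SA and CC0 URLs.

```python
from html.parser import HTMLParser


class LicenseFinder(HTMLParser):
    """Collect license hints from <a>/<link rel="license"> and <meta name="license">."""

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        # Attribute values can be None (e.g. <a rel>), so normalise to "".
        attr = {key: (value or "") for key, value in attrs}
        if tag in ("a", "link") and "license" in attr.get("rel", "").lower().split():
            if attr.get("href"):
                self.licenses.append(attr["href"])
        elif tag == "meta" and attr.get("name", "").lower() == "license":
            if attr.get("content"):
                self.licenses.append(attr["content"])


def find_licenses(html):
    """Return license URLs or strings found in the page, in document order."""
    finder = LicenseFinder()
    finder.feed(html)
    finder.close()
    return finder.licenses


# Assumption: only these Creative Commons URL patterns count as libre here.
LIBRE_MARKERS = (
    "creativecommons.org/licenses/by/",
    "creativecommons.org/licenses/by-sa/",
    "creativecommons.org/publicdomain/zero/",
)


def is_libre(license_url):
    return any(marker in license_url for marker in LIBRE_MARKERS)


if __name__ == "__main__":
    page = '<a rel="license" href="https://creativecommons.org/licenses/by/4.0/">CC BY</a>'
    for found in find_licenses(page):
        print(found, is_libre(found))
```

Anything that does not match still needs the database or a manual look, so this only covers the easy sites.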

(blame markdown for the terrible, non markup text)

(I realized “easy” is subjective… using for example scrapy + beautifulsoup4 it should be achievable)

By query, YouTube has around 414,000 CC-licensed videos.

Ah, sorry for the misunderstanding. But some great thoughts here. =)

Yes, I agree much of it seems like easy tasks. I consider this high on my overall to-do list, but personally have several other tasks to do first.

TL;DR edit: It was suggested to me not to do it now in public. Because this is a simple topic you can find everything you’ll need out there. Sorry -.-

What the frick https://www.blender.org/media-exposure/youtube-blocks-blender-videos-worldwide/