Introducing Bitmagnet: A self-hosted BitTorrent indexer, DHT crawler, content classifier and torrent search engine with web UI, GraphQL API and Servarr stack integration

https://lemmy.world/post/6301281

I’m excited to announce the first alpha preview of this project that I’ve been working on for the past 4 months. I’m initially posting about this in a few small communities, and hoping to get some input from early adopters and beta testers.

### What is a DHT crawler?

The DHT crawler is Bitmagnet’s killer feature that (currently) makes it unique. Well, almost unique, read on…

So what is it? You might be aware that you can enable DHT in your BitTorrent client, and that this allows you to find peers who are announcing a torrent’s hash to a Distributed Hash Table (DHT), rather than to a centralized tracker. DHT’s lesser-known feature is that it allows you to crawl the info hashes it knows about. This is how Bitmagnet’s DHT crawler works - it crawls the DHT network, requesting metadata about each info hash it discovers. It then further enriches this metadata by attempting to classify it and associate it with known pieces of content, such as movies and TV shows. It then allows you to search everything it has indexed.

This means that Bitmagnet is not reliant on any external trackers or torrent indexers. It’s a self-contained, self-hosted torrent indexer, connected via the DHT to a global network of peers and constantly discovering new content.

The DHT crawler is not quite unique to Bitmagnet; another open-source project, magnetico, was first (as far as I know) to implement a usable DHT crawler, and was a crucial reference point for implementing this feature. However, that project is no longer maintained, and does not provide the other features, such as content classification and integration with other software in the ecosystem, that greatly improve usability.
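To make the crawling idea above more concrete, here is a minimal sketch of the kind of message a DHT crawler could send. The post does not say which DHT messages Bitmagnet uses, so this assumes the `sample_infohashes` query from BEP 51, which asks a node for a sample of the info hashes it stores. The function names (`bencode`, `sample_infohashes_query`, `split_samples`, `magnet_uri`) are illustrative only and are not taken from Bitmagnet. No network traffic is sent; fetching the torrent metadata for each hash is a separate step that is not shown.

```python
def bencode(value):
    """Encode ints, bytes, str, lists and dicts using bencoding (BEP 3)."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in value.items()),
            key=lambda item: item[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def sample_infohashes_query(node_id, target, txn_id=b"aa"):
    """Build a BEP 51 `sample_infohashes` KRPC query (assumed crawl method)."""
    return bencode({
        "t": txn_id,                 # transaction id echoed back by the remote node
        "y": "q",                    # message type: query
        "q": "sample_infohashes",
        "a": {"id": node_id, "target": target},
    })


def split_samples(samples):
    """Split the concatenated `samples` response field into hex info hashes."""
    if len(samples) % 20 != 0:
        raise ValueError("samples must be a multiple of 20 bytes")
    return [samples[i:i + 20].hex() for i in range(0, len(samples), 20)]


def magnet_uri(info_hash_hex):
    """Turn a hex info hash into a bare magnet link."""
    return f"magnet:?xt=urn:btih:{info_hash_hex}"
```

A real crawler would send the query over UDP to many nodes, decode each reply, and then request metadata for every info hash returned by `split_samples`.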
### Currently implemented features of Bitmagnet:

- A DHT crawler
- A generic BitTorrent indexer: Bitmagnet can index torrents from any source, not only the DHT network - currently this is only possible via the /import endpoint; more user-friendly methods are in the pipeline
- A content classifier that can currently identify movie and television content, along with key related attributes such as language, resolution and source (BluRay, webrip etc.), and enriches this with data from The Movie Database
- An import facility for ingesting torrents from any source, for example the RARBG backup
- A torrent search engine
- A GraphQL API: currently this provides a single search query; there is also an embedded GraphQL playground at /graphql
- A web user interface implemented in Angular: currently this is a simple single-page application providing a user interface for search queries via the GraphQL API
- A Torznab-compatible endpoint for integration with the Servarr stack

### Interested?

If this project interests you then I’d really appreciate your input:

- How did you get along with following the documentation and installation instructions? Were there any pain points?
- There’s a roadmap of high-priority features on the website - what do you see as the highest priority for near-term development?
- If you’re a developer, are you interested in contributing to the project?

Thanks for your attention. If you’re interested in this project and would like to help it gain momentum then please give it a star on GitHub, and expect further updates soon!
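As a rough illustration of the Torznab integration listed above, the sketch below builds the kind of search URL a Servarr application would request from a Torznab-compatible endpoint. The query parameters (`t`, `q`, `cat`, `apikey`, `limit`) and the category numbers 2000 (Movies) and 5000 (TV) come from the general Torznab/Newznab conventions. The endpoint URL in the usage note is a hypothetical placeholder, since the post does not give Bitmagnet's actual Torznab path, and `torznab_url` is an illustrative helper, not part of any library.

```python
from urllib.parse import urlencode


def torznab_url(endpoint, function="search", query=None, categories=None,
                api_key=None, limit=None):
    """Build a Torznab request URL for the given endpoint.

    `endpoint` is the full URL of the Torznab API and is supplied by the caller;
    this sketch makes no claim about where Bitmagnet serves it.
    """
    params = {"t": function}          # e.g. "caps", "search", "tvsearch", "movie"
    if query:
        params["q"] = query
    if categories:
        params["cat"] = ",".join(str(c) for c in categories)
    if api_key:
        params["apikey"] = api_key
    if limit is not None:
        params["limit"] = limit
    return f"{endpoint}?{urlencode(params)}"
```

For example, `torznab_url("http://localhost:3333/torznab/api", query="big buck bunny", categories=[2000, 5000], limit=10)` yields a URL ending in `?t=search&q=big+buck+bunny&cat=2000%2C5000&limit=10`; the host, port and path there are assumptions for illustration.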

Looks super interesting; starred!
This sounds amazing, definitely going to add this to my servarr setup next few days.
I use magnetico and have no need for the bells and whistles, but that seems really interesting!
Being relatively new to the self hosted experience and still working through how everything in an arr setup interacts, along with what issues can occur and how to troubleshoot/fix them, this sounds incredibly useful. I'll definitely be looking into integrating this into my own setup and providing feedback when I can!
Does it infinitely crawl, storing all metadata about every torrent it finds forever?
You had me at GraphQL 🥰
This sounds awesome, I’ll give it a try! Would this work in i2p?
This looks really cool! How resource intensive is this? What sort of storage requirements are there for this to be a reasonably reliable method of acquiring media? I’m probably just gonna find out myself. I’ve recently fully switched over to usenet, but this could make torrents pretty compelling again.
Looks like a fun project, but will you be providing any info on setting it up from scratch? I just don’t have an interest in docker containers.
@mgdigital, first thing I’ve noticed: reliance on a “heavier” database stack (pg + redis), at least from a first glance at docker-compose. My suggestion would be to have an option for a minimalist setup with sqlite and without redis if possible. That would work better for those of us flying with minimal hardware (rpi, old PC and such).
A DHT crawler is inherently an intensive service to run; magnetico used sqlite and would take 10 minutes just to load the splash page that includes the total count of discovered torrents.

seems to work well

just one question, is it expected to have 10,000 out of 12,000 as unknown?

Hi, yep that’s expected. Torrents will only move out of “Unknown” once the classifier is able to categorise them. The classifier currently only supports movie and TV show content, and can recognise these with quite high accuracy assuming a well-named torrent (and a badly named torrent is unlikely to be a high quality release). The other content types (music, games etc.) can currently only be populated via an import (see the tutorial on the website). A priority feature is classifiers for other content types - however we will likely always have a lot of torrents ending up in “Unknown” given the poor naming of many crawled items. Another roadmap feature, smart deletion, could help in future with getting rid of all the rubbish whose contents cannot be inferred from the torrent name.
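To illustrate why naming matters so much here, the following is a minimal sketch of name-based classification. It is not Bitmagnet's classifier; the regular expressions, the `classify` function and the `content_type` labels are assumptions made up for this example. It shows how a well-named release yields a type, title, resolution and source, while a name with no recognisable markers falls through to "unknown".

```python
import re

# Illustrative patterns only; real release names are far more varied.
RESOLUTION = re.compile(r"\b(2160p|1080p|720p|480p)\b", re.I)
SOURCE = re.compile(r"\b(blu-?ray|web-?rip|web-?dl|hdtv|dvdrip)\b", re.I)
EPISODE = re.compile(r"\bS(\d{1,2})E(\d{1,3})\b", re.I)
YEAR = re.compile(r"\b(19\d{2}|20\d{2})\b")


def classify(name):
    """Guess content type and attributes from a torrent name."""
    # Release names commonly use dots or underscores instead of spaces.
    cleaned = re.sub(r"[._]+", " ", name)
    episode = EPISODE.search(cleaned)
    year = YEAR.search(cleaned)
    resolution = RESOLUTION.search(cleaned)
    source = SOURCE.search(cleaned)

    if episode:
        content_type, title_end = "tv_show", episode.start()
    elif year:
        content_type, title_end = "movie", year.start()
    else:
        # Nothing recognisable in the name, so the item stays unclassified.
        content_type, title_end = "unknown", len(cleaned)

    return {
        "content_type": content_type,
        "title": cleaned[:title_end].strip(" -(["),
        "year": int(year.group(1)) if year else None,
        "season": int(episode.group(1)) if episode else None,
        "episode": int(episode.group(2)) if episode else None,
        "resolution": resolution.group(1).lower() if resolution else None,
        "source": source.group(1).lower() if source else None,
    }
```

With this sketch, `classify("Big.Buck.Bunny.2008.1080p.BluRay.x264")` is treated as a movie titled "Big Buck Bunny", whereas `classify("random_file_dump")` comes back as "unknown", which mirrors the behaviour described in the reply above.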

Dude this is amazing! Exactly the sort of thing I’ve been hoping would pop up to further “decentralize” the torrent search experience.

So I’m trying to run it on my machine through the docker-compose option, and I’m seeing something weird. It shows as successfully running, but when I go to the port it should be running on, I get “unable to connect” on my browser.

When I check my containers running, it shows the 3 bitmagnet containers, but the port doesn’t show.

i.imgur.com/D4R1Le5.png

It’s only once you install something like this that you realize just how many torrents are porno.
Perry's Perspectives (YouTube): Dr. Cox introduces Perry's Perspectives to Dr. Dorian in the sitcom Scrubs, including the "bring back the porn" notion.

Acronyms, initialisms, abbreviations, contractions, and other phrases which expand to something larger, that I’ve seen in this thread:

| Fewer Letters | More Letters |
|---|---|
| Plex | Brand of media server package |
| SSD | Solid State Drive mass storage |
| VPN | Virtual Private Network |

[Thread #191 for this sub, first seen 5th Oct 2023, 14:25]

Decronym

Hi, am I missing something? The bitmagnet image keeps restarting when I check with “docker ps”; the other 2 containers are working as intended. And port 3333 doesn’t show anything.
What are your logs showing? `docker logs -f bitmagnet`

Great project!

Naming conventions are missing some important information like bitrate, color depth, and most importantly language and subtitles.

Do you plan to scrape additional info from known torrent sites (searching for torrent hashes for well-named torrents)?

Maybe I’m misunderstanding but wouldn’t it just be easier to use a good private tracker, assuming you can get an invite?
Yes, of course.