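Btrfs and ZFS support real deduplication via copy-on-write, which would eliminate the main disadvantages of symlink and hardlink deduplication: each deduplicated file stays an independent file, so editing or deleting one copy never affects the others. It just works.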
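For what it's worth, here is a rough sketch of what I mean, not a drop-in patch. On Linux, a copy-on-write clone can be requested with the `FICLONE` ioctl (the same thing `cp --reflink` does). The helper name `reflink_or_copy` is made up for illustration, and the fallback to a plain copy is my own assumption about what you'd want on filesystems that don't support cloning.

```python
import shutil

# FICLONE request number from <linux/fs.h>, i.e. _IOW(0x94, 9, int).
# Linux-specific; other platforms fall through to the plain copy below.
FICLONE = 0x40049409


def reflink_or_copy(src, dst):
    """Make dst a copy-on-write clone of src if the filesystem allows it.

    Hypothetical helper for illustration. Returns True when the clone
    succeeded and False when it fell back to an ordinary byte copy.
    """
    try:
        import fcntl  # not available on Windows
    except ImportError:
        shutil.copyfile(src, dst)
        return False

    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            # Ask the kernel to share src's extents with dst.
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            return True
        except OSError:
            # Cloning is unsupported here (e.g. ext4, tmpfs, or a
            # cross-filesystem copy); fall back to a normal copy.
            pass

    shutil.copyfile(src, dst)
    return False
```

Either way the caller ends up with two separate files with identical contents; on a filesystem with reflink support they just share the same blocks on disk until one of them is modified.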
Why is it all one huge Python source file? That's a serious code smell IMO, and something you should really avoid, since it can become a major maintenance burden.
Hey fellow scener, cool project!
Just a few thoughts/questions: