Ed2k-hash: Difference between revisions

m (→‎Which software can be used to generate them?: ed2k_hash: it's not only for Linux)
m (Clarifications)
Line 5: Line 5:
In short, a hash is a checksum. It is used to detect errors in data by checking the integrity of the message/file. There are various hash algorithms used for differing purposes, some focused more on security and some on speed. AniDB sports both kinds: CRC is not secure at all, while MD5, SHA-1 and the MD4/ed2k hashes are considerably stronger than CRC from a cryptographic standpoint. They are not failsafe, but having more than one hash makes it much harder to counterfeit a file. You might be able to fool one of the hashes, but generally not two, let alone the four hashes recorded for most of the files found in AniDB.


The ed2k hash is based on the ''md4 algorithm'', but rather than providing a single hash of the entire file, it breaks the file up into 9500kb ''chunks'' and produces a final hash based on the md4 sums of the chunks. While no longer considered secure from a cryptographic perspective, for the purpose of uniquely identifying files it's more than adequate. It is often listed as part of an ''ed2k link'', which also includes a size in bytes, and a name.
The ed2k hash is based on the ''md4 algorithm'', but rather than providing a single hash of the entire file, it breaks the file up into 9500 KiB (9,728,000-byte) ''chunks'' and produces a final hash based on the md4 sums of the chunks. While md4 is no longer considered secure from a cryptographic perspective, for the purpose of uniquely identifying files it's more than adequate. It is often listed as part of an ''ed2k link'', which also includes a size in bytes, and a name.
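
The chunking scheme described above can be sketched in Python as follows. This is an illustration under stated assumptions, not AniDB's or any client's actual code: the names <code>md4</code>, <code>ed2k_hash</code> and <code>ED2K_CHUNK_SIZE</code> are made up for this example, MD4 is written out in pure Python (following RFC 1320) because not every Python build exposes it through <code>hashlib</code>, and the whole input is held in memory, so it is far too slow for real video files. The sketch also assumes that a file fitting in a single chunk is hashed with plain md4 and that no extra empty chunk is appended when the size is an exact multiple of the chunk size; implementations are known to differ on that edge case.

```python
import struct

ED2K_CHUNK_SIZE = 9500 * 1024  # 9,728,000 bytes per chunk

# Per-round message word order, rotation amounts and additive constants (RFC 1320).
_ORDER = (
    list(range(16)),
    [0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15],
    [0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15],
)
_SHIFTS = ((3, 7, 11, 19), (3, 5, 9, 13), (3, 9, 11, 15))
_CONST = (0x00000000, 0x5A827999, 0x6ED9EBA1)


def _rotl(x, n):
    """Rotate a 32-bit value left by n bits."""
    x &= 0xFFFFFFFF
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def md4(data):
    """Return the 16-byte MD4 digest of data (pure Python, illustrative only)."""
    # Pad with 0x80, zeros up to 56 mod 64, then the bit length as 64-bit LE.
    msg = data + b"\x80"
    msg += b"\x00" * ((56 - len(msg)) % 64)
    msg += struct.pack("<Q", (len(data) * 8) & 0xFFFFFFFFFFFFFFFF)

    state = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476]
    for off in range(0, len(msg), 64):
        x = struct.unpack("<16I", msg[off:off + 64])
        a, b, c, d = state
        for i in range(48):
            rnd, j = divmod(i, 16)
            if rnd == 0:
                f = (b & c) | (~b & d)
            elif rnd == 1:
                f = (b & c) | (b & d) | (c & d)
            else:
                f = b ^ c ^ d
            new = _rotl(a + f + x[_ORDER[rnd][j]] + _CONST[rnd], _SHIFTS[rnd][j % 4])
            a, b, c, d = d, new, b, c  # rotate the register roles
        state = [(s + v) & 0xFFFFFFFF for s, v in zip(state, (a, b, c, d))]
    return struct.pack("<4I", *state)


def ed2k_hash(data, chunk_size=ED2K_CHUNK_SIZE):
    """Return the ed2k hash of data as a lowercase hex string."""
    digests = [md4(data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)]
    if not digests:            # empty input: plain md4 of nothing
        digests = [md4(b"")]
    if len(digests) == 1:      # single chunk: the chunk's md4 is the result
        return digests[0].hex()
    return md4(b"".join(digests)).hex()  # md4 over the concatenated chunk digests
```

The <code>chunk_size</code> parameter exists only so that the multi-chunk path can be exercised on small inputs; real ed2k hashes always use the 9,728,000-byte chunk size.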


== Why does AniDB require ed2k-hashes? ==
Line 18: Line 18:


* The combination of ed2k-hash and file size makes ed2k effective for uniquely identifying files.
* Since ed2k-hashes can be passed back and forth within a URL with both the hash and file size in a widely recognized format it is a convenient method for adding or checking files.
* Since ed2k-hashes can be passed back and forth within a URI with both the hash and file size in a widely recognized format it is a convenient method for adding or checking files.
* Other hashes are good for checking whether a file is corrupt when you already know which file you are comparing it against, but they cannot necessarily identify a file globally in the system the way ed2k can.
* AniDB was designed around ed2k. Although other hashes have been added to the file records for validation, the internal structure is based on ed2k. If the site goes through a complete redesign, another hash may be made the primary hash, but at this point that is not likely to change.
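
To make the point about passing the hash and file size around in a single link concrete, here is a minimal Python sketch that builds and parses an ed2k file link of the common form <code>ed2k://|file|NAME|SIZE|HASH|/</code>. The function names <code>make_ed2k_link</code> and <code>parse_ed2k_link</code> are hypothetical, and percent-encoding the file name is an assumption of this sketch rather than something AniDB prescribes; optional trailing fields that some clients add are ignored.

```python
from urllib.parse import quote, unquote


def make_ed2k_link(name, size, ed2k_hex):
    """Build an ed2k file link from a name, a size in bytes and a hex hash."""
    # quote() escapes spaces and the '|' separator so the name cannot break the link.
    return "ed2k://|file|{}|{}|{}|/".format(quote(name), size, ed2k_hex.lower())


def parse_ed2k_link(link):
    """Split an ed2k file link into (name, size, hash); raise ValueError if malformed."""
    parts = link.split("|")
    if len(parts) < 6 or parts[0] != "ed2k://" or parts[1] != "file":
        raise ValueError("not an ed2k file link")
    name, size, digest = unquote(parts[2]), int(parts[3]), parts[4].lower()
    # An ed2k hash is an md4 digest: 16 bytes, i.e. 32 hex characters.
    if len(digest) != 32 or any(ch not in "0123456789abcdef" for ch in digest):
        raise ValueError("hash is not 32 hex characters")
    return name, size, digest
```

Because the size and hash travel together, a consumer of the link can look up a file by the (size, hash) pair without needing the name, which is treated here as purely informational.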