In short, a hash is a checksum: it is used to detect errors in data by verifying the integrity of a message or file. Different hash algorithms serve different purposes; some focus more on security and others on speed. AniDB uses both kinds. CRC offers no real security, while MD5, SHA-1 and the MD4/ed2k hashes are much harder to forge from a cryptographic standpoint. None of them is failsafe, but having more than one hash makes it far harder to counterfeit a file, because a forged file would have to match every hash at once. You might be able to fool one of the hashes, but generally not two, or, as is the case with most of the files found in AniDB, four hashes!
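To see how a single file ends up with several independent checksums, here is a minimal Python sketch that computes CRC32, MD5 and SHA-1 in one pass, so a large file only has to be read once. The function name `multi_hash` and the 1 MiB read size are illustrative assumptions, not AniDB's own code.

```python
import hashlib
import io
import zlib
from typing import BinaryIO, Dict


def multi_hash(stream: BinaryIO, block_size: int = 1 << 20) -> Dict[str, str]:
    """Hypothetical helper: CRC32, MD5 and SHA-1 of a stream in a single pass."""
    crc = 0
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    while True:
        block = stream.read(block_size)
        if not block:
            break
        crc = zlib.crc32(block, crc)  # running CRC32 carried across blocks
        md5.update(block)
        sha1.update(block)
    return {
        "crc32": f"{crc & 0xFFFFFFFF:08x}",  # 8 hex digits
        "md5": md5.hexdigest(),
        "sha1": sha1.hexdigest(),
    }


if __name__ == "__main__":
    # Demo on an in-memory "file"; pass open(path, "rb") for a real one.
    print(multi_hash(io.BytesIO(b"123456789")))
```

For a real file, open it in binary mode and pass the handle, e.g. `with open(path, "rb") as f: multi_hash(f)`. Each digest is computed independently, which is exactly why a forger would need to satisfy all of them simultaneously.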
The ed2k hash is based on the ''md4 algorithm'', but rather than providing a single hash of the entire file, it splits the file into ''chunks'' of 9500 KiB (9,728,000 bytes), computes the MD4 hash of each chunk, and then computes the MD4 hash of the concatenated chunk hashes to produce the final value. A file that fits in a single chunk simply gets the MD4 hash of its contents. While MD4 is no longer considered secure from a cryptographic perspective, it is more than adequate for the purpose of uniquely identifying files. The hash is often listed as part of an ''ed2k link'', which also includes the file's size in bytes and its name.
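The chunking scheme described above can be sketched in Python as follows. This is a minimal illustration under a few assumptions: MD4 is implemented in pure Python because `hashlib.new("md4")` may be unavailable on builds that use OpenSSL 3 without its legacy provider; the names `md4` and `ed2k_hash` are hypothetical; and clients disagree about files whose size is an exact multiple of the chunk size (some append the MD4 of an empty trailing chunk), which this sketch does not do.

```python
import io
import struct
from typing import BinaryIO

ED2K_CHUNK_SIZE = 9_728_000  # 9500 KiB per chunk
_MASK = 0xFFFFFFFF


def _rotl(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits."""
    x &= _MASK
    return ((x << n) | (x >> (32 - n))) & _MASK


def md4(data: bytes) -> bytes:
    """Pure-Python MD4 (RFC 1320). Slow, but needs no OpenSSL legacy provider."""
    msg = bytearray(data)
    bit_len = (8 * len(data)) & 0xFFFFFFFFFFFFFFFF
    msg.append(0x80)
    while len(msg) % 64 != 56:
        msg.append(0)
    msg += struct.pack("<Q", bit_len)  # message length in bits, little-endian

    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476]
    r2_order = [0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15]
    r3_order = [0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15]

    for off in range(0, len(msg), 64):
        x = struct.unpack("<16I", bytes(msg[off:off + 64]))
        a, b, c, d = h
        # Round 1: F = (b & c) | (~b & d); registers rotate after each step
        for i in range(16):
            a = _rotl(a + ((b & c) | (~b & d)) + x[i], (3, 7, 11, 19)[i % 4])
            a, b, c, d = d, a, b, c
        # Round 2: G = majority(b, c, d), constant 0x5A827999
        for i in range(16):
            g = (b & c) | (b & d) | (c & d)
            a = _rotl(a + g + x[r2_order[i]] + 0x5A827999, (3, 5, 9, 13)[i % 4])
            a, b, c, d = d, a, b, c
        # Round 3: H = b ^ c ^ d, constant 0x6ED9EBA1
        for i in range(16):
            a = _rotl(a + (b ^ c ^ d) + x[r3_order[i]] + 0x6ED9EBA1, (3, 9, 11, 15)[i % 4])
            a, b, c, d = d, a, b, c
        h = [(v + w) & _MASK for v, w in zip(h, (a, b, c, d))]

    return struct.pack("<4I", *h)


def ed2k_hash(stream: BinaryIO, chunk_size: int = ED2K_CHUNK_SIZE) -> str:
    """Hypothetical helper: ed2k hash of a binary stream, as 32 hex digits."""
    digests = []
    while True:
        chunk = stream.read(chunk_size)  # regular files return full chunks until EOF
        if not chunk:
            break
        digests.append(md4(chunk))
    if len(digests) <= 1:
        # Single-chunk (or empty) file: the ed2k hash is just the MD4 of the data.
        return (digests[0] if digests else md4(b"")).hex()
    # Multi-chunk file: MD4 over the concatenated 16-byte chunk digests.
    return md4(b"".join(digests)).hex()


if __name__ == "__main__":
    print(ed2k_hash(io.BytesIO(b"abc")))  # a448017aaf21d8525fc10ae87aa6729d
```

Hashing full 9500 KiB chunks in pure Python is slow; a real tool would use a compiled MD4, but the two-level structure of the computation stays the same.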
== Why does anidb require ed2k-hashes? ==
* The combination of ed2k-hash and file size makes ed2k effective for uniquely identifying files.
* Since an ed2k hash can be passed around inside a URI that carries both the hash and the file size in a widely recognized format, it is a convenient way to add or check files.
* Other hashes are good for checking whether a file is corrupt when you already know which file you are comparing it against, but they cannot necessarily identify a file globally in the system the way ed2k can.
* AniDB was designed around ed2k. Although other hashes have been added to the file records for validation, the internal structure is based on ed2k. If the site ever goes through a complete redesign, another hash might become the primary hash, but at this point that is unlikely.
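The points above about identifying files by hash plus size, and passing both around in a URI, can be illustrated with a small sketch. It assumes the widely used eD2k file link layout `ed2k://|file|<name>|<size>|<hash>|/` and percent-encodes the name; individual clients (and AniDB's own output) may differ in details such as hash case or name encoding. The names `Ed2kLink`, `build_ed2k_link` and `parse_ed2k_link` are hypothetical.

```python
from typing import NamedTuple, Tuple
from urllib.parse import quote, unquote

_HEX = "0123456789abcdefABCDEF"


class Ed2kLink(NamedTuple):
    name: str
    size: int   # file size in bytes
    ed2k: str   # 32 hex digits, stored lowercase

    def key(self) -> Tuple[str, int]:
        """(hash, size) pair that identifies the file; the name is ignored."""
        return (self.ed2k.lower(), self.size)


def build_ed2k_link(name: str, size: int, ed2k: str) -> str:
    # Assumed layout: ed2k://|file|<name>|<size in bytes>|<ed2k hash>|/
    # quote() also encodes "|" so the name cannot break the field separators.
    return f"ed2k://|file|{quote(name)}|{size}|{ed2k.lower()}|/"


def parse_ed2k_link(link: str) -> Ed2kLink:
    prefix = "ed2k://|file|"
    if not link.lower().startswith(prefix):
        raise ValueError("not an ed2k file link")
    parts = link[len(prefix):].split("|")
    if len(parts) < 3:
        raise ValueError("missing name, size or hash")
    name, size, ed2k = parts[0], parts[1], parts[2]
    if len(ed2k) != 32 or any(ch not in _HEX for ch in ed2k):
        raise ValueError("ed2k hash must be 32 hex digits")
    return Ed2kLink(unquote(name), int(size), ed2k.lower())


if __name__ == "__main__":
    link = build_ed2k_link("Example [01].mkv", 12345, "31D6CFE0D16AE931B73C59D7E0C089C0")
    print(link)
    print(parse_ed2k_link(link).key())
```

Using `key()` as a lookup key reflects the idea that two files with the same ed2k hash and size are treated as the same file, whatever they are named.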