Ed2k-hash

Revision as of 00:51, 17 February 2012

What's an ed2k hash?

In short, a hash is a checksum. It is used to detect errors in data by checking the integrity of a message or file. There are various hash algorithms for differing purposes, some focused more on security and some on speed. AniDB uses both kinds: CRC is not secure at all, while MD5, SHA-1 and the MD4-based ed2k hash were designed as cryptographic hashes and are much harder to forge. None of them is fail-safe, but having more than one hash makes it far harder to counterfeit a file. You might be able to fool one of the hashes, but generally not two, let alone the four hashes stored for most files in AniDB.

The ed2k hash is based on the md4 algorithm, but rather than providing a single hash of the entire file, it breaks the file up into chunks of 9500 KiB (9728000 bytes) and produces a final hash based on the md4 sums of the chunks. While md4 is no longer considered secure from a cryptographic perspective, it is more than adequate for the purpose of uniquely identifying files. The ed2k hash is often listed as part of an ed2k link, which also includes the file size in bytes and the file name.
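
As an illustration of how the three parts of an ed2k link fit together, here is a minimal Python sketch. It assumes the commonly seen `ed2k://|file|name|size|hash|/` layout. The function name `ed2k_link` and the percent-encoding of the file name are assumptions made for this example, not something this page specifies.

```python
from urllib.parse import quote


def ed2k_link(name: str, size: int, ed2k_hash: str) -> str:
    """Build an ed2k file link from a name, a size in bytes and a hex hash.

    Assumption: the name is percent-encoded so that spaces and '|'
    characters cannot break the '|'-separated fields.
    """
    return f"ed2k://|file|{quote(name)}|{size}|{ed2k_hash.lower()}|/"


# Example with the "File of zeros" values listed on this page.
print(ed2k_link("zeros.bin", 9728000, "d7def262a127cd79096a108e7a9fc138"))
```

The size field matters as much as the hash: AniDB identifies a file by the pair of both values.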

Why does AniDB require ed2k-hashes?

The main reason is that it prevents duplicate database entries. AniDB will not allow you to add a file with the same ed2k-hash as an existing one.

The file size and ed2k-hash of a file are used together to identify it globally.

You are allowed to add files without ed2k-hashes to AniDB, but you should edit those files later and add the missing ed2k-hashes. Once you have added a certain number of files without ed2k-hashes, you may not add further files without ed2k-hashes until you have edited your old files.

Why use ed2k instead of another type of hash?

  • The combination of ed2k-hash and file size makes ed2k effective for uniquely identifying files.
  • Since ed2k-hashes can be passed back and forth within a URI that carries both the hash and the file size in a widely recognized format, they are a convenient way to add or check files.
  • Other hashes are good for checking whether a file is corrupt when you already know which file you are comparing it against, but they cannot necessarily identify a file globally in the system the way ed2k does.
  • AniDB was designed around ed2k. Other hashes have been added to the file records for validation, but the internal structure is based on ed2k. If the site goes through a complete redesign, another hash may become the primary hash, but at this point that is unlikely to change.

Which software can be used to generate them?

If you have the file(s) on your hard disk or on CD you can use all kinds of tools to generate the ed2k-hash and other additional info:

  • Avdump2 is the preferred tool as it can automatically creq files.
  • AniDB O'Matic by BennieB/PetriW generates ed2k-hashes, md5, sha-1 and crc32 in one go and also lists details such as codec, resolution and bitrates.
  • ed2k_hash (http://ed2k-tools.sourceforge.net/index.shtml) is a command-line tool (current Windows and OSX versions have a GUI as well) that is also available for Unix operating systems (Linux/BSD/etc).
  • Filehash (http://malich.content.no-ip.org/filehash/filehash-0.5.3.zip) is a little Java program written by Malich. For further info on it read here (old forum).
  • Hashcalc (http://www.slavasoft.com/hashcalc/) can create md5, sha-1, crc32, ed2k-hashes and various other hashes in one go.
  • RHash (http://rhash.anz.ru/) is a multiplatform open source console utility for computing md5/sha1/crc32 hashes, EDonkey and Magnet links for a directory tree.

How is an ed2k hash calculated exactly?

A file is hashed in 9728000 byte chunks using the md4 algorithm, which produces a 128 bit hash for each chunk. For files with only one chunk, the ed2k hash is the md4 of the file. For files with two or more chunks, the hash of each chunk is appended to those before it, and a further md4 of the concatenated hashes provides the ed2k hash of the file. Pseudo code is given below:

if filesize is less than [blue: or equal to] 9728000:
    return md4 of file
for chunk of size up to 9728000 in file:
    append md4 of chunk to hashlist
[red] if filesize is a multiple of 9728000:
[red]     append md4 of null to hashlist
return md4 of hashlist

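Note that in practice implementations treat the 9728000 byte boundary in two different ways, given as the red code and the blue code above; the unmarked (black) code is common to both. The blue method hashes a file of exactly 9728000 bytes as a single chunk and never appends anything extra. The red method appends the md4 of empty input to the hash list whenever the file size is an exact multiple of 9728000 bytes. This difference only affects a tiny number of files, but it is the one case where two 'valid' ed2k hashes might be produced from one file. See forum topic 1 on this issue as well.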
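
The pseudo code above can be sketched in Python as follows. This is an illustrative sketch, not the code of any client listed on this page. The names `md4`, `ed2k`, `method` and `chunk_size` are chosen for this example. The md4 routine is a plain implementation of RFC 1320, included only so the example does not depend on `hashlib.new("md4")`, which is not available in every Python build. The whole input is held in memory for brevity, so a real tool would read the file chunk by chunk.

```python
import struct

CHUNK_SIZE = 9728000  # ed2k chunk size in bytes (9500 KiB)
_R3_ORDER = (0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15)


def _rol(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits."""
    x &= 0xFFFFFFFF
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def md4(data: bytes) -> bytes:
    """Pure-Python MD4 (RFC 1320); slow, but has no dependencies."""
    bit_len = (len(data) * 8) & 0xFFFFFFFFFFFFFFFF
    msg = data + b"\x80" + b"\x00" * ((55 - len(data)) % 64)
    msg += struct.pack("<Q", bit_len)
    state = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476]
    for off in range(0, len(msg), 64):
        x = struct.unpack("<16I", msg[off:off + 64])
        a, b, c, d = state
        for i in range(48):
            if i < 16:    # round 1
                f, k, s, add = (b & c) | (~b & d), i, (3, 7, 11, 19)[i % 4], 0
            elif i < 32:  # round 2
                f = (b & c) | (b & d) | (c & d)
                k = (i % 4) * 4 + (i - 16) // 4
                s, add = (3, 5, 9, 13)[i % 4], 0x5A827999
            else:         # round 3
                f, k = b ^ c ^ d, _R3_ORDER[i - 32]
                s, add = (3, 9, 11, 15)[i % 4], 0x6ED9EBA1
            a = _rol(a + f + x[k] + add, s)
            a, b, c, d = d, a, b, c  # rotate register roles
        state = [(old + new) & 0xFFFFFFFF
                 for old, new in zip(state, (a, b, c, d))]
    return struct.pack("<4I", *state)


def ed2k(data: bytes, method: str = "red", chunk_size: int = CHUNK_SIZE) -> str:
    """Return the ed2k hash of data as hex, using the 'red' or 'blue' method."""
    if method not in ("red", "blue"):
        raise ValueError("method must be 'red' or 'blue'")
    size = len(data)
    # Single-chunk case: blue also covers a file of exactly one full chunk.
    if size < chunk_size or (method == "blue" and size == chunk_size):
        return md4(data).hex()
    hashlist = b"".join(md4(data[i:i + chunk_size])
                        for i in range(0, size, chunk_size))
    # Red appends the md4 of empty input when the size is an exact multiple.
    if method == "red" and size % chunk_size == 0:
        hashlist += md4(b"")
    return md4(hashlist).hex()
```

The `chunk_size` parameter exists only so the boundary behaviour can be tried on tiny inputs, for example `ed2k(b"abcdefgh", "red", 4)` versus `ed2k(b"abcdefgh", "blue", 4)`. With the real chunk size, results can be compared against the "File of zeros" rows in the list of affected files.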

List of which clients use which method

Avdump2 uses both methods. It will creq files from blue to red and store the blue hash in the file description (Alternative ed2k, see http://anidb.net/f7047). The other tools are listed below by the method they use.

Blue method:

  • edonkey2000 v0.5.0 to v1.4.3
  • mldonkey (2.5.30.17 tested)
  • shareaza (1.8 tested)
  • HashCalc (2.01 tested)
  • edonkey-tool-hash (0.4.0 tested)
  • fsum (2.51 tested)
  • ed2k_hash (0.4.0 tested)
  • hashgen (0.0.6 tested)

Red method:

  • edonkey2000 until v0.5.0
  • emule (0.46c tested)
  • AOM (0.5.5.239 tested)
  • webaom (v1.13 tested)
  • ed2k code by Stephane D'Alu (1.4 tested)
  • jacksum (1.7.0 tested)
  • rhash (1.1.9 tested)

List of affected files, by fileID

File | Size (bytes) | Blue method | Red method
File of zeros | 9728000 | d7def262a127cd79096a108e7a9fc138 | fc21d9af828f92a8df64beac3357425d
File of zeros | 19456000 | 194ee9e4fa79b2ee9f8829284c466051 | 114b21c63a74b6ca922291a11177dd5c
http://anidb.net/f7047 | 145920000 | 1c2b1a6b142955d84af5d3210d3ece6f | 4f79548623c6099896a489257163764e
http://anidb.net/f24359 | 136192000 | f869547f07275eda0067694540c9dc93 | df294338b38a29f81ad84f1f364b4504
http://anidb.net/f31383 | 136192000 | f869547f07275eda0067694540c9dc93 | df294338b38a29f81ad84f1f364b4504
http://anidb.net/f48530 | 175104000 | a641926fa474fafe3f2c12676ea66b8e | c110c2b684aaa391a980bde4e6ee9f1f
http://anidb.net/f51131 | 107008000 | 5911beead79e0fa9043baee70683ef56 | aa399ff3a0ab9f8eb939dbcd7b7d0ec3
http://anidb.net/f51330 | 175104000 | fa7fbadaed151b003032985eae5c3420 | 148b2cf54cb4d66f70939ec5224d7961
http://anidb.net/f55744 | 97280000 | 4c8a9540fe5aa2f4d9f8cac835a071a6 | 2fcd55bdeae2a92cc99d70763a64f048
http://anidb.net/f56411 | 165376000 | b7b2f5eaea94bf89f5e03a775d9d9478 | f498072c0849cee180e4a1a7d34a26d2
http://anidb.net/f57766 | 145920000 | 7a54eda5d89ed525974487aa94515701 | 85d995b678284e7db5d52df1375971a9
http://anidb.net/f73921 | 184832000 | f8fdbb017cd74ff66882a7c6f33fae24 | fc9210c307f99ed7339556d5f05f3d59
http://anidb.net/f78552 | 194560000 | 8f1fb4062cd1e8f578013e9b7719d05a | f93db3a2ed31e7f48fe35945e4c5c6e8
http://anidb.net/f80216 | 165376000 | d61b705c59199666e164a274e7f91bec | ddcec8fdcddd43276a2c173498345789
http://anidb.net/f92884 | 68096000 | c310997efade26c107e44036c1fd0dc3 | b83bdad42c5ae5204bea3a25959e2180
http://anidb.net/f123554 | 243200000 | 822fc0f338fe8e43d96b9a99fe9632ce | ee3557fe68ccd056302710a185f4445f
http://anidb.net/f126410 | 184832000 | e19581a5518a11fbd50c1f23f9c21b95 | 87ac7a62de204473d3f5448214f4207c
http://anidb.net/f130233 | 184832000 | 91ae6f6b3bb42e0792c63efb9f1aa81e | 8ad09e1b46b695ebae0243b1856e801c
http://anidb.net/f142402 | 155648000 | 4538e1fded9d7661e1ad7d56e7406054 |
http://anidb.net/f165143 | 165376000 | 0ced631bb9010d3ccd331689a2fb02de | 6a092c056bc46e7a08d63408f918ba52
http://anidb.net/f166304 | 184832000 | aab2ce19d5b786af20d6e4a15f63552f | aa9930ccd300a2feac30b0e49830c321
http://anidb.net/f174421 | 223744000 | cd87540a7b48e87e78e7c714fbf7581e | a7c9a857b6d584bae0568495b92ea609
http://anidb.net/f200233 | 58368000 | cfff67163f6ca9bbed26211e140b10e1 | 0943c164b8a076c88551f0b2c1757436
http://anidb.net/f220069 | 389120000 | a182915f5cd114937f760246da92234b | 43324839e08afca1d83a890468f87f5e
http://anidb.net/f240925 | 126464000 | bae493f413037066ce5a597be4d97e8f | 725a209cf70715b3fdfb03348bb040c6
http://anidb.net/f243620 | 311296000 | 8d81479f7b5ba92c3094630899b5ec7a | 1cc497d5f73d9365e88dcdaa4207559e
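
Every size in this list is an exact multiple of 9728000 bytes, which is the only case where the two methods disagree. A small Python check along these lines could flag such files before hashing. The function name `is_boundary_case` is made up for this example.

```python
CHUNK_SIZE = 9728000  # ed2k chunk size in bytes


def is_boundary_case(size: int) -> bool:
    """True if the red and blue methods would give different ed2k hashes."""
    # An empty file is a single (empty) chunk under both methods.
    return size > 0 and size % CHUNK_SIZE == 0


# 145920000 bytes is 15 full chunks, so both hashes may be in circulation.
print(is_boundary_case(145920000), is_boundary_case(145920001))
```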

See also