Avdump3

From AniDB
Revision as of 21:01, 2 October 2020 by Worf (talk | contribs) (→‎Download)

AVDump is a tool that extracts meta information from media files while calculating multiple hashes at the same time. Based on that information, reports can be generated in multiple forms. Of particular interest is the ability to send those reports back to AniDB and thereby quickly fill in missing metadata for new files.

Quickstart

Since you’re here you probably just want to send AVMF Packages (or Dumps) to AniDB, which will add metadata to previously (or soon to be) added files. If that doesn’t mean anything to you, you probably want to start with https://wiki.anidb.net/Content:Files, https://wiki.anidb.net/Auto-creqing and https://wiki.anidb.net/Tutorial:How_to_Add_Files_for_Dummies

Anyway, let's keep this short:

  1. Log in to AniDB with your account and visit https://anidb.net/user/setting, then go to the Account tab and set a password for “UDP API Key”
  2. Make sure .NET Core 3.1 or higher is installed
  3. Download the latest version of AVD3 with the link in the section below and extract it somewhere.
  4. Start a terminal and navigate to the AVDump3 directory
  5. Use the following arguments as the bare minimum
    Windows: AVDump3CL.exe --Auth=<YourUserName>:<YourUdpApiPassword> <APathToTheFiles>
    Linux: dotnet AVDump3CL.dll --Auth=<YourUserName>:<YourUdpApiPassword> <APathToTheFiles>
  6. Optional: See sections below to improve usage experience
    1. For example, to avoid dumping the same files over and over again, add the following parameter: --DoneLogPath=done.txt
  7. Optional: Strongly consider adding --UploadErrors to the arguments. For more information see https://wiki.anidb.net/Avdump3#UploadErrors_Argument
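
The steps above can also be scripted. The following is a minimal sketch, assuming you want to assemble the command line from Python; the helper name build_avdump_command and the placeholder credentials are hypothetical, and only arguments mentioned on this page are used.

```python
import sys


def build_avdump_command(user, api_key, paths, done_log=None,
                         upload_errors=False, windows=None):
    """Return the argument list for one AVDump3CL run (hypothetical helper)."""
    if windows is None:
        windows = sys.platform.startswith("win")
    # Windows uses the .exe directly; on Linux the .dll is started through `dotnet`.
    command = ["AVDump3CL.exe"] if windows else ["dotnet", "AVDump3CL.dll"]
    command.append(f"--Auth={user}:{api_key}")
    if done_log:
        # Avoids dumping the same files over and over again.
        command.append(f"--DoneLogPath={done_log}")
    if upload_errors:
        command.append("--UploadErrors")
    command.extend(paths)
    return command


if __name__ == "__main__":
    # Placeholder values; the command is only printed, not executed.
    print(build_avdump_command("YourUserName", "YourUdpApiPassword",
                               ["/path/to/files"], done_log="done.txt",
                               upload_errors=True, windows=False))
```

The returned list can be handed to subprocess.run from within the AVDump3 directory. Keeping the arguments in a list avoids quoting problems with paths that contain spaces.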

Download

BETA DOWNLOAD (ZIP) (Build 8188) MD5 = 3720b40762d322abed66f478adc1cefe
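
To check a download against the MD5 value given above, something like the following sketch can be used. The file name AVDump3CL.zip is an assumption (use whatever name the archive was saved under); hashlib is part of the Python standard library.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the lowercase hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, published):
    """Compare a file against a published MD5 string, ignoring case and spaces."""
    return md5_of_file(path) == published.strip().lower()


if __name__ == "__main__":
    # Hypothetical file name; the MD5 value is the one listed for Build 8188.
    print(matches_published_md5("AVDump3CL.zip",
                                "3720b40762d322abed66f478adc1cefe"))
```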

What’s new compared to AVD2?

AVD3 is a complete rewrite of AVD2 which uses .NET Core instead of .NET Framework and treats Linux as a first class citizen. So it should run on Linux just as well as on Windows. To name a few other big differences:

  • Can process multiple files in parallel
  • Uses native code to speed up hashing significantly
  • More efficient reading
  • Can move/rename files based on scripts
  • Redone commandline arguments
  • More hash algorithms
  • Latest version of MediaInfoLib (MIL) is being used (currently 20.08)
  • Support for .vtt subtitle files
  • Support for 32bit has been dropped
  • Support for macOS is not yet available (to add support, https://github.com/DvdKhl/AVDump3/blob/master/AVDump3NativeLib/src/AVD3MirrorBuffer.c needs to be implemented; help would be appreciated)

Supported formats

Most major file formats are supported to some degree. While hashes are created for all file types, stream details remain shaky for some (notably 'swf'). Because of that, only the hashes get auto-creqed for some file types (underlined).

  • Video files: asf/wmv, avi, flv, m2ts, mk3d, mkv, mov, mp4, mpg/mpeg, ogm, ogv, qt, rm/rmvb, swf, ts, webm
  • Subtitle files: ass, idx, js, lrc, mks, pjs, rt, smi, srt, ssa, sub, sup, tmp, tts, txt, vtt, xss
  • Audio files: aac, ac3, dts, dtshd, flac, m4a, mka, mp3, ogg, ra, thd, wav, wma
  • Archive files: 7z, ace, rar, zip
  • Linker files: mkv, smil

UploadErrors Argument

When enabled, errors during processing and program crashes are uploaded to AniDB, which makes it far easier to discover and fix bugs.

Error reports sent look like the following:

<AVD3CLException thrownOn="2020-08-11 18:31:34.2837">
    <Information>
        <EntryAssemblyVersion>3.0.8163.0</EntryAssemblyVersion>
        <LibVersion>3.0.8163.0</LibVersion>
        <Session>395e5893-145d-4a19-8fc9-1ddc0f7feed6</Session>
        <Framework>3.1.7</Framework>
        <OSVersion>Microsoft Windows NT 6.2.9200.0</OSVersion>
        <IntPtr.Size>8</IntPtr.Size>
        <Is64BitOperatingSystem>true</Is64BitOperatingSystem>
        <Is64BitProcess>true</Is64BitProcess>
        <ProcessorCount>40</ProcessorCount>
        <UserInteractive>true</UserInteractive>
        <SystemPageSize>4096</SystemPageSize>
        <WorkingSet>239337472</WorkingSet>
    </Information>
    <Message>CreatingInfoProviders</Message>
    
        <FileName>Hidden(7E940D28B4A58A87C4C19B39EB6EA5E2B11CC7B13762DE44BB7338E752F6676722B227DBA942302F956AF1CE040C57D9295CF53AB33B5DE45FD4FAB6FB462E47)</FileName>
    
    <Cause>
        <InvalidOperationException>
            <Message>MediaInfoLib couldn't open the file</Message>
            <Stacktrace>
                <Frame>at AVDump3Lib.Information.InfoProvider.MediaInfoLibProvider..ctor(String filePath) in D:\Projects\C#\AVDump3\AVDump3Lib\Information\InfoProvider\MediaInfoLibProvider.cs:line 522</Frame>
                <Frame>at AVDump3Lib.Information.AVD3InformationModule.<>c.<.ctor>b__4_4(InfoProviderSetup setup) in D:\Projects\C#\AVDump3\AVDump3Lib\Information\AVD3InformationModule.cs:line 30</Frame>
                <Frame>at AVDump3Lib.Information.InfoProvider.InfoProviderFactory.Create(InfoProviderSetup setup) in D:\Projects\C#\AVDump3\AVDump3Lib\Information\InfoProvider\InfoProviderFactory.cs:line 40</Frame>
                <Frame>at AVDump3CL.AVD3CLModule.<>c__DisplayClass34_0.<CreateFileMetaInfo>b__0(IInfoProviderFactory x) in D:\Projects\C#\AVDump3\AVDump3CL\AVD3CLModule.cs:line 449</Frame>
                <Frame>at System.Linq.Enumerable.SelectIListIterator`2.ToArray()</Frame>
                <Frame>at System.Linq.Enumerable.ToArray[TSource](IEnumerable`1 source)</Frame>
                <Frame>at AVDump3CL.AVD3CLModule.CreateFileMetaInfo(String filePath, ImmutableArray`1 blockConsumers) in D:\Projects\C#\AVDump3\AVDump3CL\AVD3CLModule.cs:line 449</Frame>
            </Stacktrace>
            
        </InvalidOperationException>
    </Cause>
    <Stacktrace />
</AVD3CLException>

The value within Hidden() is a hash so it is not possible to get the original data back from that value. When --IncludePersonalData is added those values will be shown in clear text.

Attention: Please be aware that when --IncludePersonalData is active, additional data like the commandline arguments (except for passwords) and path information is sent as well!
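
To illustrate why a Hidden() value cannot be turned back into the original file name, here is a small sketch of one-way hiding. It is an assumption that SHA-512 over the UTF-8 bytes is used: the sample above has 128 hexadecimal digits, which is the length of a 512-bit digest, but the algorithm AVD3 actually uses (and whether it adds a salt) is not stated on this page. The function name hide is hypothetical.

```python
import hashlib


def hide(value):
    """Replace a sensitive string by a one-way digest (illustrative only)."""
    # Assumption: SHA-512 of the UTF-8 bytes, printed as uppercase hex.
    digest = hashlib.sha512(value.encode("utf-8")).hexdigest().upper()
    return f"Hidden({digest})"


if __name__ == "__main__":
    print(hide("/home/user/videos/episode01.mkv"))
```

The same input always gives the same output, so reports about the same file can still be grouped together without revealing the path.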

Arguments

For more detailed information please run AVD3 with --Help!

Special Arguments

  • FROMFILE

If the first argument is called FROMFILE and the next argument is a file path, every line of that file is interpreted as a single argument. A line which starts with // is interpreted as a comment and ignored. See https://wiki.anidb.net/Talk:Avdump3 for an example.

  • PRINTARGS

Prints the effective arguments used into the terminal.
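
The FROMFILE rules described above (one argument per line, lines starting with // are comments) can be sketched as follows. This is not AVD3's own parser; the function name read_argument_file is hypothetical, and skipping empty lines is an assumption.

```python
def read_argument_file(path):
    """Return the arguments stored in a FROMFILE-style text file."""
    arguments = []
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.rstrip("\r\n")
            if line.startswith("//"):
                continue  # comment line
            if not line:
                continue  # assumption: empty lines carry no argument
            # The whole line is one argument, so spaces need no quoting.
            arguments.append(line)
    return arguments
```

A file for this could look like:

```
// credentials
--Auth=<YourUserName>:<YourUdpApiPassword>
--DoneLogPath=done.txt
<APathToTheFiles>
```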

FileDiscovery

Parameter | Shorthand | Description | Usage | Default Value
--Recursive | -R | Recursively descend into subdirectories | --Recursive | False
--ProcessedLogPath | --PLPath | Appends the full filepath of each processed file to the specified file | --ProcessedLogPath=<FilePath1>[:<FilePath2>...] | (none)
--SkipLogPath | --SLPath | Filepaths contained in the specified file will not be processed | --SkipLogPath=<FilePath1>[:<FilePath2>...] | (none)
--DoneLogPath | --DLPath | Will set --SkipLogPath and --ProcessedLogPath to the specified filepath | --DoneLogPath=<Filepath> | (none)
--WithExtensions | --WExts | Only/Don't process files with the selected extensions | --WithExtensions=[-]<Extension1>[,<Extension2>...] | (none)
--Concurrent | --Conc | Sets the maximal number of files which will be processed concurrently. The first parameter (max) sets a global limit. (path,max) pairs set limits per path. | --Concurrent=<max>[:<path1>,<max1>;<path2>,<max2>...] | 1
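
The effect of --DoneLogPath (one file acting as both skip log and processed log) can be sketched like this. It only illustrates the described behaviour, it is not AVD3's implementation; the function names and the process callback are hypothetical.

```python
import os


def load_skip_set(log_path):
    """Read the full paths that were logged by earlier runs."""
    if not os.path.exists(log_path):
        return set()
    with open(log_path, encoding="utf-8") as handle:
        return {line.rstrip("\r\n") for line in handle if line.strip()}


def process_with_done_log(paths, log_path, process):
    """Process every path not yet in the log, then append it to the log."""
    skip = load_skip_set(log_path)
    handled = []
    with open(log_path, "a", encoding="utf-8") as log:
        for path in paths:
            full = os.path.abspath(path)
            if full in skip:
                continue  # already processed in an earlier run
            process(full)
            log.write(full + "\n")
            skip.add(full)
            handled.append(full)
    return handled
```

Running it twice over the same directory processes each file only once, which is the reason the Quickstart suggests --DoneLogPath=done.txt.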

Processing

Parameter | Shorthand | Description | Usage | Default Value
--ProducerMinReadLength | (none) | How much data in MiB the reader has to read each time at minimum | --ProducerMinReadLength | 1
--ProducerMaxReadLength | (none) | How much data in MiB the reader is allowed to read each time at most | --ProducerMaxReadLength | 8
--PrintAvailableSIMDs | (none) | Print available CPU SIMDs | --PrintAvailableSIMDs | False
--PauseBeforeExit | --PBExit | Pause console before exiting | --PauseBeforeExit | False
--BufferLength | --BLength | Circular buffer size for hashing | --BufferLength=<Size in MiB> | 64
--Consumers | --Cons | Select consumers to use. Use without arguments to list available consumers | --Consumers=<ConsumerName1>[,<ConsumerName2>...] | (none)

FileMove

Parameter | Shorthand | Description | Usage | Default Value
--Test | (none) | Test FileMove Settings | --FileMove.Test | False
--LogPath | (none) | A line is written for each file that has been moved/renamed. (OldPath => NewPath) | --FileMove.LogPath=<FilePath> | (none)
--Mode | (none) | Determines how the Pattern Argument is going to be interpreted. Inline: Script is directly entered as the argument. File: A path pointing to the script file. Placeholder: See example for --Pattern. CSharpScript: Script in C#. DotNetAssembly: .NET Core assembly to be loaded | --FileMove.Mode=<None|PlaceholderInline|PlaceholderFile|CSharpScriptInline|CSharpScriptFile|DotNetAssembly> | None
--Pattern | (none) | Available Placeholders ${Name}: FileSize, FullName, FileName, FileExtension, FileNameWithoutExtension, DirectoryName, SuggestedExtension, Hash-<Name>-<2|4|8|10|16|32|32Hex|32Z|36|62|64>-<OC|UC|LC> | --FileMove.Pattern=${DirectoryName}\${FileNameWithoutExtension}${SuggestedExtension} | ${DirectoryName}\${FileNameWithoutExtension}${FileExtension}
--DisableFileMove | (none) | Don't move the file even if the Pattern says so | --FileMove.DisableFileMove | False
--DisableFileRename | (none) | Don't rename the file even if the Pattern says so | --FileMove.DisableFileRename | False
--Replacements | (none) | Replace substrings in the returned filepath | --FileMove.Replacements=<Match1>=<Replacement1>[;<Match2>=<Replacement2>...] | (none)
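
How a placeholder pattern turns into a target path can be sketched as below. This is an illustration under assumptions, not AVD3's code: the function name expand_pattern is hypothetical, the detected extension has to be passed in by the caller, FileExtension is assumed to include the leading dot (as the default pattern suggests), and the Hash-... placeholders are left out.

```python
import re
from pathlib import PureWindowsPath


def expand_pattern(pattern, full_name, file_size, suggested_extension):
    """Fill ${Name} placeholders for one file (Windows-style paths)."""
    path = PureWindowsPath(full_name)
    values = {
        "FileSize": str(file_size),
        "FullName": str(path),
        "FileName": path.name,
        "FileExtension": path.suffix,             # assumed to include the dot
        "FileNameWithoutExtension": path.stem,
        "DirectoryName": str(path.parent),
        "SuggestedExtension": suggested_extension,  # supplied by the caller
    }
    # Placeholders this sketch does not know are left untouched.
    return re.sub(r"\$\{([^}]+)\}",
                  lambda match: values.get(match.group(1), match.group(0)),
                  pattern)


if __name__ == "__main__":
    usage_pattern = r"${DirectoryName}\${FileNameWithoutExtension}${SuggestedExtension}"
    print(expand_pattern(usage_pattern, r"D:\Anime\ep01.avi", 123, ".mkv"))
```

With the pattern from the Usage column, a file stays in its directory and keeps its name, and only the extension is swapped for the suggested one.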

Reporting

Parameter | Shorthand | Description | Usage | Default Value
--PrintHashes | (none) | Print calculated hashes in hexadecimal format to console | --PrintHashes | False
--PrintReports | (none) | Print generated reports to console | --PrintReports | False
--Reports | (none) | Select reports to use. Use without arguments to list available reports | --Reports | (none)
--ReportDirectory | --RDir | Reports will be saved to the specified directory | --ReportDirectory=<Directory> | <The directory AVD3 is invoked in>
--ReportFileName | (none) | Reports will be saved/appended to the specified filename. Placeholders mentioned in --FileMove.Pattern can be used as well. Additional placeholders: ReportName, ReportFileExtension | --ReportFileName=<FileName> | ${FileName}.${ReportName}.${ReportFileExtension}
--ExtensionDifferencePath | --EDPath | Logs the filepath if the detected extension does not match the actual extension | --EDPath=extdiff.txt | (none)
--CRC32Error | (none) | Searches the filename for the calculated CRC32 hash. If it is not present or different, a line with the calculated hash and the full path of the file is appended to the specified path. The regex pattern should contain the placeholder ${CRC32}, which is replaced by the calculated hash prior to matching. Consumer CRC32 will be force enabled! | --CRC32Error=<Filepath>,<RegexPattern> | (, (?i)${CRC32})
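
The check behind --CRC32Error can be sketched as follows: calculate the CRC32, put it into the regex in place of ${CRC32}, and search the file name. This is a sketch, not AVD3's implementation; the function names are hypothetical and Python's zlib and re modules stand in for whatever AVD3 uses internally.

```python
import re
import zlib


def crc32_hex(data):
    """Return the CRC32 of a byte string as 8 uppercase hex digits."""
    return f"{zlib.crc32(data) & 0xFFFFFFFF:08X}"


def filename_has_crc32(file_name, crc_hex, pattern="(?i)${CRC32}"):
    """True if the file name matches the pattern for the calculated hash."""
    # The placeholder is replaced before matching, as described for --CRC32Error.
    regex = pattern.replace("${CRC32}", crc_hex)
    return re.search(regex, file_name) is not None


if __name__ == "__main__":
    crc = crc32_hex(b"123456789")
    print(crc, filename_has_crc32("Show - 01 [cbf43926].mkv", crc))
```

The default pattern starts with (?i), so a lowercase hash in the file name still counts as a match. For real files the CRC32 would be accumulated chunk by chunk with zlib.crc32(chunk, previous_value) instead of loading the whole file into memory.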

Diagnostics

Parameter | Shorthand | Description | Usage | Default Value
--Version | (none) | Print the program version to console | --Version | False
--SaveErrors | (none) | Errors occurring during program execution will be saved to disk | --SaveErrors | False
--SkipEnvironmentElement | (none) | Skip the environment element in error files | --SkipEnvironmentElement | False
--IncludePersonalData | (none) | Various places may include personal data. Currently this only affects error files, which will then include the full filepath | --IncludePersonalData | False
--ErrorDirectory | (none) | If --SaveErrors is specified the error files will be placed in the specified path | --ErrorDirectory=<DirectoryPath> | <The directory AVD3 is invoked in>
--NullStreamTest | (none) | Use Memory as the DataSource for HashSpeed testing. Overrides any FileDiscovery Settings! | --NullStreamTest=<StreamCount>:<StreamLength in MiB>:<ParallelStreamCount> | (none)

Display

Parameter | Shorthand | Description | Usage | Default Value
--HideBuffers | (none) | Hides buffer bars | --HideBuffers | False
--HideFileProgress | (none) | Hides file progress | --HideFileProgress | False
--HideTotalProgress | (none) | Hides total progress | --HideTotalProgress | False
--ShowDisplayJitter | (none) | Displays the time taken to calculate progression stats and drawing to console | --ShowDisplayJitter | False
--ForwardConsoleCursorOnly | (none) | The cursor position of the console will not be explicitly set. This option will disable most progress output | --ForwardConsoleCursorOnly | False

AniDBAvmf

Parameter | Shorthand | Description | Usage | Default Value
--LocalPort | --LPort | Local UDP port used for ACReqing | --LPort=<localport> | (none)
--ACreqErrorPath | (none) | A line is added to the specified file for every error which occurred during an Avmf Package transmission | --ACreqErrorPath=<FilePath> | (none)
--Authentication | --Auth | Enables ACReqing when valid credentials are provided. Visit https://anidb.net/user/setting (Account tab) to set the API key | --Auth=<username>:<api_key> | (none)
--HostEndPoint | --Host | Change endpoint of AniDB UDP API server. AddressFamily: Can be 4=IPv4, 6=IPv6 or U=Unspecified | --Host=<hostname>:<hostport>[:<AddressFamily>] | api.anidb.info:9002:U
--Timeout | --TOut | Sets the timeout before resending the dump and the retry count | --TOut=<seconds>:<retries> | 20:3
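
The value formats of --Host and --TOut can be read as in the following sketch. The function names are hypothetical, treating a missing AddressFamily as Unspecified is an assumption, and IPv6 literals as hostname are not handled.

```python
import socket

# Mapping of the AddressFamily letters listed in the table above.
FAMILIES = {"4": socket.AF_INET, "6": socket.AF_INET6, "U": socket.AF_UNSPEC}


def parse_host_endpoint(value="api.anidb.info:9002:U"):
    """Split <hostname>:<hostport>[:<AddressFamily>] into its parts."""
    parts = value.split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected endpoint format: {value!r}")
    host, port = parts[0], int(parts[1])
    # Assumption: a missing AddressFamily means Unspecified.
    family = FAMILIES[parts[2].upper()] if len(parts) == 3 else socket.AF_UNSPEC
    return host, port, family


def parse_timeout(value="20:3"):
    """Split <seconds>:<retries> into two integers."""
    seconds, retries = value.split(":")
    return int(seconds), int(retries)
```

With the defaults from the table this gives the host api.anidb.info on port 9002 with an unspecified address family, and a 20 second timeout with 3 retries.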

AniDBMisc

Parameter | Shorthand | Description | Usage | Default Value
--UploadErrors | (none) | Enables the automatic upload of program errors. Please be aware that if --IncludePersonalData is enabled, personal data is uploaded as well! | --UploadErrors | False
--Ed2kLogPath | (none) | Appends the ED2K-Link after a file has been processed into specified file separated by a line feed character (\n). Consumer ED2K will be force enabled! | --Ed2KLogPath=<path> | (none)
--PrintEd2kLink | (none) | Prints the ED2K-Link after a file has been processed into the console. Consumer ED2K will be force enabled! | --PrintEd2kLink | False
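
For orientation, an ED2K-Link in the common ed2k://|file|<name>|<size>|<hash>|/ form can be put together and logged as sketched below. Whether AVD3 writes exactly this form (for example how special characters in the name are treated) is an assumption; the function names are hypothetical and the hash has to be supplied by the caller.

```python
def ed2k_link(file_name, file_size, ed2k_hash):
    """Build a link in the common ed2k file link form."""
    # Assumption: the name is inserted as-is, without any escaping.
    return f"ed2k://|file|{file_name}|{file_size}|{ed2k_hash}|/"


def append_links(log_path, links):
    """Append links to a text file, separated by a line feed character."""
    # newline="" keeps "\n" from being translated to "\r\n" on Windows.
    with open(log_path, "a", encoding="utf-8", newline="") as log:
        for link in links:
            log.write(link + "\n")
```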

Auto-creqing

[Figure: Simple diagram of the way auto-creqing with Avdump3 works]

Avdump3 provides metadata for the AniDB auto-creqing system. Some more or less important notes:

  • To be able to use this feature you’ll need an AniDB account and you have to define the UDP API Key in your profile.
  • All data sent to the server will be logged with IP and uid.
  • There is no direct connection between data sent to AniDB and creqs generated. The data received will just be stored for later processing.
  • There is no way to check the current status for a dump. Usually, it should take at least 24 hours from the moment you dump a file till the data actually changes. If any irregularities occur, or when there is too much data pending, it will take more time.
  • You may dump files currently not in the database. The data is still stored and will be used later if/after the file has been registered.
  • The creqs generated will report the user who sent the data first (for a specific file) as the creqer.
  • Files creqed by the new system will be locked, meaning some fields can no longer be changed. Notify a moderator if you are sure that some of the data locked for a specific file is wrong.

Why didn't this file get dumped?

  • The file in AniDB is registered with wrong size and/or ED2K hash.
  • The package never reached the server.
  • The decryption or decompression failed at server side (rare).
  • The dump is not valid XML (rare).

Why isn't this file verified?

Even if the file is dumped, that doesn't mean it will get auto-creqed (and verified). Possible reasons:

  • The file is corrupt/invalid.
  • The provided data is considered "incoherent". (Indicates a bug in Avdump.)
  • The dump was marked unfit for file verification by an AniDB Moderator.

Development

Source code for the Core project can be found on GitHub: https://github.com/DvdKhl/AVDump3

The source code for the AniDB Module is not available to the public.

Planned Features

Graphical User Interface

File moving/renaming with AniDB Http Api support

Changelog

0.1.8213.0 : 2020.09.27

  • Fixed ED2k hash calculation for files with a file size that is a multiple of 9728000 bytes (Issue #48)
    • Included alternative ED2k link in ed2k log
    • Added <ed2k_alt/> node to output xml
  • Don't crash when a file to be processed cannot be opened (Issue #53)
  • Switched from FileShare.Read to FileShare.ReadWrite for writing to files
  • Added TiB/s
  • Set UTF8 as Console output when UTF8OUT is added as an arg (console wide!)
  • Added PARSEARGS which will make AVD3 parse the provided arguments itself (windows only), fixing the double quote escape problem
  • Don't crash when the console cursor cannot be manipulated
  • Output final progress before termination
  • Fixed various small bugs

0.1.8188.0 : 2020.08.23

  • Fixed various small bugs

0.1.8187.0 : 2020.08.22

  • Fixed various small bugs

0.1.8185.0 : 2020.08.17

  • Updated MIL to version 20.08
  • Fixed various small bugs

0.1.8173.0 : 2020.08.15

  • Fixed ReportFileName throwing an exception when used
  • EffectiveCommandLineArguments now with sub elements

0.1.8167.0 : 2020.08.14

  • Fixed crash while displaying the progress
  • Switched to portable pdb (line numbers on Linux)
  • Implemented "UserValueType" for setting properties (Issue #45)
    • Passwords are now marked with PasswordType for UserValueType
    • Errors now report effective arguments when IncludePersonalData is enabled
      • Properties with type Password are hidden

0.1.8163.0 : 2020.08.11

  • Skip reporting PathTooLongException exception when generating a report (Issue #42)
  • Sanity check for ogg bitstreams (Issue #41)
  • Own class to write lines to text files instead of using File.AppendAllText (Issue #40)
  • Fixed Bar display exception (Issue #39)
  • Fixed Exception when generating AniDBReport (Issue #A23)
  • Fixed System.NullReferenceException when ED2K consumer was not enabled (Issue #A22)

0.1.8152.0 : 2020.08.10

  • Use comma instead of colon for --CRC32Error (Issue #37)
  • Added suggestion when Mirrored Buffer cannot be created (Issue #36)
  • Unified FileMove.Pattern and ReportFileName Placeholders (Issue #35)
  • When the terminal window is too small (<72) the output went haywire (Issue #34)
  • Added --PrintEd2kLink (Issue #33)
  • Added --Version (Issue #32)

0.1.8134.0 : 2020.08.09

  • semi-public pre-release