Auto-creqing

From AniDB
See [[Avdump#Autocreqing]].
{{TOCright}}


Autocreqing deserves its own write-up separate from Avdump...
==How it works==
[[Image:Autocreq.gif|thumb|Simple diagram of how ''the new system'' works]]
 
===Old system===
The "old" auto-creq system is part of the TCP API and is still in use. The API allows editing/creqing of many AniDB data types but has a special option for files and file tracks: auto-creq (which means ''this creq was automatically generated by the client''). An auto-creq does not necessarily have to be handled by a moderator; it is granted automatically if it was submitted by a trusted client '''and''' passes some basic sanity tests. (Automatic granting is, however, delayed by 24h so that creqs from "bad" clients can be stopped or reverted.)
 
This means that any TCP API client can automatically creq files as it sees fit, and there is no way to control this from the server side except by banning a client completely. This has never been a big issue, though, since there is only one TCP API client (AOM).
 
===New system===
The "new" system, which is used by Avdump, is not part of any API and is not documented anywhere. (The server-side part is, however, related to the UDP API server.) Instead of checking against AniDB data and issuing creqs directly when necessary, Avdump just sends the information (per file) to the server unconditionally. It is entirely up to the server to decide what to do with the raw data it receives. This new approach was implemented to shift control from the clients to the server.
 
The parts of the system are: Avdump, AVMF server, AVMF service, auto-granter and the {{AniDBLink|avmf|avmf page}}.
 
====Avdump====
Written in C++.
# Hash the file.
# Extract a/v metadata.
# Store metadata in the [[AVMF]] format, called a ''dump''.
# Compress, encrypt and send the data to the AVMF server.
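
The client-side steps above can be sketched roughly as follows. This is only an illustration in Python, not Avdump's C++ code: the hash algorithm (SHA-1), the XML element names and the use of zlib are all assumptions made for the example, and the encryption and sending steps are left out because the text does not describe them.

```python
import hashlib
import zlib
import xml.etree.ElementTree as ET


def build_dump(data: bytes, metadata: dict) -> bytes:
    """Hash the file contents and wrap hash plus metadata in an XML dump."""
    # Element names are hypothetical; they are not the real AVMF schema.
    root = ET.Element("dump")
    ET.SubElement(root, "hash").text = hashlib.sha1(data).hexdigest()
    ET.SubElement(root, "size").text = str(len(data))
    for key, value in sorted(metadata.items()):
        ET.SubElement(root, "meta", name=key).text = str(value)
    return ET.tostring(root, encoding="utf-8")


def pack_dump(dump: bytes) -> bytes:
    """Compress the dump; encryption is deliberately omitted in this sketch."""
    return zlib.compress(dump)


payload = pack_dump(
    build_dump(b"example file contents", {"duration": 1420, "video_codec": "h264"})
)
```

Note that the client does no comparison against AniDB data here; it only produces and ships the dump, which matches the division of labour described above.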
 
====AVMF server====
Runs in the same process as the UDP API Server, written in Java. It is supposed to be very light and simple to minimize possible issues (downtime).
# Accept/read new AVMF package.
# Decrypt and decompress the package. If this fails, the package is simply discarded.
# Validate the dump (XML validation).
#: Invalid dumps are stored only for Avdump debugging purposes.
# Map the dump to an AniDB file and store the dump in the database.
#: If no file is found, the dump is marked unknown. It will be mapped to a file when/if the file is added (by the site code). The AVMF service also does some maintenance in this regard.
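
A minimal sketch of this intake flow, written in Python for brevity rather than the server's Java. Everything concrete in it is an assumption: zlib stands in for the unspecified compression, decryption is omitted, a well-formedness check plus the presence of a hypothetical <code>hash</code> element stands in for the real XML validation, and the file lookup is a plain dictionary keyed by hash.

```python
import zlib
import xml.etree.ElementTree as ET
from typing import Optional


def handle_package(package: bytes, files_by_hash: dict) -> Optional[dict]:
    """Return a record for the stored dump, or None if the package is discarded."""
    try:
        raw = zlib.decompress(package)  # decryption omitted in this sketch
    except zlib.error:
        return None  # unreadable packages are simply discarded

    try:
        file_hash = ET.fromstring(raw).findtext("hash")
    except ET.ParseError:
        file_hash = None

    if file_hash is None:
        # Invalid dumps are kept only so Avdump problems can be debugged later.
        return {"dump": raw, "valid": False, "file_id": None}

    # file_id stays None ("unknown") until a matching file is added.
    return {"dump": raw, "valid": True, "file_id": files_by_hash.get(file_hash)}
```

The function never raises on bad input, which reflects the stated goal of keeping this part light so that it cannot take the UDP API server down.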
 
====AVMF service====
Runs every 5 min in the same process as the UDP API Server, written in Java.
# Query new AVMF dumps that have been mapped to files at least 10 minutes old (grace period).
## Check the metadata against the mapped AniDB file.
## Create an automatic change request if needed.
## Mark the dump as used.
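
One pass of the service could look roughly like the Python sketch below. The record layout (<code>state</code>, <code>file_id</code>, <code>metadata</code>, <code>added</code>) is invented for the example, and the sketch ignores the other automatic states (''candidate'', ''stalled'' and so on) that the real service has to consider.

```python
from datetime import datetime, timedelta

GRACE_PERIOD = timedelta(minutes=10)


def run_service(dumps: list, files: dict, now: datetime) -> list:
    """Process new dumps and return the change requests that were created."""
    creqs = []
    for dump in dumps:
        file = files.get(dump["file_id"])
        if dump["state"] != "new" or file is None:
            continue  # already handled, or not mapped to a file yet
        if now - file["added"] < GRACE_PERIOD:
            continue  # the file is still inside the grace period
        # Collect the fields where the dump disagrees with the stored file data.
        changes = {
            key: value
            for key, value in dump["metadata"].items()
            if file["metadata"].get(key) != value
        }
        if changes:
            creqs.append({"file_id": dump["file_id"], "changes": changes, "auto": True})
        dump["state"] = "used"
    return creqs
```

A dump is marked ''used'' whether or not a change request was needed; the request is only a side effect of finding a difference.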
 
====Autogranter====
Cron job written in Perl, actually a part of the ''old system''. The main reasons for a separate granter are to have a second sanity check and enough delay to notice (and halt) possible errors.
# Query automatic change requests older than 24h.
## Check that the request is sane and untouched (by owner/mod).
## Grant the request if OK.
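
The granting rule can be illustrated with the Python sketch below (the real job is a Perl cron script). The request fields and the <code>is_sane</code> placeholder are assumptions; the text does not say which sanity tests are actually run.

```python
from datetime import datetime, timedelta

GRANT_DELAY = timedelta(hours=24)


def is_sane(creq: dict) -> bool:
    """Placeholder sanity test; the real checks are not described in the text."""
    return bool(creq["changes"])


def grant_due_requests(creqs: list, now: datetime) -> list:
    """Grant automatic requests that are old enough, untouched and sane."""
    granted = []
    for creq in creqs:
        old_enough = now - creq["created"] > GRANT_DELAY
        if creq["auto"] and old_enough and not creq["touched"] and is_sane(creq):
            creq["status"] = "granted"
            granted.append(creq)
    return granted
```

Requests that an owner or moderator has touched are skipped rather than denied, so they remain available for manual handling.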
 
====Avmf page====
A page that lists all dumps, with tons of filters. The main uses of this page are (some are mod only):
* check whether a file has been dumped
* see who dumped it, when, how many have confirmed it, and with which version of Avdump
* check why a file has not been verified
* unlock a file
* statistics
* clean/moderate
 
===Conflicting data===
The main challenge, for both systems, lies in the fact that extracting metadata is not an exact science, or at least not trivial: different clients are bound to come up with different data, which means that the system has to handle conflicting data one way or another. This issue has so far been evaded by having only one trusted client (and version), currently Avdump (.31). AOM still issues auto-creqs, but they include only hash sums and duration, and only for files that are not already verified by Avdump. Data from Avdump overrides data from AOM if they differ (this has never happened for hash sums).
 
Avdump is supposed to be a shared library that all clients can use for auto-creqing, though. It is not yet clear how this should be done.
 
==AVMF states==
 
===Automatic states===
These states are reserved for the AVMF server/service. The only exception is the state ''new'': dumps are kept even when their files are deleted from the database, but the state of those dumps is reset to ''new''. This is done by the site code.
 
{| class="wikitable sortable"
|-
! State !! ID !! class="unsortable"|Description
|-
| new || 0 || The dump has not been handled by the AVMF service yet; it's either too new or not supported (file format). It's also possible that the dump has been ''reset'' by a moderator.
|-
| used || 11 || A file is considered ''verified'' if it has one ''used'' dump. There can only be one dump with this state for each file at once. In the process of changing the state of a dump to ''used'', automatic change requests will be generated if needed.
|-
| deprecated || 13 || The dump was in use at some point, but has been automatically replaced by another. Dumps of newer versions (produced by newer versions of Avdump) will automatically replace dumps of older versions ''unless this requires new creqs''. Dumps with this state might be purged regularly at some point, but that is not enabled ATM.
|-
| candidate || 23 || The dump is not used because the related file is already verified by another dump of an older version of Avdump. The service will never issue creqs for files that have already been verified. A moderator can however delete the ''used'' dump (or smite it) and then ''reset'' the candidate if needed.
|-
| unseen || 21 || The dump is not used because the related file is already verified by another dump of the same version of Avdump. It basically means that two copies of the same Avdump version have generated two different dumps (usually encoding issues due to locale). The ''most seen'' dump was supposed to be used in cases like these; however, ''first come, first served'' is what is currently implemented.
|-
| stalled || 33 || The dump is stalled due to a pending change request for the related file. Stalled dumps will be reset to ''new'' automatically after some time.
|-
| incoherent || 47 || The dump is not trusted, and thus not used, because the track bitrates do not add up to the file size. In other words, there is too much overhead. The threshold is 6% ATM.
|-
| exception || -11 || The AVMF server threw an exception when handling an incoming dump.
|-
| failed || 45 || The AVMF service failed to process the dump.
|-
| codec || 31 || Unknown codec. The AVMF service has to be updated.
|-
| failed-xml || 41 ||
|-
| failed-db || 43 ||
|}
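
To make the interplay of a few of these states concrete, here is a hedged Python sketch. The state IDs are taken from the table; everything else is an assumption for illustration: the overhead formula (file size minus the sum of bitrate times duration, as a fraction of file size), the order of the checks, the dictionary fields, and the treatment of a dump from an ''older'' Avdump version than the used one (the table does not cover that case, so the sketch files it under ''candidate'').

```python
from enum import IntEnum
from typing import Optional


class State(IntEnum):
    NEW = 0
    USED = 11
    DEPRECATED = 13
    UNSEEN = 21
    CANDIDATE = 23
    INCOHERENT = 47


OVERHEAD_THRESHOLD = 0.06  # "6% ATM" per the table


def overhead_ratio(file_size: int, tracks: list) -> float:
    """Share of the file not explained by the tracks (bitrate in bit/s, duration in s)."""
    payload = sum(t["bitrate"] * t["duration"] / 8 for t in tracks)
    return (file_size - payload) / file_size


def classify(dump: dict, used: Optional[dict], needs_creq: bool) -> State:
    """Pick a state for a new dump, given the file's current used dump (if any)."""
    if overhead_ratio(dump["file_size"], dump["tracks"]) > OVERHEAD_THRESHOLD:
        return State.INCOHERENT
    if used is None:
        return State.USED
    if dump["version"] == used["version"]:
        return State.UNSEEN  # first come, first served
    if dump["version"] > used["version"] and not needs_creq:
        used["state"] = State.DEPRECATED  # the newer dump replaces the older one
        return State.USED
    return State.CANDIDATE
```

For example, a 200,000,000-byte file with a 1,000,000 bit/s video track and a 128,000 bit/s audio track of 1400 s each has roughly 1.3% overhead under this formula and would pass, while the same tracks in a 220,000,000-byte file would exceed the threshold.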
 
===Manual states===
These states can only be set by moderators. They should be avoided as far as possible; keeping the dump ''used'' is preferred.
{| class="wikitable sortable"
|-
! State !! ID !! class="unsortable"|Description
|-
| reset || 83 || A moderator wants the dump to be re-handled by the AVMF service, usually because he/she wants to unlock the related file temporarily. Lifecycle: new -> used -> reset -> new -> used. It would be better to improve/change the interface (regarding locks) instead of using this state.
|-
| smitten ||  || The dump is considered invalid by a moderator. This state can be used by developers to pick up and handle issues with Avdump. It is better to use [[Avdump issues]] to keep track of these dumps, though.
|-
| forgotten ||  || When an Avdump issue has been resolved, the related ''smitten'' dumps should be deleted. Dumps related to issues that are never resolved should be marked ''forgotten''. (In other words, it is just a second state of ''smitten''.) This state is not really used.
|-
| deleted ||  || Not a state. Moderators are allowed to delete dumps completely if needed. Usually only done when cleaning up after old Avdump versions.
|}
 
[[Category:Avdump]]
[[Category:Development]]

Latest revision as of 12:21, 17 February 2011
