''Revision by Epoximator (minor edit: not finished)''
== General Workflow ==
Involved parties:
* client (avdump), runs locally on the user's machine
* main server (anidb2), keeps the file metadata, including fingerprints
* matching redirector (anidb2), a simple load balancer that redirects all requests to the corresponding matching server(s) (IDENT goes to all, STORE to one); TCP and UDP interface (UDP for external queries, TCP for cron job/batch queries)
* matching server(s) (anidb3/sig server), a standalone Java application that does all the fingerprint<->fingerprint matching; TCP interface
Workflow - Synchronous:
* user runs the client on local audio files
* client sends the file metadata, including the foosic audio fingerprint, to the main server via the UDP API
** optional intermediate step to speed up client processing: calculate the content hash first, and generate and submit the fingerprint only if the content hash is unknown to AniDB or AniDB has no fingerprint listed for that file
* main server does not process the fingerprint at all; it simply stores it in ostfiletb
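The optional content-hash-first step boils down to a small client-side decision. The sketch below assumes a hypothetical `lookup` callback standing in for the actual UDP API query (which is not specified here): it returns `None` when the content hash is unknown to AniDB, and otherwise whether a fingerprint is already listed for that file.

```python
from typing import Callable, Optional


def needs_fingerprint(content_hash: str,
                      lookup: Callable[[str], Optional[bool]]) -> bool:
    """Decide whether the client should generate and submit a foosic fingerprint.

    `lookup` is a hypothetical stand-in for the AniDB UDP API query: it returns
    None if the content hash is unknown, otherwise whether a fingerprint exists.
    """
    has_fingerprint = lookup(content_hash)
    if has_fingerprint is None:
        # Content hash unknown to AniDB: fingerprint and submit.
        return True
    # Known file: only fingerprint it if AniDB has no fingerprint for it yet.
    return not has_fingerprint


# Usage with an in-memory stand-in for AniDB (hypothetical data).
known = {"hash-with-fp": True, "hash-without-fp": False}
print(needs_fingerprint("hash-with-fp", known.get))     # False
print(needs_fingerprint("hash-without-fp", known.get))  # True
print(needs_fingerprint("unseen-hash", known.get))      # True
```

The expensive fingerprint generation is thus skipped for every file AniDB already has a fingerprint for.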
Workflow - Asynchronous:
* a cron job/daemon on the main server regularly checks for newly added foosic fingerprints and sends them, together with the ostfile id, to the matching redirector via TCP
** flooding is impossible due to the synchronous nature of the matching redirector's TCP API
* matching redirector forwards the IDENT query to all matching servers via TCP
** TCP connections to all matching servers should be kept alive between queries (to avoid the TCP handshake overhead)
* matching servers try to match the fingerprint against all potentially interesting fingerprints in their database
** pre-match filtering: via length, avg dom and avg fit
** post-match filtering: via a hard cut-off point for the matching value, which will definitely be >= 0.50; anything lower would be useless. The cut-off can be raised at any point in time without problems, but lowering it later would be a problem, since matches below the earlier cut-off were never returned to the main server. We might therefore want to start with 0.50 or 0.60 and increase it if we feel it is necessary to reduce the number of matches returned.
** potentially further internal optimizations, e.g. identifying representatives for groups of files that match each other well and only taking the representatives into account when matching
* each matching server replies with a list of matching ostfile ids, together with the magnitude of error per match and some general usage data (for load balancing)
** neither the queried fingerprints nor the matching results are stored on the matching servers by this query
** matching servers keep some internal usage statistics for potential future optimization
* matching redirector collects the replies from all matching servers and collates them into one reply, which is then returned to the main server's cron daemon
** the main server/cron daemon cannot see which match came from which matching server
* main server stores the matching data in the db
** (ofid1 int4, ofid2 int4, matching float)
* main server uses the matching data to support manual file<->song matching via the web interface
** the user will be able to select the cut-off point for the matching value on the fly in order to reduce false positives or increase recall
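The matching-server and redirector steps above can be sketched as a pre-filter, a full comparison with a hard cut-off, and a collation step. This is a minimal sketch under several assumptions: the `Fingerprint` fields stand in for the length, avg dom and avg fit statistics; the tolerances are hypothetical; and `compare` is a hypothetical stand-in for the real foosic comparison, assumed to return a matching value in [0, 1] where higher is better (how the per-match error maps to this value is not specified here). Rows mirror the `(ofid1, ofid2, matching)` db layout.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# One row of matching data, mirroring (ofid1 int4, ofid2 int4, matching float).
MatchRow = Tuple[int, int, float]


@dataclass(frozen=True)
class Fingerprint:
    ofid: int        # ostfile id
    length: float    # hypothetical: track length in seconds
    avg_dom: float   # hypothetical stand-in for the "avg dom" statistic
    avg_fit: float   # hypothetical stand-in for the "avg fit" statistic


# Hypothetical pre-match tolerances; the real values are not specified.
LENGTH_TOL = 5.0
DOM_TOL = 0.1
FIT_TOL = 0.1


def prefilter(query: Fingerprint,
              candidates: Iterable[Fingerprint]) -> List[Fingerprint]:
    """Cheap pre-match filter on length, avg dom and avg fit."""
    return [c for c in candidates
            if c.ofid != query.ofid  # assumption: never match a file with itself
            and abs(c.length - query.length) <= LENGTH_TOL
            and abs(c.avg_dom - query.avg_dom) <= DOM_TOL
            and abs(c.avg_fit - query.avg_fit) <= FIT_TOL]


def match_on_server(query: Fingerprint,
                    stored: Iterable[Fingerprint],
                    compare: Callable[[Fingerprint, Fingerprint], float],
                    cutoff: float = 0.50) -> List[MatchRow]:
    """Run the expensive comparison on the survivors, then apply the hard cut-off."""
    rows = []
    for cand in prefilter(query, stored):
        score = compare(query, cand)
        if score >= cutoff:  # post-match filtering
            rows.append((query.ofid, cand.ofid, score))
    return rows


def collate(replies: Iterable[List[MatchRow]]) -> List[MatchRow]:
    """Redirector side: merge all server replies; the origin server is dropped."""
    merged = [row for reply in replies for row in reply]
    merged.sort(key=lambda row: row[2], reverse=True)  # best matches first
    return merged


# Usage: two matching servers holding different fingerprints (hypothetical data).
q = Fingerprint(1, 200.0, 0.5, 0.5)
server_a = [Fingerprint(2, 201.0, 0.52, 0.5), Fingerprint(3, 260.0, 0.5, 0.5)]
server_b = [Fingerprint(4, 199.0, 0.5, 0.55)]
scores = {2: 0.9, 4: 0.7}
compare = lambda a, b: scores[b.ofid]  # hypothetical stand-in for foosic comparison
print(collate([match_on_server(q, server_a, compare),
               match_on_server(q, server_b, compare)]))
# [(1, 2, 0.9), (1, 4, 0.7)]
```

Sorting the collated list by matching value is a design choice of this sketch, not something the workflow requires; since the server origin is dropped, the cron daemon only ever sees plain rows.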
Workflow - New | |||
OLD: | |||
alternatively:
Every query should contain the following additional parameters:
* client={str client name}&clientver={int client version}
Access to the STORE command will be limited to the main server's cron job by this method.
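A request line is the command name followed by `key=value` pairs joined with `&`, with `client` and `clientver` appended to every query. The sketch below assumes that parameter order does not matter and that no escaping exists (none is specified), so it rejects values containing `&` or `=`; the helper name `build_command` and the client values are illustrative only.

```python
from typing import Mapping


def build_command(command: str, params: Mapping[str, object],
                  client: str, clientver: int) -> str:
    """Build one request line, appending the mandatory client/clientver parameters."""
    fields = dict(params)
    fields["client"] = client
    fields["clientver"] = clientver
    parts = []
    for key, value in fields.items():
        text = str(value)
        # The spec defines no escaping, so reject values that would break parsing.
        if "&" in text or "=" in text:
            raise ValueError(f"unescapable value for {key!r}: {text!r}")
        parts.append(f"{key}={text}")
    return f"{command} " + "&".join(parts)


print(build_command("MATCH", {"ofid": 42, "foosic": "0a1b2c"}, "avdump", 3))
# MATCH ofid=42&foosic=0a1b2c&client=avdump&clientver=3
```

The ascii hex fingerprint never contains `&` or `=`, so the check only guards the free-form values such as the client name.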
==== Querying a foosic fingerprint (don't add if unknown) ====
(used by external clients and the main server's cron job)
Client:
* MATCH ofid={int4 ostfile id}&foosic={str ascii hex representation of fingerprint}
Server Reply:
* Matches found
: 200 MATCHED
: {int result count}|{int compare count}|{int time taken in ms}
** this line is suppressed by the matching redirector, which uses it to decide where to store a new fingerprint (load balancing)
: ({int error}|{int ofid}\n)*
* No matches found
: 300 UNMATCHED
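A MATCH reply can be parsed by splitting on `|`. Because the statistics line is present when a matching server replies but suppressed by the matching redirector, this sketch accepts it as optional: a three-field line directly after the status is taken as statistics, and every two-field line is an `{error}|{ofid}` match. The function name and return shape are assumptions for illustration.

```python
from typing import List, Optional, Tuple

Stats = Tuple[int, int, int]  # result count, compare count, time taken in ms


def parse_match_reply(reply: str) -> Tuple[Optional[Stats], List[Tuple[int, int]]]:
    """Return (stats or None, [(error, ofid), ...]) for a MATCH reply."""
    lines = [ln.strip() for ln in reply.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty reply")
    status = lines[0]
    if status == "300 UNMATCHED":
        return None, []
    if status != "200 MATCHED":
        raise ValueError(f"unexpected status: {status!r}")
    stats: Optional[Stats] = None
    matches: List[Tuple[int, int]] = []
    for ln in lines[1:]:
        fields = ln.split("|")
        if len(fields) == 3 and stats is None and not matches:
            # Statistics line, only present when talking to a matching server directly.
            count, compared, ms = (int(f) for f in fields)
            stats = (count, compared, ms)
        elif len(fields) == 2:
            error, ofid = (int(f) for f in fields)
            matches.append((error, ofid))
        else:
            raise ValueError(f"malformed line: {ln!r}")
    return stats, matches


print(parse_match_reply("200 MATCHED\n2|150|12\n5|1001\n9|1002\n"))
# ((2, 150, 12), [(5, 1001), (9, 1002)])
```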
==== Submitting a new foosic fingerprint ====
(this is only used by the main server's cron job; access is restricted)
Client:
* STORE ofid={int4 ostfile id}&foosic={str ascii hex representation of fingerprint}
Server Reply:
* Fingerprint was not yet in DB
: 210 STORED
: {int2 ident of the matching server this fingerprint is stored on}
** this line is inserted by the matching redirector; the data is only of interest to the main server's cron job
* Fingerprint was already in DB
: 310 ALREADY STORED
: {int2 ident of the matching server this fingerprint is stored on}
** this line is inserted by the matching redirector; the data is only of interest to the main server's cron job
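As seen by the main server's cron job, both STORE replies carry the matching server ident on a second line. A minimal parser sketch (the function name and return shape are assumptions) returns whether the fingerprint was newly stored and where it lives:

```python
from typing import Tuple


def parse_store_reply(reply: str) -> Tuple[bool, int]:
    """Return (newly_stored, matching_server_ident) for a STORE reply."""
    lines = [ln.strip() for ln in reply.splitlines() if ln.strip()]
    if len(lines) != 2:
        raise ValueError(f"expected status and ident lines: {reply!r}")
    status, ident = lines
    if status == "210 STORED":
        newly_stored = True
    elif status == "310 ALREADY STORED":
        newly_stored = False
    else:
        raise ValueError(f"unexpected status: {status!r}")
    return newly_stored, int(ident)


print(parse_store_reply("210 STORED\n3\n"))          # (True, 3)
print(parse_store_reply("310 ALREADY STORED\n1"))    # (False, 1)
```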
==== Deleting a foosic fingerprint ====
(this is only used by the main server's cron job; access is restricted)
Client:
* DELETE ofid={int4 ostfile id}
Server Reply:
* Fingerprint was in DB
: 220 DELETED
* Fingerprint was not in DB
: 320 NOT FOUND
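The DELETE reply reduces to a boolean. In this sketch (function name assumed), anything other than the two documented status lines is treated as an error; a cron job could treat both `True` and `False` as "no longer stored", which would make repeated DELETEs harmless.

```python
def parse_delete_reply(reply: str) -> bool:
    """Return True if the fingerprint was deleted, False if it was not stored."""
    lines = reply.strip().splitlines()
    status = lines[0].strip() if lines else ""
    if status == "220 DELETED":
        return True
    if status == "320 NOT FOUND":
        return False
    raise ValueError(f"unexpected DELETE reply: {reply!r}")


print(parse_delete_reply("220 DELETED\n"))   # True
print(parse_delete_reply("320 NOT FOUND"))   # False
```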
==== Query the current server load/utilization ====
(this is only used by the matching redirector; access is restricted)
Client:
Server Reply:
: 299 LOAD STAT
: {int2 load factor}|{int4 number of fingerprints in db}|{int2 system load}
** load factor: a simple multiplicative constant used to distinguish between fast and slow server hardware. It can e.g. be used to store twice as many fingerprints on one server as on the others. The fingerprint count is converted according to the following formula before comparing servers and selecting the least used one.
*** relative fingerprint number/load = number of fingerprints in db * (load factor / 100)
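The redirector's choice of where to STORE can be sketched from the LOAD STAT reply and the formula above. Note what the formula implies: a lower load factor makes a server look less loaded, so a server meant to hold twice as many fingerprints would get a load factor of 50. This sketch assumes the system load field is ignored for selection (the text only mentions converting the fingerprint count) and that ties go to the first server listed; the names are illustrative.

```python
from typing import Dict, NamedTuple


class LoadStat(NamedTuple):
    load_factor: int
    fingerprints: int
    system_load: int


def parse_load_stat(reply: str) -> LoadStat:
    lines = [ln.strip() for ln in reply.splitlines() if ln.strip()]
    if len(lines) != 2 or lines[0] != "299 LOAD STAT":
        raise ValueError(f"unexpected LOAD STAT reply: {reply!r}")
    load_factor, count, system_load = (int(f) for f in lines[1].split("|"))
    return LoadStat(load_factor, count, system_load)


def relative_load(stat: LoadStat) -> float:
    # number of fingerprints in db * (load factor / 100)
    return stat.fingerprints * (stat.load_factor / 100)


def pick_store_server(stats: Dict[int, LoadStat]) -> int:
    """Return the ident of the matching server with the lowest relative load."""
    return min(stats, key=lambda ident: relative_load(stats[ident]))


stats = {1: parse_load_stat("299 LOAD STAT\n100|1000|5"),
         2: parse_load_stat("299 LOAD STAT\n50|1500|7")}
print(pick_store_server(stats))  # 2 (relative load 750.0 vs 1000.0)
```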