OstDB DEV Foosic

{{TOCright}}
Protocol for client<->server communication (via UDP and TCP) related to foosic audio fingerprint matching.

VERSION 2

Servers:
* dedicated Java matching server daemon (TCP) on anidb3 (sig); does not accept client requests directly
* dedicated Java redirector/load balancer daemon (TCP and UDP) on anidb2 (main/api); UDP for single queries, TCP for batch runs

Clients:
* cronjob on anidb2
* clients on user systems

== General Workflow ==
=== Workflow Draft ===
==== Involved Parties ====
* client (Avdump), runs locally on the user's machine
* main server (anidb2), keeps file metadata including fingerprints
* matching redirector (anidb2), a simple load balancer which redirects all requests to the corresponding matching server(s) (MATCH goes to all, STORE to one); TCP and UDP interface (UDP for external queries, TCP for cron job/batch queries)
* matching server(s) (anidb3/sig server), a standalone Java app which does all the fingerprint<->fingerprint matching; TCP interface

==== Workflow - Matching ====
===== Synchronous Part =====
* user runs client on his local audio files
** optional intermediate step to speed up client processing: calculate the content hash first, and generate and submit the fingerprint only if the content hash is unknown to AniDB or no fingerprint is listed on AniDB for that file
* main server does not do any processing on the fingerprint and simply stores it in ostfiletb

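The optional content-hash-first step could look roughly like this on the client side. This is only a sketch: <tt>lookup</tt> and <tt>make_fingerprint</tt> are hypothetical stand-ins for the AniDB query by size+content hash and for the foosic fingerprinter, and SHA-1 is a placeholder because this page does not say which content hash is used.

```python
import hashlib
import os
from typing import Callable, Optional


def content_hash(path: str, chunk_size: int = 1 << 20) -> str:
    """Placeholder content hash (SHA-1); the hash AniDB actually uses is not specified here."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def fingerprint_if_needed(
    path: str,
    lookup: Callable[[int, str], Optional[dict]],  # hypothetical AniDB query by size+content hash
    make_fingerprint: Callable[[str], str],        # hypothetical foosic fingerprinter, returns hex
) -> Optional[str]:
    """Return a fingerprint to submit, or None if AniDB already lists one for this file."""
    known = lookup(os.path.getsize(path), content_hash(path))
    if known is not None and known.get("has_fingerprint"):
        return None  # known file with a fingerprint: skip the expensive fingerprinting step
    return make_fingerprint(path)  # unknown file, or known file without a fingerprint
```

The speedup comes from the ordering: the fingerprinter only runs for files AniDB does not already have a fingerprint for.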
===== Asynchronous Part =====
* main server uses matching data to support manual file<->song matching via the web interface
** the user will be able to select the cut-off point for the matching value on the fly in order to reduce false positives or increase recall

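Because the cut-off point is chosen on the fly, the main server can keep all stored matches for a file and filter them per request. This is a minimal sketch that assumes matches are available as hypothetical <tt>(ofid, confidence)</tt> pairs, with the confidence given in percent:

```python
from typing import Iterable, List, Tuple

Match = Tuple[int, float]  # (ofid, match confidence in percent)


def apply_cutoff(matches: Iterable[Match], cutoff: float) -> List[Match]:
    """Keep matches at or above the user-selected cut-off point, best match first.

    Raising the cut-off reduces false positives; lowering it increases recall.
    """
    kept = [m for m in matches if m[1] >= cutoff]
    return sorted(kept, key=lambda m: m[1], reverse=True)


stored = [(11, 97.5), (12, 81.0), (13, 99.2)]
print(apply_cutoff(stored, 90.0))  # [(13, 99.2), (11, 97.5)]
```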
==== Workflow - Deletion ====
* the matching servers will delete all local data for the given ofid
* the results are collated by the matching redirector and returned to the main server's cron job

== Possible Extension ==
(for client features; UDP API)
* command to fetch audio metadata by ostfile id or size+content hash
* command to add audio file to MyList by ostfile id or size+content hash
* the same commands that are available via TCP (used by the anidb2 cronjob) also made available via UDP for use by other clients
** i.e. to allow lookups by foosic fingerprint. For that, a client would first contact the UDP API (anidb2) with the content hash. If the content hash is unknown to AniDB, it would send the fingerprint to the matching redirector (anidb2), which would delegate to the matching servers (anidb3), to get one or more ostfile id(s), and then use those to query song data from the UDP API.
*** this is not meant for Avdump, but it might be interesting for direct integration into player software, e.g. a Winamp/Amarok plugin; it would work somewhat like the already available MusicBrainz plugins.

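The lookup sequence described in the last sub-item might be sketched as follows. All callables are hypothetical placeholders for UDP API and matching redirector commands that do not exist yet. The fingerprint is only computed when the content hash is unknown.

```python
from typing import Callable, List, Optional


def lookup_song_data(
    size: int,
    content_hash: str,
    fingerprint: Callable[[], str],                    # computes the foosic fingerprint lazily
    api_by_hash: Callable[[int, str], Optional[int]],  # hypothetical UDP API query -> ofid or None
    redirector_match: Callable[[str], List[int]],      # hypothetical MATCH via redirector -> ofids
    api_song_data: Callable[[int], dict],              # hypothetical UDP API song data by ofid
) -> List[dict]:
    ofid = api_by_hash(size, content_hash)
    if ofid is not None:
        ofids = [ofid]  # file already known to AniDB: no fingerprint matching needed
    else:
        # unknown file: the redirector asks the matching servers for similar fingerprints
        ofids = redirector_match(fingerprint())
    return [api_song_data(o) for o in ofids]
```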
== General Considerations ==
** the number of fingerprints to consider for each matching lookup could be reduced further by only taking one representative fingerprint for each closely matching group of fingerprints into account.
*** e.g. if there are 20 files for one song and the fingerprints of 18 of those files match with a confidence of NN% (the actual confidence threshold will be hard to decide on; it might be something like 98%), then the median of that group (the file with the closest cumulated match to all other files of that group) could be picked as the representative, and the other 17 fingerprints could be skipped during matching lookups. If we always want to return all matches, the remaining 17 fingerprints could be matched in a second run once the match with the representative yields a value above the cut-off point.
*** depending on the number of encodes per song and the closeness of their match, such an optimization might well reduce the number of fingerprints to consider per lookup by a factor of 20 or more.
*** possible storage: an additional grouprep int4 field which stores the ofid of the group representative if an entry is part of a group. Group representatives and fingerprints not belonging to any group would have a value of 0. The initial matching lookup could simply restrict the SELECT to WHERE grouprep=0 (in addition to the len, dom and fit constraints).
* as further optimization may become necessary someday, we should collect some usage statistics per fingerprint in order to identify hotspots and areas of very low interest. Data to collect could be:
** the load-balancing matching redirector listens for foosic fingerprint lookups; each received lookup is sent to ''all'' matching servers
** each matching server replies with ostfile ids and match confidence or with "unknown"
** the matching redirector merges all ostfile id replies together (sorted by match confidence) and returns the reply to the client.
** if none of the matching servers indicated that it has the exact fingerprint in its local storage, the matching redirector tells a matching server with free resources to store the fingerprint.
*** the decision is made based on the observed fingerprint distribution of the matching servers. The MATCH reply from the matching servers lists the number of fingerprints which needed to be taken into account for the specific match. The server with the smallest number of fingerprints with similar length, avg. fit and avg. dom would be the best place to store the new fingerprint. Other factors could also be taken into account.

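The group representative optimization could be sketched like this. <tt>pick_representative</tt> chooses the file with the highest cumulated match value against the rest of its group. <tt>two_pass_lookup</tt> only matches group members once their representative clears the cut-off point. The sketch assumes a hypothetical <tt>similarity(a, b)</tt> function that returns the foosic match value in percent, and uses a plain dict in place of the proposed <tt>grouprep</tt> field (0 for representatives and ungrouped fingerprints). The len, dom and fit pre-filtering is left out.

```python
from typing import Callable, Dict, List, Tuple

Similarity = Callable[[str, str], float]  # hypothetical foosic match value in percent


def pick_representative(group: Dict[int, str], similarity: Similarity) -> int:
    """Return the ofid whose fingerprint has the highest summed match value with the rest of the group."""
    def total(ofid: int) -> float:
        return sum(similarity(group[ofid], fp) for other, fp in group.items() if other != ofid)
    return max(group, key=total)


def two_pass_lookup(
    query: str,
    fingerprints: Dict[int, str],  # ofid -> fingerprint
    grouprep: Dict[int, int],      # ofid -> ofid of group representative, 0 if none
    similarity: Similarity,
    cutoff: float,
) -> List[Tuple[int, float]]:
    results = []
    # First pass: only entries with grouprep == 0 (representatives and ungrouped fingerprints).
    for ofid, fp in fingerprints.items():
        if grouprep.get(ofid, 0) != 0:
            continue
        score = similarity(query, fp)
        if score < cutoff:
            continue
        results.append((ofid, score))
        # Second pass: the representative matched, so also match the members of its group.
        for member, rep in grouprep.items():
            if rep == ofid:
                member_score = similarity(query, fingerprints[member])
                if member_score >= cutoff:
                    results.append((member, member_score))
    return sorted(results, key=lambda r: r[1], reverse=True)
```

The trade-off shows up in the second pass. A group member that would have matched on its own is missed whenever its representative scores below the cut-off point.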

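The redirector's merge and store decision could be sketched as follows. The sketch assumes each matching server's MATCH reply has already been parsed into a hypothetical <tt>ServerReply</tt> record, so the reply format itself is not covered. The store decision uses only the criterion named above, the smallest number of comparable fingerprints. Other factors are left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ServerReply:
    server_id: int
    matches: List[Tuple[int, float]] = field(default_factory=list)  # (ofid, confidence)
    has_exact: bool = False  # server already stores this exact fingerprint
    considered: int = 0      # fingerprints with similar len/fit/dom taken into account


def merge_replies(replies: List[ServerReply]) -> List[Tuple[int, float]]:
    """Merge all servers' matches into one list sorted by match confidence, best first."""
    merged = [m for r in replies for m in r.matches]
    return sorted(merged, key=lambda m: m[1], reverse=True)


def pick_store_target(replies: List[ServerReply]) -> Optional[int]:
    """Return the server that should STORE the fingerprint, or None if one already has it."""
    if not replies or any(r.has_exact for r in replies):
        return None
    # Fewest comparable fingerprints = least crowded region of the fingerprint space.
    return min(replies, key=lambda r: r.considered).server_id
```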
== Protocol ==
=== Broken Clients ===
* we probably want to require a client string and client version in every query (similar to the UDP API) to be able to ban badly broken clients, should the need arise someday.

=== Protocol Draft ===
Every query should contain the following additional parameters:
* client={str client name}&clientver={int client version}

Access to the STORE command will be limited to the main server's cron job by this method, i.e. based on the client string.

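Appending and checking the mandatory client identification might be sketched as below. The ban list and its entries are hypothetical. The parser assumes every parameter is a <tt>key=value</tt> pair whose value contains no <tt>&</tt>, which holds for hex fingerprints and integer ids. The optional DELETE server ident, which has no key in this draft, is not handled.

```python
from typing import Dict, Tuple

# Hypothetical ban list of (client name, client version) pairs.
BANNED_CLIENTS = {("brokenclient", 3)}


def build_query(command: str, params: Dict[str, str], client: str, clientver: int) -> str:
    """Render a query line, appending the mandatory client and clientver parameters."""
    pairs = list(params.items()) + [("client", client), ("clientver", str(clientver))]
    return command + " " + "&".join(f"{k}={v}" for k, v in pairs)


def parse_query(line: str) -> Tuple[str, Dict[str, str]]:
    """Split a query line into its command and key=value parameters."""
    command, _, rest = line.partition(" ")
    params = dict(p.split("=", 1) for p in rest.split("&") if p)
    return command, params


def check_client(params: Dict[str, str]) -> bool:
    """Reject queries without client identification or from a banned client version."""
    client, ver = params.get("client"), params.get("clientver", "")
    if not client or not ver.isdigit():
        return False
    return (client, int(ver)) not in BANNED_CLIENTS
```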
==== Querying a foosic fingerprint (don't add if unknown) ====
Used by:
* external clients
* main server's cron job
* matching redirector (forwarded)

Client:
MATCH ofid={int4 ostfile id}&foosic={str ascii hex representation of fingerprint}[&store=1]
: The <tt>store=1</tt> parameter is filtered out and interpreted by the matching redirector; only the main server's cron job is allowed to set <tt>store=1</tt>.

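On the matching redirector, the <tt>store=1</tt> handling could look roughly like this, assuming the fingerprint is available as raw bytes before hex encoding. <tt>CRON_CLIENT</tt> is a hypothetical client string. For other clients this sketch silently drops the flag rather than returning an error.

```python
from typing import Dict, Tuple

CRON_CLIENT = "anidbcron"  # hypothetical client string of the main server's cron job


def fingerprint_to_hex(raw: bytes) -> str:
    """ASCII hex representation of a binary fingerprint, as carried in the foosic= parameter."""
    return raw.hex()


def filter_store_flag(params: Dict[str, str]) -> Tuple[Dict[str, str], bool]:
    """Strip store=1 before forwarding a MATCH; return (forwarded params, whether to STORE later)."""
    forwarded = {k: v for k, v in params.items() if k != "store"}
    wants_store = params.get("store") == "1"
    # Only the cron job may request storing; for other clients the flag is silently dropped.
    return forwarded, wants_store and params.get("client") == CRON_CLIENT
```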
Server Reply:
if the ofid being queried is stored on that specific matching server
:* the matching redirector will filter out 201 replies; a normal client will never see them.

* No matches found
if the ofid being queried is stored on that specific matching server
:* the matching redirector will filter out 201 replies; a normal client will never see them.

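The filtering of 201 replies could be sketched as follows. The sketch assumes each matching server reply starts with a status line of the form <tt>NNN TEXT</tt> (as with <tt>310 ALREADY STORED</tt>), followed by the reply body. The full layout of a 201 reply is not shown on this page, so the sketch only strips the status lines and records which servers answered with 201.

```python
from typing import Dict, List, Tuple


def split_replies(replies: Dict[int, str]) -> Tuple[Dict[int, str], List[int]]:
    """Drop each matching server's status line and note which servers answered with 201.

    The bodies can then be merged into one reply for the client, so the 201 code itself
    never reaches a normal client, while the redirector still knows which servers hold the ofid.
    """
    bodies: Dict[int, str] = {}
    holders: List[int] = []
    for server_id, reply in replies.items():
        status, _, body = reply.partition("\n")
        if int(status.split(" ", 1)[0]) == 201:
            holders.append(server_id)
        bodies[server_id] = body
    return bodies, holders
```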
==== Submitting a new foosic fingerprint ====
Used by:
* matching redirector, access is restricted

Client:
STORE ofid={int4 ostfile id}&foosic={str ascii hex representation of fingerprint}

Server Reply:
* Fingerprint was already in DB
: 310 ALREADY STORED

==== Deleting a foosic fingerprint ====
Used by:
* main server's cron job, access is restricted
* matching redirector (forwarded), access is restricted

Client:
DELETE ofid={int4 ostfile id}[&{int2 ident of matching server this fingerprint is stored on}]

Server Reply:
* Fingerprint was not in DB
: 320 NOT FOUND

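The DELETE fan-out and collation from the Workflow - Deletion section could be sketched as follows. <tt>send</tt> is a hypothetical transport callable. Only <tt>320 NOT FOUND</tt> is shown on this page, so any other reply is treated as coming from a server that held the fingerprint and is passed through unchanged. Using the optional server ident to target a single server is an assumption about its purpose.

```python
from typing import Callable, List, Optional

NOT_FOUND = "320 NOT FOUND"


def collate_delete(
    ofid: int,
    servers: List[int],
    send: Callable[[int, str], str],  # hypothetical transport: sends a line, returns the raw reply
    ident: Optional[int] = None,      # optional ident of the server holding the fingerprint
) -> str:
    """Forward DELETE to one or all matching servers and collate the replies into one."""
    targets = [ident] if ident is not None else servers
    # client/clientver parameters omitted for brevity
    replies = [send(server, f"DELETE ofid={ofid}") for server in targets]
    found = [r for r in replies if not r.startswith("320")]
    # NOT FOUND only if no queried server had data for this ofid.
    return found[0] if found else NOT_FOUND
```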
==== Query the current server load/utilization ====
Used by:
* matching redirector, access is restricted

Client:
LOADSTAT

Server Reply: