OstDB DEV Foosic

== General Workflow ==


=== Workflow Draft ===

==== Involved Parties ====
* client (avdump), runs locally on the user's machine
* main server (anidb2), keeps file meta data incl. fingerprints
* matching server(s) (anidb3/sig server), a standalone Java app that does all the fingerprint<->fingerprint matching; TCP interface


==== Workflow - Matching ====

===== Synchronous Part =====
* user runs the client on local audio files
* client sends file meta data, including the foosic audio fingerprint, to the main server via the UDP API
* main server does not do any processing on the fingerprint and simply stores it in ostfiletb
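The synchronous submission step above can be sketched in Java. The command name (<code>OSTFILE</code>), field names, host, and port below are illustrative assumptions, not the real UDP API wire format; the sketch only shows the shape of the data flow (file metadata plus a base64-encoded foosic fingerprint in one datagram).

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class FoosicSubmit {
    // Builds a hypothetical UDP API payload; the real command and field
    // names may differ -- this only illustrates what the client submits.
    static String buildPayload(String ed2kHash, long size, byte[] foosicFp) {
        String fp = Base64.getEncoder().encodeToString(foosicFp);
        return "OSTFILE size=" + size + "&ed2k=" + ed2kHash + "&foosic=" + fp;
    }

    public static void main(String[] args) throws Exception {
        byte[] fingerprint = {1, 2, 3, 4}; // stand-in for a real foosic fingerprint
        String payload = buildPayload("abc123", 4711L, fingerprint);
        System.out.println(payload);
        // Sending it would look like this (host/port are assumptions):
        // DatagramSocket s = new DatagramSocket();
        // byte[] data = payload.getBytes(StandardCharsets.UTF_8);
        // s.send(new DatagramPacket(data, data.length,
        //         InetAddress.getByName("api.example.net"), 9000));
    }
}
```

On the server side, the payload would simply be written to ostfiletb unprocessed, as described above.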


===== Asynchronous Part =====
* a cronjob/daemon on the main server regularly checks for newly added foosic fingerprints and sends them, together with the ostfile id, to the matching redirector via TCP with store set to 1
** flooding is impossible due to the synchronous nature of the TCP matching redirector api
* matching redirector forwards the IDENT query to all matching servers via TCP
** potentially further internal optimizations, e.g. by identifying group representatives for certain well-matching groups of files and only taking the representatives into account when matching
* each matching server replies with a list of matching ostfile ids together with the magnitude of error per match and some general usage data (for load balancing)
** neither fingerprints nor matching results are stored on the matching servers at this point
** matching servers keep some internal usage statistics for potential future optimization
** if a matching server already stores this exact ofid, it will add a note about this fact to its reply
* the matching redirector collects the replies from all matching servers and collates them into one reply for _later_ transmission to the main server
* the matching redirector then decides which matching server should store the new fingerprint, based on the usage statistics returned by each matching server, and sends that server a STORE command. That server's identifier is then included in the reply to the main server
* the matching redirector replies to the main server (matching results + ident of the storing matching server)
** for the main server/cron daemon it is not visible which _match_ came from which matching server
* main server stores the matching data in the db
** (ofid1 int4, ofid2 int4, matching float) as ostfilefoosictb
** (...,foosicfp bytea,foosicsrv int2,...) in ostfiletb
* main server uses the matching data to support manual file<->song matching via the webinterface
** the user will be able to select the cut-off point for the matching value on-the-fly in order to reduce false positives or increase recall
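The collation and STORE decision described above can be sketched as follows. The record fields and the least-loaded selection rule are assumptions for illustration; the actual usage statistics and wire format are not specified here. The key points from the workflow are preserved: matches from all servers are merged into one list (so the main server cannot tell which match came from which server), and a STORE target is only chosen if no server already holds the ofid.

```java
import java.util.*;

public class MatchRedirector {
    // One matching server's reply to an IDENT query. Field names are
    // illustrative; "load" stands in for the returned usage data.
    record Match(int ofid, double error) {}
    record ServerReply(String serverId, List<Match> matches, int load, boolean alreadyStored) {}

    // Collate all replies into a single match list sorted by error and
    // pick the server that should receive the STORE command.
    static Map.Entry<String, List<Match>> collate(List<ServerReply> replies) {
        List<Match> all = new ArrayList<>();
        for (ServerReply r : replies) all.addAll(r.matches());
        all.sort(Comparator.comparingDouble(Match::error));
        // If some server already stores this exact ofid, reuse it;
        // otherwise choose the least-loaded server (assumed heuristic).
        String storeTarget = replies.stream()
                .filter(ServerReply::alreadyStored)
                .map(ServerReply::serverId)
                .findFirst()
                .orElseGet(() -> replies.stream()
                        .min(Comparator.comparingInt(ServerReply::load))
                        .map(ServerReply::serverId)
                        .orElse(""));
        return Map.entry(storeTarget, all);
    }

    public static void main(String[] args) {
        List<ServerReply> replies = List.of(
                new ServerReply("srv1", List.of(new Match(10, 0.2)), 80, false),
                new ServerReply("srv2", List.of(new Match(11, 0.05)), 20, false));
        var result = collate(replies);
        System.out.println(result.getKey());                 // srv2 (least loaded)
        System.out.println(result.getValue().get(0).ofid()); // 11 (best match first)
    }
}
```

The merged list plus the chosen server identifier correspond to the single reply the redirector returns to the main server's cron daemon.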




==== Workflow - Deletion ====
(external clients have no deletion permission)
* an ost file is deleted on the main server
* it is appended to a special deletion queue table
* a cron job processes the table in regular intervals and sends DELETE requests for each deleted ofid to the matching redirector.
** if the storage location of the ofid's fingerprint is known to the main server, it will include that info in the DELETE request
* the matching redirector will either forward the DELETE request to the specified matching server or to all matching servers, if none was specified
* the matching servers will delete all local data for the given ofid
* the results are collated by the matching redirector and returned to the main server's cron job
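The deletion routing above boils down to a simple fan-out rule, sketched here with assumed server identifiers: if the main server included the storage location (foosicsrv) in the DELETE request, only that matching server is contacted, otherwise the request goes to every matching server.

```java
import java.util.List;

public class DeleteFanout {
    // Decide which matching servers receive the DELETE request for an ofid.
    // knownServer may be null if the main server does not know the
    // fingerprint's storage location.
    static List<String> targets(List<String> allServers, String knownServer) {
        if (knownServer != null && allServers.contains(knownServer)) {
            return List.of(knownServer); // storage location known: one server
        }
        return allServers; // unknown: broadcast to all matching servers
    }

    public static void main(String[] args) {
        List<String> servers = List.of("srv1", "srv2", "srv3");
        System.out.println(targets(servers, "srv2")); // [srv2]
        System.out.println(targets(servers, null));   // [srv1, srv2, srv3]
    }
}
```

Each contacted server then drops all local data for the ofid, and the redirector collates the results for the cron job as described above.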




=== OLD ===

alternatively: