OstDB DEV Foosic: Difference between revisions

From AniDB

Revision as of 08:39, 10 May 2007

Protocol for foosic client<->server communication via UDP and TCP

VERSION 1

Server: dedicated Java daemon on anidb3 (sig), listening on UDP (for single queries) and TCP (for batch runs)

Client: cronjob/daemon on anidb2


General Workflow

  • user runs avdump on local audio files
  • avdump sends file metadata, including the foosic audio fingerprint, to anidb2 via the UDP API
  • cronjob/daemon on anidb2 regularly checks for newly added foosic fingerprints and sends them to anidb3 (while making sure not to flood)
  • anidb3 tries to match the fingerprint.
  • if a match is found a list of matching ids is returned together with the magnitude of error per match.
  • if no match is found, anidb3 creates a new unique id for the fingerprint and returns the unique id.
  • anidb2 stores unique id(s) in db
  • anidb2 uses unique id(s) to support manual file<->song matching via the web interface


alternatively:

  • avdump sends data to anidb2 udpapi
  • anidb2 udpapi stores all data
  • anidb2 cronjob sends fingerprint to matching slaves
  • matching slaves return best matches: ofid,match value
  • anidb2 cronjob sends fingerprint,ofid to the server with the worst results (or none) for storage.
  • this slave is now responsible for this fingerprint
  • storage would be ofid,length,avg_fit,avg_dom,fp
  • an identifier for the slave should be stored in ostfiletb for later administration of slaves/fingerprints
  • anidb2 cronjob adds new ostfile relations based on the results

--Epoximator 08:39, 10 May 2007 (UTC)

Possible Extension

(for client features; UDP API)

  • command to fetch audio meta data by ostfile id, size+content hash or foosic id
  • command to add audio file to mylist by ostfile id or size+content hash (foosic server; UDP)
  • the same commands that are available via TCP (used by the anidb2 cronjob) are also available via UDP for use by other clients, e.g. to allow lookups by foosic fingerprint. For that, a client would first contact the UDP API (anidb2) with the content hash. If the content hash is unknown to anidb, the client would send the fingerprint to the foosic server (anidb3) to get one or more foosic id(s) and then use those to query song data from the UDP API. (This is not meant for avdump, but it might be interesting for direct integration into player software, e.g. a winamp/amarok plugin, which would work somewhat like the already available musicbrainz plugins.)


General Considerations for Future Expansion

  • it is very important to effectively limit the number of fingerprints which need to be taken into account for each lookup. As such, the file length and the average dom and fit should be stored in a way which allows easy and fast filtering via range queries on those 3 dimensions. That would probably mean a table along the lines of: length int4, dom int4, fit int4, fingerprint bytea
  • it may become necessary to purge rarely accessed fingerprints from the db every now and then to limit the db size. In order to do that we'll need to keep some counters and dates. I'd suggest: seencount int4, addeddate timestamp, lastseen timestamp. As the foosic server would require no authentication, the same user sending a fingerprint multiple times would increase the counter every time.
  • it may also become necessary to split the processing over multiple servers someday. this can be greatly simplified if the protocol is designed in a way which would allow the following setup.
    • a loadbalancing server listens for foosic fingerprint lookups; each received lookup is sent to _all_ foosic servers
    • each foosic server replies with ids or with "unknown"
    • loadbalancer merges all id replies together (sorted by error rate) and returns reply to client
    • if all foosic servers replied unknown, loadbalancer tells least used server to store the fingerprint and returns the generated id to the client
  • this would mean that each query is processed in parallel by all available servers. Because every server only searches its own share of the fingerprints, the approach should scale by adding servers.
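
The range filtering and the purge counters suggested in the first two points above could be sketched roughly as follows. This is only an illustration: it uses an in-memory SQLite database instead of the int4/bytea types mentioned above so that it runs on its own, and the table name, index name and tolerance values are assumptions that do not come from the draft.

```python
import sqlite3

# Hypothetical schema based on the column suggestions above. Table name,
# index name and the SQLite types are assumptions made for this sketch.
SCHEMA = """
CREATE TABLE fingerprint (
    id          INTEGER PRIMARY KEY,
    length      INTEGER NOT NULL,
    dom         INTEGER NOT NULL,
    fit         INTEGER NOT NULL,
    fingerprint BLOB    NOT NULL,
    seencount   INTEGER NOT NULL DEFAULT 1,
    addeddate   TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    lastseen    TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX fingerprint_range ON fingerprint (length, dom, fit);
"""


def candidates(conn, length, dom, fit, tol_length=2, tol_dom=5, tol_fit=5):
    """Return (id, fingerprint) rows inside the window on all 3 dimensions.

    The tolerances are made-up defaults; only these rows would then be
    passed on to the expensive fingerprint comparison.
    """
    return conn.execute(
        "SELECT id, fingerprint FROM fingerprint "
        "WHERE length BETWEEN ? AND ? "
        "AND dom BETWEEN ? AND ? "
        "AND fit BETWEEN ? AND ? "
        "ORDER BY id",
        (length - tol_length, length + tol_length,
         dom - tol_dom, dom + tol_dom,
         fit - tol_fit, fit + tol_fit),
    ).fetchall()


def mark_seen(conn, fp_id):
    """Bump the counters that a later purge job could look at."""
    conn.execute(
        "UPDATE fingerprint SET seencount = seencount + 1, "
        "lastseen = CURRENT_TIMESTAMP WHERE id = ?",
        (fp_id,),
    )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO fingerprint (length, dom, fit, fingerprint) VALUES (?, ?, ?, ?)",
    [(240, 100, 50, b"a"), (241, 103, 52, b"b"), (300, 100, 50, b"c")],
)
```

With the three sample rows, `candidates(conn, 240, 100, 50)` only returns the first two, because the third one is outside the length window.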


Protocol

Broken Clients

  • we might want to require a client string and client version in every query (similar to the UDP API) to be able to ban badly broken clients, should the need arise someday

Protocol Draft

Submitting a foosic fingerprint (add to db if unknown)

(maybe client=bla&clientver=NNN with each query)

Client:

  • SUBMIT foosic={ascii representation}

Server Reply:

  • Known fingerprint
    • 200 KNOWN\n{int error}|{int id}\n({int error}|{int id}\n)*
  • Unknown fingerprint
    • 210 STORED\n{int id}\n
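
A client could take the two SUBMIT replies apart roughly like this. The function name and the returned tuples are hypothetical; only the reply layout is taken from the draft above.

```python
def parse_submit_reply(reply):
    """Parse a SUBMIT reply of the draft protocol.

    Returns ("KNOWN", [(error, id), ...]) or ("STORED", id).
    Raises ValueError for anything that does not follow the draft.
    """
    lines = reply.split("\n")
    status = lines[0]
    body = [line for line in lines[1:] if line]  # drop the trailing empty line

    if status == "200 KNOWN":
        matches = []
        for line in body:
            error, sep, fp_id = line.partition("|")
            if not sep:
                raise ValueError("malformed match line: %r" % line)
            matches.append((int(error), int(fp_id)))
        if not matches:
            raise ValueError("200 KNOWN without any match line")
        return "KNOWN", matches

    if status == "210 STORED":
        if len(body) != 1:
            raise ValueError("210 STORED needs exactly one id line")
        return "STORED", int(body[0])

    raise ValueError("unexpected reply: %r" % status)
```

The match list is kept in the order in which the server sent it, since the draft does not say whether the server sorts by error.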

Querying a foosic fingerprint (don't add if unknown)

(this would be used by a loadbalancer)

Client:

  • IDENT foosic={ascii representation}

Server Reply:

  • Known fingerprint
    • 200 KNOWN\n{int error}|{int id}\n({int error}|{int id}\n)*
  • Unknown fingerprint
    • 320 UNKNOWN\n
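
The difference between SUBMIT and IDENT can be shown with a small in-memory stand-in for the server. This is a toy, not the Java daemon: the class name is made up, the optional client/clientver parameters are ignored, and the error metric (number of differing characters) is only a placeholder for the real foosic comparison, which the draft does not describe.

```python
class ToyFoosicServer:
    """In-memory stand-in that answers SUBMIT and IDENT as in the draft."""

    def __init__(self, max_error=2):
        self.max_error = max_error  # assumed threshold for "known"
        self.store = {}             # id -> fingerprint string
        self.next_id = 1

    @staticmethod
    def _error(a, b):
        # Placeholder metric (NOT the foosic comparison): count differing
        # characters; fingerprints of unequal length never match.
        if len(a) != len(b):
            return None
        return sum(x != y for x, y in zip(a, b))

    def _matches(self, fp):
        found = []
        for fp_id, known in self.store.items():
            err = self._error(fp, known)
            if err is not None and err <= self.max_error:
                found.append((err, fp_id))
        return sorted(found)  # best (lowest error) first

    def handle(self, request):
        command, _, arg = request.partition(" ")
        key, _, fp = arg.partition("=")
        if command not in ("SUBMIT", "IDENT") or key != "foosic" or not fp:
            raise ValueError("unsupported request: %r" % request)

        matches = self._matches(fp)
        if matches:
            return "200 KNOWN\n" + "".join(
                "%d|%d\n" % (err, fp_id) for err, fp_id in matches
            )
        if command == "SUBMIT":  # only SUBMIT adds unknown fingerprints
            fp_id = self.next_id
            self.next_id += 1
            self.store[fp_id] = fp
            return "210 STORED\n%d\n" % fp_id
        return "320 UNKNOWN\n"   # IDENT never modifies the store
```

Sending `IDENT foosic=abcd` to an empty server yields `320 UNKNOWN`, while `SUBMIT foosic=abcd` stores the fingerprint and yields `210 STORED` with the new id.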

Submit a foosic fingerprint, forcing storage

(used if there is an incorrect match with another fingerprint (false positive) or by a loadbalancer for fingerprints which are unknown to all servers)

Client:

  • STORE foosic={ascii representation}

Server Reply:

  • 210 STORED\n{int id}\n
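
Put together, a loadbalancer as described under the considerations for future expansion could use IDENT and STORE along these lines. The sketch makes several assumptions: the backends are plain Python objects instead of network connections, they are asked one after another instead of in parallel, "least used" is taken to mean "fewest stored fingerprints", and every id is tagged with the name of its server, because the draft does not say how ids are kept unique across servers.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    """Stand-in for one foosic server (hypothetical, no networking)."""
    name: str
    known: dict = field(default_factory=dict)  # fingerprint -> [(error, id)]
    next_id: int = 1

    def ident(self, fp):
        # Corresponds to IDENT: an empty list plays the role of 320 UNKNOWN.
        return list(self.known.get(fp, []))

    def store(self, fp):
        # Corresponds to STORE: always stores and returns the new id.
        fp_id = self.next_id
        self.next_id += 1
        self.known[fp] = [(0, fp_id)]
        return fp_id

    def load(self):
        # Assumed meaning of "least used": number of stored fingerprints.
        return len(self.known)


def lookup(backends, fp):
    """Ask all backends, merge the matches, or store on the least used one."""
    merged = []
    for backend in backends:  # sequential here; a real balancer would fan out
        for error, fp_id in backend.ident(fp):
            merged.append((error, backend.name, fp_id))
    if merged:
        return "KNOWN", sorted(merged)  # lowest error first
    target = min(backends, key=Backend.load)
    return "STORED", (target.name, target.store(fp))
```

A lookup that is unknown everywhere ends up on the backend with the fewest fingerprints, and a second lookup of the same fingerprint is then answered as known by that backend.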