OstDB DEV: Difference between revisions

{{TOCright}}
==General==
This is the place to contribute ideas on a possible future addition of anime OST data to AniDB.


For other areas of active development on AniDB, check: [[Development]]


Directly related: [[Generic PersonCompany DEV]], [[OstDB DEV Foosic]]


==Vision==
The general idea is that AniDB clients would be extended with audio file support and would automatically provide AniDB with lots of raw data on the audio files collected by its user base. For audio files that are still unknown, interested users (aka work monkeys) would use either a client or the web interface to specify the song (or add it, if it is not yet listed on AniDB).
Known audio files could automatically be added to the user's my(ost)list, be renamed, or have their ID3/Comment data updated.


==Data==
What are the things we should be able to store/provide?


... list all entities and their attributes here ...


===Artist===
<tt>copied from [[Generic PersonCompany DEV#Artist|PersonDB - Artist]]</tt>


* relid - entity id
* typeid - type of entity
** 2 - person
* ...


===Artist Group (Band)===
Data stored:
* founded on
...


===Collection (Album)===
Data stored:
* number of tracks
...


===Song===
Representation of a specific song. Each song may be included in an arbitrary number of collections (at least one). A song can also be added directly as OP/ED for an anime (also stored: the first episode the song was used in). A song is related to files: each file can contain one or more songs (e.g. a full CD image), and each song can have many files.


Each song automatically has one special "generic file", which can be used to add the song to my(ost)list without specifying any concrete file. The files known to AniDB for each song are not shown on the web interface. They can only be added to MyList by using an AniDB client.


Data stored:
* ...


===Audio File===
Representation of an actual physical file which has been encountered by an AniDB client at least once (it is not possible to add files manually). Files are initially not linked to any songs, albums or artists. Once known to AniDB, a file can be manually linked to a song via the web interface or an AniDB client. Linking will be supported by the file metadata known to AniDB: e.g. if AniDB has collected ID3 tag data for song title, artist, track number, album, ..., the available data will be used to suggest some likely matches for each file. It will be up to the user to verify the correctness of the suggested matches.
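
The suggestion step described above could be sketched roughly as follows. This is only an illustration, not a proposed implementation: the catalogue contents, the field weights, the threshold and the function names (<tt>similarity</tt>, <tt>suggest_songs</tt>) are all assumptions made for the example, and plain string similarity from Python's standard <tt>difflib</tt> stands in for whatever matching AniDB would actually use.

```python
from difflib import SequenceMatcher

# Hypothetical song catalogue: song id -> (title, artist, album).
KNOWN_SONGS = {
    1: ("Example Opening", "Example Artist", "Example OST 1"),
    2: ("Example Ending", "Example Artist", "Example OST 1"),
    3: ("Another Theme", "Other Band", "Other OST"),
}


def similarity(a, b):
    """Case-insensitive similarity in [0, 1]; 0.0 if either value is missing."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def suggest_songs(tags, songs=KNOWN_SONGS, limit=3, threshold=0.5):
    """Return up to `limit` (song_id, score) pairs for a file's tags, best first.

    `tags` is a dict with optional "title", "artist" and "album" keys, as they
    might be read from ID3 data. The weights below are arbitrary assumptions.
    """
    scored = []
    for song_id, (title, artist, album) in songs.items():
        score = (0.6 * similarity(tags.get("title"), title)
                 + 0.25 * similarity(tags.get("artist"), artist)
                 + 0.15 * similarity(tags.get("album"), album))
        if score >= threshold:
            scored.append((song_id, round(score, 3)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]
```

The result is only a ranked list of candidates; as stated above, a user would still have to confirm the correct match.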


Users can add files to their my(ost)lists. Songs can also be added to MyList directly by using generic files.


Data stored:
...
* foosic fingerprint
* foosic ids
* trmid (MusicBrainz TRM ID, currently unused)
* audio codec
* bitrate (in kBit/s)
...


==Implementation==
 
===General===
One key factor in allowing a certain degree of automation is the automatic identification of audio files. There are some services out there, like MusicBrainz, which do this but tend to list only the very well known OSTs. Reimplementing something like this for AniDB would clearly be infeasible. One possible approach would be to generate normal SHA-1 hashes over the raw audio data (still in compressed form but without any ID3 tags, comments, ...; basically this would mostly mean skipping the header for hash generation). This could be extended by storing additional TRM IDs from MusicBrainz, where available.
Content hashes would differ for the same song from encode to encode. However, matching of audio files to songs could probably be automated to a certain degree by using the ID3/Comment values found on the files in question.
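
A minimal sketch of such a content hash, assuming MP3 files with ID3 tags only. The function names <tt>strip_id3</tt> and <tt>content_sha1</tt> are made up for this example. It skips a leading ID3v2 tag (10-byte header with a synchsafe size field, plus an optional 10-byte footer) and a trailing 128-byte ID3v1 tag, then hashes what is left. Other tag formats (e.g. APE tags or Vorbis comments) would need their own handling and are not covered here.

```python
import hashlib


def strip_id3(data: bytes) -> bytes:
    """Return `data` without a leading ID3v2 tag and a trailing ID3v1 tag."""
    start, end = 0, len(data)
    # ID3v2 header: "ID3", version (2 bytes), flags (1 byte), size (4 bytes).
    # The size is "synchsafe": only the lower 7 bits of each byte are used.
    if len(data) >= 10 and data[:3] == b"ID3":
        size = 0
        for byte in data[6:10]:
            size = (size << 7) | (byte & 0x7F)
        start = 10 + size
        if data[5] & 0x10:  # footer flag set: a 10-byte footer follows the tag
            start += 10
    # ID3v1: fixed 128-byte block at the very end, starting with "TAG".
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128
    return data[start:end]


def content_sha1(data: bytes) -> str:
    """SHA-1 hex digest over the audio payload only (tags excluded)."""
    return hashlib.sha1(strip_id3(data)).hexdigest()
```

With this, two copies of the same encode that differ only in their tags produce the same hash, while a re-encode of the same song still produces a different one, as noted above.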
...
* (EXP) that's pretty similar to MusicBrainz, and being free definitely has some advantages. Unfortunately, it seems they're not offering any indexing server which would assign unique IDs (like MusicBrainz's TRM IDs) to songs based on acoustic fingerprints. That means we'd have to do that ourselves. Storing lots of ~500-byte fingerprints (which are not 100% equal for different files of the same song) and doing the loose matching would be quite demanding for the server. That could of course be handled by a separate server, but I wonder if it is really a good idea to duplicate existing services in such a way. On the other hand, one of the main problems with MusicBrainz is that they regularly purge old/rarely referenced songs from their database, which could give us real trouble for anime OSTs. If you try it, you'll notice that their coverage of anime OSTs is very poor.
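
To give a feeling for why the loose matching is demanding, here is a naive sketch. It assumes, purely for illustration, that fingerprints are fixed-length byte strings which can be compared bit by bit (Hamming distance); the real fingerprint format and comparison method may work quite differently. The names <tt>hamming_distance</tt> and <tt>find_song</tt> and the 5% tolerance are invented for the example.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length fingerprints."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have equal length")
    return bin(int.from_bytes(a, "big") ^ int.from_bytes(b, "big")).count("1")


def find_song(fingerprint, index, max_bit_error=0.05):
    """Return the song id of the closest stored fingerprint, or None.

    `index` is an iterable of (song_id, fingerprint) pairs. A stored
    fingerprint only counts as a match if at most `max_bit_error` of its
    bits differ from the query.
    """
    limit = max_bit_error * len(fingerprint) * 8
    best_id, best_dist = None, None
    for song_id, stored in index:
        dist = hamming_distance(fingerprint, stored)
        if dist <= limit and (best_dist is None or dist < best_dist):
            best_id, best_dist = song_id, dist
    return best_id
```

Every lookup compares the query against every stored fingerprint, so the cost grows linearly with the size of the database. Because the fingerprints are not exactly equal, a plain hash or index lookup cannot replace this scan without some additional indexing scheme.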


See also [[OstDB DEV Foosic]]
 


===Database===
====Approach 1====
Here is one possible way of realizing the database structure, not exactly 100% correct UML but you should get the idea. Classes are supposed to represent database entities. Lots of attributes are still missing. But I'd like some feedback on whether this general structure would be viable.


...
* make song<->audio file a many-to-many relation (for audio files which contain more than one song)
* maybe try to unify typical data about a person in a new person table which can then be referred to by seiyu, artist and producer tables. (would also remove the need for a special artist<->seiyu relation)
* multiplicities of released-by relation between audio file and audio group are wrong (switched) in diagram
* anime<->song relation with attributes (type: OP/ED, first-ep: eid)
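
Two of the points above (the many-to-many song<->audio file relation and the anime<->song relation with attributes) could look roughly like this as tables. This is only a sketch using SQLite through Python's standard <tt>sqlite3</tt> module. All table and column names are assumptions made for the example and do not reflect the actual AniDB schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE song (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE audio_file (id INTEGER PRIMARY KEY, content_hash TEXT UNIQUE);
-- many-to-many: one file (e.g. a CD image) can contain several songs,
-- and one song can be present in many files
CREATE TABLE song_audio_file (
    song_id INTEGER NOT NULL REFERENCES song(id),
    audio_file_id INTEGER NOT NULL REFERENCES audio_file(id),
    PRIMARY KEY (song_id, audio_file_id)
);
-- anime<->song relation with attributes (type: OP/ED, first-ep: eid)
CREATE TABLE anime_song (
    anime_id INTEGER NOT NULL,
    song_id INTEGER NOT NULL REFERENCES song(id),
    type TEXT NOT NULL CHECK (type IN ('OP', 'ED')),
    first_eid INTEGER,
    PRIMARY KEY (anime_id, song_id, type)
);
"""


def create_db():
    """Create an in-memory database with the sketched tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def songs_in_file(conn, audio_file_id):
    """Return the titles of all songs linked to the given audio file."""
    rows = conn.execute(
        "SELECT s.title FROM song s "
        "JOIN song_audio_file sf ON sf.song_id = s.id "
        "WHERE sf.audio_file_id = ? ORDER BY s.id",
        (audio_file_id,),
    )
    return [title for (title,) in rows]
```

The link table keeps both directions cheap to query: all songs in a file, or all files known for a song.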


 
Changes to diagram:
* rev3:
** split ArtistOrBand into ArtistGroup and Artist