OstDB DEV: Difference between revisions

Line 1: Line 1:
{{TOCright}}
=General=
this is the place to contribute ideas on a possible future addition of anime OST data to anidb.
This is the place to contribute ideas on a possible future addition of anime OST data to AniDB.


For other areas of active development on AniDB, check: [[Development]]
Line 8: Line 8:


=Vision=
The general idea would be that AniDB clients would be extended with audio file support and would automatically provide anidb with lots of raw data on audio files being collected by it's userbase. For so far unknown audio files interested users (aka work monkeys) would either use a client or the webinterface to specify the song (or add it, if it is not yet listed on anidb).
The general idea is that AniDB clients would be extended with audio file support and would automatically provide AniDB with raw data on the audio files collected by its user base. For audio files that are still unknown, interested users (aka work monkeys) would use either a client or the web interface to specify the song (or add it, if it is not yet listed on AniDB).
Known audio files could automatically be added to the user's my(ost)list, be renamed, or have their ID3/Comment data updated.


Line 28: Line 28:
* description - Some artist specific info
* [ratings&co] - not implemented in the ui
* ..
* ...


==Artist Group (Band)==
Line 43: Line 43:
* number of tracks
* release date
...


Line 49: Line 48:
Representation of a specific song. Each song may be included in an arbitrary number of collections (at least one). A song can also be added directly as the OP/ED of an anime (the first episode in which the song was used is stored as well). A song is related to files: each file can contain one or more songs (e.g. a full CD image), and each song can have many files.


Each song automatically has one special "generic file", which can be used to add the song to my(ost)list without specifying any concrete file. The files known to AniDB for each song are not shown on the webinterface. They can only be added to mylist by using an AniDB client.
Each song automatically has one special "generic file", which can be used to add the song to my(ost)list without specifying any concrete file. The files known to AniDB for each song are not shown on the web interface. They can only be added to mylist by using an AniDB client.


Data stored:
Line 63: Line 62:


==Audio File==
Representation of an actual physical file which was encountered by an AniDB client at least once (it is not possible to add files manually). Files are initially not linked to any songs, albums or artists. Once known to AniDB a file can be manually linked to a song via the webinterface or an AniDB client. Linking will be supported by the file meta data known to AniDB. I.e. if AniDB collected ID3 Tag data for song title, artist, tracknumber, album, ... the available data will be used to suggest some likely matchings for each file. It will be up to the user to verify the correctness of the suggested matchings.
Representation of an actual physical file which was encountered by an AniDB client at least once (it is not possible to add files manually). Files are initially not linked to any songs, albums or artists. Once known to AniDB, a file can be manually linked to a song via the web interface or an AniDB client. Linking will be supported by the file metadata known to AniDB. E.g. if AniDB collected ID3 tag data for song title, artist, track number, album, ..., the available data will be used to suggest some likely matches for each file. It will be up to the user to verify the correctness of the suggested matches.
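
The suggestion step described above could look roughly like the following. This is only a minimal sketch: the field names, the weights and the function names <code>similarity</code> and <code>suggest_songs</code> are assumptions made for illustration, not part of any existing AniDB code, and the example data in the usage note is invented.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]; empty values never match."""
    a, b = a.casefold().strip(), b.casefold().strip()
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a, b).ratio()


def suggest_songs(file_tags: dict, songs: list, limit: int = 3) -> list:
    """Rank known songs by how well they match the tag data of one file.

    Returns up to `limit` (score, song) pairs, best match first.
    The weights are arbitrary placeholders (assumption).
    """
    weights = {"title": 0.5, "artist": 0.3, "album": 0.2}
    scored = []
    for song in songs:
        score = sum(
            weight * similarity(file_tags.get(field, ""), song.get(field, ""))
            for field, weight in weights.items()
        )
        scored.append((score, song))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]
```

Calling <code>suggest_songs({"title": "opening theme", "artist": "example band"}, known_songs)</code> would return the best candidates for the user to confirm or reject; the final decision stays with the user, as described above.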


Users can add files to their my(ost)lists. Songs can also be added to mylist directly by using generic files.
Line 93: Line 92:


==General==
One key factor to allow for a certain degree of automation is the automatic identification of audio files. There are some services out there like music brainz which do this but tend to list only the very well known OSTs. Reimplementing something like this for anidb would be clearly inveasible. One possible approach would be to generate normal SHA1 hashes over the raw audio data (still in compressed form but without any ID3 Tags, Comments, ..., basically this would mostly mean skipping the header for hash generation). This could be extended by storing additional TRM IDs from music brainz, where available.
One key factor in allowing a certain degree of automation is the automatic identification of audio files. Some existing services such as MusicBrainz do this, but they tend to list only the very well-known OSTs. Reimplementing something like this for AniDB would clearly be infeasible. One possible approach would be to generate normal SHA-1 hashes over the raw audio data (still in compressed form, but without any ID3 tags, comments, ...; basically this would mostly mean skipping the header for hash generation). This could be extended by storing additional TRM IDs from MusicBrainz, where available.
Content hashes would differ for the same song from encode to encode. However, matching of audio files to songs could probably be automated to a certain degree by using ID3/Comment values found on the files in question.


Maybe a free acoustic fingerprinting algorithm could be used? http://www.foosic.org/libfooid.php
* (EXP) that's pretty similar to music brainz, being free definitely has some advantages. However, unfortunately it seems as if they're not offering any indexing server which would assing unique ids (like musicbrainz's TRM ids) to songs based on acoustic fingerprints. That means we'd have to do that ourself. Storing lots of ~500Byte fingerprints (which are not 100% equal for different files of the same song) and doing the loose matching would be quite demanding for the server. That could of course be handled by a separate server, but I wonder if it is really a good idea to doublicate existing services in such a way. On the other hand one of the main problems with music brainz is that they regularly purge old/rarely referenced songs from their database. Which could give us some real troubles for anime OSTs. If you try it, you'll notice that their coverage of anime OSTs is very poor.
* (EXP) that's pretty similar to MusicBrainz, and being free definitely has some advantages. Unfortunately, it seems they're not offering any indexing server which would assign unique IDs (like MusicBrainz's TRM IDs) to songs based on acoustic fingerprints. That means we'd have to do that ourselves. Storing lots of ~500-byte fingerprints (which are not 100% equal for different files of the same song) and doing the loose matching would be quite demanding for the server. That could of course be handled by a separate server, but I wonder if it is really a good idea to duplicate existing services in such a way. On the other hand, one of the main problems with MusicBrainz is that they regularly purge old/rarely referenced songs from their database, which could cause real trouble for anime OSTs. If you try it, you'll notice that their coverage of anime OSTs is very poor.
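
To make the cost of the loose matching mentioned above more concrete, here is a naive sketch. It does not use the real libFooID comparison: as a stand-in (assumption) it treats fingerprints as equal-length byte strings and compares them bit by bit. The names <code>bit_similarity</code> and <code>best_match</code> and the threshold are hypothetical.

```python
def bit_similarity(a: bytes, b: bytes) -> float:
    """Fraction of identical bits between two equal-length fingerprints."""
    if not a or len(a) != len(b):
        return 0.0
    differing = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return 1.0 - differing / (8 * len(a))


def best_match(query: bytes, known: dict, threshold: float = 0.9):
    """Return the id of the most similar stored fingerprint, or None.

    This is a linear scan: every lookup compares the query against all
    stored fingerprints, which is why it gets expensive on a large database.
    """
    best_id, best_score = None, threshold
    for song_id, fingerprint in known.items():
        score = bit_similarity(query, fingerprint)
        if score >= best_score:
            best_id, best_score = song_id, score
    return best_id
```

Because fingerprints of the same song are only approximately equal, an exact-match index (as used for hashes) does not help here; each lookup has to touch many stored fingerprints unless a smarter index is added.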


See also [[OstDB_DEV_Foosic]]
Line 111: Line 110:
* if an ArtistOrBand entry is a Band it needs a name [[User:Ace|Ace]] (EXP: true)
* maybe linking the MetaData and FileSys.Data to the submitting user? [[User:Ace|Ace]]
** (EXP) might come in handy in some cases, but on the other hand it'll increase the data size. but it's probably a good idea. It would make it easier to ensure that we're not counting the submittions of a user multiple times. and we could also display the user's personal metatags/filenames on his pages.
** (EXP) might come in handy in some cases, but on the other hand it'll increase the data size. Still, it's probably a good idea: it would make it easier to ensure that we're not counting the submissions of a user multiple times, and we could also display the user's personal metatags/filenames on their pages.
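
A minimal sketch of why storing the submitting user helps: with the user id available, each user can be counted once per submitted value, no matter how often a client resubmits it. The tuple layout and the function name <code>vote_counts</code> are assumptions for illustration.

```python
from collections import defaultdict


def vote_counts(submissions):
    """Count how many distinct users submitted each (file, field, value).

    `submissions` is an iterable of (user_id, file_id, field, value) tuples.
    Repeated submissions by the same user are counted only once.
    """
    voters = defaultdict(set)
    for user_id, file_id, field, value in submissions:
        voters[(file_id, field, value)].add(user_id)
    return {key: len(users) for key, users in voters.items()}
```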


Needs Feedback:
* maybe add a CollectionFile table for zip, rar, ... packed releases linked to AudioFile, Collection and Group? [[User:Ace|Ace]]
** (EXP) do we really need to support packed releases? most users would extract archives anyway. But even if we want to support them, we won't need another table for that. The song<->file relation is M:N, meaning that one file can already contain multiple songs. I wouldn't add archives though. If we really have to, the client software could extract archives on the fly and hash the audio which are inside. The one-file:multiple-songs case was rather meant for those lossless audio files which may contain an entire cd.
** (EXP) do we really need to support packed releases? Most users would extract archives anyway. But even if we want to support them, we won't need another table for that. The song<->file relation is M:N, meaning that one file can already contain multiple songs. I wouldn't add archives though. If we really have to, the client software could extract archives on the fly and hash the audio files inside. The one-file:multiple-songs case was rather meant for those lossless audio files which may contain an entire CD.


Don't Fix?:
Line 130: Line 129:
* list bands/groups and members? -> artist<->artist relation "member of" and a type flag for artists: band/person?
* make song<->audio file a many-to-many relation (for audio files which contain more than one song)
* maybe try to unify typical data about a person in a new person table which can then be refered to by seiyu, artist and producer tables. (would also remove the need for a special artist<->seiyu relation)
* maybe try to unify typical data about a person in a new person table which can then be referred to by seiyu, artist and producer tables. (would also remove the need for a special artist<->seiyu relation)
* multiplicities of released-by relation between audio file and audio group are wrong (switched) in diagram
* anime<->song relation with attributes (type: OP/ED, first-ep: eid)
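
Two of the relations listed above (song<->audio file as many-to-many, and anime<->song with type and first episode) could be sketched as follows. This is only an illustration using an in-memory SQLite database; all table and column names are assumptions and do not reflect the actual AniDB schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE song       (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE audio_file (id INTEGER PRIMARY KEY,
                         sha1 TEXT UNIQUE,               -- NULL for generic files
                         is_generic INTEGER NOT NULL DEFAULT 0);
-- M:N link: one file may contain several songs, one song has many files.
CREATE TABLE song_file  (song_id INTEGER NOT NULL REFERENCES song(id),
                         file_id INTEGER NOT NULL REFERENCES audio_file(id),
                         PRIMARY KEY (song_id, file_id));
-- anime<->song relation with attributes (type: OP/ED, first episode id).
CREATE TABLE anime_song (anime_id INTEGER NOT NULL,
                         song_id INTEGER NOT NULL REFERENCES song(id),
                         type TEXT NOT NULL CHECK (type IN ('OP', 'ED')),
                         first_eid INTEGER,
                         PRIMARY KEY (anime_id, song_id, type));
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def songs_in_file(conn: sqlite3.Connection, file_id: int) -> list:
    """Titles of all songs linked to one audio file, ordered by song id."""
    rows = conn.execute(
        "SELECT s.title FROM song s "
        "JOIN song_file sf ON sf.song_id = s.id "
        "WHERE sf.file_id = ? ORDER BY s.id",
        (file_id,),
    )
    return [title for (title,) in rows]
```

A full CD image would simply be one <code>audio_file</code> row with several <code>song_file</code> rows, so no extra table is needed for that case.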