Maintenance DEV: Difference between revisions

{{TOCright}}
= AniDB Stats =


A very labour-intensive task is the generation of all the statistics and counters for the AniDB database entries. Optimization of this process is therefore on the to-do list.


== Data ==
The main stats update script currently collects the following data.


=== Anime ===
* eps added for anime
* files added for anime
* total size of all files for this anime


=== Group ===
* animes subbed
* files released
* total size of all files by this group


=== Episode ===
* files added for this ep
* users collecting this ep


=== File ===
* users with this file in mylist (split according to mylist state: unknown/hdd/cd/deleted)


=== User ===
* animes in mylist
* eps in mylist




== Current Approach ==
A script runs three times a week to update the counters. It reads all relevant tables in chunks.
For example, to update the anime stats it gathers all episode, file and mylist information for the animes with aid 1-249, 250-499, 500-749, ... This limits memory usage during the calculation.
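
The chunked read described above can be sketched roughly as follows. This is a minimal illustration and not the actual script: the function names, the in-memory <code>file_rows</code> list (standing in for a range query on the file table) and the choice of counters are all assumptions made for the example.

```python
from collections import defaultdict


def aid_chunks(max_aid, chunk_size=250):
    """Yield inclusive (low, high) aid ranges: 1-249, 250-499, 500-749, ..."""
    for start in range(0, max_aid + 1, chunk_size):
        low = max(start, 1)  # aids start at 1, so the first range is 1-249
        high = min(start + chunk_size - 1, max_aid)
        if low <= high:
            yield low, high


def update_anime_stats(file_rows, max_aid):
    """Aggregate file count and total size per anime, one aid range at a time.

    file_rows is a hypothetical list of (aid, size) tuples. Filtering it per
    range stands in for a query restricted to that aid range.
    """
    stats = {}
    for low, high in aid_chunks(max_aid):
        # Only the rows of the current range are held in this accumulator.
        chunk = defaultdict(lambda: {"files": 0, "size": 0})
        for aid, size in file_rows:
            if low <= aid <= high:
                chunk[aid]["files"] += 1
                chunk[aid]["size"] += size
        stats.update(chunk)  # in a real script this would be the write-back
    return stats
```

The point of the structure is that memory use is bounded by the size of one aid range rather than by the size of the whole table.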
The key issue here is that these numbers rise all the time. In the early days we ran the script multiple times a day, then once a day, and now three times a week. If things continue as they are, we will reach a point where we can no longer run it at all in its current form.


== Possible Alternatives ==

Some general ideas, some of which could be combined.


=== On-The-Fly Updating / Triggers ===
We could run the statistics update script far less often if all important values were updated on the fly. As we would probably not handle every possible case, we might not be able to remove the script altogether, but we might be able to run it only once a week or once a month.


One way to implement this with acceptable effort would be a set of database triggers and corresponding PL/pgSQL functions that transparently update all relevant counters and stats.
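
To illustrate the trigger idea, here is a small self-contained sketch. It uses Python's <code>sqlite3</code> module and SQLite trigger syntax purely so that it can be run anywhere; it is not PL/pgSQL and the table and column names (<code>anime</code>, <code>file</code>, <code>file_count</code>, <code>total_size</code>) are assumptions for the example, not the real AniDB schema.

```python
import sqlite3

# Hypothetical, heavily simplified schema with two trigger-maintained counters.
SCHEMA = """
CREATE TABLE anime (
    aid        INTEGER PRIMARY KEY,
    file_count INTEGER NOT NULL DEFAULT 0,
    total_size INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE file (
    fid  INTEGER PRIMARY KEY,
    aid  INTEGER NOT NULL REFERENCES anime(aid),
    size INTEGER NOT NULL
);

CREATE TRIGGER file_added AFTER INSERT ON file
BEGIN
    UPDATE anime
       SET file_count = file_count + 1,
           total_size = total_size + NEW.size
     WHERE aid = NEW.aid;
END;

CREATE TRIGGER file_removed AFTER DELETE ON file
BEGIN
    UPDATE anime
       SET file_count = file_count - 1,
           total_size = total_size - OLD.size
     WHERE aid = OLD.aid;
END;
"""


def open_demo_db():
    """Create an in-memory database with the demo schema and triggers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def anime_counters(conn, aid):
    """Return (file_count, total_size) as maintained by the triggers."""
    row = conn.execute(
        "SELECT file_count, total_size FROM anime WHERE aid = ?", (aid,)
    ).fetchone()
    return row
```

The sketch only covers inserts and deletes. An update that changes a file's size or moves it to another anime would leave the counters stale, which is the kind of unhandled case a less frequent full recount would still have to correct.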


=== Read-Only database slave for stats work ===
Another approach would be to introduce a read-only database slave (e.g. with [http://slony.info/ Slony]) and to execute all read queries of the stats update scripts on this database. As the scripts only write to the database if a value has actually changed, this would greatly reduce the load on the main database.
The issue here is whether we'd get the hardware resources to do this and whether it scales.
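
The "only write if a value has actually changed" behaviour is what makes this split worthwhile, and it can be sketched as a simple comparison step. The function name and the use of plain dicts for the stored and freshly computed values are assumptions for illustration. In a real setup the reads would go to the slave and only the returned entries would be written to the main database.

```python
_MISSING = object()  # sentinel so that a stored value of None is still compared


def changed_stats(stored, computed):
    """Return only the entries whose computed value differs from the stored one.

    stored   -- hypothetical mapping of id to the value currently in the database
    computed -- hypothetical mapping of id to the freshly calculated value
    """
    return {
        key: value
        for key, value in computed.items()
        if stored.get(key, _MISSING) != value
    }
```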


=== Small Updates ===
The script could be run at shorter intervals and calculate only a part of the stats during each run. For example, we could run it 3 times a day and process 250 animes in each run...
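
One possible way to organise such partial runs is a cursor that remembers where the previous run stopped and wraps around after the highest aid. The sketch below is an assumption about how this could look, not an existing part of the script. The name <code>next_batch</code> and the idea of persisting the cursor between runs are hypothetical.

```python
def next_batch(cursor, max_aid, batch_size=250):
    """Return (low, high, new_cursor) for the next partial run.

    cursor is the first aid to process in this run. After the last aid has
    been processed, the cursor wraps back to 1 so the next run starts over.
    """
    low = cursor if 1 <= cursor <= max_aid else 1
    high = min(low + batch_size - 1, max_aid)
    new_cursor = high + 1 if high < max_aid else 1
    return low, high, new_cursor
```

With a highest aid of 600, three consecutive runs would cover 1-250, 251-500 and 501-600, and the fourth would start again at 1. Each entry is then refreshed once per full cycle instead of once per run.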


=== Dirty Flag ===
The current approach gathers data for all db entries, regardless of whether any of their stats values are likely to have changed. This is especially problematic for the user stats. With each stats update we collect the data for all users, even though only a small percentage of them have made any changes on AniDB. Many may not even have logged in since the last stats update.


* ... ?
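
A dirty flag could be sketched as follows, assuming that every change to a user's mylist marks that user and that the stats run only recalculates marked users. The class and function names, the in-memory set, and the <code>mylist</code> mapping are hypothetical. In the database this would more likely be a flag column or a small table of changed ids.

```python
class DirtyTracker:
    """Remembers which user ids changed since the last stats run."""

    def __init__(self):
        self._dirty = set()

    def mark(self, uid):
        """Call whenever something affecting this user's stats changes."""
        self._dirty.add(uid)

    def drain(self):
        """Return the current dirty ids and reset the tracker."""
        dirty, self._dirty = self._dirty, set()
        return dirty


def update_user_stats(tracker, mylist, stats):
    """Recalculate the 'eps in mylist' counter for dirty users only.

    mylist -- hypothetical mapping of uid to a collection of episode ids
    stats  -- mapping of uid to the stored counter, updated in place
    Returns the number of users that were recalculated.
    """
    dirty = tracker.drain()
    for uid in dirty:
        stats[uid] = len(mylist.get(uid, ()))
    return len(dirty)
```

Users who did nothing since the last run are never touched, so the cost of a run follows the amount of activity rather than the total number of users.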


=== ? ===
what else?