There does seem to have been a point a few years ago when IMDB got a lot more aggressive about purging older messages from the boards. I don't really understand the heuristic for what gets deleted, since, like you, I've gotten replies to messages posted *years* ago on low-traffic topics.
My guess is that there's some minimal level of retention allowed for all boards, so low traffic boards are able to retain some messages for long periods of time.
I do think they are too aggressive overall in purging boards relative to the amount of disk storage the boards actually consume, especially as storage has gotten cheaper and IMDB more commercial.
I'd wager that most messages are less than 1 kB of text, so a board with 100 top-level comments, each with 10 replies would only be a megabyte of storage. Even if you assume an overhead of 50% for indexing and metadata, you're still only at 1.5 MB per forum.
Wikipedia says IMDB has about 11 million main entries (titles and people, including episodes). Even though episodes don't have their own boards, if you use that figure with the above 1.5 MB per board, it's still a relatively paltry (by enterprise storage standards) 16.5 TB of data for board entries.
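For anyone who wants to check the arithmetic, here's a rough sketch of the estimate in Python. Every number is one of my guesses from above (1 kB per message, 50% overhead, 11 million boards), not real IMDB data, and the function name is just something I made up for illustration. It also shows what happens if you count the 100 top-level comments themselves instead of rounding the board down to 1 MB.

```python
# Back-of-envelope storage estimate. All inputs are guesses from the post,
# not measured IMDB figures. Decimal units: 1 kB = 1,000 bytes.

def total_storage_bytes(board_bytes, overhead_percent, boards):
    """Total bytes for `boards` boards, each `board_bytes` plus overhead."""
    per_board = board_bytes * (100 + overhead_percent) // 100
    return per_board * boards

BYTES_PER_MESSAGE = 1_000   # assumed: most messages are under 1 kB
OVERHEAD_PERCENT = 50       # assumed: indexing and metadata
BOARDS = 11_000_000         # Wikipedia's main-entry count, episodes included

# The rounded figure used in the post: about 1 MB of text per board.
rounded_total = total_storage_bytes(1_000_000, OVERHEAD_PERCENT, BOARDS)

# Counting every message: 100 top-level comments plus 10 replies each.
messages = 100 + 100 * 10
exact_total = total_storage_bytes(
    messages * BYTES_PER_MESSAGE, OVERHEAD_PERCENT, BOARDS
)

print(f"rounded: {rounded_total / 1e12:.2f} TB")
print(f"exact:   {exact_total / 1e12:.2f} TB")
```

Either way you land in the high teens of terabytes, so the rounding doesn't change the conclusion.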
And the reality is it's likely much lower. A huge number of main entries have 10 or fewer forum topics, probably with 5 or fewer replies each, and including episodes for TV shows greatly inflates the overall number since they don't have their own forums. The 16.5 TB number is probably too high by half.
And they're owned by Amazon, which means they likely have access to Amazon's own cloud storage (AWS, e.g. S3), making the storage costs even lower.