Questions about archiving, and is there any way I can help?


How are you going about archiving IMDb posts, and is there any way for us to help? Also, are you accepting suggestions for particular boards to back up?

I have been working tirelessly to archive as many of my favorite IMDb boards as possible before they disappear forever. I've mostly been doing it by using the website downloading program HTTrack to download each board in its entirety to my computer's hard drive, although I have also added a couple of boards (in their entirety, with every thread) to Archive.org's Wayback Machine.

You can read about my efforts and see a list of the boards that I have saved to my hard drive so far at this link:

https://www.reddit.com/r/IMDbFilmGeneral/comments/5sok0m/the_imdb_adoptaboard_archiving_program_unofficial/

Is there any way that my archiving efforts can prove useful to you? For instance, if I were to upload ZIP files of the boards I have downloaded, could you integrate them into your archive?

reply

Hey, I love what you're doing here, and I'd like to suggest a couple director boards:

Yasujirô Ozu http://www.imdb.com/name/nm0654868/board
Akira Kurosawa http://www.imdb.com/name/nm0000041/board
Kenji Mizoguchi http://www.imdb.com/name/nm0003226/board
Masaki Kobayashi http://www.imdb.com/name/nm0462030/board

reply

It would be wonderful if you could archive Psycho (1960):
http://www.imdb.com/title/tt0054215/board/

[I haven't been able to get httrack to install properly to be able to do the archiving for myself. Very frustrating!]

reply

OK, following the directions/suggestions at the reddit page above *religiously* I was able to install httrack and have now got the Psycho board mirrored (all .8 gigs of it).

I'd like to thank hbenthow for suggesting this. Note that httrack would only install as a command-line-interface program on my Mac laptop, and the 'options' you need in that case are:
-r5 -%e0 -A5000 -%c1
which is pretty nerdy. If you're interested in doing any of this yourself, I'd therefore strongly recommend finding yourself a Windows computer so that you can install httrack in a GUI-form and so follow hbenthow's instructions more straightforwardly.
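
For anyone stuck with the command-line version, here is a rough sketch of how the full command could be put together in Python. The four options are the ones quoted above; the `-O` output-folder option, the folder name `psycho-board`, and the function names are my own assumptions for illustration and are not taken from hbenthow's instructions.

```python
import subprocess


def build_httrack_command(board_url, output_dir):
    """Return the httrack argument list for mirroring one board."""
    return [
        "httrack",
        board_url,
        "-O", output_dir,  # assumed: folder the mirror is written into
        "-r5",             # mirror depth of 5
        "-%e0",            # external links depth 0 (stay on the site)
        "-A5000",          # cap the transfer rate at 5000 bytes per second
        "-%c1",            # at most 1 new connection per second
    ]


def mirror_board(board_url, output_dir):
    """Run httrack; this only works if httrack is installed and on the PATH."""
    subprocess.run(build_httrack_command(board_url, output_dir), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so it can be checked first.
    cmd = build_httrack_command(
        "http://www.imdb.com/title/tt0054215/board/", "psycho-board"
    )
    print(" ".join(cmd))
```

Printing the command first makes it easy to compare against the Reddit instructions before starting a long download; calling `mirror_board` with the same two arguments would start the actual mirror.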

reply

Oops - didn't see you'd asked to put specific boards here. You can delete that thread I started! So far I've found:

Psycho - http://www.imdb.com/title/tt0054215/board/

Wild Wild West (60's TV) - www.imdb.com/title/tt0058855/board/
Sometimes a Great Notion (Never Give an Inch) - http://www.imdb.com/title/tt0067774/board/
Have Gun - Will Travel - http://www.imdb.com/title/tt0050025/board/
The Many Adventures of Winnie the Pooh - http://www.imdb.com/title/tt0076363/board/

reply

@swanstep, I highly recommend that you check every thread in the board you have downloaded. Due to IMDb's current instability, some threads (and some pages within some threads) show up as error pages.

If this happens, I recommend either manually downloading those pages with your browser, or re-downloading them with HTTrack, then copying the files into the folder that you downloaded the whole board into in order to "repair" your archive of the board.

To do this, first follow the instructions that I posted on the Reddit board - but with one exception. First of all, give the project some random name (NOT the name of the board that you downloaded - you don't want to replace that download). Then, just copy the link of each individual thread that you need downloaded (the first page of the thread only - HTTrack will automatically re-download every page of the thread) and don't alter the link in any way. You just want to download the individual thread. Once the download is complete, open it in your browser and make sure that there are no errors this time (if there are, re-download it). If there aren't any, open the following location in the download folders of both the whole board that you downloaded and the individual thread that you downloaded (the bracketed text stands in for something that will have a different name in each project):

[MAIN FOLDER]\www.imdb.com\title\[NUMBER VARIES]\board\thread

Then, sort the files by size, and copy all of the HTML files that are reasonably large (50 KB, 20 KB, etc.), but NONE of the tiny ones (1 KB to 5 KB), if there are any, from the folder of the single thread that you downloaded into the folder of the whole board that you downloaded. If you are asked whether you want to replace the files, select yes. Afterwards, open the downloaded version of the whole board in your browser again, and check the thread in question. It shouldn't be an error page anymore.
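
If you have a lot of threads to repair, the copying step could probably be scripted. The following is only a minimal sketch in Python, assuming that the pages are saved with an `.html` extension and that anything under 10 KB is an error page. That cutoff is a guess that sits between the "tiny" and "reasonably large" sizes mentioned above, and the function name and folder paths are made up for illustration.

```python
import shutil
from pathlib import Path

# Assumption: files below this size are error pages. The 10 KB cutoff is a
# guess between the "tiny" (1 KB to 5 KB) and "reasonably large" (20 KB+) sizes.
MIN_GOOD_SIZE = 10 * 1024


def repair_board(thread_dir, board_dir, min_size=MIN_GOOD_SIZE):
    """Copy the reasonably large HTML files from thread_dir into board_dir.

    Existing files with the same name in board_dir are replaced.
    Returns two lists of file names: (copied, skipped).
    """
    thread_dir, board_dir = Path(thread_dir), Path(board_dir)
    copied, skipped = [], []
    for src in sorted(thread_dir.glob("*.html")):
        if src.stat().st_size >= min_size:
            shutil.copy2(src, board_dir / src.name)  # overwrites the old copy
            copied.append(src.name)
        else:
            skipped.append(src.name)  # tiny file, most likely an error page
    return copied, skipped
```

You would point `thread_dir` and `board_dir` at the `board\thread` folder inside each of the two downloads. Anything that ends up in the `skipped` list is a page that still came back as an error and may need to be downloaded again.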

reply

@hbenthow. Oh damn it, those d'oh pages show up in about every tenth thread I've archived (Amazon/IMDb finding another way to screw us?). I'm not sure that it's a good use of my (or anyone's) time to go through and fix all of these. Maybe it's better to accept just 90% accurate archiving (and make more archives in the time left to us).

reply

There is now a better and easier way to save IMDb boards. I highly recommend it. I've tried it, and it works beautifully.

See here (for chobar's reply):

http://www.moviechat.org/movies/general/posts/58a94ce6203e9300116f3b8d

And here:

https://www.reddit.com/r/IMDbFilmGeneral/comments/5uxby1/the_imdb_adoptaboard_archiving_program_unofficial/

reply