Posted by: Stephen Baker on November 22, 2009
Heather and I both got the word on Thursday that we won’t be part of BusinessWeek once Bloomberg takes over, on Dec. 1. (We’re both pleased with this outcome, though it’s no picnic watching the staff get decimated, with good friends and colleagues heading off in every direction.) In the coming week, I think I’ll write a nice long eulogy for this blog.
But in the meantime, a question: Does anyone know how to preserve and store our four and a half years of blog posts and comments? Our colleague Arik Hesseldahl said something about turning each month into a PDF. I’ll look into that (as soon as I close my last story tomorrow). But if you have a specific how-to, I’m all ears. As I wrote in September on my Numerati blog, I’m not sure how committed Bloomberg will be to social media. There’s no telling when someone might pull the plug on a server housing the archives of a discontinued blog.
Been there, but once you start your next chapter you will feel a breath of fresh air.
Hopefully, your blogs have been saved by the Internet Archive (archive.org). I also think there are paid services that can preserve your blogs. I have seen the website I created fifteen years ago preserved there, so I can vouch for the Internet Archive. But I can't recommend a paid service since I never thought about using one.
Thanks for all you have done and please continue your success elsewhere.
Stephen, from a technical perspective, someone on your IT support staff should be able to get you a copy of the database - that's easy. However, there may be copyright issues related to the posts themselves if you planned to re-post them (i.e. reconstruct the blog at another site).
It looks like you're on Movable Type. You should be able to export your posts as XML and move them into WordPress, or probably another MT blog (I've moved from Blogger to WordPress myself, which wasn't difficult).
Here's a full explanation that will walk you through the process: http://codex.wordpress.org/Importing_from_Movable_Type_to_WordPress
Good luck.
Have you thought about asking someone who uses something like SharePoint Designer to import it as a site? It wouldn't preserve the database but would grab all the generated HTML as HTML. If I have time later I'll experiment.
Maybe you could export the whole thing as a large RSS file. It should be fairly easy then to find someone who could put together a script and re-import the articles into one of the free blog services.
seems to be documented: http://www.movabletype.org/documentation/appendices/import-export-format.html
Steve, it was great seeing you. This might help: http://www.backupify.com/
No PDFs, please!
You can export the content and put it in a new blog; just create a parallel universe for it.
There are two aspects to saving a blog:
1. Negotiate for the rights to the blog content. I'd think BW would be open to transferring them to you given the takeover and subsequent actions. Without the rights, though, it does not matter what the technical solution is.
2. How accessible do you want the past blog posts to be? Indexable by Google? Assuming you have the rights, do you even want to continue posting? If so, you might follow @Chris' suggestion above to convert from Movable Type to WordPress (or approach Movable Type and negotiate for them to host the old blog). Otherwise, creating PDFs might work.
Good luck!
You can find snapshots (through June 08, as of today) on the Wayback Machine at http://web.archive.org/web/*/http://www.businessweek.com/the_thread/blogspotting/
As others have commented, though, exporting the content directly from MT is the way to go, assuming you can reach an agreement on any copyright issues.
Since the main page for the blog has links to all the monthly archive pages, worst case you could get someone to write a simple crawler/scraper that extracts the content.
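The first step of such a crawler, collecting the monthly archive links from the main page, might look roughly like this. It is a sketch using only the Python standard library. The "/archives/" marker is a guess at how the archive URLs are shaped and would need adjusting to the real ones, and the fetch helper does no rate limiting or error handling.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the absolute URL of every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))


def archive_links(index_html, base_url, marker="/archives/"):
    """Return unique links containing `marker`, in page order.

    `marker` is a hypothetical pattern; check the blog's real archive URLs.
    """
    collector = LinkCollector(base_url)
    collector.feed(index_html)
    unique = []
    for link in collector.links:
        if marker in link and link not in unique:
            unique.append(link)
    return unique


def fetch(url):
    """Download one page as text (no retries, no politeness delay)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")
```

From there, each archive page can be fetched and saved to disk as is, which already gives a crude backup before any content extraction is attempted.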
I easily moved my blog from MT to WP; there's an automated way to do that on WP. I agree with the posters above, though, in that the rights to the IP are the thing to get straightened out.
Stephen,
If you guys need a hand on the technical end of things, I'd be happy to see if there is anything I can do, although you'll most likely need the cooperation of the BW IT dept.
Email me: dane@simler.com
Stephen -- No need to "approach Movable Type" -- we'll approach you :). We (Six Apart, makers of Movable Type, TypePad, Vox) would be happy to help with a migration and get a new, independent site set up asap. It should be fairly straightforward to go from one MT site to another one and we'll do it for you. We are committed to helping every former or soon-to-be-former journalist get set up with a site so they can continue to do what they do best. I'll be in touch, or feel free to email me at chris dot alden at sixapart dot com.
I created a static archive of this blog using httrack. It looks like it works fine!
You may download it from here: http://dl.dropbox.com/u/2437600/blogspotting.tgz (I'll delete it after a couple of days)
Fellow blogger Lauren Young here. I'm wondering what to do with the Working Parents archive, so please let me know. I'd hate for it to disappear...
Way to go, Christopher!
Stephen-- Your terrific blog has been on my required reading list, and I'm really going to miss you & Heather in this space. Good luck with your next venture, and please do let all your readers know how to find you.
Best,
Daria
Despite Jeff Jarvis' comment, here's a service that you can use to store/convert your blog content into a PDF or eBook - http://www.zinepal.com/
For what it's worth, you may want to do both - move the archives to another server - and offer the archives as a downloadable PDF or eBook.
Come home to WordPress ;-)
Lloyd Budd here from Automattic's WordPress.com. We're also eager to help in any way we can.
This is a discussion worth having far beyond Blogspotting. Over drinks an acquaintance with a background in library science commented recently that despite the massive amount of data we're producing, the question of its longevity is still very much up in the air.
I wrote a family blog for a long time, but archiving it with pictures intact is something else entirely. Even the long-term existence of the massive amount of pictures we're taking is in question.
I can still pull out 50-, 60- and 70-year-old pictures (and even some negatives) that are stored in a shoebox, but will my grandchildren be able to see all the images I took?
Chuck has a very good point. There's a ton of quality content being produced amidst all the user-generated content clutter.
But will that content be available in 10, 15 or 20 years? Also, we're all assuming that the http: - browser - URL underpinnings of the Web will remain as a standard. That's not a given.
What will the Web look like in 20 years when we're all using super-computer iPhones or other mobile gadgets?
Librarians and technology entrepreneurs definitely need to tackle the longevity issue of Web content.
Jeff, you are right. The answer is that the simplest form has the best chance of surviving over time.
Here is my test: I get "the content" in some format. How much effort do I need in order to read it?
For example, a WordPress export is good. But I need to set up a MySQL server, then Apache, PHP, and other stuff. Will all of them be there in 10 years, and will the future versions be compatible with an export taken 10 years ago? Possibly, but I wouldn't bet on this.
If I had to choose, I'd go for plain HTML files. Not a safe bet, but the odds are much better.
Oh, and I'd put a tarball of all the HTML files on the web, let archive.org archive it, and let ordinary users download and redistribute it.
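Building that tarball from a folder of saved pages takes only a few lines. Here is a minimal sketch with Python's standard tarfile module; the function name and the directory layout are assumptions for illustration, and it only picks up files ending in .html, so images and stylesheets would need their own pattern.

```python
import tarfile
from pathlib import Path


def make_html_tarball(site_dir, tarball_path):
    """Pack every .html file under site_dir into a gzipped tarball.

    Returns the number of files added. Paths inside the archive are
    stored relative to site_dir's parent so it unpacks into one folder.
    """
    site_dir = Path(site_dir)
    count = 0
    with tarfile.open(tarball_path, "w:gz") as tar:
        # Sorted so the archive contents are in a predictable order.
        for path in sorted(site_dir.rglob("*.html")):
            arcname = Path(site_dir.name) / path.relative_to(site_dir)
            tar.add(path, arcname=arcname.as_posix())
            count += 1
    return count
```

A call like make_html_tarball("blogspotting", "blogspotting.tgz") would then produce a single file that anyone can unpack and open in a browser with no server software at all.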
In Blogspotting, Senior Writer Stephen Baker and Associate Editor Heather Green take a look at how cutting-edge technologies are changing business and society. Whether it’s blogs or wikis, data crunching or data targeting, technology’s advances are reshaping the world that we live in.