Armed Polite Society

Main Forums => The Roundtable => Topic started by: tyme on November 03, 2009, 12:56:36 PM

Title: Bleeding edge Linux users beware (ext4 journal checksumming)
Post by: tyme on November 03, 2009, 12:56:36 PM
Ext4 Journal checksumming has been a feature for a while but it's been disabled by default, meaning hardly anyone uses it.  The developers made it default in 2.6.32-rc.  Unfortunately, it is broken, and people have been getting filesystem corruption after a crash or other unclean shutdown (whenever the filesystem journal has corruption).

If you don't understand any of that, this probably does not affect you.

Anyone using ext4 with journal checksumming, including:
 - people with stable kernels who have an ext4 fs mounted with the journal_checksum option
 - people using 2.6.32-rc*

need to IMMEDIATELY disable it.
 - if running 2.6.32-rc, do a controlled reboot into either a) 2.6.31.x or lower, or b) 2.6.32-rc5-git6 or later
 - make sure no ext4 filesystems are mounted with the journal_checksum mount option
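
If you'd rather not eyeball the mount list, something like the following rough Python sketch can scan it for you. It assumes the option shows up as the literal string journal_checksum in the options column of /proc/mounts, and the function names are ones I made up for illustration.

```python
def ext4_mounts_with_journal_checksum(mounts_text):
    """Return mount points of ext4 filesystems whose options include journal_checksum."""
    flagged = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts columns: device, mount point, fs type, options, dump, pass
        if len(fields) < 4:
            continue
        mount_point, fs_type, options = fields[1], fields[2], fields[3]
        if fs_type == "ext4" and "journal_checksum" in options.split(","):
            flagged.append(mount_point)
    return flagged


def check_proc_mounts(path="/proc/mounts"):
    """Read the live mount table (Linux only) and return the flagged mount points."""
    with open(path) as f:
        return ext4_mounts_with_journal_checksum(f.read())
```

Calling check_proc_mounts() on the box in question should return an empty list once everything has been remounted without the option.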

And, if you've gone through an unclean reboot since booting 2.6.32-rc[1-5], you need to reboot and force a filesystem check by touching /forcefsck.
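
To tell whether a given kernel falls in that range, here is a small Python sketch. It assumes the release string looks like "2.6.32-rc5" or "2.6.32-rc5-git6" (distro kernels may use other suffixes), and it only covers the kernel-version half of the problem, not stable kernels mounted with journal_checksum. The function name is hypothetical.

```python
import re

# Matches release strings such as "2.6.32-rc3" or "2.6.32-rc5-git6".
_RC_PATTERN = re.compile(r"^2\.6\.32-rc(\d+)(?:-git(\d+))?")


def is_affected_release(release):
    """True if release looks like a 2.6.32-rc kernel older than rc5-git6."""
    match = _RC_PATTERN.match(release)
    if match is None:
        return False  # not a 2.6.32-rc kernel at all (e.g. 2.6.31.x)
    rc = int(match.group(1))
    git = int(match.group(2)) if match.group(2) else 0
    # Tuples compare element by element, so (5, 0) < (5, 6) < (6, 0).
    return (rc, git) < (5, 6)
```

On a live system you could pass it platform.release() from the standard library.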

Long details:
http://bugzilla.kernel.org/show_bug.cgi?id=14354#c167

There is an additional patch in comment 123 that isn't clearly related but might be a good idea anyway.
Title: Re: Bleeding edge Linux users beware (ext4 journal checksumming)
Post by: roo_ster on November 03, 2009, 01:30:41 PM
Well, I'll be dipped in apple butter.  Thanks for the heads-up.

Luckily, all the Linux boxes I built at work are ext3.

My netbook I went with ext2.

Newest distros I have installed are Xubuntu 8.10 and CentOS 5.1 (RHEL 5.1).
Title: Re: Bleeding edge Linux users beware (ext4 journal checksumming)
Post by: tyme on November 03, 2009, 01:54:27 PM
Quote
My netbook I went with ext2.

Do you have an fsck fetish or something? :)
Title: Re: Bleeding edge Linux users beware (ext4 journal checksumming)
Post by: zahc on November 03, 2009, 02:10:37 PM
So this is a different problem than the delayed allocation "bug"?
Title: Re: Bleeding edge Linux users beware (ext4 journal checksumming)
Post by: tyme on November 03, 2009, 03:57:14 PM
Yes, completely different.  First of all, that one was not a bug.  :)

This one causes not just file corruption (which is expected if the computer crashes or gets the power yanked out without properly shutting down... that's what battery-backed disk controllers are for), but actual filesystem corruption, which can affect other files that weren't even being written to during the crash.
Title: Re: Bleeding edge Linux users beware (ext4 journal checksumming)
Post by: RevDisk on November 03, 2009, 04:45:32 PM
Quote
Ext4 Journal checksumming has been a feature for a while but it's been disabled by default, meaning hardly anyone uses it.  The developers made it default in 2.6.32-rc.  Unfortunately, it is broken, and people have been getting filesystem corruption after a crash or other unclean shutdown (whenever the filesystem journal has corruption).
Quote
This one causes not just file corruption (which is expected if the computer crashes or gets the power yanked out without properly shutting down... that's what battery-backed disk controllers are for), but actual filesystem corruption, which can affect other files that weren't even being written to during the crash.

Ouch.  Thanks for the heads up.  I'm more of a fan of ext3 on my sole Linux box.  I don't mind living on the bleeding edge on most things.  Switches and file structures?  Not so much.
Title: Re: Bleeding edge Linux users beware (ext4 journal checksumming)
Post by: roo_ster on November 04, 2009, 10:40:22 AM
Quote
Do you have an fsck fetish or something? :)

SSHD.  I want to minimize writes for performance and longevity's sake.
Title: Re: Bleeding edge Linux users beware (ext4 journal checksumming)
Post by: Gewehr98 on November 04, 2009, 11:15:39 AM
Good call. 

I'm waiting for my experimental IDE Flash XP pagefile drive to throw in the towel, but it's still going strong.

Maybe they're getting better at SSD longevity?

I'm seriously tempted to use an SSD as primary when I do my Win 7 migration later...
Title: Re: Bleeding edge Linux users beware (ext4 journal checksumming)
Post by: Nightfall on November 04, 2009, 12:14:40 PM
Phew, thanks for the heads up, tyme. Thankfully I'm already running 2.6.31.
Title: Re: Bleeding edge Linux users beware (ext4 journal checksumming)
Post by: tyme on November 04, 2009, 07:03:56 PM
Quote
SSHD.  I want to minimize writes for performance and longevity's sake.

My understanding is that all the recent SSD firmwares will notice repeated writes to the same areas and spread them out over unused parts of the disk.

In that case, the SSD strategy for wear-leveling is copy-on-write at the hardware level.  It doesn't matter how abusive your filesystem is.  The time required to make an SSD fail is roughly ( (free space) / (average I/O write bandwidth) ) * (rated # of write cycles).
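
To put numbers on that estimate, here is a quick Python sketch of the same formula. The function name and the example figures (20 GB free, 5 MB/s of sustained writes, 10,000 rated cycles) are made up for illustration, not measurements from any real drive.

```python
def estimated_ssd_lifetime_seconds(free_bytes, write_bytes_per_second, rated_write_cycles):
    """Rough wear-out time: (free space / average write bandwidth) * rated write cycles."""
    if write_bytes_per_second <= 0:
        raise ValueError("write bandwidth must be positive")
    # Time needed to overwrite all of the free space once.
    seconds_per_pass = free_bytes / write_bytes_per_second
    return seconds_per_pass * rated_write_cycles


# Made-up example numbers: 20 GB free, 5 MB/s sustained writes, 10,000 rated cycles.
lifetime = estimated_ssd_lifetime_seconds(20e9, 5e6, 10_000)
print("roughly %.0f days of nonstop writing" % (lifetime / 86400))
```

Since real workloads rarely write at a sustained rate around the clock, the actual figure would be longer under the same assumptions.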

I wonder which is worse for write volume: journaling, or not mounting with noatime.