Thursday, July 11, 2013

btrfs is still considered experimental...

I started my Monday with a nice black screen saying that my Debian system did not boot properly. After rebooting in 'recovery mode', I eventually discovered that the filesystem could not be mounted... sigh!

A year or so ago, I set up my Debian squeeze (+bpo) system on btrfs after reading an article in GNU/Linux Magazine that showed off some cool btrfs features. At the time, however, I did not read the warning about btrfs still being experimental.

Anyway, I needed to get that fixed ASAP. The kernel logs revealed something like:

 [...]
 parent transid verify failed on 201236885504 wanted 1822936 found 2125997

 parent transid verify failed on 201236885504 wanted 1822936 found 2125997
 btrfs: open_ctree failed
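
For reference, my understanding of this message is that the tree block at the given logical address was expected to carry generation (transid) 1822936, according to the pointer in its parent, but the block found on disk carries 2125997. A minimal sketch, assuming the log lines keep exactly this wording, that pulls out the block address and the generation gap (the helper name transid_gap is made up for illustration):

```shell
# Hypothetical helper: read kernel log lines on stdin and, for each
# "parent transid verify failed" line, print the block address and
# the difference between the found and the wanted generation.
transid_gap() {
    awk '/parent transid verify failed on/ {
        # Expected layout: ... on <bytenr> wanted <transid> found <transid>
        for (i = 1; i < NF; i++) {
            if ($i == "on")     bytenr = $(i + 1)
            if ($i == "wanted") wanted = $(i + 1)
            if ($i == "found")  found  = $(i + 1)
        }
        printf "%s %d\n", bytenr, found - wanted
    }' | sort -u    # the kernel repeats the line, keep one copy
}

# Example usage:
#   dmesg | transid_gap
```

Every line in the log above points at the same block, so the whole problem seems to come down to a single stale or misplaced tree block.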


So I googled the error message, hoping someone had already been through this. I found this thread:

http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg14960.html

That thread recommends two tools:

http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg14974.html

However, it does not describe quite the same problem. After some more googling, I found:

https://btrfs.wiki.kernel.org/index.php/Problem_FAQ#I_can.27t_mount_my_filesystem.2C_and_I_get_a_kernel_oops.21

But then again, different symptoms imply different solutions.

So what remains? I posted on the linux-btrfs mailing list and went on #btrfs:

http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg25660.html

I was lucky: someone on #btrfs pointed me to the following trick: reboot the system and edit the GRUB kernel line to append 'rootflags=recovery'.
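
The kernel parameter that passes mount options for the root filesystem is rootflags=, so it is worth checking after boot that the edited line really reached the kernel. A minimal sketch, assuming a standard /proc/cmdline (the helper name get_rootflags is made up for illustration):

```shell
# Hypothetical helper: print the value of rootflags= found in a kernel
# command line (passed as $1); return non-zero if it is absent.
get_rootflags() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^rootflags=//p' | grep .
}

# Example usage on the running system:
#   get_rootflags "$(cat /proc/cmdline)"
```

If this prints nothing, the edit was probably made on the wrong GRUB line or lost before booting.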

Luckily I had an old 3.4.0 kernel (which is still more recent than the 3.2.0~bpo) to try out. Too bad 3.4.0 is still considered legacy from the btrfs point of view, and the -o recovery step failed.

So the next step was to get my hands on kernel 3.10 (which was released a week ago). No distribution had it pre-built, so I turned to Knoppix 7.2.0 with kernel 3.9.6 instead. I found a blank CD and burned the ISO.

Once Knoppix booted, I had access to a 3.9.6 kernel with a supposedly much better btrfs recovery mode.
After a keyboard layout change, I was ready to roll:

$ setxkbmap fr

So I tried:

$ sudo mount -v -t btrfs -o recovery -o degraded /dev/mapper/voxbox-root /tmp/bla


 device fsid abcdef01-2345-6789-aaaa-1234567890ab devid 1 transid 2207957 /dev/mapper/voxbox-root
 btrfs: enabling auto recovery
 btrfs: allowing degraded mounts
 parent transid verify failed on 201236885504 wanted 1822936 found 2125997
 parent transid verify failed on 201236885504 wanted 1822936 found 2125997
 btrfs: open_ctree failed

OK, so the problem is still present in kernel 3.9.6. According to people on #btrfs, I needed to wait for the fusionio dev team to show up... minutes felt like hours waiting for them.

While waiting, I discovered it is not easy (impossible?) to find out how much data is actually stored on a volume. I tried pvdisplay, vgdisplay and pvs, but in all cases they only report the allocated space. So how much disk do I need to restore my system? I can only get an upper bound from the allocated space:


http://www.redhat.com/archives/linux-lvm/2010-August/msg00011.html
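
A rough way to turn that upper bound into a number to shop with is to read the size of the block device in bytes and convert it to the decimal terabytes printed on drive boxes. A minimal sketch, assuming blockdev from util-linux is available (the helper name bytes_to_tb is made up for illustration):

```shell
# Hypothetical helper: convert a size in bytes to decimal terabytes
# (10^12 bytes, the unit drive vendors use), with one decimal place.
bytes_to_tb() {
    awk -v b="$1" 'BEGIN { printf "%.1f\n", b / 1e12 }'
}

# Example usage: the size of the block device is an upper bound on
# the amount of data the filesystem on it can hold.
#   bytes_to_tb "$(sudo blockdev --getsize64 /dev/mapper/voxbox-root)"
```

This only bounds the answer from above: the real usage can be much lower, and it stays unknown as long as the filesystem cannot be mounted.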

Someone from the fusionio team finally showed up and told me to git clone, make and execute some commands:


$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git
$ cd btrfs-progs/
$ make
$ ./btrfsck /dev/mapper/voxbox-root
parent transid verify failed on 201236885504 wanted 1822936 found 2125997
parent transid verify failed on 201236885504 wanted 1822936 found 2125997
parent transid verify failed on 201236885504 wanted 1822936 found 2125997
parent transid verify failed on 201236885504 wanted 1822936 found 2125997
Ignoring transid failure
Checking filesystem on /dev/mapper/voxbox-root
UUID: af7e6809-af9d-474f-a332-295cdba1c09f
checking extents
btrfsck: cmds-check.c:2063: check_owner_ref: Assertion `!(rec->is_root)' failed.
Aborted

So apparently this is not a known issue. I was asked to use another git repository:


$ git clone -b for-knoppix https://github.com/josefbacik/btrfs-progs

Then I ran a couple of operations:


$ ./btrfsck /dev/mapper/voxbox-root
[...]
rec 201236885504 is a root isnt setup right, found ref? yes
btrfsck: cmds-check.c:2063: check_owner_ref: Assertion `!(rec->is_root)' failed.
Aborted
-> http://bpaste.net/show/DDBjB20mGNBZo2DJF7G3/

$ git pull
$ make
$ ./btrfsck /dev/mapper/voxbox-root


rec 201236885504's owner is 5, we want 7
rec 201236885504 is a root isnt setup right, found ref? yes
btrfsck: cmds-check.c:2064: check_owner_ref: Assertion `!(rec->is_root)' failed.
Aborted
-> http://bpaste.net/show/MRbYkkpRAkbwDcwdIGwW/

Apparently this looked easy (understandable), so I was asked to run:

$ ./btrfsck -s 1 /dev/mapper/voxbox-root
[...]
rec 201236885504's owner is 5, we want 7
rec 201236885504 is a root isnt setup right, found ref? yes

-> http://bpaste.net/show/6QZjOib5Rqx8xdipYOkB/

and


$ ./btrfsck -s 2 /dev/mapper/voxbox-root 
[...]
rec 201236885504's owner is 5, we want 7
rec 201236885504 is a root isnt setup right, found ref? yes
-> http://bpaste.net/show/SuzvqAZ5ZLjykXErA1yU/
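
For reference, as far as I know the -s option of btrfsck selects which superblock copy to read, 0 being the primary one. The two runs above can be sketched as a loop, assuming copies 0 to 2 exist on the device and that the checker returns non-zero on failure (the helper name try_superblocks is made up for illustration):

```shell
# Hypothetical helper: run a checker ($1) against a device ($2) once
# per superblock copy and report whether each run succeeded.
try_superblocks() {
    checker=$1
    device=$2
    for copy in 0 1 2; do
        if "$checker" -s "$copy" "$device" >/dev/null 2>&1; then
            echo "copy $copy: ok"
        else
            echo "copy $copy: failed"
        fi
    done
}

# Example usage:
#   try_superblocks ./btrfsck /dev/mapper/voxbox-root
```

The output is thrown away here to keep the summary short; for a real bug report the full output of each run is what the developers want to see.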

But then it was no longer that easy, so the dev told me he would need a couple of hours to write a patch. Meanwhile, I could at least run a restore operation to back up my data:


https://btrfs.wiki.kernel.org/index.php/Restore

So I went to a store and bought a USB drive (3 TB, since I had no clue how much space I would need). I replaced its NTFS partition with an ext4 one and mounted it. Now I could run the restore:

$ ./btrfs restore /dev/mapper/voxbox-root /media/sdh/1/

24 hours later, the restore was complete! All metadata (ctime, atime...) is lost, but at least I have all my files. And to answer my own question, a 2 TB USB drive would have been enough:


$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdb              2.7T  1.6T 1016G  62% /media/voxboxbackup
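
Since the broken filesystem cannot be mounted to compare against, about the only sanity check available on the restored tree is a rough one: count the regular files and look at the total size. A minimal sketch (the helper name count_files is made up for illustration, and file names containing newlines would be miscounted):

```shell
# Hypothetical helper: count the regular files below a directory ($1).
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Example usage on the restore target:
#   count_files /media/sdh/1/
#   du -sh /media/sdh/1/
```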


A couple of hours later, the dev (Josef) showed up with a patch for me:

https://github.com/josefbacik/btrfs-progs/commit/1079ddc4d6c4df516c4e483278d2d6390bab0f93

So I git pull & make. Now I am on my way to actually running the repair operation:

$ ./btrfsck --init-csum-tree /dev/mapper/voxbox-root

Creating a new CRC tree
parent transid verify failed on 201236885504 wanted 1822936 found 2125997
parent transid verify failed on 201236885504 wanted 1822936 found 2125997
parent transid verify failed on 201236885504 wanted 1822936 found 2125997
parent transid verify failed on 201236885504 wanted 1822936 found 2125997
Ignoring transid failure
Checking filesystem on /dev/mapper/voxbox-root
UUID: af7e6809-af9d-474f-a332-295cdba1c09f
checking extents
Reinit crc root

[more to come]