Thursday, July 16, 2009

First kernel patch

My first real kernel patch. I am quite excited about it. Finally I got a Signed-off-by in the kernel commit log :)

Luckily enough, the patch got merged just before 2.6.30-rc3 was released.

commit ac046f1d6121ccdda6db66bd88acd52418f489b2
Author: Peng Tao
Date: Mon Jul 13 09:30:17 2009 -0400

ext4: fix null handler of ioctls in no journal mode

The EXT4_IOC_GROUP_ADD and EXT4_IOC_GROUP_EXTEND ioctls should not
flush the journal in no_journal mode. Otherwise, running resize2fs on
a mounted no_journal partition triggers the following error messages:

BUG: unable to handle kernel NULL pointer dereference at 00000014
IP: [] _spin_lock+0x8/0x19
*pde = 00000000
Oops: 0002 [#1] SMP

Signed-off-by: Peng Tao
Signed-off-by: "Theodore Ts'o"

Wednesday, July 15, 2009

Valerie's Advice to FS Developers

From an interview in Linux Journal.
Keep it in mind...

Keep talking to other file system developers face to face, keep experimenting with new file systems, keep talking to people in research and academia, keep paying attention to hardware trends. The way to avoid the “file systems are a solved problem” echo chamber is to stay in touch with both each other and the outside world.

Monday, July 13, 2009

Content Addressable Storage -- The Next Step

Recently, I have been writing a slightly modified implementation of Content Addressable Storage (CAS) for our online image servers. CAS is a perfect fit for image servers, where images are uploaded once and never changed, and where disk throughput is a main bottleneck for many large-scale online image service providers.

As explained on Wikipedia, CAS is a mechanism for storing information that can be retrieved based on its content, not its storage location. It is typically used for high-speed storage and retrieval of fixed content.

CAS Characteristics:
1. Stored data is identified by its content
2. Designed to make looking up a given document by its content very fast
3. Works most efficiently on data that does not change often
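
To make the idea concrete, here is a minimal in-memory sketch in Python. It is not the implementation discussed in this post: the `ContentStore` class and the choice of SHA-256 as the content hash are assumptions made only for illustration.

```python
import hashlib


class ContentStore:
    """Toy content-addressable store (hypothetical, for illustration only)."""

    def __init__(self):
        # Maps content digest -> stored bytes.
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # The key is derived from the content itself, not from a path or name.
        key = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same key, so it is stored only once.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        # Retrieval is a single lookup by digest.
        return self._blobs[key]

    def __len__(self) -> int:
        return len(self._blobs)
```

Storing the same bytes twice returns the same key and leaves a single copy in the store, which is why the scheme suits content that never changes after upload.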

Currently, most servers store images as ordinary files and rely on the filesystem to optimize read speed. However, filesystems are constrained by POSIX semantics and are tuned for general-purpose use. For large-scale image servers (or any application storing a large number of images), CAS is a better choice, for the following reasons.

1. Deduplication: both block-level and file-level CAS use storage efficiently by storing identical data only once.
2. Aggregation: a log-structured, extent-based on-disk object store keeps many images contiguously in a single large file, so a read costs only one disk seek if the index is kept in memory (versus 2~3 disk seeks per read for ordinary files).
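
The two points above can be sketched together as an append-only log file with an in-memory index. This is only a rough illustration under stated assumptions, not my actual implementation: the `LogStore` name, the SHA-256 keys, and the (offset, length) index layout are all hypothetical, and the index is not persisted across restarts.

```python
import hashlib
import os
import tempfile


class LogStore:
    """Toy append-only object store with an in-memory index (hypothetical)."""

    def __init__(self, path: str):
        self._path = path
        # digest -> (offset, length) of the object inside the log file
        self._index = {}
        # Make sure the log file exists.
        open(path, "ab").close()

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        if key in self._index:
            # Deduplication: identical content is never written twice.
            return key
        with open(self._path, "ab") as log:
            # New objects are always appended at the end of the log.
            offset = log.seek(0, os.SEEK_END)
            log.write(data)
        self._index[key] = (offset, len(data))
        return key

    def get(self, key: str) -> bytes:
        # The index lookup is in memory; the data needs one seek and one read.
        offset, length = self._index[key]
        with open(self._path, "rb") as log:
            log.seek(offset)
            return log.read(length)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = LogStore(os.path.join(tmp, "images.log"))
        key = store.put(b"fake image bytes")
        store.put(b"fake image bytes")  # duplicate, not written again
        print(key, len(store.get(key)))
```

A real server would keep the log file open and persist or rebuild the index on startup; the sketch reopens the file on each call only to stay short.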

My CAS implementation is designed around the fact that most images are small files (~10MB) and that image servers hold a huge number of them. It should speed up access to these image files. I am planning some benchmarks this week and hope to post the results here soon.

After that, I will write a paper on the opportunity of applying CAS to large-scale image servers. I also want to explore changing the interface (to D-Bus, maybe?) so that it can run on an ordinary desktop, because current GNOME/KDE desktops also read a lot of thumbnails. Maybe I can make that quicker somehow. Who knows :)