Muli Ben-Yehuda's journal

July 31, 2003

Lustre cluster file system

Filed under: Uncategorized — Muli Ben-Yehuda @ 3:39 PM

Thu. 12:30 AM

During the break between this talk and Dave Jones's talk, I helped Behdad Esfahbod compile the cipe module. Since it was b0rking on module symbol errors, I just turned off module versions in the .config and that did the trick. I also talked to zwane about system call hijacking in 2.5.
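For reference, a minimal sketch of that kind of fix, assuming a 2.4-era tree where module symbol versioning is controlled by CONFIG_MODVERSIONS (the exact rebuild steps depend on the kernel version):

    # In the kernel .config, turning the option off looks like this:
    # CONFIG_MODVERSIONS is not set

    # Rebuild and reinstall the modules afterwards (2.4-style):
    make oldconfig && make dep && make modules && make modules_install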

Afterward, I was walking around forlornly, looking for people to talk to and not knowing how to start, when Orna hit upon a brilliant idea. The symposium has a gpg key signing party, and participants are supposed to verify each other's identity between the talks. Participants also wear a red dot on their badges, so they can identify each other. Orna just looked around, found the first guy who had a red dot on his badge (Ryan, one of the Debian developers), and started talking to him, using the gpg key signing "verification ritual" as an opener. Brilliant! That's what we did for the rest of the break, too. Right now, I have about 20 of the 130 identifications done.
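For anyone who hasn't done this before, the ritual itself is simple: check the other person's ID and key fingerprint in person, and do the actual signing later at your own machine. A rough sketch with gpg, assuming a keyserver is already configured (the address and key ID below are just placeholders):

    # Print your own fingerprint so it can be compared against the printed list
    gpg --fingerprint you@example.org

    # Later, for each identity you verified in person:
    gpg --recv-keys 0xDEADBEEF     # fetch the key (placeholder ID)
    gpg --sign-key 0xDEADBEEF      # certify it with your own key
    gpg --send-keys 0xDEADBEEF     # publish the signature to the keyserver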

Listening to a talk on the Lustre distributed file system now:

– concentrating on integration with the Linux VFS. They eventually eschewed using the dcache completely, since the VFS wants to lock directories, which is very bad for Lustre.

– Uses object protocols (rather than block-based protocols)

– gigabytes of debug information for simple I/O operations. How to make sense of so much data?

– use debug tools extensively – UML, gdb, mcore, netdump, crash, kgdb.

– debugging distributed systems is hard, but what else is new?

– how do you handle disk failures? we emphatically say: “that’s not our problem!”

– “Linux machines will beat out all of the EMC machines, netapps, etc for data storage”

– they have a single MDS (Meta Data Server). An obvious scalability bottleneck? Yes, but a pretty far-out one: the MDS is a really fast machine, and the current limit is about 5,000 file creations per second (see the back-of-the-envelope note after this list).

– “TCP/IP offload cards suck”

– Lustre does not scale down as well as it scales up, but they plan to work on it. The architecture should support it; he would consider it a personal failure otherwise.
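A quick back-of-the-envelope note on that single-MDS limit (my arithmetic, not theirs): at roughly 5,000 creations per second, a job that creates a million files spends at least 1,000,000 / 5,000 = 200 seconds on metadata alone, no matter how many object storage servers or clients are added, because every create has to pass through the one MDS.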

Very interesting talk, all in all.
