Muli Ben-Yehuda's journal

July 31, 2003

Asynch IO for 2.5 and IBM banquet dinner

Filed under: Uncategorized — Muli Ben-Yehuda @ 3:58 PM

Thu 16:30

Async IO for 2.5.

Walked out of this one pretty soon after it started, due to the speaker’s Indian accent and the light bulb that went on over my head – I can use xchg() to replace the syscall pointer atomically! Rushed out to finish my “how to port syscalltrack to 2.5” paper and email it to sct-hackers. Also did the identity ritual with a few more people.
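A rough user-space sketch of the xchg() idea, for the record. It is not syscalltrack code: the table, the handler type and the function names are all made up for illustration, and C11 atomic_exchange() stands in for the kernel’s xchg().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a system call handler. */
typedef long (*syscall_fn)(long arg);

#define NR_CALLS 4

/* Toy "system call table": each slot is an atomic function pointer. */
static _Atomic(syscall_fn) call_table[NR_CALLS];

long original_call(long arg) { return arg; }
long tracing_call(long arg)  { return arg + 1000; } /* visibly different result */

/*
 * Install new_fn in slot nr and return the previous handler.
 * atomic_exchange() swaps the pointer in one indivisible step, so a
 * concurrent caller sees either the old or the new handler, never a
 * half-written pointer.
 */
syscall_fn hijack(size_t nr, syscall_fn new_fn)
{
    assert(nr < NR_CALLS);
    return atomic_exchange(&call_table[nr], new_fn);
}

/* Dispatch through the table, roughly as a syscall entry path would. */
long dispatch(size_t nr, long arg)
{
    syscall_fn fn;

    assert(nr < NR_CALLS);
    fn = atomic_load(&call_table[nr]);
    return fn ? fn(arg) : -1;
}
```

Because hijack() returns the handler it displaced, the hijacker can save it and put it back later with a second exchange.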

Today was the banquet dinner, sponsored by IBM. Had it with sarnold, mharris@redhat and another Red Hat guy. The food was reasonable, the Alan Cox stories were superb.

The dinner speech was by Ian Stewart, of grid fame, and I spent most of it thinking about projects I could do independent research on:

– continue developing syscalltrack.
– persistent scheduler.
– storage intrusion detection.

Need to think more about all of them.

Dave McCracken’s Shared Page Tables talk

Filed under: Uncategorized — Muli Ben-Yehuda @ 3:55 PM

Thu 15:00

Listening to Dave McCracken’s Shared Page Tables talk. This is the most interesting talk I’ve heard so far, not least because it’s something that I want to work on. [Later in the day, I did start working on it].

– shared memory areas mapped in many address spaces can take up more space in page table space than in data space.

– mm_struct: one per address space

– vma: one vma per mapped area per address space
  – linked list and tree anchored in mm_struct
  – describes a virtual address range and protection
  – reference to the backing file
  – anonymous vmas have no backing file

– page table
  – one page table for each address space
  – pointed to from mm_struct
  – three levels: pgd, pmd, pte
  – doubles as hardware page table for most archs

– one address_space structure per open file. struct address_space does not describe an address space! It describes a file…
  – anchors list of all vmas that map a region of the file
  – contains a page cache of all physical pages containing data from the file

– struct page: one per physical page
  – describes how the page is used
  – has a pointer to address_space if it’s mapping data from a file
  – all page structs live in mem_map
  – with rmap, has a back pointer (or array of back pointers) to all of the ptes that map the page

– to create a new memory area
  – either mmap or shmemap
  – all shmem is file backed, either explicitly or implicitly via shmfs (internal file system)
  – if a page is marked private and read_write, modified pages are converted to anonymous and backed by swap

– a page is only mapped when a task faults trying to access it – fault code finds the correct vma and pte entry, then finds and maps the page. if necessary, the pte page is allocated on the fly.

– mm subsystem has three primary locks:
  – read/write semaphore, mmap_sem in mm_struct, protects the vma chain. Taken for read during a page fault, taken for write for mmap, for example.
  – spinlock page_table_lock protects the page table
  – i_shared_sem in address_space protects a file’s vma chain. Used to be a spinlock in 2.4, turned into a semaphore in 2.5

– sharing pte pages:
  – overhead for singly mapped area is small
  – overhead for each area grows linearly with number of mappings
  – massively mapped areas could use more physical memory for page tables than for data pages
  – pte pages for large shared areas are identical in each address_space
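To put numbers on the overhead claim, here is a small back-of-the-envelope helper. It assumes an i386 layout without PAE (4 KiB pages, 4-byte ptes, so 1024 ptes per pte page and 4 MiB mapped per pte page); the function names are mine, not from the talk or the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed i386 non-PAE layout: 4 KiB pages, 1024 four-byte ptes per pte page. */
#define PG_BYTES       4096ULL
#define PTES_PER_PAGE  1024ULL
#define PTE_PAGE_SPAN  (PG_BYTES * PTES_PER_PAGE)   /* 4 MiB mapped per pte page */

/* Pte pages one address space needs to map `bytes` of a shared area. */
uint64_t pte_pages_per_mapping(uint64_t bytes)
{
    return (bytes + PTE_PAGE_SPAN - 1) / PTE_PAGE_SPAN;
}

/* Pte-page memory, in bytes, when every mapper keeps its own pte pages. */
uint64_t unshared_pte_bytes(uint64_t bytes, uint64_t nr_mappers)
{
    assert(nr_mappers > 0);
    return pte_pages_per_mapping(bytes) * nr_mappers * PG_BYTES;
}

/* Pte-page memory, in bytes, when identical pte pages are shared by all mappers. */
uint64_t shared_pte_bytes(uint64_t bytes)
{
    return pte_pages_per_mapping(bytes) * PG_BYTES;
}
```

Under these assumptions a 1 GiB shared area needs 256 pte pages per mapper, so at 1024 mappers the unshared page tables take as much memory as the data itself, and every mapper beyond that tips the balance further. With sharing the cost stays at 1 MiB.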

[shared segments which aren’t mapped in the same virtual addresses aren’t currently considered shared – TODO ;-)]

– finding shareable pages:
  – vma must be shareable, must span entire pte page
  – walk address_space chain of vmas looking for one mapping the range
  – check the pte page for each mapping vma to see if it can be shared

– setting the pmd entry read-only allows you to do copy-on-write of pte pages?
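The search for a shareable pte page described above can be sketched with toy structures. This is my own simplification, not the patch: the struct layouts, the flag and the function names are hypothetical, one pte page is assumed to map 4 MiB, and the sketch stops at picking a candidate vma (the real code would then inspect that vma’s pte page).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PTE_SPAN       0x400000UL   /* assumed: one pte page maps 4 MiB */
#define VM_SHARED_FLAG 0x1u         /* hypothetical "mapping is shareable" flag */

/* Toy vma: one mapping of a file range in one address space. */
struct toy_vma {
    unsigned long start, end;       /* virtual range [start, end) */
    unsigned long file_off;         /* file offset that `start` maps */
    unsigned flags;
    struct toy_vma *next_shared;    /* next vma mapping the same file */
};

/* Toy address_space: anchors the chain of vmas that map one file. */
struct toy_address_space {
    struct toy_vma *vmas;
};

/* True if vma is shareable and covers the whole pte page containing addr. */
bool vma_spans_pte_page(const struct toy_vma *vma, unsigned long addr)
{
    unsigned long base = addr & ~(PTE_SPAN - 1);

    return (vma->flags & VM_SHARED_FLAG) &&
           vma->start <= base && vma->end >= base + PTE_SPAN;
}

/* File offset that virtual address addr maps to within vma. */
static unsigned long file_offset_at(const struct toy_vma *vma, unsigned long addr)
{
    return vma->file_off + (addr - vma->start);
}

/*
 * Walk the file's vma chain looking for another vma that spans the same
 * pte page and maps the same file data at the same virtual address.
 * Returns NULL if there is no candidate.
 */
struct toy_vma *find_share_candidate(const struct toy_address_space *as,
                                     const struct toy_vma *self,
                                     unsigned long addr)
{
    unsigned long base = addr & ~(PTE_SPAN - 1);
    struct toy_vma *v;

    assert(as != NULL && self != NULL);
    if (!vma_spans_pte_page(self, addr))
        return NULL;

    for (v = as->vmas; v != NULL; v = v->next_shared) {
        if (v == self || !vma_spans_pte_page(v, addr))
            continue;
        if (file_offset_at(v, base) == file_offset_at(self, base))
            return v;
    }
    return NULL;
}
```

The same-virtual-address requirement falls out of comparing file offsets at the pte page base, which matches the restriction noted in the aside above.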

[forks slowed down significantly in 2.5, due to rmap pte chains, and then shared pte sped that up again]

– locking changes:
  – page_table_lock breaks when pte pages are shared
  – new lock, pte_page_lock, protects the pte page

– complications
  – reverse mapping includes pointer to mm_struct
  – shared page table pages may need pointers to multiple mm_structs
  – pointer had to be converted to a chain
  – several system calls may modify mappings and require unsharing pte pages

[philosophy: better safe than sorry, if not 100% sure that the sharing is correct, unshare it]
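A toy model of sharing and unsharing a pte page, to make the copy-on-write idea concrete. Everything here is an assumption for illustration: the reference count stored next to the ptes, the struct and function names, and the absence of locking (the real code would need something like the pte_page_lock mentioned above).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PTES_PER_PAGE 1024

/* Toy pte page with a share count stored alongside the entries. */
struct toy_pte_page {
    int refcount;                       /* address spaces pointing here */
    unsigned long pte[PTES_PER_PAGE];
};

/* Toy pmd entry: points at a pte page, read-only while the page is shared. */
struct toy_pmd {
    struct toy_pte_page *page;
    int writable;
};

struct toy_pte_page *pte_page_alloc(void)
{
    struct toy_pte_page *p = calloc(1, sizeof(*p));

    if (p)
        p->refcount = 1;
    return p;
}

/* Share src's pte page with dst (as at fork): bump the count, mark both read-only. */
void pte_page_share(struct toy_pmd *dst, struct toy_pmd *src)
{
    assert(src->page != NULL);
    src->page->refcount++;
    src->writable = 0;
    dst->page = src->page;
    dst->writable = 0;
}

/*
 * Call before modifying mappings through pmd. If the pte page is still
 * shared, give this pmd a private copy; if this is the last user, just
 * make it writable again. Returns 0 on success, -1 if allocation fails.
 */
int pte_page_unshare(struct toy_pmd *pmd)
{
    struct toy_pte_page *old = pmd->page;

    assert(old != NULL);
    if (old->refcount > 1) {
        struct toy_pte_page *copy = pte_page_alloc();

        if (!copy)
            return -1;
        memcpy(copy->pte, old->pte, sizeof(copy->pte));
        old->refcount--;
        pmd->page = copy;
    }
    pmd->writable = 1;
    return 0;
}
```

In this model the “better safe than sorry” rule is simply calling pte_page_unshare() whenever there is any doubt; the cost is one page copy.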

– primary motivation of the project is reduction of memory overhead [page tables live in lowmem]

– COW improves fork performance by a factor of 10
  – unsharing costs as much as fork without COW, plus a little extra
  – all programs unshare at least 3 pte pages
  – small programs only have 3 pte pages
  – simple hack is to not do COW for such programs (with only 3 pte pages)

– kernel compile showed no change when sharing pte pages
  – applications with massively shared areas benefited indirectly from the extra available memory

– status: patch was stable in about mid-November last year
  – the patch is still there and dmc is still maintaining it
  – talk to dmc for his copy of the patch

During the break, met Jeff Dike in person, who told me that shared page tables should go into UML rather effortlessly, since the code is very similar in its organization. Also talked to dmc, who said that the patch in -mjb is pretty much up to date.

Bill Irwin’s PGCL (Page Clustering)

Filed under: Uncategorized — Muli Ben-Yehuda @ 3:41 PM

Thu, 13:30

Listening to Bill Irwin’s PGCL (Page Clustering) talk now.

– teach the VM to handle “partial pages”
– why do you want to do this? Structures sized based on memory take up less space.
– shrinking search structures containing pages (radix tree, LRU page replacement lists)
– ABI preserving variant is backward compatible
– Kernel Summit news: Linus is actually interested in something like this, but generic

Random thought, while my brain takes a break from trying to decipher what Bill is saying: going to OLS with Orna wasn’t the best idea – I’m interested in the talks and meeting people (which I’m not doing well at all, granted), and Orna is more interested in touring Ottawa, etc.

– early boot issues stemming from lots of places assuming that virtual pages are MMUPAGE_SIZE in size, where PGCL does MMUPAGE_SIZE != PAGE_SIZE.

Another random thought: this is one of the most interesting talks / projects presented here, but I’m having a hell of a time understanding Bill. Shame.

– MMUPAGE_SIZE is the physical size of pages – PAGE_SIZE is the virtual size of pages

– Bill is talking about various bugs he had and how he solved them – combination of luck and looking around for hints, auditing code (always necessary when changing fundamental design assumptions…)
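A quick calculation of why bigger kernel pages shrink memory-sized structures. This is my reading of the idea, not code from the patch: PAGE_SIZE is taken to be a power-of-two multiple of MMUPAGE_SIZE, the MMU page is assumed to be 4 KiB, and the cluster_shift parameter and function names are made up. The size of struct page is left to the caller since I am not sure of the real figure.

```c
#include <assert.h>
#include <stdint.h>

#define MMUPAGE_SHIFT 12                        /* assumed 4 KiB MMU page */
#define MMUPAGE_SIZE  (1ULL << MMUPAGE_SHIFT)

/* Kernel page size when 2^cluster_shift MMU pages are clustered together. */
uint64_t clustered_page_size(unsigned cluster_shift)
{
    assert(cluster_shift < 20);                 /* keep the shift well in range */
    return MMUPAGE_SIZE << cluster_shift;
}

/* Number of struct page entries needed to describe ram_bytes of memory. */
uint64_t nr_page_structs(uint64_t ram_bytes, unsigned cluster_shift)
{
    return ram_bytes / clustered_page_size(cluster_shift);
}

/* Bytes of mem_map, given a caller-supplied guess at sizeof(struct page). */
uint64_t mem_map_bytes(uint64_t ram_bytes, unsigned cluster_shift,
                       uint64_t struct_page_bytes)
{
    return nr_page_structs(ram_bytes, cluster_shift) * struct_page_bytes;
}
```

For a 16 GiB machine, going from 4 KiB to 32 KiB pages (cluster_shift of 3) cuts the number of page structs from 4194304 to 524288, so mem_map shrinks by a factor of 8 whatever the struct size turns out to be.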

Another talk where reading the paper makes more sense than listening to the talk. Seems to be quite a lot of them, unfortunately. Good thing the corridor and pub discussions make up for it.

Lustre-cluster-file-system

Filed under: Uncategorized — Muli Ben-Yehuda @ 3:39 PM

Thu, 12:30 PM

During the break between this talk and Dave Jones’s talk, helped Behdad Esfahbod with compiling the cipe module. Since it was b0rking with stuff related to module symbols, I just turned off module versions in the .config and that did the trick. Also talked to zwane about system call hijacking in 2.5.

Afterward, I was walking around forlornly, looking for people to talk to and not knowing how to start, when Orna hit upon a brilliant idea. The symposium has a gpg key signing party, and participants are supposed to verify each other’s identity between the talks. Participants also wear a red dot on their badges, so that they can identify each other. Orna just looked around, found the first guy that had a red dot on his badge (Ryan, one of the Debian developers), and started talking to him, using the gpg key signing “verification ritual” as a start. Brilliant! That’s what we did for the rest of the break, too. Right now, I have 20 or so of 130 identifications.

Listening to a talk on the Lustre distributed file system now:

– concentrating on integration with the Linux VFS. Eventually, eschewed using the dcache completely since the VFS wants to lock directories, which is very bad for Lustre.

– Uses object protocols (rather than block based protocols)

– gigabytes of debug information for simple I/O operations. How to make sense of so much data?

– use debug tools extensively – UML, gdb, mcore, netdump, crash, kgdb.

– debugging distributed systems is hard, but what else is new?

– how do you handle disk failures? we emphatically say: “that’s not our problem!”

– “Linux machines will beat out all of the EMC machines, netapps, etc for data storage”

– they have a single MDS (Meta Data Server). Obvious scalability bottleneck? yes, but a pretty far out bottleneck. MDS is a really quick, fast machine. Currently, the limit is about 5,000 file creations per second.

– “TCP/IP offload cards suck”

– does not scale down as well as it scales up, but they plan to work on it. The architecture should support it – would consider it a personal failure otherwise.

Very interesting talk, all in all.

Dave Jones Resurrecting Unmaintained Code

Filed under: Uncategorized — Muli Ben-Yehuda @ 3:36 PM

Thu, 10:00 AM

Listening to Dave Jones talk about resurrecting unmaintained code. This is one of the “fluffier” talks, where it seems to me that it was accepted to OLS more because of who the speaker is than because of technical interest. Regardless, Dave is an entertaining speaker. Despite some technical difficulties at the beginning of the talk, it is going rather well.

So far, lots of common sense on how to write maintainable code. For example, “write small functions that do one thing well” (some of my coworkers should have this tattooed on their foreheads) and “put different functionality in different .c files”. TODO: Code snippets to the contrary of these dictums would make good All Code Sucks posts.

The talk’s examples include the MTRR[0] driver and agpgart code.

I spent most of this talk thinking about and writing a “how to port syscalltrack’s system call hijacking code to 2.5” paper.

[0] Documentation/mtrr.txt: “On Intel P6 family processors (Pentium Pro, Pentium II and later) the Memory Type Range Registers (MTRRs) may be used to control processor access to memory ranges. This is most useful when you have a video (VGA) card on a PCI or AGP bus. Enabling write-combining allows bus write transfers to be combined into a larger transfer before bursting over the PCI/AGP bus. This can increase performance of image write operations 2.5 times or more.”

July 26, 2003

be back after the commercials

Filed under: Uncategorized — Muli Ben-Yehuda @ 11:09 PM

Network going down now. OLS great. Be back on Wed, rest of OLS updates then.

Muli, signing off from the OLS soon-to-no-longer-be network room. Cheers!

July 25, 2003

Paul Mackerras, Low Level Optimizations in the PowerPC Kernel

Filed under: Uncategorized — Muli Ben-Yehuda @ 11:26 PM

Wed, 15:00

After Jon Corbet’s porting to 2.6 talk, went to hear Hirokazu Takata talking about porting Linux to the M32R architecture. This talk, how shall I put it gently, left a lot to be desired. The speaker, while obviously knowing what he was talking about, spoke in a monotone, broken English that made following along practically impossible. I gave up and decided to read the paper instead, and went out in the middle.

After this talk, went to eat lunch with zwane, sarnold, Nick Piggin and Orna. For some inexplicable reason, we ended up in the mall’s fast food court, and the food’s quality was about as good as could be expected – that is, not at all.

After lunch, went to hear Matthew Porter’s Bringing PowerPC Book E Processor to Linux talk. This was a pretty good talk, which was a relief after the previous two talks. Low level, dealing with memory management and the peculiarities of the Book E processor, which is a PowerPC variant.

Then I went to hear Paul Mackerras, on Low Level Optimizations in the PowerPC Kernel. This was an excellent talk, dealing with three low level optimizations: PTE management, memcpy implementations and cache management (the PPC architecture is not cache coherent). I really enjoyed this talk, not least because Paul Mackerras is a great speaker, clear and interesting. I wonder if I could get a PPC machine from work to play with 😉

At this point, I have to make a small digression. OLS thus far is not what I expected. It seems that I don’t have anything to talk to people about, with syscalltrack development at a standstill, and my kernel work being too trivial to mention, in my (probably wrong) opinion. It’s probably all in my head, but it makes me feel pretty lousy. After Paul’s talk I headed back to the room since my eyes were burning from the contact lenses I left in overnight (don’t do that, kids). I ate a quick dinner, and instead of going back, fell asleep and woke up at 1 AM. Missed the last talk of the day and the welcome reception. Oh well. I briefly considered going out to a random pub and finding who was there, and instead read until sometime in the early morning, and then went back to sleep. So much for OLS, day 1.

Next: Dave Jones on resurrecting unmaintained code.

Jon Corbet’s Porting Drivers to 2.6 talk

Filed under: Uncategorized — Muli Ben-Yehuda @ 11:26 PM

Wed, July 23rd, 10:00 AM

Jon Corbet on Porting Drivers to 2.6

I missed the first few minutes of this talk due to writing down the previous day’s escapades. Tuning back in, Jon is talking about the changed interrupt handler signature. Nothing new so far, I’ve been following this stuff pretty closely during 2.5 development.
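For reference, the shape of the new-style handler, mocked up so it compiles outside the kernel. The typedef, the two return macros and struct pt_regs are user-space stand-ins for the kernel’s definitions, and the device struct and handler name are hypothetical. The point is only that a 2.4 handler returned void, while a 2.6 handler returns irqreturn_t to say whether the interrupt was actually its own.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-ins for the kernel definitions, for illustration only. */
typedef int irqreturn_t;
#define IRQ_NONE    0
#define IRQ_HANDLED 1
struct pt_regs;                     /* opaque here */

/* Hypothetical device state. */
struct toy_dev {
    int irq_pending;                /* the device raised an interrupt */
    int serviced;                   /* how many interrupts we handled */
};

/*
 * 2.6-style handler: same arguments as the 2.4 version, but it reports
 * IRQ_NONE when the device did not interrupt (useful on shared lines)
 * and IRQ_HANDLED when it did.
 */
irqreturn_t toy_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
    struct toy_dev *dev = dev_id;

    (void)irq;
    (void)regs;
    assert(dev != NULL);

    if (!dev->irq_pending)
        return IRQ_NONE;

    dev->irq_pending = 0;
    dev->serviced++;
    return IRQ_HANDLED;
}
```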

Next slides: the linux device model. Impact on drivers can be small, since things are mostly handled at the bus level. Again, nothing new.

Just spotted Ted Ts’o sneaking into the talk.

Considering heading out of the talk to go and reply to my email… should I? No. But I know this stuff! Yes, but you’re here to listen to talks, not to do email!

Obviously, the email daemon won and I ended up going out of the talk. In my defense, I really did know what he was talking about – even wrote code to interface with some of it. Took care of email, met Bill Irwin and Dave McCracken. Continued debugging my initscripts problem. The problem was that /etc/init.d/network would claim that the network was up, even when it wasn’t. Looking at the logs, they were filled with garbage such as “Jul 24 00:31:22 hydra ifdown: ./ifdown: ++: command not found”. Eventually, I found the problem – /etc/init.d/network calls /etc/sysconfig/network-scripts/ifdown, which executes a function from another file, which sources /etc/sysconfig/network-scripts/eth0. Somehow, this file was filled with garbage. I removed it, and now everything is working fine. Hmpf.

During the break, met Shawn Starr, Andrew J. Hutton and erikm.

Next: Paul Mackerras, Low Level Optimizations in the PowerPC Kernel.

What we did on Tuesday

Filed under: Uncategorized — Muli Ben-Yehuda @ 5:48 PM

Wed, July 23rd, 10:00

On Tuesday morning, we woke up pretty early, at 6 AM in fact. I blame the jet lag. We went out in search of breakfast, strolled through ByWard Market, which was in the process of being set up, and eventually ended up at Tim Horton’s, in Rideau Mall. So much for our search for quality food. At 8 AM, we went to the local supermarket, Loeb’s, and bought 91 CAD worth of food. Took the food to the room, arranged it in the fridge and cupboards, and went back to sleep.

Around 2 PM, we woke up. Cooked a tasty lunch from the food we bought, and then went to register for OLS (happy happy joy joy!). At the registration desk, we discovered that no wireless cards will be supplied this year. Since the thought of having no network access was appalling (for me, at least… Orna could handle it just fine) we set out to buy a wireless card.

The first step was to take down the model numbers of all of the cards available in the local computer shop, compucenter. Then, I went to Chapters’ internet cafe to see which of the cards are supported in Linux. Buying a non-supported card can be fun, if documents are available, but I needed it working now. Turns out that only one(!) of the cards, a USR 2410 card, works. While I was googling, Orna was shopping at Chapters. I told her that I’m done, looked at the gazillion books she picked up, waited for her to pay for them, and then we went together to buy the card – only to find someone else from OLS bought it! The only copy! Argh!

We immediately set out on a long walk to the next nearest computer store in the area, and bought a card that appeared to be supported, a D-Link DWL-650+ card. The card looked like it requires a driver from linux-wlan.org, which requires net access to download. Chicken and egg problem, indeed. What I intended to do was boot into the never used Windows partition on hydra, my IBM laptop, use the drivers supplied with the card to download the Linux driver, and then use that. Calm in the knowledge that I have a card that should work, while still experienced enough to have that ominous feeling in the pit of my stomach that says anything that can go wrong, will, we went back to the hotel room. In the room, I discovered that I have no CDROM drive on the laptop, and thus no way to install the supplied Windows drivers, and so no way to download the Linux drivers. Banged head against wall several times.

By this time, Orna was growing restless (to put it mildly) that we’re chasing after wireless cards instead of touring Ottawa, like good little tourists (it’s my diary, I’m allowed to whine). So we went out to look at the river, talk about various things, and try to synchronize our expectations. Eventually we went back to Les Suites, after buying disinfectant for my hand and Taco Bell for dinner. At the lobby, we met the illustrious zwane and the magnificent sarnold. The four of us went out to the Highlander Pub, where much merriment was had. Eventually stumbled back to the hotel (I had a Vodka Martini and some Cognac, *hic*), and went to sleep.

In the morning, I woke up early and went to OLS’s network area to download a driver for the wireless card. After banging head against wall, a kernel compile and some heavy RTFM’ing, discovered that contrary to what I thought, the card is not really supported! Orna took charge, and we went back to the store and returned it. By the time we got back to the congress center, it was 10 AM, and Orna went to hear the performance talk, while I went to Jon Corbet’s Porting Drivers to 2.6 talk. On the way in, met the venerable Behdad Esfahbod.

Next: Jon Corbet’s Porting Drivers to 2.6 talk.

up and awake at 6 AM

Filed under: Uncategorized — Muli Ben-Yehuda @ 4:51 PM

Tue, July 22nd, 06:15

It’s six AM and I’m up and bouncy. I blame the jet lag.

Last evening, we walked around downtown Ottawa. We had good weather, and Ottawa is beautiful. Wide streets, very little traffic compared to what I expected, and people of many ethnicities. A few minutes of walking around were enough for me to ask Orna if she might want to live here :->

We stopped at Chapters in order to check out the internet cafe and email Oleg about the wireless cards used at IBM HRL. Naturally, the internet cafe was out of order. Then we took the Ottawa Haunted Walk tour. Orna says it’s about typical, but I was disappointed with it. We didn’t get into any supposedly haunted locations, just stood outside and heard stories about them, which were mostly repetitive and didn’t have any “interesting” details. Orna says it was light on history, too. It was a nice excuse to walk through downtown for a couple of hours, though.

After the haunted walk ended at 21:30, we watched a light and sound show on Parliament Hill. Then we went looking for food at the Rideau Mall, and due to the late hour all we found was orange juice and Reese’s peanut butter cups. At least it was a good excuse to get the peanut butter cups 😉 Went back to the hotel room, debated which movie to watch on the hotel pay-per-view system – Orna wanted Holes, I considered Tears of the Sun. We ended up watching The Simpsons on one of the local TV channels, and eventually fell asleep.

OLS registration begins today!
