linux kernel: 2007

Wednesday, October 24, 2007

Trace system calls

A nice way to trace the system calls a program makes is the strace command.

Example: strace ls

To get summary statistics per system call (call counts, time, errors) instead: strace -c ls
It is a really nice debugging tool for file system writers.
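To have something small and predictable to trace, here is a minimal sketch of a helper that only creates a file, writes one line, and closes it. The name touch_file and the message are my own, purely for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: create (or truncate) a file, write one line,
 * close it. Returns 0 on success, -1 on any failure. */
int touch_file(const char *path)
{
    static const char msg[] = "hello from strace demo\n";
    assert(path != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* One write() call, so one write line in the trace. */
    ssize_t n = write(fd, msg, sizeof(msg) - 1);
    int rc = (n == (ssize_t)(sizeof(msg) - 1)) ? 0 : -1;

    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Call it from a small main(), build it, and run the binary under strace: the open, write, and close calls should show up in the trace (on recent systems open usually appears as openat).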

Monday, October 22, 2007

Debugging the Linux kernel is an important topic. What is the best way to debug the kernel when facing a race condition, crash, or freeze that we have no control over? In this post I'll try to explore the various ways of debugging the kernel. Let me first list the existing techniques and go through the details of each (let me know if there are others I'm not aware of), followed by an explanation of which one I consider the best for debugging (personal view).

- Logging or debug traces through printk messages (/var/log/messages, dmesg)
- Stepping through the kernel code using the gdb debugger
- Using a serial port to debug from another machine
- Reading the kernel source code
- Sending a magic SysRq key sequence when the machine is frozen
- Analyzing the oops message and dumping the object file

In my view, the best way to debug a kernel is to add trace logs. The trace tells you the last point reached before the crash occurred.

You can use dmesg to print the kernel ring buffer, or simply tail /var/log/messages, which is where most kernel printk messages end up. Similar to printf, printk prints out a kernel message. Keep these traces for debug builds only, as you don't want traces printed everywhere in normal operation.

Using gdb is harder for kernel debugging; has anyone tried it? You also need to build a debug version of the kernel (with debug symbols), if I'm not mistaken.

Using a serial port: I have never tried it, but it is worth trying.

Sometimes you may hit a BUG in the kernel source because your code passed in some wrong values. It's like an ASSERT in the kernel. This is easy to solve, as you can directly read the source code where the BUG occurs. In many cases, though, reading the kernel source can be dangerous, as some may be too tempted to patch it so it won't BUG anymore. In my view, the kernel source is there for you to understand the behavior, not to patch. Until you understand it, don't patch anything...

Another way of debugging is to analyze an oops message. Linux generates an oops message on faults such as an invalid memory access (the kernel counterpart of a segmentation fault). You can also try to force a dump when the machine does not respond anymore; in that case the kernel may be in a deadlock. You can configure the kernel to respond to the magic SysRq key combinations (Alt-SysRq plus a command key).

What is an oops message? An oops message is similar to a kernel dump file on Windows, though with less information. Basically, you can find the last EIP address in it and trace back the last call before the crash.

If your oops message is not detailed enough, you can use the ksymoops command on Linux to generate more detailed information. With an oops file, you can also dump the object file to figure out the last line of code executed before the crash. A nice command for that on Linux is objdump.

That's enough for today; more on this later...

File explained...

A file object is used for reading or writing data. A set of operations is defined to allow reading from and writing to the data. The file structure keeps a pointer to the dentry of the opened file, which in turn holds a reference to the inode. Files are accessible to processes through their file descriptors. The file's owner field records the process ID and the signal used to notify that process when data is available. A file can also be memory mapped directly using mmap.

struct file_operations {
    loff_t (*llseek) (struct file *, loff_t, int);
    ssize_t (*read) (struct file *, char *, size_t, loff_t *);
    ssize_t (*write) (struct file *, const char *, size_t, loff_t *);
    int (*readdir) (struct file *, void *, filldir_t);
    unsigned int (*poll) (struct file *, struct poll_table_struct *);
    int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long);
    int (*mmap) (struct file *, struct vm_area_struct *);
    int (*open) (struct inode *, struct file *);
    int (*flush) (struct file *);
    int (*release) (struct inode *, struct file *);
    int (*fsync) (struct file *, struct dentry *);
    int (*fasync) (int, struct file *, int);
    int (*check_media_change) (kdev_t dev);
    int (*revalidate) (kdev_t dev);
    int (*lock) (struct file *, int, struct file_lock *);
};
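
Seen from user space, the file operations are what the usual system calls end up invoking on the file object behind a file descriptor. Here is a small sketch of that view; read_at is a name I made up for the example, and the comments about which operation each call reaches describe my understanding rather than a trace of the kernel.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to len bytes from path, starting at offset.
 * Returns the number of bytes read, or -1 on error. */
ssize_t read_at(const char *path, off_t offset, char *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_RDONLY);          /* reaches the open operation */
    if (fd < 0)
        return -1;

    if (lseek(fd, offset, SEEK_SET) == (off_t)-1) {   /* reaches llseek */
        close(fd);
        return -1;
    }

    ssize_t n = read(fd, buf, len);         /* reaches read */

    /* close() triggers flush, and release once the last reference
     * to the file object is dropped. */
    close(fd);
    return n;
}
```

The file descriptor is only an index in the process; the current offset moved by lseek lives in the file object itself.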

Dentry explained...

A dentry (directory entry) translates a name to an inode; the cache of dentries is known as the directory cache (dcache). Each name component of a pathname is represented by a dentry object for fast access from the dcache. A name lookup hashes the name and searches the dentry hash table. The dentry data structure contains a pointer to the inode and a pointer to the parent dentry for the parent directory. The latter is useful for rebuilding the whole pathname by walking back up through the parent dentries. As with files, a set of dentry operations is defined for manipulating dentries in the cache.

struct dentry_operations {
    int (*d_revalidate)(struct dentry *, int);
    int (*d_hash) (struct dentry *, struct qstr *);
    int (*d_compare) (struct dentry *, struct qstr *, struct qstr *);
    void (*d_delete)(struct dentry *);
    void (*d_release)(struct dentry *);
    void (*d_iput)(struct dentry *, struct inode *);
};

Inode explained...

VFS deals with inodes, each of which represents a file or directory object (handle). An inode cache is maintained by the VFS layer to keep track of frequently used inodes. Each inode has an inode number that is unique within its file system: ls -i lists the inode number associated with a file or directory. Inode manipulation is implemented through the inode operations, which define how to mkdir, rmdir, create, lookup, rename, get attributes, etc.

struct inode_operations {
    struct file_operations * default_file_ops;
    int (*create) (struct inode *,struct dentry *,int);
    struct dentry * (*lookup) (struct inode *,struct dentry *);
    int (*link) (struct dentry *,struct inode *,struct dentry *);
    int (*unlink) (struct inode *,struct dentry *);
    int (*symlink) (struct inode *,struct dentry *,const char *);
    int (*mkdir) (struct inode *,struct dentry *,int);
    int (*rmdir) (struct inode *,struct dentry *);
    int (*mknod) (struct inode *,struct dentry *,int,int);
    int (*rename) (struct inode *, struct dentry *,
                   struct inode *, struct dentry *);
    int (*readlink) (struct dentry *, char *,int);
    struct dentry * (*follow_link) (struct dentry *, struct dentry *, unsigned int);

    int (*get_block) (struct inode *, long, struct buffer_head *, int);

    int (*readpage) (struct file *, struct page *);
    int (*writepage) (struct file *, struct page *);
    int (*flushpage) (struct inode *, struct page *, unsigned long);

    void (*truncate) (struct inode *);
    int (*permission) (struct inode *, int);
    int (*smap) (struct inode *,int);
    int (*revalidate) (struct dentry *);
};

Super Block explained...

Each mounted file system maintains a super block entry to represent the mounted file system.

struct super_operations {
    void (*read_inode) (struct inode *);
    void (*write_inode) (struct inode *);
    void (*put_inode) (struct inode *);
    void (*delete_inode) (struct inode *);
    int (*notify_change) (struct dentry *, struct iattr *);
    void (*put_super) (struct super_block *);
    void (*write_super) (struct super_block *);
    int (*statfs) (struct super_block *, struct statfs *, int);
    int (*remount_fs) (struct super_block *, int *, char *);
    void (*clear_inode) (struct inode *);
    void (*umount_begin) (struct super_block *);
};

VFS explained

VFS stands for Virtual File System. VFS is an abstraction layer over file systems; it provides an interface, or contract, for anyone interested in implementing a real file system under Linux. With this abstraction, Linux can host multiple file systems without the client being aware of the differences. Some of the well-known file systems are: ext2, ext3, reiserfs, the proc file system, nfs, etc.

To implement your own file system, create a kernel module and, in the module init function, call register_filesystem with a file_system_type structure. This adds the new file system to the list of file systems the kernel knows about for mount and other syscalls. VFS then dispatches each call to the specific file system based on the mount point it belongs to. The client or user is not aware of the VFS switch when crossing from one mount point to another.
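
The list of registered file systems is visible from user space in /proc/filesystems, one type per line, with an optional nodev marker before a tab. A quick way to check that a registration went through is to look for the name there. This is only a sketch: fs_listed is a name I picked, and it takes an already opened stream so it can be tried on any file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return 1 if name appears as a file system type
 * in the stream, 0 otherwise. Each line is "[nodev]<TAB>fsname". */
int fs_listed(FILE *fp, const char *name)
{
    char line[256];
    assert(fp != NULL && name != NULL);

    while (fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\n")] = '\0';       /* drop trailing newline */

        /* The type name is whatever follows the last tab on the line. */
        const char *tab = strrchr(line, '\t');
        const char *fs = tab ? tab + 1 : line;

        if (strcmp(fs, name) == 0)
            return 1;
    }
    return 0;
}
```

Typical use would be to fopen("/proc/filesystems", "r") after loading the module and pass the stream in, together with the name set in the file_system_type structure.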

The file_system_type structure provides an entry point to set up the super block. A super block is created when our specific file system is mounted. The super block contains information about the entire file system and keeps a link to the root dentry, which is the entry at the mount point.

After filling in the file_system_type super block entry, the next step is to provide the specific file system operations: super operations, dentry operations, inode operations, file operations, and export operations.

Super operations handle file-system-wide actions such as remount and put_super (called on unmount). They also define inode handling such as read_inode, write_inode, and delete_inode.

inode operations define the file or directory operations like create, lookup, mkdir, rename, getattr. A file or a directory is represented by an inode.

file operations implement the file specific operations like open, read, write, flush, lock.

dentry operations provide the dentry caching functions, such as revalidating a dentry, hash, compare, delete, and release. A dentry represents a directory entry that is maintained in the dcache. Each component of a directory path is a dentry object. For example, /hello/test is made of the / dentry followed by the hello dentry and the test dentry.
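
To make the "one dentry per path component" idea concrete, here is a user-space sketch that cuts a path the same way. It is only an illustration of the decomposition, not the kernel's lookup code; split_path and MAX_NAME are names I introduced.

```c
#include <assert.h>
#include <string.h>

#define MAX_NAME 256   /* assumed upper bound on a component, incl. NUL */

/* Hypothetical helper: split path into components, storing at most max
 * of them in out. A leading '/' counts as the root component.
 * Returns the number of components, or -1 if they do not fit. */
int split_path(const char *path, char out[][MAX_NAME], int max)
{
    int n = 0;
    assert(path != NULL && out != NULL);

    if (*path == '/') {
        if (n == max)
            return -1;
        strcpy(out[n++], "/");                  /* root dentry */
    }

    while (*path != '\0') {
        while (*path == '/')                    /* skip separators */
            path++;
        size_t len = strcspn(path, "/");        /* length of next name */
        if (len == 0)
            break;
        if (n == max || len >= MAX_NAME)
            return -1;
        memcpy(out[n], path, len);
        out[n][len] = '\0';
        n++;
        path += len;
    }
    return n;
}
```

For /hello/test this yields three components: /, hello, and test, matching the three dentries described above.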

Friday, October 19, 2007

SLAB INFO

The slab info is a useful view of the kernel cache statistics.
It is located at /proc/slabinfo.

A useful command is the watch command to repeatedly display the cache usage:

watch -n 1 "cat /proc/slabinfo | grep inode_cache"

The above command continuously (every second) displays the inode_cache info from the kernel.
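
If you would rather do the filtering from a test program than from the shell, the grep part is easy to sketch in C. filter_lines is a name of my own, and it works on any stream, so you can point it at /proc/slabinfo (which may need root to read, depending on the system).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy every line of in that contains needle to out.
 * Returns the number of matching lines. Assumes lines are shorter than
 * the buffer. */
int filter_lines(FILE *in, const char *needle, FILE *out)
{
    char line[512];
    int matches = 0;
    assert(in != NULL && needle != NULL && out != NULL);

    while (fgets(line, sizeof line, in) != NULL) {
        if (strstr(line, needle) != NULL) {
            fputs(line, out);
            matches++;
        }
    }
    return matches;
}
```

Calling it once per second with "inode_cache" as the needle and stdout as the output gives roughly what the watch command shows.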

Linux Kernel Test

A simple way to stress your kernel module without writing lots of test cases from scratch is to reuse a test framework from the Linux Test Project:

http://ltp.sourceforge.net

Download the LTP test suite and there you go.

The test case I use most is the inod stress for file system development.

Hope it helps

Thursday, October 18, 2007

how to get the current kernel version

uname -r //print system information

-r => kernel release
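
The same information is available to a program through the uname() system call, which is handy when a test tool needs to log the kernel it ran on. A minimal sketch; kernel_release is just a name I chose.

```c
#include <assert.h>
#include <string.h>
#include <sys/utsname.h>

/* Hypothetical helper: copy the kernel release string (what uname -r
 * prints) into buf. Returns 0 on success, -1 on failure or if buf is
 * too small. */
int kernel_release(char *buf, size_t len)
{
    struct utsname u;
    assert(buf != NULL);

    if (uname(&u) != 0)
        return -1;
    if (strlen(u.release) >= len)   /* need room for the NUL too */
        return -1;

    strcpy(buf, u.release);
    return 0;
}
```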

kernel module

Kernel modules can be dynamically loaded at runtime. Some useful commands are:

-- insmod modulename.ko => The ko file is a kernel module object
insmod loads this module into the kernel

-- lsmod => lists all loaded modules; useful to verify that our kernel module is loaded

-- rmmod modulename.ko => used to unload our module from the kernel

No restart is necessary to install/remove kernel modules.
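
lsmod gets its list from /proc/modules, where each line starts with the module name followed by a space. To check from a test program that a module is loaded, something like the sketch below could be used. module_loaded is a hypothetical name, and the function takes an open stream so it is not tied to /proc.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return 1 if a line of the stream starts with
 * name followed by a space, 0 otherwise. */
int module_loaded(FILE *fp, const char *name)
{
    char line[512];
    int at_line_start = 1;      /* false while inside an over-long line */
    size_t nlen;
    assert(fp != NULL && name != NULL);

    nlen = strlen(name);
    while (fgets(line, sizeof line, fp) != NULL) {
        if (at_line_start &&
            strncmp(line, name, nlen) == 0 && line[nlen] == ' ')
            return 1;
        /* Only the chunk that contains '\n' ends the current line. */
        at_line_start = (strchr(line, '\n') != NULL);
    }
    return 0;
}
```

Open /proc/modules with fopen and pass the stream in after insmod, and again after rmmod to confirm the module is gone.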
