Lab on making your own system call.

Start the work of this lab in the first week's meeting and finish it during the second week's meeting. The writing homework is still due one week from the Friday of the first week, and the programming two weeks from the Friday of the second week. Use the second meeting to work on some of the programming homework technicalities and design work.

This lab assignment is connected with a homework assignment. It would be a good idea to begin with the reading part of the homework before the lab, then do the lab assignment in the lab (or on your own system) and finish the writing, programming and experimentation parts of the homework afterwards.

This lab uses all your work from the previous lab: a virtual machine with our modified OpenSuSE Linux installed and your Linux kernel build tree. If you are using a different computer from last week, see me about getting a new virtual machine plus a kernel source tree in your huge subdirectory under /local/csi400+500/, copying your old work to the machine you are using now, or both. If you start with a new kernel source tree, REMEMBER to get our

Note: In the lab, we found that a full kernel build took about 10 min even with up to 10 concurrent jobs (with make -j 10 bzImage), and that copying the Oct4-6.vdi file over the network with ssh also took about 10 min.

Necessary technicalities for making your own Linux 2.6 or 3.0 system call.

A good way to find out about all of the things you must do to add a new system call is to find all the things that pertain to a particularly simple system call such as getpid().

Each system call service routine is defined as a C function with one of the macros SYSCALL_DEFINE0 to SYSCALL_DEFINE6. You must name the system call in the macro invocation. For example, the kernel code has SYSCALL_DEFINE0(getpid) plus a C function body (in {...} braces) to define the system call named getpid. You can find this example in kernel/timer.c.

These macros generate a prototype for a function with the name constructed by prepending "sys_"; so the system call service routine for the getpid system call is the C function named sys_getpid.
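The name construction can be imitated in user space with the C preprocessor's ## token-pasting operator. The sketch below is a simplified stand-in and not the kernel's actual definition (the real macros are in include/linux/syscalls.h and do more). DEMO_SYSCALL_DEFINE0, the system call name demo, and call_demo are hypothetical names chosen only for illustration.

```c
/* Hypothetical, simplified imitation of the kernel macro: it pastes
   "sys_" in front of the given name to form the function name. */
#define DEMO_SYSCALL_DEFINE0(name) long sys_##name(void)

/* Expands to:  long sys_demo(void)  followed by the body below. */
DEMO_SYSCALL_DEFINE0(demo)
{
    return 42;
}

/* Calls the generated function by its constructed name. */
long call_demo(void)
{
    return sys_demo();
}
```

Calling call_demo() from a main() function runs the body that was written under the macro invocation, which shows that the function really is named sys_demo.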

The definitions of these SYSCALL_DEFINE... macros are in include/linux/syscalls.h. Hence, the .c file in which you code the body of your syscall's service routine must

#include <linux/syscalls.h>

For consistency's sake, please name your system call csi500

  1. A function prototype (a declaration, not a definition) must be coded in include/linux/syscalls.h

    Therefore, you must modify the existing include/linux/syscalls.h file. You will write a function declaration for sys_csi500 by following the pattern you see for the other sys_.. function names declared there.

  2. You must add a new entry to the table of system call service routine addresses. The position (counting from zero) of the new system call's entry in this table is the number of your new system call. You must record this position (index) so you know what number to write in your system call testing application.

    The lab systems are 32 bit x86, so because it is only for the lab, it's only necessary to modify the table for 32-bit systems. That table is in file arch/x86/kernel/syscall_table_32.S

    There is also a file that defines one macro that expands to each syscall number. It is

    arch/x86/include/asm/unistd_32.h

    It is not necessary for you to modify unistd_32.h

    If you must build a 64 bit x86 kernel, you also have to put a new entry in the table in the file arch/x86/include/asm/unistd_64.h. The 64-bit code #includes this file (twice!) into arch/x86/kernel/syscall_64.c to build the system call table.

  3. You will have to modify kernel/Makefile so when the kbuild system processes that Makefile, it will compile your csi500.c file and link the resulting csi500.o object file into the kernel.

    Study the contents of kernel/Makefile and figure out how you need to modify it so csi500.c is included in the kernel build. Hint: Just add one entry to the end of one line!

  4. We want you to code the body, data structures, supporting functions, etc. for your new system call in a separate kernel source file. Create a new file named csi500.c in the kernel/ subdirectory of your source tree. As described above, this file must #include <linux/syscalls.h> and define the service routine with a SYSCALL_DEFINE macro.

    Here is an especially simplistic system call sample:

    #include <linux/syscalls.h>
    #include <linux/printk.h>
    SYSCALL_DEFINE0(csi500)
    {
       printk("<0> Hello from the csi500 system call written by ...\n");
       return 0;
    }

  5. You must of course (1) build your modified kernel (as done in the previous lab), (2) copy your modified kernel into /boot in the form of a "compressed kernel" named bzImage, and (3) reboot the virtual machine so it runs your revised kernel. When you build that kernel in step (1), use the command

    "make -j 10 bzImage" with the concurrent jobs option -j to speed things up.

  6. In order to test your system call, you must ALSO provide an application program, in the virtual machine, that calls your system call. Since our minimal pre-built installation disk image (Oct4-6.vdi) does not have the C compiler, you can either (1) install the C compiler in it or (2) build your application program on the host (native desktop) system and copy its executable file into the virtual machine.

    In either case, you must (1) know the system call number and (2) program the invocation of the system call in a special way. Traditionally, system call invocations were coded in assembly language, but fortunately, the GNU/Linux C library has a handy C function named syscall() for this purpose.

    Write and compile an application program, get it into your virtual machine and run it to test your system call. Refer to documentation for syscall() from man syscall on a cheese machine to see how to write your application program. This documentation specifies which header files must be #included.

    Here is a particularly simple application program sample (For my kernel build, the system call number was 341 but it might be different for you!):

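    #define _GNU_SOURCE
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <stdio.h>
    #include <errno.h>
    
    int main(int argc, char *argv[])
    {
      /* 341 was the table index in my build; use your own number. */
      long retval = syscall(341);
      int saved_errno = errno;   /* printf() below may change errno */
      printf("My system call returned %ld.\n", retval);
      if(retval < 0)
      {
        errno = saved_errno;     /* restore it so perror() reports the right error */
        perror("My system call returned with an error code");
      }
      return 0;
    }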
        
  7. Optional but recommended: Use the openSUSE system administration tool yast (run by root, of course) to perform "Software Management" to install the gcc compiler. Hint: You can navigate to entry boxes and menus with an Alt- key combination. Search for gcc. Observe that the package manager automatically selects many other packages that the gcc package depends on, to install as well.
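Before testing csi500 as in step 6, it can help to confirm that the syscall() mechanism itself works in your environment by invoking a system call that already exists. The sketch below invokes getpid by number and compares the result with the ordinary getpid() library wrapper. The function name syscall_mechanism_works is a hypothetical name chosen for illustration; SYS_getpid is the constant provided by <sys/syscall.h>.

```c
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/types.h>

/* Returns 1 if invoking getpid by number through syscall() gives the
   same process ID as the ordinary getpid() library wrapper, else 0. */
int syscall_mechanism_works(void)
{
    long by_number = syscall(SYS_getpid);   /* invoke by system call number */
    pid_t by_wrapper = getpid();            /* invoke through the C library */
    return by_number == (long)by_wrapper;
}
```

If this check succeeds from your main() but syscall() with your own number fails, the problem is more likely in the kernel build or in the number you recorded than in the application program.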

Homework 07 System Call Lab Reading and Question Answering:

Read Chapter 10 of ULK to enable you to answer these questions. Write the answers and submit them on paper or by email to the TA by Wednesday, Oct 24.

  1. Why is a system call always made with one software interrupt but an API function might be implemented with none, one or more software interrupts?
  2. Why are there many more system call service routines or functions than there are system call handler routines?
  3. Find the definition of the SAVE_ALL macro in the x86 Linux kernel code and explain exactly what it does. (Consult http://lxr.linux.no/#linux+v3.0.4/arch/x86/kernel/entry_32.S#L194 and use the fact that the pushl_cfi macro expands to a pushl instruction, possibly followed by a directive that emits data into a non-executable section. Ignore this _cfi stuff for now.)
  4. How does the 32-bit x86 Linux kernel get the address of the thread_info structure into the %ebx register when a system call is made with an int 0x80 exception-causing "int" instruction? What is the address of the thread_info structure when, for example, the kernel stack pointer value happens to be 0xDEA24A20?
  5. Exactly which instruction (in assembly language) makes the proper system call service routine be called based on the system call number? Assume the system call number is valid. What happens when an invalid system call number has been put into %eax by the user program by code like

    movl $536,%eax

    int $0x80

    ?

  6. A system call service routine, like the sys_csi500() service routine you wrote, passes its return value in register %eax. How does that value in %eax, on return from the service routine, end up in %eax when the system call eventually returns to the user thread that called it, even though the kernel might have rescheduled many times before that return occurs? This question is about a detail of exiting a system call.

A simple but real experiment.

Part 1 User Level

  1. Do user level programming in a directory entirely separate from the kernel building. I strongly recommend making a directory named KernelCalling under your home directory (as user master) on the virtual machine.
  2. Write a C program that runs a loop to access, in subscript order, all the elements in a large array (length 100,000; millions or perhaps more.) Expand the program so the same loop (or a copy of it) is run three or four additional times.
  3. Find out how to use the gettimeofday() C library function. Code calls to this function and printf() operations to report the time in seconds (reporting microseconds as well is optional) (1) before any loop runs and (2) after each run of a loop. Hence, if the loop runs 3 times, there will be 4 printf() operations.
  4. Experiment with various large sizes of arrays and record, for each one, the times printed by your program and calculate the time each loop took to run.
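The timing pattern in Part 1 can be sketched as follows. This is only one possible arrangement, under stated assumptions: the function names elapsed_seconds and time_one_pass are hypothetical, the array is allocated on the heap rather than as a large local variable, and the code is assumed to be compiled without optimization so the compiler does not remove the loop.

```c
#include <stdlib.h>
#include <sys/time.h>

/* Seconds elapsed between two gettimeofday() readings. */
double elapsed_seconds(struct timeval start, struct timeval end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_usec - start.tv_usec) / 1e6;
}

/* Touches every element of a heap-allocated array in subscript order
   and returns the time that one pass took, or -1.0 if allocation fails. */
double time_one_pass(size_t length)
{
    struct timeval before, after;
    size_t i;
    int *array = malloc(length * sizeof *array);

    if (array == NULL)
        return -1.0;

    gettimeofday(&before, NULL);
    for (i = 0; i < length; i++)
        array[i] = (int)i;          /* write access to each element */
    gettimeofday(&after, NULL);

    free(array);
    return elapsed_seconds(before, after);
}
```

A main() function could call time_one_pass() several times with the same length and print each result with printf("%f\n", ...). For the lab itself the array should be kept alive across the repeated loops, so treat this only as an illustration of how gettimeofday() readings are subtracted.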

Part 2 Kernel Level

  1. Look at struct task_struct in the Linux sources. (Remember, it's the key data type used by the scheduler, which is a module of the kernel. Directory include/linux contains Linux header files and kernel/ contains the modules of the kernel.) Your job is to extend the code of your system call so that the system call prints to the system log the values of the two fields in struct task_struct that record (1) the count of "major page faults" and (2) the count of "minor page faults". Of course, the task_struct to read from is the current thread's. The identifier current points to it.

    You will have to find out how to use the proper C99 library printf format codes for printing integers of the type of those two task_struct fields. The kernel's printk library uses the same format codes (except for one) as the user level C function printf. HINT: The answer is %lu

  2. After modifying csi500.c, try a kernel build with make -j 10 bzImage (be sure to cd .. to land in the top level kernel source directory before you type make -j 10 bzImage. Use the -j option for speed!) Fix any C errors and after a compile is successful, copy the kernel into your virtual machine.
  3. Modify your user level program so it always calls the csi500 system call in association with printing the time of day. Remember to get your user level application into the virtual machine and to reboot into your kernel with your new system call before you try to do the experiment!
  4. Get it to work and repeat your observations. This time, record, in addition to the times of day, the numbers of minor and major page faults that you can read from the system log or from the console. Optional: If you wish, downgrade the urgency of your system call's printk messages so they don't clutter the console but still appear in the system log. Read the system log with the dmesg command, or with

    sudo less /var/log/messages

Homework 08 System Call and Page Fault Lab Programming:

Submit your writing (word processed or scanned), an archive of the programs you wrote (the csi500.c and csi500.h files plus user level application source files), and a report of the experimental results to Blackboard by Mon, Oct 29 for full credit.

Warmup question: (using discrete mathematics)

First law of cache performance:

Average access time = (hit ratio × hit time) + ((1 − hit ratio) × (hit time + miss penalty))
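As a small worked illustration of the single-level formula, the sketch below evaluates it for given inputs. The function name average_access_time and the sample numbers in the note that follows are hypothetical and are not measurements from the lab systems.

```c
#include <assert.h>

/* Average access time for a single cache level, per the first law of
   cache performance. hit_ratio must lie in [0, 1]; both times must be
   given in the same unit, and the result is in that unit. */
double average_access_time(double hit_ratio, double hit_time, double miss_penalty)
{
    assert(hit_ratio >= 0.0 && hit_ratio <= 1.0);
    return hit_ratio * hit_time
         + (1.0 - hit_ratio) * (hit_time + miss_penalty);
}
```

For example, with an assumed hit ratio of 0.9, a hit time of 1 ns, and a miss penalty of 100 ns, the formula gives 0.9 × 1 + 0.1 × 101 = 11 ns.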

It's a good exercise to extend this formula to 2 (or more) levels of cache. YOU give names to the hit ratios for the two cache levels, and names for the relevant hit times and miss penalties, and then figure out and write the extension of this formula to a two level cache system. Include your written explanation of how to figure out this formula, say if you forgot it and were disconnected from the Web!

Find out from Chapter 10 of ULK, somewhere inside the section on Parameter Passing, how to pass parameters and make the system call write data into a C structure when the user passes the address of the structure as one of the system call parameters. (Homework for graduate students; optional for undergrads: Read the entire remainder of Chapter 10 from Parameter Passing.)

Modify your csi500 system call so that it copies to user space (like the read( ) system call does) the long unsigned values of the minor and major page fault counts from the current task's task_struct. Keep the code to write the same data to the system log so you can check your new code. Of course, you must modify your user level program to call your modified system call with the right parameters and to print the numbers of minor and major faults. The creative part of this step is to design a suitable C structure to hold the two page fault counts, and program your system call and application program so the system call service routine receives a pointer (i.e., address in user space) to one of your C structures and copies the two counts into the structure whose address it receives. Your application program will allocate one of these structures, pass its address in the syscall( ... ) function call, and then print the data from within the structure.
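To see the user side of this pattern with a system call that already exists, the sketch below passes the address of a structure through syscall() and lets the kernel fill it in. It uses the standard getrusage system call, not csi500, and the function name fetch_usage is hypothetical. Your own structure and parameter list for csi500 remain yours to design.

```c
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <sys/resource.h>

/* Asks the kernel to fill *usage for the calling process. The kernel
   copies the data into the structure whose user-space address it
   receives. Returns 0 on success, -1 with errno set on failure. */
long fetch_usage(struct rusage *usage)
{
    return syscall(SYS_getrusage, RUSAGE_SELF, usage);
}
```

After a successful call, the fields ru_minflt and ru_majflt of the structure hold the minor and major page fault counts that getrusage reports for the process. For a single-threaded program these may be useful as a rough cross-check against the numbers your own system call returns.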

After testing, prepare to capture in a text file the record of the final experiments. Before each experiment, give the script command followed by the name of the file to record into. All console input and output will be recorded into the named file. Type Control-d (end-of-file) to stop the recording. Transfer your record files to the non-virtual system.

Finally, take advantage of the easier-to-read output from your revised benchmark program: redo (and re-record!) the benchmarks and perhaps do additional ones. You will get full credit for the lab when you transmit results that are well-defined and useful enough for me and the class to analyze.

Pitfalls in modifying the lab csi500 system call so it writes to user space the numbers of minor and major page faults:

Some miscellaneous notes and links:

(2012 revision; updated with the local big array pitfall 10/11/2012)