Kernel newbie questions
Posted by Ravi on March 5th, 2004


I am trying to understand the Linux kernel and I have a few questions.

1. I am a little confused about the kernel's identity. Is the
kernel a separate process (or a collection of processes handling process
scheduling, memory management, etc.), or is there no such thing as a
kernel process, with all kernel code executed by normal user processes
while in kernel mode? If the latter is the case (and I think it is, as
all processes have a kernel-mode stack), how does scheduling work? Say a
user process A is running and it receives a timer interrupt. If there is
no separate scheduler process, then A (in kernel mode) must do the
scheduling to figure out which process to run next. Is this correct? But
I have never seen the code that invokes the kernel scheduler function(s)
as part of any user process. Also, is it possible to manually put a
process in kernel mode, i.e. other than in response to an interrupt or a
system-call invocation? And if the kernel is non-preemptible, why is
there a kernel-mode stack in every process?

2. What is the difference between kswapd and bdflush? These appear to
deal with memory management and are dedicated processes. So why is it
that process scheduling is not done by a separate process while memory
management is?

3. Can you recommend a good book on the Linux kernel for a beginner?
There is so much material online that I don't know what to read or
where to start, so I think investing in a book (which presents a lot of
information in one place) might be a good idea. What do you think of
"Understanding the Linux Kernel", 2nd edition, from O'Reilly
(even though it covers kernel 2.4)?

Your suggestions are highly appreciated.

Thanks,
Ravi.

Posted by Nick Landsberg on March 5th, 2004




Ravi wrote:

[trying to dredge up ancient memories and simplify them]

The kernel is conceptually a large process which decides
what to do next based on a multitude of criteria. You will
NOT see it in ps output, as such.

The kernel is responsible, among other things, for process
scheduling and resource allocation. It runs in "kernel"
space, and user processes run in "user" space.

When a user process is created, an entry is made in the
"process table" for that process.

When it is time to switch from one user process to another
(either because the user process did a blocking system call
such as a read, or its time allocation was exceeded, or
the clock interrupted the kernel, etc.), the process table
is examined for a list of "runnable" processes.
(Processes which are waiting on I/O, for example, aren't "runnable")
Once a choice of one of the "runnable" processes is
made, the current process's "u_block" (pc, registers, etc.)
is copied to a safe place (within a kernel buffer), and the
"u_block" of the new process replaces that one.
Execution in the user process begins from the PC
(program counter) of the newly runnable process.

This is greatly simplified, and may not be totally
accurate for the more modern kernels, but it should
give you a basic idea of what happens.
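To make the description above concrete, here is a toy Python model of a process table and a context switch. Everything in it (the UBlock and Proc classes, pick_next, context_switch) is a hypothetical illustration of this simplified picture, not real kernel code: the "CPU" is just a UBlock holding a program counter and registers, and the scan simply walks the table round-robin looking for the next runnable entry.

```python
from dataclasses import dataclass, field


@dataclass
class UBlock:
    """Hypothetical save area: program counter plus general registers."""
    pc: int = 0
    regs: dict = field(default_factory=dict)


@dataclass
class Proc:
    """One entry in the toy "process table"."""
    pid: int
    runnable: bool
    ublock: UBlock


def pick_next(table, current_pid):
    """Scan the table round-robin, starting after the current entry,
    and return the first runnable process (or None if none is runnable)."""
    n = len(table)
    start = next((i for i, p in enumerate(table) if p.pid == current_pid), -1)
    for step in range(1, n + 1):
        candidate = table[(start + step) % n]
        if candidate.runnable:
            return candidate
    return None


def context_switch(table, current, cpu):
    """Copy the live CPU state into the outgoing process's u_block,
    then load the u_block of the next runnable process onto the CPU."""
    current.ublock = UBlock(cpu.pc, dict(cpu.regs))  # save to a "safe place"
    nxt = pick_next(table, current.pid)
    if nxt is None:
        return None, cpu  # nothing runnable: a real kernel would idle here
    return nxt, UBlock(nxt.ublock.pc, dict(nxt.ublock.regs))


# Process 2 is waiting on I/O, so it is skipped.
table = [Proc(1, True, UBlock(100)), Proc(2, False, UBlock(200)), Proc(3, True, UBlock(300))]
cpu = UBlock(104, {"r0": 7})  # process 1 has advanced to pc=104
running, cpu = context_switch(table, table[0], cpu)
print(running.pid, cpu.pc)  # 3 300
```

Process 1's state (pc=104, r0=7) now sits in its own table entry, ready to be reloaded the next time it is picked.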

[SNIP]

--
"It is impossible to make anything foolproof because fools are so
ingenious" - A. Bloch


Posted by Ravi on March 5th, 2004


Nick Landsberg wrote:

Thank you for the reply.

Who does this examination of the process table? Is there a separate
scheduler process that does this or is this done by the currently
running process (taking on the role of a virtual kernel process
momentarily)? I believe there is no separate scheduler process but there
is a separate process for memory management (kswapd, etc.) I wonder why.

Thanks,
Ravi.

Posted by Nick Landsberg on March 5th, 2004




Ravi wrote:

It depends on the implementation, but it is NOT the currently
running user-level process. When the clock interrupt happens,
for example, you "trap" to the kernel and it takes over.
Different implementations may choose to make this a separate
kernel process or just plain routines in the main kernel
process. If I remember correctly, the clock interrupt is
almost always at the highest priority level, so scheduling
another kernel process to do the process-switch code would
be counterproductive.
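A rough sketch of the point that no separate scheduler process is involved: in this toy model, loosely echoing the time-slice and need-resched approach of 2.6-era Linux, each tick is handled by kernel code entered through a trap from whatever task happens to be running, and the scheduler is just a function that code calls on the way back out. The Task and CPU classes, the field names, and the tick counts are assumptions made for illustration.

```python
from collections import deque


class Task:
    def __init__(self, name, slice_ticks):
        self.name = name
        self.slice_ticks = slice_ticks  # full time slice, in timer ticks
        self.time_slice = slice_ticks   # ticks remaining in the current slice


class CPU:
    def __init__(self, tasks):
        self.runqueue = deque(tasks)
        self.current = self.runqueue.popleft()
        self.need_resched = False
        self.trace = []  # which task is running after each tick

    def schedule(self):
        """An ordinary kernel function, not a process: it is called from
        the interrupt path of whatever task was running."""
        self.current.time_slice = self.current.slice_ticks  # refill the slice
        self.runqueue.append(self.current)
        self.current = self.runqueue.popleft()
        self.need_resched = False

    def timer_interrupt(self):
        """Kernel code entered by a trap from the currently running task."""
        self.current.time_slice -= 1
        if self.current.time_slice == 0:
            self.need_resched = True
        # Before returning to user mode, check whether a switch is due.
        if self.need_resched:
            self.schedule()
        self.trace.append(self.current.name)


cpu = CPU([Task("A", 2), Task("B", 2)])
for _ in range(6):
    cpu.timer_interrupt()
print(cpu.trace)  # ['A', 'B', 'B', 'A', 'A', 'B']
```

Note that the run queue only ever holds A and B; there is no third "scheduler" entry competing for the CPU.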

Memory management is a different animal. Assume you have a set
of processes which, together, exceed the size of physical RAM.
One of these processes is scheduled to run and then, during
its execution phase, generates a "page fault", i.e. it needs
a page of instructions or data which is not currently in memory.

"Page Fault" is a trap to the kernel, similar to an interrupt.
It is also conceptually blocking since the necessary "pages"
will need to be read in from either the swap/paging device
or the original file which defines the executable.
The process is immediately marked "not runnable", another
process is scheduled, and something like the swapping/paging
daemon (kswapd) does the job of
a) if all of memory is full, applying the LRU algorithm to write
out pages to the swap device (a slow operation) and
b) reading in pages either from the swap device or the original
disk image to satisfy the request (also a slow operation)

Once this operation is complete, the process is once
again marked runnable and subject to the usual scheduling
algorithms. This may take several milliseconds, during which
time other processes are using the CPU.
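The LRU step in a) can be sketched with a fixed number of page frames. This is textbook strict LRU in Python, built around a hypothetical LRUMemory class; real kernels, Linux included, generally use cheaper approximations of LRU, and the slow disk I/O is represented here only by recording which pages were written out.

```python
from collections import OrderedDict


class LRUMemory:
    """Toy physical memory of `frames` page frames with strict LRU replacement."""

    def __init__(self, frames):
        self.frames = frames
        self.resident = OrderedDict()  # page -> None, least recently used first
        self.faults = 0
        self.evicted = []              # pages "written out to swap"

    def access(self, page):
        if page in self.resident:
            self.resident.move_to_end(page)  # hit: mark as most recently used
            return
        self.faults += 1                     # page fault: trap to the kernel
        if len(self.resident) >= self.frames:
            victim, _ = self.resident.popitem(last=False)  # least recently used
            self.evicted.append(victim)      # step a): write the victim out
        self.resident[page] = None           # step b): read the page in


mem = LRUMemory(frames=3)
for page in [1, 2, 3, 1, 4, 2, 5]:
    mem.access(page)
print(mem.faults, mem.evicted, list(mem.resident))  # 6 [2, 3, 1] [4, 2, 5]
```

Page 1 survives the first eviction because it was touched again just before page 4 arrived; page 2, the least recently used at that moment, goes instead.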

Again, this is an oversimplification, but is (I hope) correct
in principle.

HTH.




Posted by Jeroen Geilman on March 5th, 2004


Ravi wrote:

There are several separate tables, corresponding to the state a process
can be in - running, sleeping, waiting, dead, etc.
AFAIK there is no one "process table" that holds all processes - it's a
collection of linked lists whose pointers are handed around by the
scheduler.
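The alternative to a single process table can be pictured as one list per state, so the scheduler only ever looks at the runnable list. The state names and the StateLists class below are illustrative assumptions, not the kernel's actual data structures.

```python
from collections import deque
from enum import Enum


class State(Enum):
    RUNNING = "running"
    RUNNABLE = "runnable"
    SLEEPING = "sleeping"  # e.g. waiting on I/O


class StateLists:
    """One queue of PIDs per process state, instead of a single table."""

    def __init__(self):
        self.lists = {state: deque() for state in State}

    def add(self, pid, state):
        self.lists[state].append(pid)

    def move(self, pid, old, new):
        # Changing state is just unlinking from one list and linking onto another.
        self.lists[old].remove(pid)
        self.lists[new].append(pid)

    def pick_next(self):
        # The scheduler only looks at the runnable list; sleepers are never scanned.
        runnable = self.lists[State.RUNNABLE]
        return runnable[0] if runnable else None


s = StateLists()
s.add(1, State.RUNNING)
s.add(2, State.SLEEPING)
s.add(3, State.RUNNABLE)
s.move(1, State.RUNNING, State.SLEEPING)  # process 1 blocks on a read
nxt = s.pick_next()
s.move(nxt, State.RUNNABLE, State.RUNNING)
print(nxt, list(s.lists[State.SLEEPING]))  # 3 [2, 1]
```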

Yes, there is a scheduler running in the kernel - check sched.c to see.
As to it being a separate process - no.

From the viewpoint of the kernel as the master process that either runs
or defers all other processes, the scheduler IS the kernel.
It is its main function when viewed as a multi-processing system.

There are no kernel processes - forget that idea!
There is kernel code, and there are processes.

If you have to view the kernel as a process, it is more akin to the
cracks in the wall - no process ever logically _sees_ the execution of
kernel functions, only their results.

Because those processes are probably not what you think they are.
The kernel MM code is definitely not a process, but an inherent part of
the functioning of the kernel - it is used on every system call, with
every process fork, on every bit of I/O - it's everywhere.

The parts that are represented as processes are just the parts that can
safely be moved outside - enabling the kernel to run tighter and faster.

--
Jeroen Geilman

Analog bits courtesy of adaptr.

Posted by Nick Landsberg on March 5th, 2004




Jeroen Geilman wrote:

Conceptually, you are correct, but certain implementations
create "kernel processes" which are subject to a different
scheduling algorithm. These may handle special cases,
like paging/swapping stuff in/out because it is a
slow function. This is a nit (by the way).

Not just "safely." Sometimes there are performance considerations.

Some early implementations of TCP/IP ran as user processes.
They ran so slowly that they were moved into the kernel.




Posted by Jeroen Geilman on March 5th, 2004


Nick Landsberg wrote:

<nods to nit>

Erm... yes - that's what I meant by "safely", as in: safe to remove from
the kernel without that part losing performance, or especially because it
hogs resources and the kernel runs smoother without it.

I'm not aware of any other reasons for doing this, though.

Ah - this would be the reason the entire TCP/IP stack is in the kernel,
then ?
And also probably why the Windows TCP/IP "stacks" (I use the word
loosely) suck so badly...



Posted by Nick Landsberg on March 6th, 2004




Jeroen Geilman wrote:

[SNIP]

In a word, yes!

In many words ...

You do not want to have time-critical functions, such
as millisecond timeout values, relegated to user space
because the timing is NOT guaranteed due to the
vagaries of the user-process scheduler. (It IS deterministic,
so they say, but the normal human brain cannot comprehend
the determinism.)

In a Unix/Linux world, it appears to be non-deterministic, and
there are packets that may go unprocessed because a user level
TCP "stack" (and it ain't, and I'm glad you used the term
loosely), may not get scheduled in time to service the request.
Once you move it into the kernel, you can schedule the servicing
to be at the next clock tick.

In a Windows world, I have no definite idea of what happens, just
observations. For example, when you are using "Outlook" on a dial
up to access your email, you are effectively locked out from
doing anything else on your machine while Outlook is running,
even over a 56K connection. One may presume that Outlook
is setting its priority such that other processes, other than
the window manager, are effectively locked out. My belief
is that NT (and 2000) have a fixed-priority scheduler, and my
observations are consistent with a process being able
to increase its own priority at the expense of other processes.
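A toy model of the behavior suspected here: under a strict fixed-priority scheduler, the highest-priority process gets every tick, so a process that raises its own priority starves everyone else. This is a hypothetical simulation with made-up priority values and always-runnable processes, not a description of how NT actually schedules threads.

```python
from collections import Counter


def run_fixed_priority(priorities, ticks):
    """Strict fixed-priority scheduling: every tick goes to a process with the
    highest priority; processes tied at that priority share it round-robin.
    All processes are assumed to be always runnable (CPU-bound)."""
    usage = Counter()
    for t in range(ticks):
        top = max(priorities.values())
        ready = [name for name in priorities if priorities[name] == top]
        usage[ready[t % len(ready)]] += 1
    return usage


fair = run_fixed_priority({"mail": 8, "editor": 8, "shell": 8}, ticks=9)
boosted = run_fixed_priority({"mail": 13, "editor": 8, "shell": 8}, ticks=9)
print(dict(fair))     # {'mail': 3, 'editor': 3, 'shell': 3}
print(dict(boosted))  # {'mail': 9}
```

Schedulers that age or dynamically adjust priorities exist precisely to avoid the second outcome.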

Yes ... it sucks.

Nick




