Virtual Memory
Demand Paging
virtual = in effect, if not in essence
virtual memory: still limited by swaps
Virtual Memory
entire process need not be in memory before it can execute
major advantage: programs larger than main memory can run (Fig. 9.1)
frees programmers from memory limitations
allows processes to share files easily and implement shared memory (Fig. 9.3)
easy mechanism for process creation (copy-on-write)
motivation
declaration of 100x100 array when typically only a 10x10 portion of it is used
even in cases where the entire program is needed, it is generally not all needed at the same time
benefits:
programmers no longer constrained by amount of main memory
higher degree of multiprogramming → higher CPU utilization and throughput, with no increase in response time or turnaround time
less I/O required to load or swap programs into memory → each program runs faster
more pages required only if the heap or stack grow (Fig. 9.2)
Demand Paging
same idea as pure paging, but now you only load a page on demand
this gives a higher degree of multiprogramming with only a little more overhead
use a lazy swapper (now called a pager) (Fig. 9.4)
initial load (Fig. 9.5)
concept of a page fault
where can a page fault occur (instruction or operand)?
in pure paging, a page fault represents a fatal error
in demand paging, a page fault means an additional page must be brought into memory and the instruction re-started
why must the instruction be restarted?
pure demand paging: bring no page(s) in initially; every process will page fault at least once
concept of locality of reference
Performance of Demand Paging
effective access time
probability of a page fault p
(1-p)*ma + p*page fault time
Page fault time
12 step process; see list on pp. 355-356
Fig. 9.6
service the page-fault interrupt
read in the page
restart the process
Which is the most time-expensive step?
#2 (reading in the page from disk) is the most time-costly step; typically around 8 milliseconds,
plus the waiting time in the queue for the paging device.
ma = 200ns
pft = 8ms
(1-p)*(200) + p*8,000,000 = 200 + 7,999,800p
effective access time is directly proportional to page-fault rate
if p is 1/1000, effective access time is 8.2 microseconds, a slow-down by a factor of 40 (!!!!) because of demand paging
if we want the slowdown to be less than 10%, we need p < 0.0000025
that is, fewer than 1 out of 399,990 accesses should fault
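To make the arithmetic concrete, here is a minimal C sketch of the effective-access-time formula above; the constants (200 ns memory access, 8 ms page-fault service time) are the example values from these notes, not universal figures.

#include <stdio.h>

int main(void)
{
    const double ma  = 200.0;         /* memory access time, in ns             */
    const double pft = 8000000.0;     /* page-fault service time, in ns (8 ms) */

    /* effective access time: (1 - p) * ma + p * pft = 200 + 7,999,800 p */
    double rates[] = { 0.001, 0.0000025 };
    for (int i = 0; i < 2; i++) {
        double p   = rates[i];
        double eat = (1.0 - p) * ma + p * pft;
        printf("p = %.7f -> EAT = %.1f ns (slowdown x%.1f)\n", p, eat, eat / ma);
    }
    return 0;
}

With p = 1/1000 it prints a slowdown of about 41x; with p = 0.0000025 the slowdown stays around 10%, matching the numbers above.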
Copy-on-Write
can we do better than pure demand paging?
recall fork()? seem wasteful? why?
copy-on-write: both processes (parent and child) share all pages initially, and a shared page is only copied if and when either process writes to a shared page
not all shared pages need to be set to copy-on-write (e.g., pages containing the executable code)
Figs. 9.7 & 9.8
vfork() (virtual memory fork):
parent is suspended
child uses address space of the parent
vfork is intended to be used when the child calls exec immediately because no copying of pages takes place
extremely efficient method of process creation
sometimes used to implement command shells
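A hedged sketch of the vfork()/exec pattern just described (standard POSIX calls; the program exec'd, /bin/ls, is an arbitrary choice for illustration):

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = vfork();          /* parent is suspended; child borrows its address space */
    if (pid < 0) {
        perror("vfork");
        return 1;
    }
    if (pid == 0) {
        /* child: must call exec (or _exit) immediately; no pages are copied */
        execlp("/bin/ls", "ls", "-l", (char *)NULL);
        _exit(127);               /* only reached if exec fails */
    }
    waitpid(pid, NULL, 0);        /* parent resumes once the child execs or exits */
    return 0;
}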
References
[OSCJ]
A. Silberschatz, P.B. Galvin, and G. Gagne. Operating System Concepts with Java. John Wiley and Sons, Inc., Eighth edition, 2010.
Page Replacement Algorithms
is it possible to over-allocate memory?
page fault options
terminate process
swap out a process --- reduces the degree of multiprogramming
Basic Page Replacement
see Figs. 9.9 & 9.10
find the location of the desired page on disk
find a free frame:
if there is a free frame, use it
if there is no free frame, use a page-replacement algorithm to select a victim frame
write the victim frame to disk; change the page and frame tables accordingly
read the desired page into the newly freed frame; change the page and frame tables
restart the user process
If no frames are free, two page transfers (one out and one in) are required, which doubles the page-fault service time.
This is a lot of overhead, without even mentioning the time spent waiting on the paging device.
We can reduce this overhead by using a dirty (modify) bit: if the victim page has not been modified (e.g., read-only pages such as program code), it need not be written back to disk. This scheme can cut the I/O time in half when the victim is clean (see the sketch below).
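A minimal sketch of the dirty-bit optimization in the replacement path; the frame structure and the disk-I/O stubs are hypothetical stand-ins, not the book's code:

#include <stdbool.h>
#include <stdio.h>

struct frame {
    int  page;     /* virtual page currently held in this frame   */
    bool dirty;    /* set by hardware when the page is written to */
};

/* stubs standing in for real disk I/O (hypothetical helpers) */
static void write_page_to_disk(int page)  { printf("write-back page %d\n", page); }
static void read_page_from_disk(int page) { printf("read in page %d\n", page); }

/* replace the page in victim frame f with desired_page */
static void replace(struct frame *f, int desired_page)
{
    if (f->dirty)                        /* a modified victim costs a second transfer */
        write_page_to_disk(f->page);     /* a clean victim skips this write-back      */
    read_page_from_disk(desired_page);
    f->page  = desired_page;             /* update page and frame tables accordingly  */
    f->dirty = false;
}

int main(void)
{
    struct frame victim = { .page = 3, .dirty = true };
    replace(&victim, 7);                 /* dirty victim: two transfers */
    replace(&victim, 9);                 /* clean victim: one transfer  */
    return 0;
}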
Two major issues with demand paging which must be addressed:
page-replacement algorithm: how do we select a page to be replaced?
frame-allocation algorithm: how many frames do we allocate to each process?
These are important problems because disk I/O is so expensive.
Page Faults
Let's start with page-replacement algorithms.
How to evaluate algorithms?
We want one which yields the lowest page-fault rate.
Generally, as #frames increase, #page faults should decrease (see Fig. 9.11)
Given a string of memory references (called a reference string), count the number of page faults:
0100, 0432, 0101, 0612, 0102, 0103, 0104, 0101, 0611, 0102, 0103, 0104, 0101, 0610, 0102, 0103, 0104, 0101, 0609, 0102, 0105
Assuming 100 bytes/page, we have:
1, 4, 1, 6, 1, 6, 1, 6, 1, 6, 1
With 3 or more frames: ?
1 frame: ?
Page-Replacement Algorithms
first in, first out (FIFO)
optimal (OPT)
least recently used (LRU)
Reference string: 7,0,1,2,0,3,0,4,2,3,0,3,2,1,2,0,1,7,0,1
FIFO Page Replacement
run prior reference string with 3 frames of memory (Fig. 9.12): ?
easy to understand and implement
poor performance
the replaced page may be an initialization module that was used long ago and is no longer needed (a fine choice)
or it may hold a heavily used variable that was initialized early and is in constant use (a bad choice)
either way everything will still work, but a bad replacement choice
increases the page-fault rate and slows process execution
it does not result in incorrect execution
Belady's anomaly (Fig. 9.13): adding more frames can result in more page faults; consider the reference string 1,2,3,4,1,2,5,1,2,3,4,5 with 3 frames of memory and then 4 (see the sketch below)
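A small, self-contained FIFO simulation (a sketch, not code from the text) that counts page faults on the Belady reference string above; running it with 3 and then 4 frames reproduces the anomaly (9 faults vs. 10):

#include <stdio.h>

#define MAX_FRAMES 8

/* count FIFO page faults for a reference string with nframes frames */
static int fifo_faults(const int *refs, int n, int nframes)
{
    int frames[MAX_FRAMES];
    int next = 0, loaded = 0, faults = 0;

    for (int i = 0; i < n; i++) {
        int hit = 0;
        for (int j = 0; j < loaded; j++)
            if (frames[j] == refs[i]) { hit = 1; break; }
        if (hit)
            continue;
        faults++;
        if (loaded < nframes) {
            frames[loaded++] = refs[i];      /* free frame available */
        } else {
            frames[next] = refs[i];          /* evict the oldest page */
            next = (next + 1) % nframes;
        }
    }
    return faults;
}

int main(void)
{
    int refs[] = { 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 };
    int n = sizeof refs / sizeof refs[0];

    printf("3 frames: %d faults\n", fifo_faults(refs, n, 3));  /* 9  */
    printf("4 frames: %d faults\n", fifo_faults(refs, n, 4));  /* 10 */
    return 0;
}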
Optimal Page Replacement
replace the page which will not be used for the longest period of time (akin to A* search)
run prior reference string with 3 frames of memory (Fig. 9.14): ?
9 faults (OPT) vs. 15 (FIFO), or, ignoring the 3 unavoidable initial faults, 6 vs. 12 (twice as good!)
has lowest page fault rate, and
never suffers from Belady's anomaly
problem? recall SJF?
mainly used for comparison purposes (e.g., algorithm X is within 12.3% of optimal in the worst case)
LRU Page Replacement
idea is to try to approximate the optimum
key difference between FIFO and OPT (other than looking backward vs. forward in time, resp.)
FIFO uses the time when a page was brought into memory
OPT uses the time when a page is to be used
use the recent past as an approximation of the near future; then we can replace the page which has not been used for the longest period of time
run prior reference string with 3 frames of memory (Fig. 9.15): ?
associate with each page the time of its last use
does not suffer from Belady's anomaly
major issue is implementation (need hardware support):
counters
every time a reference is made, copy value of clock register into page table entry
requires a search of the page table and a write to memory (time of use) for each memory access
stack
whenever a page is referenced, remove its page number from the stack and push it onto the top
most recently used page is always on top
least recently used page is always on bottom
use doubly-linked-list with head and tail pointers to implement
no search for replacement page number
Fig. 9.16
LRU belongs to a class of page-replacement algorithms known as stack algorithms which never exhibit Belady's anomaly.
A stack algorithm is one for which it can be shown that the set of pages in memory for n frames is always a subset of the set of pages which would be in memory with n+1 frames.
Needs hardware support: updating a counter or stack on every memory reference would be far too slow to do in software.
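For illustration only, a software simulation of the counter scheme above (a real system would need hardware to stamp the time of use on every reference; here a logical clock stands in); run on the reference string from Fig. 9.15, it yields 12 faults with 3 frames:

#include <stdio.h>

#define NFRAMES 3
#define EMPTY   (-1)

static int  page[NFRAMES];      /* page held by each frame            */
static long last_use[NFRAMES];  /* "clock register" value at last use */
static long clock_ticks = 0;

/* reference a page; returns 1 on a page fault, 0 on a hit */
static int reference(int p)
{
    clock_ticks++;
    int lru = 0;
    for (int i = 0; i < NFRAMES; i++) {
        if (page[i] == p) {               /* hit: stamp the time of use    */
            last_use[i] = clock_ticks;
            return 0;
        }
        if (last_use[i] < last_use[lru])  /* track the least recently used */
            lru = i;
    }
    page[lru] = p;                        /* fault: replace the LRU victim */
    last_use[lru] = clock_ticks;
    return 1;
}

int main(void)
{
    int refs[] = { 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1 };
    int faults = 0;
    for (int i = 0; i < NFRAMES; i++) { page[i] = EMPTY; last_use[i] = 0; }
    for (int i = 0; i < (int)(sizeof refs / sizeof refs[0]); i++)
        faults += reference(refs[i]);
    printf("LRU, %d frames: %d faults\n", NFRAMES, faults);   /* 12 on this string */
    return 0;
}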
LRU-Approximation Page Replacement
Often we have a reference bit associated with each entry in the page table. Can determine which pages have been used and not used by examining the reference bits, although we do not know the "order" of use.
Can we approximate order?
Additional-reference-bits algorithm:
keep an 8-bit history byte for each page; at regular intervals (e.g., every 100 ms), a timer interrupt shifts the reference bit into the high-order bit of the byte, shifts the other bits right one position, and clears the reference bit
00000000: not used in any of the last 8 periods
11111111: used in every one of the last 8 periods
a page with history 11000100 has been used more recently than one with 01110111
if we interpret these bytes as unsigned integers, the page with the lowest value is the LRU page
problem: the values are not unique
solution: replace all pages with the smallest value, or use FIFO to choose among them
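The periodic shift at the heart of the additional-reference-bits algorithm is a one-liner per page; a sketch with a hypothetical page-table-entry structure:

#include <stdint.h>
#include <stdio.h>

struct pte {
    unsigned referenced : 1;   /* reference bit, set by hardware on any access */
    uint8_t  history;          /* 8-bit shift register maintained by the OS    */
};

/* called from a timer interrupt (e.g., every 100 ms) for every page */
static void age_page(struct pte *p)
{
    p->history = (uint8_t)((p->history >> 1) | (p->referenced << 7));
    p->referenced = 0;         /* clear the reference bit for the next interval */
}

int main(void)
{
    struct pte p = { .referenced = 1, .history = 0 };
    age_page(&p);                              /* history becomes 10000000 */
    age_page(&p);                              /* history becomes 01000000 */
    printf("history = 0x%02x\n", p.history);   /* prints 0x40              */
    return 0;
}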
Clock Algorithms
(or second-chance page-replacement algorithms)
a FIFO replacement algorithm
the degenerate case of the additional-reference-bits algorithm: keep 0 history bits, leaving only the reference bit
if the reference bit is 0, replace the page
if it is 1, give the page a second chance and move on to the next FIFO page; reset its reference bit to 0 and reset its arrival time to the current time
a page given a second chance will not be replaced until all other pages have been replaced (FIFO) or given second chances
Fig. 9.17
if a page is used often enough to keep its reference bit set, it will never be replaced
implementation: the clock algorithm using a circular queue
a pointer (hand on a clock) indicates which page is to be replaced next
when a frame is needed, the pointer advances until it finds a page with a 0 reference bit
as it advances, it clears the reference bits
once a victim page is found, the page is replaced, and the new page is inserted in the circular queue in that position
degenerates to FIFO replacement if all bits are set
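A sketch of the circular-queue (clock) implementation just described; the frame array and its size are made-up placeholders:

#include <stdbool.h>
#include <stdio.h>

#define NFRAMES 4

static struct frame {
    int  page;
    bool referenced;          /* set by hardware on each access to the page */
} frames[NFRAMES];

static int hand = 0;          /* the clock hand: next frame to examine */

/* advance the hand until a frame with a clear reference bit is found */
static int choose_victim(void)
{
    for (;;) {
        if (!frames[hand].referenced) {   /* bit is 0: replace this page        */
            int victim = hand;
            hand = (hand + 1) % NFRAMES;
            return victim;
        }
        frames[hand].referenced = false;  /* bit is 1: second chance, clear bit */
        hand = (hand + 1) % NFRAMES;      /* and move on to the next frame      */
    }
}

int main(void)
{
    /* if every bit is set, the scan clears them all and then picks frame 0,
       i.e., the algorithm degenerates to FIFO */
    for (int i = 0; i < NFRAMES; i++)
        frames[i] = (struct frame){ .page = i, .referenced = true };
    printf("victim frame: %d\n", choose_victim());   /* prints 0 */
    return 0;
}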
Enhanced Second-chance Algorithm
by considering the reference bit and dirty bit as an ordered pair, we now have four classes:
(0,0): neither recently used nor modified --- best page to replace
(0,1): not recently used but modified --- not quite as good, because the page will need to be written out before replacement
(1,0): recently used, but clean --- probably will be used again soon
(1,1): recently used and modified --- probably will be used again soon, and the page will need to be written out to disk before it can be replaced
use the same scheme as above, but rather than just checking whether the reference bit is set, we examine the class
replace the first page encountered in the lowest non-empty class
may have to search circular queue several times before we find a page to be replaced
the difference from the algorithm above is that we give preference to keeping pages which have been modified (i.e., we prefer clean victims), to reduce the number of I/O operations required.
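One simple way to realize the "lowest non-empty class" rule, sketched with assumed structures (real implementations often clear reference bits as they scan):

#include <stdbool.h>
#include <stdio.h>

#define NFRAMES 4

static struct frame {
    int  page;
    bool referenced;   /* set by hardware on any access */
    bool dirty;        /* set by hardware on any write  */
} frames[NFRAMES];

static int hand = 0;

/* class 0 = (0,0), 1 = (0,1), 2 = (1,0), 3 = (1,1) */
static int page_class(const struct frame *f)
{
    return ((int)f->referenced << 1) | (int)f->dirty;
}

/* pick the first frame (starting at the hand) in the lowest non-empty class;
   this may require scanning the circular queue several times */
static int choose_victim(void)
{
    for (int wanted = 0; wanted <= 3; wanted++)
        for (int i = 0; i < NFRAMES; i++) {
            int idx = (hand + i) % NFRAMES;
            if (page_class(&frames[idx]) == wanted) {
                hand = (idx + 1) % NFRAMES;
                return idx;
            }
        }
    return -1;   /* unreachable: some class is always non-empty */
}

int main(void)
{
    frames[0] = (struct frame){ .page = 10, .referenced = true,  .dirty = true  };
    frames[1] = (struct frame){ .page = 11, .referenced = true,  .dirty = false };
    frames[2] = (struct frame){ .page = 12, .referenced = false, .dirty = true  };
    frames[3] = (struct frame){ .page = 13, .referenced = false, .dirty = false };
    printf("victim frame: %d\n", choose_victim());   /* frame 3: class (0,0) */
    return 0;
}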
Further Information
counting-based page replacement algorithms
least frequently used (LFU) page-replacement algorithm
most frequently used (MFU) page-replacement algorithm
page-buffering algorithms
applications and page replacement
References
[OSCJ]
A. Silberschatz, P.B. Galvin, and G. Gagne. Operating System Concepts with Java. John Wiley and Sons, Inc., Eighth edition, 2010.
Allocation and Thrashing
When should we run the page replacement algorithm?
Two main questions in the study of virtual memory:
how do we replace pages?
how do we allocate frames?
Allocation of Frames
maintain 3 free frames at all times
consider a machine where all memory-reference instructions have only 1 memory address → we need at least 2 frames (one for the instruction, one for its operand)
now consider indirect modes of addressing
with unbounded indirection, a single instruction could potentially touch every page of virtual memory, which would require the entire virtual memory to be in physical memory
so we place a limit on the levels of indirection
the minimum number of frames per process is defined by the computer architecture
the maximum number of frames is defined by the amount of available physical memory
Allocation Algorithms
m frames, n processes
equal allocation: give each process m/n frames
alternative is to recognize that various processes will need different amounts of memory
consider
1k frame size
62 free frames
student process requiring: 10k
interactive database requiring: 127k
it makes no sense to give each process 31 frames
the student process needs no more than 10 so the additional 21 frames are wasted
proportional allocation:
m frames available
size of virtual memory for process pi is si
S = Σ si
ai = (si/S) * m
student process gets 4 frames = (10/137)*62
database gets 57 frames = (127/137)*62
what about priority? use priority rather than relative size to determine ratio of frames
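A sketch of the proportional-allocation computation above, using the example numbers (10K student process, 127K database, 62 free frames); integer truncation reproduces the 4/57 split:

#include <stdio.h>

int main(void)
{
    int m = 62;                        /* free frames                       */
    int s[] = { 10, 127 };             /* virtual-memory sizes si, in pages */
    int n = sizeof s / sizeof s[0];

    int S = 0;
    for (int i = 0; i < n; i++)
        S += s[i];                     /* S = sum of si = 137 */

    for (int i = 0; i < n; i++) {
        int a = (int)((long)s[i] * m / S);          /* ai = (si / S) * m */
        printf("process %d: %d frames\n", i, a);    /* prints 4 and 57   */
    }
    return 0;
}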
Global vs. Local Allocation
is page-replacement global or local?
global: one process can select a replacement frame from the set of all frames (i.e., one process can take a frame from another or itself) (e.g., high priority processes can take the frames of low priority processes)
local: each process can only select from its own set of allocated frames
local page replacement is more predictable;
depends on no external factors
a process which uses global page replacement cannot predict the page fault rate;
may execute in 0.5 seconds on one run and 10.3 seconds on another
overall, global replacement results in greater system throughput
Thrashing
simple thrashing: 1 process of 2 pages only allocated 1 frame
this high paging activity is called thrashing
a process is thrashing if it spends more time paging than executing
scenario:
The process scheduler sees that CPU utilization is low.
So we increase the degree of multiprogramming by introducing a new process into the system.
One process now needs more frames.
It starts faulting and takes away frames from other processes. (i.e., global-page replacement).
These processes need those pages and thus they start to fault, taking frames from other processes.
These faulting processes must use the paging device to swap pages in and out.
As they queue up on the paging device, the ready-queue empties.
However, as processes wait on the paging device, CPU utilization decreases.
The process scheduler sees the decreasing CPU utilization and increases the degree of multiprogramming, ad infinitum.
The result is thrashing: the page-fault rate increases tremendously, effective memory-access time increases, and no work gets done because the processes spend all their time paging.
Fig. 9.18
we can limit the effects of thrashing by using a local replacement algorithm (or priority replacement algorithm)
but this only partially solves the problem
If one process starts thrashing, it cannot steal frames from another process and cause the latter to thrash.
However, the thrashing processes will be in the paging device queue which will increase the time for a page fault to be serviced and, therefore, the effective access time will increase even for those processes not thrashing.
to really prevent thrashing, we must provide processes with as many frames as they need.
But how do we know how many frames a process "needs"?
Look at how many frames a process actually "uses".
Locality Model
as a process executes, it moves from locality to locality
a locality is a set of pages used together. A program is generally composed of several localities which may overlap.
for instance, when a function is called, it defines a new locality
the locality model is the basic unstated assumption behind caching; if accesses to data were random rather than patterned, caching would be useless.
if we allocate enough for a process to accommodate its current locality, it will fault for the pages in its locality until all of these pages are in memory. Then it will not fault again until it changes locality.
if we allocate fewer frames than the size of the current locality, the process will thrash, since it cannot keep in memory all the pages it is actively using.
Working Set Model
based on the assumption of locality
Δ defines the working-set window: some # of memory references
examine the most recent Δ page references
the set of pages in the most recent Δ page references is the working set, an approximation of the program's current locality.
Fig. 9.20
the accuracy of the working set depends on the selection of Δ
if Δ is too small, it will not encompass the entire locality
if Δ is too large, it may overlap several localities
if Δ is ∞, the working set is the set of all pages touched during process execution
WSSi is working set size for process pi
D = Σ WSSi, where D is the total demand for frames
if D > m, then thrashing will occur, because some processes will not have enough frames
Using the working-set strategy is simple:
the OS monitors the working set of each process
and allocates to that working set enough frames to provide it with its working-set size
if there are enough extra frames, a new process can be initiated
if the sum of the working set sizes increases, exceeding the total number of available frames, the OS selects a process to suspend
the working set strategy prevents thrashing while keeping the degree of multiprogramming as high as possible and optimizes CPU utilization.
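A sketch of the D = Σ WSSi check that drives the suspend/admit decision above; the process list and working-set sizes are made-up illustration values:

#include <stdio.h>

struct process {
    int wss;    /* current working-set size, in frames */
};

/* total demand for frames: D = sum of WSSi over all processes */
static int total_demand(const struct process *procs, int n)
{
    int D = 0;
    for (int i = 0; i < n; i++)
        D += procs[i].wss;
    return D;
}

int main(void)
{
    struct process procs[] = { { .wss = 20 }, { .wss = 35 }, { .wss = 18 } };
    int m = 64;                 /* available frames (illustrative) */
    int D = total_demand(procs, 3);
    if (D > m)                  /* D > m: thrashing is imminent, suspend someone */
        printf("D = %d > m = %d: suspend a process\n", D, m);
    else                        /* otherwise a new process may be admitted       */
        printf("D = %d <= m = %d: ok\n", D, m);
    return 0;
}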
Page Fault Frequency
working set model is successful but keeping track of the working set can become complex
using page-fault frequency (PFF) is a more direct approach to prevent thrashing
if PFF is too high, we know the process needs more frames
if PFF is too low, then we know the process has too many frames
Fig. 9.21
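A sketch of a page-fault-frequency control loop: an upper and a lower bound on the fault rate (the thresholds below are illustrative, not values from the text) drive the allocation up or down:

#include <stdio.h>

struct process {
    double pff;      /* measured page-fault rate (faults per memory reference) */
    int    frames;   /* frames currently allocated to the process              */
};

/* illustrative thresholds, not values from the text */
#define PFF_UPPER 0.010
#define PFF_LOWER 0.001

/* adjust a process's allocation based on its page-fault frequency */
static void pff_adjust(struct process *p)
{
    if (p->pff > PFF_UPPER)
        p->frames++;             /* faulting too often: allocate another frame */
    else if (p->pff < PFF_LOWER && p->frames > 1)
        p->frames--;             /* faulting rarely: reclaim a frame           */
}

int main(void)
{
    struct process p = { .pff = 0.02, .frames = 8 };
    pff_adjust(&p);
    printf("frames after adjustment: %d\n", p.frames);   /* prints 9 */
    return 0;
}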
References
[OSCJ]
A. Silberschatz, P.B. Galvin, and G. Gagne. Operating System Concepts with Java. John Wiley and Sons, Inc., Eighth edition, 2010.