Virtual Memory
Demand Paging
virtual = in effect, if not in essence
virtual memory: still limited by swaps
Virtual Memory
entire process need not be in memory before it can execute
major advantage: programs larger than main memory can run (Fig. 9.1)
frees programmers from memory limitations
allows processes to share files easily and implement shared memory (Fig. 9.3)
easy mechanism for process creation (copy-on-write)
motivation
declaration of 100x100 array when typically only a 10x10 portion of it is used
even in cases where the entire program is needed, it is generally not all needed at the same time
benefits:
programmers no longer constrained by amount of main memory
higher degree of multiprogramming → higher CPU utilization and throughput, with no increase in response time or turnaround time
less I/O required to load or swap programs into memory → each program runs faster
more pages required only if the heap or stack grow (Fig. 9.2)
Demand Paging
same idea as pure paging, but now you only load a page on demand
this gives a higher degree of multiprogramming with only a little more overhead
use a lazy swapper (now called a pager) (Fig. 9.4)
initial load (Fig. 9.5)
concept of a page fault
where can a page fault occur (instruction or operand)?
in pure paging, a page fault represents a fatal error
in demand paging, a page fault means an additional page must be brought into memory and the instruction re-started
why must the instruction be restarted?
pure demand paging: bring no page(s) in initially; every process will page fault at least once
concept of locality of reference
Performance of Demand Paging
effective access time
probability of a page fault p
(1-p)*ma + p*page fault time
Page fault time
12 step process; see list on pp. 355-356
Fig. 9.6
service the page-fault interrupt
read in the page
restart the process
Which is the most time-expensive step?
#2 (reading in the page from disk) is the most time-costly step; typically around 8 milliseconds,
plus the waiting time in the queue for the paging device.
ma = 200ns
pft = 8ms
(1-p)*(200) + p*8,000,000 = 200 + 7,999,800p
effective access time is directly proportional to page-fault rate
if p is 1/1000, effective access time is 8.2 microseconds, a slow-down by a factor of 40 (!!!!) because of demand paging
if we want the slowdown to be less than 10%, we need p < 0.0000025
that is, fewer than 1 out of 399,990 accesses should fault
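To make the arithmetic concrete, here is a minimal C sketch of the effective-access-time formula above; the constants (200 ns memory access, 8 ms page-fault service time) are the example values from these notes, not universal figures.

#include <stdio.h>

int main(void)
{
    const double ma  = 200.0;         /* memory access time, in ns             */
    const double pft = 8000000.0;     /* page-fault service time, in ns (8 ms) */

    /* effective access time: (1 - p) * ma + p * pft = 200 + 7,999,800 p */
    double rates[] = { 0.001, 0.0000025 };
    for (int i = 0; i < 2; i++) {
        double p   = rates[i];
        double eat = (1.0 - p) * ma + p * pft;
        printf("p = %.7f -> EAT = %.1f ns (slowdown x%.1f)\n", p, eat, eat / ma);
    }
    return 0;
}

With p = 1/1000 it prints a slowdown of about 41x; with p = 0.0000025 the slowdown stays around 10%, matching the numbers above.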
Copy-on-Write
can we do better than pure demand paging?
recall fork()? seem wasteful? why?
copy-on-write: both processes (parent and child) share all pages initially, and a shared page is only copied if and when either process writes to a shared page
not all shared pages need to be set to copy-on-write (e.g., pages containing the executable code)
Figs. 9.7 & 9.8
vfork() (virtual memory fork):
parent is suspended
child uses address space of the parent
vfork is intended to be used when the child calls exec immediately because no copying of pages takes place
extremely efficient method of process creation
sometimes used to implement command shells
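A hedged sketch of the vfork()/exec pattern just described (standard POSIX calls; the program exec'd, /bin/ls, is an arbitrary choice for illustration):

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = vfork();          /* parent is suspended; child borrows its address space */
    if (pid < 0) {
        perror("vfork");
        return 1;
    }
    if (pid == 0) {
        /* child: must call exec (or _exit) immediately; no pages are copied */
        execlp("/bin/ls", "ls", "-l", (char *)NULL);
        _exit(127);               /* only reached if exec fails */
    }
    waitpid(pid, NULL, 0);        /* parent resumes once the child execs or exits */
    return 0;
}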
References
[OSCJ]
A. Silberschatz, P.B. Galvin, and G. Gagne. Operating System Concepts with Java. John Wiley and Sons, Inc., Eighth edition, 2010.
Page Replacement Algorithms
is it possible to over-allocate memory?
page fault options
terminate process
swap out a process --- reduces the degree of multiprogramming
Basic Page Replacement
see Figs. 9.9 & 9.10
find the location of the desired page on disk
find a free frame:
if there is a free frame, use it
if there is no free frame, use a page-replacement algorithm to select a victim frame
write the victim frame to disk; change the page and frame tables accordingly
read the desired page into the newly freed frame; change the page and frame tables
restart the user process
If no frames are free, two page transfers (one out and one in) are required, which doubles the page-fault service time.
This is a lot of overhead, without even mentioning the time spent waiting on the paging device.
We can reduce this overhead by using a dirty (modify) bit: if the victim page has not been modified (e.g., read-only pages such as program code), it need not be written back to disk. This scheme can cut the I/O time in half when the victim is clean (see the sketch below).
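A minimal sketch of the dirty-bit optimization in the replacement path; the frame structure and the disk-I/O stubs are hypothetical stand-ins, not the book's code:

#include <stdbool.h>
#include <stdio.h>

struct frame {
    int  page;     /* virtual page currently held in this frame   */
    bool dirty;    /* set by hardware when the page is written to */
};

/* stubs standing in for real disk I/O (hypothetical helpers) */
static void write_page_to_disk(int page)  { printf("write-back page %d\n", page); }
static void read_page_from_disk(int page) { printf("read in page %d\n", page); }

/* replace the page in victim frame f with desired_page */
static void replace(struct frame *f, int desired_page)
{
    if (f->dirty)                        /* a modified victim costs a second transfer */
        write_page_to_disk(f->page);     /* a clean victim skips this write-back      */
    read_page_from_disk(desired_page);
    f->page  = desired_page;             /* update page and frame tables accordingly  */
    f->dirty = false;
}

int main(void)
{
    struct frame victim = { .page = 3, .dirty = true };
    replace(&victim, 7);                 /* dirty victim: two transfers */
    replace(&victim, 9);                 /* clean victim: one transfer  */
    return 0;
}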
Two major issues with demand paging which must be addressed:
page-replacement algorithm: how do we select a page to be replaced?
frame-allocation algorithm: how many frames do we allocate to each process?
These are important problems because disk I/O is so expensive.
Page Faults
Let's start with page-replacement algorithms.
How to evaluate algorithms?
We want one which yields the lowest page-fault rate.
Generally, as #frames increase, #page faults should decrease (see Fig. 9.11)
Given a string of memory references (called a reference string), count the number of page faults:
0100, 0432, 0101, 0612, 0102, 0103, 0104, 0101, 0611, 0102, 0103, 0104, 0101, 0610, 0102, 0103, 0104, 0101, 0609, 0102, 0105
Assuming 100 bytes/page, we have:
1, 4, 1, 6, 1, 6, 1, 6, 1, 6, 1
With 3 or more frames: ?
1 frame: ?
Page-Replacement Algorithms
first in, first out (FIFO)
optimal (OPT)
least recently used (LRU)
Reference string: 7,0,1,2,0,3,0,4,2,3,0,3,2,1,2,0,1,7,0,1
FIFO Page Replacement
run prior reference string with 3 frames of memory (Fig. 9.12): ?
easy to understand and implement
poor performance
the replaced page may be an initialization module that was used long ago and is no longer needed (a fine choice)
or it may hold a heavily used variable that was initialized early and is in constant use (a bad choice)
either way everything will still work, but a bad replacement choice
increases the page-fault rate and slows process execution
it does not result in incorrect execution
Belady's anomaly (Fig. 9.13): adding more frames can result in more page faults; consider the reference string 1,2,3,4,1,2,5,1,2,3,4,5 with 3 frames of memory and then 4 (see the sketch below)
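A small, self-contained FIFO simulation (a sketch, not code from the text) that counts page faults on the Belady reference string above; running it with 3 and then 4 frames reproduces the anomaly (9 faults vs. 10):

#include <stdio.h>

#define MAX_FRAMES 8

/* count FIFO page faults for a reference string with nframes frames */
static int fifo_faults(const int *refs, int n, int nframes)
{
    int frames[MAX_FRAMES];
    int next = 0, loaded = 0, faults = 0;

    for (int i = 0; i < n; i++) {
        int hit = 0;
        for (int j = 0; j < loaded; j++)
            if (frames[j] == refs[i]) { hit = 1; break; }
        if (hit)
            continue;
        faults++;
        if (loaded < nframes) {
            frames[loaded++] = refs[i];      /* free frame available */
        } else {
            frames[next] = refs[i];          /* evict the oldest page */
            next = (next + 1) % nframes;
        }
    }
    return faults;
}

int main(void)
{
    int refs[] = { 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 };
    int n = sizeof refs / sizeof refs[0];

    printf("3 frames: %d faults\n", fifo_faults(refs, n, 3));  /* 9  */
    printf("4 frames: %d faults\n", fifo_faults(refs, n, 4));  /* 10 */
    return 0;
}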
Optimal Page Replacement
replace the page which will not be used for the longest period of time (akin to A* search)
run prior reference string with 3 frames of memory (Fig. 9.14): ?
9 faults (OPT) vs. 15 (FIFO), or, ignoring the 3 unavoidable initial faults, 6 vs. 12 (twice as good!)
has lowest page fault rate, and
never suffers from Belady's anomaly
problem? recall SJF?
mainly used for comparison purposes (e.g., algorithm X is within 12.3% of optimal in the worst case)
LRU Page Replacement
idea is to try to approximate the optimum
key difference between FIFO and OPT (other than looking backward vs. forward in time, resp.)
FIFO uses the time when a page was brought into memory
OPT uses the time when a page is to be used
use the recent past as an approximation of the near future; then we can replace the page which has not been used for the longest period of time
run prior reference string with 3 frames of memory (Fig. 9.15): ?
associate with each page the time of its last use
does not suffer from Belady's anomaly
major issue is implementation (need hardware support):
counters
every time a reference is made, copy value of clock register into page table entry
requires a search of the page table and a write to memory (time of use) for each memory access
stack
whenever a page is referenced, remove its page number from the stack and push it onto the top
most recently used page is always on top
least recently used page is always on bottom
use doubly-linked-list with head and tail pointers to implement
no search for replacement page number
Fig. 9.16
LRU belongs to a class of page-replacement algorithms known as stack algorithms which never exhibit Belady's anomaly.
A stack algorithm is one for which it can be shown that the set of pages in memory for n frames is always a subset of the set of pages which would be in memory with n+1 frames.
Needs hardware support: updating a counter or stack on every memory reference would be far too slow to do in software.
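For illustration only, a software simulation of the counter scheme above (a real system would need hardware to stamp the time of use on every reference; here a logical clock stands in); run on the reference string from Fig. 9.15, it yields 12 faults with 3 frames:

#include <stdio.h>

#define NFRAMES 3
#define EMPTY   (-1)

static int  page[NFRAMES];      /* page held by each frame            */
static long last_use[NFRAMES];  /* "clock register" value at last use */
static long clock_ticks = 0;

/* reference a page; returns 1 on a page fault, 0 on a hit */
static int reference(int p)
{
    clock_ticks++;
    int lru = 0;
    for (int i = 0; i < NFRAMES; i++) {
        if (page[i] == p) {               /* hit: stamp the time of use    */
            last_use[i] = clock_ticks;
            return 0;
        }
        if (last_use[i] < last_use[lru])  /* track the least recently used */
            lru = i;
    }
    page[lru] = p;                        /* fault: replace the LRU victim */
    last_use[lru] = clock_ticks;
    return 1;
}

int main(void)
{
    int refs[] = { 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1 };
    int faults = 0;
    for (int i = 0; i < NFRAMES; i++) { page[i] = EMPTY; last_use[i] = 0; }
    for (int i = 0; i < (int)(sizeof refs / sizeof refs[0]); i++)
        faults += reference(refs[i]);
    printf("LRU, %d frames: %d faults\n", NFRAMES, faults);   /* 12 on this string */
    return 0;
}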
LRU-Approximation Page Replacement
Often we have a reference bit associated with each entry in the page table. Can determine which pages have been used and not used by examining the reference bits, although we do not know the "order" of use.
Can we approximate order?
Additional-reference-bits algorithm:
keep an 8-bit history byte for each page; at regular intervals (e.g., every 100 ms), a timer interrupt shifts the reference bit into the high-order bit of the byte, shifts the other bits right one position, and clears the reference bit
00000000: not used in any of the last 8 periods
11111111: used in every one of the last 8 periods
a page with history 11000100 has been used more recently than one with 01110111
if we interpret these bytes as unsigned integers, the page with the lowest value is the LRU page
problem: the values are not unique
solution: replace all pages with the smallest value, or use FIFO to choose among them
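The periodic shift at the heart of the additional-reference-bits algorithm is a one-liner per page; a sketch with a hypothetical page-table-entry structure:

#include <stdint.h>
#include <stdio.h>

struct pte {
    unsigned referenced : 1;   /* reference bit, set by hardware on any access */
    uint8_t  history;          /* 8-bit shift register maintained by the OS    */
};

/* called from a timer interrupt (e.g., every 100 ms) for every page */
static void age_page(struct pte *p)
{
    p->history = (uint8_t)((p->history >> 1) | (p->referenced << 7));
    p->referenced = 0;         /* clear the reference bit for the next interval */
}

int main(void)
{
    struct pte p = { .referenced = 1, .history = 0 };
    age_page(&p);                              /* history becomes 10000000 */
    age_page(&p);                              /* history becomes 01000000 */
    printf("history = 0x%02x\n", p.history);   /* prints 0x40              */
    return 0;
}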
Clock Algorithms
(or second-chance page-replacement algorithms)
a FIFO replacement algorithm
the degenerate case of the additional-reference-bits algorithm: keep 0 history bits, leaving only the reference bit
if the reference bit is 0, replace the page
if it is 1, give the page a second chance and move on to the next FIFO page; reset its reference bit to 0 and reset its arrival time to the current time
a page given a second chance will not be replaced until all other pages have been replaced (FIFO) or given second chances
Fig. 9.17
if a page is used often enough to keep its reference bit set, it will never be replaced
implementation: the clock algorithm using a circular queue
a pointer (hand on a clock) indicates which page is to be replaced next
when a frame is needed, the pointer advances until it finds a page with a 0 reference bit
as it advances, it clears the reference bits
once a victim page is found, the page is replaced, and the new page is inserted in the circular queue in that position
degenerates to FIFO replacement if all bits are set
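A sketch of the circular-queue (clock) implementation just described; the frame array and its size are made-up placeholders:

#include <stdbool.h>
#include <stdio.h>

#define NFRAMES 4

static struct frame {
    int  page;
    bool referenced;          /* set by hardware on each access to the page */
} frames[NFRAMES];

static int hand = 0;          /* the clock hand: next frame to examine */

/* advance the hand until a frame with a clear reference bit is found */
static int choose_victim(void)
{
    for (;;) {
        if (!frames[hand].referenced) {   /* bit is 0: replace this page        */
            int victim = hand;
            hand = (hand + 1) % NFRAMES;
            return victim;
        }
        frames[hand].referenced = false;  /* bit is 1: second chance, clear bit */
        hand = (hand + 1) % NFRAMES;      /* and move on to the next frame      */
    }
}

int main(void)
{
    /* if every bit is set, the scan clears them all and then picks frame 0,
       i.e., the algorithm degenerates to FIFO */
    for (int i = 0; i < NFRAMES; i++)
        frames[i] = (struct frame){ .page = i, .referenced = true };
    printf("victim frame: %d\n", choose_victim());   /* prints 0 */
    return 0;
}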
Enhanced Second-chance Algorithm
by considering the reference bit and dirty bit as an ordered pair, we now have four classes:
(0,0): neither recently used nor modified --- best page to replace
(0,1): not recently used but modified --- not quite as good, because the page will need to be written out before replacement
(1,0): recently used, but clean --- probably will be used again soon
(1,1): recently used and modified --- probably will be used again soon, and the page will need to be written out to disk before it can be replaced
use the same scheme as above, but rather than just checking whether the reference bit is set, we examine the class
replace the first page encountered in the lowest non-empty class
may have to search circular queue several times before we find a page to be replaced
the difference from the algorithm above is that we give preference to keeping pages which have been modified (i.e., we prefer clean victims), to reduce the number of I/O operations required.
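One simple way to realize the "lowest non-empty class" rule, sketched with assumed structures (real implementations often clear reference bits as they scan):

#include <stdbool.h>
#include <stdio.h>

#define NFRAMES 4

static struct frame {
    int  page;
    bool referenced;   /* set by hardware on any access */
    bool dirty;        /* set by hardware on any write  */
} frames[NFRAMES];

static int hand = 0;

/* class 0 = (0,0), 1 = (0,1), 2 = (1,0), 3 = (1,1) */
static int page_class(const struct frame *f)
{
    return ((int)f->referenced << 1) | (int)f->dirty;
}

/* pick the first frame (starting at the hand) in the lowest non-empty class;
   this may require scanning the circular queue several times */
static int choose_victim(void)
{
    for (int wanted = 0; wanted <= 3; wanted++)
        for (int i = 0; i < NFRAMES; i++) {
            int idx = (hand + i) % NFRAMES;
            if (page_class(&frames[idx]) == wanted) {
                hand = (idx + 1) % NFRAMES;
                return idx;
            }
        }
    return -1;   /* unreachable: some class is always non-empty */
}

int main(void)
{
    frames[0] = (struct frame){ .page = 10, .referenced = true,  .dirty = true  };
    frames[1] = (struct frame){ .page = 11, .referenced = true,  .dirty = false };
    frames[2] = (struct frame){ .page = 12, .referenced = false, .dirty = true  };
    frames[3] = (struct frame){ .page = 13, .referenced = false, .dirty = false };
    printf("victim frame: %d\n", choose_victim());   /* frame 3: class (0,0) */
    return 0;
}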
Further Information
counting-based page replacement algorithms
least frequently used (LFU) page-replacement algorithm
most frequently used (MFU) page-replacement algorithm
page-buffering algorithms
applications and page replacement
References
[OSCJ]
A. Silberschatz, P.B. Galvin, and G. Gagne. Operating System Concepts with Java. John Wiley and Sons, Inc., Eighth edition, 2010.
Allocation and Thrashing
When should we run the page replacement algorithm?
Two main questions in the study of virtual memory:
how do we replace pages?
how do we allocate frames?
Allocation of Frames
maintain 3 free frames at all times
consider a machine where all memory-reference instructions have only 1 memory address → we need at least 2 frames (one for the instruction, one for its operand)
now consider indirect modes of addressing
with unbounded indirection, a single instruction could potentially touch every page of virtual memory, which would require the entire virtual memory to be in physical memory
so we place a limit on the levels of indirection
the minimum number of frames per process is defined by the computer architecture
the maximum number of frames is defined by the amount of available physical memory
Allocation Algorithms
m frames, n processes
equal allocation: give each process m/n frames
alternative is to recognize that various processes will need different amounts of memory
consider
1k frame size
62 free frames
student process requiring: 10k
interactive database requiring: 127k
it makes no sense to give each process 31 frames
the student process needs no more than 10 so the additional 21 frames are wasted
proportional allocation:
m frames available
size of virtual memory for process pi is si
S = Σ si
ai = (si/S) * m
student process gets 4 frames = (10/137)*62
database gets 57 frames = (127/137)*62
what about priority? use priority rather than relative size to determine ratio of frames
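A sketch of the proportional-allocation computation above, using the example numbers (10K student process, 127K database, 62 free frames); integer truncation reproduces the 4/57 split:

#include <stdio.h>

int main(void)
{
    int m = 62;                        /* free frames                       */
    int s[] = { 10, 127 };             /* virtual-memory sizes si, in pages */
    int n = sizeof s / sizeof s[0];

    int S = 0;
    for (int i = 0; i < n; i++)
        S += s[i];                     /* S = sum of si = 137 */

    for (int i = 0; i < n; i++) {
        int a = (int)((long)s[i] * m / S);          /* ai = (si / S) * m */
        printf("process %d: %d frames\n", i, a);    /* prints 4 and 57   */
    }
    return 0;
}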
Global vs. Local Allocation
is page-replacement global or local?
global: one process can select a replacement frame from the set of all frames (i.e., one process can take a frame from another or itself) (e.g., high priority processes can take the frames of low priority processes)
local: each process can only select from its own set of allocated frames
local page replacement is more predictable;
depends on no external factors
a process which uses global page replacement cannot predict the page fault rate;
may execute in 0.5 seconds on one run and 10.3 seconds on another
overall, global replacement results in greater system throughput
Thrashing
simple thrashing: 1 process of 2 pages only allocated 1 frame
this high paging activity is called thrashing
a process is thrashing if it spends more time paging than executing
scenario:
The process scheduler sees that CPU utilization is low.
So we increase the degree of multiprogramming by introducing a new process into the system.
One process now needs more frames.
It starts faulting and takes away frames from other processes. (i.e., global-page replacement).
These processes need those pages and thus they start to fault, taking frames from other processes.
These faulting processes must use the paging device to swap pages in and out.
As they queue up on the paging device, the ready-queue empties.
However, as processes wait on the paging device, CPU utilization decreases.
The process scheduler sees the decreasing CPU utilization and increases the degree of multiprogramming, ad infinitum.
The result is thrashing: the page-fault rate increases tremendously, effective memory-access time increases, and no work gets done because the processes spend all their time paging.
Fig. 9.18
we can limit the effects of thrashing by using a local replacement algorithm (or priority replacement algorithm)
but this only partially solves the problem
If one process starts thrashing, it cannot steal frames from another process and cause the latter to thrash.
However, the thrashing processes will be in the paging device queue which will increase the time for a page fault to be serviced and, therefore, the effective access time will increase even for those processes not thrashing.
to really prevent thrashing, we must provide processes with as many frames as they need.
But how do we know how many frames a process "needs"?
Look at how many frames a process actually "uses".
Locality Model
as a process executes, it moves from locality to locality
a locality is a set of pages used together. A program is generally composed of several localities which may overlap.
for instance, when a function is called, it defines a new locality
the locality model is the basic unstated assumption behind caching; if accesses to data were random rather than patterned, caching would be useless.
if we allocate enough for a process to accommodate its current locality, it will fault for the pages in its locality until all of these pages are in memory. Then it will not fault again until it changes locality.
if we allocate fewer frames than the size of the current locality, the process will thrash, since it cannot keep in memory all the pages it is actively using.
Working Set Model
based on the assumption of locality
Δ defines the working-set window: some # of memory references
examine the most recent Δ page references
the set of pages in the most recent Δ page references is the working set, an approximation of the program's current locality.
Fig. 9.20
the accuracy of the working set depends on the selection of Δ
if Δ is too small, it will not encompass the entire locality
if Δ is too large, it may overlap several localities
if Δ is ∞, the working set is the set of all pages touched during process execution
WSSi is working set size for process pi
D = Σ WSSi, where D is the total demand for frames
if D > m, then thrashing will occur, because some processes will not have enough frames
Using the working-set strategy is simple:
the OS monitors the working set of each process
and allocates to that working set enough frames to provide it with its working-set size
if there are enough extra frames, a new process can be initiated
if the sum of the working set sizes increases, exceeding the total number of available frames, the OS selects a process to suspend
the working set strategy prevents thrashing while keeping the degree of multiprogramming as high as possible and optimizes CPU utilization.
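A sketch of the D = Σ WSSi check that drives the suspend/admit decision above; the process list and working-set sizes are made-up illustration values:

#include <stdio.h>

struct process {
    int wss;    /* current working-set size, in frames */
};

/* total demand for frames: D = sum of WSSi over all processes */
static int total_demand(const struct process *procs, int n)
{
    int D = 0;
    for (int i = 0; i < n; i++)
        D += procs[i].wss;
    return D;
}

int main(void)
{
    struct process procs[] = { { .wss = 20 }, { .wss = 35 }, { .wss = 18 } };
    int m = 64;                 /* available frames (illustrative) */
    int D = total_demand(procs, 3);
    if (D > m)                  /* D > m: thrashing is imminent, suspend someone */
        printf("D = %d > m = %d: suspend a process\n", D, m);
    else                        /* otherwise a new process may be admitted       */
        printf("D = %d <= m = %d: ok\n", D, m);
    return 0;
}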
Page Fault Frequency
working set model is successful but keeping track of the working set can become complex
using page-fault frequency (PFF) is a more direct approach to prevent thrashing
if PFF is too high, we know the process needs more frames
if PFF is too low, then we know the process has too many frames
Fig. 9.21
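A sketch of a page-fault-frequency control loop: an upper and a lower bound on the fault rate (the thresholds below are illustrative, not values from the text) drive the allocation up or down:

#include <stdio.h>

struct process {
    double pff;      /* measured page-fault rate (faults per memory reference) */
    int    frames;   /* frames currently allocated to the process              */
};

/* illustrative thresholds, not values from the text */
#define PFF_UPPER 0.010
#define PFF_LOWER 0.001

/* adjust a process's allocation based on its page-fault frequency */
static void pff_adjust(struct process *p)
{
    if (p->pff > PFF_UPPER)
        p->frames++;             /* faulting too often: allocate another frame */
    else if (p->pff < PFF_LOWER && p->frames > 1)
        p->frames--;             /* faulting rarely: reclaim a frame           */
}

int main(void)
{
    struct process p = { .pff = 0.02, .frames = 8 };
    pff_adjust(&p);
    printf("frames after adjustment: %d\n", p.frames);   /* prints 9 */
    return 0;
}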
References
[OSCJ]
A. Silberschatz, P.B. Galvin, and G. Gagne. Operating System Concepts with Java. John Wiley and Sons, Inc., Eighth edition, 2010.