In this exercise we will simulate LRU page replacement on a real reference
stream, and find the page-fault rate as a function of the size of the available
memory.
LRU is a well-known page replacement algorithm. The idea is that the memory pages are sorted according to the time of their last use, and whenever one has to be evicted in order to make space for another page, the least-recently used is selected.
In order to evaluate the effectiveness of this algorithm, we can simulate its operation on a real stream of memory accesses. For each reference in the stream, we check whether the referenced page is already "in memory", and if not, "insert" it into the memory and use LRU to decide what other page to "evict". Counting how many times we needed to do so, and dividing by the total number of accesses, gives the page fault rate.
In order to evaluate LRU, we want to find the page-fault rate as a function of the memory size: if more memory is available, fewer page-faults will happen. Doing so in the brute force way would mean to run a new simulation for each size. Luckily, there is a better way.
The idea is to run a single simulation. In this simulation we don't count page faults. Instead, we maintain an LRU stack. This is just a stack that keeps track of all the pages we have seen so far. Each new page (that is, a page that we have not seen before) is placed on the top of the stack. For a new page, a page fault should be counted no matter how large is our memory. But if the memory access is to a page we have already seen, we do two things:
In the end, what we have is the distribution of depths at which we have found the pages of the different references. Given this distribution, we can find the number of page faults that would have occurred for a memory of any size using LRU replacement: for a memory of size M pages, depths of <= M would be in memory, while new pages and pages with depths > M would cause page faults
In principle, we could find the page-fault rate as a function of the memory size for different replacement algorithms, and use this to compare them. But here we will only do it for LRU.
Your assignment is to write a program that reads the stream of memory accesses from a trace file, and simulates an LRU stack on it. You will then use this data to plot graphs and answer some questions.
The trace you will use comes from the Brigham Young University Trace Distribution Center (Note: this link is not always available. You do not need the link to do the exercise). These traces come in a binary format. Each entry in the file describes one memory access. An .h file is provided with the structure used to store the data. There is also an example program (in C) that parses such a trace and displays it in human readable form. Note that you do not need all the data in the trace, but only the address of each memory access. For the submitted results, use the file eon_rush_0.tr. It can be found in two places: ~os/www/Ex/ex4/eon_rush_0.tr if you are at huji, or you can download it (compressed) from ftp://mouselemur.cs.byu.edu/samples/PIII/linux/spec2000/cache/eon_rush_0.tr.gz if you are working at home. Note that this is 85MB of (uncompressed) data, so do not copy it to your home directory; instead, use it directly from its location in the ~os account using the following command in the shell:
"command line for your program" < ~os/www/Ex/ex4/eon_rush_0.tr
The "<" in this command causes the file to be written into the standard input (stdin) of the program. Thus, if your program reads correctly from stdin, this command will cause your program to process the input file.
Write a program, called lru, to simulate the LRU stack as described above (you may use any suitable data structure in order to implement the LRU stack operations), and use the data to compute the page fault rate for all possible memory sizes. Your program should accept the following command line:
lru pageSize startAt numAccesses
You may assume that there are no more than 100,000 distinct pages in the part of the stream that your program reads. However, if there are more distinct pages than this in the trace, you should stop reading the rest of the trace, print the results based on the data collected so far, and exit cleanly without crashing.
Your result executable should be called 'lru', and it should receive its input from the standard input (stdin). The output of the program is an ASCII file named lru.txt, with a line for each memory size from 0 pages up to (and including) the number of distinct pages that were accessed in the trace, where each line has the following format:memorySize pageFaultRate pageFaultRateWithoutFirst
You can print some output to stdout if you wish, but do not garbage the output, that is do not print more than 10 lines in total in one run.
If there are no problems, the program should return "0" when it finishes. Whenever there is a problem (with the input or with the running of the program), your program should fail gracefully, print to stderr what went wrong, and return "1".
As an example for the program's output file, here is the file lru.txt that the program should emit for the following command line:
lru 1024 0 1000 < ~os/www/Ex/ex4/eon_rush_0.tr
Make sure that your program outputs a file which is identical to our file for this trace (use "diff" to check this)! Points will be reduced if there is any difference between the files.
Use your program output to create a single graph of the page fault rate as a function of the memory size, for two page sizes: 32 and 4096. Use startAt=0 and numAccesses=1,000,000. For each page size, plot both the page fault rate and the page fault rate without the first fault. You can use excel or any other program to create the graph. Consider whether to use a logarithmic scale on any of the axes. Do not forget to label the axes and each data series.
Submit a tar file named ex4.tar on-line containing the following: