Exercise 4

How File System Structure Affects Performance

Subject: File System Structure.

Due Date: 20.5.2010

Assignment

In this exercise you will examine the effect of file system structure on performance, by comparing the iNode structure, used in UNIX-like systems, and the linked-list structure, used in the FAT32 file system. For this sake you will implement library functions that calculate the used space and the average number of accesses to disk in a typical file system, using data on file sizes collected from a real computer system.

You will then use your implementation to plot graphs that show how these parameters change when you change the block size that the file system employs, and compare the performance of the two file systems.

The Library

You will implement a C++ library with one function. The public interface of the library is ~os/lib/fileperf.hh. You should include this header file from its original location, and implement the function:

int fileSystemPerformance(unsigned int blockSize, unsigned int pointerSize, unsigned int maximalSize, storageType type, const std::vector fileSizes, outputStruct *output);

This function receives as input parameters that determine the storage method of files on disk, and a list of files that are currently in the file system. The function returns (using the 'output' parameter) the space used by the file system and the average number of disk accesses to access data on this file system. In the following, all sizes are in bytes.

Input Parameters: Return Value:

The return value of the function should be 0 upon success, and -1 if any of the parameters to the functions are illegal, or if any of the files of the given fileSizes cannot be stored according to the given parameters. A file of size zero (an empty file) is legal.

Output:

The function returns its output using the pointer 'output', which is the last parameter of this function. This pointer points to a struct defined as follows:

struct outputStruct {
   long unsigned int space;
   double averageAccessNum;
};

When the function returns, the fields of the 'output' structure should be filled as follows:

Handling illegal input
The function should return -1 for any illegal input.

Linked List Structure

The linked list structure for each file is as follows: The amount of file data in each block is the maximal possible amount, based on the block size and the other fields in that block. For the sake of this exercise, we will assume the fixed-size header has zero size.

Example

Suppose that the block size = 30, pointer size = 2, and the file size is 46. Then storing the file will cost two blocks: The space used by the file is therefore 30*2 = 60. The number of block accesses to bytes in the file will be 1 for the first 28 bytes, and 2 for the last 18 bytes of the file. Therefore the average number of block acceses for this file is (28*1+18*2)/46 = 1.39... .

iNode Structure

The iNode structure is similar to the one described in the Tirgul class:

iNodes

There are three types of blocks: For the sake of this exercise, we will assume the fixed-size header has zero size.

Example

Suppose that the block size = 12, the pointer size = 4, the maximal file size is 52, and we need to store a single file of size 30.

First we need to decide how many direct and indirect pointers there should be in a header block in this file system. If we use zero indirect pointers, then we will have room for 12/4 = 3 direct pointers in the header block. This allows us to use a maximum of 3 data blocks for a file, so the maximal file size would be 36. Therefore we need more than zero indirect pointers. Using one indirect pointer will be enough to store a file of size 52, therefore we will have in each header block one level 1 indirect pointer, and 2 direct pointers.

To store a file of size 30, we will need 3 data blocks. Therefore we will use the two direct pointers in the header block, and the indirect level 1 pointer as well. The indirect pointer will point to a block of direct pointers, and one of them will point to the third data block. Therefore we use 5 blocks for this file: the header block, one block of direct pointers, and three data blocks, so the space we spend for this file is 12*5 = 60 bytes.

The average number of block accesses of this file is calculated as follows: To access bytes 1-24 of the file we need to access the header block and the relevant data block, so we have 2 block accesses. To access bytes 25-30 we need to access the header block, the direct pointers block and the data block, so there are 3 block accesses. Therefore, the average access number for this file is (24*2+6*3)/30 = 2.2.

Graphs

After implementing the library function above, you will use this function to generate graphs that show the space and the average access number that will be used for a specific list of file sizes, as a function of the block size.

The file sizes are given in the file ~os/lib/fileSizes.txt. This file lists sizes of files in the recoded computer system.

Generate, using the library function you implemented, two graphs:

  1. The used space as a function of the block size.
  2. The average number of accesses as a function of the block size.
Each graph should have two lines - one for a linked list and the other for an iNode structure. Assume a pointer size of 4, and a maximal file size of 10^8. You should decide on the block sizes to plot: Make sure the range is large enough to show all the interesting phenomena. You can use either normal scale or log scale for each of the axes in each of the graphs. Do not forget to annotate your graphs: state the meaning of each line, the meaning and units of each axis, and whether it is normal scale or log scale. The data for the graphs should be generated by a program you will write, that uses your library function. You may use any graph software to plot the resulting graphs.

You do not need to submit the program that generated the graphs. You will submit only the text files, containing the data which your program generated. The text file for each graph should have three columns, separated by spaces, where each line contains: block size, value for linked list, and value for inode structure. The lines should be sorted by block size.

Questions

Answer the following questions regarding the graphs that you generated:
  1. Describe, in general, the difference in the graphs of space usage between the two storage methods.
  2. If you had to choose which storage method to use based only on the space usage, which method would you choose and why?
  3. Describe, in general, the difference in the graphs of average access number between the two storage methods.
  4. If you had to choose which storage method to use based on both the space usage and the average access number, which method would you choose and why?

Submission

Submit tar file named ex4.tar containing:

  1. README file built according to the course guidelines, that includes answers to the questions above..
  2. Source code for the library (not including fileperf.hh)
  3. Makefile - your makefile should generate a static library file named: libfileperf.a when running 'make' with no arguments.
  4. Images of the two graphs, named space.jpg for the space graph and access.jpg for the second graph.
  5. Text files named space.txt and access.txt, containing the data that you used to generate the graphs.
Do not submit the program that you wrote for generating the graph data.