Ranking Techniques for Desktop Search
This project addresses the desktop search problem by evaluating
techniques for ranking the results of a search query over the
file system. We consider:
- basic ranking techniques, each based on
a single file feature (e.g., file name, file content, or access date)
- learning-based ranking schemes
- a novel ranking technique, based on query selectiveness
The learning techniques have been shown to be
significantly more effective than the basic ranking methods. Our
ranking technique based on query selectiveness does not involve any
learning, which makes it effective during the cold-start period of
the system.
All our results are derived from a thorough user study and a
post-mortem analysis of the user log files.
People:
Papers:
Raw Data:
The raw data used for analysis is available here. The following fields are available:
- ResultID: Unique id given to each result file returned for a query.
- UserName: Unique id given to each user.
- QueryID: Unique id given to each query. Note that there may be
many results with the same QueryID, since a query may return many results.
- Hit: True or false, depending on whether the user chose this
particular result.
- ContentScore: Measures the similarity of the file content to the query.
- NameScore: Measures the similarity of the file name to the query.
- PathScore: Measures the similarity of the file path to the query.
- QueryLog: Measures the similarity of the previous queries for
which this file was chosen to the current query.
- AccessedDiff: Difference in time between the last access date of the
file and the time the query was issued.
- UpdatedDiff: Difference in time between the last update date of the
file and the time the query was issued.
- CreatedDiff: Difference in time between the create date of the
file and the time the query was issued.
- Size: Size of the file.
- Level: Distance of the file from the outermost directory.
- File Type: Type of file (i.e., file extension).
- DirRank: Directory-based ranking of the file.
- NormSize: Normalized size of the file. (Normalization is with
respect to other files of the same type.)
For more details about these fields, see our papers.
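
As an illustration of how these fields could be used, the sketch below ranks the results of each query by a single feature (one of the basic ranking techniques) and scores the ranking by the position of the result the user chose. It is only a minimal sketch under stated assumptions: the Python attribute names, the sample rows, and the choice of mean reciprocal rank as the measure are hypothetical and are not taken from the raw data or from our papers.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Result:
    # Hypothetical in-memory form of a few fields from the list above.
    result_id: int        # ResultID
    query_id: int         # QueryID
    hit: bool             # Hit
    content_score: float  # ContentScore
    name_score: float     # NameScore


def group_by_query(results):
    """Collect the results returned for each QueryID."""
    groups = defaultdict(list)
    for r in results:
        groups[r.query_id].append(r)
    return groups


def reciprocal_rank(results, feature):
    """Rank one query's results by a single feature (highest first) and
    return 1/position of the first chosen result, or 0.0 if none was chosen."""
    ranked = sorted(results, key=lambda r: getattr(r, feature), reverse=True)
    for position, r in enumerate(ranked, start=1):
        if r.hit:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(results, feature):
    """Average the reciprocal rank over all queries (0.0 for no queries)."""
    groups = group_by_query(results)
    if not groups:
        return 0.0
    return sum(reciprocal_rank(g, feature) for g in groups.values()) / len(groups)


# Made-up rows for illustration only; these are not taken from the raw data.
SAMPLE = [
    Result(result_id=1, query_id=1, hit=False, content_score=0.9, name_score=0.1),
    Result(result_id=2, query_id=1, hit=True, content_score=0.4, name_score=0.8),
    Result(result_id=3, query_id=2, hit=True, content_score=0.7, name_score=0.2),
    Result(result_id=4, query_id=2, hit=False, content_score=0.3, name_score=0.6),
]

mrr_content = mean_reciprocal_rank(SAMPLE, "content_score")
mrr_name = mean_reciprocal_rank(SAMPLE, "name_score")
```

Grouping by QueryID first matters because a single query may return many results, so a ranking is only meaningful within one query. For the time-difference fields (AccessedDiff, UpdatedDiff, CreatedDiff), a smaller value presumably indicates a more recent file, so the sort order would likely need to be reversed.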