|
This research was supported by the National Science Foundation under grant no. IRI-9619915 Background There has been no fundamental change in the dynamic indexing methods supporting database systems since the invention of the B-tree twenty-five years ago. And yet the whole classical approach to dynamic database indexing has long since become inappropriate and increasingly inadequate. Index structures designed for a one-dimensional world of fixed-structure text and numbers are incapable of supporting today's multi-dimensional world of variable structures, objects and images, in space and time. But even before leaving the confines of conventional (i.e. relational) database indexing, the situation is highly unsatisfactory. In fact, our research has led us to question the basic assumptions of conventional database indexing. The concept of one-dimensional primary and secondary indexes, tuned to a particular query pattern, emerged in the days of menu-driven applications. This concept has not evolved in any way to match the flexibility of database query languages, and results in performance characteristics which, from the user's point of view, appear incomprehensible and arbitrary. We believe that:
The original inventors of the B-tree [BM72] already had the next step in mind: an index which would generalize the properties of the B-tree to n dimensions i.e. an index on n attributes of a record instead of one. Ideally, such an index should have the property that, if values are specified for m out of n key attributes (a partial match query), then the time taken to find all the records matching this combination should be the same, whichever combination of m from n is chosen. The attraction of such an index is that it narrows the search space evenly over all the indexed attributes, and so behaves as if there were an index on each attribute - while avoiding all the update complication of secondary indexes. To achieve this, the index must be symmetrical in n dimensions. There is no longer a directly defined ordering between the individual records according to their (single key) attribute values. Each record must be viewed as a point in an n-dimensional data space, which is the Cartesian product of the domains of the n index attributes. An n-dimensional generalization of the B-tree must recursively partition this data space into sub-spaces or regions in such a way that the properties of the B-tree are preserved, as far as is topologically possible. Specifically, the number of nodes encountered in an exact-match search of the tree (assuming no duplicates), or a single update, must be logarithmic in the total data occupancy of the tree; and the occupancy of each data or index region must not fall below a fixed minimum. Researchers tried in vain for over twenty years to find a solution of the n-dimensional B-tree problem. McCreight (co-inventor of the B-tree) called it "one of the persistent puzzles of computer science" [McC85]. The problem stems from the unique combination of properties of the B-tree. Although the B-tree is so familiar that it may be difficult to conceive of anything better, it is by no means perfect - many alternative structures perform better under certain circumstances. But none have succeeded in improving on all of its worst-case characteristics at the same time, while maintaining its logarithmic scalability. This explains why it continues to be universally accepted as the dynamic indexing method of choice in every kind of database system. The challenge has therefore been to extend the B-tree to n dimensions while preserving all of its worst-case characteristics. But a solution has proved extremely elusive. Alternative approaches of every kind have been proposed (see, for example, [Sam89, KSS+90]), but all have been vulnerable to pathological cases. In the past this was largely an academic issue. But now it is becoming a matter of life and death. Large databases control mission-critical and non-stop applications of every kind - from fly-by-wire aircraft control systems to electricity supply load balancing. There is no room any more for systems which work very well most of the time. They must be completely predictable all the time. This consideration provides justification enough for commercial database companies to cling to the B-tree. It may be inflexible, but it has the essential qualities of being totally predictable and fully dynamic. Whatever advantages an alternative may have, it must demonstrate these same qualities before it has any hope of replacing the B-tree. For these reasons, we believe that our general solution of the n-dimensional B-tree problem [Fre95, Fre97] is a major breakthrough in indexing techniques. Above all, it provides a firm foundation for the future development of well-behaved multi-dimensional indexing methods. Principles of a generic indexing technology As a result of ten years research in this area, we have come to appreciate the fundamental importance of a solution of the n-dimensional B-tree problem. This is essentially because any data structure which can be expressed as a tree can be represented as a point in an n-dimensional data space at the deepest level of a set of recursively nested data spaces i.e. data spaces containing points which represent data spaces at the next deeper level of recursion. For example, conventional relational n-tuples can be represented as points in a single, n-dimensional data space. But if the relation contains a nested attribute, then the discriminator which distinguishes between instances of the nested attribute is the index key to the data space of nested attribute instances. This observation has been the basis of our long-standing hypothesis that an indexing system could be developed of sufficiently wide applicability to deserve being called generic. It has been the prime motivation for our attempts to develop a multi-dimensional index technique based on the following principles:
Principle six is not satisfied by a B-tree, which recursively partitions the points in the data space, not the data space itself. This choice may not be immediately obvious, since there are simple and effective techniques for mapping an n-dimensional point on to a linear order - using, for example, Z (or Morton) order [Ore86]. Then an ordinary B-tree can be used, and the worst-case characteristics of the B-tree are automatically inherited. There is the additional practical advantage that it can be immediately applied in any system which supports B-trees. But this approach is too restricted for a generic technology, which must be able to efficiently access extended spatial objects as well as points. The rectangular cover of an n-dimensional spatial object can always be mapped to a 2n-dimensional point, but neighborhood relationships are lost in such a transformation. An alternative approach [Ore86] resolves the object cover into a set of subspaces, each represented by a unique (e.g. Morton) code. This allows a recursive linear partitioning of these subspaces, in the same way as for points in a conventional B-tree. But the method violates principle three because, in general, the update of a single object cover requires the update of all its constituent parts. This introduces the uncontrollable access and update characteristics we want to avoid. (Other well-known methods such as the R-tree and the R+-tree suffer from similar problems). There are two other major reasons for the decision to partition the data space itself, rather than the objects within it. The first is that it is otherwise not possible to contract the representation of a sparsely occupied dataspace to a set of occupied subspaces. Comparative studies by [KSS+90] have clearly shown this to be a very significant factor in the efficiency of spatial range queries. The second is that, in a data space representation, each of the subspaces in the recursive sequence enclosing a point or spatial object can be defined by a key which is a prefix of that of the next subspace in the sequence. This is the motivation for principle seven: by adopting regular (or strict) binary partitioning of the data space, each subspace can be assigned an extremely compact binary key with this prefix property. In fact, a much more compact representation than this is possible, since each subspace can be uniquely represented by a key which defines it relative to the subspace which encloses it at the next higher recursion level. It is therefore possible to map the recursive partitioning of the data space to a tree-structured index in which:
In fact, a mapping between a quad-tree and a B-tree has been proposed [DSS91]. But the spatial proximity relationships of the components of the quad-tree are lost in the mapping from two dimensions to one. To preserve these relationships, a mapping is needed from a quad-tree to a two-dimensional structure with the properties of a ``grow and post'' tree. The problem lies not with the mapping, but in devising the two-dimensional ``grow and post'' tree. Once this problem is solved, it becomes possible to support large-scale, persistent quad-trees, and the very large body of computational techniques now associated with them, within a generic, multi-dimensional indexing framework. The adoption of regular binary partitioning of the data space also brings a particularly attractive feature of the quad-tree to the whole supporting framework: the precise registration (coincidence) of data space partition boundaries in overlay and spatial join operations - notably in GIS (Geographic Information System) applications. This is extremely important for the efficient and accurate execution of such operations. And since this property is a feature of the whole framework, it applies equally to raster and vector representations of spatial data, and conversions between the two. It has sometimes been claimed that regular binary partitioning leads to the inclusion of too much empty space in the representation of skewed data distributions, particularly in the direct representation of extended spatial objects. We have found this not to be the case [Fre92]. The basic reason for this is that, using a recursive partitioning scheme in a dynamic environment, the positions of the partitions at any level are moved very rarely compared to those at the level below. They are therefore effectively fixed, and in a dynamic environment their position becomes almost arbitrary. What is much more important is that, as already emphasized, it is possible to contract the representation to a set of occupied subspaces. It is also important that regular binary partitioning does not necessarily imply a strict cyclic order in the choice of dimension/domain for the partitions. A consequence of properties (d) and (e) is that, in contrast to conventional indexing methods, the binary key for an indexed object can be generated dynamically during the descent of the tree. Its value and length will no longer depend solely on the indexed value of the object - it will also depends on its degree of similarity to other objects. When a leaf node overflows and splits, the keys of the two resulting nodes will be extended just as far as is necessary to differentiate them. This incremental approach to index key generation opens the way to the indexing of data entities with different structures in the same index. In conventional indexing, it is assumed that all the elements in an indexed data set have the same structure. In a relational database, this structure is stored in the schema, and is used to interpret the contents of the data elements in the corresponding relation. New data models, however - specifically nested relational, object-oriented and logic - all allow structural variation within the elements of a relation, class or predicate respectively. In order to apply conventional indexing techniques to nested relations or objects, their structures must be resolved into simpler, fixed structures connected by physical or logical links. There is as yet no mechanism for a single index on the whole nested tuple or complex object instance. However, it is always possible to generate a unique key from a (unique) instance of any complex structure, by including the structural information as well as the value information in the key. Then property (d) makes it possible to index very complex structures efficiently without in general using correspondingly long keys. Further, a consequence of property (f) is that, if a set of points or objects in the data space generates a set of keys which are all different within the first few bits, then only these very short keys need be stored in the index. If, on the other hand, all the keys share a long, identical prefix, then this prefix will migrate to a single instance in the root of the index tree. Either way, the index representation becomes extremely compact. References [Ben79] J. Bentley. Multidimensional Binary Search Trees in Database Applications. IEEE Trans. o n Soft. Eng., Vol.SE-5, No.4, July 1979. [BM72] R. Bayer, E. McCreight. Organisation and maintenance of large ordered indexes. Acta Informatica, Vol.1, No.3, 1972. [BU77] R. Bayer, K. Unterauer. Prefix B-trees. ACM Trans. on Database Systems, Vol.2, No.1, March 1977. [Coh90] J. Cohen. Constraint Logic Programming Languages. Communications of the ACM, Vol.33, No.7, July 1990. [DSS91] W. de Jonge, P. Scheuermann, A. Schijf. Encoding and Manipulating Pictorial Data with S+-trees. 2nd Symposium on Large Spatial Databases, Zurich, August 1991. [EM90] J. Eliot, B. Moss. Design of the Mneme Persistent Object Store. ACM Trans. on Information Systems, Vol.8, No.2, April 1990. [Fre87] M. Freeston. The BANG file: a new kind of grid file. ACM SIGMOD Conf., San Francisco, May 1987. [Fre89a] M. Freeston. Advances in the Design of the BANG File. 3rd International Conference on Foundations of Data Organisation and Algorithms (FODO), Paris, June 1989. [Fre89b] M. Freeston. A well-behaved file structure for the storage of spatial objects. Symposium on the Design and Implementation of Large Spatial Databases, Santa Barbara, CA, July 1989. [Fre92] M. Freeston. The comparative performance of BANG indexing for spatial objects. 5th International Symposium on Spatial Data Handling, Charleston SC, August 1992. [Fre94] M. Freeston. A General Solution of the n-dimensional B-tree Problem. ECRC Technical Report no. ECRC-94-40. [Fre95] M. Freeston. A General Solution of the n-dimensional B-tree Problem. ACM SIGMOD Conf., San Jose, CA, May 1995. [Gut84] A. Guttman. R-trees: a dynamic index structure for spatial searching. ACM SIGMOD Conf., Boston, 1984. [HSW89] A. Henrich, H.-W. Six, P. Widmayer. The LSD-tree: Spatial Access to Multidimensional Point and Non-point Objects. 15th Int. Conf. on Very Large Data Bases (VLDB), Amsterdam, 1989. [KKR90] P. Kanellakis, G. Kuper, P. Revesz. Constraint Query Languages. 9th ACM PODS Conf., 1990. [KL80] H. T. Kung and P. L. Lehman. Concurrent Manipulation of Binary Search Trees. ACM Transactions on Database Systems, 5(3):339--353, October 1980. [KRV+92] P. Kanellakis, S. Ramaswamy, D. Vengroff, J. Vitter. Indexing for Data Models with Constraints and Classes. [KW94] B. Kröll and P. Widmayer. Distributing a Search Tree Among a Growing Number of Processors. Proc. ACM SIGMOD Conf., May 1994. [KSS+90] H.P. Kriegel, M. Schiwietz, R. Schneider and B. Seeger. A Performance Comparison of Multidimensional Point and Spatial Access Methods. 1st Symposium on the Design of Large Spatial Databases, Santa Barbara CA, 1989. [Lecture Notes in Computer Science No. 409, Springer-Verlag, 1990]. [LNS93] W. Litwin, M. Neimat, and D. A. Schneider. LH* -- Linear Hashing for Distributed Files. ACM SIGMOD Conf., June 1993. [LNS93] W. Litwin, M. Neimat, and D. A. Schneider. RP* -- A Family of Order-Preserving Scaleable Distributed Data Structures. Proceedings of the 1994 International Conference on Very Large Data Bases, August 1994. [LS89] D. Lomet, B. Salzberg. The hB-tree: a Robust Multi-Attribute Indexing Method. ACM Trans. on Database Systems, Vol.15, No.4, 1989. [LY81] P. L. Lehman and S. B. Yao. Efficient Locking for Concurrent Operations on B-Trees. ACM Transactions on Database Systems , Vol. 6, No.4, December 1981. [McC85] E. McCreight. Priority Search Trees. SIAM Journal of Computing Vol.14, No.2, May 1985. [ML92] C. Mohan and F. Levine. ARIES/IM: An Efficient and High Concurrency Index Management Index Method Using Write-Ahead Logging. ACM SIGMOD Conf., June 1992. [NS88] E. Neuhold, M. Stonebraker (ed.). Future Directions in DBMS Research. Report no. TR-88-001 International Computer Science Institute, Berkeley, California, May, 1988. [Ore86] J. Orenstein. Spatial Query Processing in an Object-Oriented Database System. ACM SIGMOD Conf. 1986. [Rob81] J.T. Robinson. The K-D-B-Tree: A Search Structure for Large Multidimensional Dynamic Indexes. ACM SIGMOD Conf., 1981. [RS85] K. Ramamohanarao, J. Shepherd. A superimposed codeword indexing scheme for very large Prolog databases. Technical Report 85/17, Department of Computer Science, University of Melbourne, November 1985. [Sag86] Y. Sagiv. Concurrent Operations on B*-trees with Overtaking. Journal of Computer and System Sciences, 33(2):275--296, 1986. [Sam89] H. Samet. The Design and Analysis of Spatial Data Structures. Pub. Addison Wesley, 1989. [SG88] D. Shasha and N. Goodman. Concurrent Search Structure Algorithms. ACM Transactions on Database Systems, Vol. 13 No. 1, March 1988. [SK90] B. Seeger, H.-P. Kriegel. The Buddy-tree: an Efficient and Robust Access Method for Spatial Data Base Systems. 16th Int. Conf. on Very Large Data Bases (VLDB), Brisbane, 1990. [SSU91] A Silberschatz, M. Stonebraker and J. Ullman (ed.). Database Systems: Achievements and Opportunities. Communications of the ACM, Vol.34, No.10, October 1991. [Tam84] M. Tamminen. Efficient Geometric Access to a Multi-representation Geo-database. Geoprocessing 2, pp.174-196, 1984. [VBW94] R. Vingralek, Y. Breitbart, and G. Weikum. Distributed File Organization with Scaleable Cost/Performance. ACM SIGMOD Conf. , May 1994. |