Mats Carlin

PhD project: Improving the performance of shape similarity retrieval systems

Currently, we search a large database of Computer Aided Design (CAD) drawings of extruded aluminium sections for similar objects. We also have access to a large number of logged variables from the production process. We know that the geometry of an object influences the production process. One long-term goal is to build empirical models of the production process based on the logged data and the geometry of the produced objects. Such models may be combined with mechanistic and linguistic models. By empirical models we mean models built directly from empirical production data, not from physical or linguistic knowledge about the process. We focus on empirical models that are interpretable and that can easily be integrated with alternative physical and linguistic models.

The main aim of the thesis is to search for similar objects in a shape similarity retrieval system and to improve this search. I have focused on creating a framework for comparing different shape similarity retrieval methods using several evaluation measures. These evaluation measures are based on rankings of perceptual similarity, on knowledge about the application (production data and physical models) and on knowledge about the shape representation (mathematical similarity).
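As an illustration of how a method's ranking can be scored against a perceptual ranking, the sketch below shows two common measures: Spearman's rank correlation between two rankings of the same objects, and precision at k for the top of a retrieved list. These are assumptions chosen for illustration, not necessarily the measures used in the framework; the object ids are hypothetical, and the Spearman formula used here is only valid for rankings without ties.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman's rank correlation for two tie-free rankings.

    rank_a[i] and rank_b[i] are the ranks (1..n) that two orderings,
    e.g. a human panel and a retrieval method, give to the same object i.
    """
    n = len(rank_a)
    if n != len(rank_b) or n < 2:
        raise ValueError("need two rankings of equal length >= 2")
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n * n - 1))


def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved objects judged similar by people."""
    top = retrieved[:k]
    return sum(1 for obj in top if obj in relevant) / k


perceptual = [1, 2, 3, 4, 5]   # hypothetical human ranking of five shapes
method = [2, 1, 3, 4, 5]       # the method swaps the two best matches
print(spearman_rho(perceptual, method))                              # 0.9
print(precision_at_k(["s3", "s7", "s1", "s9"], {"s3", "s1"}, k=3))   # 2/3
```

A value of 1 means the method reproduces the perceptual ordering exactly, and -1 means it reverses it.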

The set of valid 2-dimensional objects inhabits an abstract shape space. We have little understanding of what this shape space looks like, or of how similar shapes are arranged in it and can be identified. We therefore wish to transform every 2-dimensional object into a vector space in which similarity is interpretable and understandable. One possibility is to map the objects into an n-dimensional feature space spanned by n different features computed directly from the shapes. Each object is then represented by a point in this feature space, and we use a distance measure to locate similar shapes. This approach assumes that the features express some aspect of similarity.
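A minimal sketch of this retrieval step, assuming Euclidean distance as the distance measure and feature vectors that have already been computed. The object ids and the two feature values per object (for example circularity and rectangularity) are hypothetical. In practice, features on very different scales would usually be standardized first so that no single feature dominates the distance.

```python
import math


def retrieve(query, database, k=3):
    """Return the ids of the k objects whose feature vectors lie closest
    to the query point under Euclidean distance."""
    ranked = sorted(database, key=lambda obj_id: math.dist(query, database[obj_id]))
    return ranked[:k]


# Hypothetical 2-D feature vectors, e.g. (circularity, rectangularity).
database = {
    "A": [1.0, 0.0],
    "B": [0.0, 1.0],
    "C": [0.9, 0.1],
    "D": [5.0, 5.0],
}
print(retrieve([1.0, 0.0], database, k=2))  # ['A', 'C']
```

Any other metric can be swapped in through the sort key; the choice of metric is itself part of what defines "similar" in the feature space.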

The n-dimensional feature space spanned by a given set of features is probably not optimal for shape similarity. All features should be relevant to shape similarity retrieval. In addition, some features may contain the same information about an object; such features are redundant. A further aim is to create an orthogonal feature space with no singularities. Using formal definitions of relevance and redundancy, we search for optimal subsets of features with these properties. Feature subset selection may reduce the number of features drastically, and by selecting optimal subsets we identify the most important features for shape similarity retrieval.
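To make the idea of redundancy concrete, the sketch below uses a simple greedy correlation filter: features are visited from most to least relevant and a feature is kept only if its absolute Pearson correlation with every feature already kept is below a threshold. This is a swapped-in illustrative technique, not the formal relevance and redundancy definitions used in the thesis. The relevance scores, the threshold of 0.9 and the feature names are hypothetical and are simply supplied by the caller.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long feature columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant feature carries no linear information
    return sxy / math.sqrt(sxx * syy)


def select_features(columns, relevance, threshold=0.9):
    """Greedy filter: keep a feature only if |r| < threshold against
    every feature kept so far, visiting the most relevant features first."""
    order = sorted(columns, key=lambda name: relevance[name], reverse=True)
    kept = []
    for name in order:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical data: "area_px" is just "area" in other units, so it is redundant.
columns = {
    "area": [1, 2, 3, 4],
    "area_px": [2, 4, 6, 8],
    "circularity": [0.9, 0.2, 0.5, 0.7],
}
relevance = {"area": 0.8, "area_px": 0.7, "circularity": 0.6}
print(select_features(columns, relevance))  # ['area', 'circularity']
```

Correlation only detects linear redundancy; it is a cheap first pass towards the near-orthogonal feature space described above rather than a guarantee of it.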

A large number of features are computed from the CAD drawings. So far, the main focus has been on invariant features with a physical interpretation, grouped into the following categories:

Contour (such as area, perimeter and diameter)
Structure (based on the skeleton of the object)
Mass distribution (geometric moments)
Frequency (Fourier ellipses)
Symmetry (mirror, rotational and radial symmetry)
Complexity measures (fractals, angular distribution)
Fuzzy features (circularity, rectangularity, U-shape, C-shape)
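A few of the simplest features above can be sketched directly from an outline. The sketch assumes the CAD outline is available as a simple polygon without holes (real extruded sections often contain holes, which this does not handle). It computes area with the shoelace formula, perimeter, diameter as the largest vertex-to-vertex distance, the centroid from the first-order geometric moments, and a crisp circularity 4πA/P², which equals 1 for a circle. The function name and the returned keys are hypothetical.

```python
import math


def polygon_features(vertices):
    """Contour and mass-distribution features of a simple polygon.

    vertices: list of (x, y) points in order (either orientation), no holes.
    """
    n = len(vertices)
    a2 = 0.0          # twice the signed area (shoelace formula)
    mx = my = 0.0     # accumulators for the first-order moments
    perimeter = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        a2 += cross
        mx += (x0 + x1) * cross
        my += (y0 + y1) * cross
        perimeter += math.hypot(x1 - x0, y1 - y0)
    area = abs(a2) / 2.0
    # Centroid = first-order moments / area; the signed 3 * a2 equals 6 * signed area.
    centroid = (mx / (3.0 * a2), my / (3.0 * a2))
    # For a polygon the diameter is attained between two vertices.
    diameter = max(math.dist(p, q) for p in vertices for q in vertices)
    return {
        "area": area,
        "perimeter": perimeter,
        "diameter": diameter,
        "centroid": centroid,
        "circularity": 4.0 * math.pi * area / perimeter ** 2,
    }


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(polygon_features(square))  # area 4, perimeter 8, circularity pi/4
```

Area and perimeter scale with object size, whereas circularity is invariant to translation, rotation and scaling, which is why ratios of this kind are attractive as shape descriptors.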