Scope of the Alignment Editor

The HUMBOLDT Alignment Editor (HALE) is an application for defining mappings between concepts (also called spatial object types, classes or feature types) in conceptual schemas (e.g. GML Application Schemas, Database Schemas or UML models), as well as for defining transformations between attributes of these schemas. These mappings are expressed in a high-level language and can later be used by the Conceptual Schema Transformer processing component to execute the transformation based on XML input/output.
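To make the idea of a conceptual-level mapping more concrete, the following is a minimal sketch of how a mapping between two feature types and their attributes could be represented declaratively. It is not HALE's mapping language or its internal data model: the class names (AttributeRule, TypeMapping), the feature type names and the attribute names are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


def copy_value(value: object) -> object:
    """Default transformation: pass the source value through unchanged."""
    return value


@dataclass
class AttributeRule:
    """Hypothetical rule that fills one target attribute from one source attribute."""
    source_attribute: str
    target_attribute: str
    transform: Callable[[object], object] = copy_value


@dataclass
class TypeMapping:
    """Hypothetical mapping from one source feature type to one target feature type."""
    source_type: str
    target_type: str
    rules: List[AttributeRule] = field(default_factory=list)

    def apply(self, instance: Dict[str, object]) -> Dict[str, object]:
        """Build a target instance by applying every attribute rule in turn."""
        return {
            rule.target_attribute: rule.transform(instance[rule.source_attribute])
            for rule in self.rules
        }


# Illustrative type and attribute names, not taken from any real schema.
mapping = TypeMapping(
    source_type="ProtectedSite",
    target_type="ProtectedArea",
    rules=[
        AttributeRule("name", "siteName"),  # plain copy
        AttributeRule("area_m2", "areaHectares", lambda m2: m2 / 10_000),
    ],
)

print(mapping.apply({"name": "Example Park", "area_m2": 250_000}))
```

The point of the sketch is that the mapping is data rather than code: it states which type corresponds to which and how each attribute is derived, and a separate component can execute it later.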

To make this complex process more accessible to domain experts and to increase the quality of transformations, HALE allows working with sample geodata (instances) for visualization and validation. In addition, a task-based system of the kind often used in programming environments guides users through the creation of a mapping.

The value that HALE provides to data custodians, i.e. maintainers of geographic datasets, is manifold:

  1. HALE provides a unique declarative approach to making interactive schema mapping a less daunting task;
  2. It is based on a powerful conceptual-level mapping paradigm that makes mappings easier to understand and to maintain;
  3. It uses both the information in the different conceptual schemas and geographic instances to ensure high-quality mappings;
  4. It provides a rich textual and graphical interface specifically adapted for GI experts;
  5. It gives instant feedback about the progress of mapping data from one schema to another;
  6. With HALE, users can also document known limitations of the mapping they are creating, using a unique Mismatch Description Language.

Application in the Protected Areas Scenario

The HUMBOLDT project defines several scenarios that have been used to test-drive the developed software. The following user story comes from the Protected Areas scenario.

"Mario has created the complete application schema for his protected areas management applications. It is a cross-theme schema that includes quite different INSPIRE themes:

  • From Annex I: Administrative units (1:5.000), Transport networks (Railway Network and Roads, 1:10.000), Hydrography (1:10.000), Protected Sites (1:25.000)
  • From Annex II: Land-use and land-cover (1:25.000), Elevation, Orthoimagery
  • From Annex III: Bio-geographical regions, Habitats and biotopes, Species distribution

Furthermore, there are some specific data sets that are not represented in INSPIRE themes:

  • Burnt Areas (1:10.000),
  • Monument Trees,
  • Stopping Places,
  • Foot-paths and Hiking Trails,
  • and a regional topographic map (1:25.000) that is used as a backdrop.

Now he needs to transform the individual data sets he already has into the harmonized schema. To do so, he switches from the HUMBOLDT Model Editor to HALE and first loads his scenario application schema as the target model. The target model is now shown as a tree in one part of the UI.

Next, he selects the WFS providing the Protected Sites data for one of the regions. The system retrieves the schema for this input data set and displays its FeatureTypes in a view similar to the one for the harmonized target model. If no schema can be identified for the data set, the system asks Mario to provide one manually.

After this step is completed, the system downloads a sample of the data provided by the WFS to analyze its attribute values and to visualize it in its map component.

After the analysis has completed, HALE presents him with a list of tasks. Each one represents the connection of one FeatureType of the source WFS to corresponding types in the harmonized application schema. The tasks are ordered according to their value, which is derived from the number of instances of a type and a few other parameters.

Mario therefore starts with the task with the highest value. He double-clicks it, and the corresponding source FeatureType is selected and shown in the lower left corner of the GUI, together with its geometric and other attributes. Mario can now select one or multiple FeatureTypes of the target application schema. He can also add further source FeatureTypes if the mapping task represents a union of multiple FeatureTypes into one.

After doing so, he adds an attribute transformation: the geometry type of the selected FeatureType in the WFS is a MultiPolygon, but his harmonized application schema needs a simple Polygon per Feature. He also adds transformations and simple copy operations for numeric and textual information.

After completing the mapping of all relevant attributes into the target model, he closes the task. Next, the system performs a few sanity checks and updates the task list if necessary. This can be the case if mappings contradict each other, if attribute transformations would not work on the sample data, or if transformations would overwrite each other.

Mario can now select the next task and go through the process of defining relationships between FeatureTypes and mapping their attributes until he has mapped everything from the source WFS that he needs. He can now save the mapping.

Now Mario has to repeat the exercise for the data provided by the next region, but this time it's already easier. He can use the data set he worked on before as a target reference data set. In this way, he gets a direct visual comparison of the non-harmonized data set and what it should look like in the end. He can also re-use parts of the mapping created beforehand, since parts of the source schema are identical.

After a full day of work, he has created the mappings for the four regions involved and looks forward to seeing the transformed data work together…"
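Three steps in the story above are concrete enough to sketch in code: ordering tasks by value, splitting a MultiPolygon into one Polygon per feature, and the sanity check for transformations that overwrite each other. The sketch below is an illustration under stated assumptions, not HALE's implementation. The function names are hypothetical, the task value is reduced to the instance count alone (the "other parameters" are not specified in the story), and geometries are assumed to be GeoJSON-style dictionaries rather than the GML that a WFS would typically deliver.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def rank_tasks(instance_counts: Dict[str, int]) -> List[str]:
    """Order source FeatureTypes by instance count, highest first (ties by name)."""
    return sorted(instance_counts, key=lambda name: (-instance_counts[name], name))


def split_multipolygon(geometry: dict) -> List[dict]:
    """Turn a GeoJSON-style MultiPolygon into one Polygon dict per member."""
    if geometry["type"] == "Polygon":
        return [geometry]
    if geometry["type"] != "MultiPolygon":
        raise ValueError(f"unsupported geometry type: {geometry['type']}")
    # Each entry of a MultiPolygon's coordinates is the ring list of one Polygon.
    return [{"type": "Polygon", "coordinates": rings} for rings in geometry["coordinates"]]


def find_overwrites(rules: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Report target attributes that are written by more than one source attribute."""
    writers = defaultdict(list)
    for source_attr, target_attr in rules:
        writers[target_attr].append(source_attr)
    return {target: sources for target, sources in writers.items() if len(sources) > 1}


# Illustrative data; type and attribute names are made up for this example.
tasks = rank_tasks({"Road": 120, "ProtectedSite": 480, "River": 120})

multi = {
    "type": "MultiPolygon",
    "coordinates": [
        [[[0, 0], [1, 0], [1, 1], [0, 0]]],
        [[[2, 2], [3, 2], [3, 3], [2, 2]]],
    ],
}
polygons = split_multipolygon(multi)

conflicts = find_overwrites(
    [("name", "siteName"), ("label", "siteName"), ("geom", "geometry")]
)

print(tasks)
print(len(polygons))
print(conflicts)
```

In this form, each function corresponds to one moment in the workflow: the ranked list is what Mario would see after the sample analysis, the split produces one target feature per polygon, and a non-empty conflict report is the kind of finding that would put a task back on the list.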