Task: Draw up the survey data model

Progress 2005-05-02

This Edition: Modified only to suit the revised main Task List. Further progress awaits completion of Tasks 6 and 7.
Task context | Previous versions.

This is a relatively informal page for recording our progress as we work through the above task. When the task is finished, or as appropriate, material from any decisions which we have made will be consolidated to a more formal separate area. Nothing on this page is set yet - all aspects are up for discussion; even items marked as Done can be re-opened if later considerations indicate that they should.

Discussion Colour Codes: Done | Current | Future

Page Contents

1. Task Description
1.1 Cave Survey Data Model
1.2 Cave Survey Entities
1.3 The Entity-Relationship Diagram
2. Task Plan
3. Cave Survey Entity Definitions
4. Other definitions
4.1 General
4.2 Survey Data Stages
5. Entity Relationships


1. Task Description

Only the cave survey data model is included in this task. Further data models to be considered later in the project will be for cave topology and cave maps. The items below describe various aspects of this task.
1.1 Cave Survey Data Model

This sub-task is to define the cave survey data model, so that we can comprehensively define the data fields and groupings which may be required in a data transfer or archive. To construct a robust CaveXML standard which won't need difficult changes when we later add extensions, it is important that we first understand and agree on the data and data structures involved in a cave survey, i.e. the data model. Although our initial pilot CaveXML standard is planned to address only a simple, commonly used surveying technique, we also need to be aware of where the further complications lie, so that in designing the simpler pilot standard we can allow for the seamless inclusion of the extras later.

Therefore these draft entity definitions and the accompanying draft diagram attempt to model the data in a cave survey up to the point where the station positions have been calculated. The closer our models get to the real-world things involved, the more robust the result will be. There are sure to be further entities needed, but the ones shown should do as a framework for starting the discussion. Of course not all these entities would be used in any given survey, or used by any existing survey reduction program or data transfer; one would just use those entities which currently applied.

One of the difficulties of course is that some cave surveying terms have different meanings for cavers in different parts of the world. As we are working on an international solution for data transfer and archiving, we will need to recognise this fact and try to settle on a single definition for each term, just for use in this project, which can most easily be accepted. Our "Comments" included in such compromise definitions can acknowledge the alternative definition(s) which may exist.

Once we have a coherent set of entities and their relationships, then we can determine what fields are needed and what entity they belong to (many survey fields have already been enumerated by various people on their websites). We will also need to define the so-called "business rules", i.e. how to handle things when certain conditions apply, such as what fields to include when a particular surveying technique is being used. Another issue we will need to address is how to achieve a range of sequences of data in an XML file to satisfy the various data groupings which people need. Once all this is done we will be in a stable position to populate with fields the XML structures which we have decided upon to describe, store and convey the survey data.

Although XML is inherently capable of describing complex data relationships more easily than a relational database, it would be best to model it first using relational database methods because these methods are well established and understood, and survey data will need to be stored in a conventional relational database at various times for various reasons anyway, i.e. both approaches are needed if our work is to be accepted. Also, it may be that data can be stored more sequence-neutrally in a relational database than in XML, which may help us with the data sequencing issue mentioned above.
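
As a rough illustration of the sequence-neutrality point, the sketch below stores a few rows in an in-memory SQLite database and then writes the same rows out as XML in two different groupings. The table names, column names, element names and sample values are all hypothetical and chosen only for this example; none of them are proposed CaveXML fields.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical tables and sample rows, for illustration only.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE trip    (trip_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE segment (segment_id INTEGER PRIMARY KEY,
                          trip_id INTEGER REFERENCES trip(trip_id),
                          technique TEXT);
    INSERT INTO trip VALUES (1, 'Trip A'), (2, 'Trip B');
    INSERT INTO segment VALUES (10, 1, 'traverse'),
                               (11, 2, 'traverse'),
                               (12, 2, 'GPS');
""")

ROWS_SQL = """
    SELECT s.segment_id, s.technique, t.name
    FROM segment AS s JOIN trip AS t ON t.trip_id = s.trip_id
    ORDER BY s.segment_id
"""

def export_grouped(group_by):
    """Emit the same stored rows as XML, grouped by 'trip' or by 'technique'."""
    if group_by not in ("trip", "technique"):
        raise ValueError("group_by must be 'trip' or 'technique'")
    root = ET.Element("survey")
    groups = {}
    for segment_id, technique, trip_name in db.execute(ROWS_SQL):
        key = trip_name if group_by == "trip" else technique
        if key not in groups:
            # One wrapper element per distinct group value.
            groups[key] = ET.SubElement(root, group_by, name=key)
        ET.SubElement(groups[key], "segment", id=str(segment_id))
    return ET.tostring(root, encoding="unicode")

by_trip = export_grouped("trip")
by_technique = export_grouped("technique")
```

The stored rows carry no inherent sequence; each XML file imposes one only at export time, which is the property the paragraph above hopes to exploit.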

| Top | CDX Home | Main Task List | Contents | Task Description | Plan | Entity Defns | Other Defns | Relationships |

1.2 Cave Survey Entities

The "Entities" referred to in this context are the real-world surveying things or events which we want to record data about. A preliminary list for comment is shown below in alphabetical order. These are equivalent to the boxes shown in the separate Entity-Relationship Diagram, which shows how the entities are related to each other. Note that these entities are different to the "entities" referred to in XML and HTML syntax. The detailed fields belonging to each entity are not being considered at this stage of the task. The set of entities below covers the data model for the survey measurements taken and the positions thereby calculated, though not all surveys will involve all of the entities shown of course. Where a definition uses terms appearing elsewhere in this list, an initial capital letter has been used for those terms. Comments, examples and a few possible fields for each entity have also been included to give a better feel for what the entity is about.

1.3 The Entity-Relationship Diagram

In the Entity-Relationship Diagram (ERD) each square represents an entity, and the lines joining them represent any relationships between them which are relevant to our purpose. The label along the line gives some idea of the type of relationship. These relationships are spelled out more fully in the text version of the diagram in the Entity Relationships section below.

Where the line has an arrowhead on one end, it means that one of the entities at the non-arrowhead end may be related to more than one of the entities at the arrowhead end, but not vice-versa. This is called a "one-to-many" relationship. For example, a CaveSystem could contain several Caves but a Cave would not normally belong to several CaveSystems.

Where the line has an arrowhead on both ends, it is called a "many-to-many" relationship. For example, one Person could belong to several Teams, and one Team contains several Persons.

You can also have a "one-to-one" relationship, where one entity at one end of the line is related to only one instance of the entity at the other end.
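
For readers more used to code than to diagrams, the one-to-many and many-to-many examples above might be sketched as follows. This is only an illustration: the class and attribute names are assumptions made for the sketch, and the check that a Cave belongs to at most one CaveSystem is stricter than the "would not normally" wording in the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# eq=False keeps identity comparison, which avoids recursive
# comparisons between objects that refer to each other.
@dataclass(eq=False)
class CaveSystem:
    name: str
    caves: List["Cave"] = field(default_factory=list)  # the "many" end

    def add_cave(self, cave: "Cave") -> None:
        # Assumption for this sketch: a Cave has at most one CaveSystem.
        if cave.cave_system is not None and cave.cave_system is not self:
            raise ValueError("Cave already belongs to another CaveSystem")
        if cave.cave_system is None:
            cave.cave_system = self
            self.caves.append(cave)

@dataclass(eq=False)
class Cave:
    name: str
    cave_system: Optional[CaveSystem] = None  # the "one" end

@dataclass(eq=False)
class Person:
    name: str
    teams: List["Team"] = field(default_factory=list)

@dataclass(eq=False)
class Team:
    name: str
    members: List[Person] = field(default_factory=list)

def join_team(person: Person, team: Team) -> None:
    """Many-to-many: record the link on both sides."""
    if team not in person.teams:
        person.teams.append(team)
        team.members.append(person)
```

In a relational database the many-to-many case would normally need a separate linking table, whereas the one-to-many case needs only a reference held at the "many" end.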

View ERD in this Window, or

(Note: If you're using Internet Explorer 6 with the above new-window option, the diagram may retreat to an indistinct image at the left side of the page; if so, hover your mouse over this image until a 4-arrow thing appears, which you can then click on. This should expand the diagram to its proper size. Does anyone know how to stop IE6 shrinking the image ("fit-to-screen")? It does not happen with Netscape or Mozilla.)

(Technical note: The original of the diagram is vector-based in StarOffice 5.2 Draw. It is then exported to GIF format for the web page.)


2. Task Plan

The detailed steps of the plan are listed below. Task Context.

Draw up the survey data model:
  1. Sort out any tricky or non-entity terms which we may need to use during discussions.

  2. Begin at the top end of the Cave Survey E-R Diagram and discuss each entity in logical order down the page until all definitions and relationships for this data model are agreed. Only a token number of fields would be discussed during this step - just enough to clarify how the entity would fit in.

  3. When an entity's definition and how it fits has been agreed, the entity will be updated on this web page and on the diagram to show what we have agreed, and its colour set to "Done". Earlier versions will remain available.


3. Cave Survey Entity Definitions

The draft definitions below are being set for our specific CaveXML purpose, and may differ from definitions used for other purposes, i.e. we are not trying to set up normative definitions for cave survey terms for universal use, though hopefully most of them could serve that purpose anyway. Some of these terms may have differing existing definitions in different parts of the world; for use in this project, we will need to settle on a single definition for each, while acknowledging any alternative definitions in the respective "Comments".

Branch
A survey network element which is a sequence of one or more Legs which join two adjacent Nodes in the network.
Possible fields: node1 name, node2 name, leg(s).
Cave
A single cave which is being surveyed.
Possible fields: name, survey IDs, parent cave system, ...
CaveSystem
A collection of related Caves, or a complex Cave, which is being surveyed.
Possible fields: caves, projects.
Fieldbook
A book or identified collection of documents in which the survey readings were originally recorded during the surveying. It may contain material related to several surveys, projects, and/or caves.
Possible fields: book ID, owner, caves.
Instrument
A specific surveying instrument used during a survey Segment. An Instrument's corrections might change from Segment to Segment.
Possible fields: instrument type, serial number, owner, manufacturer, model, date last serviced, date manufactured, corrections and their dates.
Interpoint
An intermediate point along a Shot or Leg from which additional observations are taken. Interpoint positions can be determined by their distance along a Shot from a specified one or other Station. An Interpoint does not form a necessary part of the survey network structure. For example, additional cross-sections may have been taken at Interpoints by means of Rays.
Possible fields: station from, distance from station, rays taken.
Leg
The set of final consolidated survey Measurements related to the connection between two adjacent Stations according to the surveying Technique being used. For example, it might be the result of traverse readings with multiple Shots or sightings in one or both directions and averaged readings. A Leg could be the set resulting from the survey measurements, or it could be the statistical set resulting from later adjustment of the network.
Possible fields: station from, station to, distance, direction, vertical angle, segment, averaged yes/no.
Map
A visual representation resulting from one or more Surveys.
Possible fields: map ID, name, size, horizontal scale(s), vertical scale(s), drafter(s), producer(s).
Measurement
A Measurement is one of the fundamental quantities which was used to calculate the position of a target, usually a new Station, from the Position of an existing Station. For example, in a normal tape and compass traverse the Measurements between the two Stations would be: direct distance, horizontal direction, and vertical angle. (Other cases: (1) independently determined, e.g. by GPS, (2) by triangulation, etc.) A Measurement can be used in both a Leg and a Shot, though in a Leg, the Measurement may be the result of determining the final (statistical) Position of the new Station.
Possible fields: name, value, units, item type being measured, item being measured, technique, method.
Method
The surveying and calculation method actually used for obtaining the values for one of the Measurements of the survey. For example, the "distance" Measurement could be obtained by any of the following Methods: tape or rangefinder (a direct single measurement), topofil (difference of two readings), stadia staff (two intercepts and a vertical angle), etc. The chosen Method will affect how many values are contained in a Shot. There will be a range of Methods defined, and new ones will be needed from time to time. This is a lookup reference entity rather than containing values from any specific survey.
Possible fields: name, qty of measurements required, instrument type(s) required.
Node
A survey network element which is a Position of a Station which is the meeting point of more than two Legs, or which is otherwise needed in manipulating the network.
Possible fields: name, legs connected.
Organisation
An organisation associated with any aspect of a Survey or survey Project.
Possible fields: name, code, initials, members.
Person
A person participating in any aspect of a Survey or survey Project.
Possible fields: name, contact details, orgs associated with.
Point
A physical point occupied by one or more Stations. It may or may not have a physical marker. It may have multiple sets of Position co-ordinates from its occupying Stations. It may have several names ranging from an official government designation to a series of cave survey station names.
Possible fields: name(s) + nametype(s), marker type, date marked, person placing mark, org placing mark.
Position
A calculated location for a Station or Interpoint. There may be several Positions for the one Station, each derived from, for example, different Segments and/or adjustment Techniques.
Possible fields: easting, northing, co-ord system used, altitude, height datum used, units, latitude, longitude, station, program, program version, adjustment technique, date calculated, program operator, segment.
Project
A cave/karst survey and mapping project considered to require extended work over multiple Trips and possibly comprising multiple Surveys.
Possible fields: name, date started, leader, org(s) involved, cave system.
Ray
A set of Measurements from a Station or Interpoint to a target point where the latter does not occupy a formal Station. For example, "left", "right", "up" or "down" sightings would each be an example of a Ray where only one Measurement was recorded, the other Measurements and target being implied. A Ray is a specialised kind of Shot which is different enough to warrant its own entity.
Possible fields: ray type, distance, station from, interpoint from, target, horizontal direction, vertical angle.
Role
One of the types of task being performed in a particular survey segment. Examples: compass reader, elevation reader, tape reader, sketcher, recorder, data processor, etc. The key for a Role would be a combined segment+person, with role-type as a field. If the role-type was unknown in a particular instance, this field could be left blank or given a value of, say, "Unknown".
Possible fields: segment used in, person, role name.
Segment
Part or all of a cave/karst survey Trip which is carried out under a single set of conditions such as Team members, Instruments, Technique, Methods, instrument corrections, etc. That is, a Segment is the largest component in a Survey (highest part in the hierarchy) to which these other values can be attached. A Segment could consist of Stations but no Legs, if the Station Positions were being determined directly. A single Leg later joining two Segments would be a new Segment. Because a Segment has a single set of conditions, it could be manipulated as a single unit if desired.
Possible fields: leg(s), role(s), trip, instrument(s), technique, ray type.
Shot
A set of the actual survey readings resulting from one sighting between two adjacent Stations before any Instrument or other corrections have been applied. The number of readings in the set for a Shot will be determined by the Technique and Methods being used, and hence the number of different Measurements required and the number of readings needed for each Measurement, e.g. two for a Topofil length. Repeated sets of readings for the purpose of averaging are considered to be separate Shots. In the simplest case, a single Shot becomes the Leg between two Stations.
Possible fields: station to, station from, tape distance, magnetic bearing, vertical angle.
Spur
A survey network element which is a sequence of one or more Legs and which is connected to the rest of the network by only one end.
Possible fields: node, leg(s).
Station
A named end of a survey Leg or a directly established point, at a particular physical Point. Its location could be the result of Shots from or to other Stations, or of independent observations such as by GPS or by radio or electromagnetic methods. A Station may have one or more Positions, for example the original position calculated from the survey field measurements, and also those resulting from correction processes such as loop closure or statistical adjustment of the survey mesh, or from later resurveys. The same Station could be in more than one Segment.
Possible fields: name, point, leg(s), shot(s), position(s).
Survey
A related collection of cave/karst survey data which can stand alone, or may form part of a larger survey Project.
Possible fields: cave(s), project, trip(s), location of data, date started.
Team
A semi-permanent group of People who carry out cave surveys. A particular Segment may have been surveyed by a particular Team, or by a group of people not belonging to any formal Team. The informal "team" of People who have carried out the surveying in a particular Segment can be found by examining all the Roles related to that Segment.
Possible fields: name, members, formation date.
Technique
The type of surveying technique used in a surveying Segment to enable the calculation of the position of each Station, and also the calculation technique possibly used later in its mathematical adjustment. Technique examples are traverse, triangulation, resection, GPS, and the various survey adjustment techniques. A survey Technique will require a particular set of Measurement types, and each Measurement type will be obtained by using a particular Method. If a survey Segment field recorded the use of a particular Technique, then a program would use the "rules" for that Technique to guide its subsequent action on the various Measurements. This is a lookup reference entity rather than containing values from any specific survey.
Possible fields: name, measurement type(s), purpose.
Trip
A cave survey trip in which one or more Segments of survey for a Survey or Project are carried out during one nominally continuous time period.
Possible fields: name, start date, end date, survey belonged to.
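
To make the Shot and Leg definitions above more concrete, here is a minimal sketch of consolidating repeated Shots into a single Leg by averaging. It assumes all the Shots were taken in the same direction between the same two Stations, that distances are in metres, and that the horizontal direction is an azimuth in degrees; the field names and the choice of a simple mean are assumptions for this sketch, not an agreed consolidation rule.

```python
import math
from dataclasses import dataclass

@dataclass
class Shot:
    station_from: str
    station_to: str
    distance: float      # metres (assumed unit)
    azimuth: float       # degrees clockwise from the North line, 0-360
    inclination: float   # degrees, positive above the horizontal

def consolidate(shots):
    """Average repeated same-direction Shots into one Leg-like record."""
    ends = {(s.station_from, s.station_to) for s in shots}
    if len(ends) != 1:
        raise ValueError("Shots must join the same two Stations, same direction")
    (station_from, station_to), = ends
    n = len(shots)
    # Circular mean, so that 359 and 1 degrees average to 0 rather than 180.
    sin_sum = sum(math.sin(math.radians(s.azimuth)) for s in shots)
    cos_sum = sum(math.cos(math.radians(s.azimuth)) for s in shots)
    azimuth = math.degrees(math.atan2(sin_sum, cos_sum)) % 360.0
    if azimuth == 360.0:  # guard against floating-point wrap-around
        azimuth = 0.0
    return {
        "station_from": station_from,
        "station_to": station_to,
        "distance": sum(s.distance for s in shots) / n,
        "azimuth": azimuth,
        "inclination": sum(s.inclination for s in shots) / n,
        "averaged": n > 1,
    }
```

With a single Shot the result simply copies its readings and sets "averaged" to False, matching the simplest case in the Shot definition where one Shot becomes the Leg.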


4. Other definitions

These are other tricky or non-entity definitions which we may need to use during our discussions, and hence will need to agree on beforehand. We will of course end up eventually defining all the fields which belong to the entities, but these ones below need clarification early on. Any others? We can accumulate them here until they get covered in specific field definitions. Some of these terms may have differing existing definitions in different parts of the world; for use in this project, we will need to settle on a single definition for each, while acknowledging any alternative definitions in the respective "Comments".

4.1 General
Altitude
The height of a point above or below mean sea level.
Comments:
  1. Altitude as opposed to Elevation, because the latter can be confused with Inclination (vertical angle).
Azimuth
The horizontal direction of a line of sight measured clockwise from the North line in the range 0-360 degrees or equivalent. It will be a True, Grid, Magnetic, or Assumed Azimuth depending on whether the North line is True, Grid, Magnetic, or Assumed.
Comments:
  1. Azimuth as opposed to Bearing.
Bearing
The horizontal direction of a line of sight measured from a North, South, East or West line in the range 0-90 degrees or equivalent. It will be a True, Grid, Magnetic, or Assumed Bearing depending on whether the line is True, Grid, Magnetic, or Assumed. Example: Bearing E30°S (meaning 30° South of the East line, which is also Azimuth 120°).
Comments:
  1. Bearing as opposed to Azimuth.
  2. This is not common usage of the term in cave surveying of course, but we may need to discuss such measurements and will need a term for them, so we might as well use the correct one. "Bearings" are likely to arise if integrating professional, land, or historic surveys with our cave surveys.
  3. The term "Quad" is also sometimes used for this type of reading.
Entity
A real-world thing, event or concept that we want to record data about. Examples are Cave, Instrument, Trip.
Comments:
  1. Entities are represented by the squares in the diagram.
  2. Note that these entities are different to the "entities" referred to in XML and HTML syntax.
Field
A property of an Entity. For example in a database table, the fields are normally represented by the columns in the table: "start date" is a field of the entity "trip", so in a table which listed trips, [Start Date] would be one of the columns. And in an XML file, any particular field would be represented either as an "element" or sub-element in its own right, or as an "attribute" of another element, depending on various considerations about that field.
Comments:
  1. Fields are not being shown in the diagram, but will be listed out when we come to discuss them in detail later in the Task.
Inclination
The angle in a vertical plane between a line of sight and the horizontal, positive above the horizontal and negative below.
Comments:
  1. Inclination as opposed to Elevation, because the latter can be confused with Altitude.
Traverse
A general term for a contiguous series of Legs in a survey. It may span several Trips and several Methods, etc.
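
The difference between Bearing and Azimuth as defined above can be sketched as a small conversion. The function name and the three-part argument form (reference line, angle, direction turned towards) are assumptions made for this example; they follow the E30°S notation used in the Bearing definition.

```python
# Azimuth of each reference line, in degrees clockwise from North.
LINE_AZIMUTH = {"N": 0.0, "E": 90.0, "S": 180.0, "W": 270.0}
CLOCKWISE = "NESW"  # order of the lines when turning clockwise

def bearing_to_azimuth(line, angle, toward):
    """Convert a Bearing such as E30°S, given as ('E', 30, 'S'), to an Azimuth."""
    if line not in LINE_AZIMUTH or toward not in LINE_AZIMUTH:
        raise ValueError("line and toward must each be one of N, E, S, W")
    if not 0.0 <= angle <= 90.0:
        raise ValueError("a Bearing angle lies in the range 0-90 degrees")
    i = CLOCKWISE.index(line)
    if toward == CLOCKWISE[(i + 1) % 4]:
        sign = 1.0    # turning clockwise from the reference line
    elif toward == CLOCKWISE[(i - 1) % 4]:
        sign = -1.0   # turning anticlockwise from the reference line
    else:
        raise ValueError("toward must be a line adjacent to the reference line")
    return (LINE_AZIMUTH[line] + sign * angle) % 360.0
```

Called with the example from the Bearing definition, bearing_to_azimuth("E", 30, "S") gives Azimuth 120.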
4.2 Survey Data Stages

These are the terms we have decided to use for the processing stages which a particular set of survey data might go through between the original survey readings in the cave and the calculated and adjusted co-ordinates ready for preparing a map.

Field Data
The unaltered survey data (readings and/or sketches) recorded in the field by whatever means, or verified copies thereof. (Accepted 2003-02-07)
Comments:
  1. For example, paper-based records, or decipherable images thereof which may have been altered but only to clarify the original data (for example, marked up photocopies), or data downloaded unaltered from an instrument (survey instrument, PDA, laptop, etc), or data still stored and observable in an instrument.
Raw Data
An initial unaltered digital copy of some or all of the survey Field Data, now ready for editing, validity checking, calculation or other processing. (Accepted 2003-02-07)
Comments:
  1. Where the Field Data was already in digital form, e.g. downloaded from an instrument, the Raw Data version could be an identical but editable copy, whereas the Field Data version would effectively be a read-only copy.
  2. Raw Data could include sketches now converted to fixed or editable digital form.
  3. The Raw Data could be in any format, including that of a survey processing program into which the Field Data has been typed.
  4. If such a program unilaterally alters the data in any material way as it is being entered, then the data has become Edited Data because it differs from the Field Data, i.e. a Raw Data version has effectively been skipped.
  5. Such a survey program might also store the data in a proprietary binary format unconducive to easy data exchange, therefore the coming CaveXML standard may need to define suitable forms of Raw Data to allow free exchange. Such binary data after being exported to a text format might qualify. Acceptable formats for raster or vector graphical data may also be required.
Edited Data
Raw Data which has had or is having any kind of mistakes edited out of it, but to which no systematic instrument corrections have been applied. (Accepted 2003-04-05)
Comments:
  1. The data has been modified since its Raw Data version, but has not yet been certified as Accepted Data.
Accepted Data - Corrected, Uncorrected and No Corrections
The input data which is currently the accepted final version of individual Shots and any other measured data: Accepted Corrected if systematic instrument corrections have already been applied to the measurements, Accepted Uncorrected if systematic instrument corrections exist but have not been applied, Accepted No Corrections if no systematic instrument corrections were recorded. (Accepted 2003-04-05)
Comments:
  1. This data set is approved as now suitable for data reduction, but may or may not be ready for input to any particular survey reduction program depending on the program and what it can accept as input data.
  2. The "correction" terms refer to the application of systematic instrument corrections, not to adjustment of measurements in order to close any loops.
Leg Data - Corrected, Uncorrected and No Corrections
Accepted Data which provides only a single set of the necessary Measurements for each Leg, possibly by consolidation of several sets of accepted Shot data: Leg Corrected if systematic instrument corrections have already been applied to the Leg data, Leg Uncorrected if systematic instrument corrections exist but have not been applied to the Leg data, Leg No Corrections if no systematic instrument corrections were recorded. (Accepted 2003-04-05)
Comments:
  1. If only one Shot was taken between two adjacent Stations, then Accepted and Leg Data would be the same.
Reduced Data
Derived from Measurements taken during a Survey, Reduced Data is two or three dimensional co-ordinate data which gives the Position for Stations in the Survey. Any gross errors have been removed, and any systematic instrument corrections have been applied in earlier stages, but the co-ordinates have not yet been statistically adjusted for the distribution of the random errors inherent in any measurements. (Accepted 2003-04-27)
Comments:
  1. Typically co-ordinates would be northings, eastings and altitude based on true, grid, magnetic, or assumed Azimuths.
  2. Any gross errors would have been removed during the Editing phase, and any systematic errors would have been removed during the Correction phase.
Adjusted Data
Adjusted Data is Reduced Data after the application of statistical adjustment in order to distribute the random errors which remain after any gross and systematic errors have been removed. (Accepted 2003-04-27)
Comments:
  1. Typically this adjustment is done by closing any loops in the Survey.
Adjusted Leg Data
Adjusted Leg Data is fictitious Leg Data which has been generated from a set of Adjusted Data. (Accepted 2003-04-27, but a better descriptive name is needed.)
Comments:
  1. This is where for some reason simple leg data is required (e.g. to feed into a different program) but only co-ordinate data is available, so the leg data has to be "reverse engineered" from the co-ordinate data.
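
The step from Leg Data to Reduced Data, and the reverse step used to generate Adjusted Leg Data, can be illustrated with plain trigonometry. The sketch below assumes a direct (slope) distance, an Azimuth measured clockwise from the North line and an Inclination positive above the horizontal, as defined in section 4.1; the function names are hypothetical, and no instrument corrections or statistical adjustment are included.

```python
import math

def reduce_leg(easting, northing, altitude, distance, azimuth, inclination):
    """Position of the 'to' Station, from the 'from' Position and one Leg."""
    az = math.radians(azimuth)
    inc = math.radians(inclination)
    horizontal = distance * math.cos(inc)   # plan length of the Leg
    return (easting + horizontal * math.sin(az),
            northing + horizontal * math.cos(az),
            altitude + distance * math.sin(inc))

def leg_from_positions(p_from, p_to):
    """Reverse-engineer (distance, azimuth, inclination) from two Positions."""
    de = p_to[0] - p_from[0]
    dn = p_to[1] - p_from[1]
    dh = p_to[2] - p_from[2]
    horizontal = math.hypot(de, dn)
    distance = math.sqrt(de * de + dn * dn + dh * dh)
    azimuth = math.degrees(math.atan2(de, dn)) % 360.0
    inclination = math.degrees(math.atan2(dh, horizontal))
    return distance, azimuth, inclination
```

Feeding the output of reduce_leg back into leg_from_positions recovers the original Leg values to within floating-point rounding, which is all that "fictitious" Leg Data amounts to when the co-ordinates have not been adjusted in between.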


5. Entity Relationships

The draft below describes how the various survey entities above could relate to each other. This is a text representation of the Entity-Relationship Diagram. "Many" below means more than one. The Entities are shown within square brackets [ ].

Alphabetically by entity:

[Branch]
- connects to two [Nodes]
- contains one or more [Legs]

[Cave]
- could belong to a [CaveSystem]
- could have initiated many [Surveys]
- could be recorded in one or more [Fieldbooks]

[CaveSystem]
- contains one or more [Caves]
- could have initiated many [Projects]

[Fieldbook]
- records one or more [Segments]
- could belong to one [Person]
- could belong to one [Organisation]
- could record many [Caves]

[Instrument]
- used in one or more [Segments]
- used by one or more [Methods]

[Interpoint]
- belongs to one [Shot]
- contains one or more [Measurements]
- located at one or more [Positions]
- could connect to many [Rays]

[Leg]
- belongs to one [Segment]
- connects two [Stations]
- contains one or more [Measurements]
- contains one or more [Shots]
- could be part of one [Branch]
- could be part of one [Spur]

[Map]
- is contributed to by one or more [Surveys]
- is contributed to by one or more [People]
- could be produced by many [Organisations]

[Measurement]
- used by one [Technique]
- could form part of one [Leg]
- could form part of one [Shot]
- could form part of one [Ray]
- could form part of one [Interpoint]
- uses one [Method]

[Method]
- used by one or more [Measurements]
- uses one or more [Instruments]

[Node]
- is located at one [Position]
- is connected to by one or more [Branches]
- could be connected to by one or more [Spurs]

[Organisation]
- associated with one or more [People]
- could be involved with many [Projects]
- could own many [Fieldbooks]
- could have produced many [Maps]

[Person]
- could be a member of many [Teams]
- could be associated with many [Organisations]
- could be performing many [Roles]
- could contribute to many [Maps]
- could be involved in many [Projects]
- could own many [Fieldbooks]

[Point]
- is coincident with one or more [Stations]

[Position]
- belongs to one [Segment]
- could have resulted from a calculation or loop adjustment by one [Technique]
- could be the location for one [Station]
- could be the location for one network [Node]
- could be the location for one [Interpoint]

[Project]
- initiated for one [CaveSystem]
- initiates one or more [Surveys]
- involves one or more [People]
- could involve many [Organisations]

[Ray]
- could connect to one [Interpoint]
- could connect to one [Station]
- contains one or more [Measurements]

[Role]
- utilised by one [Segment]
- performed by one [Person]

[Segment]
- is surveyed on one [Trip]
- could be surveyed by one [Team]
- is surveyed using one [Technique]
- utilises many [Roles]
- is surveyed using one or more [Instruments]
- contains one or more [Positions]
- could contain many [Legs]
- is recorded in one or more [Fieldbooks]
- contains one or more [Stations]

[Shot]
- belongs to one [Leg]
- connects two [Stations]
- contains one or more [Measurements]
- could contain many [Interpoints]

[Spur]
- contains one or more [Legs]
- connects to one [Node]

[Station]
- is located by one or more [Positions]
- could be connected to by many [Legs]
- could be connected to by many [Shots]
- is coincident with one [Point]
- could connect to many [Rays]
- belongs to one or more [Segments]

[Survey]
- could include many [Caves]
- could be initiated by one [Project]
- could contribute to many [Maps]
- initiates one or more [Trips]

[Team]
- could survey many [Segments]
- consists of one or more [People]

[Technique]
- used for calculation or loop adjustment of one or more [Positions]
- used for surveying in one or more [Segments]
- uses one or more [Measurements]

[Trip]
- contributes to one [Survey]
- surveys one or more [Segments]
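
Because each relationship in this list should appear under both of the entities it joins, a text version like this can be checked mechanically. The sketch below is one possible way to do that: it transcribes just three of the entities above into a Python dictionary (cardinalities omitted) and reports any relationship which is listed from one side only. The function name and data layout are assumptions for this example.

```python
# A small subset of the list above: entity -> entities it is related to.
RELATIONSHIPS = {
    "Cave": {"CaveSystem", "Survey", "Fieldbook"},
    "CaveSystem": {"Cave", "Project"},
    "Fieldbook": {"Segment", "Person", "Organisation", "Cave"},
}

def missing_reciprocals(relationships):
    """Return (a, b) pairs where a lists b, but b is present and omits a.

    Entities which have no entry of their own are skipped, so a partial
    transcription like the one above does not raise false alarms.
    """
    missing = []
    for entity, targets in sorted(relationships.items()):
        for target in sorted(targets):
            if target in relationships and entity not in relationships[target]:
                missing.append((entity, target))
    return missing
```

For the three entities transcribed here the check returns an empty list; extending the dictionary to the full list would be the way to test the whole section.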

Previous versions: 2004-09-26 | 2002-07-05 |
P. Matthews