Skip to content

Semi-structured Data

The Need

Many applications require storage of complex data, where the schema may change often. In many situations, the relational model's requirement of atomic datatypes can be overkill. Storing multi-valued attributes as a set is often simpler than normalising them. Data exchange can benefit from this kind of semi-structured data. It allows us to conveniently pass a chunk of information between the back and front ends.

Flexibility

  • Wide column representation allows each tuple to have its own set of attributes
  • Sparse column representation has a fixed superset, but tuples may omit certain attributes

Multi-Valued Datatypes

  • Sets / Multisets - {'A', 'B', 'C'}
  • Key-value Maps - {('A','X'), ('B','Y'), ('C','Z')
  • Arrays - [5, 8, 9, 11]

These are modeled with a non first normal form (NFNF) data model. In some cases, a database may provide specialised support for arrays. Examples include GeoRaster, PostGIS, and SciDB.