Digital Technologies Digital Solutions ICT Applied Download Resources Web Applications Games Recycle Bin

Normalisation

Normalization is (basically) to design a logical database schema, such that duplicate and redundant data is avoided. If the same information is repeated in multiple places in the database, there is the risk that it is updated in one place but not the other, leading to data corruption.

Unnormalised Data
Artwork Cost Artist Contact Award1 Award2 Award3
Mona Lisa $49.95 Leonardo da Vinci leo@louvreparis.fr, +33(0)140205263 Archibald Prize, 2021, Australia Marcel Duchamp Prize, 2020, France Hugo Boss Prize, 2020, United States
Café Terrace at Night 34.90 Vincent van Gogh vinny@moma.org
The Starry Night 20 bucks on sale Vincent van Gogh vinny@moma.org Hugo Boss Prize, 2022, United States The Lumen Prize, 2022, United Kingdom
The Starry Night 20 bucks on sale Vincent van Gogh vinny@moma.org Hugo Boss Prize, 2022, United States The Lumen Prize, 2022, United Kingdom
  • different data types in same column (e.g., "$49.95" is STRING, 34.90 is FLOAT, "20 bucks on sale" is STRING)
  • multiple values in same fields (e.g., "leo@louvreparis.fr, +33(0)140205263", "Archibald Prize, 2021, Australia")
  • columns dont (really) have 'unique' names (e.g., Award1 Award2 Award3)
  • duplicate records (e.g., "The Starry Night")

First Normal Form (1NF) - hint: make it look like an SQL db
Stock
StockID Artwork Cost Artist Email Phone
1 Mona Lisa 49.95 Leonardo da Vinci leo@louvreparis.fr +33(0)140205263
2 Café Terrace at Night 34.90 Vincent van Gogh vinny@moma.org
3 The Starry Night 20.00 Vincent van Gogh vinny@moma.org
4 The Starry Night 20.00 Vincent van Gogh vinny@moma.org
Awards
Artwork Award Year Location
Mona Lisa Archibald Prize 2021 Australia
Mona Lisa Marcel Duchamp Prize 2020 France
Mona Lisa Hugo Boss Prize 2020 United States
The Starry Night Hugo Boss Prize 2022 United States
The Starry Night The Lumen Prize 2022 United Kingdom
  • first normal form isn't (really) about redundancy - it simply fixes each of the violated rules above:
    • table columns now contain same data types ('Cost' is type FLOAT)
    • table cells now contain atomic values (Contact → Email + Phone, Award → Award + Year + Location)
    • no 'repeating' groups of column names (these were moved to a separate table 'Awards')
    • no duplicate records due to key field 'StockID'

Second Normal Form (2NF) - hint: check tables with composite primary key (only)
Stock
StockID Artwork Cost Artist Email Phone
1 Mona Lisa 49.95 Leonardo da Vinci leo@louvreparis.fr +33(0)140205263
2 Café Terrace at Night 34.90 Vincent van Gogh vinny@moma.org
3 The Starry Night 20.00 Vincent van Gogh vinny@moma.org
4 The Starry Night 20.00 Vincent van Gogh vinny@moma.org
Awards
Artwork Award Year
Mona Lisa Archibald Prize 2021
Mona Lisa Marcel Duchamp Prize 2020
Mona Lisa Hugo Boss Prize 2020
The Starry Night Hugo Boss Prize 2022
The Starry Night The Lumen Prize 2022
Locations
Award Location
Archibald Prize Australia
Marcel Duchamp Prize France
Hugo Boss Prize United States
The Lumen Prize United Kingdom
  • second normal form is all about eliminating any partial dependencies on composite primary keys, so that every column relies upon the entire key. 2NF satisfies:
    • all of the rules of 1NF (first normal form), and
    • every non-key field (attribute) is fully dependent on the entire composite key. in this case:
      • the Location is dependent on the Award alone, and not the entire composite key of Artwork Award
      • .. so a new table is created 'Locations', and related with the Foreign Key field 'Award'
  • if the relation is in 1NF and only has a single column key, it is already in 2NF (which explains why the Stock table has been ignored / grayed out).

Third Normal Form (3NF) - hint: move fields that do not depend on the key to another table
Stock
StockID Artwork
1 Mona Lisa
2 Café Terrace at Night
3 The Starry Night
4 The Starry Night
Artworks
Artwork Cost Artist
Mona Lisa 49.95 Leonardo da Vinci
Café Terrace at Night 34.90 Vincent van Gogh
The Starry Night 20.00 Vincent van Gogh
Artists
Artist Email Phone
Leonardo da Vinci leo@louvreparis.fr +33(0)140205263
Vincent van Gogh vinny@moma.org
Awards
Artwork Award Year
Mona Lisa Archibald Prize 2021
Mona Lisa Marcel Duchamp Prize 2020
Mona Lisa Hugo Boss Prize 2020
The Starry Night Hugo Boss Prize 2022
The Starry Night The Lumen Prize 2022
Locations
Award Location
Archibald Prize Australia
Marcel Duchamp Prize France
Hugo Boss Prize United States
The Lumen Prize United Kingdom
  • third normal form states that no non-key attribute may depend on another non-key attribute. Basically, removing transitive functional dependencies. 3NF satisfies:
    • all of the rules of 2NF (second normal form), and
    • no non-key column is functionally dependent on another non-key column, which is functionally dependent on the PK. in this case:
      • Cost and Artist dependent on (determined by) the Artwork, not the StockID (moved to table Artworks)
      • Email and Phone dependent on the Artist, not the StockID (moved to table Artists)





older normalisation notes (not necessarily accurate)
normalisation.docx
normalisation.pdf
normalisation explained