Structured data and unstructured data

A comparison

All data, whatever its format, can be sorted into one of two categories: structured data and unstructured data. In this article, we’ll compare structured data vs. unstructured data and answer three questions:

  1. What are some examples of structured data?
  2. What are some examples of unstructured data?
  3. What are some pros and cons of structured data vs. unstructured data?

And since most of the personal data you store about your customers, employees and others is probably unstructured…how will you track and protect it? 

Structured data vs. unstructured data

Structured data: Clearly defined data types with patterns = easily searchable 

Unstructured data: Everything else = not as easily searchable 

What is structured data?

Structured data has been formatted to fit a set structure to make it easier to work with and faster to search. It is highly organized and follows a pre-defined data model that is easily decipherable by machine learning algorithms. Structured data usually resides in relational databases, i.e., stored in tables with rows and columns that are related to each other. 
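To make the "rows and columns that are related to each other" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables, column names and rows are invented for illustration and do not come from any real system.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two related tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, "
        "customer_id INTEGER NOT NULL REFERENCES customers(id), "
        "total REAL NOT NULL)"
    )
    # Made-up sample rows.
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "Ada", "Copenhagen"), (2, "Ben", "Aarhus")],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, 1, 120.0), (2, 1, 80.0), (3, 2, 45.5)],
    )
    conn.commit()
    return conn


def total_spent_by_city(conn, city):
    """Sum order totals for every customer in the given city."""
    row = conn.execute(
        "SELECT COALESCE(SUM(o.total), 0) "
        "FROM orders o JOIN customers c ON c.id = o.customer_id "
        "WHERE c.city = ?",
        (city,),
    ).fetchone()
    return row[0]
```

Because every row follows the same pre-defined model, a single query can answer the question without inspecting each record by hand.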

Examples of structured data

Data stored in relational databases (MySQL, Oracle, etc.); inventory control systems, weblog statistics and point of sale data, airline booking systems, bank transactional systems, CRM applications, accounting systems, barcodes and more.  

Structured data format examples

Tab-Separated Value files (tsv) 

Comma-Separated Value files (csv) 

Semi-structured data format examples

JavaScript Object Notation (JSON) 

Avro 

Optimized Row Columnar (ORC) 

Apache Parquet 

XML 
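The difference between the two format lists can be sketched in a few lines of Python. The sample CSV and JSON strings below are made up for illustration. In the CSV, every row has the same columns; in the JSON, each record carries its own field names, so records may differ.

```python
import csv
import io
import json

# Made-up sample data, for illustration only.
CSV_TEXT = "id,name,city\n1,Ada,Copenhagen\n2,Ben,Aarhus\n"
JSON_TEXT = (
    '[{"id": 1, "name": "Ada", "phones": ["555-0100"]},'
    ' {"id": 2, "name": "Ben"}]'
)


def read_csv_rows(text):
    """Parse CSV text; the header row fixes the columns for every record."""
    return list(csv.DictReader(io.StringIO(text)))


def read_json_records(text):
    """Parse JSON text; each record names its own fields."""
    return json.loads(text)


def field_names(records):
    """Return the sorted field names of each record, one list per record."""
    return [sorted(record.keys()) for record in records]
```

Calling field_names on the CSV rows gives the same three names twice, while the JSON records give two different lists. That per-record flexibility is why formats like JSON are usually described as semi-structured.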

Pros and cons of structured data

Pros of structured data 

  • Easy to search and access with machine learning (ML) algorithms 
  • Easy to track and understand 
  • Requires less processing and is easier to manage. 

Cons of structured data 

  • It is less flexible, because of its pre-defined structure/format. 
  • It takes more time and resources to change and update. 

What is unstructured data?

By commonly cited industry estimates, around 80% to 90% of global data is unstructured. Unstructured data exists in its native or raw form. It is irregular and unorganized, so it cannot easily be processed and analyzed with conventional data tools and methods. It also comes in a much greater variety of formats than structured data.

Examples of unstructured data

Emails, chats, invoices, records, presentations, and reports that contain data in the form of text, numbers, audio, images, or video. 
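Here is a small sketch of what "not as easily searchable" means in practice. The email text, the invoice number format (INV- followed by digits) and the date format are all assumptions made up for this example. With no columns to query, you have to guess at patterns in the text.

```python
import re

# Made-up email body, for illustration only.
EMAIL_BODY = """Hi Ben,
thanks for the call yesterday. Attached is invoice INV-2041 for 1,250.00 EUR,
due on 2024-03-15. Let me know if anything looks off.
"""

# Assumed patterns: they only work if the text happens to follow them.
INVOICE_RE = re.compile(r"\bINV-\d+\b")
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def extract_fields(text):
    """Pull an invoice number and a date out of free text, if present."""
    invoice = INVOICE_RE.search(text)
    due = DATE_RE.search(text)
    return {
        "invoice": invoice.group(0) if invoice else None,
        "due_date": due.group(0) if due else None,
    }
```

The same details in a database table would be a simple column lookup. Here, any change in how the sender writes the invoice number or date breaks the search.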

Unstructured data format examples

Email: Eml, msg, emlx, dbx, and wab 

Presentations: ppt, keynote, gslides, or ppz 

Plain text: text or txt 

Compressed files: 7z, zip, rar, rar5 

Text tables: csv or tsv 

Spreadsheets: xls, xlsx, numbers, cal, and ots*

Audio files: mp3, mp4a, wma, ram, aac 

Video files: mpeg, mpg, h263, h264, 3gp, wmv 

Image files: jpeg, png, bmp, tiff 

Design: model, stl, iges, art, 3dxml, psmodel 

Publishing: pdf, pub, xfdf, ave 

Crypto Keys and Certificates: crt, pem, pkipath, etc. 

Desktop files: pdf, pub, xfdf, ave, etc. 

Database files: 4db, adt, box, kexic, contact, pdb 

Binary files: gsf, hex, exe, or bpk 

Mark-up texts: html, xhtml, markdown 

Machine-readable data: avro, parquet, xml, dtd, or xsd (semi-structured) 

Machine-generated medical data: dicom and hl7 

Source codes: a2w, amw, androidproj, awd, axb, bufferedimage, or buildpath 

*The nature of some data types, such as spreadsheets, is still a matter of debate. The spreadsheet itself has some structure, but a spreadsheet application like Excel does not regulate the data you put into each cell.

Pros and cons of unstructured data

Pros of unstructured data 

  • It is more adaptable.  
  • It can be collected quickly and easily. 
  • It is cheap/easy to store in large volumes.  

Cons of unstructured data 

  • Lack of visibility
  • It is harder to understand how to put it to best use and how to protect it.
  • Data management tools are needed to manipulate unstructured data.

Unstructured data management challenges

Worldwide, unstructured data is far more abundant than structured data. Because it comes in so many formats and is easy to store, most companies have a considerable amount of unstructured data in their systems.  

Managing unstructured data without the right tools is difficult, because its raw and unorganized nature makes it hard to search and access = low visibility. 

The main challenges of unstructured data management are: 

  • Lack of visibility 
  • Large volume 
  • Unorganized 

Unstructured personal data and the GDPR

These terabytes of unstructured files your company has accumulated over the years are certain to contain plenty of personal data and sensitive personal data. 

The low visibility of unstructured personal data presents a special challenge for compliance with privacy laws like the GDPR, CCPA and others.  

New privacy laws put limits on how long you store personal data, and they require you to monitor and protect it to make sure it will not be accessed by unauthorized persons. 

Leaving unstructured files in data lakes without keeping track of the personal data contained in them is a good way to get fined.  

Make sure personal data does not linger in your systems too long. When you are no longer using data for the purpose for which you collected it, you should delete it.  
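As a rough sketch of how a retention check might look, the function below flags records that have not been used since a cutoff date. The record names, dates and the 365-day window are hypothetical. The window is an arbitrary example value, not a legal threshold, since the right retention period depends on the purpose for which you collected the data.

```python
from datetime import datetime, timedelta


def past_retention(records, retention_days, now):
    """Return the names of records last used before the retention cutoff.

    records: iterable of (name, last_used) pairs, where last_used is a datetime.
    retention_days: assumed retention window in days (your own policy decides this).
    now: the reference time, passed in so the check is easy to test.
    """
    cutoff = now - timedelta(days=retention_days)
    return [name for name, last_used in records if last_used < cutoff]
```

For example, with a reference date of 1 June 2024 and a 365-day window, a file last used on 1 January 2020 is flagged for review, while one last used on 1 May 2024 is not.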

To meet these requirements, you must have systems in place to sort, classify and monitor unstructured personal data. 
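To illustrate the "sort, classify and monitor" idea, here is a deliberately simple sketch that scans text for two kinds of personal data using regular expressions. This is a swapped-in, rule-based technique and not how any particular product works. The patterns, function names and sample documents are assumptions, and simple patterns like these will miss many real cases (names, addresses, ID numbers) and produce false positives.

```python
import re

# Simplified, assumed patterns; real personal data detection needs far more than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+\d{8,15}\b")  # international format only, e.g. +4512345678


def classify_text(text):
    """Return the kinds of personal data found in the text, with the matches."""
    findings = {
        "email": sorted(set(EMAIL_RE.findall(text))),
        "phone": sorted(set(PHONE_RE.findall(text))),
    }
    # Keep only the categories that actually matched something.
    return {kind: hits for kind, hits in findings.items() if hits}


def scan_documents(documents):
    """Map each document name to its findings; clean documents are left out."""
    report = {}
    for name, text in documents.items():
        found = classify_text(text)
        if found:
            report[name] = found
    return report
```

Running scan_documents over a dictionary of file names and their text gives you a basic inventory of which files contain personal data, which is the starting point for monitoring and deleting it.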

    AI and machine learning for unstructured personal data management

    Technology must keep up with the demand for unstructured data management. Our data discovery tool, DataMapper, is ideal for structured and unstructured data management, with a focus on privacy for personal data and GDPR compliance.  

    DataMapper uses AI and machine learning to identify structured personal data and unstructured personal data in the locations SMBs most commonly stash it:

    • Local drives 
    • Network Drives (Windows Fileshare) 
    • Dropbox 
    • Google Drive 
    • Microsoft Exchange 
    • Google Mail 
    • IMAP 
    • Microsoft Outlook 
    • Microsoft SharePoint 
    • Microsoft OneDrive 
    • Azure Blob storage

    Additional cloud database integrations, such as Amazon, SAP, etc., can be added to your DataMapper account on demand.

    Sebastian Allerelli

    Governance, risk, and compliance specialist