What is the difference between structured data and unstructured data?

Broadly speaking, data in any format can be sorted into one of two categories: structured data and unstructured data. Structured data has clearly defined data types and patterns, which makes it easy to search. Unstructured data is everything else, and it is not easy to search. In this article, we compare structured and unstructured data, look at examples of both, and weigh the pros and cons of each. The vast majority of the data you store on customers, employees and others is likely to be unstructured. The question then is how to find it and protect it.

What is structured data?

Structured data has been formatted to fit a set structure, which makes it easier to work with and faster to search. It is highly organized and follows a predefined data model, so software, including machine learning algorithms, can read it easily. Structured data usually resides in relational databases, i.e., it is stored in tables whose rows and columns are related to each other.
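To make the idea of related tables concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names and sample rows are hypothetical and chosen only for illustration. The sketch shows how a predefined schema lets you answer a question with a single query.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " country TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " total REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO customers (id, name, country) VALUES (?, ?, ?)",
    [(1, "Ada", "DK"), (2, "Grace", "US")],
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(1, 120.0), (1, 80.0), (2, 45.5)],
)


def total_by_country(connection, country):
    """Sum the order totals of all customers in one country."""
    row = connection.execute(
        "SELECT COALESCE(SUM(o.total), 0)"
        " FROM orders AS o JOIN customers AS c ON c.id = o.customer_id"
        " WHERE c.country = ?",
        (country,),
    ).fetchone()
    return row[0]


dk_total = total_by_country(conn, "DK")
```

Because every row follows the same schema, the query can join the two tables on customer_id without inspecting individual records first.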

Examples of structured data

Examples include data stored in relational databases (MySQL, Oracle, etc.), inventory control systems, weblog statistics, point-of-sale data, airline booking systems, bank transaction systems, CRM applications, accounting systems, barcodes and more.

Structured data format examples

  • Tab-Separated Value files (tsv)
  • Comma-Separated Value files (csv) 

In addition, there are semi-structured data format examples:

  • JavaScript Object Notation (JSON) 
  • Avro 
  • Optimized Row Columnar (ORC) 
  • Apache Parquet 
  • XML 
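What makes these formats semi-structured is that each record carries its own field names, so records in the same file do not have to share the same fields. The following sketch illustrates this with JSON and Python's standard json module. The sample records and the helper name field_names are made up for the example.

```python
import json

# Three hypothetical records; note that each one has different fields.
raw = """
[
  {"id": 1, "name": "Ada", "email": "ada@example.com"},
  {"id": 2, "name": "Grace", "phones": ["555-0100", "555-0101"]},
  {"id": 3, "name": "Linus", "address": {"city": "Helsinki"}}
]
"""

records = json.loads(raw)


def field_names(items):
    """Collect every top-level key that appears in any record."""
    names = set()
    for item in items:
        names.update(item.keys())
    return sorted(names)


all_fields = field_names(records)

# A field that is missing from a record comes back as None.
emails = [record.get("email") for record in records]
```

Unlike a database table, nothing forces a record to include an email field, so code that reads the data has to handle missing values itself.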

Pros of structured data

  • Easy to search and access with machine learning (ML) algorithms 
  • Easy to track and understand 
  • Requires less processing and is easier to manage. 

Cons of structured data

  • It is less flexible, because of its pre-defined structure/format. 
  • It takes more time and resources to change and update. 

What is unstructured data?

By common estimates, around 80% to 90% of global data is unstructured. Unstructured data exists in its native or raw form. It is irregular and unorganized, so it cannot easily be processed and analyzed with conventional data tools and methods. It also comes in a much greater variety of formats than structured data.

Examples of unstructured data

Typical examples of unstructured data are listed below. You will probably know some of them well by name without having thought of them as unstructured data. They include emails, chats, invoices, registrations of various kinds, presentations and reports, which contain data in the form of text, numbers, sound, images or video. By file type, it could be:

Email: eml, msg, emlx, dbx, and wab
Presentations: ppt, keynote, gslides, or ppz
Plain text: text or txt
Compressed files: 7z, zip, rar, rar5
Text tables: csv or tsv
Spreadsheets: xls, xlsx, numbers, cal, and ots*
Audio files: mp3, mp4a, wma, ram, aac
Video files: mpeg, mpg, h263, h264, 3gp, wmv
Image files: jpeg, png, bmp, tiff
Design: model, stl, iges, art, 3dxml, psmodel
Publishing: pdf, pub, xfdf, ave
Crypto Keys and Certificates: crt, pem, pkipath, etc.
Desktop files: pdf, pub, xfdf, ave, etc.
Database files: 4db, adt, box, kexic, contact, pdb
Binary files: gsf, hex, exe, or bpk
Mark-up texts: html, xhtml, markdown
Machine-readable data: avro, parquet, xml, dtd, or xsd (semi-structured)
Machine-generated medical data: dicom and hl7
Source code: a2w, amw, androidproj, awd, axb, bufferedimage, or buildpath

*The nature of some data types, such as spreadsheets, is still a matter of debate. A spreadsheet has some structure, but the application, such as Excel, does not regulate the data you put into each cell.
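A first step toward visibility is often just counting what kinds of files you have. As a rough sketch, the extension list above can be turned into a lookup table. The mapping below covers only a small subset of the listed extensions, and the function names categorize and count_by_category are hypothetical. A file extension is only a hint about the content, so treat the result as an estimate.

```python
from pathlib import PurePath

# A small subset of the extensions listed above; extend as needed.
CATEGORY_BY_EXTENSION = {
    "eml": "Email", "msg": "Email",
    "ppt": "Presentations",
    "txt": "Plain text",
    "zip": "Compressed files", "7z": "Compressed files",
    "csv": "Text tables", "tsv": "Text tables",
    "xls": "Spreadsheets", "xlsx": "Spreadsheets",
    "mp3": "Audio files",
    "jpeg": "Image files", "png": "Image files",
    "pdf": "Publishing",
    "html": "Mark-up texts",
}


def categorize(filename):
    """Map a file name to a category based on its extension."""
    suffix = PurePath(filename).suffix.lower().lstrip(".")
    return CATEGORY_BY_EXTENSION.get(suffix, "Unknown")


def count_by_category(filenames):
    """Count how many of the given files fall into each category."""
    counts = {}
    for name in filenames:
        category = categorize(name)
        counts[category] = counts.get(category, 0) + 1
    return counts
```

For example, count_by_category(["a.csv", "b.tsv", "c.png"]) groups the two text tables together and counts the image separately, while a file without an extension lands in "Unknown".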

Pros of unstructured data

  • It is more adaptable.  
  • It can be collected quickly and easily. 
  • It is cheap/easy to store in large volumes.  

Cons of unstructured data

  • Lack of visibility
  • It is harder to understand how to put it to best use and how to protect it.
  • Data management tools are needed to manipulate unstructured data.

Unstructured data management challenges

Worldwide, unstructured data is far more abundant than structured data. Because it comes in so many formats and is easy to store, most companies have a considerable amount of unstructured data in their systems.  

Managing unstructured data without the right tools is difficult, because its raw and unorganized nature makes it hard to search and access. The result is low visibility.

The main challenges of unstructured data management are: 

  • Lack of visibility 
  • Large volume 
  • Lack of organization 

Unstructured personal data and the GDPR

The terabytes of unstructured files your company has accumulated over the years are almost certain to contain plenty of personal data, including sensitive personal data. The low visibility of unstructured personal data presents a special challenge for compliance with privacy laws like the GDPR, the CCPA and others. These laws put limits on how long you may store personal data, and they require you to monitor and protect it so that unauthorized persons cannot access it.

Leaving unstructured files in data lakes without keeping track of the personal data contained in them is a good way to get fined. Make sure personal data does not linger in your systems too long. When you are no longer using data for the purpose for which you collected it, it should be deleted.
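As a minimal sketch of the retention idea, the function below flags files that have not been used within a chosen retention window. The retention period of 365 days, the file names and the dates are assumptions made for the example. The right period depends on the purpose for which the data was collected, and privacy laws do not set one fixed number.

```python
from datetime import date, timedelta


def overdue_for_review(files, today, retention_days):
    """Return the names of files last used before the retention cutoff."""
    cutoff = today - timedelta(days=retention_days)
    return [name for name, last_used in files if last_used < cutoff]


# Hypothetical inventory of (file name, date last used) pairs.
files = [
    ("old_applicants.xlsx", date(2019, 3, 1)),
    ("current_contracts.pdf", date(2024, 5, 20)),
]

# Assumed retention window of 365 days, evaluated on a fixed date.
flagged = overdue_for_review(files, today=date(2024, 6, 1), retention_days=365)
```

Here only old_applicants.xlsx is flagged. A flagged file is a candidate for review, not something to delete automatically, since other legal obligations may require you to keep it.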

To meet these requirements, you must have systems in place to sort, classify and monitor unstructured personal data. 
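To give a feel for what classifying unstructured text involves, here is a deliberately simple sketch that looks for email addresses with a regular expression. It is an illustration only and not a description of how any particular product works. The pattern misses many valid addresses and other kinds of personal data, and the names find_emails and classify_document are hypothetical.

```python
import re

# Deliberately simple pattern; real email addresses can be more varied.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(text):
    """Return the distinct email addresses found in the text, sorted."""
    return sorted(set(EMAIL_PATTERN.findall(text)))


def classify_document(text):
    """Label a document by whether it contains at least one email address."""
    if find_emails(text):
        return "contains personal data"
    return "no match"
```

A label of "no match" only means this one pattern found nothing. Names, ID numbers and free-text descriptions of people need other detection methods.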

An easier way to handle unstructured data

Technology has to keep up with the increasing demand for handling unstructured data. Our data discovery tool, DataMapper, is designed to handle both structured and unstructured data, with a focus on personal data privacy and GDPR compliance.

Sebastian Allerelli

Founder & COO at Safe Online
Governance, Risk & Compliance Specialist