AIM III – Part 2: AI engine

In the first article of this series, we focused on AIM III’s new processing engine and how it makes gathering large volumes of data from multiple locations quicker and easier for companies doing data discovery.

In this second part, we’ll take a look at AIM III’s new AI engine for DataMapper, which will soon be applied across the rest of our product portfolio.

AIM III’s AI helps us scale faster

DataMapper is an automated data-discovery tool that finds, classifies and monitors personal and sensitive information across all company storage locations and emails, flagging data that poses a potential risk.

To identify risk data accurately, AIM III uses a combination of advanced AI and machine learning algorithms. Typically, AI systems recognise sensitive information either with basic NLP techniques such as context-based NER (as used by spaCy) or with rules (as used by MS MIP). Rather than relying on just one of these techniques, AIM III’s AI engine combines a variety of AI approaches, allowing it to recognise street addresses, social security numbers and ages, perform disambiguation and sentiment analysis, and much more.

This multi-pronged approach extends the list of recognised entities beyond the usual named entities (PERSON, ORGANISATION, LOCATION) to finer-grained details such as first and last names. It can also tell whether a number is, for example, an age or part of a street address.
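
To make the age-versus-address idea concrete, here is a minimal sketch of context-based disambiguation. It is not AIM III’s implementation: the cue-word lists, the `classify_number` function and the three-token window are all assumptions chosen for illustration, and a production model would learn such signals from data rather than from a hand-written list.

```python
import re

# Hypothetical cue words, chosen only for this illustration.
AGE_CUES = {"age", "aged", "years", "old"}
ADDRESS_CUES = {"street", "road", "avenue", "lane"}


def classify_number(text: str, window: int = 3) -> list[tuple[str, str]]:
    """Label each number in the text by the words that surround it."""
    tokens = re.findall(r"\w+", text.lower())
    results = []
    for i, token in enumerate(tokens):
        if not token.isdigit():
            continue
        # Look at up to `window` tokens on each side of the number.
        context = set(tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window])
        if context & AGE_CUES:
            label = "AGE"
        elif context & ADDRESS_CUES:
            label = "STREET_NUMBER"
        else:
            label = "UNKNOWN"
        results.append((token, label))
    return results
```

With these assumed cue lists, the same number gets a different label depending on whether it appears next to words like "years old" or next to a word like "Street".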

How we built it: Technical details

We maintain the system with our own dataset, which represents a variety of real cases. We check quality continuously against a set of metrics, and we scale the system through user gamification so that it keeps improving.
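
The article does not say which metrics are used. As an illustration only, precision and recall are a common choice for checking an entity recogniser against a labelled dataset. The sketch below assumes each finding is a `(start, end, label)` tuple; that shape and the function name are hypothetical.

```python
def precision_recall(predicted: set, expected: set) -> tuple[float, float]:
    """Compare predicted findings with labelled ones using exact matches."""
    true_positives = len(predicted & expected)
    # Precision: share of predicted findings that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of labelled findings that were found.
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall
```

Tracking both numbers over time shows whether a model change reduces false alarms (precision) at the cost of missed data (recall), or the other way round.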

AIM III’s AI engine has been developed as a smart combination of:

  • Our own context-based models. Sensitive words are recognised from the words around them, e.g., “Anna’s social security number is xxxx-xxxxxx”. In this case, the system understands the semantics of the sentence, which increases accuracy when classifying the number as sensitive.
  • spaCy’s rule-based framework and additional validation rules. Sensitive numbers (xxxx-xxxxxx) are recognised by a regular-expression filter that encodes which values a given kind of number or word can take.
  • Hybrid intelligence: our AI combined with human intelligence in the form of gamification, so that our models can learn from and support each other.
  • A list-based search and a character-based understanding of names (first names, last names and street addresses). The models use general language syntax rules to recognise names, even ones they have never been exposed to.
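
The first two items in the list above can be sketched together: a regular expression proposes candidates, and the surrounding words raise or lower confidence. This is a simplified illustration using only Python’s standard `re` module, not spaCy and not AIM III’s actual rules. The pattern mirrors the article’s "xxxx-xxxxxx" placeholder, and the cue phrases, the 40-character window and the `find_sensitive_numbers` name are assumptions.

```python
import re

# Hypothetical pattern matching the article's "xxxx-xxxxxx" placeholder.
ID_PATTERN = re.compile(r"\b\d{4}-\d{6}\b")
# Hypothetical cue phrases that suggest the number is an identifier.
CONTEXT_CUES = ("social security", "ssn", "id number")


def find_sensitive_numbers(text: str, window: int = 40) -> list[dict]:
    """Find pattern matches and grade them by the text that precedes them."""
    findings = []
    for match in ID_PATTERN.finditer(text):
        # Inspect up to `window` characters before the match for cue phrases.
        before = text[max(0, match.start() - window):match.start()].lower()
        has_cue = any(cue in before for cue in CONTEXT_CUES)
        findings.append({
            "value": match.group(),
            "start": match.start(),
            "confidence": "high" if has_cue else "low",
        })
    return findings
```

Keeping the pattern match and the context check separate means a number with the right shape but no supporting context is still reported, only with lower confidence, so a later step or a human reviewer can decide.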

Combining these models with various best-of-breed AI tools that fit the scope of our privacy solutions lets us scale AIM to many countries, languages and sectors.
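
One simple way to combine several detectors is to run them all and, where their findings overlap, keep the one with the highest score. The sketch below shows that idea under stated assumptions: the `Span` type, the `Detector` signature and the score-based tie-breaking are hypothetical and are not taken from AIM III.

```python
from typing import Callable, NamedTuple


class Span(NamedTuple):
    start: int    # index of the first character
    end: int      # index one past the last character
    label: str
    score: float


# A detector is any function that maps text to a list of spans.
Detector = Callable[[str], list[Span]]


def combine(text: str, detectors: list[Detector]) -> list[Span]:
    """Run every detector and keep the best-scoring span among overlaps."""
    candidates = [span for detect in detectors for span in detect(text)]
    candidates.sort(key=lambda s: s.score, reverse=True)
    kept: list[Span] = []
    for span in candidates:
        # Accept a span only if it does not overlap one already kept.
        if all(span.end <= k.start or span.start >= k.end for k in kept):
            kept.append(span)
    return sorted(kept, key=lambda s: s.start)
```

Because each detector only has to return spans, a context model, a rule-based filter and a name list can be added or swapped independently, which is one plausible way to adapt the same pipeline to a new language or sector.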

For more information, see part one of this series, AIM III: Our new processing engine.

Benefits for our users

The new AI engine for DataMapper will make data discovery easier, faster and more secure. DataMapper lets you discover, classify and track your company’s personal and sensitive data, wherever it is, and then continuously monitor everything to make sure your privacy policies are being followed.

Sebastian Allerelli
Founder & COO at Safe Online
Governance, Risk & Compliance Specialist