AIM III – Part 2: AI engine
In the first article of this series, we focused on AIM III’s new processing engine and how it has made gathering large volumes of data from multiple locations for data discovery quicker and easier for companies.
In this second part, we’ll take a look at AIM III’s new AI engine for DataMapper, which will soon be applied across the rest of our product portfolio.
AIM III’s AI helps us scale faster
DataMapper is an automated data-discovery tool that finds, classifies and monitors personal and sensitive information across all company storage locations and emails, flagging data that poses a potential risk.
To identify risk data accurately, AIM III uses a combination of advanced AI and machine learning algorithms. Typically, AI systems recognize sensitive information either with basic NLP techniques such as context-based NER (as used by spaCy) or with rule-based matching (as used by MS MIP). Rather than relying on just one of these techniques, AIM III’s AI engine combines a variety of AI approaches, allowing it to recognize street addresses, social security numbers and ages, perform disambiguation and sentiment analysis, and much more.
This multi-pronged approach extends the list of recognized entities beyond the usual named entities (PERSON, ORGANIZATION, LOCATION) to include finer details, such as first and last names, or whether a number is an age or part of a street address.
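To make the age-versus-address distinction concrete, here is a minimal Python sketch. It is not AIM III’s implementation: the cue-word lists, the `classify_numbers` function and the window size are all hypothetical, and a simple keyword lookup stands in for what a learned context model would do.

```python
import re

# Hypothetical cue words; a real context model would learn these signals
# from data rather than read them from a hand-written list.
AGE_CUES = {"age", "aged", "old", "year", "years"}
ADDRESS_CUES = {"street", "st", "road", "rd", "avenue", "ave", "lane"}


def classify_numbers(text, window=2):
    """Label each number in `text` using cue words within `window` tokens of it."""
    # Split into lowercase word and digit tokens; punctuation is dropped.
    tokens = re.findall(r"[A-Za-z]+|\d+", text.lower())
    results = []
    for i, token in enumerate(tokens):
        if not token.isdigit():
            continue
        context = set(tokens[max(0, i - window): i + window + 1])
        if context & AGE_CUES:
            label = "AGE"
        elif context & ADDRESS_CUES:
            label = "STREET_NUMBER"
        else:
            label = "UNKNOWN"
        results.append((token, label))
    return results


print(classify_numbers("Anna is 34 years old and lives at 12 Baker Street."))
```

The same digits get a different label depending only on their neighbours, which is the kind of disambiguation a plain PERSON-ORGANIZATION-LOCATION tagger does not provide.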
How we built it: Technical details
The system is maintained with our own dataset, which represents a variety of real cases. Quality is controlled continuously with a set of metrics, and user gamification lets us scale that feedback so the models keep improving.
AIM III’s AI engine has been developed as a smart combination of:
- Our own context-based models. Sensitive words are recognized by the words around them, e.g., “Anna’s social security number is xxxx-xxxxxx”. In this case, the system understands the semantics of the sentence, which increases accuracy when classifying the number as sensitive.
- spaCy’s rule-based framework and additional validation rules. Sensitive numbers (xxxx-xxxxxx) are recognized by a regular expression filter that encodes the expected format and values of a given number or word.
- Hybrid intelligence: our AI works alongside human intelligence, in the form of gamification, so that our models can learn from and support each other.
- A list-based search and a character-based understanding of names (first names, last names and street addresses). The models use general language syntax rules to recognize names, even ones they have never been exposed to.
Combining these models with various best-of-breed AI tools that fit the scope of our privacy solutions lets us scale AIM across many countries, languages and sectors.
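As a rough illustration of how a rule-based filter, a validation rule and a context signal can work together, consider the Python sketch below. Everything in it is an assumption made for the example: the pattern is read literally from the masked format “xxxx-xxxxxx”, the validation rule (rejecting numbers made of one repeated digit) is invented, and a fixed phrase lookup stands in for a learned context model. It uses Python’s standard `re` module rather than spaCy to stay self-contained.

```python
import re

# Assumed format, taken literally from the masked example "xxxx-xxxxxx".
NUMBER_PATTERN = re.compile(r"\b\d{4}-\d{6}\b")
# Hypothetical context cue; a real context model would be learned, not listed.
CONTEXT_CUE = re.compile(r"social security number", re.IGNORECASE)


def is_plausible(candidate):
    """Hypothetical validation rule: reject placeholders made of one repeated digit."""
    digits = candidate.replace("-", "")
    return len(set(digits)) > 1


def find_sensitive_numbers(text, context_chars=40):
    """Return (number, confidence) pairs for format matches that pass validation."""
    findings = []
    for match in NUMBER_PATTERN.finditer(text):
        if not is_plausible(match.group()):
            continue
        # Look only at the text just before the match for a supporting cue.
        preceding = text[max(0, match.start() - context_chars): match.start()]
        confidence = "high" if CONTEXT_CUE.search(preceding) else "low"
        findings.append((match.group(), confidence))
    return findings


sample = (
    "Anna's Social security number is 1234-567890. "
    "Order 2222-222222 shipped, ref 9876-543210."
)
print(find_sensitive_numbers(sample))
```

The regular expression alone would flag all three numbers in the sample. The validation rule discards the repeated-digit placeholder, and the context check separates the number introduced as a social security number from one that merely shares its format.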
For more information, see part one of this series, AIM III: Our new processing engine.
Benefits for our users
The new AI engine for DataMapper will make data discovery easier, faster and more secure. DataMapper lets you discover, classify and track your company’s personal and sensitive data, wherever it is, and then continuously monitor everything to make sure your privacy policies are being followed.
Would you like to see for yourself what types of sensitive and personal information your company stores?