Matching models are algorithms used to detect duplicate records. DataGroomr provides two options for duplicate detection: the machine learning matching model and the classic matching model.
Accessing Matching Models
To access matching models, press the Objects/Matching item in the Navigation Menu and then press the desired object.
Creating a Matching Model
Press the ADD MODEL button and select the desired model type:
Classic Matching Model
Selecting the Classic Matching option opens a dialog that includes the following elements:
- Name - enter a unique name for your model.
- Fields - select all fields that should be included in this model. Matched records are identified based on exact matches on these fields.
- Field Sets - use a predefined set of fields that DataGroomr recommends matching on. The selected fields can always be reconfigured to better fit your needs.
- Field Weights - click a percentage value to specify how important a match on that field is. The higher the percentage set with the slider, the greater the influence a match on that field has on the match confidence score.
Press the Save button to create the model.
Note: Classic models are based on an OR condition, which means that records do not have to match on every field in the model. Matched groups are formed based on the match confidence calculated across all fields in the model. For example, using the example model above, groups will be generated where records
- matching on all fields will have 100% match confidence;
- matching on Full Name and Business Phone will have 70% match confidence;
- matching on Full Name, Phone and Email will have 90% match confidence;
- matching on Full Name and Email will have 60% match confidence;
- and so forth, as long as the group match confidence is more than the minimum confidence selected in the dataset.
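The scoring in the note above can be sketched as a weighted sum: each field that matches adds its weight, and the total is the match confidence. This is a minimal sketch, not DataGroomr's implementation. The field names and weights (Full Name 40, Phone 30, Email 20, plus a fourth field, Title, at 10) are hypothetical, chosen only so that the sums reproduce the percentages listed above; in the product the weights come from the Field Weights sliders.

```python
# Hypothetical weights (percent), chosen only to reproduce the example percentages.
# In DataGroomr these values come from the Field Weights sliders.
FIELD_WEIGHTS = {"Full Name": 40, "Phone": 30, "Email": 20, "Title": 10}


def match_confidence(a: dict, b: dict, weights: dict) -> int:
    """Sum the weights of the fields whose values match exactly.

    Scoring is OR-based: a pair does not need to match on every field,
    each matching field simply adds its weight. Blank values are treated
    as non-matches in this sketch.
    """
    return sum(
        weight
        for field, weight in weights.items()
        if a.get(field) and a.get(field) == b.get(field)
    )


a = {"Full Name": "Jane Doe", "Phone": "555-0100", "Email": "jane@example.com", "Title": "CEO"}
b = {"Full Name": "Jane Doe", "Phone": "555-0100", "Email": "j.doe@example.com", "Title": "CFO"}
print(match_confidence(a, b, FIELD_WEIGHTS))  # 70 (Full Name + Phone)
```

A pair would then form a group only if this confidence is more than the minimum confidence selected in the dataset.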
Gear Icon (Additional Options)
1. Blank Values
Matching behavior for blank values can be customized. Users can specify whether records should be considered a match on a field when either value is blank, or whether blank values should be disregarded entirely. This setting affects match confidence values.
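The three blank-value behaviors can be sketched as follows. This is an illustration only: the BlankPolicy names are hypothetical, and dropping an ignored field from both the earned and the total weight (rather than counting it as a mismatch) is an assumption about how "disregarded entirely" could affect the confidence, not documented DataGroomr behavior.

```python
from enum import Enum
from typing import Optional


class BlankPolicy(Enum):
    """Hypothetical names for the three blank-value behaviors described above."""
    NO_MATCH = "no_match"  # a blank value on either side counts as a mismatch
    MATCH = "match"        # a blank value on either side counts as a match
    IGNORE = "ignore"      # the field is dropped from the score for this pair


def field_result(x: str, y: str, policy: BlankPolicy) -> Optional[bool]:
    """Return True/False for match/mismatch, or None when the field is skipped."""
    if not x or not y:
        if policy is BlankPolicy.MATCH:
            return True
        if policy is BlankPolicy.IGNORE:
            return None
        return False
    return x == y


def confidence(a: dict, b: dict, weights: dict, policy: BlankPolicy) -> int:
    earned = total = 0
    for field, weight in weights.items():
        result = field_result(a.get(field, ""), b.get(field, ""), policy)
        if result is None:
            continue  # assumption: an ignored field leaves both earned and total weight
        total += weight
        if result:
            earned += weight
    return round(100 * earned / total) if total else 0


weights = {"Name": 50, "Email": 30, "Phone": 20}
a = {"Name": "Acme", "Email": "", "Phone": "555-0100"}
b = {"Name": "Acme", "Email": "info@acme.com", "Phone": "555-0199"}
for policy in BlankPolicy:
    print(policy.name, confidence(a, b, weights, policy))
# NO_MATCH 50, MATCH 80, IGNORE 71
```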
2. Field Groups
Fields can be configured as a group, which tells DataGroomr to compare the values of the grouped fields collectively rather than field by field. This is useful for values that can be stored in different fields; for example, an address in an Account could be populated in one of several places. Assigning a group tag to fields enables cross-field comparison, so duplicates can be detected even when a value was entered in the wrong field.
Good to know: Fields of type Phone and Email are pre-configured as a group.
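The cross-field comparison behind field groups can be sketched as a set intersection: two records match on the group if any value in one record's grouped fields equals any value in the other's. The function name, the Salesforce-style field names and the trimming and lower-casing are assumptions for illustration, not DataGroomr's documented logic.

```python
def group_match(a: dict, b: dict, group: list) -> bool:
    """Match if any non-blank value in one record's group fields equals
    any non-blank value in the other record's group fields."""
    values_a = {a[f].strip().lower() for f in group if a.get(f)}
    values_b = {b[f].strip().lower() for f in group if b.get(f)}
    return bool(values_a & values_b)


# Hypothetical phone group using Salesforce-style field names.
PHONE_GROUP = ["Phone", "MobilePhone", "HomePhone"]
a = {"Phone": "555-0100", "MobilePhone": ""}
b = {"Phone": "", "HomePhone": "555-0100"}
print(group_match(a, b, PHONE_GROUP))  # True: same number, stored in different fields
```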
3. Synonyms
When selected, words contained within a dictionary list are considered to be the same word. A common example is the contact name Robert, which might alternatively be entered as Rob, Bob or Robbie.
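Synonym handling can be sketched as replacing each word with a canonical form before comparing. The dictionary below is a hypothetical stand-in built from the Robert example; DataGroomr's actual dictionary and normalization steps are not described here.

```python
# Hypothetical synonym dictionary: each variant maps to one canonical word.
SYNONYMS = {"rob": "robert", "bob": "robert", "robbie": "robert"}


def canonical(value: str, synonyms: dict) -> str:
    """Lower-case the value and replace each word with its canonical synonym."""
    return " ".join(synonyms.get(word, word) for word in value.lower().split())


def synonym_match(x: str, y: str, synonyms: dict = SYNONYMS) -> bool:
    return canonical(x, synonyms) == canonical(y, synonyms)


print(synonym_match("Bob Smith", "Robert Smith"))  # True
print(synonym_match("Robbie Smith", "Rob Smith"))  # True
print(synonym_match("Bob Smith", "Bob Jones"))     # False
```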
4. Ignore Words
When selected, words contained within a dictionary list are ignored when the field values of two records are compared, for example Corporation, Corp, Incorporated or Inc.
Add more words to your lists of synonyms and ignore words in Supervisor: Dictionaries.
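Ignore words can be sketched as a filter applied to both values before they are compared. The word list comes from the example above; the function names and the trimming of periods and commas are assumptions for illustration.

```python
# Hypothetical ignore-word list taken from the example above.
IGNORE_WORDS = {"corporation", "corp", "incorporated", "inc"}


def strip_ignored(value: str, ignore: set) -> str:
    """Lower-case the value, trim periods and commas from each word,
    and drop any word found in the ignore list."""
    words = (word.strip(".,").lower() for word in value.split())
    return " ".join(word for word in words if word and word not in ignore)


def ignore_word_match(x: str, y: str, ignore: set = IGNORE_WORDS) -> bool:
    return strip_ignored(x, ignore) == strip_ignored(y, ignore)


print(ignore_word_match("Acme Corp.", "ACME Incorporated"))  # True
print(ignore_word_match("Acme, Inc.", "Acme Corporation"))   # True
print(ignore_word_match("Acme Inc", "Apex Inc"))             # False
```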
5. First N Characters
The First N Characters setting lets users compare only the first N characters of a field value instead of the entire text. For example, it can be used to compare only the area code of a phone number or the prefix of a zip code.
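A prefix comparison like this can be sketched in a few lines. The digits_only flag is a hypothetical extra, not a documented DataGroomr option; it strips phone formatting so that "(415)" and "415-" yield the same three-character prefix.

```python
def first_n(value: str, n: int, digits_only: bool = False) -> str:
    """Return the first n characters of value.

    digits_only is a hypothetical option (not a documented DataGroomr setting)
    that drops formatting such as "(" or "-" so phone prefixes line up.
    """
    if digits_only:
        value = "".join(ch for ch in value if ch.isdigit())
    return value[:n]


def first_n_match(x: str, y: str, n: int, digits_only: bool = False) -> bool:
    px, py = first_n(x, n, digits_only), first_n(y, n, digits_only)
    return bool(px) and px == py


print(first_n_match("94105", "94110", 3))                                    # True: both "941"
print(first_n_match("(415) 555-0100", "415-555-0199", 3, digits_only=True))  # True: area code 415
print(first_n_match("94105", "10001", 3))                                    # False
```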
Machine Learning Matching Model
Selecting the Machine Learning option opens a dialog that includes the following elements:
- Name - enter a unique name for your model.
- Fields - select all the fields that should be included in this model. A pencil icon is shown in each field; pressing it lets you control how matching is done for that field.
Machine learning models use machine learning algorithms to detect duplicates. A newly created model needs to be trained. Training is supervised: you provide examples of duplicate and distinct records, and DataGroomr learns to identify patterns in the data based on the fields you specified.
Tip: A model may be trained multiple times to improve accuracy.
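The idea of supervised training can be illustrated with a small stand-in: compute a per-field similarity for each labeled pair, then fit weights that separate duplicates from distinct records. DataGroomr does not describe its learning algorithm here; this sketch uses plain logistic regression over difflib similarity scores, and the fields, records and labels are all hypothetical.

```python
import math
from difflib import SequenceMatcher

FIELDS = ["Name", "Email", "City"]  # hypothetical model fields


def features(a: dict, b: dict) -> list:
    """One similarity score in [0, 1] per field, using difflib's ratio()."""
    return [SequenceMatcher(None, a.get(f, ""), b.get(f, "")).ratio() for f in FIELDS]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(pairs, labels, epochs=2000, lr=0.5):
    """Fit logistic-regression weights by batch gradient descent.

    pairs: list of (record_a, record_b); labels: 1 = duplicate, 0 = distinct.
    """
    X = [features(a, b) for a, b in pairs]
    w = [0.0] * len(FIELDS)
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(X, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + bias) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        bias -= lr * grad_b / n
    return w, bias


def predict(model, a: dict, b: dict) -> float:
    """Estimated probability that a and b are duplicates."""
    w, bias = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features(a, b))) + bias)


robert = {"Name": "Robert Smith", "Email": "rsmith@acme.com", "City": "Boston"}
jane = {"Name": "Jane Doe", "Email": "jane@x.org", "City": "Austin"}
pairs = [
    (robert, {"Name": "Robert Smyth", "Email": "rsmith@acme.com", "City": "Boston"}),
    (jane, {"Name": "Jane Doe", "Email": "jdoe@x.org", "City": "Austin"}),
    (robert, {"Name": "Maria Garcia", "Email": "maria@y.com", "City": "Denver"}),
    (jane, {"Name": "Tom Lee", "Email": "tlee@z.net", "City": "Boston"}),
]
labels = [1, 1, 0, 0]  # the reviewer's answers: duplicate, duplicate, distinct, distinct
model = train(pairs, labels)
print(round(predict(model, *pairs[0]), 2), round(predict(model, *pairs[2]), 2))
```

Adding more labeled pairs and calling train again mirrors the tip above: each extra round of examples gives the learner more evidence to adjust its weights.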
Fields
The matching type is preselected by DataGroomr based on the type of the object field, and it generally does not need to be changed.
The following comparison types are available:
- Text - compares text values (the default comparison type);
- Short Text - compares short text values and is faster than Text (good examples are City names and Zip Codes);
- Long Text - compares long text values such as Description (preselected for TextArea field types);
- Name - compares person or company names;
- List - compares values in a list (preselected for Picklists);
- Date/Time - compares values as dates and times (preselected for Date and DateTime field types);
- Number - compares numbers (preselected for price and number field types);
- Exact - checks whether values are exactly the same.
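To make the comparison types more concrete, here is a hypothetical dispatch table mapping a few of the type names to simple similarity functions that return a score between 0 and 1. These functions (difflib ratio for text, Jaccard overlap for lists, relative difference for numbers, a 30-day window for dates) are illustrative choices, not the comparators DataGroomr uses; Short Text, Long Text and Name would need their own specialized comparators, which are not shown.

```python
from datetime import datetime
from difflib import SequenceMatcher


def text_sim(x: str, y: str) -> float:
    """Character-level similarity in [0, 1] (difflib's ratio)."""
    return SequenceMatcher(None, x.lower(), y.lower()).ratio()


def exact_sim(x, y) -> float:
    return 1.0 if x == y else 0.0


def list_sim(x: list, y: list) -> float:
    """Jaccard overlap between two sets of selected values."""
    a, b = set(x), set(y)
    return len(a & b) / len(a | b) if a | b else 1.0


def number_sim(x: float, y: float) -> float:
    """1.0 for equal numbers, falling linearly with the relative difference."""
    if x == y:
        return 1.0
    return max(0.0, 1.0 - abs(x - y) / max(abs(x), abs(y)))


def datetime_sim(x: datetime, y: datetime, window_days: float = 30.0) -> float:
    """1.0 for the same instant, reaching 0 once the gap exceeds window_days."""
    gap_days = abs((x - y).total_seconds()) / 86400
    return max(0.0, 1.0 - gap_days / window_days)


# Hypothetical dispatch table keyed by the comparison type names listed above.
COMPARATORS = {
    "Text": text_sim,
    "List": list_sim,
    "Date/Time": datetime_sim,
    "Number": number_sim,
    "Exact": exact_sim,
}


def compare(comparison_type: str, x, y) -> float:
    return COMPARATORS[comparison_type](x, y)


print(compare("Number", 100, 90))                                         # 0.9
print(compare("Date/Time", datetime(2024, 1, 1), datetime(2024, 1, 16)))  # 0.5
print(compare("List", ["Retail", "Banking"], ["Banking", "Energy"]))      # about 0.33
```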
Press Save to save the model in Draft status, or press the Train button to advance to the Training stage.
Learn more: Training machine learning model
Editing an Existing Model
To edit an existing model, select it and then press the Open button.
A classic matching model can be edited at any time. A machine learning model can be edited only while it is in Draft status. Once it is Trained, it can be trained again (re-trained), and each training creates a new version of the trained model.
Learn more: Training machine learning model
Cloning a Model
Occasionally you may need to modify an existing matching model, for example to add or remove a field. A trained machine learning model cannot be edited this way, but you can create a copy of it that can be edited.
To do this, select the model and then press the CLONE button.
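The editing and cloning rules above can be modeled as a small state machine. This is a hypothetical sketch: the MatchingModel class, its attribute names and the version counter are illustrative, and it assumes a clone starts in Draft status, which follows from the statement that the copy can be edited.

```python
from dataclasses import dataclass, field


@dataclass
class MatchingModel:
    """Hypothetical model record illustrating the editing and cloning rules."""
    name: str
    kind: str                                   # "classic" or "ml"
    fields: list = field(default_factory=list)
    status: str = "Draft"                       # ML models move from "Draft" to "Trained"
    version: int = 0                            # number of trained versions so far

    def can_edit(self) -> bool:
        # Classic models are always editable; ML models only while in Draft.
        return self.kind == "classic" or self.status == "Draft"

    def set_fields(self, fields: list) -> None:
        if not self.can_edit():
            raise PermissionError(f"{self.name} is trained; clone it to change its fields")
        self.fields = list(fields)

    def train(self) -> None:
        if self.kind != "ml":
            raise ValueError("only machine learning models are trained")
        # Training or re-training produces a new trained version.
        self.status = "Trained"
        self.version += 1

    def clone(self, new_name: str) -> "MatchingModel":
        # Assumption: the copy starts as an editable Draft; training is not carried over.
        return MatchingModel(new_name, self.kind, list(self.fields))


ml = MatchingModel("Contacts ML", "ml", ["Name", "Email"])
ml.train()
print(ml.can_edit(), ml.version)     # False 1
copy = ml.clone("Contacts ML v2")
copy.set_fields(["Name", "Email", "Phone"])
print(copy.can_edit(), copy.fields)  # True ['Name', 'Email', 'Phone']
```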
Deleting a Model
A model can be deleted by selecting it and then pressing the Trash button.
Assigning to Datasets
Classic matching models and trained machine learning models can be assigned to datasets or designated as the default model for your organization.
In Matching Models, select a model and press the Assign button. Then choose the datasets to apply it to and press the Assign button.
Good to Know: Alternatively, the same can be done using the Dataset Configuration window.
To set a model as the default, select it and then press the SET DEFAULT button. A green 'Default' label will be displayed next to that model.