Dataset configuration controls all aspects of what data is displayed, how duplicates are detected and what happens when duplicates are merged.  


For existing datasets, the configuration is accessible from the Trimmr dashboard and within the dataset itself.  



For new datasets, the configuration panel is shown when you create the dataset.




The Dataset dialogue consists of 5 tabs.

  1. General Tab - this section is used to name the dataset and select the Salesforce object that will be analyzed for duplicates.
  2. Fields Tab - this section is used to select the fields that will be displayed to the user. You may also drag the fields up and down the page to change the order in which they are displayed when comparing duplicate records side-by-side.
  3. Filters Tab - this section allows users to specify custom filters for the dataset, so that deduplication can focus on specific segments of data. Users can add a filter by selecting a field from the drop-down list and then adding criteria and values. Alternatively, filters can be configured using SOQL.
  4. Merge Rules - this section allows you to specify the rules used to select the master record and field values during a merge. Master Record Rules automatically select the surviving record within a duplicate group. Field Value Rules allow you to create automated actions around data values located in the child (non-surviving) records. These actions typically preserve data from the child records within the master record upon merging. You can learn more about Merge Rules by following this link.
    Another option within this tab is a selector to enable or disable Undo or Rollback of merges. This functionality allows users to reverse merges. To learn more about this functionality, please use this link.
  5. Machine Learning - this section allows you to specify the models used to detect duplicates. You can learn more about creating and managing models by following this link.

    The Minimum match confidence setting allows users to control how many duplicates are displayed. This is particularly useful for organizations with a very large number of potential duplicates.
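
To make the interaction between the Minimum match confidence setting and Merge Rules more concrete, here is a minimal Python sketch. It is an illustration only and not Trimmr's actual implementation: the `Record` class, the function names, the 0 to 1 confidence scale, the "most recently modified record wins" master rule and the "fill blank fields from child records" field value rule are all hypothetical assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical stand-in for a Salesforce record; not a Trimmr or Salesforce type.
    record_id: str
    fields: dict
    last_modified: str  # ISO date string, so plain string comparison orders by date


def filter_by_confidence(groups, minimum_confidence):
    """Keep only duplicate groups scored at or above the threshold (assumed 0-1 scale)."""
    return [g for g in groups if g["confidence"] >= minimum_confidence]


def select_master(records):
    """Assumed master record rule: the most recently modified record survives."""
    return max(records, key=lambda r: r.last_modified)


def merge_group(records):
    """Assumed field value rule: fill blank master fields from child records."""
    master = select_master(records)
    merged = dict(master.fields)
    for child in records:
        if child is master:
            continue
        for name, value in child.fields.items():
            # Only copy a child value when the master has nothing in that field.
            if not merged.get(name) and value:
                merged[name] = value
    return master.record_id, merged


# Made-up sample data: two duplicate groups with different confidence scores.
groups = [
    {"confidence": 0.95, "records": [
        Record("001A", {"Name": "Acme Corp", "Phone": ""}, "2023-05-01"),
        Record("001B", {"Name": "Acme Corporation", "Phone": "555-0100"}, "2022-11-12"),
    ]},
    {"confidence": 0.40, "records": [
        Record("001C", {"Name": "Apex Ltd", "Phone": ""}, "2023-01-15"),
        Record("001D", {"Name": "Apex Logistics", "Phone": ""}, "2023-02-20"),
    ]},
]

shown = filter_by_confidence(groups, minimum_confidence=0.8)
results = [merge_group(g["records"]) for g in shown]
print(results)
```

With a threshold of 0.8 in this sketch, only the first group is displayed. Record `001A` survives as the master because it was modified most recently, and its blank `Phone` field is filled from the child record `001B`. Raising the threshold hides more low-confidence groups, and lowering it shows more.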