In short, a Dataset is the workspace for any records you bring into the system, whether from Salesforce or a CSV file. Datasets are flexible, configurable, and secure, and they can power automated processes across all modules.


A Dataset is a collection of records that you want to analyze and work with across the platform. It serves as the foundation for identifying duplicates, applying data quality rules, and automating cleanup tasks.


Datasets can come from multiple sources:

  • Salesforce data – For example, your Leads, Contacts, or Accounts can each become their own Dataset.

  • CSV files – You can also upload CSV files to create new Datasets, which is useful when working with external lists or preparing data before loading it into Salesforce.


How does it work?

Once created, a Dataset is analyzed to identify duplicates and potential quality issues. The Datasets page lists every Dataset and shows, for each one:

  • The total number of records in the Dataset

  • The number of duplicates detected

  • The progress of the analysis
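
To make these figures concrete, here is a minimal Python sketch of how a record total and a duplicate count could be derived for a small CSV Dataset. It is an illustration only, not the platform's matching logic: the function name summarize_dataset, the sample data, and the rule used (exact match on a trimmed, lowercased Email field) are all assumptions made for this example.

```python
import csv
import io
from collections import Counter


def summarize_dataset(csv_text, key_fields):
    """Return the total record count and the number of duplicate records.

    Assumption for illustration: a record is a duplicate when an earlier
    record has the same values in key_fields after trimming whitespace
    and lowercasing. Real matching models are typically more flexible.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = Counter()
    total = 0
    duplicates = 0
    for row in reader:
        total += 1
        # Build a normalized comparison key from the chosen fields.
        key = tuple((row.get(f) or "").strip().lower() for f in key_fields)
        if seen[key] > 0:
            duplicates += 1
        seen[key] += 1
    return {"total_records": total, "duplicates": duplicates}


# Hypothetical sample data standing in for an uploaded CSV file.
SAMPLE_CSV = """Name,Email
Ada Lovelace,ada@example.com
Ada Lovelace, ADA@example.com
Grace Hopper,grace@example.com
"""

summary = summarize_dataset(SAMPLE_CSV, ["Email"])
print(summary)  # {'total_records': 3, 'duplicates': 1}
```

In this sketch the second row counts as a duplicate of the first because the two email addresses are identical once whitespace and letter case are ignored.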


Flexibility and Control

  • Unlimited Datasets – You can create as many Datasets as needed, whether from Salesforce objects or CSV files.

  • Custom Settings – Each Dataset can be configured with its own matching models, merge rules, and other preferences.

  • Permissions – Access can be controlled with role-based permissions to ensure the right people have visibility and control.

  • Automation – Datasets can be used within the Automate module to schedule recurring deduplication, transformations, or other workflows, reducing manual effort and keeping data continuously clean.