Configuring Datasets : DataGroomr Support

Dataset configuration controls how data is displayed, how duplicates are detected, and how merges are handled in Dedupe.

Accessing Dataset Configuration

Existing Datasets: Configuration can be accessed from the Dedupe Dashboard or directly within the dataset.
New Datasets: The configuration panel appears automatically during dataset creation.

You can open the configuration of an existing dataset in three ways:

From the dataset list view — select the three dots next to a dataset and choose Edit.

From the dashboard tiles — click the three dots on a dataset card and select Edit.

From inside an open dataset — use the three-dot menu next to Analyze, then select Edit Dataset.

Configuration Tabs

The dataset configuration panel is organized into six tabs, each serving a specific function.

1. General

This tab allows you to name your dataset and select the Salesforce object that will be analyzed for duplicates.

The Advanced section controls how the dataset behaves when interacting with Salesforce rules and account hierarchies.

Include Parent/Child Accounts in Matching: When enabled, the matching engine also considers accounts related through the same account hierarchy (parent and child accounts).
Bypass Active Assignment Rule: When enabled, active Salesforce assignment rules are bypassed when records are created or updated.
Bypass Salesforce Duplicate Rules: When enabled, Salesforce duplicate rules are bypassed during processing, so records can be created or updated even if they would trigger Salesforce duplicate checks.

2. Fields

Use this tab to choose which fields will be visible during the deduplication process.
Fields can be reordered via drag-and-drop to change how they're displayed in side-by-side comparisons.

3. Filter

This tab lets you apply custom filters to focus deduplication on specific subsets of data.

You can:

Add filters by selecting a field, setting criteria, and providing values.
Use SOQL queries for more advanced filtering.

Date Filtering Options

When filtering by date fields, two tabs are available: Absolute Date and Relative Date. Absolute Date is selected by default.

Absolute Date: Select a specific calendar date for filtering.
Relative Date: Filter based on dynamic date ranges using the Date Literal dropdown (e.g., LAST_N_DAYS, THIS_MONTH, LAST_YEAR) and an N value input box where applicable (e.g., for LAST_N_DAYS, enter the number of days).
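To illustrate what the two date modes express, here is a minimal Python sketch that builds the equivalent SOQL conditions. The function names are hypothetical and are not part of DataGroomr; the date literals themselves (LAST_N_DAYS, THIS_MONTH, and so on) are standard SOQL, and the set of literals that take an N value is only a partial list.

```python
from datetime import date
from typing import Optional

# Partial list of SOQL date literals that require an N value (assumption: subset only).
LITERALS_WITH_N = {"LAST_N_DAYS", "NEXT_N_DAYS", "LAST_N_MONTHS", "NEXT_N_MONTHS"}


def relative_date_condition(field: str, operator: str, literal: str,
                            n: Optional[int] = None) -> str:
    """Build a condition such as 'CreatedDate = LAST_N_DAYS:30' (hypothetical helper)."""
    if literal in LITERALS_WITH_N:
        if n is None or n <= 0:
            raise ValueError(f"{literal} requires a positive N value")
        return f"{field} {operator} {literal}:{n}"
    return f"{field} {operator} {literal}"


def absolute_date_condition(field: str, operator: str, day: date) -> str:
    """Build a condition for a Date field; SOQL date values are unquoted YYYY-MM-DD."""
    return f"{field} {operator} {day.isoformat()}"


print(relative_date_condition("CreatedDate", "=", "LAST_N_DAYS", 30))
print(relative_date_condition("CreatedDate", "=", "THIS_MONTH"))
print(absolute_date_condition("LastActivityDate", ">=", date(2024, 1, 15)))
```

A relative condition is re-evaluated against the current date each time the query runs, whereas an absolute condition always points at the same calendar date.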

Example showing multiple field-based filters applied.

Example showing filters written directly in SOQL mode.

New — IN / NOT IN Filter Support

You can now filter datasets using multiple values for a single field.
Select the IN or NOT IN operator, then enter multiple values in a new multi-line text area that supports scrolling and resizing — ideal for bulk inclusion or exclusion filters (e.g., multiple states or industries).
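As a rough illustration of how a multi-line list of values maps to a filter, the Python sketch below turns one value per line into a SOQL IN or NOT IN condition. The helper name and the one-value-per-line parsing are assumptions made for this example; how DataGroomr parses the text area internally is not documented here.

```python
def in_condition(field: str, raw_values: str, negate: bool = False) -> str:
    """Turn multi-line input (one value per line) into a SOQL IN / NOT IN condition.

    Hypothetical helper: blank lines are skipped and surrounding whitespace is trimmed.
    """
    values = [line.strip() for line in raw_values.splitlines() if line.strip()]
    if not values:
        raise ValueError("at least one value is required")
    # SOQL string literals use single quotes; escape backslashes and quotes.
    quoted = ", ".join(
        "'" + v.replace("\\", "\\\\").replace("'", "\\'") + "'" for v in values
    )
    operator = "NOT IN" if negate else "IN"
    return f"{field} {operator} ({quoted})"


print(in_condition("BillingState", "CA\nNY\n\n TX "))
print(in_condition("Industry", "Banking", negate=True))
```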

4. Match

This tab specifies the models used for duplicate detection.

Key Features

Matching Model: Select the primary model used to detect duplicates.
Minimum Match Confidence: Set a threshold to define when two records are considered duplicates.
Additional Matching Models: Assign up to two more models to a single dataset for broader detection coverage.
Each model is color-coded and has its own confidence slider.

Multi-Model Execution Options

Then Run – Sequential Execution
Runs models one after another. The second model evaluates only the records not matched by the first, which suits a layered approach (e.g., an exact-match model first, a machine-learning model second).

And Run – Parallel Execution
Runs all selected models simultaneously for maximum coverage. Results are merged with confidence prioritization.
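The difference between the two execution modes can be sketched in Python as follows. This is a simplified illustration, not DataGroomr's implementation: the two toy models, the sample records, and the reading of "confidence prioritization" as "keep the highest confidence per pair" are all assumptions.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Tuple

Record = Dict[str, str]
Pair = FrozenSet[str]                       # the Ids of two matched records
Model = Callable[[Record, Record], float]   # returns a confidence between 0 and 1


def exact_email(a: Record, b: Record) -> float:
    """Toy exact model: full confidence when emails are equal ignoring case."""
    return 1.0 if a["Email"].lower() == b["Email"].lower() else 0.0


def same_last_name(a: Record, b: Record) -> float:
    """Toy stand-in for a fuzzy or ML model."""
    return 0.8 if a["LastName"].lower() == b["LastName"].lower() else 0.0


def run_model(model: Model, records: List[Record], threshold: float) -> Dict[Pair, float]:
    """Score every pair and keep those at or above the minimum match confidence."""
    matches: Dict[Pair, float] = {}
    for a, b in combinations(records, 2):
        score = model(a, b)
        if score >= threshold:
            matches[frozenset((a["Id"], b["Id"]))] = score
    return matches


def then_run(steps: List[Tuple[Model, float]], records: List[Record]) -> Dict[Pair, float]:
    """Sequential: each model only sees records left unmatched by earlier models."""
    remaining = list(records)
    results: Dict[Pair, float] = {}
    for model, threshold in steps:
        found = run_model(model, remaining, threshold)
        results.update(found)
        matched_ids = {rid for pair in found for rid in pair}
        remaining = [r for r in remaining if r["Id"] not in matched_ids]
    return results


def and_run(steps: List[Tuple[Model, float]], records: List[Record]) -> Dict[Pair, float]:
    """Parallel: every model sees all records; keep the highest confidence per pair."""
    results: Dict[Pair, float] = {}
    for model, threshold in steps:
        for pair, score in run_model(model, records, threshold).items():
            results[pair] = max(score, results.get(pair, 0.0))
    return results


records = [
    {"Id": "1", "Email": "a@x.com", "LastName": "Smith"},
    {"Id": "2", "Email": "A@x.com", "LastName": "Smith"},
    {"Id": "3", "Email": "b@x.com", "LastName": "Smith"},
    {"Id": "4", "Email": "c@x.com", "LastName": "Jones"},
]
steps = [(exact_email, 1.0), (same_last_name, 0.75)]

print(len(then_run(steps, records)), "pair(s) found sequentially")
print(len(and_run(steps, records)), "pair(s) found in parallel")
```

In this toy data, sequential execution reports only the email match between records 1 and 2, because those records are removed before the second model runs. Parallel execution additionally pairs record 3 with both of them through the last-name model.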

5. Merge

Here you can define how merges are handled.

Master Record Rule: Determines which record remains after a merge (e.g., Most Recently Modified).
Field Merge Rule: Specifies how to populate master record fields (e.g., Fill Empty Fields).
Undo / Rollback Merge: Enable options to reverse merges or restore records from encrypted backups.
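To make the two rule types concrete, the Python sketch below applies a "Most Recently Modified" master record rule followed by a "Fill Empty Fields" field merge rule to a small group of duplicates. The function names and the exact semantics (empty means None or an empty string; the first non-empty value from the other records wins) are assumptions for illustration only.

```python
from datetime import datetime
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def pick_master(records: List[Record]) -> Record:
    """Master Record Rule (assumed): the most recently modified record survives."""
    return max(records, key=lambda r: datetime.fromisoformat(r["LastModifiedDate"]))


def fill_empty_fields(master: Record, others: List[Record]) -> Record:
    """Field Merge Rule (assumed): copy values only into fields the master leaves empty."""
    merged = dict(master)
    for field in list(merged):
        if merged[field] in (None, ""):
            for other in others:
                if other.get(field) not in (None, ""):
                    merged[field] = other[field]
                    break
    return merged


def merge_group(records: List[Record]) -> Record:
    """Pick the master, then fill its empty fields from the remaining duplicates."""
    master = pick_master(records)
    others = [r for r in records if r is not master]
    return fill_empty_fields(master, others)


a = {"Id": "1", "Phone": "", "Email": "a@x.com",
     "LastModifiedDate": "2024-03-01T10:00:00"}
b = {"Id": "2", "Phone": "555-0100", "Email": "old@x.com",
     "LastModifiedDate": "2023-01-01T09:00:00"}

print(merge_group([a, b]))
```

Here record 1 becomes the master because it was modified more recently. It keeps its own Email and receives the Phone value from record 2, since that field was empty on the master.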

6. Live Dedupe

For information about real-time deduplication, refer to the article Live Dedupe: Deduplication in real-time.
