The match-or-create pattern: linking CSV data to existing records
=================================================================

 Manch Minasyan ·  June 2, 2026  · 11 min read

 Every CSV import that touches relationships faces the same question: what happens when a row references something that might already exist in the database? A "Company" column that says "Acme Corp" could mean one of three things. It could mean "find the existing Acme Corp and link to it." It could mean "find Acme Corp if it exists, otherwise create it." Or it could mean "create a new record no matter what." The answer depends on the data, the domain, and the user's intent -- and hardcoding the wrong one causes either silent data loss or a polluted database.

This is the match-or-create pattern. It shows up in every non-trivial CSV import, whether you build it explicitly or stumble into it by accident. Getting it right means defining three behaviors, knowing when each one applies, using the right field to match on, and handling the edge cases that trip up most implementations. If you have not read about how duplicate handling fits into the broader cost of building a CSV importer yourself, [The hidden cost of building your own CSV importer](/blog/hidden-cost-building-csv-importer) has a useful overview.

Three behaviors, three use cases
--------------------------------

The match-or-create decision is a three-way enum, not a boolean. Thinking of it as "match or don't match" misses the third case entirely.

```
enum MatchBehavior: string
{
    case MatchOnly = 'match_only';
    case MatchOrCreate = 'match_or_create';
    case Create = 'create';

    public function performsLookup(): bool
    {
        return $this !== self::Create;
    }

    public function createsOnNoMatch(): bool
    {
        return $this === self::MatchOrCreate || $this === self::Create;
    }
}

```

Two helper methods capture the behavioral differences. `performsLookup()` tells the system whether it needs to query the database at all. `createsOnNoMatch()` tells it what to do when a lookup comes up empty -- or when no lookup was attempted. These two booleans drive every downstream decision in the import pipeline.

**MatchOnly: find existing or skip.** The system searches for a matching record. If it finds one, it links. If it does not, it skips the relationship (or skips the entire row, depending on whether the field is required). No new records are created in the related table.

Use `MatchOnly` when the related data is authoritative and the CSV should not modify it. Importing contacts into a mature CRM where companies are managed separately. Importing transactions where the account must already exist. Importing order line items where the product catalog is the source of truth. In all of these cases, a missing match is an error in the CSV, not a signal to create something new.

**MatchOrCreate: find existing or make a new one.** The system searches for a matching record. If it finds one, it links. If it does not, it creates a new record from the CSV data and links to that. This is the most common behavior for user-facing imports because it handles both existing and new data gracefully.

Use `MatchOrCreate` for relationships where the CSV is a legitimate source of new records. Importing contacts with company names where some companies are new. Importing products with categories where the user wants missing categories created. Importing blog posts with tags where new tags are expected. The key criterion is that creating a new related record from the CSV value is a valid operation, not a data quality problem.

**Create: skip the lookup, always insert.** The system never queries the database. Every row produces a new related record, regardless of whether a matching one exists. No deduplication, no matching, no lookups.

Use `Create` for bulk seeding scenarios where you know the related table is empty or where every row genuinely represents a new entity. Importing audit log entries where each row is a distinct event. Initial data migration into a fresh application. Seeding test data. The moment your related table has existing records you care about preserving, `Create` becomes dangerous -- it produces duplicates by design.
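As a rough illustration of how the two helper methods can drive a resolver, the sketch below repeats the enum so that it runs on its own and adds a hypothetical `resolveRelation()` function. The in-memory `$index` array stands in for a database query; neither the function name nor the return shape comes from the original pipeline.

```php
<?php
// Illustrative sketch only: resolveRelation() and $index are hypothetical names.

enum MatchBehavior: string
{
    case MatchOnly = 'match_only';
    case MatchOrCreate = 'match_or_create';
    case Create = 'create';

    public function performsLookup(): bool
    {
        return $this !== self::Create;
    }

    public function createsOnNoMatch(): bool
    {
        return $this !== self::MatchOnly;
    }
}

/**
 * Decide what to do with one CSV value.
 *
 * @param array<string, int> $index existing record IDs keyed by match value
 * @return array{action: string, id: ?int}
 */
function resolveRelation(MatchBehavior $behavior, string $value, array $index): array
{
    // Only MatchOnly and MatchOrCreate consult existing records.
    if ($behavior->performsLookup() && isset($index[$value])) {
        return ['action' => 'link', 'id' => $index[$value]];
    }

    // No match (or no lookup attempted): create if the behavior allows it.
    if ($behavior->createsOnNoMatch()) {
        return ['action' => 'create', 'id' => null];
    }

    return ['action' => 'skip', 'id' => null];
}

$index = ['Acme Corp' => 42];

$linked  = resolveRelation(MatchBehavior::MatchOnly, 'Acme Corp', $index);
$skipped = resolveRelation(MatchBehavior::MatchOnly, 'New Corp', $index);
$created = resolveRelation(MatchBehavior::MatchOrCreate, 'New Corp', $index);
$forced  = resolveRelation(MatchBehavior::Create, 'Acme Corp', $index);
```

Note the last call: under `Create`, "Acme Corp" produces a `create` action even though a matching record exists, which is exactly the duplicate-by-design behavior described above.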

The decision tree
-----------------

Choosing the right behavior per relationship comes down to three questions asked in order.

**Question 1: Does the related table contain data you need to preserve?** If no -- if the table is empty or you are doing a full replacement -- use `Create`. There is nothing to match against, so the lookup is wasted work.

**Question 2: Is the CSV an authoritative source for new records in the related table?** If no -- if new records should only come through a dedicated workflow (an admin panel, an API, a separate import) -- use `MatchOnly`. The CSV can reference existing data but should not create new data.

**Question 3: Is the match field reliable enough to avoid false matches?** If yes, use `MatchOrCreate`. If no -- if you are matching on a field like "name" that is ambiguous and typo-prone -- consider using `MatchOnly` for that field and letting the user manually resolve unmatched values in a review step.

These questions apply per relationship, not per import. A single import might use `MatchOrCreate` for the company relationship (new companies are fine) and `MatchOnly` for the account manager relationship (account managers should already exist in the system). The behavior is a property of the relationship field, not the import as a whole.
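The three questions can be written down as a small function. This is a minimal sketch, assuming a hypothetical `chooseBehavior()` helper that returns the backing string values of `MatchBehavior` so that it stays self-contained; a real importer would more likely declare the behavior per field than compute it.

```php
<?php
// Hypothetical helper: the parameter names mirror the three questions above.

function chooseBehavior(
    bool $hasDataToPreserve,
    bool $csvMayCreateRecords,
    bool $matchFieldIsReliable,
): string {
    // Question 1: nothing to preserve means nothing to match against.
    if (!$hasDataToPreserve) {
        return 'create';
    }

    // Question 2: the CSV may reference existing records but not add new ones.
    if (!$csvMayCreateRecords) {
        return 'match_only';
    }

    // Question 3: an unreliable match field falls back to MatchOnly plus manual review.
    return $matchFieldIsReliable ? 'match_or_create' : 'match_only';
}

// One import, two relationships, two different answers.
$companyBehavior        = chooseBehavior(true, true, true);
$accountManagerBehavior = chooseBehavior(true, false, true);
$freshTableBehavior     = chooseBehavior(false, true, true);
```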

Priority-based matching
-----------------------

Deciding the behavior is half the problem. The other half is deciding what field to match on. A company named "Acme Corp" in the CSV needs to find the right row in the `companies` table, but matching by name is the worst option available. Names have typos, formatting differences, and abbreviations. "Acme Corp" and "Acme Corporation" are the same company but will not match on a string comparison.

Better identifiers exist, and they have a natural priority order. A database ID is deterministic -- if the CSV includes it, there is zero ambiguity. An email address is nearly unique in practice. A domain is good for companies. A phone number is useful but format-sensitive. A name is the fallback when nothing better is available.
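To make "format-sensitive" concrete, here is a small sketch comparing identifiers before and after normalization. The `normalizeEmail()` and `normalizePhone()` helpers and their rules (lowercase and trim for emails, digits only for phones) are assumptions for illustration, not a description of how any particular importer normalizes values.

```php
<?php
// Hypothetical normalizers; real rules depend on your data and locale.

function normalizeEmail(string $email): string
{
    // Assumes case-insensitive mailboxes, which holds for most providers in practice.
    return strtolower(trim($email));
}

function normalizePhone(string $phone): string
{
    // Strip everything except digits so formatting differences disappear.
    return preg_replace('/\D+/', '', $phone) ?? '';
}

// Names: a plain string comparison fails for the same company.
$nameMatches = 'Acme Corp' === 'Acme Corporation';

// Emails and phones: equal once formatting noise is removed.
$emailMatches = normalizeEmail(' John.Smith@Acme.com ') === normalizeEmail('john.smith@acme.com');
$phoneMatches = normalizePhone('+1 (555) 010-2000') === normalizePhone('+15550102000');
```

There is no equally mechanical rule that turns "Acme Corp" into "Acme Corporation", which is why the name sits at the bottom of the priority order.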

This priority is encoded as an integer on the matchable field definition:

```
final class MatchableField
{
    public function __construct(
        public readonly string $field,
        public readonly string $label,
        public readonly int $priority = 0,
        public readonly MatchBehavior $behavior = MatchBehavior::MatchOrCreate,
        public readonly bool $multiValue = false,
    ) {}

    public static function id(): self
    {
        return new self(field: 'id', label: 'Record ID', priority: 100, behavior: MatchBehavior::MatchOnly);
    }

    public static function email(string $fieldKey, MatchBehavior $behavior = MatchBehavior::MatchOrCreate): self
    {
        return new self(field: $fieldKey, label: 'Email', priority: 90, behavior: $behavior, multiValue: true);
    }

    public static function domain(string $fieldKey, MatchBehavior $behavior = MatchBehavior::MatchOrCreate): self
    {
        return new self(field: $fieldKey, label: 'Domain', priority: 80, behavior: $behavior, multiValue: true);
    }

    public static function phone(string $fieldKey, MatchBehavior $behavior = MatchBehavior::MatchOrCreate): self
    {
        return new self(field: $fieldKey, label: 'Phone', priority: 70, behavior: $behavior, multiValue: true);
    }

    public static function name(): self
    {
        return new self(field: 'name', label: 'Name', priority: 10, behavior: MatchBehavior::Create);
    }
}

```

Static factories encode sensible defaults. `id()` is always `MatchOnly` at priority 100 -- if you have a database ID and it does not match, something is wrong and you should not create a record with a potentially conflicting primary key. `email()` defaults to `MatchOrCreate` at priority 90 because a new email legitimately represents a new entity. `name()` defaults to `Create` at priority 10 because names are too ambiguous for reliable matching.

In an importer, you declare the full priority chain:

```
public function matchableFields(): array
{
    return [
        MatchableField::id(),
        MatchableField::email('email', MatchBehavior::MatchOrCreate),
        MatchableField::domain('domain', MatchBehavior::MatchOrCreate),
        MatchableField::phone('phone', MatchBehavior::MatchOrCreate),
        MatchableField::name(),
    ];
}

```

The system inspects which columns the user actually mapped from the CSV. If the CSV has an "ID" column and the user mapped it, the match resolver uses `MatchableField::id()` at priority 100. If the CSV has no ID but has an "Email" column, it falls through to `MatchableField::email()` at priority 90. The highest-priority field that has a mapped column wins.

This means the same importer handles different CSV formats without configuration changes. A CSV exported from your own system includes IDs and gets deterministic matching. A CSV from a third-party tool has emails but no IDs and gets near-deterministic matching. A CSV from a spreadsheet the user typed by hand has only names and falls back to the lowest-priority field, whose default behavior (`Create`) skips the lookup rather than risk a false match. The priority system adapts automatically.
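The selection rule (highest-priority field that has a mapped column wins) can be sketched in a few lines. To keep the example self-contained it uses plain arrays instead of `MatchableField` objects, and `selectMatchField()` is a hypothetical name.

```php
<?php
// Hypothetical sketch of priority-based field selection using plain arrays.

/**
 * @param list<array{field: string, priority: int}> $fields        declared matchable fields
 * @param list<string>                              $mappedColumns  fields the user mapped from the CSV
 * @return array{field: string, priority: int}|null
 */
function selectMatchField(array $fields, array $mappedColumns): ?array
{
    // Keep only the fields that actually have a mapped CSV column.
    $candidates = array_filter(
        $fields,
        fn (array $f): bool => in_array($f['field'], $mappedColumns, true),
    );

    if ($candidates === []) {
        return null;
    }

    // Highest priority first.
    usort($candidates, fn (array $a, array $b): int => $b['priority'] <=> $a['priority']);

    return $candidates[0];
}

$fields = [
    ['field' => 'id', 'priority' => 100],
    ['field' => 'email', 'priority' => 90],
    ['field' => 'domain', 'priority' => 80],
    ['field' => 'phone', 'priority' => 70],
    ['field' => 'name', 'priority' => 10],
];

$ownExport  = selectMatchField($fields, ['id', 'name', 'email']);
$thirdParty = selectMatchField($fields, ['name', 'phone', 'email']);
$handTyped  = selectMatchField($fields, ['name']);
$unmatched  = selectMatchField($fields, ['notes']);
```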

Notice that each `MatchableField` carries its own `MatchBehavior`. This is deliberate. The confidence of the match field determines how aggressively the system should create new records. A high-confidence field such as email can safely use `MatchOrCreate` because a non-match almost certainly means a genuinely new record. The ID field is the exception: it is the most reliable identifier, but it stays `MatchOnly` because an unmatched ID signals a problem in the CSV, not a new record. Low-confidence fields (name) should default to `Create` because a non-match might just mean a typo, and running a database lookup against a typo-prone field wastes queries and risks false positives.

One correctness requirement worth noting: in multi-tenant applications, every lookup in the priority chain must be scoped to the current tenant. A company named "Acme Corp" in tenant A is not the same record as "Acme Corp" in tenant B. [Multi-tenant CSV imports in Laravel](/blog/multi-tenant-csv-imports-laravel) covers how to enforce that scoping across the full resolution pipeline.
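A minimal in-memory sketch of what tenant scoping means for a lookup is shown below. The `tenant_id` key and the `findCompanyInTenant()` helper are assumptions for illustration; in a real application the same constraint would be part of the database query.

```php
<?php
// Hypothetical sketch: every lookup filters on the tenant as well as the match value.

/**
 * @param list<array{id: int, tenant_id: int, name: string}> $companies
 * @return array{id: int, tenant_id: int, name: string}|null
 */
function findCompanyInTenant(array $companies, int $tenantId, string $name): ?array
{
    foreach ($companies as $company) {
        // Both conditions must hold; a name match in another tenant is not a match.
        if ($company['tenant_id'] === $tenantId && $company['name'] === $name) {
            return $company;
        }
    }

    return null;
}

$companies = [
    ['id' => 1, 'tenant_id' => 10, 'name' => 'Acme Corp'],
    ['id' => 2, 'tenant_id' => 20, 'name' => 'Acme Corp'],
];

$tenantA = findCompanyInTenant($companies, 10, 'Acme Corp');
$tenantB = findCompanyInTenant($companies, 20, 'Acme Corp');
$tenantC = findCompanyInTenant($companies, 30, 'Acme Corp');
```

For a tenant with no "Acme Corp" of its own, the lookup returns `null` even though two records with that name exist elsewhere, so `MatchOrCreate` would create a third record scoped to that tenant.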

Handling duplicate matches
--------------------------

Priority-based matching works well when each CSV value maps to exactly one database record. It gets interesting when a value matches multiple records.

This happens most often with name-based matching. Your database has two contacts named "John Smith." The CSV has a row referencing "John Smith." Which one should it link to?

The resolver takes the pragmatic approach: first match wins. The query returns results ordered by primary key, and the first result becomes the match. This is not perfect -- it is arbitrary -- but it is deterministic. The same import with the same data always produces the same result.

The real fix is not a smarter disambiguation algorithm. It is choosing a higher-priority match field that does not have duplicates. If "John Smith" is ambiguous but "john.smith@acme.com" is not, include the email column in the CSV and let the priority system select it over the name.

When ambiguity is unavoidable, the system surfaces it. The review step shows "Matched: John Smith (ID 42) -- 3 records with this name" so the user can verify or override the selection.
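The first-match-wins rule and the ambiguity message can be sketched together. This is an in-memory illustration, assuming hypothetical `firstMatchWins()` and `describeMatch()` helpers; a database-backed resolver would order by primary key in the query itself.

```php
<?php
// Hypothetical sketch of deterministic first-match-wins with an ambiguity count.

/**
 * @param list<array{id: int, name: string}> $records
 * @return array{id: int, candidates: int}|null
 */
function firstMatchWins(array $records, string $name): ?array
{
    $matches = array_values(array_filter(
        $records,
        fn (array $r): bool => $r['name'] === $name,
    ));

    if ($matches === []) {
        return null;
    }

    // Order by primary key so the same data always yields the same winner.
    usort($matches, fn (array $a, array $b): int => $a['id'] <=> $b['id']);

    return ['id' => $matches[0]['id'], 'candidates' => count($matches)];
}

/** @param array{id: int, candidates: int} $match */
function describeMatch(string $name, array $match): string
{
    $message = sprintf('Matched: %s (ID %d)', $name, $match['id']);

    // Surface ambiguity instead of hiding it.
    if ($match['candidates'] > 1) {
        $message .= sprintf(' -- %d records with this name', $match['candidates']);
    }

    return $message;
}

$contacts = [
    ['id' => 77, 'name' => 'John Smith'],
    ['id' => 42, 'name' => 'John Smith'],
    ['id' => 58, 'name' => 'John Smith'],
    ['id' => 5,  'name' => 'Jane Doe'],
];

$match   = firstMatchWins($contacts, 'John Smith');
$message = describeMatch('John Smith', $match);
```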

What the user sees
------------------

The technical details above drive a user-facing experience in the import wizard's review step. After column mapping and validation, the resolver runs and determines the match action for each row. The user sees a clear summary of what will happen before the import executes.

For rows that matched, the preview shows the resolved entity: "Found: Acme Corp (ID 42)." The user verifies that the CSV value resolved to the correct record.

For rows that did not match under `MatchOrCreate`, the preview shows "Will create: New Corp" -- making it explicit that a new record will be inserted. If "Acme Corp" appears as "Will create" when it should have matched, the user knows something is wrong (probably a spelling difference) and can correct it before execution.

For rows that did not match under `MatchOnly`, the preview shows "No match -- will skip." The user can correct the value, skip the row, or switch the behavior to `MatchOrCreate` if they decide new records are acceptable.

This preview step is what separates a trustworthy import from a black box. The user is never surprised by what the import creates, updates, or skips. The UX patterns behind how this review step is designed -- what information to surface, how to present resolved entities, what controls to offer -- are covered in [CSV column mapping UX patterns that reduce support tickets](/blog/csv-column-mapping-ux-patterns).
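The three preview messages map directly onto the behavior and the lookup result. The sketch below assumes a hypothetical `previewLine()` helper that takes the behavior as its backing string value; the message wording follows the examples above.

```php
<?php
// Hypothetical sketch: turn a resolution result into the line shown in the review step.

function previewLine(string $behavior, string $csvValue, ?int $matchedId): string
{
    // A resolved record looks the same regardless of behavior.
    if ($matchedId !== null) {
        return sprintf('Found: %s (ID %d)', $csvValue, $matchedId);
    }

    return match ($behavior) {
        'match_or_create', 'create' => sprintf('Will create: %s', $csvValue),
        'match_only' => 'No match -- will skip',
        default => throw new InvalidArgumentException("Unknown behavior: {$behavior}"),
    };
}

$found   = previewLine('match_or_create', 'Acme Corp', 42);
$create  = previewLine('match_or_create', 'New Corp', null);
$skipped = previewLine('match_only', 'New Corp', null);
```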

The behavioral matrix
---------------------

To summarize the three behaviors and their effects across the pipeline:

| Behavior | Queries DB? | Creates on no match? | Unmatched rows | Best for |
| --- | --- | --- | --- | --- |
| MatchOnly | Yes | No | Skipped or flagged | Read-only reference data, authoritative tables |
| MatchOrCreate | Yes | Yes | New record created | Most user-facing relationship imports |
| Create | No | Yes (always) | N/A -- all rows create | Bulk seeding, audit logs, fresh databases |

The power of treating this as an explicit enum rather than a boolean or an implicit behavior is composability. Each relationship field in an import carries its own behavior. Each matchable field carries its own behavior. The import system does not need special-case logic for "should I create this?" -- it asks the enum, and the enum answers.

Going deeper
------------

The match-or-create pattern is one piece of the relationship import puzzle. For the full picture of how relationships are resolved during CSV imports -- BelongsTo lookups, MorphToMany tags, multi-tenant scoping, and the declarative field configuration that ties it all together -- see [Importing relational data from CSV files in Laravel](/blog/importing-relational-data-csv-laravel).

For the deduplication problem that compounds with match-or-create (what happens when 300 rows all reference "Acme Corp" and you are creating on no match), see [Intra-import deduplication: preventing duplicate records during CSV import](/blog/intra-import-deduplication).

For a hands-on tutorial that puts match-or-create into practice with a real CRM importer, see [Building a contact importer for your CRM](/blog/building-contact-importer-crm).

---

If you are building CSV imports that involve matching against existing records and you would rather declare the behavior than code the plumbing, take a look at [Tapix](/) to see how it handles match resolution, priority-based field selection, and the user-facing review step out of the box.

