Back to blog

Importing relational data from CSV files in Laravel

Manch Minasyan · · 13 min read

The easy part of CSV importing is flat fields. First name goes into first_name, email goes into email, done. The hard part starts the moment a column references something that lives in another table. A "Company" column that needs to become a company_id foreign key. A "Tags" column with comma-separated values that map to a polymorphic many-to-many relationship. A "Category" field pointing to a record that might not exist yet.

This post walks through the real patterns you need: BelongsTo lookups, MorphToMany tag resolution, the match-or-create decision, match priority strategies, and multi-tenant scoping. Each section includes code you can adapt. If you are new to CSV imports in Laravel, The complete guide to CSV imports in Laravel covers the full landscape of approaches first.

#The problem: CSV is flat, databases are not

A CSV file is a grid. Every value is a string in a cell. There are no foreign keys, no join tables, no polymorphic relationships. But the database you are importing into has all of those things.

Consider a contacts CSV:

First Name,Last Name,Email,Company,Tags
Jane,Doe,jane@example.com,Acme Corp,"laravel, php, backend"
John,Smith,john@example.com,Acme Corp,"laravel, devops"
Sarah,Chen,sarah@example.com,Globex Inc,"php, frontend"

Three columns here reference other tables:

Every one of these is a lookup, a potential creation, and a deduplication problem. And they all interact: you cannot insert the contact row until you have resolved the company ID, and you cannot attach tags until the contact row exists.

#Simple case: BelongsTo lookup

The most common relationship in CSV imports is BelongsTo. A column contains a human-readable name, and you need the corresponding foreign key.

Here is how most developers first write this, typically inside a Laravel Excel ToModel import:

use App\Models\Contact;
use App\Models\Company;
use Maatwebsite\Excel\Concerns\ToModel;
use Maatwebsite\Excel\Concerns\WithHeadingRow;

class ContactImport implements ToModel, WithHeadingRow
{
    public function model(array $row): Contact
    {
        $company = Company::where('name', $row['company'])->first();

        return new Contact([
            'first_name'  => $row['first_name'],
            'last_name'   => $row['last_name'],
            'email'       => $row['email'],
            'company_id'  => $company?->id,
        ]);
    }
}

This works when every company in the CSV already exists in the database. But what happens when it does not?

You have three options, and the first two are bad:

Option A: Skip the row. If "Globex Inc" does not exist, set company_id to null or skip the entire contact. This causes silent data loss. The user uploaded 500 contacts and only 430 made it. The 70 that referenced unknown companies vanished without explanation.

Option B: Always create. Use firstOrCreate instead of first. Now every unknown company name becomes a new company record. This sounds reasonable until the CSV has typos. "Acme Corp", "Acme Corp.", "ACME Corp", and "acme corp" all become separate companies. Your companies table fills with garbage that someone has to clean up manually.

Option C: Let the user decide. Show the user which company names matched and which did not. Let them choose per-relationship whether unmatched values should be created, skipped, or corrected before the import runs. This is the right answer, and it is significantly harder to build.

Option C requires that relationship resolution happens as a discrete step, not inline during row insertion. The import system needs to collect all unique relationship values from the file, attempt to match them against existing records, present the results to the user, and then execute based on their decision. This is why a multi-step import wizard exists -- a single-pass model() method cannot support this workflow.

#Complex case: MorphToMany and tags

Tags are the second most common relationship pattern in CSV imports, and they compound every problem from the BelongsTo case.

A typical tags column looks like: "laravel, php, backend". To import this, you need to:

  1. Split the string on commas
  2. Trim whitespace from each value
  3. Normalize for matching (lowercase, strip extra spaces)
  4. Look up each tag in the database
  5. Create any that do not exist (if the import is configured to allow that)
  6. Attach all of them to the contact via the pivot table
  7. Do this after the contact record exists, since you need its ID for the pivot

Here is a manual implementation:

public function afterModel(Contact $contact, array $row): void
{
    if (empty($row['tags'])) {
        return;
    }

    $tagNames = array_map('trim', explode(',', $row['tags']));
    $tagIds = [];

    foreach ($tagNames as $name) {
        if ($name === '') {
            continue;
        }

        $tag = Tag::firstOrCreate(
            ['name' => strtolower($name)],
        );

        $tagIds[] = $tag->id;
    }

    $contact->tags()->syncWithoutDetaching($tagIds);
}

This works for small files. It falls apart at scale for one reason: deduplication.

If your CSV has 500 rows and 400 of them include "laravel" in the tags column, the code above runs Tag::firstOrCreate(['name' => 'laravel']) four hundred times. The first call creates the tag. The next 399 each run a SELECT query, find the existing row, and return it. That is 399 unnecessary database queries.

The fix is an in-memory cache that tracks what you have already resolved within the current import:

private array $tagCache = [];

private function resolveTag(string $name): int
{
    $cacheKey = "tag:" . strtolower(trim($name));

    if (isset($this->tagCache[$cacheKey])) {
        return $this->tagCache[$cacheKey];
    }

    $tag = Tag::firstOrCreate(
        ['name' => strtolower(trim($name))],
    );

    $this->tagCache[$cacheKey] = $tag->id;

    return $tag->id;
}

The cache key uses a {linkKey}:{normalized_name} pattern. The linkKey prefix (here tag:) prevents collisions when you resolve multiple relationship types in the same import. The normalized name strips whitespace and lowercases so that "Laravel", " laravel ", and "LARAVEL" all hit the same cache entry.

This pattern is essential for any import that touches relationship tables. Without it, an import of 10,000 rows with a tags column generates tens of thousands of redundant queries. With it, you issue one query per unique tag value in the entire file.

#The match-or-create decision

The examples above hardcode the behavior: firstOrCreate always creates missing records, first never does. But in production, the right behavior depends on the data and the user's intent.

Consider three scenarios for a "Company" column during a contact import:

  1. Importing into a mature CRM. Most companies already exist. You want to link contacts to existing companies and flag any that do not match. Creating new companies from CSV data would introduce unverified records.

  2. Seeding a new application. Nothing exists yet. Every company name in the CSV should become a new company record.

  3. Ongoing data sync. Some companies exist, some are new. You want to match existing ones and create the rest.

These map directly to three behaviors:

The key insight is that this decision belongs to the user, not the developer. A well-designed import system exposes it as a configuration choice per relationship, not a hardcoded behavior buried in an import class.

#Match resolution strategies

When you do perform a lookup, the next question is: what field do you match on?

Matching a company by name seems obvious until you realize that names are the least reliable identifier. "Acme Corp" and "Acme Corporation" are the same company. "ABC Inc." might appear twice in your database with different addresses. Names are ambiguous by nature.

Better identifiers exist, and they have a natural priority order:

Match field Priority Why
Record ID 100 Deterministic. If the CSV includes the database ID, there is no ambiguity.
Email 90 Nearly unique in practice. Best general-purpose identifier for people and organizations.
Domain 80 Good for companies. Extract from email or use a dedicated column.
Phone 70 Useful but format-sensitive. "+1 (555) 123-4567" and "5551234567" must normalize to the same value.
Name 10 Last resort. Ambiguous, typo-prone, and format-inconsistent.

In code, this priority system looks like:

public function matchableFields(): array
{
    return [
        MatchableField::id(),                                           // priority: 100
        MatchableField::email('email', MatchBehavior::MatchOrCreate),   // priority: 90
        MatchableField::domain('domain', MatchBehavior::MatchOrCreate), // priority: 80
        MatchableField::phone('phone', MatchBehavior::MatchOrCreate),   // priority: 70
        MatchableField::name(),                                         // priority: 10
    ];
}

The system tries each matcher in priority order. If the CSV row contains an ID column and the ID matches an existing record, that match wins regardless of whether the name or email also appears in the data. If there is no ID, it falls back to email, then domain, and so on.

Each MatchableField carries its own MatchBehavior, which means you can have different create-or-skip policies per field. Matching by ID is always MatchOnly -- if you provide an ID and it does not exist, something is wrong and you should not create a record with a potentially conflicting primary key. Matching by email might be MatchOrCreate because a new email legitimately represents a new contact. Matching by name is typically Create because names alone are too ambiguous for reliable lookups.

This priority-based approach lets the import system make the best decision it can with whatever data the CSV provides, without requiring the user to specify which column to match on every time.

#Multi-tenant complications

Everything above assumes a single-tenant application. In a multi-tenant system, every lookup must be scoped.

When you search for a company named "Acme Corp", you are not searching the entire companies table. You are searching for "Acme Corp" within the current tenant's companies. A different tenant might have their own "Acme Corp" that is a completely different organization.

This scoping must be airtight across the entire import pipeline:

The queue job point is where most implementations break. The user uploads a file in a web request scoped to Tenant A. The import job runs minutes later in a queue worker that has no tenant context. If the job does not explicitly restore the tenant scope before running relationship lookups, it queries across all tenants. At best, you get wrong matches. At worst, you leak data between tenants.

Solving this requires the tenant context to be serialized when the import is created and restored when each job runs. The import record itself carries the tenant identifier, and every job reads it before executing any queries.

#Why this is the hardest part of CSV imports

Flat field imports are stateless. Each row is independent. You validate it, insert it, move on. The row before and the row after do not matter.

Relationship imports are stateful. Row 1 creates "Acme Corp" as a new company. Rows 2 through 50 need to reference that same company, not create 49 duplicates. The deduplication cache from earlier in this post is one piece of that state. Match resolution results are another. The user's create-or-skip decisions per relationship type are yet another.

This state has to survive across:

Then add validation. Before the import runs, you want to show the user which company names matched and which did not. That requires a separate resolution pass that queries the database for every unique relationship value in the file, batched and cached, and presents the results in a review step. This is fundamentally different from validating flat fields, which is just running Laravel's Validator on individual values.

And then add multi-tenancy. And MorphToMany. And the fact that a single row might have both a BelongsTo company and MorphToMany tags, each with their own match behavior and deduplication cache. The combinatorial complexity is why relationship handling accounts for more code than every other part of a CSV import system combined.

This is also the problem that motivated building Tapix -- see Why we're building Tapix for the full context.

#The declarative alternative

The manual approaches above work. They also require hundreds of lines of carefully coordinated code per importer, and every new relationship type means writing more of it.

The alternative is to declare the relationship configuration on the field definition and let the import system handle resolution, deduplication, and user-facing match behavior automatically:

ImportField::make('company')
    ->label('Company')
    ->guess(['company', 'company name', 'organization', 'org'])
    ->relationship(
        name: 'company',
        model: Company::class,
        matchBy: ['name'],
        behavior: MatchBehavior::MatchOrCreate,
    ),

Four lines. The field declares that it maps to a company relationship on the Company model, matches by the name column, and uses MatchOrCreate behavior. The import system takes it from there: it collects unique company names from the file, queries the database for matches, presents unmatched values to the user, creates new records if configured to, caches resolved IDs for deduplication across rows, and links everything during execution.

For BelongsTo relationships, the resolved ID is stored as a foreign key on the imported record. For MorphToMany relationships, records are attached via the pivot table after the main record is saved. The storage strategy is determined by the relationship type -- you declare it once and do not think about it again.

Combined with matchableFields() for controlling match priority:

public function matchableFields(): array
{
    return [
        MatchableField::id(),
        MatchableField::email('email', MatchBehavior::MatchOrCreate),
    ];
}

This gives the import system everything it needs to resolve relationships correctly, across rows, across chunks, across queue jobs, and across tenants.


Relationship resolution is the part of CSV importing that separates toy implementations from production-ready systems. If your imports involve relationships, multi-tenancy, and the match-or-create decision, and you would rather not build the infrastructure yourself, take a look at tapix.dev to see how it handles these problems. For inline validation before any data touches the database, see Handling CSV validation errors before they hit your database.

Enjoyed this post?

Get notified when we publish new articles about Laravel imports and data handling.

Almost there — confirm your subscription via email.

Related posts