Importing relational data from CSV files in Laravel
Importing relational data from CSV files in Laravel
The easy part of CSV importing is flat fields. First name goes into first_name, email goes into email, done. The hard part starts the moment a column references something that lives in another table. A "Company" column that needs to become a company_id foreign key. A "Tags" column with comma-separated values that map to a polymorphic many-to-many relationship. A "Category" field pointing to a record that might not exist yet.
This post walks through the real patterns you need: BelongsTo lookups, MorphToMany tag resolution, the match-or-create decision, match priority strategies, and multi-tenant scoping. Each section includes code you can adapt. If you are new to CSV imports in Laravel, The complete guide to CSV imports in Laravel covers the full landscape of approaches first.
#The problem: CSV is flat, databases are not
A CSV file is a grid. Every value is a string in a cell. There are no foreign keys, no join tables, no polymorphic relationships. But the database you are importing into has all of those things.
Consider a contacts CSV:
First Name,Last Name,Email,Company,Tags
Jane,Doe,jane@example.com,Acme Corp,"laravel, php, backend"
John,Smith,john@example.com,Acme Corp,"laravel, devops"
Sarah,Chen,sarah@example.com,Globex Inc,"php, frontend"
Three columns here reference other tables:
- Company is a BelongsTo relationship. "Acme Corp" needs to resolve to a row in the
companiestable and itsidstored ascompany_idon the contact. - Tags is a MorphToMany relationship. "laravel, php, backend" needs to be split into individual tags, each resolved or created in the
tagstable, then attached via a pivot table. - Both Jane and John reference "Acme Corp". Both Jane and Sarah reference "php". The import needs to create each related record once, not once per row.
Every one of these is a lookup, a potential creation, and a deduplication problem. And they all interact: you cannot insert the contact row until you have resolved the company ID, and you cannot attach tags until the contact row exists.
#Simple case: BelongsTo lookup
The most common relationship in CSV imports is BelongsTo. A column contains a human-readable name, and you need the corresponding foreign key.
Here is how most developers first write this, typically inside a Laravel Excel ToModel import:
use App\Models\Contact;
use App\Models\Company;
use Maatwebsite\Excel\Concerns\ToModel;
use Maatwebsite\Excel\Concerns\WithHeadingRow;
class ContactImport implements ToModel, WithHeadingRow
{
public function model(array $row): Contact
{
$company = Company::where('name', $row['company'])->first();
return new Contact([
'first_name' => $row['first_name'],
'last_name' => $row['last_name'],
'email' => $row['email'],
'company_id' => $company?->id,
]);
}
}
This works when every company in the CSV already exists in the database. But what happens when it does not?
You have three options, and the first two are bad:
Option A: Skip the row. If "Globex Inc" does not exist, set company_id to null or skip the entire contact. This causes silent data loss. The user uploaded 500 contacts and only 430 made it. The 70 that referenced unknown companies vanished without explanation.
Option B: Always create. Use firstOrCreate instead of first. Now every unknown company name becomes a new company record. This sounds reasonable until the CSV has typos. "Acme Corp", "Acme Corp.", "ACME Corp", and "acme corp" all become separate companies. Your companies table fills with garbage that someone has to clean up manually.
Option C: Let the user decide. Show the user which company names matched and which did not. Let them choose per-relationship whether unmatched values should be created, skipped, or corrected before the import runs. This is the right answer, and it is significantly harder to build.
Option C requires that relationship resolution happens as a discrete step, not inline during row insertion. The import system needs to collect all unique relationship values from the file, attempt to match them against existing records, present the results to the user, and then execute based on their decision. This is why a multi-step import wizard exists -- a single-pass model() method cannot support this workflow.
#Complex case: MorphToMany and tags
Tags are the second most common relationship pattern in CSV imports, and they compound every problem from the BelongsTo case.
A typical tags column looks like: "laravel, php, backend". To import this, you need to:
- Split the string on commas
- Trim whitespace from each value
- Normalize for matching (lowercase, strip extra spaces)
- Look up each tag in the database
- Create any that do not exist (if the import is configured to allow that)
- Attach all of them to the contact via the pivot table
- Do this after the contact record exists, since you need its ID for the pivot
Here is a manual implementation:
public function afterModel(Contact $contact, array $row): void
{
if (empty($row['tags'])) {
return;
}
$tagNames = array_map('trim', explode(',', $row['tags']));
$tagIds = [];
foreach ($tagNames as $name) {
if ($name === '') {
continue;
}
$tag = Tag::firstOrCreate(
['name' => strtolower($name)],
);
$tagIds[] = $tag->id;
}
$contact->tags()->syncWithoutDetaching($tagIds);
}
This works for small files. It falls apart at scale for one reason: deduplication.
If your CSV has 500 rows and 400 of them include "laravel" in the tags column, the code above runs Tag::firstOrCreate(['name' => 'laravel']) four hundred times. The first call creates the tag. The next 399 each run a SELECT query, find the existing row, and return it. That is 399 unnecessary database queries.
The fix is an in-memory cache that tracks what you have already resolved within the current import:
private array $tagCache = [];
private function resolveTag(string $name): int
{
$cacheKey = "tag:" . strtolower(trim($name));
if (isset($this->tagCache[$cacheKey])) {
return $this->tagCache[$cacheKey];
}
$tag = Tag::firstOrCreate(
['name' => strtolower(trim($name))],
);
$this->tagCache[$cacheKey] = $tag->id;
return $tag->id;
}
The cache key uses a {linkKey}:{normalized_name} pattern. The linkKey prefix (here tag:) prevents collisions when you resolve multiple relationship types in the same import. The normalized name strips whitespace and lowercases so that "Laravel", " laravel ", and "LARAVEL" all hit the same cache entry.
This pattern is essential for any import that touches relationship tables. Without it, an import of 10,000 rows with a tags column generates tens of thousands of redundant queries. With it, you issue one query per unique tag value in the entire file.
#The match-or-create decision
The examples above hardcode the behavior: firstOrCreate always creates missing records, first never does. But in production, the right behavior depends on the data and the user's intent.
Consider three scenarios for a "Company" column during a contact import:
-
Importing into a mature CRM. Most companies already exist. You want to link contacts to existing companies and flag any that do not match. Creating new companies from CSV data would introduce unverified records.
-
Seeding a new application. Nothing exists yet. Every company name in the CSV should become a new company record.
-
Ongoing data sync. Some companies exist, some are new. You want to match existing ones and create the rest.
These map directly to three behaviors:
- MatchOnly -- Only look up existing records. If no match is found, leave the relationship empty or skip the row entirely. The imported data never creates side effects in related tables.
- MatchOrCreate -- Look up first. If a match exists, link to it. If not, create a new record and link to that. This is the most common behavior for user-facing imports.
- Create -- Skip the lookup entirely. Always create a new related record. Useful for one-time bulk seeding where you know nothing exists yet.
The key insight is that this decision belongs to the user, not the developer. A well-designed import system exposes it as a configuration choice per relationship, not a hardcoded behavior buried in an import class.
#Match resolution strategies
When you do perform a lookup, the next question is: what field do you match on?
Matching a company by name seems obvious until you realize that names are the least reliable identifier. "Acme Corp" and "Acme Corporation" are the same company. "ABC Inc." might appear twice in your database with different addresses. Names are ambiguous by nature.
Better identifiers exist, and they have a natural priority order:
| Match field | Priority | Why |
|---|---|---|
| Record ID | 100 | Deterministic. If the CSV includes the database ID, there is no ambiguity. |
| 90 | Nearly unique in practice. Best general-purpose identifier for people and organizations. | |
| Domain | 80 | Good for companies. Extract from email or use a dedicated column. |
| Phone | 70 | Useful but format-sensitive. "+1 (555) 123-4567" and "5551234567" must normalize to the same value. |
| Name | 10 | Last resort. Ambiguous, typo-prone, and format-inconsistent. |
In code, this priority system looks like:
public function matchableFields(): array
{
return [
MatchableField::id(), // priority: 100
MatchableField::email('email', MatchBehavior::MatchOrCreate), // priority: 90
MatchableField::domain('domain', MatchBehavior::MatchOrCreate), // priority: 80
MatchableField::phone('phone', MatchBehavior::MatchOrCreate), // priority: 70
MatchableField::name(), // priority: 10
];
}
The system tries each matcher in priority order. If the CSV row contains an ID column and the ID matches an existing record, that match wins regardless of whether the name or email also appears in the data. If there is no ID, it falls back to email, then domain, and so on.
Each MatchableField carries its own MatchBehavior, which means you can have different create-or-skip policies per field. Matching by ID is always MatchOnly -- if you provide an ID and it does not exist, something is wrong and you should not create a record with a potentially conflicting primary key. Matching by email might be MatchOrCreate because a new email legitimately represents a new contact. Matching by name is typically Create because names alone are too ambiguous for reliable lookups.
This priority-based approach lets the import system make the best decision it can with whatever data the CSV provides, without requiring the user to specify which column to match on every time.
#Multi-tenant complications
Everything above assumes a single-tenant application. In a multi-tenant system, every lookup must be scoped.
When you search for a company named "Acme Corp", you are not searching the entire companies table. You are searching for "Acme Corp" within the current tenant's companies. A different tenant might have their own "Acme Corp" that is a completely different organization.
This scoping must be airtight across the entire import pipeline:
- During mapping, when the system previews match results for relationship columns
- During validation, when it checks whether referenced records exist
- During execution, when it performs
firstOrCreateor links to existing records - In queue jobs, where the original HTTP request's tenant context no longer exists
The queue job point is where most implementations break. The user uploads a file in a web request scoped to Tenant A. The import job runs minutes later in a queue worker that has no tenant context. If the job does not explicitly restore the tenant scope before running relationship lookups, it queries across all tenants. At best, you get wrong matches. At worst, you leak data between tenants.
Solving this requires the tenant context to be serialized when the import is created and restored when each job runs. The import record itself carries the tenant identifier, and every job reads it before executing any queries.
#Why this is the hardest part of CSV imports
Flat field imports are stateless. Each row is independent. You validate it, insert it, move on. The row before and the row after do not matter.
Relationship imports are stateful. Row 1 creates "Acme Corp" as a new company. Rows 2 through 50 need to reference that same company, not create 49 duplicates. The deduplication cache from earlier in this post is one piece of that state. Match resolution results are another. The user's create-or-skip decisions per relationship type are yet another.
This state has to survive across:
- Rows within a chunk -- the deduplication cache must persist within a processing batch
- Chunks within a job -- if you process 500 rows per chunk, the cache from chunk 1 must be available in chunk 2
- Queue job restarts -- if a job fails and retries, previously created related records must not be duplicated
- Database transactions -- creating a company and linking a contact to it should be atomic; if the contact insert fails, the orphaned company is a problem
Then add validation. Before the import runs, you want to show the user which company names matched and which did not. That requires a separate resolution pass that queries the database for every unique relationship value in the file, batched and cached, and presents the results in a review step. This is fundamentally different from validating flat fields, which is just running Laravel's Validator on individual values.
And then add multi-tenancy. And MorphToMany. And the fact that a single row might have both a BelongsTo company and MorphToMany tags, each with their own match behavior and deduplication cache. The combinatorial complexity is why relationship handling accounts for more code than every other part of a CSV import system combined.
This is also the problem that motivated building Tapix -- see Why we're building Tapix for the full context.
#The declarative alternative
The manual approaches above work. They also require hundreds of lines of carefully coordinated code per importer, and every new relationship type means writing more of it.
The alternative is to declare the relationship configuration on the field definition and let the import system handle resolution, deduplication, and user-facing match behavior automatically:
ImportField::make('company')
->label('Company')
->guess(['company', 'company name', 'organization', 'org'])
->relationship(
name: 'company',
model: Company::class,
matchBy: ['name'],
behavior: MatchBehavior::MatchOrCreate,
),
Four lines. The field declares that it maps to a company relationship on the Company model, matches by the name column, and uses MatchOrCreate behavior. The import system takes it from there: it collects unique company names from the file, queries the database for matches, presents unmatched values to the user, creates new records if configured to, caches resolved IDs for deduplication across rows, and links everything during execution.
For BelongsTo relationships, the resolved ID is stored as a foreign key on the imported record. For MorphToMany relationships, records are attached via the pivot table after the main record is saved. The storage strategy is determined by the relationship type -- you declare it once and do not think about it again.
Combined with matchableFields() for controlling match priority:
public function matchableFields(): array
{
return [
MatchableField::id(),
MatchableField::email('email', MatchBehavior::MatchOrCreate),
];
}
This gives the import system everything it needs to resolve relationships correctly, across rows, across chunks, across queue jobs, and across tenants.
Relationship resolution is the part of CSV importing that separates toy implementations from production-ready systems. If your imports involve relationships, multi-tenancy, and the match-or-create decision, and you would rather not build the infrastructure yourself, take a look at tapix.dev to see how it handles these problems. For inline validation before any data touches the database, see Handling CSV validation errors before they hit your database.