
Tapix under the hood: how we built a 4-step import wizard
=========================================================

 Manch Minasyan ·  June 9, 2026  · 11 min read

 Most import tools treat the process as a single operation: upload file, wait, see results. Tapix treats it as a pipeline with five distinct stages, each persisted independently to the database. This post walks through the architecture -- the data model, the job design, the value objects -- and explains the reasoning behind each decision.

[\#](#the-five-stage-pipeline "Permalink")The five-stage pipeline
-----------------------------------------------------------------

Every import flows through five stages: Upload, Map, Validate, Review, Execute. Each stage transitions the `Import` model to a new status and persists all state to the database before moving forward.

```
enum ImportStatus: string
{
    case Uploading = 'uploading';
    case Mapping = 'mapping';
    case Reviewing = 'reviewing';
    case Previewing = 'previewing';
    case Importing = 'importing';
    case Completed = 'completed';
    case Failed = 'failed';
}

```

- **Upload** parses the file, extracts headers, and stores every row as an individual `ImportRow` record with the raw CSV data in a JSON column.
- **Map** presents the user with a column-matching UI -- Tapix auto-detects common headers like "fname", "given name", and "first" as aliases for `first_name`, then lets the user correct anything it got wrong.
- **Validate** dispatches a batch of jobs, one per mapped column, to validate every distinct value -- [Handling CSV validation errors before they hit your database](/blog/handling-csv-validation-errors) covers the user-facing side of what those jobs produce.
- **Review** shows users which rows will create new records, which will update existing ones, and which will be skipped, based on match resolution.
- **Execute** processes rows in chunked batches through a queue, creating or updating the target model.

The critical design choice: each stage writes its results back to the database, not to session storage or in-memory state. On mount, the wizard restores itself from the database rather than from the session.

[\#](#why-database-persisted-state "Permalink")Why database-persisted state
---------------------------------------------------------------------------

Session-based wizards break in predictable ways. The user refreshes the page and loses their column mappings. The queue worker restarts and the job has no way to recover partial progress. The user opens the import in a second tab and state conflicts silently corrupt the data.

Tapix avoids all of this by treating the `Import` model as the single source of truth. The Livewire wizard component restores its entire state from the database on every mount:

```
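final class ImportWizard extends Component implements HasActions, HasForms
{
    // Excerpt: use statements, the other public properties (such as $importerClass
    // and $rowCount), and the helpers sanitizeReturnUrl(), findCurrentImport(),
    // and stepFromStatus() are omitted for brevity.
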
    #[Url(as: 'import')]
    public ?string $storeId = null;

    public function mount(string $importerClass, ?string $returnUrl = null): void
    {
        $this->importerClass = $importerClass;
        $this->returnUrl = $this->sanitizeReturnUrl($returnUrl);
        $this->restoreFromStore();
    }

    private function restoreFromStore(): void
    {
        $import = $this->findCurrentImport();

        if (! $import instanceof Import) {
            $this->storeId = null;
            return;
        }

        $this->rowCount = $import->total_rows;
        $this->columnCount = count($import->headers ?? []);
        $this->currentStep = $this->stepFromStatus($import->status);
        $this->importStarted = in_array($import->status, [
            ImportStatus::Importing,
            ImportStatus::Completed,
            ImportStatus::Failed,
        ], true);
    }
}

```

The import ID is stored in the URL as a query parameter (`?import=01KCCFM...`). This means users can bookmark an in-progress import, share the link with a colleague, or close the browser and come back later. The wizard picks up exactly where they left off because every piece of state -- column mappings, validation errors, corrections, skip decisions -- lives in the database.

This also makes the queue side reliable. When `ExecuteImportJob` runs again after a worker restart, it queries `ImportRow` for rows where `processed = false` and resumes from there. No state is lost, and rows already marked as processed are not processed a second time.

[\#](#the-data-model "Permalink")The data model
-----------------------------------------------

Two models carry all the import state: `Import` and `ImportRow`.

`Import` represents the overall session. It stores the importer class, the user who initiated it, the CSV headers, and the column mappings. It tracks aggregate counts -- total rows, created, updated, skipped, failed -- and transitions through status values as the pipeline progresses. It uses ULID primary keys and is scoped by tenant.

`ImportRow` represents a single row from the CSV. Each row stores several JSON columns:

- `raw_data` -- the original values from the CSV, keyed by header name
- `validation` -- error messages keyed by column, written during the Validate stage
- `corrections` -- user-edited values that override the raw data
- `skipped` -- columns the user explicitly chose to skip
- `relationships` -- resolved relationship matches for entity link columns

This JSON-column design means the entire lifecycle of a row -- from raw import through validation, user correction, and final execution -- is captured in a single record. The `getFinalValue` method on `ImportRow` resolves the precedence:

```
public function getFinalValue(string $column): mixed
{
    if ($this->isValueSkipped($column)) {
        return null;
    }

    if ($this->hasValidationError($column)) {
        return null;
    }

    if ($this->corrections?->has($column)) {
        return $this->corrections->get($column);
    }

    return $this->raw_data->get($column);
}

```

Skipped values return null. Unresolved validation errors return null. Corrections override raw data. Otherwise, the original CSV value is used. This priority chain runs at execution time, so user edits made during the Review step automatically take effect without any data migration.

Both models use configurable table name prefixes (`config('tapix.table_prefix')`) so they never collide with application tables. The `ImportRow` model disables timestamps entirely -- with potentially hundreds of thousands of rows per import, the write overhead is not worth it.

Querying JSON columns across database drivers is notoriously inconsistent: SQLite, MySQL, and PostgreSQL each use different syntax for extracting and testing JSON values. Tapix handles this through a `JsonQuery` helper that generates the correct SQL for whichever driver is active. Every scope on `ImportRow` -- `withErrors`, `withCorrections`, `toCreate`, `toUpdate`, `forFilter` -- uses `JsonQuery` internally. As a result, the same importer code behaves identically even if your local environment runs SQLite and production runs PostgreSQL.
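Here is a minimal sketch of the kind of per-driver SQL such a helper has to emit. `jsonExtractSql` is a hypothetical function, not Tapix's `JsonQuery` API. It only covers extracting a top-level key as text. It also assumes column and key names are plain identifiers, so they can be inlined safely. In a Laravel app, the driver name would typically come from `DB::connection()->getDriverName()`.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: SQL expression that extracts a top-level JSON key as text.
 * Not Tapix's JsonQuery API; illustration only.
 */
function jsonExtractSql(string $driver, string $column, string $key): string
{
    // Only plain identifiers are allowed, because they are inlined into SQL.
    $identifier = '/^[A-Za-z_][A-Za-z0-9_]*$/';
    if (preg_match($identifier, $column) !== 1 || preg_match($identifier, $key) !== 1) {
        throw new InvalidArgumentException('Unsupported column or key name.');
    }

    return match ($driver) {
        // SQLite returns JSON strings as plain text already.
        'sqlite' => "json_extract({$column}, '$.{$key}')",
        // MySQL's json_extract returns a JSON value, so unquote it.
        'mysql' => "json_unquote(json_extract({$column}, '$.{$key}'))",
        // PostgreSQL's ->> operator returns text directly.
        'pgsql' => "({$column}->>'{$key}')",
        default => throw new InvalidArgumentException("Unsupported driver: {$driver}"),
    };
}
```

An expression like this could then be combined into a raw where clause, for example `jsonExtractSql($driver, 'validation', 'email') . ' IS NOT NULL'` to find rows with an email error.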

[\#](#job-architecture "Permalink")Job architecture
---------------------------------------------------

The job layer is where most import tools cut corners. They run validation synchronously during the request, or they process all rows in a single giant job, or they skip match resolution entirely and just create everything. Tapix uses three specialized jobs, each designed for a specific stage of the pipeline.

### [\#](#validatecolumnjob-batched-per-column "Permalink")ValidateColumnJob: batched per column

Validation is dispatched as a batch of jobs, one per mapped column. This is the key insight: columns are independent. Validating the `email` column does not depend on validating the `first_name` column. By splitting validation into per-column jobs, Tapix lets them run in parallel whenever more than one queue worker is available.

Each `ValidateColumnJob` receives a `ColumnData` value object that describes the mapping. It fetches only the distinct uncorrected values for that column (not every row), validates each unique value once, then writes errors back to every `ImportRow` that contains that value. For a 50,000-row CSV with 200 unique email addresses, the job validates 200 values, not 50,000.
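The distinct-value idea can be sketched in plain PHP. This is an illustration under assumptions, not the job's real code. `validateDistinctValues` is hypothetical, and rows arrive as a `row id => value` map for a single column. The validator is any callable that returns an error message or `null`. Each unique value is checked once, and the result is fanned back out to every row that holds it.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch: validate one column by checking each distinct value once.
 *
 * @param array<int, string> $rows  row id => raw value for a single column
 * @param callable(string): ?string $validate  returns an error message or null
 * @return array<int, string>  row id => error, for failing rows only
 */
function validateDistinctValues(array $rows, callable $validate): array
{
    // Group row ids by value so each unique value is validated exactly once.
    $rowIdsByValue = [];
    foreach ($rows as $rowId => $value) {
        $rowIdsByValue[$value][] = $rowId;
    }

    $errors = [];
    foreach ($rowIdsByValue as $value => $rowIds) {
        // PHP turns numeric-string keys into ints, so cast back before validating.
        $error = $validate((string) $value);
        if ($error === null) {
            continue;
        }
        // Write the single result back to every row holding this value.
        foreach ($rowIds as $rowId) {
            $errors[$rowId] = $error;
        }
    }

    return $errors;
}

// Usage: four rows but only two distinct values, so the validator runs twice.
$calls = 0;
$validator = function (string $email) use (&$calls): ?string {
    $calls++;
    return filter_var($email, FILTER_VALIDATE_EMAIL) === false ? 'Invalid email.' : null;
};

$errors = validateDistinctValues(
    [1 => 'jane@example.com', 2 => 'not-an-email', 3 => 'jane@example.com', 4 => 'not-an-email'],
    $validator,
);
```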

The job also handles entity link validation -- when a column maps to a relationship rather than a direct field. It batch-resolves all unique values against the target model in a single query and writes back which values matched, which need to be created, and which could not be found.
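Entity link resolution applies the same distinct-value approach to lookups. The sketch below is hypothetical in two ways. First, `$existing` stands in for the result of a single lookup query (for example a `whereIn` over the unique values), keyed by a lowercased match value. Second, `$allowCreate` is an assumed per-link setting that decides whether an unmatched value is marked for creation or reported as not found.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch of batch entity-link resolution for one column.
 *
 * @param list<string> $values          raw values from the CSV column
 * @param array<string, int> $existing  lowercased match value => record id (one query's result)
 * @param bool $allowCreate             assumed setting: may unmatched values be created?
 */
function resolveEntityLinks(array $values, array $existing, bool $allowCreate): array
{
    $result = ['matched' => [], 'create' => [], 'not_found' => []];

    // Each distinct value is resolved once, no matter how many rows reference it.
    foreach (array_unique($values) as $value) {
        $key = strtolower(trim($value));

        if (isset($existing[$key])) {
            $result['matched'][$value] = $existing[$key];
        } elseif ($allowCreate) {
            $result['create'][] = $value;
        } else {
            $result['not_found'][] = $value;
        }
    }

    return $result;
}
```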

### [\#](#resolvematchesjob-create-update-or-skip "Permalink")ResolveMatchesJob: create, update, or skip

After validation, `ResolveMatchesJob` determines what will happen to each row during execution. It resolves matchable fields -- email, phone, domain, record ID -- against existing records in the target table and writes a `match_action` to each `ImportRow`: Create, Update, or Skip.

```
enum RowMatchAction: string
{
    case Create = 'create';
    case Update = 'update';
    case Skip = 'skip';
}

```

This runs as a separate job because match resolution can involve expensive database queries against large tables with tenant scoping. By running it before the Review step, Tapix can show users an exact breakdown -- "412 new contacts, 88 updates, 3 skipped" -- before they commit to running the import.
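Here is a rough sketch of the per-row decision and the summary count behind that breakdown. It assumes email is the only matchable field, and that existing records have already been fetched into a `lowercased email => id` map. The Skip rule used here (a row with no usable email) is an assumption for illustration, since the actual rules are not described in this post. `resolveMatches` and `summarizeActions` are hypothetical names, and the enum is repeated so the snippet runs on its own.

```php
<?php

declare(strict_types=1);

enum RowMatchAction: string
{
    case Create = 'create';
    case Update = 'update';
    case Skip = 'skip';
}

/**
 * Hypothetical sketch: pick an action per row from a pre-fetched match map.
 *
 * @param array<int, array{email: ?string}> $rows  row number => row data
 * @param array<string, int> $existingByEmail      lowercased email => existing record id
 * @return array<int, RowMatchAction>
 */
function resolveMatches(array $rows, array $existingByEmail): array
{
    $actions = [];
    foreach ($rows as $rowNumber => $row) {
        $email = strtolower(trim((string) ($row['email'] ?? '')));
        $actions[$rowNumber] = match (true) {
            $email === '' => RowMatchAction::Skip,  // assumed rule: nothing to match on
            isset($existingByEmail[$email]) => RowMatchAction::Update,
            default => RowMatchAction::Create,
        };
    }

    return $actions;
}

/** Count actions for an "N new, M updates, K skipped" summary. */
function summarizeActions(array $actions): array
{
    $counts = ['create' => 0, 'update' => 0, 'skip' => 0];
    foreach ($actions as $action) {
        $counts[$action->value]++;
    }

    return $counts;
}
```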

### [\#](#executeimportjob-chunked-unique-batchable "Permalink")ExecuteImportJob: chunked, unique, batchable

The execution job implements `ShouldBeUnique` keyed by import ID, preventing duplicate dispatches. It processes rows using `chunkById` with a configurable chunk size (default 500), persisting progress after every chunk:

```
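ImportRow::where('import_id', $this->importId)
    ->where('processed', false) // only rows not yet handled, so a rerun resumes
    ->orderBy('row_number')
    ->chunkById($chunkSize, function (Collection $rows) use (...): void {
        // One query for the existing records that this chunk's Update rows need
        $existingRecords = $this->preloadExistingRecords($rows, $importer, $context);

        foreach ($rows as $row) {
            $this->processRow($row, ...); // create or update inside a transaction
            $this->flushProcessedRows();
        }

        // Bulk-insert this chunk's failures, then persist aggregate counts
        $this->flushFailedRows($import, $context);
        $this->persistResults($import, $results);

        // Progress event consumed by the real-time UI
        ImportRowProcessed::dispatch($import, $processedCount, $import->total_rows);
    });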

```

Several things happen per chunk. First, existing records for Update rows are preloaded in a single query to avoid N+1. Each row is processed inside a database transaction. After the chunk, failed rows are bulk-inserted into the `FailedImportRow` table, aggregate counts are persisted to the `Import` model, and an `ImportRowProcessed` event fires with progress data for the real-time UI. For the user-facing treatment of failed rows -- inline correction, skip tracking, and the full error lifecycle -- see [CSV import error recovery: from silent failures to user-friendly correction](/blog/csv-import-error-recovery).
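Resuming after a crash depends on two things: selecting only unprocessed rows, and marking each row as soon as it completes. Here is an in-memory sketch of that loop under simplified assumptions. Rows are plain arrays with a `processed` flag. `runChunked`, `$process`, and `$onProgress` are hypothetical stand-ins for the job, row processing, and the progress event.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical in-memory sketch of resumable chunked execution.
 *
 * @param array<int, array{processed: bool}> $rows  rows keyed by id, each with a processed flag
 * @param callable(array): void $process            stands in for per-row create/update
 * @param callable(int, int): void $onProgress      stands in for the per-chunk progress event
 */
function runChunked(array &$rows, int $chunkSize, callable $process, callable $onProgress): void
{
    // Select only unprocessed rows, like querying `processed = false`.
    $pending = array_keys(array_filter($rows, fn (array $row): bool => ! $row['processed']));
    $total = count($rows);

    foreach (array_chunk($pending, $chunkSize) as $chunk) {
        foreach ($chunk as $id) {
            $process($rows[$id]);
            // Mark each row as soon as it succeeds, so a rerun starts at the failed row.
            $rows[$id]['processed'] = true;
        }

        // Report progress once per chunk.
        $done = count(array_filter($rows, fn (array $row): bool => $row['processed']));
        $onProgress($done, $total);
    }
}
```

Suppose `$process` throws partway through. Calling `runChunked` again on the same `$rows` then skips everything already marked and picks up at the row that failed.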

The job also maintains an in-memory deduplication cache for intra-import matching. If row 10 creates "Acme Corp" and row 500 also references "Acme Corp", the second row is automatically promoted from Create to Update, preventing duplicates within the same import. [Intra-import deduplication](/blog/intra-import-deduplication) explains the full implementation.
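Here is a minimal sketch of such a cache. It assumes the cache is consulted only for rows already resolved as Create, and that rows are matched on a normalized name. `IntraImportCache` and the `$createRecord` callback are hypothetical. The real implementation is covered in the linked post.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical per-import dedup cache: normalized match value => id created earlier in this run.
 */
final class IntraImportCache
{
    /** @var array<string, int> */
    private array $createdIds = [];

    /** @param Closure(string): int $createRecord stands in for the real insert */
    public function __construct(private readonly Closure $createRecord)
    {
    }

    /** Returns ['create', id] the first time a value is seen, ['update', id] afterwards. */
    public function resolve(string $matchValue): array
    {
        $key = strtolower(trim($matchValue));

        if (isset($this->createdIds[$key])) {
            // Already created earlier in this import: promote Create to Update.
            return ['update', $this->createdIds[$key]];
        }

        $id = ($this->createRecord)($matchValue);
        $this->createdIds[$key] = $id;

        return ['create', $id];
    }
}
```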

All three jobs use the `TenantAware` trait, which preserves tenant context across queue boundaries -- [Multi-tenant CSV imports in Laravel](/blog/multi-tenant-csv-imports-laravel) covers this in full. Timeout, retry count, and backoff intervals are read from `config/tapix.php` rather than hardcoded.

[\#](#immutable-value-objects "Permalink")Immutable value objects
-----------------------------------------------------------------

Three value objects carry data between the pipeline stages: `ImportField`, `ColumnData`, and `EntityLink`. All three are immutable -- builder methods return new instances via `cloneWith`, never mutating the original.

`ImportField` defines a single importable field with its key, label, type, validation rules, and optional relationship configuration. The static `make` factory provides a fluent builder:

```
ImportField::make('email')
    ->label('Email Address')
    ->type(FieldType::Email)
    ->required()
    ->rules(['max:255']) // FieldType::Email already applies the email rule; ->rules() adds extra rules on top
    ->guess(['email', 'e-mail', 'email_address', 'mail'])
    ->example('jane@example.com');

```

Every builder method calls `cloneWith` internally, which constructs a new `ImportField` with the overridden property and copies everything else. This prevents accidental mutation when the same field definition is used across multiple parts of the pipeline.
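The pattern is easy to reproduce. The sketch below uses a hypothetical, cut-down `Field` class (not Tapix's `ImportField`) with PHP 8.1 readonly promoted properties. Its `cloneWith` copies every property with `get_object_vars`. It then passes them back to the constructor as named arguments, overriding only what changed.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical cut-down field definition illustrating the cloneWith pattern.
 */
final class Field
{
    private function __construct(
        public readonly string $key,
        public readonly string $label,
        public readonly bool $required = false,
        public readonly array $rules = [],
    ) {
    }

    public static function make(string $key): self
    {
        return new self(key: $key, label: ucfirst(str_replace('_', ' ', $key)));
    }

    public function label(string $label): self
    {
        return $this->cloneWith(['label' => $label]);
    }

    public function required(bool $required = true): self
    {
        return $this->cloneWith(['required' => $required]);
    }

    public function rules(array $rules): self
    {
        // Append to the existing rules rather than replacing them.
        return $this->cloneWith(['rules' => [...$this->rules, ...$rules]]);
    }

    /** Build a new instance: copy every property, override only the given ones. */
    private function cloneWith(array $overrides): self
    {
        return new self(...array_merge(get_object_vars($this), $overrides));
    }
}
```

Because every property is `readonly`, an accidental write to an existing instance fails with an `Error` instead of silently changing shared state.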

`ColumnData` represents the mapping between a CSV column and a target field or entity link. It has two factory methods -- `toField` for direct field mappings and `toEntityLink` for relationship mappings -- and carries optional format hints for date and number parsing. The `castValue` method handles type conversion at execution time, routing numeric values through the `NumberFormat` parser to strip currency symbols and normalize decimal separators.
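Number normalization driven by a format hint might look roughly like this. `parseNumber` is a hypothetical function, not the `NumberFormat` parser itself. It assumes the format hint is just the decimal separator. The other separator is treated as a thousands separator and removed, along with currency symbols and spaces.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical number normalizer: "$1,234.50" or "€ 1.234,50" -> 1234.5.
 * Returns null when nothing numeric remains.
 */
function parseNumber(string $raw, string $decimalSeparator = '.'): ?float
{
    // Drop everything except digits, the minus sign, and the two separators.
    $clean = (string) preg_replace('/[^0-9,.\-]/', '', $raw);

    // The separator that is not the decimal one is assumed to group thousands.
    $thousands = $decimalSeparator === ',' ? '.' : ',';
    $clean = str_replace($thousands, '', $clean);

    // Normalize the decimal separator to a dot before converting.
    $clean = str_replace($decimalSeparator, '.', $clean);

    return is_numeric($clean) ? (float) $clean : null;
}
```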

`EntityLink` describes a relationship target: the model class, how matches are resolved, how the link is persisted (foreign key, morph-to-many, or custom field value), and what auto-mapping aliases to use for column header detection. It also follows the `cloneWith` pattern, with named constructors like `belongsTo` and `morphToMany` that set sensible defaults for each relationship type.

The immutability guarantee matters because these objects flow through jobs, Livewire components, and event listeners. Any of those could hold a reference to the same instance. Mutation in one place would silently corrupt data in another. The `cloneWith` pattern eliminates that entire category of bugs.

[\#](#event-system "Permalink")Event system
-------------------------------------------

Tapix dispatches four events during the execution lifecycle:

- `ImportStarted` -- fired once at the beginning of `ExecuteImportJob::handle()`, before any rows are processed
- `ImportRowProcessed` -- fired after each chunk with the current progress count and total
- `ImportCompleted` -- fired after all rows are processed and the status transitions to Completed
- `ImportFailed` -- fired when the job catches an exception or exhausts its retry attempts

Each event carries the `Import` model instance. `ImportRowProcessed` adds `processedCount` and `totalCount` integers for progress calculation.
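A listener that drives a progress bar only needs those two integers. Here is a small hypothetical helper for the calculation. It treats an empty import as complete (an assumption) and clamps the result to 100.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: turn ImportRowProcessed's counts into a whole-number percentage.
 */
function progressPercent(int $processedCount, int $totalCount): int
{
    if ($totalCount <= 0) {
        return 100; // assumption: an empty import has nothing left to do
    }

    // Clamp so a late or duplicate event can never report more than 100%.
    return (int) floor(min($processedCount, $totalCount) * 100 / $totalCount);
}
```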

These events are the extension points for everything outside the core pipeline. Send a Slack notification when an import finishes. Update a real-time progress bar via broadcasting. Log import metrics to an analytics service. Trigger a follow-up job that enriches newly created records. None of that requires modifying the import pipeline itself -- just listen to the events.

The importer class also exposes lifecycle hooks (`beforeImport`, `afterImport`, `beforeSave`, `afterSave`, `prepareForSave`) for logic that needs to run inside the pipeline. Events are for external consumers; hooks are for importer-specific behavior.

[\#](#what-this-architecture-gives-you "Permalink")What this architecture gives you
-----------------------------------------------------------------------------------

The result of these decisions is an import system that handles the cases other tools ignore. A 100,000-row file processes in the background without blocking the web server. A user who closes their laptop mid-import picks up where they left off. A queue worker that crashes and restarts skips already-processed rows automatically. The number of validation checks scales with the unique values in each column, not with total rows. Match resolution tells users exactly what will happen before anything is written.

These are not features we added after launch. They are consequences of the architecture. Database-persisted state means crash recovery is free. Per-column validation jobs mean parallelism is free. Chunked execution with progress events means real-time feedback is free. The hard problems were solved in the foundation, not bolted on afterward.

If you want to understand why we built Tapix in the first place, read [Why we're building Tapix](/blog/why-we-are-building-tapix). For a deeper look at queue performance, see [Queue-powered imports: handling 100k rows without breaking a sweat](/blog/queue-powered-imports-100k-rows). The hook system that lets you customize behavior at every stage will be covered in a future post on lifecycle hooks.

Ready to stop rebuilding CSV imports from scratch? [See the pricing and get started](/#pricing).

