Back to blog

Queue-powered imports: processing 100K rows in Laravel

Manch Minasyan · · 12 min read

If you have hit Allowed memory size of 134217728 bytes exhausted or Maximum execution time of 30 seconds exceeded while importing a CSV file in Laravel, you have reached the ceiling of synchronous processing. These are not edge cases -- they are the inevitable result of trying to process tens of thousands of rows inside a single HTTP request.

A CSV import that works for 500 rows and a CSV import that works for 100,000 rows are not the same system. The difference is not optimization -- it is a fundamentally different architecture where processing happens asynchronously, state persists between steps, and failures are recoverable. For the context on why these problems matter in practice, see Why we're building Tapix.

This post covers the queue patterns that make large-scale imports reliable in Laravel. The code examples reflect patterns Tapix uses internally, so you can apply them to any import system.

#Why direct processing fails at scale

A synchronous CSV import runs inside a single HTTP request. The controller reads the file, loops through rows, and inserts them into the database. For 500 rows, this takes a few seconds and works fine. For 100,000 rows, three things conspire to kill it.

Memory exhaustion. Every Eloquent model created inside a loop stays in memory for the duration of the request. At 100,000 rows, your PHP worker accumulates tens of thousands of model instances. A process that starts at 32MB of memory quietly climbs past 256MB and the runtime kills it with PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 262144 bytes).

Timeout. PHP-FPM defaults to 30 seconds. Nginx adds its own at 60. A 100K-row import might take 5 to 15 minutes depending on validation complexity and relationship resolution. The result: Fatal error: Maximum execution time of 30 seconds exceeded. No web server timeout is designed for that.

Connection drops. The browser holds an HTTP connection open for the entire duration. Lose Wi-Fi for three seconds or let the laptop sleep -- the connection drops, the request dies, and partial data is left in the database with no record of where it stopped.

The solution: move processing off the web request and onto a queue worker.

#Laravel queue fundamentals for imports

Laravel's queue system dispatches jobs to a worker process that runs independently of HTTP requests. The basic dispatch pattern looks like this:

use App\Jobs\ProcessImportJob;

// In your controller or Livewire component
ProcessImportJob::dispatch($import->id);

return redirect()->route('imports.show', $import)
    ->with('status', 'Import started. You can close this tab.');

The user gets an immediate response. The job runs in the background. If the worker crashes, the job gets retried. If the server reboots, the job is still in the queue.

#Driver setup

For production imports, use a persistent queue driver. The database driver is the simplest and works well for most applications. Redis is faster for high-throughput scenarios, but the database driver has an advantage: jobs survive Redis restarts and you can inspect them with SQL queries when debugging.

php artisan queue:work --queue=imports --timeout=600 --tries=3

The --timeout flag sets the maximum runtime per job -- it needs to exceed the time for one chunk, not the entire file. The --tries flag controls retry attempts before a job moves to the failed jobs table.

#Dedicated queue

Run imports on a dedicated queue so a 10-minute import does not block password reset emails. Configure the queue name in your import job:

public function __construct(
    private readonly string $importId,
) {
    $this->onQueue(config('tapix.queue', 'imports'));
}

Then run a separate worker for that queue. In production, Supervisor or Laravel Horizon manages this. Set numprocs=2 in your Supervisor config if you need two imports to run concurrently.

#Chunked batch processing

The key to processing 100K rows without exhausting memory is chunking: load a small batch of rows from the database, process them, flush the results, and move to the next batch. The database cursor resets between chunks, so memory stays flat regardless of total row count.

#Why chunkById over chunk

Laravel provides two chunking methods: chunk() and chunkById(). For import processing, always use chunkById().

chunk() uses LIMIT/OFFSET pagination under the hood. When you process chunk 1 (rows 1-500) and mark them as processed, chunk 2 still starts at OFFSET 500 -- but the result set has shifted because rows changed. You silently skip rows or process them twice.

chunkById() uses WHERE id > last_id pagination. It always picks up exactly where the last chunk ended, regardless of what changed in the table. This is correct for any operation where rows are modified during processing.

Here is the core pattern from Tapix's ExecuteImportJob:

$chunkSize = (int) config('tapix.chunk_size', 500);
$processedCount = 0;

ImportRow::where('import_id', $this->importId)
    ->where('processed', false)
    ->orderBy('row_number')
    ->chunkById($chunkSize, function (Collection $rows) use (
        $importer, $fieldMappings, &$results, $import, &$processedCount
    ): void {
        $existingRecords = $this->preloadExistingRecords($rows, $importer, $context);

        foreach ($rows as $row) {
            $this->processRow(
                $row, $importer, $fieldMappings, $allowedKeys,
                $matchField, $matchSourceColumn, $context,
                $tenantColumn, $results, $existingRecords
            );
            $this->flushProcessedRows();
            $processedCount++;
        }

        $this->flushFailedRows($import, $context);
        $this->persistResults($import, $results);

        ImportRowProcessed::dispatch($import, $processedCount, $import->total_rows);
    });

Several things happen in this pattern that are worth calling out.

Preloading existing records. For update operations, the job batch-loads all target records for the chunk in a single query (whereIn on the matched IDs). This avoids N+1 queries -- 500 rows that need updates run 1 SELECT instead of 500.

Flushing after each chunk. Processed row markers, failed row records, and result counters are written to the database at the end of every chunk, not at the end of the job. If the worker crashes mid-import, you lose at most one chunk of work. The next retry picks up from where the last successful flush ended because the query filters on where('processed', false).

Chunk size of 500. This is a balance between memory pressure and database round trips. At 500 rows per chunk, a 100K-row import makes 200 database queries for the chunk boundaries plus the per-row insert/update queries. Too small (50) and the overhead of chunk boundary queries dominates. Too large (5,000) and you risk memory issues if rows have large raw_data JSON payloads. The 500 default works well across a wide range of file sizes and row widths.

#Database transactions per row

Each row is processed inside a DB::transaction() call. This means a failure on row 347 rolls back only row 347 -- it does not affect the 346 rows that already committed. The failed row gets recorded in the FailedImportRow table with the error message and original data, and processing continues.

This is a deliberate tradeoff. Wrapping the entire chunk in a single transaction would be faster (one COMMIT per 500 rows instead of 500), but a single bad row would roll back the whole chunk. For user-facing imports, row-level atomicity is almost always the right choice. Users expect partial success -- "9,847 imported, 153 failed" is a useful result. "0 imported because row 347 had a bad email" is not.

#Preventing duplicate dispatches

Import jobs should be idempotent, but they should also not run concurrently. Two workers processing the same import simultaneously create duplicate records, double-count results, and produce a corrupted final state.

Laravel's ShouldBeUnique interface solves this. When a job implements ShouldBeUnique, the queue system acquires a lock before processing. If another instance of the same job is already running, the duplicate is silently dropped.

use Illuminate\Contracts\Queue\ShouldBeUnique;
use Illuminate\Contracts\Queue\ShouldQueue;

final class ExecuteImportJob implements ShouldBeUnique, ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public function uniqueId(): string
    {
        return $this->importId;
    }

    // ...
}

The uniqueId() method returns the import's ID, so uniqueness is scoped per import. Two different imports can run concurrently. The same import cannot.

This matters because imports can be dispatched from multiple entry points: a user clicking "Run import" in the UI, a webhook triggering a re-run, a retry from the failed jobs table, or a Supervisor restart that re-queues the job. Without ShouldBeUnique, any of these can create a race condition.

The lock releases when the job finishes. If a job gets stuck (worker dies without releasing), the cache-based lock has a TTL that eventually expires. Set it to exceed your job timeout:

public int $uniqueFor = 900; // 15 minutes, exceeds the 300s job timeout

#Live progress tracking

Users need to know their import is making progress, especially for large files that take several minutes. The pattern is straightforward: dispatch an event after each chunk with the current count and total.

Tapix fires ImportRowProcessed at the end of every chunk:

use Tapix\Core\Events\ImportRowProcessed;

// Inside the chunkById callback:
ImportRowProcessed::dispatch($import, $processedCount, $import->total_rows);

The event carries three values: the import model (for identification), the number of rows processed so far, and the total row count. The frontend can listen to this via Laravel's broadcasting system and update a progress bar in real time.

A simple broadcast listener on the frontend:

Echo.private(`imports.${importId}`)
    .listen('ImportRowProcessed', (event) => {
        const percent = Math.round((event.processedCount / event.totalCount) * 100);
        progressBar.style.width = `${percent}%`;
        statusText.textContent = `${event.processedCount} of ${event.totalCount} rows`;
    });

For a 100K-row import with a chunk size of 500, the user sees 200 progress updates -- roughly one every 2-3 seconds. Frequent enough to feel responsive, infrequent enough to not flood the websocket connection.

When processing finishes, ImportCompleted fires and triggers a database notification. The user can close the tab, work on something else, and come back when the notification appears.

#Status transitions

An import moves through a state machine: Pending, Validating, Reviewing, Processing, Completed, or Failed. Each transition fires an event, so you can hook into any stage for logging, analytics, or external webhooks.

#Failed row handling

No import of 100K rows will have zero failures. Some rows will have constraint violations, unexpected data types, or relationship references that cannot be resolved. The system needs to handle these gracefully -- log the failure, continue processing, and give the user a clear report at the end.

#The FailedImportRow model

When a row throws an exception during processing, the job catches it, increments the failure counter, and stores the row data:

try {
    DB::transaction(function () use ($row, $importer, ...) {
        // process row
    });
    $this->markProcessed($row);
} catch (\Throwable $e) {
    $results['failed']++;
    $this->recordFailedRow($row->row_number, $row->raw_data->all(), $e);
    report($e);
}

Failed rows are buffered in memory and flushed to the FailedImportRow table at the end of each chunk. This is a write optimization -- inserting 500 individual rows is slower than one bulk insert of however many failed in that chunk.

The FailedImportRow model stores the original row data as JSON and the error message as a string -- everything needed for debugging. The admin panel can display these in a filterable table, and users can export them for correction and re-import.

#Auto-pruning

Failed row records are not kept forever. The FailedImportRow model uses Laravel's MassPrunable trait with a configurable retention period:

use Illuminate\Database\Eloquent\MassPrunable;

class FailedImportRow extends Model
{
    use MassPrunable;

    public function prunable(): Builder
    {
        $days = config('tapix.cleanup_after_days', 30);

        return self::query()->where('created_at', '<=', now()->subDays($days));
    }
}

Schedule php artisan model:prune --model=FailedImportRow daily and old records clean themselves up. The default is 30 days -- long enough for users to investigate failures, short enough that the table does not grow unbounded.

#Retry configuration

The queue job itself has retry behavior for infrastructure-level failures (database timeouts, connection resets, worker crashes). This is different from row-level error handling. A row that fails because of bad data will fail on every retry. A job that fails because the database connection dropped will succeed on the next attempt.

The default configuration gives three attempts with exponential backoff:

public int $timeout = 300;   // 5 minutes per attempt
public int $tries = 3;       // 3 total attempts
public array $backoff = [10, 30]; // 10s after first failure, 30s after second

The backoff intervals are short because import failures are typically transient (network blips, brief database locks). If a job fails all three attempts, it moves to the failed_jobs table where you can inspect it with php artisan queue:failed and retry manually with php artisan queue:retry.

One critical detail: your queue connection's retry_after setting must exceed the job timeout. If retry_after is 90 seconds and the job takes 300 seconds to process a chunk, the queue system thinks the job is stuck and dispatches a duplicate. Set retry_after to at least job_timeout + 30:

// config/queue.php
'database' => [
    'driver' => 'database',
    'retry_after' => 330, // job_timeout (300) + 30 second buffer
],

#Putting it together

The full job architecture for a 100K-row import looks like this:

  1. The user uploads a file and maps columns. Rows are parsed and stored as ImportRow records with their raw data in a JSON column.
  2. Validation runs as a batch of parallel ValidateColumnJob instances -- one per mapped column. Each job validates all unique values for its column and writes errors back to the row records. For the user-facing side of this pipeline, see Handling CSV validation errors before they hit your database.
  3. The user reviews and corrects errors in the UI. Corrections are stored as a separate JSON column on the row, preserving the original data.
  4. ResolveMatchesJob determines which rows are creates and which are updates, based on matchable fields. See Importing relational data from CSV files in Laravel for a full treatment of the match resolution system.
  5. ExecuteImportJob processes all rows in chunks of 500. Each chunk preloads existing records, processes rows in individual transactions, flushes results, and fires progress events.
  6. On completion, ImportCompleted fires and the user gets a notification with the final counts.

Every stage persists state to the database. The user can close the browser at step 2, come back later, and resume at step 3. The queue worker can crash during step 5, restart, and pick up from the last flushed chunk.

#Configuration reference

All queue-related settings live in one config file:

// config/tapix.php
return [
    'chunk_size'  => 500,       // rows per chunk
    'queue'       => 'imports', // dedicated queue name
    'job_timeout' => 300,       // seconds per job attempt
    'job_tries'   => 3,         // total retry attempts
    'job_backoff' => [10, 30],  // seconds between retries

    'cleanup_after_days' => 30, // auto-prune failed rows
];

Override any of these per environment. A local dev machine might use a chunk size of 50 and a timeout of 60. A production server with 8GB of RAM and a fast database might push the chunk size to 1,000.

#What comes next

This post covered the queue architecture for processing large imports. For the prerequisite steps that happen before the queue job runs -- file parsing, column mapping, and validation -- see The complete guide to CSV imports in Laravel.

For handling the multi-tenancy dimension of queued imports (preserving tenant context across job boundaries, scoping queries inside workers), that will be covered in a future post on multi-tenant CSV imports.

If you are building an import system from scratch, consider whether the engineering time is worth it. These patterns took months to get right across SQLite, MySQL, and PostgreSQL, with proper multi-tenancy, relationship resolution, and error recovery. Tapix ships all of it as a drop-in package for Laravel.

Enjoyed this post?

Get notified when we publish new articles about Laravel imports and data handling.

Almost there — confirm your subscription via email.

Related posts