Displaying Large Datasets Efficiently in WPF — When DataGrid Stops Scaling

· unvell team
Displaying Large Datasets Efficiently in WPF — When DataGrid Stops Scaling

The built-in System.Windows.Controls.DataGrid is virtualized — only the rows currently on screen are realized into visuals. People hear “virtualized” and assume the control will scale to whatever you throw at it. It doesn’t. There’s a row count past which the grid is fine but everything around it — the ItemsSource materialization, sort, filter, style triggers, export — falls over.

This article is about that gap. We’ll walk through where WPF DataGrid actually stops scaling, what you can recover by tuning, and the lazy-data-source pattern that lets you keep a million-row workload usable.


What “large” means for a WPF DataGrid

Up to about 10,000 rows, the stock DataGrid with virtualization on is usually fine. The user scrolls, rows materialize, scrolling is smooth. You will not feel a problem.

Between 10,000 and 100,000 rows, virtualization still works while scrolling, but the things around scrolling start to hurt: initial bind, sort, filter, AutoGenerateColumns measuring, Style.Triggers on every cell.

Past 100,000 rows the model itself is the problem. Even if every visible row is virtualized, you’ve still materialized 100,000 view-models in memory, run them all through the sort comparer, and held them in a CollectionView. Memory and GC start to dominate.

This article assumes you’ve already established you need to display this many rows — telemetry browsers, transaction blotters, log viewers, lab data. If you only need to paginate a long list, that’s a different problem and DataGrid plus paging works fine.


Where DataGrid stalls in practice

SymptomWhat’s actually happening
Initial load freezes the UI for several secondsEvery row in ItemsSource is constructed and held in memory before the first paint
Scrolling is jerkyScrollUnit="Item" measures every row, or row containers re-template on each scroll tick
Sort takes 5+ seconds on a column clickICollectionView.SortDescriptions walks every row in the source, not just the visible ones
Filter dropdowns lagEach opening of the dropdown re-enumerates distinct values across all rows
AutoGenerateColumns="True" is slow on first paintThe control measures column widths against a sample of rows that grows with row count
Excel export OOMsEach row gets a fresh XLCell / ExcelCell instance; 500k cells × object overhead is a lot

The fixes for the symptoms in the top half of that table are tuning. The fixes for the bottom half — especially export and any “show this big dataset with formulas / merges / multiple sheets” requirement — are architectural, and we’ll get to them.


Option 1: tune the WPF DataGrid

Before adding new dependencies, try the levers the control already gives you. The good news is most defaults are already correct in modern .NET; the failures tend to come from XAML that disabled a default by accident.

Confirm virtualization is on. The defaults are EnableRowVirtualization="True" and EnableColumnVirtualization="True". If you’ve set ScrollViewer.CanContentScroll="False" anywhere up the visual tree — or wrapped the grid in another ScrollViewer — virtualization silently turns off. Check the inspector.

Use item-level scrolling, not pixel-level, for heavy templates.

<DataGrid VirtualizingPanel.ScrollUnit="Item"
          VirtualizingPanel.VirtualizationMode="Recycling"
          EnableRowVirtualization="True"
          EnableColumnVirtualization="True" />

Recycling reuses container instances instead of creating new ones on every scroll — a big win when row templates have triggers, converters, or attached behaviors. Trade-off: smooth pixel-precise scrolling becomes step-by-row. For data grids users skim, this is the right trade.

Don’t auto-size every column. DataGridLength.Auto re-measures across all loaded rows on every change. Pin the widths you know, and use * (star) only on the column the user actually resizes.

Avoid AutoGenerateColumns="True" for production grids. Define columns explicitly. Auto-generation walks the source to infer types and runs reflection per row on first paint.

Load asynchronously. Don’t assign ItemsSource from a synchronous DB call on the UI thread:

// Inside an async event handler:
var rows = await Task.Run(() => _repo.LoadRows());
grid.ItemsSource = rows;          // assignment itself is fast; the cost was the load

For larger sets, switch to incremental loading — push rows in via ObservableCollection<T> in batches, or use a CollectionView that defers refresh:

using (collectionView.DeferRefresh())
{
    foreach (var batch in batches)
        foreach (var row in batch)
            source.Add(row);
}

These are the cheap fixes. They take a 100k-row screen from “freezes for 6 seconds” to “loads instantly and scrolls smoothly.” They do not solve the architectural problems: sort still touches every row, filter still enumerates the whole source, and the moment a user asks for formulas, merged headers, or .xlsx round-tripping, you’re outside what a DataGrid is built for. See Choosing the Right Editable Grid for WinForms and WPF Apps for that decision.


Option 2: a lazy data source

Once you’re past a few hundred thousand rows, the right move is to stop loading everything into the grid’s memory in the first place. The control should ask the source for the rows it needs to draw, and only those.

ReoGrid 4.4 added a built-in LazyLoadDataSource that makes this almost a one-liner. You hand it an object[,] array, attach it in lazy mode, and the worksheet only materializes Cell instances for the rows that are actually rendered or referenced. The array is held by reference — it isn’t copied — so the grid-side memory is roughly “the visible region” rather than “every row times every column.”

using unvell.ReoGrid.Data;

// 1,000,000 rows × 10 columns, already in memory as a plain object[,].
var data = new object[1_000_000, 10];
FillData(data);   // your own population logic

worksheet.SetRows(data.GetLength(0));

// One line. Range is inferred from the source's RecordCount / ColumnCount.
worksheet.AddDataSource(
    new LazyLoadDataSource(data),
    DataSourceLoadMode.LazyLoading);

With one million rows attached, only the ~30 rows currently on screen are ever turned into Cell instances. As the user scrolls, more rows are materialized on demand. Formulas that reference a not-yet-visible cell pull it through the same path, so cross-row references still work.

If you want to point the source at a sub-range instead of the whole sheet, the explicit form is still there:

worksheet.AddDataSource(
    new RangePosition(0, 0, rows, cols),
    new LazyLoadDataSource(data),
    DataSourceLoadMode.LazyLoading);

A few practical notes:

  • The object[,] is held by reference. Mutating it after attach will be reflected the next time the affected cells are loaded. Convenient for streaming new values into existing rows; surprising if you weren’t expecting it.
  • To append rows, swap the source. Grow the object[,], call RemoveAllDataSources(), then AddDataSource(new LazyLoadDataSource(newData), …) and SetRows(newRows). The demo in the source tree (DemoJP/Performance/LazyLoadDemo.cs) follows exactly this pattern.
  • Memory profile. The raw object[,] is still in memory — what you avoid is the grid-side materialization (per-row records, styles, borders, formula state). If even the raw array is too large, drop down to a custom IDataSource<T> whose GetRecord reads from disk or a database — same DataSourceLoadMode.LazyLoading, just a different backing store.

Full walkthrough with formatting and borders: Speeding Up Large Data Loading with Lazy Load Mode.


Option 3: bulk write when you already have the data

If the entire dataset is in memory and you just want to put it on the screen as fast as possible, lazy loading isn’t the right tool — bulk writing is. The cost of sheet["A1"] = x 200,000 times is mostly per-cell event/setup overhead, not the value assignment itself.

SetRangeData takes the whole 2D block at once and writes it in one pass:

var data = new object[rows, cols];
for (int r = 0; r < rows; r++)
    for (int c = 0; c < cols; c++)
        data[r, c] = source[r, c];

sheet.SetRangeData("A1", data);

In ReoGrid 4.4 this is roughly 3× faster than per-cell assignment on a 200,000-cell workload, and it composes cleanly with conditional formatting — which 4.4 also made dramatically faster on bulk writes (see the 4.4 release notes).

Rule of thumb:

  • Streaming or open-ended source → lazy data source.
  • Whole dataset in hand at once → SetRangeData.
  • Both true → load with SetRangeData, leave the grid in normal mode.

What you get for free that DataGrid won’t

Once you’re already in a spreadsheet control because of row count, you incidentally pick up things WPF DataGrid never had:

  • Frozen rows/columns that survive horizontal scrolling
  • Cell merging in headers and body
  • Multiple sheets in the same control instance
  • Conditional formatting that doesn’t re-run on every scroll tick (4.4 made bulk-write + conditional formatting ~11,700× faster)
  • Native .xlsx export of the visible data, formulas and formats intact
  • Formulas between cells, so users can add a =SUM(...) row without you writing code

You don’t have to use these. They’re there if the requirement shows up next quarter.


Exporting the dataset to .xlsx

The “show 500k rows” requirement is almost always followed by “and let me download it as an Excel file.” This is where DataGrid-based stacks fall hardest — the export path usually re-enumerates Items and constructs a fresh cell object per value, which is what eats the memory.

If the data is already in a Worksheet, the export is a single call:

worksheet.Workbook.Save("export.xlsx");

No second pass. No copy into a different library’s object model. The same in-memory representation that’s drawing the grid is what gets written.

For DataTable-shaped sources where the grid is incidental, the comparison of the standalone libraries is in Export a DataTable to Excel (.xlsx) in C# Without Office Interop.


A short decision tree

  • Fewer than 10k rows, list-shaped, no formulas? → stock WPF DataGrid, defaults are fine.
  • 10k–100k rows, still list-shaped? → WPF DataGrid with the tuning above (recycling, explicit columns, async load).
  • 100k+ rows, or data arrives incrementally? → lazy data source (IDataSource + DataSourceLoadMode.LazyLoading).
  • Whole dataset in memory, want it on screen instantly?SetRangeData bulk write.
  • Any of: formulas, merged headers, multiple sheets, .xlsx round-trip? → you’ve outgrown DataGrid. See Choosing the Right Editable Grid.

The expensive mistake here is the same one as for editable grids in general: keeping the built-in control past the point where it makes sense, and patching the symptoms one at a time. The first three tuning fixes are worth doing. If you’re still hitting walls after them, the problem is the model, not the knobs.


Further reading

Related articles

Try ReoGrid in your own project

The Excel-compatible spreadsheet component for .NET WinForms and WPF. 30-day free trial — no credit card required.