I recently read Chris Banes’ post “Should you use Kotlin Sequences for Performance?” and found his findings intriguing, especially when compared with Max Sidorov’s results in “Measuring sequences” and “Kotlin under the hood: how to get rid of recursion”. The differences in environments, methodologies, pipelines, and data produced varying benchmark results, leading to potential misunderstandings. In this post, I’ll show how data distribution can significantly impact the performance of processing pipelines under different parameters.

Before moving on, there’s a crucial reminder from JMH (Java Microbenchmark Harness):

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial experiments, perform baseline and negative tests that provide experimental control, make sure the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts. Do not assume the numbers tell you what you want them to tell.

What Are We Really Comparing?

When comparing Kotlin Sequences and Collection Operators, it’s important to understand that we’re not merely comparing abstract concepts like pull/push models or lazy/eager evaluation. Instead, we’re examining how these approaches perform in concrete scenarios and pipelines.

Eager evaluation over plain loops is a familiar pattern, but lazy evaluation is not. If you’re not familiar with the differences between these approaches, I recommend checking the official Kotlin documentation for diagrams illustrating how Sequence operators differ in execution, as well as Max Sidorov’s “Sequences: Theory” for additional explanation.
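If you want to see that difference in one glance, here is a minimal toy sketch (my own example, not from the benchmarks) that records the order in which the filter and map lambdas actually run:

```kotlin
// Records the order in which filter/map lambdas execute for a 3-element input.
fun traceOrder(useSequence: Boolean): List<String> {
    val log = mutableListOf<String>()
    val data = listOf(1, 2, 3)
    if (useSequence) {
        // Lazy: each element is pulled through the whole chain before the next one,
        // and nothing runs until the terminal toList() call.
        data.asSequence()
            .filter { log += "filter $it"; it % 2 == 1 }
            .map { log += "map $it"; it * 10 }
            .toList()
    } else {
        // Eager: filter processes the whole list, then map processes the filtered list.
        data.filter { log += "filter $it"; it % 2 == 1 }
            .map { log += "map $it"; it * 10 }
    }
    return log
}

fun main() {
    println(traceOrder(useSequence = false))
    // [filter 1, filter 2, filter 3, map 1, map 3]
    println(traceOrder(useSequence = true))
    // [filter 1, map 1, filter 2, filter 3, map 3]
}
```

Same elements, same results, but a very different execution order — which is exactly what the benchmarks below end up measuring.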

The Pipeline Structure

I liked Chris’s approach of using a simple repository example for benchmarking, so I’ve reused that pattern:

class SimplePipelineRepository(private val db: Db) : DataRepository {
    override fun getItemsListCollectionOperators(): List<UiModel> {
        return db.getItems()
            .filter { it.isEnabled }
            .map { UiModel(it.id) }
    }

    override fun getItemsListSequence(): List<UiModel> {
        return db.getItems()
            .asSequence()
            .filter { it.isEnabled }
            .map { UiModel(it.id) }
            .toList()
    }
}
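The snippet references a few types the post doesn’t show. For context, here is a minimal sketch of what they might look like — these definitions are my assumption, not the original benchmark source:

```kotlin
// Minimal stand-ins for the types used by SimplePipelineRepository (assumed shapes).
data class Item(val id: Long, val isEnabled: Boolean)
data class UiModel(val id: Long)

// A trivial in-memory "database" that just hands back a prebuilt list.
class Db(private val items: List<Item>) {
    fun getItems(): List<Item> = items
}

interface DataRepository {
    fun getItemsListCollectionOperators(): List<UiModel>
    fun getItemsListSequence(): List<UiModel>
}
```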

These pipelines have identically named operations and identical lambda code. But if the “work” is the same, what exactly are we comparing? The benchmark results for these pipelines will include:

  • Method-call overhead (hasNext/next on the filter and map iterators) during a single pass, versus two passes with “flat” calculations
  • Memory management differences between the approaches
  • How the JIT compiler optimizes each pipeline
  • Hardware performance characteristics

For those interested in benchmarking JVM apps, I highly recommend the JMH Samples as the best resource for learning how to write benchmarks. Some samples demonstrate how to avoid or minimize the impact of JIT optimizations.

But for this post I don’t use them. :)

Setting Up

Benchmark Parameters

To explore how Sequences and Collection Operators behave across different scenarios, I’ve built a matrix of benchmarks varying three key parameters:

  1. Collection size (batch size): The number of items in the input list
  2. Filtering ratio (percent): The percentage of items that will pass the filter logic - 0%, 10%, 25%, 50%, 75%, 90%, and 100% (from 0.0 to 1.0)
  3. Data distribution: How filtered items are arranged in the input list
    • Ordered: Most passing items are at the beginning of the list
    • Distributed: Passing items are distributed with fixed steps throughout the list
    • Shuffled: Items are randomly arranged using Kotlin’s default shuffle algorithm
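The three distribution patterns can be sketched as simple generators of pass/fail flags. This is my interpretation of the setup (function names, the error-accumulator approach to “fixed steps”, and the fixed seed are mine, not from the benchmark source):

```kotlin
import kotlin.random.Random

// Pass/fail flags for `size` items, of which roughly `fraction` pass the filter.

// Ordered: all passing items at the front of the list.
fun ordered(size: Int, fraction: Double): List<Boolean> {
    val passing = (size * fraction).toInt()
    return List(size) { it < passing }
}

// Distributed: passing items spread evenly through the list,
// using an error accumulator so any fraction works.
fun distributed(size: Int, fraction: Double): List<Boolean> {
    var acc = 0.0
    return List(size) {
        acc += fraction
        if (acc >= 1.0) { acc -= 1.0; true } else false
    }
}

// Shuffled: ordered data randomly permuted (fixed seed for reproducibility).
fun shuffled(size: Int, fraction: Double, random: Random = Random(42)): List<Boolean> =
    ordered(size, fraction).shuffled(random)
```

All three generators produce the same number of passing items for a given size and fraction; only the arrangement differs — which is the point of the experiment.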

Environment

For all tests, I used the following environment:

  • Hardware: MacBook Pro M1 Pro (8-core) with 32GB RAM
  • OS: macOS Sequoia 15.3.2
  • JDK: Temurin-21.0.6+7
  • JMH: version 1.37
  • Kotlin: 2.0.20

Benchmarking Medium and Large Collections

In this section, I’ll present results for collection sizes of 100, 1,000, 10,000, and 100,000 elements. For the medium and large batch sizes, I used this configuration:

  • Warm-ups: 10
  • Measurements: 25
  • Forks: 2
  • Mode: Throughput
  • Duration: 1 second
  • GC: G1
  • Fixed Heap size: 2GB
  • Enabled +AlwaysPreTouch option to ensure memory is fully allocated at startup to avoid performance spikes
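For reference, that configuration corresponds roughly to JMH annotations like these (a sketch of the setup, not the actual benchmark source; the class and method names are mine):

```kotlin
import org.openjdk.jmh.annotations.*

@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
@Warmup(iterations = 10, time = 1)      // 10 warm-up iterations of 1 second each
@Measurement(iterations = 25, time = 1) // 25 measurement iterations of 1 second each
@Fork(value = 2, jvmArgs = ["-Xms2g", "-Xmx2g", "-XX:+UseG1GC", "-XX:+AlwaysPreTouch"])
open class PipelineBenchmark {
    // @Benchmark methods calling the repository implementations would go here.
}
```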

If you want to measure time, be careful—especially for very fast computations. The ancient manuscript “Nanotrusting the Nanotime” by Aleksey Shipilёv explains it in detail.

In the tables below, the “Diff %” column shows the percentage difference between Collection Operators (ColOps) and Sequences implementations. Positive values (+%) indicate Sequences are faster, while negative values (-%) indicate Collection Operators are faster.
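As a sanity check, the Diff % column is just the relative change from the ColOps throughput to the Sequences throughput (the helper function is mine):

```kotlin
// Relative throughput difference: positive => Sequences faster,
// negative => Collection Operators faster.
fun diffPercent(colOps: Double, sequence: Double): Double =
    (sequence - colOps) / colOps * 100.0

fun main() {
    // Reproduces the +82.6% entry for ordered data at the 10% filter ratio.
    println(diffPercent(5_664_906.0, 10_341_531.0)) // ≈ 82.55
}
```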

Ordered data

The tables below are based on data from [CSV].

Batch size: 100

| Percent | collectionsOperators (ops/s) | sequence (ops/s) | Diff % |
| --- | --- | --- | --- |
| 0% | 18,856,930 (±0.25%) | 18,371,690 (±0.77%) | -2.6% |
| 10% | 5,664,906 (±0.56%) | 10,341,531 (±0.17%) | +82.6% |
| 25% | 3,981,816 (±0.26%) | 3,992,161 (±5.56%) | +0.3% |
| 50% | 2,756,168 (±0.15%) | 2,165,411 (±0.36%) | -21.4% |
| 75% | 2,091,353 (±1.03%) | 2,177,218 (±1.78%) | +4.1% |
| 90% | 1,874,621 (±0.25%) | 1,959,500 (±0.47%) | +4.5% |
| 100% | 1,783,201 (±0.16%) | 2,105,502 (±0.26%) | +18.1% |

Batch size: 1,000

| Percent | collectionsOperators (ops/s) | sequence (ops/s) | Diff % |
| --- | --- | --- | --- |
| 0% | 2,043,831 (±1.01%) | 1,985,882 (±0.32%) | -2.8% |
| 10% | 597,364 (±0.25%) | 793,354 (±0.24%) | +32.8% |
| 25% | 411,388 (±2.45%) | 439,794 (±3.12%) | +6.9% |
| 50% | 259,954 (±0.70%) | 228,300 (±0.31%) | -12.2% |
| 75% | 193,585 (±0.29%) | 164,271 (±0.26%) | -15.1% |
| 90% | 165,430 (±1.00%) | 140,505 (±0.48%) | -15.1% |
| 100% | 154,720 (±0.23%) | 151,463 (±0.52%) | -2.1% |

Batch size: 10,000

| Percent | collectionsOperators (ops/s) | sequence (ops/s) | Diff % |
| --- | --- | --- | --- |
| 0% | 174,631 (±0.63%) | 199,958 (±0.56%) | +14.5% |
| 10% | 56,166 (±0.19%) | 69,180 (±0.79%) | +23.2% |
| 25% | 37,806 (±0.21%) | 42,571 (±0.35%) | +12.6% |
| 50% | 25,220 (±0.16%) | 23,809 (±0.24%) | -5.6% |
| 75% | 18,769 (±0.46%) | 16,590 (±0.96%) | -11.6% |
| 90% | 16,255 (±0.55%) | 14,730 (±0.45%) | -9.4% |
| 100% | 14,951 (±0.49%) | 14,318 (±0.27%) | -4.2% |

Batch size: 100,000

| Percent | collectionsOperators (ops/s) | sequence (ops/s) | Diff % |
| --- | --- | --- | --- |
| 0% | 20,271 (±0.75%) | 20,799 (±0.53%) | +2.6% |
| 10% | 5,546 (±0.44%) | 7,669 (±1.18%) | +38.3% |
| 25% | 3,827 (±0.37%) | 4,219 (±0.27%) | +10.2% |
| 50% | 2,467 (±0.34%) | 2,477 (±1.00%) | +0.4% |
| 75% | 1,836 (±0.40%) | 1,673 (±3.58%) | -8.8% |
| 90% | 1,611 (±0.40%) | 1,428 (±2.12%) | -11.4% |
| 100% | 1,490 (±0.34%) | 1,457 (±0.33%) | -2.2% |

Key Observations for Ordered Data:

  • With ordered data, Sequences show a dramatic advantage (+82.6%) for small collections with low filter percentages (10%)
  • As the filter percentage increases, the advantage diminishes and sometimes reverses
  • For medium-to-large collections (10,000+), Sequences perform better at very low (0-10%) and very high (90-100%) filter percentages

Shuffled data

Yes, it’s strange to use shuffled data, but it’s the easiest way to demonstrate behavior in an “unstable” state.

The tables below are based on data from [CSV].

Batch size: 100

| Percent | collectionsOperators (ops/s) | sequence (ops/s) | Diff % |
| --- | --- | --- | --- |
| 0% | 18,351,457 (±0.21%) | 18,488,046 (±0.18%) | +0.7% |
| 10% | 5,261,433 (±0.33%) | 3,018,650 (±4.93%) | -42.6% |
| 25% | 3,903,873 (±0.31%) | 4,143,732 (±0.21%) | +6.1% |
| 50% | 2,666,945 (±0.17%) | 2,471,655 (±0.61%) | -7.3% |
| 75% | 2,037,681 (±0.25%) | 2,182,115 (±0.26%) | +7.1% |
| 90% | 1,836,880 (±0.34%) | 1,958,603 (±0.58%) | +6.6% |
| 100% | 1,770,229 (±0.39%) | 2,105,915 (±0.23%) | +19.0% |

Batch size: 1,000

| Percent | collectionsOperators (ops/s) | sequence (ops/s) | Diff % |
| --- | --- | --- | --- |
| 0% | 2,002,420 (±0.31%) | 1,983,975 (±0.53%) | -0.9% |
| 10% | 653,513 (±0.51%) | 757,662 (±0.78%) | +15.9% |
| 25% | 386,755 (±0.28%) | 440,960 (±0.29%) | +14.0% |
| 50% | 257,941 (±0.28%) | 224,932 (±0.33%) | -12.8% |
| 75% | 191,069 (±0.25%) | 164,345 (±0.26%) | -14.0% |
| 90% | 163,977 (±0.37%) | 140,887 (±0.37%) | -14.1% |
| 100% | 151,641 (±0.33%) | 151,823 (±0.26%) | +0.1% |

Batch size: 10,000

| Percent | collectionsOperators (ops/s) | sequence (ops/s) | Diff % |
| --- | --- | --- | --- |
| 0% | 171,580 (±0.75%) | 152,665 (±1.08%) | -11.0% |
| 10% | 59,221 (±0.50%) | 64,486 (±0.23%) | +8.9% |
| 25% | 37,434 (±0.15%) | 39,228 (±0.16%) | +4.8% |
| 50% | 23,604 (±0.24%) | 16,776 (±0.31%) | -28.9% |
| 75% | 18,626 (±0.42%) | 13,201 (±0.52%) | -29.1% |
| 90% | 16,292 (±0.33%) | 11,974 (±0.39%) | -26.5% |
| 100% | 14,958 (±0.30%) | 14,656 (±0.20%) | -2.0% |

Batch size: 100,000

| Percent | collectionsOperators (ops/s) | sequence (ops/s) | Diff % |
| --- | --- | --- | --- |
| 0% | 12,491 (±0.84%) | 12,303 (±0.70%) | -1.5% |
| 10% | 4,033 (±1.05%) | 4,826 (±2.09%) | +19.7% |
| 25% | 2,092 (±2.50%) | 1,991 (±4.67%) | -4.8% |
| 50% | 1,200 (±2.32%) | 1,021 (±4.02%) | -14.9% |
| 75% | 1,294 (±4.23%) | 777 (±2.80%) | -40.0% |
| 90% | 1,430 (±2.74%) | 925 (±3.25%) | -35.3% |
| 100% | 1,481 (±0.48%) | 1,479 (±0.17%) | -0.1% |

Key Observations for Shuffled Data:

  • High filter percentages (90-100%) show more consistent results regardless of distribution

Distributed data

The tables below are based on data from [CSV].

Batch size: 100

| Percent | collectionsOperators (ops/s) | sequence (ops/s) | Diff % |
| --- | --- | --- | --- |
| 0% | 18,871,658 (±0.19%) | 18,451,270 (±0.63%) | -2.2% |
| 10% | 5,749,032 (±0.48%) | 2,853,098 (±0.33%) | -50.4% |
| 25% | 3,980,196 (±0.18%) | 4,076,371 (±0.52%) | +2.4% |
| 50% | 2,672,543 (±0.19%) | 2,444,005 (±0.21%) | -8.6% |
| 75% | 2,059,558 (±0.25%) | 2,116,703 (±0.34%) | +2.8% |
| 90% | 1,872,262 (±0.32%) | 1,956,287 (±0.52%) | +4.5% |
| 100% | 1,723,854 (±1.07%) | 2,102,347 (±0.89%) | +22.0% |

Batch size: 1,000

| Percent | collectionsOperators (ops/s) | sequence (ops/s) | Diff % |
| --- | --- | --- | --- |
| 0% | 2,055,754 (±0.37%) | 1,977,181 (±0.51%) | -3.8% |
| 10% | 642,339 (±0.66%) | 731,905 (±0.40%) | +13.9% |
| 25% | 394,202 (±0.27%) | 425,367 (±0.38%) | +7.9% |
| 50% | 261,218 (±0.30%) | 223,243 (±0.40%) | -14.5% |
| 75% | 190,938 (±0.30%) | 163,153 (±0.38%) | -14.6% |
| 90% | 165,775 (±0.47%) | 140,938 (±0.42%) | -15.0% |
| 100% | 154,954 (±0.24%) | 152,512 (±0.51%) | -1.6% |

Batch size: 10,000

| Percent | collectionsOperators (ops/s) | sequence (ops/s) | Diff % |
| --- | --- | --- | --- |
| 0% | 175,417 (±0.64%) | 194,157 (±0.98%) | +10.7% |
| 10% | 56,803 (±0.23%) | 60,288 (±0.60%) | +6.1% |
| 25% | 38,284 (±0.13%) | 38,877 (±0.23%) | +1.5% |
| 50% | 24,950 (±0.25%) | 18,808 (±0.38%) | -24.6% |
| 75% | 19,091 (±0.26%) | 15,133 (±0.33%) | -20.7% |
| 90% | 16,488 (±0.47%) | 14,027 (±0.47%) | -14.9% |
| 100% | 15,174 (±0.13%) | 14,431 (±0.31%) | -4.9% |

Batch size: 100,000

| Percent | collectionsOperators (ops/s) | sequence (ops/s) | Diff % |
| --- | --- | --- | --- |
| 0% | 20,056 (±0.28%) | 19,859 (±0.37%) | -1.0% |
| 10% | 5,647 (±0.23%) | 7,057 (±0.26%) | +25.0% |
| 25% | 3,869 (±0.13%) | 2,747 (±0.18%) | -29.0% |
| 50% | 2,485 (±0.23%) | 1,533 (±0.32%) | -38.3% |
| 75% | 1,826 (±0.31%) | 1,528 (±0.31%) | -16.3% |
| 90% | 1,618 (±0.27%) | 1,466 (±0.51%) | -9.4% |
| 100% | 1,508 (±0.25%) | 1,462 (±0.37%) | -3.1% |

Key Observations for Distributed Data:

  • Distributed data produces results similar to shuffled data for many cases
  • For small collections with low filter percentages (10%), Sequences perform even worse
  • As collection size increases, the distribution pattern impacts performance more significantly

Comparative Analysis of Distribution Impact

To better understand how data distribution affects each approach, I’ve compared the same implementation across different distribution patterns.

Impact on Collection Operators

Batch size: 100

| Filter % | Ordered CollOps (ops/s) | Distributed CollOps (ops/s) | Diff Ordered vs Distributed | Shuffled CollOps (ops/s) | Distribution Impact (min-max) |
| --- | --- | --- | --- | --- | --- |
| 0% | 18,856,929 ±0.25% | 18,871,657 ±0.50% | +0.08% | 18,351,457 ±0.17% | 2.76% |
| 10% | 5,664,906 ±0.56% | 5,749,031 ±0.33% | +1.49% | 5,261,433 ±0.89% | 8.61% |
| 25% | 3,981,815 ±0.26% | 3,980,196 ±0.29% | -0.04% | 3,903,873 ±0.56% | 1.96% |
| 50% | 2,756,167 ±0.15% | 2,672,543 ±0.58% | -3.03% | 2,666,945 ±1.01% | 3.24% |
| 75% | 2,091,353 ±1.03% | 2,059,557 ±0.69% | -1.52% | 2,037,681 ±0.24% | 2.57% |
| 90% | 1,874,621 ±0.25% | 1,872,261 ±0.26% | -0.13% | 1,836,880 ±0.57% | 2.01% |
| 100% | 1,783,200 ±0.16% | 1,723,854 ±1.07% | -3.33% | 1,770,228 ±0.39% | 3.33% |

Batch size: 1,000

| Filter % | Ordered CollOps (ops/s) | Distributed CollOps (ops/s) | Diff Ordered vs Distributed | Shuffled CollOps (ops/s) | Distribution Impact (min-max) |
| --- | --- | --- | --- | --- | --- |
| 0% | 2,043,831 ±1.01% | 2,055,753 ±0.31% | +0.58% | 2,002,420 ±0.30% | 2.61% |
| 10% | 597,364 ±0.25% | 642,339 ±0.16% | +7.53% | 653,512 ±0.28% | 9.40% |
| 25% | 411,388 ±2.45% | 394,201 ±0.59% | -4.18% | 386,755 ±0.24% | 5.99% |
| 50% | 259,954 ±0.70% | 261,218 ±1.50% | +0.49% | 257,940 ±0.24% | 1.26% |
| 75% | 193,584 ±0.29% | 190,938 ±1.47% | -1.37% | 191,069 ±0.18% | 1.37% |
| 90% | 165,429 ±1.00% | 165,775 ±0.51% | +0.21% | 163,977 ±0.18% | 1.09% |
| 100% | 154,719 ±0.23% | 154,954 ±0.88% | +0.15% | 151,640 ±0.52% | 2.14% |

Batch size: 10,000

| Filter % | Ordered CollOps (ops/s) | Distributed CollOps (ops/s) | Diff Ordered vs Distributed | Shuffled CollOps (ops/s) | Distribution Impact (min-max) |
| --- | --- | --- | --- | --- | --- |
| 0% | 174,630 ±0.63% | 175,417 ±1.15% | +0.45% | 171,580 ±1.82% | 2.20% |
| 10% | 56,165 ±0.19% | 56,803 ±0.11% | +1.14% | 59,221 ±3.79% | 5.44% |
| 25% | 37,805 ±0.21% | 38,284 ±0.18% | +1.26% | 37,434 ±0.55% | 2.24% |
| 50% | 25,219 ±0.16% | 24,949 ±0.45% | -1.07% | 23,603 ±0.83% | 6.41% |
| 75% | 18,768 ±0.46% | 19,090 ±0.15% | +1.72% | 18,626 ±0.44% | 2.48% |
| 90% | 16,254 ±0.55% | 16,487 ±0.22% | +1.43% | 16,292 ±0.29% | 1.43% |
| 100% | 14,950 ±0.49% | 15,174 ±0.24% | +1.50% | 14,957 ±0.23% | 1.50% |

Batch size: 100,000

| Filter % | Ordered CollOps (ops/s) | Distributed CollOps (ops/s) | Diff Ordered vs Distributed | Shuffled CollOps (ops/s) | Distribution Impact (min-max) |
| --- | --- | --- | --- | --- | --- |
| 0% | 20,271 ±0.75% | 20,055 ±0.20% | -1.06% | 12,491 ±0.57% | 38.38% |
| 10% | 5,545 ±0.44% | 5,646 ±0.49% | +1.83% | 4,033 ±1.53% | 29.10% |
| 25% | 3,827 ±0.37% | 3,868 ±0.52% | +1.09% | 2,091 ±0.84% | 46.44% |
| 50% | 2,467 ±0.34% | 2,484 ±0.24% | +0.72% | 1,199 ±0.26% | 52.09% |
| 75% | 1,835 ±0.40% | 1,825 ±0.49% | -0.53% | 1,294 ±0.25% | 29.48% |
| 90% | 1,611 ±0.40% | 1,617 ±0.23% | +0.41% | 1,430 ±0.18% | 11.66% |
| 100% | 1,490 ±0.34% | 1,508 ±0.31% | +1.21% | 1,480 ±0.62% | 1.84% |

Key Observations: Collection Operators show relatively low sensitivity to data distribution for small and medium collections. For the largest collections (100,000 items), however, shuffled data can cause dramatic performance drops (up to 52% slower compared to ordered data).

Impact on Sequences

Batch size: 100

| Filter % | Ordered Sequence (ops/s) | Distributed Sequence (ops/s) | Diff Ordered vs Distributed | Shuffled Sequence (ops/s) | Distribution Impact (min-max) |
| --- | --- | --- | --- | --- | --- |
| 0% | 18,371,690 ±0.77% | 18,451,269 ±0.21% | +0.43% | 18,488,045 ±0.13% | 0.63% |
| 10% | 10,341,531 ±0.17% | 2,853,097 ±3.25% | -72.41% | 3,018,649 ±3.64% | 72.41% |
| 25% | 3,992,161 ±5.56% | 4,076,371 ±0.21% | +2.11% | 4,143,732 ±0.21% | 3.80% |
| 50% | 2,165,410 ±0.36% | 2,444,005 ±6.88% | +12.87% | 2,471,655 ±6.55% | 14.14% |
| 75% | 2,177,218 ±1.78% | 2,116,703 ±0.19% | -2.78% | 2,182,114 ±1.36% | 3.00% |
| 90% | 1,959,499 ±0.47% | 1,956,287 ±0.60% | -0.16% | 1,958,603 ±0.19% | 0.16% |
| 100% | 2,105,501 ±0.26% | 2,102,346 ±0.30% | -0.15% | 2,105,915 ±0.26% | 0.17% |

Batch size: 1,000

| Filter % | Ordered Sequence (ops/s) | Distributed Sequence (ops/s) | Diff Ordered vs Distributed | Shuffled Sequence (ops/s) | Distribution Impact (min-max) |
| --- | --- | --- | --- | --- | --- |
| 0% | 1,985,881 ±0.32% | 1,977,180 ±0.18% | -0.44% | 1,983,974 ±0.59% | 0.44% |
| 10% | 793,353 ±0.24% | 731,905 ±0.47% | -7.75% | 757,661 ±0.29% | 7.75% |
| 25% | 439,794 ±3.12% | 425,366 ±0.29% | -3.28% | 440,960 ±0.20% | 3.55% |
| 50% | 228,299 ±0.31% | 223,243 ±0.35% | -2.21% | 224,932 ±0.28% | 2.21% |
| 75% | 164,270 ±0.26% | 163,153 ±0.33% | -0.68% | 164,344 ±0.29% | 0.73% |
| 90% | 140,504 ±0.48% | 140,938 ±0.37% | +0.31% | 140,886 ±0.30% | 0.31% |
| 100% | 151,462 ±0.52% | 152,512 ±0.23% | +0.69% | 151,822 ±0.30% | 0.69% |

Batch size: 10,000

| Filter % | Ordered Sequence (ops/s) | Distributed Sequence (ops/s) | Diff Ordered vs Distributed | Shuffled Sequence (ops/s) | Distribution Impact (min-max) |
| --- | --- | --- | --- | --- | --- |
| 0% | 199,957 ±0.56% | 194,157 ±0.53% | -2.90% | 152,665 ±0.82% | 23.65% |
| 10% | 69,180 ±0.79% | 60,288 ±0.50% | -12.85% | 64,486 ±0.93% | 12.85% |
| 25% | 42,571 ±0.35% | 38,876 ±0.15% | -8.68% | 39,227 ±1.71% | 8.68% |
| 50% | 23,809 ±0.24% | 18,807 ±0.38% | -21.01% | 16,775 ±1.86% | 29.54% |
| 75% | 16,589 ±0.96% | 15,133 ±0.80% | -8.78% | 13,201 ±1.49% | 20.43% |
| 90% | 14,730 ±0.45% | 14,027 ±0.27% | -4.77% | 11,973 ±0.55% | 18.71% |
| 100% | 14,318 ±0.27% | 14,430 ±0.47% | +0.79% | 14,656 ±0.50% | 2.36% |

Batch size: 100,000

| Filter % | Ordered Sequence (ops/s) | Distributed Sequence (ops/s) | Diff Ordered vs Distributed | Shuffled Sequence (ops/s) | Distribution Impact (min-max) |
| --- | --- | --- | --- | --- | --- |
| 0% | 20,799 ±0.53% | 19,859 ±0.28% | -4.52% | 12,303 ±1.03% | 40.85% |
| 10% | 7,669 ±1.18% | 7,056 ±0.26% | -7.98% | 4,825 ±0.58% | 37.07% |
| 25% | 4,218 ±0.27% | 2,747 ±0.23% | -34.88% | 1,990 ±0.94% | 52.81% |
| 50% | 2,476 ±1.00% | 1,532 ±0.28% | -38.12% | 1,021 ±0.70% | 58.76% |
| 75% | 1,673 ±3.58% | 1,528 ±2.67% | -8.68% | 776 ±0.74% | 53.57% |
| 90% | 1,427 ±2.12% | 1,465 ±0.75% | +2.67% | 924 ±1.28% | 37.91% |
| 100% | 1,457 ±0.33% | 1,461 ±0.25% | +0.30% | 1,478 ±0.96% | 1.48% |

Key Observations: Sequences show significantly higher sensitivity to data distribution:

  • For 10% filter with batch size 100, changing from ordered to distributed data causes a dramatic 72.41% performance drop
  • For larger collections (100,000 items), distribution can impact performance by over 58%
  • Sequences maintain consistent performance regardless of distribution when the filter percentage is 100%; at 90%, the largest collections still show a sizable distribution impact

Summary of Medium and Large Collection Analysis: Collection Operators are generally less affected by data distribution than Sequences. The impact of distribution becomes more pronounced for both approaches as collection size increases, with Sequences showing greater sensitivity to distribution patterns when dealing with low filter percentages.

The Case of Small Collections

For smaller collection sizes, I ran the benchmark with a different configuration:

  • Warm-ups: 10
  • Measurements: 10
  • Forks: 2
  • Mode: Throughput
  • Duration: 500ms
  • GC: G1
  • Fixed Heap size: 2GB
  • +AlwaysPreTouch: Ensure memory is fully allocated at startup

Please note that results for batch size 5 should be interpreted with caution, as the filtering logic may not be ideal for such small collections.

The tables below are based on data from [CSV].

| Percent | collectionsOperators (ops/s) | sequence (ops/s) | Diff % |
| --- | --- | --- | --- |
| 0% | 90,383,823 (±8.14%) | 150,940,586 (±0.66%) | +67.00% |
| 10% | 55,406,888 (±0.71%) | 90,949,266 (±0.71%) | +64.15% |
| 25% | 55,657,083 (±2.39%) | 90,665,757 (±1.61%) | +62.90% |
| 50% | 36,332,025 (±0.52%) | 57,613,969 (±0.73%) | +58.58% |
| 75% | 31,807,259 (±2.12%) | 51,554,783 (±2.09%) | +62.08% |
| 90% | 29,086,237 (±0.41%) | 33,116,184 (±0.48%) | +13.86% |
| 100% | 29,062,972 (±0.73%) | 33,067,388 (±0.71%) | +13.78% |

| Percent | collectionsOperators (ops/s) | sequence (ops/s) | Diff % |
| --- | --- | --- | --- |
| 0% | 34,480,704 (±7.84%) | 100,852,381 (±1.09%) | +192.49% |
| 10% | 24,357,839 (±2.48%) | 58,669,965 (±0.38%) | +140.87% |
| 25% | 16,613,017 (±3.99%) | 44,866,760 (±1.31%) | +170.07% |
| 50% | 16,718,661 (±1.79%) | 35,870,028 (±0.34%) | +114.55% |
| 75% | 12,612,989 (±32.56%) | 28,001,904 (±0.53%) | +122.01% |
| 90% | 16,601,949 (±0.75%) | 26,188,253 (±1.30%) | +57.74% |
| 100% | 15,926,913 (±0.56%) | 19,520,430 (±0.34%) | +22.56% |

| Percent | collectionsOperators (ops/s) | sequence (ops/s) | Diff % |
| --- | --- | --- | --- |
| 0% | 51,657,026 (±4.86%) | 58,587,172 (±0.66%) | +13.42% |
| 10% | 11,049,179 (±10.36%) | 21,250,762 (±18.15%) | +92.33% |
| 25% | 13,373,719 (±1.62%) | 8,895,053 (±18.92%) | -33.49% |
| 50% | 10,458,196 (±0.31%) | 12,943,845 (±10.29%) | +23.77% |
| 75% | 7,869,224 (±0.59%) | 8,039,625 (±0.38%) | +2.17% |
| 90% | 6,548,830 (±3.77%) | 7,766,415 (±0.45%) | +18.59% |
| 100% | 6,434,227 (±0.54%) | 5,578,035 (±0.51%) | -13.31% |

| Percent | collectionsOperators (ops/s) | sequence (ops/s) | Diff % |
| --- | --- | --- | --- |
| 0% | 33,632,347 (±1.92%) | 34,421,169 (±0.91%) | +2.35% |
| 10% | 9,120,782 (±6.20%) | 9,094,591 (±11.51%) | -0.29% |
| 25% | 7,823,834 (±1.54%) | 8,458,477 (±25.74%) | +8.11% |
| 50% | 5,235,795 (±0.80%) | 4,877,777 (±13.45%) | -6.84% |
| 75% | 3,913,007 (±0.62%) | 3,123,934 (±0.46%) | -20.17% |
| 90% | 3,727,317 (±1.12%) | 3,348,660 (±15.37%) | -10.16% |
| 100% | 3,333,589 (±0.60%) | 2,898,578 (±0.34%) | -13.05% |

| Percent | collectionsOperators (ops/s) | sequence (ops/s) | Diff % |
| --- | --- | --- | --- |
| 0% | 24,191,600 (±0.78%) | 23,965,514 (±0.57%) | -0.93% |
| 10% | 7,153,022 (±4.58%) | 5,567,417 (±11.44%) | -22.17% |
| 25% | 5,196,413 (±1.07%) | 4,176,509 (±5.52%) | -19.63% |
| 50% | 3,428,483 (±1.18%) | 2,867,721 (±0.96%) | -16.36% |
| 75% | 2,715,268 (±1.87%) | 2,167,624 (±1.75%) | -20.17% |
| 90% | 2,339,701 (±2.03%) | 2,700,372 (±0.76%) | +15.42% |
| 100% | 2,203,987 (±4.02%) | 2,722,582 (±0.49%) | +23.53% |

| Percent | collectionsOperators (ops/s) | sequence (ops/s) | Diff % |
| --- | --- | --- | --- |
| 0% | 20,671,915 (±1.03%) | 20,931,777 (±0.12%) | +1.26% |
| 10% | 6,217,694 (±1.49%) | 4,855,438 (±14.99%) | -21.91% |
| 25% | 4,304,343 (±1.73%) | 4,596,305 (±0.39%) | +6.78% |
| 50% | 3,126,313 (±1.33%) | 2,473,220 (±0.58%) | -20.89% |
| 75% | 2,354,969 (±1.74%) | 2,635,813 (±0.38%) | +11.93% |
| 90% | 2,093,672 (±0.33%) | 2,217,350 (±0.32%) | +5.91% |
| 100% | 1,958,377 (±0.34%) | 2,363,563 (±0.30%) | +20.69% |

| Percent | collectionsOperators (ops/s) | sequence (ops/s) | Diff % |
| --- | --- | --- | --- |
| 0% | 18,757,307 (±0.43%) | 19,031,045 (±0.54%) | +1.46% |
| 10% | 5,664,876 (±0.65%) | 3,991,975 (±11.15%) | -29.53% |
| 25% | 3,994,290 (±0.88%) | 4,197,072 (±0.85%) | +5.08% |
| 50% | 2,721,058 (±0.60%) | 2,546,152 (±12.91%) | -6.43% |
| 75% | 2,082,445 (±1.08%) | 2,294,545 (±0.50%) | +10.19% |
| 90% | 1,867,434 (±1.99%) | 2,011,525 (±0.40%) | +7.72% |
| 100% | 1,745,965 (±1.63%) | 2,166,105 (±0.29%) | +24.06% |

Key Observations for Small Collections:

  • For very small collections (5-10 items), Sequences consistently outperform Collection Operators across all filter percentages
  • The advantage is most pronounced at low filter percentages (58-67% faster for 0-25% filters)
  • As collection size increases to 50+, the performance pattern starts to resemble that of medium-sized collections
  • For batch size 25, we observe higher variability in results, suggesting this might be a transitional size where JIT optimization patterns change

Summary of Small Collection Analysis: Very small collections (under 25 items) show a distinctly different pattern from medium and large collections. Sequences consistently outperform Collection Operators for these tiny collections, regardless of filter percentage or distribution pattern. This advantage diminishes as collection size increases, eventually transitioning to the more complex pattern observed with medium-sized collections.

Conclusion

Comparing Sequences and Collection Operators is not straightforward. Data distribution can significantly impact performance, sometimes reversing the advantage between the two approaches. JIT and hardware optimizations do a lot of work. Really. All of these results depend heavily on the branch predictor and on how a particular JIT and CPU handle the branches generated by Sequences and Collection Operators.
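You can glimpse the branch-predictor effect even outside JMH with a crude experiment (a rough illustration only; `measureNanoTime` on a single run is no substitute for a proper harness, and the JIT may compile this counting loop branchlessly on some setups):

```kotlin
import kotlin.system.measureNanoTime

// Counts set flags with a data-dependent branch.
fun countEnabled(flags: BooleanArray): Int {
    var count = 0
    for (f in flags) if (f) count++
    return count
}

fun main() {
    val size = 5_000_000
    val ordered = BooleanArray(size) { it < size / 2 }          // perfectly predictable branch
    val shuffled = ordered.toList().shuffled().toBooleanArray() // unpredictable branch

    // Same work and the same result; only branch predictability differs.
    val tOrdered = measureNanoTime { countEnabled(ordered) }
    val tShuffled = measureNanoTime { countEnabled(shuffled) }
    println("ordered: ${tOrdered / 1_000_000} ms, shuffled: ${tShuffled / 1_000_000} ms")
}
```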

Based on all results we can conclude:

  1. Distribution matters: The arrangement of passing items in your collection can affect performance by up to 72% in some cases.
  2. Collection size is crucial: Very small collections behave differently from medium and large ones.
  3. Filter percentage affects sensitivity: Low and high filter percentages show different sensitivity to distribution patterns.
  4. No universal winner: Neither approach consistently outperforms the other across all scenarios.

Rather than making a blanket recommendation, it’s more valuable to understand the factors that influence performance in your specific use case. Consider:

  • What is your typical collection size?
  • What percentage of items typically pass your filter?
  • Is your data likely to have a specific distribution pattern?
  • How critical is performance for this particular operation?
  • Do you need more stable results?

Important update

For Android devs: these results don’t transfer to Android apps. The environment is different, and so are the AOT compilation and the hardware. Don’t apply these numbers to your apps directly; pay attention to your own benchmarks.

P.S.: In future posts, I plan to explore:

  • Applying the JMH Samples’ recommendations to these benchmarks, showing how the results change, and checking how that would impact Android
  • Finding a “baseline” for comparing Sequences and Collection Operators
  • Comparing memory usage between the two approaches