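In the post "JVM Common Concepts: New Object Instantiation" I described object instantiation as three steps: allocation (through the TLAB), system initialization, and user initialization. In "JVM Common Concepts: Thread-Local Allocation Buffer (TLAB)" I also showed that TLAB allocation is very efficient, and that system initialization and user initialization can be merged. That leaves one cost we have not discussed yet: actually writing the data to memory. How expensive is that part of object initialization? Let's look at it through an experiment.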
Experiment
Source code
import org.openjdk.jmh.annotations.*;

import java.util.concurrent.TimeUnit;

@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(value = 3)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
public class Case {

    // Array length in bytes, swept across six orders of magnitude.
    @Param({"1", "10", "100", "1000", "10000", "100000"})
    int size;

    // Returning the array hands it to JMH, so the allocation is not optimized away.
    @Benchmark
    public byte[] java() {
        return new byte[size];
    }
}
Score
Benchmark                 (size)  Mode  Cnt      Score       Error   Units
# Time to allocate
Case.java                      1  avgt   15     20.307 ±     4.532   ns/op
Case.java                     10  avgt   15     26.657 ±     6.072   ns/op
Case.java                    100  avgt   15    106.632 ±    34.742   ns/op
Case.java                   1000  avgt   15    681.176 ±   124.980   ns/op
Case.java                  10000  avgt   15   4576.433 ±   909.956   ns/op
Case.java                 100000  avgt   15  44881.095 ± 13765.440   ns/op
# Allocation rate
Case.java:·gc.alloc.rate       1  avgt   15   6228.153 ±  1059.385  MB/sec
Case.java:·gc.alloc.rate      10  avgt   15   6335.809 ±   986.395  MB/sec
Case.java:·gc.alloc.rate     100  avgt   15   6126.333 ±  1354.964  MB/sec
Case.java:·gc.alloc.rate    1000  avgt   15   7772.263 ±  1263.453  MB/sec
Case.java:·gc.alloc.rate   10000  avgt   15  11518.422 ±  2155.516  MB/sec
Case.java:·gc.alloc.rate  100000  avgt   15  12039.594 ±  2724.242  MB/sec
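For the largest arrays the allocation rate levels off at roughly 11.5 to 12 GB/s. That ceiling is set directly by the memory bandwidth of the machine the benchmark runs on.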
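As a rough sanity check, the time per allocation can be converted into bytes written per second by one benchmark thread. The sketch below is only illustrative: the class name `AllocThroughput` is made up, and it assumes a 16-byte array header, 8-byte object alignment, and decimal megabytes. The source does not state how many benchmark threads were used, so this per-thread figure is not expected to equal the aggregate `gc.alloc.rate` column.

```java
public class AllocThroughput {
    // Assumptions, not taken from the benchmark setup:
    // 16-byte array header and 8-byte object alignment.
    static final int HEADER_BYTES = 16;
    static final int ALIGNMENT = 8;

    // Heap footprint of a byte[length] under the assumed layout.
    static long arrayBytes(int length) {
        long raw = HEADER_BYTES + (long) length;
        return (raw + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // Bytes per nanosecond equals GB/s; multiply by 1000 for (decimal) MB/s.
    static double mbPerSec(int length, double nsPerOp) {
        return arrayBytes(length) / nsPerOp * 1000.0;
    }

    public static void main(String[] args) {
        // Sizes and average times copied from the table above.
        int[] sizes = {1, 10, 100, 1000, 10000, 100000};
        double[] nsPerOp = {20.307, 26.657, 106.632, 681.176, 4576.433, 44881.095};
        for (int i = 0; i < sizes.length; i++) {
            System.out.printf("size=%-6d %8.0f MB/s per benchmark thread%n",
                    sizes[i], mbPerSec(sizes[i], nsPerOp[i]));
        }
    }
}
```

The per-thread throughput flattens out for the two largest sizes in the same way the measured allocation rate does, which is the pattern one would expect when memory bandwidth is the limit.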
Hot code (assembly)
0x00007f1f094f650b: movq $0x1,(%rdx) ; store mark word
0.00% 0x00007f1f094f6512: prefetchnta 0xc0(%r9)
0.64% 0x00007f1f094f651a: movl $0xf80000f5,0x8(%rdx) ; store klass word
0.02% 0x00007f1f094f6521: mov %r11d,0xc(%rdx) ; store array length
0x00007f1f094f6525: prefetchnta 0x100(%r9)
0.05% 0x00007f1f094f652d: prefetchnta 0x140(%r9)
0.07% 0x00007f1f094f6535: prefetchnta 0x180(%r9)
0.09% 0x00007f1f094f653d: shr $0x3,%rcx
0.00% 0x00007f1f094f6541: add $0xfffffffffffffffe,%rcx
0x00007f1f094f6545: xor %rax,%rax
0x00007f1f094f6548: cmp $0x8,%rcx
╭ 0x00007f1f094f654c: jg 0x00007f1f094f655e ; large enough? jump
│ 0x00007f1f094f654e: dec %rcx
│╭ 0x00007f1f094f6551: js 0x00007f1f094f6565 ; zero length? jump
││↗ 0x00007f1f094f6553: mov %rax,(%rdi,%rcx,8) ; small loop init
│││ 0x00007f1f094f6557: dec %rcx
││╰ 0x00007f1f094f655a: jge 0x00007f1f094f6553
││ ╭ 0x00007f1f094f655c: jmp 0x00007f1f094f6565
↘│ │ 0x00007f1f094f655e: shl $0x3,%rcx
89.12% │ │ 0x00007f1f094f6562: rep rex.W stos %al,%es:(%rdi) ; large loop init
0.20% ↘ ↘ 0x00007f1f094f6565: mov %r8,(%rsp)
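The inlined rep stos sequence of the large initialization loop accounts for most of the time: 89.12% of the samples in the listing above.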
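The control flow of that listing can be paraphrased in Java. This is only a sketch of the branching logic, not HotSpot code: the heap is simulated with a `long[]` of 8-byte words, the names `ZeroingSketch` and `zeroBody` are made up, and `Arrays.fill` stands in for the `rep stos` bulk store.

```java
import java.util.Arrays;

public class ZeroingSketch {
    // Threshold taken from `cmp $0x8,%rcx` in the listing: bodies of more
    // than 8 words take the bulk path.
    static final int SMALL_LIMIT_WORDS = 8;

    // heap      : simulated memory, one element per 8-byte word
    // bodyStart : index of the first word after the object header
    // bodyWords : number of words to clear (header words already excluded)
    static void zeroBody(long[] heap, int bodyStart, int bodyWords) {
        if (bodyWords > SMALL_LIMIT_WORDS) {
            // "large loop init": one bulk store, a stand-in for rep stos
            Arrays.fill(heap, bodyStart, bodyStart + bodyWords, 0L);
            return;
        }
        // "small loop init": word-by-word stores, counting down as the
        // assembly does; a zero-length body skips the loop entirely
        for (int i = bodyWords - 1; i >= 0; i--) {
            heap[bodyStart + i] = 0L;
        }
    }

    public static void main(String[] args) {
        long[] heap = new long[32];
        Arrays.fill(heap, -1L);   // pretend the memory holds stale data
        zeroBody(heap, 2, 4);     // small body: loop path
        zeroBody(heap, 10, 16);   // large body: bulk path
        System.out.println(Arrays.toString(heap));
    }
}
```

Either path touches every word of the array body, which is why the cost grows with the array size while the header stores stay constant.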
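Thoughts
Can we allocate an object without initializing it?
In ordinary application code an uninitialized object is of no practical use. Unsafe, however, can allocate one. Unsafe does not follow the Java specification and sometimes even violates the JVM specification. It is reachable through the jdk.internal.* packages, but using it can crash the JVM or cause other unpredictable problems, so use it with great care!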
Source code
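import jdk.internal.misc.Unsafe;
import org.openjdk.jmh.annotations.*;

import java.lang.reflect.Field;
import java.util.concurrent.TimeUnit;

// jdk.internal.misc is not exported by default. Compiling and running this
// typically needs --add-exports and --add-opens
// java.base/jdk.internal.misc=ALL-UNNAMED.
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(value = 3)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
public class Case {

    static Unsafe U;

    static {
        try {
            // Read the private singleton field reflectively.
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            U = (Unsafe) field.get(null);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    @Param({"1", "10", "100", "1000", "10000", "100000"})
    int size;

    // Allocates the array without zeroing its body; the contents are undefined.
    @Benchmark
    public byte[] unsafe() {
        return (byte[]) U.allocateUninitializedArray(byte.class, size);
    }
}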
Score
Benchmark                   (size)  Mode  Cnt        Score       Error   Units
Case.unsafe                      1  avgt   15       19.766 ±     4.002   ns/op
Case.unsafe                     10  avgt   15       27.486 ±     7.005   ns/op
Case.unsafe                    100  avgt   15       80.040 ±    15.754   ns/op
Case.unsafe                   1000  avgt   15      156.041 ±     0.552   ns/op
Case.unsafe                  10000  avgt   15      162.384 ±     1.448   ns/op
Case.unsafe                 100000  avgt   15      309.769 ±     2.819   ns/op
Case.unsafe:·gc.alloc.rate       1  avgt   15     6359.987 ±   928.472  MB/sec
Case.unsafe:·gc.alloc.rate      10  avgt   15     6193.103 ±  1160.353  MB/sec
Case.unsafe:·gc.alloc.rate     100  avgt   15     7855.147 ±  1313.314  MB/sec
Case.unsafe:·gc.alloc.rate    1000  avgt   15    33171.384 ±   153.645  MB/sec
Case.unsafe:·gc.alloc.rate   10000  avgt   15   315740.299 ±  3678.459  MB/sec
Case.unsafe:·gc.alloc.rate  100000  avgt   15  1650860.763 ± 14498.920  MB/sec
Hot code (assembly)
0x00007f65fd722c74: prefetchnta 0xc0(%r11)
66.06% 0x00007f65fd722c7c: movq $0x1,(%rax) ; store mark word
0.40% 0x00007f65fd722c83: prefetchnta 0x100(%r11)
4.43% 0x00007f65fd722c8b: movl $0xf80000f5,0x8(%rax) ; store class word
0.01% 0x00007f65fd722c92: mov %edx,0xc(%rax) ; store array length
0x00007f65fd722c95: prefetchnta 0x140(%r11)
5.18% 0x00007f65fd722c9d: prefetchnta 0x180(%r11)
4.99% 0x00007f65fd722ca5: mov %r8,0x40(%rsp)
0x00007f65fd722caa: mov %rax,%rdx
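With the zeroing gone, writing the metadata (mark word, class word, and array length) to memory accounts for most of the time.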
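To put the two result tables side by side, the sketch below divides the average time of the regular allocation by that of the uninitialized one for each size. The numbers are copied from the tables above; the class name `InitCostRatio` is made up for illustration, and the error margins are ignored, so the ratios for the smallest sizes should be read as noise rather than as a real difference.

```java
public class InitCostRatio {
    // Array sizes and average times (ns/op) copied from the two Score tables.
    static final int[] SIZES = {1, 10, 100, 1000, 10000, 100000};
    static final double[] JAVA_NS = {20.307, 26.657, 106.632, 681.176, 4576.433, 44881.095};
    static final double[] UNSAFE_NS = {19.766, 27.486, 80.040, 156.041, 162.384, 309.769};

    // How many times slower the zero-initialized allocation is at SIZES[i].
    static double ratio(int i) {
        return JAVA_NS[i] / UNSAFE_NS[i];
    }

    public static void main(String[] args) {
        for (int i = 0; i < SIZES.length; i++) {
            System.out.printf("size=%-6d new byte[]: %10.3f ns  uninitialized: %8.3f ns  ratio: %7.2f%n",
                    SIZES[i], JAVA_NS[i], UNSAFE_NS[i], ratio(i));
        }
    }
}
```

For tiny arrays the two variants cost about the same, because there is almost nothing to zero. As the array grows, the gap widens by orders of magnitude, which matches what the two assembly listings show: the zeroing loop dominates one and is absent from the other.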