Using flush() on every 100 rows out of 10,000 slows down the transaction

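I have a sample project that uses spring-boot with spring-data-jpa and a Postgres database with a single table.

I am trying to INSERT 10,000 records into the table in a loop and measure the execution time, with the EntityManager's flush() call on every 100 records either enabled or disabled.

I expected the execution time with flush() enabled to be much lower than with it disabled, but I actually get the opposite result.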

UserService.java


package sample.data;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class UserService {
    @Autowired
    UserRepository userRepository;

    public User save(User user) {
        return userRepository.save(user);
    }
}

UserRepository.java


package sample.data;

import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.stereotype.Repository;

@Repository
public interface UserRepository extends JpaRepository<User, Long> { }

Application.java


package sample;

import org.springframework.data.jpa.repository.config.EnableJpaRepositories;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.transaction.annotation.Transactional;

import sample.data.User;
import sample.data.UserService;

import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

@SpringBootApplication
@EnableJpaRepositories(considerNestedRepositories = true)
public class Application {
    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }

    @Autowired
    private UserService userService;

    @PersistenceContext
    EntityManager entityManager;

    @Bean
    public CommandLineRunner addUsers() {
        return new CommandLineRunner() {
            @Transactional
            public void run(String... args) throws Exception {
                long incoming = System.currentTimeMillis();
                for (int i = 1; i <= 10000; i++) {
                    userService.save(new User("name_" + i));

                    if (i % 100 == 0) {
                        entityManager.flush();
                        entityManager.clear();
                    }
                }
                entityManager.close();
                System.out.println("Time:" + (System.currentTimeMillis() - incoming));
            }
        };
    }
}

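Related discussion

Make sure that JDBC batching is enabled in your persistence provider configuration. If you are using Hibernate, add this to your Spring properties: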

# or some other reasonable value
spring.jpa.properties.hibernate.jdbc.batch_size=20

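Without batching enabled, I suppose the performance degradation is caused by the overhead of clearing the persistence context every 100 entities, but I am not sure (it would have to be measured).

UPDATE:

Actually, enabling or disabling JDBC batching does not change the fact that calling flush() every once in a while will not be faster than running without it. What you control with a manual flush() is not how the flush is done (via batched statements or single inserts) but when the flush to the database happens.

So what you are comparing is the following:

With flush() on every 100 objects: each flush inserts 100 instances into the database, and you do this 10000 / 100 = 100 times.

Without flush(): you just collect all 10000 objects in the persistence context in memory and do the 10000 inserts when the transaction commits.

JDBC batching, on the other hand, affects how the flush happens, but the number of statements issued is still the same with flush() as without it.

The benefit of flushing and clearing every once in a while in a loop is that it avoids a possible OutOfMemoryError caused by the cache holding too many objects.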
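The comparison above can be sketched with a toy model. This is only an illustration under stated assumptions: ToyPersistenceContext is a hypothetical stand-in written for this example, not Hibernate's persistence context, and it only counts what would be sent rather than talking to a database.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a persistence context (hypothetical, not Hibernate's implementation). */
class ToyPersistenceContext {
    private final List<String> pending = new ArrayList<>(); // entities waiting for their INSERT
    int flushCalls = 0;        // flushes that actually had work to push out
    int insertStatements = 0;  // total INSERTs issued across all flushes
    int peakPending = 0;       // most entities held in memory at the same time

    void persist(String entity) {
        pending.add(entity);
        peakPending = Math.max(peakPending, pending.size());
    }

    /** Issues one INSERT per pending entity, then forgets them (flush followed by clear). */
    void flushAndClear() {
        if (pending.isEmpty()) {
            return; // nothing to write
        }
        flushCalls++;
        insertStatements += pending.size();
        pending.clear();
    }

    /** Persists rows entities, flushing every flushEvery rows (0 means only at commit). */
    static ToyPersistenceContext run(int rows, int flushEvery) {
        ToyPersistenceContext ctx = new ToyPersistenceContext();
        for (int i = 1; i <= rows; i++) {
            ctx.persist("name_" + i);
            if (flushEvery > 0 && i % flushEvery == 0) {
                ctx.flushAndClear();
            }
        }
        ctx.flushAndClear(); // the flush that happens at commit time
        return ctx;
    }

    public static void main(String[] args) {
        ToyPersistenceContext periodic = run(10_000, 100);
        ToyPersistenceContext atCommit = run(10_000, 0);
        System.out.printf("every 100:   flushes=%d inserts=%d peakInMemory=%d%n",
                periodic.flushCalls, periodic.insertStatements, periodic.peakPending);
        System.out.printf("commit only: flushes=%d inserts=%d peakInMemory=%d%n",
                atCommit.flushCalls, atCommit.insertStatements, atCommit.peakPending);
    }
}
```

In this model both runs issue the same 10000 inserts; the only things that differ are how many flushes there are (100 versus 1) and how many objects sit in memory at once (100 versus 10000), which is the memory benefit described above.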

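Related discussion

Writing micro-benchmarks is very hard, as Aleksey Shipilev illustrates well in his "JMH vs Caliper: reference thread" post. Your case is not exactly a micro-benchmark, but:

Fewer than 10,000 repetitions will not let the JVM warm up and JIT-compile the code on default settings. Warm up the JVM before you measure code performance.

Use System.nanoTime(), not System.currentTimeMillis(), to measure elapsed time. If you measure in ms, your results will be skewed by the clock drift in System.currentTimeMillis().

You most likely want to measure on the database end to pinpoint the bottleneck. Without knowing the bottleneck it is hard to understand what the root cause is; for example, your database may be on the other side of the Atlantic Ocean, and the network connection cost will overshadow the cost of the INSERT statements.

Is your benchmark sufficiently isolated? If the database is shared by multiple users and connections other than your benchmark, its performance will vary.

Find the bottleneck in the current setup, form a hypothesis about how to verify it, change the benchmark to match that hypothesis, and then measure again to confirm. That is the only way to find out.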
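The warm-up and System.nanoTime() advice can be sketched as a small timing helper. This is a minimal sketch, not a replacement for a benchmark harness such as JMH: ElapsedTimer and its measure method are hypothetical names introduced for this example, the warm-up and run counts are arbitrary, and the workload in main is a placeholder for the real insert loop.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: runs a task untimed a few times, then times it with System.nanoTime(). */
class ElapsedTimer {

    /** Returns the elapsed nanoseconds of each timed run, after the given number of untimed warm-up runs. */
    static long[] measure(Runnable task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            task.run(); // warm-up: results are discarded
        }
        long[] elapsed = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            elapsed[i] = System.nanoTime() - start;
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Placeholder workload; in the question this would be the 10,000-row insert loop.
        Runnable workload = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 1; i <= 10_000; i++) {
                sb.append("name_").append(i);
            }
            if (sb.length() == 0) {
                throw new IllegalStateException("workload produced no output");
            }
        };
        for (long nanos : measure(workload, 5, 5)) {
            System.out.println(TimeUnit.NANOSECONDS.toMicros(nanos) + " us");
        }
    }
}
```

Reporting every timed run instead of a single number makes it easier to see whether the measurements are stable or still drifting as the JVM warms up.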

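Related discussion

Can you explain why you believe:

Expected result is that execution time with enabled flush() method is much less than with disabled one

That looks like a fundamentally wrong assumption to me. There is no good reason to believe that performing a trivial operation 10k times will be faster with intermediate flushes than without them.

As long as all the records fit in memory, I would expect the version without intermediate flushes to be faster. What suggests that doing network IO to reach the database 100 times should be faster than doing it once at the end?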

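Related discussion

Two aspects that the other answers do not mention. In addition to flushing, you need to clear the Hibernate session. If you do not clear it, it will keep growing and affect your memory consumption, which may lead to a performance degradation.

When persisting entities, also make sure that your ID generator uses a hi/lo sequence. If your IDs are 1, 2, 3, 4, 5, ... each insert will incur an extra round trip to increment the ID.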
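The round-trip saving from block-based ID allocation can be sketched as follows. This is a simplified model for illustration only: PooledIdAllocator is a hypothetical class, not Hibernate's hi/lo optimizer, and the database sequence is simulated with a plain counter. In JPA the block size roughly corresponds to the allocationSize attribute of @SequenceGenerator.

```java
/** Hypothetical model of block-based ID allocation; the database sequence is simulated in memory. */
class PooledIdAllocator {
    private final int allocationSize; // how many IDs one sequence call reserves
    private long next = 0;            // next ID to hand out
    private long blockEnd = 0;        // exclusive upper bound of the reserved block
    private long sequenceValue = 0;   // stand-in for the database sequence
    int sequenceCalls = 0;            // simulated round trips to the database

    PooledIdAllocator(int allocationSize) {
        if (allocationSize < 1) {
            throw new IllegalArgumentException("allocationSize must be at least 1");
        }
        this.allocationSize = allocationSize;
    }

    /** Simulates one round trip that reserves a block of allocationSize IDs. */
    private void fetchBlock() {
        sequenceCalls++;
        next = sequenceValue + 1;
        sequenceValue += allocationSize;
        blockEnd = sequenceValue + 1;
    }

    /** Hands out the next ID, reserving a new block only when the current one is used up. */
    long nextId() {
        if (next >= blockEnd) {
            fetchBlock();
        }
        return next++;
    }

    public static void main(String[] args) {
        PooledIdAllocator oneByOne = new PooledIdAllocator(1);
        PooledIdAllocator pooled = new PooledIdAllocator(50);
        for (int i = 0; i < 10_000; i++) {
            oneByOne.nextId();
            pooled.nextId();
        }
        System.out.println("allocationSize=1:  " + oneByOne.sequenceCalls + " sequence calls");
        System.out.println("allocationSize=50: " + pooled.sequenceCalls + " sequence calls");
    }
}
```

With a block size of 1 the model makes one simulated sequence call per ID (10000 in total), while a block size of 50 needs only 200 calls for the same 10000 IDs.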

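Related discussion

The most expensive part of persisting an entity is writing it to the database. The time spent persisting the entity in JPA is trivial by comparison, because it is a pure in-memory operation. It is IO versus memory.

Writing to the database can also carry a fairly large static overhead, which means that the number of times you write to the database can affect the execution time. When you call EntityManager#flush, you instruct Hibernate to write all pending changes to the database.

So what you are doing is comparing 100 database writes against a single database write. Because of the IO overhead, the former will be considerably slower.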