Batch Insert/Update with Hibernate/JPA

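1. Overview

In this tutorial, we'll look at how to batch insert or update entities using Hibernate/JPA.

Batching allows us to send a group of SQL statements to the database in a single network call. This way, we can optimize the network and memory usage of our application.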
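To make the idea concrete, here is a minimal sketch of batching at the plain JDBC level, which is the mechanism Hibernate builds on. The class name JdbcBatchSketch, the method insertInBatches, and the way ids are assigned are illustrative assumptions, not part of the tutorial's code base. Whether a batch really travels in one round trip also depends on the JDBC driver.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class JdbcBatchSketch {

    // Statement text mirroring the insert seen in this tutorial's logs.
    static final String INSERT_SQL = "insert into school (name, id) values (?, ?)";

    /**
     * Queues one parameter set per name and sends them in groups of batchSize.
     * Returns how many times executeBatch() was called.
     * Ids are simply counted up from 1 here (an assumption for illustration).
     */
    static int insertInBatches(PreparedStatement ps, List<String> names, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int executions = 0;
        try {
            int pending = 0;
            long id = 1;
            for (String name : names) {
                ps.setString(1, name);
                ps.setLong(2, id++);
                ps.addBatch();                 // queue the row, nothing is sent yet
                if (++pending == batchSize) {
                    ps.executeBatch();         // send the queued rows together
                    executions++;
                    pending = 0;
                }
            }
            if (pending > 0) {                 // send the last, partially filled batch
                ps.executeBatch();
                executions++;
            }
        } catch (SQLException e) {
            // Rethrown unchecked only to keep the sketch short.
            throw new IllegalStateException("batch insert failed", e);
        }
        return executions;
    }
}
```

With a real connection, the statement would come from connection.prepareStatement(JdbcBatchSketch.INSERT_SQL). For 10 names and a batch size of 5, the method calls executeBatch() twice instead of executing 10 separate inserts.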

2. Setup

2.1. Sample Data Model

Let's look at the sample data model that we'll use in the examples.

First, we'll create a School entity:

@Entity
public class School {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE)
    private long id;

    private String name;

    @OneToMany(mappedBy = "school")
    private List<Student> students;

    // Getters and setters...
}

Each School will have zero or more Students:

@Entity
public class Student {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE)
    private long id;

    private String name;

    @ManyToOne
    private School school;

    // Getters and setters...
}

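2.2. Tracing SQL Queries

When running our examples, we'll need to verify that insert/update statements are indeed sent in batches. Unfortunately, we can't tell from Hibernate's log statements whether SQL statements are batched or not. Because of this, we'll use a data source proxy to trace Hibernate/JPA SQL statements: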

private static class ProxyDataSourceInterceptor implements MethodInterceptor {
    private final DataSource dataSource;
    public ProxyDataSourceInterceptor(final DataSource dataSource) {
        this.dataSource = ProxyDataSourceBuilder.create(dataSource)
            .name("Batch-Insert-Logger")
            .asJson().countQuery().logQueryToSysOut().build();
    }
   
    // Other methods...
}

3. Default Behaviour

Hibernate doesn't enable batching by default. This means that it sends a separate SQL statement for each insert/update operation:

@Transactional
@Test
public void whenNotConfigured_ThenSendsInsertsSeparately() {
    for (int i = 0; i < 10; i++) {
        School school = createSchool(i);
        entityManager.persist(school);
    }
    entityManager.flush();
}

Here, we've persisted 10 School entities. If we look at the query logs, we can see that Hibernate sends each insert statement separately:

"querySize":1,"batchSize":0,"query":["insert into school (name, id) values (?, ?)"],
 "params":[["School1","1"]]
"querySize":1,"batchSize":0,"query":["insert into school (name, id) values (?, ?)"],
 "params":[["School2","2"]]
"querySize":1,"batchSize":0,"query":["insert into school (name, id) values (?, ?)"],
 "params":[["School3","3"]]
"querySize":1,"batchSize":0,"query":["insert into school (name, id) values (?, ?)"],
 "params":[["School4","4"]]
"querySize":1,"batchSize":0,"query":["insert into school (name, id) values (?, ?)"],
 "params":[["School5","5"]]
"querySize":1,"batchSize":0,"query":["insert into school (name, id) values (?, ?)"],
 "params":[["School6","6"]]
"querySize":1,"batchSize":0,"query":["insert into school (name, id) values (?, ?)"],
 "params":[["School7","7"]]
"querySize":1,"batchSize":0,"query":["insert into school (name, id) values (?, ?)"],
 "params":[["School8","8"]]
"querySize":1,"batchSize":0,"query":["insert into school (name, id) values (?, ?)"],
 "params":[["School9","9"]]
"querySize":1,"batchSize":0,"query":["insert into school (name, id) values (?, ?)"],
 "params":[["School10","10"]]

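Therefore, we should configure Hibernate to enable batching. For this purpose, we should set the hibernate.jdbc.batch_size property to a number greater than 0.

If we're creating the EntityManager manually, we should add hibernate.jdbc.batch_size to the Hibernate properties: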

public Properties hibernateProperties() {
    Properties properties = new Properties();
    properties.put("hibernate.jdbc.batch_size","5");
   
    // Other properties...
    return properties;
}

If we're using Spring Boot, we can define it as an application property:

spring.jpa.properties.hibernate.jdbc.batch_size=5

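4. Batch Insert for a Single Table

4.1. Batch Insert Without Explicit Flush

Let's first look at how we can use batch inserts when we're dealing with only one entity type.

We'll use the previous code sample, but this time batching is enabled: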

@Transactional
@Test
public void whenInsertingSingleTypeOfEntity_thenCreatesSingleBatch() {
    for (int i = 0; i < 10; i++) {
        School school = createSchool(i);
        entityManager.persist(school);
    }
}

Here, we've persisted 10 School entities. When we look at the logs, we can verify that Hibernate sends the insert statements in batches:

"batch":true,"querySize":1,"batchSize":5,"query":["insert into school (name, id) values (?, ?)"],
 "params":[["School1","1"],["School2","2"],["School3","3"],["School4","4"],["School5","5"]]
"batch":true,"querySize":1,"batchSize":5,"query":["insert into school (name, id) values (?, ?)"],
 "params":[["School6","6"],["School7","7"],["School8","8"],["School9","9"],["School10","10"]]

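One important thing to mention here is memory consumption. When we persist an entity, Hibernate stores it in the persistence context. For example, if we persist 100,000 entities in one transaction, we'll end up with 100,000 entity instances in memory, which may cause an OutOfMemoryError.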

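4.2. Batch Insert With Explicit Flush

Now, we'll look at how we can optimize memory usage during batching operations. Let's dig deeper into the persistence context's role.

First of all, the persistence context stores newly created entities and also the modified ones in memory. Hibernate sends these changes to the database when the transaction is synchronized. This generally happens at the end of a transaction. However, calling EntityManager.flush() also triggers a transaction synchronization.

Secondly, the persistence context serves as an entity cache, which is why it's also referred to as the first-level cache. To clear the entities in the persistence context, we can call EntityManager.clear().

So, to reduce the memory load during batching, we can call EntityManager.flush() and EntityManager.clear() in our application code whenever the batch size is reached: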

@Transactional
@Test
public void whenFlushingAfterBatch_ThenClearsMemory() {
    for (int i = 0; i < 10; i++) {
        if (i > 0 && i % BATCH_SIZE == 0) {
            entityManager.flush();
            entityManager.clear();
        }
        School school = createSchool(i);
        entityManager.persist(school);
    }
}

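Here, we're flushing the entities in the persistence context, which makes Hibernate send the queries to the database. Furthermore, by clearing the persistence context, we're removing the School entities from memory. The batching behaviour remains the same.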
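The effect on memory can be illustrated with a small model in plain Java. This is only a sketch: PersistenceContextModel and peakManagedEntities are hypothetical names, and a simple list stands in for the persistence context. The loop follows the same flush-and-clear condition as the test above.

```java
import java.util.ArrayList;
import java.util.List;

class PersistenceContextModel {

    /**
     * Returns the largest number of entities held at once while
     * "persisting" total entities, with or without periodic flush and clear.
     */
    static int peakManagedEntities(int total, int batchSize, boolean flushAndClear) {
        List<Integer> managed = new ArrayList<>(); // stands in for the persistence context
        int peak = 0;
        for (int i = 0; i < total; i++) {
            if (flushAndClear && i > 0 && i % batchSize == 0) {
                managed.clear(); // models entityManager.flush() followed by entityManager.clear()
            }
            managed.add(i);      // models entityManager.persist(entity)
            peak = Math.max(peak, managed.size());
        }
        return peak;
    }

    public static void main(String[] args) {
        System.out.println(peakManagedEntities(100_000, 5, false)); // prints 100000
        System.out.println(peakManagedEntities(100_000, 5, true));  // prints 5
    }
}
```

In this model, the number of entities held at once never exceeds the batch size when we flush and clear periodically, whereas without it the count grows with the total number of persisted entities.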

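5. Batch Insert for Multiple Tables

Now let's see how we can configure batch inserts when dealing with multiple entity types in one transaction.

When we want to persist entities of several types, Hibernate creates a different batch for each entity type. This is because there can be only one type of entity in a single batch.

Additionally, as Hibernate collects insert statements, it creates a new batch whenever it encounters an entity type different from the one in the current batch. This is the case even if a batch for that entity type already exists: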

@Transactional
@Test
public void whenThereAreMultipleEntities_ThenCreatesNewBatch() {
    for (int i = 0; i < 10; i++) {
        if (i > 0 && i % BATCH_SIZE == 0) {
            entityManager.flush();
            entityManager.clear();
        }
        School school = createSchool(i);
        entityManager.persist(school);
        Student firstStudent = createStudent(school);
        Student secondStudent = createStudent(school);
        entityManager.persist(firstStudent);
        entityManager.persist(secondStudent);
    }
}

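Here, we're inserting a School and assigning it 2 Students, and we repeat this process 10 times.

In the logs, we see that Hibernate sends School insert statements in several batches of size 1, while we were expecting only 2 batches of size 5. Moreover, Student insert statements are also sent in several batches of size 2, instead of 4 batches of size 5: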

"batch":true,"querySize":1,"batchSize":1,"query":["insert into school (name, id) values (?, ?)"],
 "params":[["School1","1"]]
"batch":true,"querySize":1,"batchSize":2,"query":["insert into student (name, school_id, id)
  values (?, ?, ?)"
],"params":[["Student-School1","1","2"],["Student-School1","1","3"]]
"batch":true,"querySize":1,"batchSize":1,"query":["insert into school (name, id) values (?, ?)"],
 "params":[["School2","4"]]
"batch":true,"querySize":1,"batchSize":2,"query":["insert into student (name, school_id, id)
  values (?, ?, ?)"
],"params":[["Student-School2","4","5"],["Student-School2","4","6"]]
"batch":true,"querySize":1,"batchSize":1,"query":["insert into school (name, id) values (?, ?)"],
 "params":[["School3","7"]]
"batch":true,"querySize":1,"batchSize":2,"query":["insert into student (name, school_id, id)
  values (?, ?, ?)"
],"params":[["Student-School3","7","8"],["Student-School3","7","9"]]
Other log lines...

To batch all insert statements of the same entity type, we should configure the hibernate.order_inserts property.

We can configure the Hibernate property manually using EntityManagerFactory:

public Properties hibernateProperties() {
    Properties properties = new Properties();
    properties.put("hibernate.order_inserts","true");
   
    // Other properties...
    return properties;
}

If we're using Spring Boot, we can configure the property in application.properties:

spring.jpa.properties.hibernate.order_inserts=true

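After adding this property, we'll have 1 batch for School inserts and 2 batches for Student inserts per flushed group of five schools (the log below shows the second group):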

"batch":true,"querySize":1,"batchSize":5,"query":["insert into school (name, id) values (?, ?)"],
 "params":[["School6","16"],["School7","19"],["School8","22"],["School9","25"],["School10","28"]]
"batch":true,"querySize":1,"batchSize":5,"query":["insert into student (name, school_id, id)
  values (?, ?, ?)"
],"params":[["Student-School6","16","17"],["Student-School6","16","18"],
  ["Student-School7","19","20"],["Student-School7","19","21"],["Student-School8","22","23"]]
"batch":true,"querySize":1,"batchSize":5,"query":["insert into student (name, school_id, id)
  values (?, ?, ?)"
],"params":[["Student-School8","22","24"],["Student-School9","25","26"],
  ["Student-School9","25","27"],["Student-School10","28","29"],["Student-School10","28","30"]]
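The difference between the two logs can be reproduced with a small model in plain Java. This is a simplified sketch, not Hibernate's actual implementation: InsertBatchModel and its methods are hypothetical names, and the rule "start a new batch when the entity type changes or the batch is full" is an assumption drawn from the behaviour described in this section. Ordering is modeled as grouping the inserts by entity type in order of first appearance.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class InsertBatchModel {

    /** Builds the insert sequence for one flushed group: each School followed by two Students. */
    static List<String> flushGroup(int schools) {
        List<String> inserts = new ArrayList<>();
        for (int i = 0; i < schools; i++) {
            inserts.add("School");
            inserts.add("Student");
            inserts.add("Student");
        }
        return inserts;
    }

    /** Returns the size of every batch that would be sent for the given insert sequence. */
    static List<Integer> batchSizes(List<String> inserts, int batchSize, boolean orderInserts) {
        List<String> queue = inserts;
        if (orderInserts) {
            // Group inserts by entity type, keeping the order in which types first appear.
            Map<String, List<String>> byType = new LinkedHashMap<>();
            for (String type : inserts) {
                byType.computeIfAbsent(type, k -> new ArrayList<>()).add(type);
            }
            queue = new ArrayList<>();
            for (List<String> group : byType.values()) {
                queue.addAll(group);
            }
        }
        List<Integer> sizes = new ArrayList<>();
        String currentType = null;
        int currentSize = 0;
        for (String type : queue) {
            boolean typeChanged = currentType != null && !currentType.equals(type);
            if (typeChanged || currentSize == batchSize) {
                sizes.add(currentSize); // close the current batch
                currentSize = 0;
            }
            currentType = type;
            currentSize++;
        }
        if (currentSize > 0) {
            sizes.add(currentSize);     // close the last batch
        }
        return sizes;
    }

    public static void main(String[] args) {
        List<String> group = flushGroup(5);
        System.out.println(batchSizes(group, 5, false)); // prints [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
        System.out.println(batchSizes(group, 5, true));  // prints [5, 5, 5]
    }
}
```

Without ordering, every change of entity type closes the current batch, so no batch ever fills up. With ordering, the five School inserts form one full batch and the ten Student inserts form two.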

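6. Batch Update

Now, let's move on to batch updates. Similar to batch inserts, we can group several update statements and send them to the database in one go.

To enable this, we'll configure the hibernate.order_updates and hibernate.jdbc.batch_versioned_data properties.

If we're creating our EntityManagerFactory manually, we can set the properties programmatically: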

public Properties hibernateProperties() {
    Properties properties = new Properties();
    properties.put("hibernate.order_updates","true");
    properties.put("hibernate.jdbc.batch_versioned_data","true");
   
    // Other properties...
    return properties;
}

If we're using Spring Boot, we'll just add them to application.properties:

spring.jpa.properties.hibernate.order_updates=true
spring.jpa.properties.hibernate.jdbc.batch_versioned_data=true

After configuring these properties, Hibernate should group update statements in batches:

@Transactional
@Test
public void whenUpdatingEntities_thenCreatesBatch() {
    TypedQuery<School> schoolQuery =
      entityManager.createQuery("SELECT s from School s", School.class);
    List<School> allSchools = schoolQuery.getResultList();
    for (School school : allSchools) {
        school.setName("Updated_" + school.getName());
    }
}

Here, we've updated the School entities, and Hibernate sends the SQL statements in 2 batches of size 5:

"batch":true,"querySize":1,"batchSize":5,"query":["update school set name=? where id=?"],
 "params":[["Updated_School1","1"],["Updated_School2","2"],["Updated_School3","3"],
  ["Updated_School4","4"],["Updated_School5","5"]]
"batch":true,"querySize":1,"batchSize":5,"query":["update school set name=? where id=?"],
 "params":[["Updated_School6","6"],["Updated_School7","7"],["Updated_School8","8"],
  ["Updated_School9","9"],["Updated_School10","10"]]

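7. @Id Generation Strategy

When we want to use batching for inserts/updates, we should be aware of the primary key generation strategy. If our entities use the GenerationType.IDENTITY identifier generator, Hibernate will silently disable batch inserts.

Since the entities in our examples use the GenerationType.SEQUENCE identifier generator, Hibernate enables batch operations: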

@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE)
private long id;

8. Summary

In this article, we looked at batch inserts and updates using Hibernate/JPA.

Check out the code samples for this article on GitHub.