Scala: assembly-merge-strategy issues using sbt-assembly

I am trying to turn a Scala project into a deployable fat jar using sbt-assembly. When I run the assembly task in sbt, I get the following error:

Merging 'org/apache/commons/logging/impl/SimpleLog.class' with strategy 'deduplicate'
    :assembly: deduplicate: different file contents found in the following:
    [error] /Users/home/.ivy2/cache/commons-logging/commons-logging/jars/commons-logging-1.1.1.jar:org/apache/commons/logging/impl/SimpleLog.class
    [error] /Users/home/.ivy2/cache/org.slf4j/jcl-over-slf4j/jars/jcl-over-slf4j-1.6.4.jar:org/apache/commons/logging/impl/SimpleLog.class

Now, from the sbt-assembly documentation:

If multiple files share the same relative path (e.g. a resource named
application.conf in multiple dependency JARs), the default strategy is
to verify that all candidates have the same contents and error out
otherwise. This behavior can be configured on a per-path basis using
either one of the following built-in strategies or writing a custom one:

  • MergeStrategy.deduplicate is the default described above
  • MergeStrategy.first picks the first of the matching files in classpath order
  • MergeStrategy.last picks the last one
  • MergeStrategy.singleOrError bails out with an error message on conflict
  • MergeStrategy.concat simply concatenates all matching files and includes the result
  • MergeStrategy.filterDistinctLines also concatenates, but leaves out duplicates along the way
  • MergeStrategy.rename renames the files originating from jar files
  • MergeStrategy.discard simply discards matching files

Based on this, I set up my build.sbt as follows:

import sbt._
import Keys._
import sbtassembly.Plugin._
import AssemblyKeys._
name := "my-project"
version := "0.1"
scalaVersion := "2.9.2"
crossScalaVersions := Seq("2.9.1", "2.9.2")

//assemblySettings
seq(assemblySettings: _*)

resolvers ++= Seq(
   "Typesafe Releases Repository" at "http://repo.typesafe.com/typesafe/releases/",
   "Typesafe Snapshots Repository" at "http://repo.typesafe.com/typesafe/snapshots/",
   "Sonatype Repository" at "http://oss.sonatype.org/content/repositories/releases/"
)

libraryDependencies ++= Seq(
   "org.scalatest" %% "scalatest" % "1.6.1" % "test",
   "org.clapper" %% "grizzled-slf4j" % "0.6.10",
   "org.scalaz" % "scalaz-core_2.9.2" % "7.0.0-M7",
   "net.databinder.dispatch" %% "dispatch-core" % "0.9.5"
)

scalacOptions += "-deprecation"
mainClass in assembly := Some("com.my.main.class")
test in assembly := {}
mergeStrategy in assembly := mergeStrategy.first

In the last line of build.sbt, I have:

mergeStrategy in assembly := mergeStrategy.first

Now, when I run sbt, I get the following error:

error: value first is not a member of sbt.SettingKey[String => sbtassembly.Plugin.MergeStrategy]
    mergeStrategy in assembly := mergeStrategy.first

Can someone point out what I'm doing wrong here?

Thanks


For the current version, 0.11.2 (as of 2014-03-25), merge strategies are defined differently.

It is documented here; the relevant part is:

NOTE:
mergeStrategy in assembly expects a function, you can't do

mergeStrategy in assembly := MergeStrategy.first

The new way is (copied from the same source):

mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
  {
    case PathList("javax", "servlet", xs @ _*)         => MergeStrategy.first
    case PathList(ps @ _*) if ps.last endsWith ".html" => MergeStrategy.first
    case "application.conf"                            => MergeStrategy.concat
    case "unwanted.txt"                                => MergeStrategy.discard
    case x => old(x)
  }
}
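Applied to the error in the question, a minimal sketch in this 0.11.x style might look like the following. It assumes the imports and `assemblySettings` from the question's build.sbt are already in place; the `PathList` prefix is taken from the conflicting path in the error message. Note that `MergeStrategy.first` keeps whichever jar comes first on the classpath, so which `SimpleLog.class` ends up in the fat jar depends on dependency order.

```scala
mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
  {
    // commons-logging and jcl-over-slf4j both ship org/apache/commons/logging/**;
    // keep the copy from the jar that appears first on the classpath
    case PathList("org", "apache", "commons", "logging", xs @ _*) => MergeStrategy.first
    // Every other path keeps the plugin's previous (default) strategy
    case x => old(x)
  }
}
```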

This may also work for earlier versions; I don't know exactly when it changed.


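I just set up a small sbt project that needed to rewire some merge strategies and found the answers somewhat outdated, so here is my working code for the following versions (as of 2015-07-04):

  • sbt 0.13.8
  • Scala 2.11.6
  • sbt-assembly 0.13.0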

    mergeStrategy in assembly := {
      case x if x.startsWith("META-INF") => MergeStrategy.discard // Bumf
      case x if x.endsWith(".html") => MergeStrategy.discard // More bumf
      case x if x.contains("slf4j-api") => MergeStrategy.last
      case x if x.contains("org/cyberneko/html") => MergeStrategy.first
      case PathList("com", "esotericsoftware", xs @ _*) => MergeStrategy.last // For Log$Logger.class
      case x =>
         val oldStrategy = (mergeStrategy in assembly).value
         oldStrategy(x)
    }


I think it should be the capitalized MergeStrategy.first: mergeStrategy in assembly := MergeStrategy.first
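Capitalizing does address the `value first is not a member of sbt.SettingKey[...]` error: lowercase `mergeStrategy` is the setting key itself, while `MergeStrategy` is the object that holds the built-in strategies. However, the same error message shows that the key's type is `String => MergeStrategy`, so the right-hand side still has to be a function from path to strategy. A minimal sketch, assuming the question's old-style sbt-assembly setup, that applies `first` to every path:

```scala
// The key expects a function String => MergeStrategy, not a bare strategy.
// Blunt fix: this resolves every duplicate path with "first", hiding all
// conflicts rather than just the commons-logging one.
mergeStrategy in assembly := { (path: String) => MergeStrategy.first }
```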


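This is a proper way to merge most common Java/Scala projects.
It takes care of META-INF and classes.

It also preserves service registrations under META-INF/services by merging them with filterDistinctLines instead of discarding them.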

assemblyMergeStrategy in assembly := {
  case x if Assembly.isConfigFile(x) =>
    MergeStrategy.concat
  case PathList(ps @ _*) if Assembly.isReadme(ps.last) || Assembly.isLicenseFile(ps.last) =>
    MergeStrategy.rename
  case PathList("META-INF", xs @ _*) =>
    (xs map {_.toLowerCase}) match {
      case ("manifest.mf" :: Nil) | ("index.list" :: Nil) | ("dependencies" :: Nil) =>
        MergeStrategy.discard
      case ps @ (x :: xs) if ps.last.endsWith(".sf") || ps.last.endsWith(".dsa") =>
        MergeStrategy.discard
      case "plexus" :: xs =>
        MergeStrategy.discard
      case "services" :: xs =>
        MergeStrategy.filterDistinctLines
      case ("spring.schemas" :: Nil) | ("spring.handlers" :: Nil) =>
        MergeStrategy.filterDistinctLines
      case _ => MergeStrategy.first
    }
  case _ => MergeStrategy.first
}

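With a newer sbt version (0.13.11) I got an error for slf4j. Below is a quick workaround for the time being. Also check the answer to "Scala SBT assembly cannot merge due to de-duplication error in StaticLoggerBinder.class", which mentions the sbt-dependency-graph tool; using it to sort out the dependencies by hand is a nicer approach.
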
assemblyMergeStrategy in assembly <<= (assemblyMergeStrategy in assembly) {
  (old) => {
    case PathList("META-INF", xs @ _*) => MergeStrategy.discard
    case x => MergeStrategy.first
  }
}
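Resolving the conflict by hand usually means keeping one of the two jars off the classpath instead of picking a winner at merge time. A minimal sketch for the commons-logging / jcl-over-slf4j clash from the question, assuming you want to keep `jcl-over-slf4j` (which reimplements the commons-logging API on top of SLF4J) and that it is already among your resolved dependencies; the sbt-dependency-graph tool mentioned above can show which dependency pulls in each jar.

```scala
// Must come after the libraryDependencies ++= Seq(...) setting, since .sbt
// settings are applied in order. Removes the transitive commons-logging jar
// from every declared dependency, so org/apache/commons/logging/** is then
// provided only by jcl-over-slf4j and no merge strategy is needed for it.
libraryDependencies ~= { deps =>
  deps.map(_.exclude("commons-logging", "commons-logging"))
}
```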

Quick update: mergeStrategy is deprecated; use assemblyMergeStrategy instead. Apart from that, the earlier answers still hold.
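In sbt 1.x the `<<=` operator no longer exists and the `key in scope` form has given way to slash syntax, so a sketch of the renamed key in current style, assuming sbt 1.x with sbt-assembly 1.x or 2.x (whose auto-imports provide `PathList` and `MergeStrategy`), would read the previous value with `.value` inside the setting:

```scala
ThisBuild / assemblyMergeStrategy := {
  // Same conflict as in the question: keep the first copy of the commons-logging classes
  case PathList("org", "apache", "commons", "logging", xs @ _*) => MergeStrategy.first
  // Delegate every other path to the plugin's previous (default) strategy
  case x =>
    val oldStrategy = (ThisBuild / assemblyMergeStrategy).value
    oldStrategy(x)
}
```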


Add the following to build.sbt to use Kafka as a source or sink:

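 assemblyMergeStrategy in assembly := {
 // To use Kafka as a source: keep the DataSourceRegister service file by merging all copies.
 // This case must come before the general META-INF rule; otherwise the file is
 // discarded and this case is never reached.
 case PathList("META-INF", "services", "org.apache.spark.sql.sources.DataSourceRegister") =>
 MergeStrategy.concat
 case PathList("META-INF", xs @ _*) => MergeStrategy.discard
 case x => MergeStrategy.first
 }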