Cryptography: Why doesn't Git use a more modern SHA?

I read that Git uses SHA-1 digests as revision IDs. Why doesn't it use a more modern version of SHA?


UPDATE: The question above and this answer date from 2015. Since then, Google has announced the first SHA-1 collision: https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html

Obviously I can only speculate from the outside about why Git continues to use SHA-1, but these may be among the reasons:

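  • Git is Linus Torvalds' creation, and Linus apparently does not want to replace SHA-1 with another hash algorithm at this time.
  • He makes plausible claims that a successful SHA-1 collision-based attack against Git is a good deal harder than producing the collision itself. Given that SHA-1 is weaker than it should be rather than completely broken, that puts a workable attack substantially out of reach, at least today. He also notes that a "successful" attack would achieve very little if the colliding object arrives later than the existing one, because the later object would be assumed to be identical to the valid one and ignored (although others have pointed out that the reverse could occur).
  • Changing software is time-consuming and error-prone, especially when there is existing infrastructure and data, built on the existing protocols, that will have to be migrated. Even producers of software and hardware products where cryptographic security is the whole point of the system are still in the process of moving away from SHA-1 and other weak algorithms. Just imagine all those hard-coded unsigned char[20] buffers ;-). It is much easier to program for cryptographic agility from the start than to retrofit it later.
  • SHA-1 performs better than the various SHA-2 hashes (probably not by enough to be a deal-breaker now, but it may have been a sticking point 10 years ago), and SHA-2 digests take more storage.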
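  • Some links:

    • A Stack Overflow question about what would happen if a collision occurred in Git
    • A newsgroup post showing Linus commenting briefly on this topic in 2005, a few months after the major weaknesses in SHA-1 became known
    • A thread from 2006 discussing the weaknesses and a possible move to SHA-256 (with replies from Linus)
    • The NIST statement on SHA-1 deprecation, recommending a "rapid transition to the stronger SHA-2 family of hash functions"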
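
To make the size argument concrete, here is a minimal sketch of how a blob's object ID can be derived and how the digest length changes with the hash. The function name `git_blob_id` is made up for illustration. The SHA-1 framing (`blob <size>\0` followed by the content) is the well-known Git blob format; applying the same framing with SHA-256 is an assumption of this sketch.

```python
import hashlib


def git_blob_id(content: bytes, algo: str = "sha1") -> str:
    """Return the hex ID for a blob: hash of 'blob <size>\\0' plus the content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.new(algo, header + content).hexdigest()


sha1_id = git_blob_id(b"")
sha256_id = git_blob_id(b"", "sha256")  # assumption: same framing, different hash

# A SHA-1 ID is 20 raw bytes (40 hex characters); a SHA-256 ID is 32 (64 hex).
print(sha1_id, len(sha1_id) // 2)
print(sha256_id, len(sha256_id) // 2)
```

Every place that stores or parses an ID as exactly 20 bytes or 40 hex characters has to change before the second call can be used for real, which is the retrofit cost described in the list above.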

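    My personal view is that practical attacks are probably still some time off, and that even when they do occur, people will probably first mitigate them by means other than changing the hash algorithm itself. Still, if you do care about security, you should err on the side of caution in your choice of algorithms and keep revising your security strength upwards, because the capabilities of attackers also move in only one direction. It would therefore be unwise to take Git as a role model, especially as its purpose in using SHA-1 does not purport to be cryptographic security.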


    Why does it not use a more modern version of SHA?

    December 2017: It will. Git 2.16 (Q1 2018) is the first release to illustrate and implement that intent.

    Note: see Git 2.19 below: it will be SHA-256.

    Git 2.16 will propose an infrastructure to define which hash function Git uses, and will start an effort to plumb that through the various code paths.

    See commit c250e02 (28 Nov 2017) by Ramsay Jones.
    See commit eb0ccfd, commit 78a6766, commit f50e766, commit abade65 (12 Nov 2017) by brian m. carlson (bk2204).
    (Merged by Junio C Hamano -- gitster -- in commit 721cc43, 13 Dec 2017)

    Add structure representing hash algorithm

    Since in the future we want to support an additional hash algorithm, add a structure that represents a hash algorithm and all the data that must go along with it.
    Add a constant to allow easy enumeration of hash algorithms.
    Implement function typedefs to create an abstract API that can be used by any hash algorithm, and wrappers for the existing SHA1 functions that conform to this API.

    Expose a value for hex size as well as binary size.
    While one will always be twice the other, the two values are both used extremely
    commonly throughout the codebase and providing both leads to improved readability.

    Don't include an entry in the hash algorithm structure for the null object ID.
    As this value is all zeros, any suitably sized all-zero object ID can be used, and there's no need to store a given one on a per-hash basis.

    The current hash function transition plan envisions a time when we will accept input from the user that might be in SHA-1 or in the NewHash format.
    Since we cannot know which the user has provided, add a constant representing the unknown algorithm to allow us to indicate that we must look the correct value up.
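
A rough Python analogue of the structure this commit message describes might look like the following. All names here (`HashAlgo`, `HASH_UNKNOWN`, `HASH_ALGOS`, `hash_bytes`, `null_oid`) are hypothetical; Git's real definitions are C structs and constants and may differ in detail.

```python
import hashlib
from dataclasses import dataclass
from typing import Any, Callable, Optional

# Hypothetical identifiers, chosen for this sketch only.
HASH_UNKNOWN = 0  # input whose algorithm still has to be looked up
HASH_SHA1 = 1
HASH_NALGOS = 2   # constant that makes enumerating the table easy


@dataclass(frozen=True)
class HashAlgo:
    name: str
    rawsz: int  # digest size in bytes
    hexsz: int  # digest size in hex characters, always 2 * rawsz
    new_ctx: Optional[Callable[[], Any]]  # factory for an init/update/final context


HASH_ALGOS = [
    HashAlgo(name="unknown", rawsz=0, hexsz=0, new_ctx=None),
    HashAlgo(name="sha1", rawsz=20, hexsz=40, new_ctx=hashlib.sha1),
]


def hash_bytes(algo_id: int, data: bytes) -> bytes:
    """Hash data through the abstract API, whichever algorithm is selected."""
    algo = HASH_ALGOS[algo_id]
    if algo.new_ctx is None:
        raise ValueError("the algorithm must be looked up before hashing")
    ctx = algo.new_ctx()  # init
    ctx.update(data)      # update
    return ctx.digest()   # final


def null_oid(algo_id: int) -> bytes:
    """No null ID is stored per algorithm: any all-zero buffer of the right size works."""
    return bytes(HASH_ALGOS[algo_id].rawsz)


print(hash_bytes(HASH_SHA1, b"abc").hex())
print(null_oid(HASH_SHA1).hex())
```

Adding a second algorithm then means appending one table entry rather than touching every caller.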

    Integrate hash algorithm support with repo setup

    In future versions of Git, we plan to support an additional hash
    algorithm.
    Integrate the enumeration of hash algorithms with repository setup, and store a pointer to the enumerated data in struct repository.
    Of course, we currently only support SHA-1, so hard-code this value in
    read_repository_format.
    In the future, we'll enumerate this value from the configuration.

    Add a constant, the_hash_algo, which points to the hash_algo structure pointer in the repository global.
    Note that this is the hash which is used to serialize data to disk, not the hash which is used to display items to the user.
    The transition plan anticipates that these may be different.
    We can add an additional element in the future (say, ui_hash_algo) to provide for this case.
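
The repository integration described above can be sketched like this. It is only an illustration: `Repository`, `RepositoryFormat`, `setup_repository` and the config dictionary are hypothetical stand-ins for Git's C code, and only the names `read_repository_format` and `the_hash_algo` are taken from the commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HashAlgo:
    name: str
    rawsz: int
    hexsz: int


SHA1 = HashAlgo("sha1", 20, 40)


@dataclass
class RepositoryFormat:
    version: int
    hash_algo: HashAlgo


@dataclass
class Repository:
    # The hash used to serialize data to disk, not necessarily the one shown to users.
    hash_algo: Optional[HashAlgo] = None


the_repository = Repository()  # stand-in for the repository global


def read_repository_format(config: dict) -> RepositoryFormat:
    version = int(config.get("core.repositoryformatversion", 0))
    # Only SHA-1 is supported at this stage, so the value is hard-coded
    # instead of being read from the configuration.
    return RepositoryFormat(version=version, hash_algo=SHA1)


def setup_repository(config: dict) -> None:
    the_repository.hash_algo = read_repository_format(config).hash_algo


def the_hash_algo() -> HashAlgo:
    """Shorthand for the algorithm stored in the repository global."""
    return the_repository.hash_algo


setup_repository({"core.repositoryformatversion": "0"})
print(the_hash_algo().name, the_hash_algo().hexsz)
```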

    Update August 2018: with Git 2.19 (Q3 2018), Git seems to have picked SHA-256 as NewHash.

    See commit 0ed8d8d (04 Aug 2018) by Jonathan Nieder (artagnon).
    See commit 13f5e09 (25 Jul 2018) by Ævar Arnfjörð Bjarmason (avar).
    (Merged by Junio C Hamano -- gitster -- in commit 34f2297, 20 Aug 2018)

    doc hash-function-transition: pick SHA-256 as NewHash

    From a security perspective, it seems that SHA-256, BLAKE2, SHA3-256, K12, and so on are all believed to have similar security properties.
    All are good options from a security point of view.

    SHA-256 has a number of advantages:

    • It has been around for a while, is widely used, and is supported by just about every single crypto library (OpenSSL, mbedTLS, CryptoNG, SecureTransport, etc).

    • When you compare against SHA1DC, most vectorized SHA-256 implementations are indeed faster, even without acceleration.

    • If we're doing signatures with OpenPGP (or even, I suppose, CMS), we're going to be using SHA-2, so it doesn't make sense to have our security depend on two separate algorithms when either one of them alone could break the security when we could just depend on one.

    So SHA-256 it is.
    Update the hash-function-transition design doc to say so.

    After this patch, there are no remaining instances of the string
    "NewHash", except for an unrelated use from 2008 as a variable name in
    t/t9700/test.pl.

    You can see this transition to SHA-256 in progress with Git 2.20 (Q4 2018):

    See commit 0d7c419, commit dda6346, commit eccb5a5, commit 93eb00f, commit d8a3a69, commit fbd0e37, commit f690b6b, commit 49d1660, commit 268babd, commit fa13080, commit 7b5e614, commit 58ce21b, commit 2f0c9e9, commit 544544a (October 2018) by brian m. carlson (bk2204).
    See commit 6afedba (15 Oct 2018) by SZEDER Gábor (szeder).
    (Merged by Junio C Hamano -- gitster -- in commit d829d49, 30 Oct 2018)

    replace hard-coded constants

    Replace several 40-based constants with references to GIT_MAX_HEXSZ or
    the_hash_algo, as appropriate.
    Convert all uses of the GIT_SHA1_HEXSZ to use the_hash_algo so that they
    are appropriate for any given hash length.
    Instead of using a hard-coded constant for the size of a hex object ID,
    switch to use the computed pointer from parse_oid_hex that points after
    the parsed object ID.
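
The idea behind this change can be sketched in Python as follows. This `parse_oid_hex` is a hypothetical analogue that returns the unparsed remainder of the line, not Git's C function, and the two size constants are illustrative.

```python
import string
from typing import Tuple

HEXDIGITS = set(string.hexdigits)


def parse_oid_hex(line: str, hexsz: int) -> Tuple[str, str]:
    """Parse a hex object ID of length hexsz; return it with the rest of the line."""
    oid = line[:hexsz]
    if len(oid) != hexsz or not all(c in HEXDIGITS for c in oid):
        raise ValueError("not a valid object ID")
    return oid, line[hexsz:]


SHA1_HEXSZ = 40    # illustrative constants; callers would take the value
SHA256_HEXSZ = 64  # from the active hash algorithm instead

line = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 refs/heads/master"

# Brittle: "line[41:]" bakes in the SHA-1 hex length.
# Better: continue from wherever the parser stopped.
oid, rest = parse_oid_hex(line, SHA1_HEXSZ)
print(oid)
print(rest.lstrip())
```

Because the caller never mentions 40, the same code keeps working when the hex length becomes 64.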

    GIT_SHA1_HEXSZ is further removed or replaced in Git 2.22 (Q2 2019), with commit d4e568b.

    The transition continues with Git 2.21 (Q1 2019), which adds the SHA-256 hash and plugs it through the code to allow building Git with the "NewHash".

    See commit 4b4e291, commit 27dc04c, commit 13eeedb, commit c166599, commit 37649b7, commit a2ce0a7, commit 50c817e, commit 9a3a0ff, commit 0dab712, commit 47edb64 (14 Nov 2018), and commit 2f90b9d, commit 1ccf07c (22 Oct 2018) by brian m. carlson (bk2204).
    (Merged by Junio C Hamano -- gitster -- in commit 33e4ae9, 29 Jan 2019)

    Add a base implementation of SHA-256 support (Feb. 2019)

    SHA-1 is weak and we need to transition to a new hash function.
    For some time, we have referred to this new function as NewHash.
    Recently, we decided to pick SHA-256 as NewHash.
    The reasons behind the choice of SHA-256 are outlined in this thread and in the commit history for the hash function transition document.

    Add a basic implementation of SHA-256 based off libtomcrypt, which is in
    the public domain.
    Optimize it and restructure it to meet our coding standards.
    Pull in the update and final functions from the SHA-1 block implementation, as we know these function correctly with all compilers. This implementation is slower than SHA-1, but more performant implementations will be introduced in future commits.

    Wire up SHA-256 in the list of hash algorithms, and add a test that the
    algorithm works correctly.

    Note that with this patch, it is still not possible to switch to using SHA-256 in Git.
    Additional patches are needed to prepare the code to handle a larger hash algorithm and further test fixes are needed.

    hash: add an SHA-256 implementation using OpenSSL

    We already have OpenSSL routines available for SHA-1, so add routines
    for SHA-256 as well.

    On a Core i7-6600U, this SHA-256 implementation compares favorably to
    the SHA1DC SHA-1 implementation:

    SHA-1: 157 MiB/s (64 byte chunks); 337 MiB/s (16 KiB chunks)
    SHA-256: 165 MiB/s (64 byte chunks); 408 MiB/s (16 KiB chunks)
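
A comparison of this kind can be approximated with a small timing harness. The sketch below is not the benchmark used for the figures above: it relies on Python's `hashlib` (which does not use SHA1DC), streams the chunks through a single context, and `throughput_mib_s` is a made-up helper, so the absolute numbers will differ.

```python
import hashlib
import time


def throughput_mib_s(algo: str, chunk_size: int, total_bytes: int = 8 * 1024 * 1024) -> float:
    """Feed total_bytes to the hash in chunk_size pieces and return MiB/s."""
    chunk = b"\0" * chunk_size
    rounds = max(1, total_bytes // chunk_size)
    ctx = hashlib.new(algo)
    start = time.perf_counter()
    for _ in range(rounds):
        ctx.update(chunk)
    ctx.digest()
    elapsed = time.perf_counter() - start
    return (rounds * chunk_size) / (1024 * 1024) / elapsed


for algo in ("sha1", "sha256"):
    small = throughput_mib_s(algo, 64)
    large = throughput_mib_s(algo, 16 * 1024)
    print(f"{algo}: {small:.0f} MiB/s (64 byte chunks); {large:.0f} MiB/s (16 KiB chunks)")
```

With 64-byte chunks the Python call overhead dominates, so the 16 KiB figure is the more meaningful one for comparing the two algorithms on a given machine.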

    sha256: add an SHA-256 implementation using libgcrypt

    Generally, one gets better performance out of cryptographic routines written in assembly than C, and this is also true for SHA-256.
    In addition, most Linux distributions cannot distribute Git linked against
    OpenSSL for licensing reasons.

    Most systems with GnuPG will also have libgcrypt, since it is a dependency of GnuPG.
    libgcrypt is also faster than the SHA1DC implementation for messages of a few KiB and larger.

    For comparison, on a Core i7-6600U, this implementation processes 16 KiB
    chunks at 355 MiB/s while SHA1DC processes equivalent chunks at 337
    MiB/s.

    In addition, libgcrypt is licensed under the LGPL 2.1, which is
    compatible with the GPL. Add an implementation of SHA-256 that uses
    libgcrypt.


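    Here is a discussion of the urgency of migrating away from SHA1, written for Mercurial, but it applies to Git as well: https://www.mercurial-scm.org/wiki/mpm/SHA1

    In short: if you are not extremely diligent today, you have much worse vulnerabilities than SHA1. Despite that, Mercurial started preparing to migrate away from SHA1 more than 10 years ago.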

    work has been underway for years to retrofit Mercurial's data structures and protocols for SHA1's successors. Storage space was allocated for larger hashes in our revlog structure over 10 years ago in Mercurial 0.9 with the introduction of RevlogNG. The bundle2 format introduced more recently supports the exchange of different hash types over the network. The only remaining pieces are choice of a replacement function and choosing a backwards-compatibility strategy.

    If Git does not migrate away from SHA1 before Mercurial does, you can always add another level of security by keeping a local Mercurial mirror with hg-git.


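    There is now a transition plan to a stronger hash, so it looks like Git will use a more modern hash than SHA-1 in the future. From the current transition plan: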

    Some hashes under consideration are SHA-256, SHA-512/256, SHA-256x16, K12, and BLAKE2bp-256