Linux: How to return only the nth occurrence of a regex match using sed?

Info

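I have a string containing time information, and I need to extract just the number of months from it. I'm trying to do this with sed, but I haven't had much luck. Using bash command-line tools is a requirement for me.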

Current attempt

echo "1969 years 12 months 25 days 19 hours 38 minutes 24 seconds since last release" | sed -r "s/^([0-9]+).*/\1/"

This returns 1969.
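The attempt above is anchored with ^, so it can only ever capture the first number. sed's s command also accepts a numeric flag, s/regex/replacement/N, which applies the substitution only to the Nth match; that addresses the title question directly. A minimal sketch, assuming GNU sed (the \n in the replacement and the T command are GNU extensions); nth_number is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# nth_number N: print only the Nth run of digits on each input line (hypothetical helper).
# Assumes GNU sed: s///N is POSIX, but \n in the replacement and T are GNU extensions.
nth_number() {
    local n=$1
    # 1. s/[0-9]+/\n&\n/N  wrap only the Nth digit run in newlines
    # 2. T                 no Nth match: jump to the end (nothing printed under -n)
    # 3. s/^[^\n]*\n//     drop everything up to the first inserted newline
    # 4. s/\n.*//          drop everything from the second inserted newline on
    sed -nE "s/[0-9]+/\n&\n/$n; T; s/^[^\n]*\n//; s/\n.*//; p"
}

str="1969 years 12 months 25 days 19 hours 38 minutes 24 seconds since last release"
echo "$str" | nth_number 2   # prints 12
```

If the line contains fewer than N numbers, nothing is printed, because -n suppresses sed's automatic print and T skips the final p.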

Bonus

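More generally, my problem is the whole "1969 years 12 months" part of the string. I'm using date to get the time difference, but I have to account for the 1970 start date (the epoch). I find it strange that my current output represents 1970 as 1969 years and 12 months.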

Edit: I'm trying to use nunk's solution from this post:

How to find the difference in days between two dates?

It works for everything except months: the month count is off (it prints 12 months).

Related discussion

I would do it like this:

str="1969 years 12 months 25 days 19 hours 38 minutes 24 seconds since last release"

read y m d H M S < <(echo $(grep -oP '\d+' <<<"$str"))
echo "month: $m"
echo "year: $y"
# etc. for $d $H $M $S

This prints:

month: 12
year: 1969

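- grep extracts all the numbers (one number per line)
- echo joins those numbers onto a single line, separated by spaces
- read splits that line and assigns each number to the corresponding variable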
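Reading into a fixed list of variables assumes exactly six numbers in a known order. A variant sketch, assuming bash 4 or later for the mapfile builtin, collects every number into an array so the nth occurrence becomes a plain index lookup. It uses grep -oE in place of grep -oP, since the pattern needs no Perl-specific syntax:

```shell
#!/usr/bin/env bash
# Collect every number in the string into a bash array (requires bash 4+ for mapfile).
str="1969 years 12 months 25 days 19 hours 38 minutes 24 seconds since last release"

# grep -oE prints one match per line; mapfile -t stores one line per array element.
mapfile -t nums < <(grep -oE '[0-9]+' <<<"$str")

n=2                                 # which occurrence to take (1-based)
echo "occurrence $n: ${nums[n-1]}"  # prints "occurrence 2: 12"
```

The array subscript is evaluated arithmetically, so ${nums[n-1]} converts the 1-based n into bash's 0-based index.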

Would this work for you:

echo "1969 years 12 months 25 days 19 hours 38 minutes 24 seconds since last release" | sed -r 's/.* ([0-9]+) months.*/\1/'

This gives:

12

Here is what's going on:

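The "match" part of the expression succeeds only if all of the following occur, in this order:
- .* matches anything
- a space
- [0-9]+ matches one or more digits (captured by the surrounding ( and ))
- another space
- months matches the word "months"
- .* matches anything
If it succeeds, the match is guaranteed to cover the whole line, because there is a .* at both ends.
Also, if it succeeds, the match (the whole line) is replaced with \1, a special code meaning "the contents of the first set of capturing parentheses". As you can see above, the value captured by those parentheses is the number in front of the word "months".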
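One limitation of that pattern is that it requires a space before the number, so it cannot extract the leading "1969 years", and it would also miss a singular "1 month" if the tool ever prints one. The sketch below generalizes it under those assumptions; get_unit is a hypothetical helper name. It also adds -n and the p flag, because without them a line that does not match is printed back unchanged:

```shell
#!/usr/bin/env bash
# get_unit UNIT: print the number in front of UNIT or UNITs (hypothetical helper).
#   (^|.* )      the number may start the line or follow a space
#   ${unit}s?    accept both singular and plural ("1 month", "12 months")
#   ( .*|$)      the unit word must be followed by a space or the end of the line
#   -n ... /p    print only when the substitution succeeded
get_unit() {
    local unit=$1
    sed -nE "s/(^|.* )([0-9]+) ${unit}s?( .*|\$)/\2/p"
}

str="1969 years 12 months 25 days 19 hours 38 minutes 24 seconds since last release"
echo "$str" | get_unit month   # prints 12
echo "$str" | get_unit year    # prints 1969
```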

You can use Perl to split out the key-value pairs:

$ str="1969 years 12 months 25 days 19 hours 38 minutes 24 seconds since last release"
$ echo "$str" | perl -lane 'print "$1 $2" while /(\d+)\s(\w+)/g'
1969 years
12 months
25 days
19 hours
38 minutes
24 seconds

Then use grep to grab the one you want:

$ echo "$str" | perl -lane 'print "$1 $2" while /(\d+)\s(\w+)/g' | grep 'months'
12 months
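The same pair-splitting idea also works in bash alone. A sketch, assuming bash 4 or later for associative arrays (declare -A): each "number unit" pair becomes an entry keyed by the unit word. The grep pattern [0-9]+ [[:alpha:]]+ is an assumed stand-in for Perl's (\d+)\s(\w+); it is narrower, since it only accepts letters in the unit:

```shell
#!/usr/bin/env bash
# Load "number unit" pairs into an associative array keyed by unit (requires bash 4+).
str="1969 years 12 months 25 days 19 hours 38 minutes 24 seconds since last release"

declare -A parts                     # unit word -> number, e.g. parts[months]=12
while read -r value unit; do
    parts[$unit]=$value
done < <(grep -oE '[0-9]+ [[:alpha:]]+' <<<"$str")

echo "${parts[months]}"              # prints 12
```

Compared with piping through grep 'months', this parses the string once and lets you look up any field afterwards.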

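It sounds like you're trying to compute a date difference by running a tool that gets it wrong and then patching up its output. Just use GNU awk instead: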

$ echo "2014-09-03T14:44:48+00:00" |
gawk '{gsub(/[-T:+]/," "); print (systime() - mktime($0)) / (24*60*60)}'
5.86991
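For comparison, the same arithmetic (epoch seconds for each instant, difference divided by 24*60*60) can be sketched with GNU date, assuming GNU coreutils, whose -d option parses timestamps such as ISO 8601 strings and whose +%s format prints seconds since the epoch. days_between is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# days_between FROM TO: whole days between two timestamps (hypothetical helper).
# Assumes GNU coreutils date: -d parses the timestamp, +%s prints epoch seconds.
days_between() {
    local from_s to_s
    from_s=$(date -d "$1" +%s) || return 1
    to_s=$(date -d "$2" +%s) || return 1
    # 86400 = 24*60*60 seconds per day; bash integer division truncates, it does not round.
    echo $(( (to_s - from_s) / 86400 ))
}

days_between "2014-09-03T14:44:48+00:00" now
```

One difference from the gawk version: date -d honors the +00:00 offset, while mktime() interprets its fields as local time.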

Round the number of days from the gawk output however you like, for example:

$ echo "2014-09-03T14:44:48+00:00" |
gawk '{gsub(/[-T:+]/," "); printf "%.0f\n", (systime() - mktime($0)) / (24*60*60)}'
6

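Note that rounding with "%.0f" is system-dependent: 0.5 may be rounded up, or rounded to the nearest even number. If that is a problem, check how your system behaves and write your own rounding function if necessary.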
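A minimal sketch of such a rounding function, written in awk so it could stand in for the printf "%.0f" step. It rounds halves away from zero, so 2.5 becomes 3 regardless of the C library's tie-breaking rule; round_days and round are hypothetical names:

```shell
#!/usr/bin/env bash
# round_days: read one number per line and round half away from zero (hypothetical helper).
# awk's int() truncates toward zero, so adding 0.5 to |x| before int() rounds ties outward.
round_days() {
    awk '
        function round(x) { return (x < 0) ? -int(-x + 0.5) : int(x + 0.5) }
        { print round($1) }
    '
}

printf '%s\n' 5.86991 2.5 3.5 | round_days   # prints 6, 3 and 4, one per line
```

Inside the gawk one-liner, the same function definition can be placed before the main block and called as round((systime() - mktime($0)) / (24*60*60)).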
