Pandas: How to create a datetime object from Week and Year?
I have a DataFrame that provides two integer columns, the week of the year and the year:
```python
import pandas as pd
import numpy as np

L1 = [43,44,51,2,5,12]
L2 = [2016,2016,2016,2017,2017,2017]
df = pd.DataFrame({"Week":L1,"Year":L2})

df
Out[72]:
   Week  Year
0    43  2016
1    44  2016
2    51  2016
3     2  2017
4     5  2017
5    12  2017
```
I need to create a datetime object from these two numbers.
I tried this, but it throws an error:
```python
df["DT"] = df.apply(lambda x: np.datetime64(x.Year,'Y') + np.timedelta64(x.Week,'W'),axis=1)
```
Then I tried this. It runs, but it gives the wrong result: the week is ignored completely.
```python
df["S"] = df.Week.astype(str)+'-'+df.Year.astype(str)
df["DT"] = df["S"].apply(lambda x: pd.to_datetime(x,format='%W-%Y'))

df
Out[74]:
   Week  Year        S         DT
0    43  2016  43-2016 2016-01-01
1    44  2016  44-2016 2016-01-01
2    51  2016  51-2016 2016-01-01
3     2  2017   2-2017 2017-01-01
4     5  2017   5-2017 2017-01-01
5    12  2017  12-2017 2017-01-01
```
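For context, here is a minimal sketch of why the week appears to be ignored. The Python documentation for `strptime` notes that `%U` and `%W` are only used in calculations when the day of the week and the calendar year are also specified. The sketch uses the standard library's `datetime.strptime` rather than pandas, and the two input strings are illustrative values modeled on the first row above.

```python
from datetime import datetime

# Without a weekday, %W is parsed but not used to compute the date.
without_day = datetime.strptime("43-2016", "%W-%Y")

# Appending a weekday ("1" means Monday for %w) lets %W take effect.
with_day = datetime.strptime("43-2016-1", "%W-%Y-%w")

print(without_day.date())  # 2016-01-01
print(with_day.date())     # 2016-10-24
```

This suggests that adding a day-of-week component to the string is what makes the week number count.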
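I'm really getting lost in Python's datetime handling.
I'm using Python 3, if that is relevant in any way.
EDIT:
Starting with Python 3.8, this problem is easily solved with a newly introduced method on datetime.date objects: https://docs.python.org/3/library/datetime.html#datetime.date.fromisocalendar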
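A minimal sketch of that approach, assuming Python 3.8 or later and that each week should map to its Monday (ISO weekday 1). The helper name `iso_week_start` is made up for this example; the week and year values are the ones from the question.

```python
from datetime import date

def iso_week_start(year: int, week: int) -> date:
    """Return the Monday of the given ISO year and ISO week (hypothetical helper)."""
    return date.fromisocalendar(year, week, 1)

weeks = [43, 44, 51, 2, 5, 12]
years = [2016, 2016, 2016, 2017, 2017, 2017]

mondays = [iso_week_start(y, w) for y, w in zip(years, weeks)]
for d in mondays:
    print(d)
# 2016-10-24
# 2016-10-31
# 2016-12-19
# 2017-01-09
# 2017-01-30
# 2017-03-20
```

With the DataFrame from the question, the same list comprehension over `zip(df.Year, df.Week)` could presumably be passed to `pd.to_datetime` to build the new column.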
As @Gianmario Spacagna mentioned, for later years (such as 2018), use %V together with %G:
```python
L1 = [43,44,51,2,5,12,52,53,1,2,5,52]
L2 = [2016,2016,2016,2017,2017,2017,2018,2018,2019,2019,2019,2019]
df = pd.DataFrame({"Week":L1,"Year":L2})

df['new'] = pd.to_datetime(df.Week.astype(str)+
                           df.Year.astype(str).add('-1') ,format='%V%G-%u')
print (df)

    Week  Year        new
0     43  2016 2016-10-24
1     44  2016 2016-10-31
2     51  2016 2016-12-19
3      2  2017 2017-01-09
4      5  2017 2017-01-30
5     12  2017 2017-03-20
6     52  2018 2018-12-24
7     53  2018 2018-12-31
8      1  2019 2018-12-31
9      2  2019 2019-01-07
10     5  2019 2019-01-28
11    52  2019 2019-12-23
```
Try this:
```python
In [19]: pd.to_datetime(df.Year.astype(str), format='%Y') + \
             pd.to_timedelta(df.Week.mul(7).astype(str) + ' days')
Out[19]:
0   2016-10-28
1   2016-11-04
2   2016-12-23
3   2017-01-15
4   2017-02-05
5   2017-03-26
dtype: datetime64[ns]
```
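Note that this adds `7 * Week` days to January 1, so the result falls on whatever weekday that offset happens to reach, not on the first day of the week. A small standard-library sketch of the same arithmetic for the first row (Week 43, Year 2016), compared with the ISO Monday for reference; `date.fromisocalendar` assumes Python 3.8 or later.

```python
from datetime import date, timedelta

year, week = 2016, 43

# Same arithmetic as above: January 1 plus 7 * week days.
offset_from_jan1 = date(year, 1, 1) + timedelta(days=7 * week)

# Monday of ISO week 43, for comparison.
iso_monday = date.fromisocalendar(year, week, 1)

print(offset_from_jan1, offset_from_jan1.isoweekday())  # 2016-10-28 5 (Friday)
print(iso_monday, iso_monday.isoweekday())              # 2016-10-24 1 (Monday)
```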
Initially I have timestamps in seconds (s).
It is much easier to parse it from the UNIX epoch timestamp:
```python
df['Date'] = pd.to_datetime(df['UNIX_Time'], unit='s')
```
Timing for a DataFrame with 10M rows:
Setup:
```python
In [26]: df = pd.DataFrame(pd.date_range('1970-01-01', freq='1T', periods=10**7), columns=['date'])

In [27]: df.shape
Out[27]: (10000000, 1)

In [28]: df['unix_ts'] = df['date'].astype(np.int64)//10**9

In [30]: df
Out[30]:
                       date    unix_ts
0       1970-01-01 00:00:00          0
1       1970-01-01 00:01:00         60
2       1970-01-01 00:02:00        120
3       1970-01-01 00:03:00        180
4       1970-01-01 00:04:00        240
5       1970-01-01 00:05:00        300
6       1970-01-01 00:06:00        360
7       1970-01-01 00:07:00        420
8       1970-01-01 00:08:00        480
9       1970-01-01 00:09:00        540
...                     ...        ...
9999990 1989-01-05 10:30:00  599999400
9999991 1989-01-05 10:31:00  599999460
9999992 1989-01-05 10:32:00  599999520
9999993 1989-01-05 10:33:00  599999580
9999994 1989-01-05 10:34:00  599999640
9999995 1989-01-05 10:35:00  599999700
9999996 1989-01-05 10:36:00  599999760
9999997 1989-01-05 10:37:00  599999820
9999998 1989-01-05 10:38:00  599999880
9999999 1989-01-05 10:39:00  599999940

[10000000 rows x 2 columns]
```
Check:
```python
In [31]: pd.to_datetime(df.unix_ts, unit='s')
Out[31]:
0         1970-01-01 00:00:00
1         1970-01-01 00:01:00
2         1970-01-01 00:02:00
3         1970-01-01 00:03:00
4         1970-01-01 00:04:00
5         1970-01-01 00:05:00
6         1970-01-01 00:06:00
7         1970-01-01 00:07:00
8         1970-01-01 00:08:00
9         1970-01-01 00:09:00
                  ...
9999990   1989-01-05 10:30:00
9999991   1989-01-05 10:31:00
9999992   1989-01-05 10:32:00
9999993   1989-01-05 10:33:00
9999994   1989-01-05 10:34:00
9999995   1989-01-05 10:35:00
9999996   1989-01-05 10:36:00
9999997   1989-01-05 10:37:00
9999998   1989-01-05 10:38:00
9999999   1989-01-05 10:39:00
Name: unix_ts, Length: 10000000, dtype: datetime64[ns]
```
Timing:
```python
In [32]: %timeit pd.to_datetime(df.unix_ts, unit='s')
10 loops, best of 3: 156 ms per loop
```
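Conclusion: I don't think 156 ms to convert 10 million rows is slow.
Something goes wrong with weeks starting from 2019. The ISO-8601 standard assigns December 31, 2018 to week 1 of 2019. The other approaches, based on: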
```python
pd.to_datetime(df.Week.astype(str)+
               df.Year.astype(str).add('-2') ,format='%W%Y-%w')
```
will give shifted results starting from 2019.
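To make the shift concrete, here is a small sketch comparing the two week-numbering conventions with the standard library only. The helper names `monday_from_W` and `monday_from_iso` are invented for this example, and `date.fromisocalendar` assumes Python 3.8 or later. With `%W`, week 1 starts on the first Monday of the calendar year; with ISO-8601, week 1 is the week containing the year's first Thursday.

```python
from datetime import date, datetime

def monday_from_W(year: int, week: int) -> date:
    """Monday of the given week under %W numbering (hypothetical helper)."""
    return datetime.strptime(f"{week}-{year}-1", "%W-%Y-%w").date()

def monday_from_iso(year: int, week: int) -> date:
    """Monday of the given week under ISO-8601 numbering (hypothetical helper)."""
    return date.fromisocalendar(year, week, 1)

for year, week in [(2016, 43), (2019, 1), (2019, 2)]:
    print(year, week, monday_from_W(year, week), monday_from_iso(year, week))
# 2016 43 2016-10-24 2016-10-24
# 2019 1 2019-01-07 2018-12-31
# 2019 2 2019-01-14 2019-01-07
```

For 2016 the two conventions happen to agree, which is why the difference only shows up with the later years.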
To comply with the ISO-8601 standard, you would have to do the following:
```python
import pandas as pd
import datetime

L1 = [52,53,1,2,5,52]
L2 = [2018,2018,2019,2019,2019,2019]
df = pd.DataFrame({"Week":L1,"Year":L2})
df['ISO'] = df['Year'].astype(str) + '-W' + df['Week'].astype(str) + '-1'
df['DT'] = df['ISO'].map(lambda x: datetime.datetime.strptime(x,"%G-W%V-%u"))
print(df)
```
It prints:
```
   Week  Year         ISO         DT
0    52  2018  2018-W52-1 2018-12-24
1    53  2018  2018-W53-1 2018-12-31
2     1  2019   2019-W1-1 2018-12-31
3     2  2019   2019-W2-1 2019-01-07
4     5  2019   2019-W5-1 2019-01-28
5    52  2019  2019-W52-1 2019-12-23
```
Week 53 of 2018 is ignored and mapped to week 1 of 2019.
You can verify this at https://www.epochconverter.com/weeks/2019.
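If rows like week 53 of 2018 should be caught before conversion, one possible check is to compute how many ISO weeks a year has. This sketch relies on the fact that December 28 always falls in the last ISO week of its year; the helper names `iso_weeks_in_year` and `is_valid_iso_week` are made up for illustration.

```python
from datetime import date

def iso_weeks_in_year(year: int) -> int:
    """Number of ISO weeks (52 or 53) in the given ISO year."""
    # December 28 is always in the last ISO week of its year.
    return date(year, 12, 28).isocalendar()[1]

def is_valid_iso_week(year: int, week: int) -> bool:
    """True if the week number exists in that ISO year."""
    return 1 <= week <= iso_weeks_in_year(year)

print(iso_weeks_in_year(2018))      # 52
print(iso_weeks_in_year(2020))      # 53
print(is_valid_iso_week(2018, 53))  # False
print(is_valid_iso_week(2019, 52))  # True
```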