Python: unable to upload files larger than 2 GB to Cloud Storage

can't upload > ~2GB to Google Cloud Storage

The traceback is below.

The relevant Python snippet:

bucket = _get_bucket(location['bucket'])
blob = bucket.blob(location['path'])
blob.upload_from_filename(source_path)

This eventually raises (from the SSL library):

OverflowError: string longer than 2147483647 bytes

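I assume I'm missing some special configuration option?

This may be related to this roughly 1.5-year-old issue: https://github.com/googledatalab/datalab/issues/784.

Thanks for your help!

Full traceback: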

  File "/usr/src/app/gcloud/download_data.py", line 109, in *******
    blob.upload_from_filename(source_path)
  File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 992, in upload_from_filename
    size=total_bytes)
  File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 946, in upload_from_file
    client, file_obj, content_type, size, num_retries)
  File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 867, in _do_upload
    client, stream, content_type, size, num_retries)
  File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 700, in _do_multipart_upload
    transport, data, object_metadata, content_type)
  File "/usr/local/lib/python3.5/dist-packages/google/resumable_media/requests/upload.py", line 97, in transmit
    retry_strategy=self._retry_strategy)
  File "/usr/local/lib/python3.5/dist-packages/google/resumable_media/requests/_helpers.py", line 101, in http_request
    func, RequestsMixin._get_status_code, retry_strategy)
  File "/usr/local/lib/python3.5/dist-packages/google/resumable_media/_helpers.py", line 146, in wait_and_retry
    response = func()
  File "/usr/local/lib/python3.5/dist-packages/google/auth/transport/requests.py", line 186, in request
    method, url, data=data, headers=request_headers, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/adapters.py", line 440, in send
    timeout=timeout
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 601, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 357, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.5/http/client.py", line 1106, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python3.5/http/client.py", line 1151, in _send_request
    self.endheaders(body)
  File "/usr/lib/python3.5/http/client.py", line 1102, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python3.5/http/client.py", line 936, in _send_output
    self.send(message_body)
  File "/usr/lib/python3.5/http/client.py", line 908, in send
    self.sock.sendall(data)
  File "/usr/lib/python3.5/ssl.py", line 891, in sendall
    v = self.send(data[count:])
  File "/usr/lib/python3.5/ssl.py", line 861, in send
    return self._sslobj.write(data)
  File "/usr/lib/python3.5/ssl.py", line 586, in write
    return self._sslobj.write(data)

OverflowError: string longer than 2147483647 bytes


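The problem is that it tries to read the entire file into memory. Following the call chain from upload_from_filename shows that it stats the file and then passes that size in as the upload size, so the whole file is sent as a single upload part.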
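The number in the error message, 2147483647, is 2**31 - 1, the largest value a signed 32-bit integer can hold. As a rough sketch (the names SSL_WRITE_LIMIT and exceeds_single_write_limit are made up for illustration and are not part of any library), you can check in advance whether a payload is too large to go out in one write. The real request body is slightly larger than the file itself because it also carries the object metadata, so treat the result as approximate.

```python
import os

# Largest value of a signed 32-bit integer; this matches the number
# reported in the OverflowError message.
SSL_WRITE_LIMIT = 2**31 - 1  # 2147483647


def exceeds_single_write_limit(num_bytes):
    """Return True if a payload of num_bytes is too long for one write."""
    return num_bytes > SSL_WRITE_LIMIT


def file_exceeds_single_write_limit(path):
    """Same check, applied to the on-disk size of the file at path."""
    return exceeds_single_write_limit(os.path.getsize(path))
```

A file for which this returns True is a candidate for the chunked approach rather than a single-part upload.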

Instead, specifying chunk_size when creating the blob object triggers an upload in multiple parts:

# Must be a multiple of 256 KB per the docstring
CHUNK_SIZE = 10 * 1024 * 1024  # 10 MB (10485760 bytes)
blob = bucket.blob(location['path'], chunk_size=CHUNK_SIZE)
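Putting the pieces together, here is a minimal sketch of a helper that rounds a requested chunk size up to a multiple of 256 KB and then uploads with it. The function names round_chunk_size and upload_in_chunks are hypothetical, and bucket is assumed to be the same kind of bucket object that _get_bucket returns in the question. Only bucket.blob(..., chunk_size=...) and blob.upload_from_filename(...) are taken from the snippets above.

```python
CHUNK_MULTIPLE = 256 * 1024  # 256 KB, the multiple the docstring asks for


def round_chunk_size(requested_bytes):
    """Round requested_bytes up to the nearest multiple of 256 KB."""
    if requested_bytes <= 0:
        raise ValueError("chunk size must be positive")
    multiples = -(-requested_bytes // CHUNK_MULTIPLE)  # ceiling division
    return multiples * CHUNK_MULTIPLE


def upload_in_chunks(bucket, blob_path, source_path,
                     requested_chunk_size=10 * 1024 * 1024):
    """Upload source_path to blob_path in parts; return the chunk size used."""
    chunk_size = round_chunk_size(requested_chunk_size)
    # Passing chunk_size makes the upload go out in multiple parts
    # instead of as one request body.
    blob = bucket.blob(blob_path, chunk_size=chunk_size)
    blob.upload_from_filename(source_path)
    return chunk_size
```

With the names from the question, a call would look like upload_in_chunks(_get_bucket(location['bucket']), location['path'], source_path).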

Happy hacking!