.\" Man page generated from reStructuredText.
.
.TH "AZURE-DATALAKE-STORE" "1" "Oct 19, 2020" "0.0.51" "azure-datalake-store"
.SH NAME
azure-datalake-store \- azure-datalake-store Documentation
.
.nr rst2man-indent-level 0
.
.de1 rstReportMargin
\\$1 \\n[an-margin]
level \\n[rst2man-indent-level]
level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
-
\\n[rst2man-indent0]
\\n[rst2man-indent1]
\\n[rst2man-indent2]
..
.de1 INDENT
.\" .rstReportMargin pre:
. RS \\$1
. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin]
. nr rst2man-indent-level +1
.\" .rstReportMargin post:
..
.de UNINDENT
. RE
.\" indent \\n[an-margin]
.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]]
.nr rst2man-indent-level -1
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.sp
A pure\-python interface to the Azure Data\-lake Storage system, providing
pythonic file\-system and file objects, seamless transition between Windows
and POSIX remote paths, and a high\-performance up\- and down\-loader.
.sp
This software is under active development and not yet recommended for
general use.
.SH INSTALLATION
.sp
Using \fBpip\fP:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
pip install azure\-datalake\-store
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Manually (bleeding edge):
.INDENT 0.0
.IP \(bu 2
Download the repo from \fI\%https://github.com/Azure/azure\-data\-lake\-store\-python\fP
.IP \(bu 2
Check out the \fBdev\fP branch
.IP \(bu 2
Install the requirements (\fBpip install \-r dev_requirements.txt\fP)
.IP \(bu 2
Install in develop mode (\fBpython setup.py develop\fP)
.IP \(bu 2
Optionally, build the documentation (including this page) by running
\fBmake html\fP in the docs directory.
.UNINDENT
.SH AUTH
.sp
Although users can generate and supply their own tokens to the base
file\-system class, and there is a password\-based function in the \fBlib\fP
module for generating tokens, the most convenient way to supply credentials
is via environment variables.
This is the method the library uses by default. It reads the following
variables, all of which are required except the last:
.INDENT 0.0
.IP \(bu 2
azure_tenant_id
.IP \(bu 2
azure_username
.IP \(bu 2
azure_password
.IP \(bu 2
azure_store_name
.IP \(bu 2
azure_url_suffix (optional)
.UNINDENT
.SH PYTHONIC FILESYSTEM
.sp
The \fBAzureDLFileSystem\fP object is the main API for library usage of this
package. It provides typical file\-system operations on the remote Azure store.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
from azure.datalake.store import core, lib

token = lib.auth(tenant_id, username, password)
adl = core.AzureDLFileSystem(token, store_name=store_name)
# alternatively, take credentials from the environment variables above
adl = core.AzureDLFileSystem()

print(adl.ls())                    # list files in the root directory
for item in adl.ls(detail=True):
    print(item)                    # same, but with file details as dictionaries
print(adl.walk(\(aq\(aq))                # list all files at any directory depth
print(\(aqUsage:\(aq, adl.du(\(aq\(aq, deep=True, total=True))  # total bytes usage
adl.mkdir(\(aqnewdir\(aq)                 # create directory
adl.touch(\(aqnewdir/newfile\(aq)         # create empty file
adl.put(\(aq/home/myuser/localfile\(aq, \(aqremotefile\(aq)  # upload a local file
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
In addition, the file\-system generates file objects that are compatible with
the python file interface, so they work with libraries that expect python
files. The recommended way to use them is with a context manager (otherwise,
be sure to call \fBclose()\fP on the file object).
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
import pandas as pd

with adl.open(\(aqnewfile\(aq, \(aqwb\(aq) as f:
    f.write(b\(aqindex,a,b\en\(aq)
    f.tell()        # now at position 10 (9 characters plus the newline)
    f.flush()       # forces data upstream
    f.write(b\(aq0,1,True\(aq)

with adl.open(\(aqnewfile\(aq, \(aqrb\(aq) as f:
    print(f.readlines())

with adl.open(\(aqnewfile\(aq, \(aqrb\(aq) as f:
    df = pd.read_csv(f)  # read into pandas
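# Local stand-in (assumption): io.BytesIO replaces adl.open(..., "rb") so the
# hypothetical read_csv_rows helper can be tried without a store.
import csv
import io
def read_csv_rows(fileobj):
    text = io.TextIOWrapper(fileobj, encoding="utf-8", newline="")
    return list(csv.reader(text))  # any binary file-like object works
rows = read_csv_rows(io.BytesIO(b"""index,a,b
0,1,True"""))  # rows == [["index", "a", "b"], ["0", "1", "True"]]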
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
To handle remote path representations seamlessly across all supported
platforms, the main API accepts several path types: string, Path/PurePath,
and AzureDLPath. On Windows in particular, you can pass in paths separated by
either forward slashes or backslashes.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
try:
    import pathlib               # Python >= 3.4
except ImportError:
    import pathlib2 as pathlib   # backport for Python <= 3.3
from azure.datalake.store.core import AzureDLPath

# possible remote paths to use on API
p1 = \(aq\e\efoo\e\ebar\(aq
p2 = \(aq/foo/bar\(aq
p3 = pathlib.PurePath(\(aq\e\efoo\e\ebar\(aq)
p4 = pathlib.PureWindowsPath(\(aq\e\efoo\e\ebar\(aq)
p5 = pathlib.PurePath(\(aq/foo/bar\(aq)
p6 = AzureDLPath(\(aq\e\efoo\e\ebar\(aq)
p7 = AzureDLPath(\(aq/foo/bar\(aq)

# p1, p3, and p6 only work on Windows
for p in [p1, p2, p3, p4, p5, p6, p7]:
    with adl.open(p, \(aqrb\(aq) as f:
        print(f.readlines())
.ft P
.fi
.UNINDENT
.UNINDENT
.SH PERFORMANT UP-/DOWN-LOADING
.sp
Classes \fBADLUploader\fP and \fBADLDownloader\fP split large files into chunks
and transfer many files to/from Azure using multiple threads. They can
transfer a whole directory tree, all files matching a glob pattern, or a
single file.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
from azure.datalake.store.multithread import ADLDownloader

# download the whole directory structure using 5 threads, 16MB chunks
ADLDownloader(adl, \(aq\(aq, \(aqmy_temp_dir\(aq, 5, 2**24)
.ft P
.fi
.UNINDENT
.UNINDENT
.SH AUTHOR
TBD
.SH COPYRIGHT
TBD
.\" Generated by docutils manpage writer.
.