Category: Microsoft Azure Data Engineering | Sub Category: Databricks | By Prasad Bonam | Last updated: 2023-09-23 10:03:00
In Azure Databricks, dbutils.fs (the file system utility in Databricks Utilities) provides a set of commands for performing file and directory operations on the Databricks File System (DBFS). DBFS is a distributed file system that is available in a Databricks workspace and on its clusters, and it lets you manage files and directories using ordinary directory-style paths. Here are some common commands and use cases for dbutils.fs:
List Files and Directories (dbutils.fs.ls):
Use dbutils.fs.ls(path) to list files and directories in the specified path.

    files = dbutils.fs.ls("/mnt/<mount-name>")
    for file in files:
        print(file.path)
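dbutils.fs.ls returns only the direct children of a path. To walk a whole directory tree, the call can be repeated for each subdirectory. The sketch below assumes the entries returned by ls expose path and isDir(), as they do in Python notebooks; the helper name list_files_recursive and its fs parameter are illustrative and not part of dbutils.

```python
def list_files_recursive(fs, path):
    """Yield the path of every file under `path`, descending into subdirectories.

    `fs` is expected to be dbutils.fs (or any object with a compatible ls method).
    """
    for entry in fs.ls(path):
        if entry.isDir():
            # Recurse into the subdirectory instead of reporting it.
            yield from list_files_recursive(fs, entry.path)
        else:
            yield entry.path

# In a notebook (hypothetical mount name):
# for file_path in list_files_recursive(dbutils.fs, "/mnt/<mount-name>"):
#     print(file_path)
```

Passing dbutils.fs in as an argument keeps the helper usable outside a notebook, for example against a small stand-in object in a unit test.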
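Read a File (dbutils.fs.head):
Use dbutils.fs.head(file_path) to read the beginning of a file in DBFS. It returns the content as a string and, by default, stops after the first 65,536 bytes; an optional second argument sets a different byte limit.

    file_content = dbutils.fs.head("/mnt/<mount-name>/file.txt")
    print(file_content)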
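Write to a File (dbutils.fs.put):
Use dbutils.fs.put(file_path, contents, overwrite=False) to write a string to a file in DBFS. The first argument is the destination path and the second is the text to write. Pass overwrite=True to replace a file that already exists.

    data = "This is some sample data."
    dbutils.fs.put("/mnt/<mount-name>/new-file.txt", data, overwrite=True)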
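Because dbutils.fs.put writes a string and dbutils.fs.head reads one back, the two can be combined into a quick round-trip check. The helper below is only a sketch: write_and_preview and its fs parameter are hypothetical names, and it relies on head accepting a byte limit as its second positional argument.

```python
def write_and_preview(fs, path, text, preview_bytes=100):
    """Write `text` to `path`, then return up to `preview_bytes` bytes read back from it."""
    # overwrite=True replaces any existing file at `path`.
    fs.put(path, text, overwrite=True)
    # head returns at most `preview_bytes` bytes of the file as a string.
    return fs.head(path, preview_bytes)

# In a notebook (hypothetical mount name):
# print(write_and_preview(dbutils.fs, "/mnt/<mount-name>/new-file.txt", "This is some sample data."))
```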
Copy Files (dbutils.fs.cp):
Use dbutils.fs.cp(source_path, destination_path) to copy a file from one location to another within DBFS. To copy a directory and its contents, also pass recurse=True.

    dbutils.fs.cp("/mnt/<source-mount>/source-file.csv", "/mnt/<destination-mount>/destination-file.csv")
Create a Directory (dbutils.fs.mkdirs):
Use dbutils.fs.mkdirs(directory_path) to create a new directory in DBFS.

    dbutils.fs.mkdirs("/mnt/<mount-name>/new-directory")
Delete a File or Directory (dbutils.fs.rm):
Use dbutils.fs.rm(file_or_directory_path, recurse=False) to delete a file or directory in DBFS. Use recurse=True to delete directories and their contents recursively.

    dbutils.fs.rm("/mnt/<mount-name>/file-to-delete.csv")
    # To delete a directory and its contents recursively:
    dbutils.fs.rm("/mnt/<mount-name>/directory-to-delete", recurse=True)
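Recursive deletes are hard to undo, so it can help to look at what a directory contains before removing it. The following sketch combines dbutils.fs.ls and dbutils.fs.rm for that purpose; remove_tree, its dry_run flag, and the fs parameter are hypothetical names introduced for illustration.

```python
def remove_tree(fs, path, dry_run=True):
    """Delete `path` recursively, or only report its direct children when dry_run is True.

    Returns the paths of the direct children found before any deletion.
    """
    children = [entry.path for entry in fs.ls(path)]
    if not dry_run:
        # recurse=True removes the directory together with everything inside it.
        fs.rm(path, recurse=True)
    return children

# In a notebook (hypothetical mount name):
# print(remove_tree(dbutils.fs, "/mnt/<mount-name>/directory-to-delete"))         # preview only
# remove_tree(dbutils.fs, "/mnt/<mount-name>/directory-to-delete", dry_run=False)  # actually delete
```

Defaulting dry_run to True means a call made by mistake only lists the directory and deletes nothing.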
Move or Rename a File (dbutils.fs.mv):
Use dbutils.fs.mv(source_path, destination_path) to move or rename a file in DBFS.

    dbutils.fs.mv("/mnt/<mount-name>/old-file.txt", "/mnt/<mount-name>/new-file.txt")
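Get File Size (dbutils.fs.ls):
dbutils.fs has no separate command for file metadata. Use dbutils.fs.ls(file_path) instead: for a single file it returns a one-element list, and that entry has a size field giving the file size in bytes.

    # ls on a file path returns a list containing just that file
    file_info = dbutils.fs.ls("/mnt/<mount-name>/file.txt")[0]
    file_size = file_info.size
    print(f"File size: {file_size} bytes")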
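The size field on each entry returned by dbutils.fs.ls can also be summed to estimate how much data sits directly inside a directory. This is a sketch that assumes the entries expose size and isDir(); directory_size and fs are illustrative names.

```python
def directory_size(fs, path):
    """Return the total size in bytes of the files directly inside `path`.

    Subdirectories are skipped, so nested files are not counted.
    """
    return sum(entry.size for entry in fs.ls(path) if not entry.isDir())

# In a notebook (hypothetical mount name):
# total = directory_size(dbutils.fs, "/mnt/<mount-name>")
# print(f"Total size: {total} bytes")
```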
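Check If a File Exists (dbutils.fs.ls):
dbutils.fs does not provide a dedicated existence check. A common workaround is to call dbutils.fs.ls(file_or_directory_path) and treat a file-not-found error as meaning the path does not exist.

    try:
        dbutils.fs.ls("/mnt/<mount-name>/file.txt")
        print("File exists.")
    except Exception as e:
        # A missing path typically surfaces as a java.io.FileNotFoundException
        if "FileNotFoundException" in str(e):
            print("File does not exist.")
        else:
            raise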
These commands and utilities provided by dbutils.fs make it easier to work with files and directories within the Databricks File System (DBFS) in Azure Databricks notebooks.