File System Utility (dbutils.fs) of Databricks Utilities in Azure Databricks

Category: Microsoft Azure Data Engineering | Sub Category: Databricks | By Prasad Bonam | Last updated: 2023-09-23 10:03:00


In Azure Databricks, dbutils.fs is the file system module of Databricks Utilities. It provides commands for file and directory operations on the Databricks File System (DBFS), a distributed file system that is mounted into a Databricks workspace and backed by cloud object storage. Run dbutils.fs.help() in a notebook to list the available commands. Common commands and use cases for dbutils.fs include:

  1. List Files and Directories (dbutils.fs.ls):

    • Use dbutils.fs.ls(path) to list files and directories in the specified path.
    • Example:
      python
      files = dbutils.fs.ls("/mnt/<mount-name>")
      for file in files:
          print(file.path)
  2. Read a File (dbutils.fs.head):

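    • Use dbutils.fs.head(file_path, maxBytes) to return the beginning of a file in DBFS as a UTF-8 string. It reads at most maxBytes bytes (65536 by default), so it suits previewing a file rather than reading a large file in full.
    • Example:
      python
      file_content = dbutils.fs.head("/mnt/<mount-name>/file.txt")
      print(file_content)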
  3. Write to a File (dbutils.fs.put):

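    • Use dbutils.fs.put(file_path, contents, overwrite=True) to write a string to a file in DBFS. The first argument is the target path and the second is the text to write. overwrite defaults to False, so pass True to replace an existing file.
    • Example:
      python
      data = "This is some sample data."
      dbutils.fs.put("/mnt/<mount-name>/new-file.txt", data, overwrite=True)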
  4. Copy Files (dbutils.fs.cp):

    • Use dbutils.fs.cp(source_path, destination_path) to copy a file from one location to another. Pass recurse=True to copy a directory and its contents.
    • Example:
      python
      dbutils.fs.cp("/mnt/<source-mount>/source-file.csv", "/mnt/<destination-mount>/destination-file.csv")
  5. Create a Directory (dbutils.fs.mkdirs):

    • Use dbutils.fs.mkdirs(directory_path) to create a directory in DBFS if it does not already exist. Any missing parent directories are created as well.
    • Example:
      python
      dbutils.fs.mkdirs("/mnt/<mount-name>/new-directory")
  6. Delete a File or Directory (dbutils.fs.rm):

    • Use dbutils.fs.rm(file_or_directory_path, recurse=False) to delete a file or directory in DBFS. Use recurse=True to delete directories and their contents recursively.
    • Example:
      python
      dbutils.fs.rm("/mnt/<mount-name>/file-to-delete.csv")
      # To delete a directory and its contents recursively:
      dbutils.fs.rm("/mnt/<mount-name>/directory-to-delete", recurse=True)
  7. Move or Rename a File (dbutils.fs.mv):

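    • Use dbutils.fs.mv(source_path, destination_path) to move or rename a file in DBFS. Pass recurse=True to move a directory. A move is performed as a copy followed by a delete, even within the same file system.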
    • Example:
      python
      dbutils.fs.mv("/mnt/<mount-name>/old-file.txt", "/mnt/<mount-name>/new-file.txt")
  8. Get File Size (dbutils.fs.ls):

    • dbutils.fs has no dedicated stat command. To get the size of a file, call dbutils.fs.ls(file_path) on the file itself and read the size field (in bytes) of the returned entry.
    • Example:
      python
      file_info = dbutils.fs.ls("/mnt/<mount-name>/file.txt")[0]
      print(f"File size: {file_info.size} bytes")
  9. Check If a File Exists (dbutils.fs.ls with try/except):

    • dbutils.fs has no dedicated exists command. A common workaround is to call dbutils.fs.ls(file_or_directory_path) and treat an exception as a sign that the path does not exist.
    • Example:
      python
      try:
          dbutils.fs.ls("/mnt/<mount-name>/file.txt")
          print("File exists.")
      except Exception:
          # ls raises when the path cannot be listed
          print("File does not exist.")
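
An existence check can also be packaged as a reusable helper built on dbutils.fs.ls, which raises an exception for a path that does not exist. The following is a minimal sketch, not part of the dbutils API: path_exists is a hypothetical name, the fs argument is expected to be dbutils.fs, and matching "FileNotFoundException" in the error text is an assumption about how a missing path is reported, so check it against your runtime.

```python
def path_exists(fs, path):
    """Return True if `path` can be listed, False if it is reported missing.

    `fs` is expected to be `dbutils.fs`; passing it in as an argument keeps
    the helper usable (and testable) outside a notebook.
    """
    try:
        fs.ls(path)
        return True
    except Exception as err:
        # Assumption: a missing path surfaces as an error whose text mentions
        # FileNotFoundException. Other failures (permissions, network) are
        # re-raised instead of being reported as "does not exist".
        if "FileNotFoundException" in str(err):
            return False
        raise
```

In a notebook this would be called as path_exists(dbutils.fs, "/mnt/<mount-name>/file.txt"). Re-raising unrelated errors avoids mistaking an access problem for a missing file.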

These commands and utilities provided by dbutils.fs make it easier to work with files and directories within the Databricks File System (DBFS) in Azure Databricks notebooks.
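
These commands can also be combined into small helpers. As an illustration, the sketch below adds up the sizes of all files under a directory by walking dbutils.fs.ls results recursively. total_size is a hypothetical helper name, fs is expected to be dbutils.fs, and the code assumes each listed entry exposes path, size, and an isDir() method, as the FileInfo objects returned in Databricks notebooks do.

```python
def total_size(fs, path):
    """Return the combined size in bytes of all files under `path`.

    `fs` is expected to be `dbutils.fs`.
    """
    total = 0
    for entry in fs.ls(path):
        if entry.isDir():
            # Descend into subdirectories and add their contents.
            total += total_size(fs, entry.path)
        else:
            total += entry.size
    return total
```

In a notebook this would be called as total_size(dbutils.fs, "/mnt/<mount-name>"). The walk runs on the driver and issues one ls call per directory, so it may be slow on very large directory trees.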
