Improving Workflow with Cloud Storage FUSE
- Stefan Gouyet
- Aug 31, 2020

I’ve been working with Cloud Storage FUSE for over a year now and have found it useful for automating and simplifying data engineering workflows. Cloud Storage FUSE (gcsfuse) is an open-source FUSE adapter that allows us to mount a GCS bucket as a file system.
This keeps our code clean: instead of devoting multiple lines to reading and writing via the GCS client library, we can refer to our files with local paths.
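As a rough illustration of the point above (a sketch, not part of the setup itself): once a bucket is mounted, ordinary shell tools work on its files like on any local file. The helper name csv_row_count and the file name in the usage comment are hypothetical.

```shell
# Count the data rows (everything after the header line) in a CSV file.
# Works on any local path, including a file under a gcsfuse mount point.
csv_row_count() {
  path="$1"
  # tail -n +2 skips the header; wc -l counts the remaining lines;
  # tr strips the padding some wc implementations add
  tail -n +2 "$path" | wc -l | tr -d ' '
}

# Hypothetical usage once the bucket is mounted at test-bucket/:
#   csv_row_count test-bucket/my_file.csv
```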
Here’s how to get it set up:
Note: I am using an Ubuntu 18.04 LTS image on my Compute Engine VM; Cloud Storage FUSE does not currently support Windows, only Linux and macOS.
Step 0: Install the Google Cloud SDK (if you have not already)
Step 1: Authentication
First, we need to authenticate so that Cloud Storage FUSE can access the bucket. gcsfuse will auto-discover credentials via any of three methods:

For our purposes, we will use the third option, which stores the credentials locally via the gcloud CLI:
gcloud auth application-default login
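To check that credentials are actually in place before mounting, something like the sketch below may help. It assumes the usual Application Default Credentials lookup on Linux/macOS: the GOOGLE_APPLICATION_CREDENTIALS environment variable first, then the file that gcloud writes under ~/.config/gcloud. It does not cover the Compute Engine metadata server or a custom gcloud config directory, and the function name find_adc_file is my own.

```shell
# Print the credentials file that Application Default Credentials would
# likely pick up, or return 1 if neither location holds a file.
# Assumption: default gcloud config directory (~/.config/gcloud).
find_adc_file() {
  if [ -n "${GOOGLE_APPLICATION_CREDENTIALS:-}" ] && [ -f "$GOOGLE_APPLICATION_CREDENTIALS" ]; then
    echo "$GOOGLE_APPLICATION_CREDENTIALS"
    return 0
  fi
  well_known="${HOME}/.config/gcloud/application_default_credentials.json"
  if [ -f "$well_known" ]; then
    echo "$well_known"
    return 0
  fi
  return 1
}

# Example: report what was found, without failing the script
find_adc_file || echo "no local credentials file found"
```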

Step 2: Install Cloud Storage Fuse
After authenticating, we will install Cloud Storage FUSE on our Ubuntu machine. The installation instructions, including those for other Linux distributions and macOS, can be found on GitHub.
export GCSFUSE_REPO=gcsfuse-`lsb_release -c -s`
echo "deb http://packages.cloud.google.com/apt $GCSFUSE_REPO main" | sudo tee /etc/apt/sources.list.d/gcsfuse.list
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
sudo apt-get update
sudo apt-get install gcsfuse
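To make the first install command less opaque: lsb_release -c -s prints the release codename (bionic on Ubuntu 18.04), so the repository name becomes gcsfuse-bionic. The small function below, whose name is hypothetical, simply rebuilds the line that ends up in gcsfuse.list for a given codename.

```shell
# Build the apt source line for a given Ubuntu/Debian codename.
# gcsfuse_repo_line is a hypothetical helper used only for illustration.
gcsfuse_repo_line() {
  codename="$1"
  echo "deb http://packages.cloud.google.com/apt gcsfuse-${codename} main"
}

# On Ubuntu 18.04 the codename is "bionic":
gcsfuse_repo_line bionic
```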

Step 3: Mount Cloud Storage Bucket
Now that we have installed gcsfuse on our machine, we will mount our GCS bucket as a file system. I have created a bucket with the default configuration using the command below, and then uploaded two CSV files.
gsutil mb gs://test-gcsfuse8592

To mount the bucket, we first create a new empty folder and then run gcsfuse to mount the bucket inside it.
mkdir test-bucket
gcsfuse --implicit-dirs test-gcsfuse8592 test-bucket/
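If you mount the same bucket repeatedly (for example from a startup script), a small wrapper can make the two steps safe to re-run. This is a minimal sketch, assuming the mountpoint utility from util-linux is available, as it is on Ubuntu; the function name mount_bucket is hypothetical.

```shell
# Mount a bucket at a directory, creating the directory if needed and
# skipping the mount if something is already mounted there.
# Assumption: `mountpoint` (util-linux) exists; if it does not, the check
# is silently skipped and gcsfuse is called directly.
mount_bucket() {
  bucket="$1"
  dir="$2"
  mkdir -p "$dir"
  if mountpoint -q "$dir" 2>/dev/null; then
    echo "already mounted: $dir"
    return 0
  fi
  gcsfuse --implicit-dirs "$bucket" "$dir"
}

# Usage with the bucket from this post:
#   mount_bucket test-gcsfuse8592 test-bucket
```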

Note: There are several flags available to change how gcsfuse works. I have personally always run the command with the --implicit-dirs flag, but it has some downsides, especially if you have many files and are worried about latency. More information can be found at: https://github.com/GoogleCloudPlatform/gcsfuse/blob/master/docs/semantics.md
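Some context on that flag: GCS has no real directories, only object names that may contain slashes. As I understand the semantics document, without --implicit-dirs a directory only shows up if a placeholder object exists for it, while with the flag gcsfuse infers directories from object names, at the cost of extra listing requests. The sketch below is not gcsfuse code; it is a hypothetical helper, implied_dirs, that shows which directories a set of object names implies.

```shell
# Read object names on stdin (one per line) and print every directory
# prefix they imply, de-duplicated and sorted.
implied_dirs() {
  while IFS= read -r name; do
    dir="${name%/*}"
    # names without a slash live at the bucket root and imply no directory
    [ "$dir" = "$name" ] && continue
    # walk up the path: a/b/c.csv implies a/b/ and a/
    while :; do
      echo "$dir/"
      case "$dir" in
        */*) dir="${dir%/*}" ;;
        *)   break ;;
      esac
    done
  done | LC_ALL=C sort -u
}

# Example with made-up object names:
printf 'a/b/c.csv\ntop.csv\na/d.csv\n' | implied_dirs
```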
With the bucket mounted as a file system, its contents (our two CSV files) can now be found in the test-bucket folder.

Step 4: Unmount Bucket
Unmounting is very simple as well, with the following command:
fusermount -u test-bucket
After unmounting the bucket, the folder remains, but its contents are no longer visible locally; the files themselves stay in the bucket.
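The fusermount command is Linux-specific; on macOS the gcsfuse documentation uses umount instead. A minimal sketch of a wrapper that picks the right one, with the hypothetical name unmount_bucket:

```shell
# Unmount a gcsfuse mount point using the platform's tool:
# fusermount -u on Linux, umount on macOS.
unmount_bucket() {
  dir="$1"
  case "$(uname -s)" in
    Linux)  fusermount -u "$dir" ;;
    Darwin) umount "$dir" ;;
    *)      echo "unsupported platform: $(uname -s)" >&2; return 1 ;;
  esac
}

# Usage with the folder from this post:
#   unmount_bucket test-bucket
```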

Thanks for reading!
This article was originally published on Medium on May 4, 2020.