Getting an FTP server running on Kubernetes is a little tricky. The FTP protocol uses multiple ports during its negotiation, and you need to make sure the conversation always connects to the same Kubernetes pod. FTP is not a great choice nowadays (SFTP is better), but in this case I needed to build a server to handle uploads from devices that were not very configurable, so it had to be plain FTP to get going. It's much better, though, to secure it with TLS if possible.
In addition, I wanted to write more or less directly to cloud storage (for backup) and use pubsub to initiate a workflow based on the uploaded data.
FTP server on Node
I’m not using a pre-baked server, for all the reasons I mentioned earlier. Instead I chose https://github.com/trs/ftp-srv, mainly because:
- You can create a custom file system, which allows me to create one based on cloud storage
- FTP is not secure, so I wanted to lock it down as much as possible – limiting it to only being able to process small uploads, and nothing else.
- ftp-srv supports TLS, although I’m not yet using it in this proof of concept
The code for this entire project is at https://github.com/brucemcpherson/sensor
Kubernetes cluster
For testing, I’m using a preemptible cluster to keep the costs down. I always find it easier to create Kubernetes resources with gcloud and kubectl rather than through the UI in the cloud console; that way you can easily repeat it. So first, let’s get a small cluster up and running in your cloud project.
gcloud beta container --project "yourproject" clusters create "sensor" \
  --zone "europe-west4-b" \
  --username "admin" \
  --machine-type "g1-small" \
  --image-type "COS" \
  --disk-type "pd-standard" \
  --disk-size "100" \
  --preemptible \
  --num-nodes "2" \
  --enable-autoscaling \
  --min-nodes "2" \
  --max-nodes "4" \
  --enable-autoupgrade \
  --enable-autorepair
Set up credentials and check cluster looks ok
gcloud container clusters get-credentials sensor --zone europe-west4-b
kubectl config get-contexts
kubectl get nodes
Assign yourself as admin
kubectl create clusterrolebinding cluster-admin-binding --clusterrole=cluster-admin --user=bruce@mcpher.com
Staging, dev, production
I generally have 3 different environments: one for testing locally, one for testing in Docker and Kubernetes, and a final production version. These will (potentially) each have different parameters and settings, so the first task is to create a settings file (secrets.js), shared between all apps in this project, which partitions the parameters into local, development and production. These are labelled ‘tl’, ‘td’ and ‘tp’. It looks something like this:
const settings = {
  tp,
  td,
  tl
}
module.exports = {
  settings
}
and each of tp, td and tl contains stuff like this. I’ll get into the details of these settings when they are needed. I always find it’s best to start with this kind of structure: it avoids refactoring later and makes it easy to add (or remove) environments.
const td = {
  pubsub: {
    use: true,
    includeData: true,
    forcePush: false,
    dataArrived: 'xxxxxx-td',
    dataReady: 'xxxxxxxx-td',
    pusher: false,
    idleTime: 60000,
    gcp: {
      info: {
        serviceAccountFile: './private/sensorgcpupload.json'
      },
      get creds () {
        return {
          credentials: require(this.info.serviceAccountFile)
        }
      }
    }
  },
  ftp: {
    tmp: '/tmp/',
    users: require('./private/users.json').td,
    logging: { name: 'td', level: 'warn' },
    gcp: {
      use: true,
      info: {
        serviceAccountFile: './private/sensorgcpupload.json',
        bucketName: 'xxxxxx',
        folderName: 'ftp-dev'
      },
      get creds () {
        return {
          credentials: require(this.info.serviceAccountFile),
          bucketName: this.info.bucketName,
          folderName: this.info.folderName
        }
      }
    },
    instance: {
      anonymous: false,
      url: 'ftp://127.0.0.1:18021',
      pasv_url: 'ftp://127.0.0.1',
      pasv_min: 18101,
      pasv_max: 18104,
      greeting: 'dev runmode',
      whitelist
    }
  }
}
Whitelisting
ftp-srv gives you the ability to whitelist only certain FTP directives. Since I only want to upload files, and nothing more, this is the minimum I can get away with:
const whitelist = [
  'STOR',
  'PASS',
  'PASV',
  'USER',
  'TYPE',
  'QUIT',
  'ABOR',
  'FEAT'
]
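ftp-srv enforces this itself when the whitelist is passed in the instance options, but the effect can be sketched as a simple membership test (the isAllowed helper below is hypothetical, for illustration only, not part of the project):

```javascript
// Only these directives are accepted; everything else (LIST, DELE, RETR...)
// is refused, so the server can do nothing but accept small uploads.
const whitelist = ['STOR', 'PASS', 'PASV', 'USER', 'TYPE', 'QUIT', 'ABOR', 'FEAT']

// hypothetical helper: would this directive get through the whitelist?
const isAllowed = (directive) =>
  whitelist.includes(String(directive).toUpperCase())
```

So `isAllowed('STOR')` is true, while a download attempt such as `isAllowed('RETR')` is false.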
Yarn/npm
Because multiple apps make up this project, I have a top-level package.json for things that are used in every app, then a specific one for each app.
common package.json
{
  "name": "sensor",
  "version": "1.0.0",
  "main": "index.js",
  "license": "MIT",
  "dependencies": {
    "@google-cloud/pubsub": "^1.7.2",
    "@google-cloud/storage": "^4.7.0",
    "pluck-deep": "^3.0.0"
  }
}
ftp app package.json
{
  "name": "ftp",
  "version": "1.0.0",
  "main": "index.js",
  "license": "MIT",
  "dependencies": {
    "bunyan": "^1.8.12",
    "ftp-srv": "^4.3.1",
    "mime-types": "^2.1.26",
    "through": "^2.3.8"
  }
}
Ftp app
I won’t replicate all the code here as it’s on GitHub, but I’ll highlight and explain a few pieces.
const { server } = require('./src/server')

// -- THESE ARE shared across all apps in the project
const { libPubSub } = require('../common/libpubsub')
const { settings: sa } = require('../common/secrets')

// -- ENVIRONMENT overrides default settings, which are determined by the run mode
// -- LIKE this we can build for each environment with the same source code
const { env } = process
const runmode = env.RUNMODE || 'tl'
const defaultSettings = sa[runmode]
const defaultFtp = defaultSettings.ftp
const defaultInstance = defaultFtp.instance
const settings = {
  runmode,
  version: '1.0.0',
  ...defaultSettings,
  ftp: {
    ...defaultFtp,
    instance: {
      ...defaultInstance,
      url: env.URL || defaultInstance.url,
      pasv_url: env.PASVURL || defaultInstance.pasv_url,
      pasv_min: env.PASVMIN || defaultInstance.pasv_min,
      pasv_max: env.PASVMAX || defaultInstance.pasv_max
    }
  }
}
const instance = settings.ftp.instance
console.log(`${new Date().getTime()}:starting ftp server ${settings.runmode}-${settings.version} on ${instance.url}`)
console.log(`${new Date().getTime()}:passive ports ${instance.pasv_min}-${instance.pasv_max} on ${instance.pasv_url}`)

// -- THIS kicks off the pubsub environment
libPubSub.init({ settings })

// initialize server to run in specified runmode
server.init({ settings })

// kick it off
server.instance.listen()
Event handling
The only client/server events that need handling are login and client-error. Login checks the user/password combination and then goes on to upload the file. When the upload is done, I also need to send a pubsub message (the pubsub code is on GitHub), consisting of some control information and the contents of the file just uploaded.
const { handle } = require('./handle')
const { FtpSrv } = require('ftp-srv')
const bunyan = require('bunyan')
const { libPubSub } = require('../../common/libpubsub')

const server = {
  // the server instance
  instance: null,

  // call this to kick everything off
  init ({ settings }) {
    this.settings = settings
    const options = this.settings.ftp
    const instance = new FtpSrv({
      ...options.instance,
      log: options.logging ? bunyan.createLogger(options.logging) : null
    })
    this.instance = instance
    console.log(`listening for ftp on ${options.instance.url} (active) ${options.instance.pasv_url} (passive)`)
    // custom event handlers
    this.handlers({
      onStorageError ({ connection, error, fileName, settings }) {
        // handle a failure uploading
        console.error('failed on upload ', error, fileName)
      },
      onStorageFinished ({ connection, storageName, settings, fileName, data }) {
        // if we're pubsubbing, now is the time
        const now = new Date().getTime()
        if (settings.pubsub.use) {
          // the data is a stream buffer, so we'll encode to base64
          libPubSub.publish({
            ob: {
              sentAt: now,
              fileName,
              storageName,
              data: data.toString('base64')
            },
            name: 'dataArrived'
          })
        }
        console.log(`${now}:upload completion detected ${fileName} uploaded to ${storageName}`)
      }
    })
  },

  // setup event handlers
  handlers (customHandlers) {
    this.instance.on('login', (data, resolve, reject) =>
      handle.login({ settings: this.settings, data, resolve, reject, ...customHandlers }))
    this.instance.on('client-error', (connection, context, error) =>
      handle.clientError({ settings: this.settings, connection, context, error }))
  }
}

module.exports = { server }
The custom file system
ftp-srv allows the creation of a custom file system. Since all other operations are not whitelisted, the only method that needs to be overridden is write, which is called to handle streaming of data as it arrives. Rather than writing it to a file, which is the normal action, I need to stream it to cloud storage. (As it turned out, ftp-srv had a problem that made its stream incompatible with cloud storage streaming, so in practice the file is temporarily written to the container’s local storage, then that file is streamed to cloud storage and finally deleted.)
const { FileSystem } = require('ftp-srv')
const fs = require('fs')
const mime = require('mime-types')
const Storage = require('@google-cloud/storage').Storage
const through = require('through')
const path = require('path')
const { createWriteStream } = require('fs')

class GcpFileSystem extends FileSystem {
  constructor () {
    super(...arguments)
    const [connection, options] = [...arguments]
    this.settings = options.settings
    this.connection = connection
    // these will get fired when storage is uploaded to
    this.onStorageError = options.onStorageError
    this.onStorageFinished = options.onStorageFinished
  }

  /**
   * get an authorized storage object to stream to
   */
  getStorage () {
    const { creds } = this.settings.ftp.gcp
    const { credentials, bucketName, folderName } = creds
    const { projectId } = credentials
    return {
      storage: new Storage({
        projectId,
        credentials
      }),
      bucketName,
      folderName
    }
  }

  /**
   * create the cloud storage stream
   * the credentials/bucket name and filename are in the secrets file
   * @param {string} name the filename to stream from
   */
  storageWriteStream ({ name }) {
    const mimeType = mime.lookup(name)
    const fileTarget = this.getFileTarget({ name })
    const options = {
      contentType: mimeType
    }
    return {
      stream: fileTarget.file.createWriteStream(options),
      name: fileTarget.file.name,
      bucket: fileTarget.file.bucket.name
    }
  }

  /**
   * create the cloud storage file target
   * the credentials/bucket name and filename are in the secrets file
   * @param {string} name the filename to stream from
   */
  getFileTarget ({ name }) {
    // get a storage object
    let { storage, bucketName, folderName } = this.getStorage()
    // handle gs: type names too
    if (name.slice(0, 5) === 'gs://') {
      bucketName = name.replace(/gs:\/\/([^\/]+).*/, '$1')
      name = name.replace('gs://' + bucketName + '/', '')
    }
    name = folderName + '/' + name
    const bucket = storage.bucket(bucketName)
    // we'll actually be streaming to/from this file
    return {
      file: bucket.file(name)
    }
  }

  write (fileName, { append = false, start = undefined } = {}) {
    // for now we'll temporarily write it to a unique path
    // then load to storage from there
    // then delete the file
    // because of this https://github.com/trs/ftp-srv/issues/199
    const { settings, onStorageError, onStorageFinished, connection } = this
    const { pubsub, ftp } = settings

    // get a unique name for the file and add the extension
    const fsPath = ftp.tmp + this.getUniqueName() + path.extname(fileName)

    // and a temp stream
    const stream = createWriteStream(fsPath, { flags: !append ? 'w+' : 'a+', start })

    const removeTemp = (f) => {
      // delete the temp file
      fs.unlink(f, err => {
        if (err) {
          console.log('failed to remove temp file', f)
        } else {
          console.log('temp file', f, 'removed')
        }
      })
    }

    const storageError = (error) => {
      // delete the temp file
      removeTemp(fsPath)
      console.error('stream failure', error)
      if (onStorageError) {
        onStorageError({ connection, error, fileName, settings })
      }
    }

    // if this fails then it wasn't able to write the temp file
    stream.once('error', error => storageError(error))

    // temp file was written, now write to storage
    stream.once('close', () => {
      // temp file is created, now stream it to storage
      const ws = this.storageWriteStream({ name: fileName })
      const storageStream = ws.stream
      const readStream = fs.createReadStream(fsPath)
      const storageName = `${ws.bucket}/${ws.name}`
      const before = new Date().getTime()
      console.log(`${before}:started writing ${fileName} to gcp at ${storageName}`)

      // copy file to storage through a pipe so we can catch the data for publishing
      // but only do this if the files are expected to be small
      const streamedData = []
      readStream.pipe(through(
        function write (data) {
          // this is where we'll catch the data
          if (pubsub.includeData) {
            streamedData.push(data)
          }
          this.queue(data)
        },
        function end (data) {
          this.queue(null)
        }
      )).pipe(storageStream)

      // this catches any errors that happen while creating the readable stream (usually invalid names)
      readStream.once('error', error => storageError(error))
      storageStream.once('error', error => storageError(error))

      // when done, we can resolve the stream
      storageStream.on('finish', function () {
        const now = new Date().getTime()
        console.log(`${now}:finished writing ${fileName} to gcp at ${storageName} (${now - before}ms)`)
        // signal all over
        stream.end()
        // call the thing to do when it's over
        if (onStorageFinished) {
          onStorageFinished({ connection, storageName, settings, fileName, data: Buffer.concat(streamedData) })
        }
        // delete the temp file
        removeTemp(fsPath)
      })
    })

    return { stream }
  }
}

module.exports = { GcpFileSystem }
Deployment
kubectl apply -f deploy.yaml
This is set up to use ports 18101–18104 for passive FTP. The number of ports in use defines how many simultaneous uploads can happen. The range is passed to the FTP app via environment variables. Note the selector app=ftptd: this will be used by the service that exposes the pods of this deployment externally as a service target.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ftptd
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: ftptd
  template:
    metadata:
      labels:
        app: ftptd
    spec:
      containers:
      - image: gcr.io/fid-sql/ftp:td
        imagePullPolicy: Always
        name: ftptd
        resources: {}
        ports:
        - containerPort: 18021
          protocol: TCP
        - containerPort: 18022
          protocol: TCP
        - containerPort: 18101
          protocol: TCP
        - containerPort: 18102
          protocol: TCP
        - containerPort: 18103
          protocol: TCP
        - containerPort: 18104
          protocol: TCP
        env:
        - name: URL
          value: "ftp://0.0.0.0:18021"
        - name: PASVURL
          value: "ftp://to.be.assigned.later-just use 0.0.0.0 for now"
        - name: PASVMIN
          value: "18101"
        - name: PASVMAX
          value: "18104"
        - name: RUNMODE
          value: "td"
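To make the sizing above concrete: each concurrent passive transfer needs its own data port, so the number of simultaneous uploads is just the size of the PASV range. The passiveCapacity helper below is illustrative only, not part of the project:

```javascript
// how many simultaneous passive transfers a given PASV port range supports
const passiveCapacity = (pasvMin, pasvMax) => pasvMax - pasvMin + 1

console.log(passiveCapacity(18101, 18104)) // 4 simultaneous uploads
```

If you expect more concurrent devices, widen the range and add matching containerPort and service entries.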
The PASVURL environment variable defines the externally facing IP address that the load balancer service, to be created next, will assign. We don’t know it at this point, but once assigned it can just be patched in here and the .yaml file reapplied.
The external load balancer
kubectl apply -f svc.yaml
The deployment pods need a service to expose them externally. For simplicity, I’m using the same ports here as the pods, but if required these could be forwarded to different target ports.
apiVersion: v1
kind: Service
metadata:
  name: ftptd
  namespace: default
spec:
  externalTrafficPolicy: Local
  ports:
  - name: port-1
    port: 18021
    protocol: TCP
    targetPort: 18021
  - name: port-2
    port: 18022
    protocol: TCP
    targetPort: 18022
  - name: port-3
    port: 18101
    protocol: TCP
    targetPort: 18101
  - name: port-4
    port: 18102
    protocol: TCP
    targetPort: 18102
  - name: port-5
    port: 18103
    protocol: TCP
    targetPort: 18103
  - name: port-6
    port: 18104
    protocol: TCP
    targetPort: 18104
  selector:
    app: ftptd
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800
  type: LoadBalancer
Because passive FTP is not stateless, it’s important that clients always connect to the same pod instance. sessionAffinity: ClientIP means that the same client will connect to the same pod on second and subsequent requests. The client also checks that the IP matches the one it supplied, so we need to set externalTrafficPolicy: Local to avoid the IP address being translated from an external to an internal one in passive mode.
After a little while, kubectl get service will show an actual external IP address for the service (it will show &lt;pending&gt; for a while). Copy this external address into deploy.yaml
- name: PASVURL value: "ftp://the external ip address"
and redo
kubectl apply -f deploy.yaml
Building the app
I prefer to build locally rather than using cloud build, as it makes it easier to test the docker image locally before trying it on the cluster.
This build script takes 2 arguments, the name of the app plus the run mode. It builds the image, tags it (note that images are tagged with the run mode, to allow different images to be used in development versus production on the same cluster), pushes it to the cloud container registry, then deletes the matching pods, allowing them to be recreated with the updated image. You could further enhance the tagging with a version number if necessary, but the tag should match the tagged image name in the deployment YAML.
sh build.sh ftp td
D=$1
R=$2
echo "building $D and deploying $D$R"
echo "set the context to the project"
# need to run this from top of the project
cd ~/sensor
NODE_ENV=production
P="yourproject"
Z="europe-west4-b"
C="sensor"
echo "starting build $D"
docker build -f containers/$D.dockerfile . --tag gcr.io/$P/$D:$R
docker push gcr.io/$P/$D:$R
echo "delete pods"
kubectl get pods | grep -Po "^$D$R-[\w]+-[\w]+\s" | while read line; do kubectl delete pod $line ; done
kubectl get pods
kubectl get pods | grep -Po "^$D$R-[\w]+-[\w]+\s" | while read line; do kubectl logs $line ; done
The docker file
Since the environment variables are set up in the YAML files, this can be very minimal.
FROM nikolaik/python-nodejs
LABEL maintainer="Bruce Mcpherson <bruce@mcpher.com>"

# build docker image for ftp dev mode
WORKDIR /usr/src/td/ftp
COPY package.json .
COPY ./common ./common
COPY ./ftp ./ftp
RUN yarn --production=true
RUN cd ftp && yarn --production=true
CMD [ "node", "ftp/index.js" ]
Running locally
If you want to test the docker image locally (any simple FTP client should do – I’m using ncftp), this script should do it
docker run --env-file env.list -p 18021-18022:18021-18022 -p 18101-18104:18101-18104 gcr.io/fid-sql/ftp:td
with an env.list file of
URL=ftp://0.0.0.0:18021
PASVURL=ftp://0.0.0.0
PASVMIN=18101
PASVMAX=18104
RUNMODE=td
Alternatively, you can run it completely locally with node index
Private files
I haven’t published these, but you need 2 files in common/private
- A service account file with the capability to write to storage and pubsub, exactly as downloaded from the cloud console. You can specify credentials through environment variables, but I prefer to do it this way
- A users file with usernames and passwords that looks like this. There are much better ways to handle passwords, but this will do to get started
{
  "tp": [{
    "name": "johndoe",
    "password": "some password",
    "role": "citizen"
  }],
  "td": [{
    "name": "bruce",
    "password": "another password",
    "role": "boss"
  }]
}
These are referenced in the gcp and user properties in the settings file.
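The real login check lives in handle.js on GitHub, but given the users.json structure above, the lookup for the current run mode can be sketched like this (findUser is a hypothetical helper, shown only to illustrate how the file is consumed):

```javascript
// the shape of common/private/users.json, keyed by run mode
const users = {
  td: [{ name: 'bruce', password: 'another password', role: 'boss' }]
}

// hypothetical lookup: returns the matching user object, or undefined
// (a real implementation should hash passwords rather than compare plaintext)
const findUser = (runmode, name, password) =>
  (users[runmode] || []).find(u => u.name === name && u.password === password)
```

The login event handler can then resolve the connection with a GcpFileSystem when a user is found, and reject it otherwise.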
Pubsub
In this article, I won’t go into consuming the messages sent, but we need a topic and a subscription, which will probably be different between run modes. These are referenced in the settings file with the abstracted names ‘dataArrived’ and ‘dataReady’. The FTP app publishes to the dataArrived topic, and any consumers will subscribe to ‘dataReady’.
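The point of the abstracted names is that app code never mentions a concrete, run-mode specific topic. The real resolution happens inside libPubSub on GitHub; a minimal sketch of the idea, with made-up topic names, might look like this:

```javascript
// run-mode specific settings, as in secrets.js (topic names are made up here)
const settings = {
  pubsub: { dataArrived: 'made-up-topic-td', dataReady: 'made-up-ready-td' }
}

// hypothetical helper: map an abstract name to the concrete topic for this run mode
const topicFor = (name) => {
  const topic = settings.pubsub[name]
  if (!topic) throw new Error(`no topic configured for ${name}`)
  return topic
}
```

App code can then say publish to 'dataArrived' and get the right topic for tl, td or tp without change.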
Setting up topics and subscriptions through the UI can be error-prone and laborious, so here’s a script to bulk delete and create the topics and subscriptions needed for this project.
RO="roles/editor"
# service account email
SA="serviceAccount:xxxx@xxxx.iam.gserviceaccount.com"
# project
P='your project'
# ack deadline
AD=420
# topics
TPA='zzzz-tp'
TDA='zzzz-td'
# subscriptions
TPR='xxxx-tp'
TDR='xxxx-td'
# message duration
MD="60m"
TMD="10m"

# make sure we're on the right project
gcloud config set project $P

# tidy up and remake the things
gcloud pubsub subscriptions delete projects/$P/subscriptions/$TPR
gcloud pubsub subscriptions delete projects/$P/subscriptions/$TDR
gcloud pubsub topics delete projects/$P/topics/$TPA
gcloud pubsub topics delete projects/$P/topics/$TDA
gcloud pubsub topics create $TPA
gcloud pubsub topics create $TDA
gcloud pubsub topics add-iam-policy-binding $TPA --member="$SA" --role=$RO
gcloud pubsub topics add-iam-policy-binding $TDA --member="$SA" --role=$RO
gcloud pubsub subscriptions create $TPR --topic=projects/$P/topics/$TPA --ack-deadline=$AD --expiration-period=never --message-retention-duration=$MD
gcloud pubsub subscriptions add-iam-policy-binding $TPR --member="$SA" --role="$RO"
gcloud pubsub subscriptions create $TDR --topic=projects/$P/topics/$TDA --ack-deadline=$AD --expiration-period=never --message-retention-duration=$MD
gcloud pubsub subscriptions add-iam-policy-binding $TDR --member="$SA" --role="$RO"
Summary
It’s not great to have to deal with FTP nowadays, but there we have it – FTP on Kubernetes.
Source code is here https://github.com/brucemcpherson/sensor