CJ Virtucio

Fault-tolerant File Uploads

2017-08-28

Introduction

The internet is a system of components that are independent and interact only through messages. This is the RESTful web architecture. Clients and servers send each other representations of state, and do not share state. Hence the name Representational State Transfer (REST).

In law, we have this concept of ‘fortuitous events’. Things can go wrong, and many times they’re beyond anyone’s control. Maybe your internet goes down. Maybe a flash flood inundates the server room. Generally, events like these relieve both parties of a contract of their obligations to each other. Sounds a lot like REST, doesn’t it? Stateless system, independent components; they’re not bound to each other beyond the messages they actually send.

But because ‘the contract is the law between the parties’, you can ‘override’ this rule. Sometimes that’s helpful when you need to incentivize another party to join in on the contract. Similarly, for a network, it’s possible to structure your system to handle faults gracefully. I’ll be talking about one example: file uploads.


The TUS Protocol

The TUS Protocol is an open standard for resumable, fault-tolerant file uploads. The idea is that the user first sends a preliminary HEAD request to see if the file is already in the process of being written. If it isn’t, a POST request is sent to create a new ‘upload’ resource. The server responds with the endpoint for that resource, and the user makes a series of PATCH requests to send the file to the server in chunks. Since state about the resource is maintained on the server, the user can always continue from where they left off. All they’d have to do is send another HEAD request; the server looks up the partial file and reports, in the Upload-Offset header, how many bytes it already has. For our project, we identified the upload based on the request headers.
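
Resuming boils down to simple arithmetic: given the total file size and the offset the server reports, the client works out which byte ranges are still missing and sends one PATCH per range. Here is a minimal sketch of that step. The class name `ChunkPlan` and its methods are made up for illustration and are not part of any tus library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: not part of any tus library.
final class ChunkPlan {
  // One PATCH request: start at `offset`, send `length` bytes.
  static final class Chunk {
    final long offset;
    final long length;

    Chunk(long offset, long length) {
      this.offset = offset;
      this.length = length;
    }
  }

  // List the PATCH requests still needed to finish the upload.
  // serverOffset is the number of bytes the server says it already has.
  static List<Chunk> remaining(long fileSize, long serverOffset, long chunkSize) {
    if (chunkSize <= 0) {
      throw new IllegalArgumentException("chunkSize must be positive");
    }
    if (serverOffset < 0 || serverOffset > fileSize) {
      throw new IllegalArgumentException("offset outside the file");
    }
    List<Chunk> chunks = new ArrayList<>();
    for (long off = serverOffset; off < fileSize; off += chunkSize) {
      // The last chunk may be shorter than chunkSize.
      chunks.add(new Chunk(off, Math.min(chunkSize, fileSize - off)));
    }
    return chunks;
  }
}
```

For a 10-byte file where the server already holds 4 bytes, `ChunkPlan.remaining(10, 4, 4)` yields two chunks: 4 bytes at offset 4, then 2 bytes at offset 8.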


Controllers

We kick off the process with a POST request to create a new ‘Resumable Upload’ resource. If you’re using Spring MVC, you’d probably write a controller class, and then a “create” method with the @PostMapping annotation.

The method signature will involve a couple of headers. The “Content-Length” header tells the server how large the payload will be (this is set automatically by the browser, and the browser will complain if you try to set it yourself on the front end). “Upload-Length” is the anticipated file size; it helps if the server is made aware of this in advance, though the protocol does allow you to defer it. “Tus-Resumable” is how the client states what version of the protocol to use, and “Upload-Metadata” just holds additional information about the upload. Things like the filename and MIME type come to mind.
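
Of these headers, “Upload-Metadata” is the only one that needs real parsing: the protocol defines it as a comma-separated list of pairs, each a key followed by a space and a Base64-encoded value. Below is a rough sketch of a decoder. The class name `UploadMetadata` is an invention for this example, and the handling of a key with no value is my reading of the spec rather than something we tested in our project.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for decoding the Upload-Metadata header.
final class UploadMetadata {
  // The header is a comma-separated list of "key base64value" pairs,
  // e.g. "filename aGVsbG8udHh0,filetype dGV4dC9wbGFpbg==".
  static Map<String, String> parse(String header) {
    Map<String, String> result = new LinkedHashMap<>();
    if (header == null || header.trim().isEmpty()) {
      return result;
    }
    for (String pair : header.split(",")) {
      String[] parts = pair.trim().split(" ", 2);
      String key = parts[0];
      if (key.isEmpty()) {
        throw new IllegalArgumentException("empty metadata key");
      }
      // Assumption: a key may appear without a value; treat it as "".
      String value = parts.length > 1
          ? new String(Base64.getDecoder().decode(parts[1].trim()), StandardCharsets.UTF_8)
          : "";
      result.put(key, value);
    }
    return result;
  }
}
```

In the controller, you would call `UploadMetadata.parse(meta)` and read keys such as `filename` from the resulting map. `Base64.getDecoder().decode` throws `IllegalArgumentException` on malformed input, which you would probably want to turn into a 400 response.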

The method would handle this request using a couple of services (we’ll go into that later). You definitely don’t want too much complicated logic in your controller; its job is really just to map the Data Layer to the View Layer.

When the resource is created, the controller points the user to it: the protocol calls for a 201 Created response whose Location header holds the URL of the new resource. The protocol doesn’t specify what identifier to use, but many implementations use a randomly generated Base64 string. You could tie it to a user by concatenating that string with the username, much like HTTP Basic Authentication joins a username and password, though that might not be very secure.
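
As a rough idea of what generating such an identifier could look like, here is a sketch that uses only the JDK. The class name `UploadIds`, the 16-byte length, and the `/uploads/` path prefix are assumptions for the example; the protocol leaves all of these up to you.

```java
import java.security.SecureRandom;
import java.util.Base64;

// Hypothetical helper for naming new upload resources.
final class UploadIds {
  private static final SecureRandom RANDOM = new SecureRandom();

  // 16 random bytes become a 22-character, URL-safe Base64 string.
  static String next() {
    byte[] bytes = new byte[16];
    RANDOM.nextBytes(bytes);
    return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
  }

  // The path the controller would send back in the Location header.
  static String location(String id) {
    return "/uploads/" + id;
  }
}
```

The URL-safe alphabet matters here because the identifier ends up in a path; the standard Base64 alphabet includes `/` and `+`, which would need escaping.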

The “PATCH” handler isn’t too different. You definitely want to require an “Upload-Offset” request header, because the server needs to know where to continue. The offset is also useful if you want to speed things up a bit by taking a concurrent approach to uploading the file. A discrepancy between this offset and the actual file pointer can be handled in many ways. The protocol itself says to reject the request with 409 Conflict; a simpler solution, if you’re not interested in concurrency, is to just use the file pointer as your offset.

Once the service has written the number of bytes specified in the request, the server prepares a response with the new offset. Make sure to update whatever datastore solution you have for remembering things like the file’s current size.
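
To make the PATCH step concrete, here is a sketch of the write itself: compare the client’s offset with the size of the partial file, append the request body, and return the new offset. The class `ChunkWriter` and its exception type are hypothetical, and the sketch takes the strict route of rejecting a mismatched offset (which the controller would report as 409 Conflict) rather than silently using the file pointer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: appends one PATCH body to the partial file.
final class ChunkWriter {
  // Thrown when the client's offset disagrees with the file on disk.
  static final class OffsetMismatchException extends IOException {
    OffsetMismatchException(long expected, long actual) {
      super("client offset " + actual + " but file is at " + expected);
    }
  }

  // Returns the new offset, i.e. the file size after the write.
  static long append(Path file, long clientOffset, InputStream body) throws IOException {
    long current = Files.exists(file) ? Files.size(file) : 0L;
    if (clientOffset != current) {
      throw new OffsetMismatchException(current, clientOffset);
    }
    long written = 0;
    byte[] buffer = new byte[8192];
    try (OutputStream out = Files.newOutputStream(
        file, StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
      int n;
      // Stream the body in small pieces so a large chunk never sits in memory.
      while ((n = body.read(buffer)) != -1) {
        out.write(buffer, 0, n);
        written += n;
      }
    }
    return current + written;
  }
}
```

In a controller, `body` would be `req.getInputStream()`, and the returned value is what goes into the “Upload-Offset” response header and into your datastore. If the connection drops halfway, whatever was already written stays in the file, which is exactly what lets the next HEAD request report a useful offset.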

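import javax.servlet.http.HttpServletRequest;
import org.springframework.http.ResponseEntity;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestHeader;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;

@Controller
public class Uploads {
  // ...

  // Check whether an upload matching this metadata is already in progress.
  @RequestMapping(value="/uploads", method=RequestMethod.HEAD)
  public ResponseEntity<?> headUploads(
    @RequestHeader(value="Upload-Metadata") String meta,
    @RequestHeader(value="Tus-Resumable") String ver
  ) {
    // ...
  }

  // Create a new upload resource.
  @RequestMapping(value="/uploads", method=RequestMethod.POST)
  public ResponseEntity<?> postUploads(
    @RequestHeader(value="Content-Length") Long conLen,
    @RequestHeader(value="Upload-Length") Long upLen,
    @RequestHeader(value="Upload-Metadata") String meta,
    @RequestHeader(value="Tus-Resumable") String ver
  ) {
    // ...
  }

  // Append a chunk to an existing upload.
  @RequestMapping(value="/uploads/{id}", method=RequestMethod.PATCH)
  public ResponseEntity<?> patchUpload(
    @PathVariable("id") String id,
    @RequestHeader(value="Content-Length") Long conLen,
    // Marked optional: the protocol does not require these two on a PATCH,
    // and Spring rejects the request if a required header is missing.
    @RequestHeader(value="Upload-Length", required=false) Long upLen,
    @RequestHeader(value="Upload-Offset") Long upOff,
    @RequestHeader(value="Upload-Metadata", required=false) String meta,
    @RequestHeader(value="Tus-Resumable") String ver,
    HttpServletRequest req
  ) {
    // You'll be streaming the data from the request into your file using one of Java's many I/O libraries.
    // ...
  }
}
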

Services

There are several ways to approach the logic that actually writes the file. You could have a global singleton that keeps the state of the different files in memory, and commit changes to that state in a functional way, similar to how Facebook’s Flux pattern keeps track of client-side state. If you need persistence, Spring integrates with Hibernate through a fairly simple API. How you design your services depends on the needs of your application, but generally you should have one service that maintains state, and another that acts on it, sending messages about what to write and how much. These services would then be injected into your controller class to deal with the data being streamed from the client.

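// org.yournamespace.Store.java
package org.yournamespace;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.springframework.stereotype.Service;

// Info is your own class describing one upload (identifier, offset, length, ...).
@Service
public class Store {
  // Thread-safe, since several requests may touch the store at once.
  private final Map<String, Info> fileState = new ConcurrentHashMap<>();

  // Record the latest state of an upload.
  // (No @Bean here: that annotation is for factory methods in @Configuration classes.)
  public void update(Info info) {
    // ...
  }

  // Look up the current state of an upload without changing it.
  public Info peek(String id) {
    // ...
  }
}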

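// org.yournamespace.WritingUtil.java
package org.yournamespace;

import java.io.IOException;
import java.io.InputStream;
import org.springframework.stereotype.Service;

@Service
public class WritingUtil {
  // A Spring service is a singleton by default, so the per-upload Info
  // is passed in with each call instead of being stored in a field.

  // Append the incoming bytes to the file and return the new offset.
  public Long write(Info info, InputStream in) throws IOException {
    // ...
  }

  // Return the number of bytes already written for this upload.
  public Long peek(Info info) {
    // ...
  }
}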
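
To give a feel for the ‘functional’ style of state-keeping mentioned above, here is a self-contained sketch of an in-memory store that never mutates an entry in place; each update swaps in a fresh copy. `UploadInfo` is a hypothetical stand-in for the Info class, and `InMemoryStore` and its method names are likewise made up for this example.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the Info class; immutable so that every change
// produces a new value instead of mutating shared state.
final class UploadInfo {
  final String id;
  final long offset;   // bytes written so far
  final long length;   // total size announced in Upload-Length

  UploadInfo(String id, long offset, long length) {
    this.id = id;
    this.offset = offset;
    this.length = length;
  }

  UploadInfo withOffset(long newOffset) {
    return new UploadInfo(id, newOffset, length);
  }

  boolean isComplete() {
    return offset == length;
  }
}

// Hypothetical in-memory store; its state is lost when the server restarts.
final class InMemoryStore {
  private final Map<String, UploadInfo> fileState = new ConcurrentHashMap<>();

  void create(String id, long length) {
    fileState.put(id, new UploadInfo(id, 0L, length));
  }

  Optional<UploadInfo> peek(String id) {
    return Optional.ofNullable(fileState.get(id));
  }

  // Atomically replace the entry with a copy carrying the new offset.
  // Returns empty if the upload is unknown.
  Optional<UploadInfo> advance(String id, long newOffset) {
    return Optional.ofNullable(
        fileState.computeIfPresent(id, (key, old) -> old.withOffset(newOffset)));
  }
}
```

Because `UploadInfo` is immutable, a controller thread that read the state before a write never sees it change underneath it; it simply asks the store again when it needs the latest offset. Swapping this for a database-backed store later should only affect the store class.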

The tus website lists a number of implementations that you can look into for inspiration. There are implementations in Java and Node.js, but the official one is the tus team’s very own Go-based server, called tusd. You can check it out here.