Just to keep everyone abreast of the current state of the vault progress, I thought I’d post a quick video of our test network running (you’ll probably need to view it fullscreen, high quality if you want to see it properly).
The test’s very basic – just a single client node creating an account, then storing, retrieving and deleting a chunk of data. However, this represents quite a significant step forward. The ability to delete data reliably from a peer-to-peer network was one of the more difficult problems we faced. There are a few issues here.
A big one is: if a vault is asked to delete a chunk which it’s currently storing, how does it know the request is valid? The requester has to have originally stored the chunk to be allowed to delete it. Another biggie: if the same chunk gets stored several times, we only want to keep a maximum of four copies (it’s wasteful otherwise). This means if a chunk gets stored, say, 100 times, we only really want to delete it from the network once 100 delete requests have been made.
We looked at a variety of ways to handle these problems, but ultimately we found a way which hardly involves the client (the requester). It boils down to how the storing is done in the first place.
To store the data, the client passes the chunk to the four vaults with names closest to his own name. These vaults (let’s call them ClientManagers) then check that the client has enough “credit” left to be allowed to store the chunk, i.e. that he’s got a vault which is doing work for the network. If so, they pass the chunk to the four vaults with names closest to the name of the chunk.
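Here's a rough Python sketch of the "four vaults with names closest to a name" lookup, purely for illustration. The post doesn't say how closeness is measured, so XOR distance between numeric names is an assumption, as are the helper names `name_of` and `closest_vaults` and the use of SHA-512 to make up names.

```python
import hashlib

GROUP_SIZE = 4  # the "four vaults" from the text


def name_of(label: str) -> int:
    """Hypothetical helper: invent a numeric name by hashing a label."""
    return int.from_bytes(hashlib.sha512(label.encode()).digest(), "big")


def closest_vaults(target: int, vault_names, count: int = GROUP_SIZE):
    """Return the `count` names closest to `target`.

    Assumption: "close" means small XOR distance between the two names.
    """
    return sorted(vault_names, key=lambda name: name ^ target)[:count]


# A pretend network of 20 vaults and one client.
vaults = [name_of(f"vault-{i}") for i in range(20)]
client = name_of("client-a")
chunk = name_of("some chunk contents")

client_managers = closest_vaults(client, vaults)  # closest to the client's name
data_managers = closest_vaults(chunk, vaults)     # closest to the chunk's name
```

The same lookup serves both hops: only the target name changes between picking ClientManagers and picking DataManagers.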
These guys (we call them DataManagers) then check to see if that chunk has already been stored. If so, they increase a counter by one and the job’s done. If not, each DataManager chooses one random vault on the network to do the actual storing (we call these vaults PMID Nodes*) and gives the chunk to them.
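The DataManager's side of a store could be sketched like this. It's a toy model under stated assumptions, not the real vault code: the class and method names are made up, chunk names are plain strings, and the DataManager is assumed to have a simple list of vaults to pick from.

```python
import random


class DataManager:
    """Toy DataManager: counts stores per chunk and picks a holder once."""

    def __init__(self, all_vaults, rng=None):
        self.counts = {}    # chunk name -> how many times it has been stored
        self.holders = {}   # chunk name -> the vault chosen to hold it
        self.all_vaults = list(all_vaults)
        self.rng = rng or random.Random()

    def store(self, chunk_name):
        """Return the chosen PMID Node for a new chunk, or None for a repeat."""
        if chunk_name in self.counts:
            # Already stored: bump the counter and the job's done.
            self.counts[chunk_name] += 1
            return None
        # First time we've seen this chunk: pick one random vault to hold it.
        self.counts[chunk_name] = 1
        node = self.rng.choice(self.all_vaults)
        self.holders[chunk_name] = node
        return node


dm = DataManager(["vault-1", "vault-2", "vault-3"])
first = dm.store("chunk-abc")   # a vault name: the chunk really gets stored
second = dm.store("chunk-abc")  # None: only the counter changes
```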
Well, that’s a slight lie – each DataManager gives the chunk to the four vaults with names closest to the PMID Node! Obviously these chaps are called PMID Managers; they manage the account of the PMID Node. They credit the PMID Node’s account with the size of the chunk (this increases the rank of that vault and hence allows its owner to get more credit) and then send the chunk to the PMID Node on the last leg of its journey.
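A minimal sketch of that last hop, assuming the account is just a running total of bytes per PMID Node. How rank is actually derived from the account isn't covered here, so the sketch only does the crediting and the hand-off; `PmidManager`, `forward_chunk` and the `deliver` callback are illustrative names.

```python
class PmidManager:
    """Toy PMID Manager: credits a node's account, then forwards the chunk."""

    def __init__(self):
        self.stored_bytes = {}  # PMID Node name -> total bytes credited so far

    def forward_chunk(self, pmid_node, chunk: bytes, deliver):
        # Credit the node's account with the size of the chunk...
        self.stored_bytes[pmid_node] = self.stored_bytes.get(pmid_node, 0) + len(chunk)
        # ...then send the chunk on the last leg of its journey.
        deliver(pmid_node, chunk)


delivered = []  # stands in for the network: records what was sent where
manager = PmidManager()
manager.forward_chunk("node-1", b"abcd", lambda node, data: delivered.append((node, data)))
manager.forward_chunk("node-1", b"xy", lambda node, data: delivered.append((node, data)))
```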
The data’s now safe and sound. This complex mechanism solves several issues – but getting back to the delete problems, it gives us a way to solve both of these.
When the client first stored the data, it gave the chunk to its managers. At that stage, they logged this fact, so they know definitively whether that same client should be allowed to delete a chunk or not. First problem solved.
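In sketch form, the ClientManager's check might look like the following. The names are hypothetical, and it assumes one logged store entitles a client to exactly one delete, which the post doesn't spell out.

```python
class ClientManager:
    """Toy ClientManager: remembers which client stored which chunk."""

    def __init__(self):
        self.stored = set()  # (client name, chunk name) pairs seen at store time

    def record_store(self, client_name, chunk_name):
        self.stored.add((client_name, chunk_name))

    def handle_delete(self, client_name, chunk_name) -> bool:
        """Return True if the delete is valid and should be passed onward."""
        key = (client_name, chunk_name)
        if key not in self.stored:
            return False  # this client never stored that chunk
        self.stored.remove(key)  # assumption: one store allows one delete
        return True


cm = ClientManager()
cm.record_store("alice", "chunk-abc")
bob_allowed = cm.handle_delete("bob", "chunk-abc")      # False: Bob never stored it
alice_allowed = cm.handle_delete("alice", "chunk-abc")  # True: Alice did
```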
Next, if the client’s request is valid, the ClientManagers pass the request on to the DataManagers. These were the vaults which kept a count of how many times the same chunk had been stored. So all they do is reduce the counter by one for that chunk and see if the counter hits zero. If not, the delete is finished. If so, they have to tell the PMID Nodes (via their managers) to actually delete the chunk from the network. Second problem solved!
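The counting itself is simple enough to sketch in a few lines of Python. This is an illustration with made-up names (`ChunkCounter`, `store`, `delete`); ignoring a delete for a chunk with no counter is an assumption.

```python
class ChunkCounter:
    """Toy version of the DataManagers' per-chunk store counter."""

    def __init__(self):
        self.counts = {}  # chunk name -> number of outstanding stores

    def store(self, chunk_name) -> bool:
        """Return True only for the first store, when the chunk must be written."""
        self.counts[chunk_name] = self.counts.get(chunk_name, 0) + 1
        return self.counts[chunk_name] == 1

    def delete(self, chunk_name) -> bool:
        """Return True only when the counter hits zero and the chunk should go."""
        if chunk_name not in self.counts:
            return False  # assumption: unknown chunks are simply ignored
        self.counts[chunk_name] -= 1
        if self.counts[chunk_name] > 0:
            return False  # other stores are still outstanding
        del self.counts[chunk_name]
        return True  # time to tell the PMID Nodes (via their managers)


counter = ChunkCounter()
stores = [counter.store("chunk-abc") for _ in range(3)]    # [True, False, False]
deletes = [counter.delete("chunk-abc") for _ in range(3)]  # [False, False, True]
```

Scale the three up to 100 and you get the case mentioned earlier: the chunk only leaves the network on the 100th delete.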
We had originally thought that an awful lot of the storing/deleting process would involve the vaults signing requests to one another, keeping them for ages, and passing copies of them about. This was shaping up to be a horrible solution: much more complex than the process described above and, worse, fairly hard on the vaults doing the work. Worse still, it would place significant demands on the clients.
We want the clients to have as little work to do as possible. The client is maybe a user’s old PC or possibly one of these newfangled phone things I hear so much about. They need to be kept fast and responsive. So our fix is to rely on the closeness of vault names. Vaults can’t pretend to be close to a certain name (that’s for another discussion – but they really can’t) and as a group of four all close to a certain name, they can act as one to manage data; their authority being not four signatures, but rather just the fact that they’re all close.
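To make the "authority by closeness" idea a bit more concrete, here's a heavily simplified sketch. It leans on several assumptions that aren't in this post: closeness is XOR distance, the receiving vault already knows the names of the vaults around the target, and a request counts only when it arrives from exactly the four closest. How the network stops a vault lying about its name is deliberately left out.

```python
def closest(target, names, count=4):
    """The `count` names nearest to `target` (XOR distance, assumed)."""
    return sorted(names, key=lambda name: name ^ target)[:count]


def group_is_authoritative(senders, target, known_vaults, group_size=4):
    """True if `senders` are exactly the vaults closest to `target`.

    No signatures are checked: the senders' positions are the authority.
    """
    expected = set(closest(target, known_vaults, group_size))
    return set(senders) == expected


# Tiny made-up name space so the distances are easy to eyeball.
known = [1, 2, 3, 4, 50, 60]
ok = group_is_authoritative([4, 3, 2, 1], 0, known)          # the real close group
outsider = group_is_authoritative([1, 2, 3, 50], 0, known)   # 50 isn't close to 0
too_few = group_is_authoritative([1, 2, 3], 0, known)        # only three of the four
```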
I’ve only scratched the surface here (well, maybe more of a first-degree burn than a scratch). We’ve got to consider the vaults all switching off and on randomly, malicious vaults deliberately trying to corrupt the data or spoof the network, vaults accidentally losing the data, small vaults not being able to hold much data, firewalls, Windows vaults talking to Mac vaults (sort of like cats and dogs with access to firearms living together), clients spewing masses of data all at once, and so on.
So I guess the point here is that we have a really nice solution to several complex problems. Getting to the solution was a long and error-prone path, and we’ve still a fair bit of path left, but now seeing the network doing its thing is pretty exciting. Well it is for me anyway!
* It stands for “Proxy MaidSafe ID”, but I really don’t want to get into that here!