Stacking Code

Files in Databases

Saturday, 18 December, 2010 @ 7:42 PM < Adam Boddington
Tags: Architecture, Building Neno

This is post #26 in the Building Neno series. Please click here for a description of the Building Neno project and instructions on how to access the source code for this post.

So the view model problem has been addressed, it's time to crack on with attachments. Attachments in Neno are really just files -- images, zip files, any kind of media that will help a post make its point. Up until now I've been FTPing all my media up to the web server after I create my post. What I would really like to do is upload them through the application as attachments and ditch the reliance on an FTP client.

When it comes to files in applications, there's two basic storage mechanisms to consider. Storing files as actual files on the web server or a file server, or storing files as Binary Large OBjects (BLOBs) in the database along with the rest of the data. The files are data -- in some applications, they're the data. Why not keep them in the database?

In the ancient past suggesting files go into a database used to get a lot of people all frothy at the mouth. Databases didn't have a datatype for BLOBs, then they did but they couldn't take that much data without a large performance hit. Also, if the database was off somewhere else, transferring megabytes from the application server to the database server could saturate the network (yeah, yeah, yeah, I said megabytes, I'm going way back). It was much easier to leave files on the application server, or a nearby file server, rather than think about putting them in a database.

So time wore on and storing files as files became the norm for many years. There are a couple of drawbacks to storing files as files though, which many corporate in-house developers aren't always aware of.

Transactions. Using transactions for database operations is something most developers do without thinking. It's like breathing. Using transactions that cover both database and a file operations on the other hand is not like breathing -- it's rarely considered. Why? It can be tricky and usually involves something complicated, time consuming, and in the past somewhat unreliable. For small to medium sized in-house applications, the issue is simply ignored, resulting in inconsistent data in certain scenarios.
Keying. This is an issue I see in a lot of systems. How is the file on the file server identified? What stops that relationship from being broken? One system I replaced used a field from the parent object/record as the file name -- a field the users had full control over and changed whenever they wanted to. They learnt to stop changing it however when their files started disappearing -- the application simply didn't know how to find the file on the file server anymore. A much worse situation was when a non-unique field was used. The wrong file would show up.
Backups. Backing up databases is easy. Backing up file servers is easy. Backing up both in a synchronised fashion... not so easy. I have a database backup from midnight last night, I have a file server backup from an hour before that... oh crap, I'm missing some files.
Security. Your DBA has locked down the database so just she, some scheduled jobs, and your application can get into it. Your security expert has locked down your service layer methods to users with the right roles. It's all as secure as can be... Except your super secret plans for product XYZ are in a file stored on a web server somewhere, and the intern (or CTO in one case) just deleted it because he didn't know what it was or why it was there. Or worse yet, the intern threw it on a thumbdrive and walked out the door with it.

Storing files as BLOBs in the database with the rest of the data can alleviate these issues. Database transactions are cheap and easy to do. The BLOBs can be eternally linked to the rest of the data using standard database mechanisms like foreign keys. Backups are always a complete picture of the data, not a partial snapshot. And you don't have to worry about file ACLs in addition to your normal application security.

There are certainly some drawbacks to storing BLOBs in databases, but I'll try to address some of those in the next post when I talk about the implementation of attachments.

Stacking Code

Files in Databases

Comments

Leave a Comment

Projects

Recent Posts

Tags