How to detect File Type through HTML5
Generally, a file's type is checked on the server side to decide whether the file is allowed to be stored. For instance, a cloud document editor might only allow users to import .doc/.docx/.pages files.
However, checking only the file extension is not enough, because users can simply rename the extension to bypass the check. We therefore need a more reliable way to identify a file's real type.
Does renaming a file change its content?
If we only change a file's extension, its content stays the same. We can try an experiment: rename a video file so that it has a .pdf extension, and a video player can usually still play it. In other words, the player has another way to decide whether the file is a video. After searching on Google, I found that most file formats have a File Signature (or Magic Number) that identifies the real type of the file. Fortunately, it is normally stored in the file header (the first 4 to 8 bytes).
However, most solutions available online are implemented on the server side. That is not ideal, because a file can only be checked after it has been uploaded to the server. A large file takes a long time to upload, and rejecting it only afterwards is frustrating for users. Therefore, I wanted to find a front-end solution.
How to detect file type through HTML5?
Thanks to the HTML5 File API, we can easily read a file's signature in the browser.
Here is the source code:
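A minimal sketch of the idea, not necessarily the exact code used originally: slice off the first few bytes of the selected file, read them with `FileReader`, and turn them into a hex string. The function names `bytesToHex` and `readSignature`, the default length of 8 bytes, and the `upload` element id are assumptions made for illustration.

```javascript
// Convert raw bytes to an uppercase hex string,
// e.g. the bytes of "%PDF" become "25504446".
function bytesToHex(bytes) {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0"))
    .join("")
    .toUpperCase();
}

// Read only the first `length` bytes of a File (or Blob) and resolve
// with their hex representation. Browser-only: relies on FileReader.
function readSignature(file, length = 8) {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve(bytesToHex(new Uint8Array(reader.result)));
    reader.onerror = () => reject(reader.error);
    // slice() avoids loading the whole file into memory.
    reader.readAsArrayBuffer(file.slice(0, length));
  });
}

// Usage in a page containing <input type="file" id="upload">
// (the id is hypothetical):
//
// document.getElementById("upload").addEventListener("change", async (e) => {
//   const hex = await readSignature(e.target.files[0]);
//   console.log(hex);
// });
```

Because only a small slice is read, the check finishes almost immediately even for very large files.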
In Practice
However, when I used it in practice, I found that the length of a file signature is not constant. Some file types can be identified from the first 4 bytes, while others need the first 8 bytes. Even worse, some signatures start at byte offset 512 rather than at the beginning of the file. How can a single solution handle all of these cases?
My current solution is to build a signature library first, and then use the file's extension to look up the signature expected for that type. If the bytes read from the file do not match the signature for its extension, the file is considered illegal.
This is the signature library I created:
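As an illustration of what such a library could look like (a sketch, not the original listing), each extension can map to the expected bytes together with the offset at which they start. The names `SIGNATURES` and `bytesNeeded` are hypothetical, and the table holds only a handful of commonly documented signatures. The 8-byte `.docx` entry is the longer Office Open XML prefix, which not every tool that writes `.docx` files necessarily produces.

```javascript
// Hypothetical signature table: extension -> list of { offset, bytes }.
// `offset` is where the signature starts inside the file.
const SIGNATURES = {
  pdf:  [{ offset: 0, bytes: [0x25, 0x50, 0x44, 0x46] }], // "%PDF"
  jpg:  [{ offset: 0, bytes: [0xff, 0xd8, 0xff] }],
  png:  [{ offset: 0, bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] }],
  zip:  [{ offset: 0, bytes: [0x50, 0x4b, 0x03, 0x04] }], // "PK" 03 04
  // Assumption: longer prefix typically written by Microsoft Office.
  docx: [{ offset: 0, bytes: [0x50, 0x4b, 0x03, 0x04, 0x14, 0x00, 0x06, 0x00] }],
  doc:  [{ offset: 0, bytes: [0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1] }],
  mp4:  [{ offset: 4, bytes: [0x66, 0x74, 0x79, 0x70] }], // "ftyp" after the box size
};

// How many leading bytes must be read to check every signature
// registered for `extension` (0 if the extension is unknown).
function bytesNeeded(extension) {
  const known = Object.prototype.hasOwnProperty.call(SIGNATURES, extension);
  const entries = known ? SIGNATURES[extension] : [];
  return entries.reduce(
    (max, sig) => Math.max(max, sig.offset + sig.bytes.length),
    0
  );
}
```

Storing the offset alongside the bytes is what lets one table cover 4-byte signatures, 8-byte signatures, and signatures that do not start at the beginning of the file.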
Implementation:
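The matching step described above can be sketched as follows. This is an illustrative version under stated assumptions rather than the original code: `SIGNATURE_TABLE`, `extensionOf`, and `matchesExtension` are hypothetical names, the table is trimmed to a few entries, and files with an unknown extension are rejected by default.

```javascript
// Trimmed, illustrative table: extension -> { offset, bytes }.
const SIGNATURE_TABLE = new Map([
  ["pdf",  { offset: 0, bytes: [0x25, 0x50, 0x44, 0x46] }],
  ["zip",  { offset: 0, bytes: [0x50, 0x4b, 0x03, 0x04] }],
  // Assumption: 8-byte Office Open XML prefix.
  ["docx", { offset: 0, bytes: [0x50, 0x4b, 0x03, 0x04, 0x14, 0x00, 0x06, 0x00] }],
  ["mp4",  { offset: 4, bytes: [0x66, 0x74, 0x79, 0x70] }],
]);

// Lowercased text after the last dot, or "" when there is no dot.
function extensionOf(filename) {
  const dot = filename.lastIndexOf(".");
  return dot === -1 ? "" : filename.slice(dot + 1).toLowerCase();
}

// True when the leading bytes of the file (`header`, a Uint8Array)
// contain the signature registered for the file name's extension.
function matchesExtension(filename, header) {
  const sig = SIGNATURE_TABLE.get(extensionOf(filename));
  if (!sig) return false; // unknown extension: reject
  if (header.length < sig.offset + sig.bytes.length) return false;
  return sig.bytes.every((b, i) => header[sig.offset + i] === b);
}

// Browser usage (FileReader is not available outside the browser):
//
// const reader = new FileReader();
// reader.onload = () => {
//   const ok = matchesExtension(file.name, new Uint8Array(reader.result));
//   if (!ok) alert("File content does not match its extension");
// };
// reader.readAsArrayBuffer(file.slice(0, 8));
```

Keeping the comparison in a pure function that takes a byte array makes it easy to test without a browser; only the reading step depends on `FileReader`.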
Summary
Detecting the file type on the front end may be the fastest approach, but it also has some drawbacks.
1. Some file types share the same signature
For instance, files created by Microsoft Office (.xlsx, .docx, .pptx, etc.) are zip archives, so they begin with the same signature as .zip files and share one signature among themselves.
Test:
Renamed File (actual type to new extension) | Successfully Intercepted |
---|---|
mp4 to pdf | ✅ |
zip to docx | ✅ |
zip to jpg | ✅ |
docx to zip | ❌ |
docx to xlsx | ❌ |
2. Compatibility
Unfortunately, HTML5's FileReader is not supported by all browsers, particularly older versions.
Feature | Firefox (Gecko) | Chrome | Internet Explorer* | Opera* | Safari |
---|---|---|---|---|---|
Basic support | 3.6 (1.9.2) | 7 | 10 | 🚫 | 🚫 |
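Given the compatibility gaps above, one option is to detect support at runtime and leave the check to the server when the API is missing. A minimal sketch, where the function name `supportsFileSignatureCheck` is hypothetical:

```javascript
// Returns true when the given global scope (normally `window`)
// exposes the File API pieces needed to read a file's leading bytes.
function supportsFileSignatureCheck(scope) {
  return (
    typeof scope.File !== "undefined" &&
    typeof scope.FileReader !== "undefined" &&
    typeof scope.Blob !== "undefined"
  );
}

// Usage in a page:
//
// if (supportsFileSignatureCheck(window)) {
//   // run the front-end signature check before uploading
// } else {
//   // upload as usual and rely on the server-side check
// }
```

Passing the scope in as a parameter is only a convenience so the function can be exercised with a plain object.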
3. File Signature Library
File signatures can be looked up in the File Signature Database, but not every file type has an entry there, even though the website is updated regularly.