OK, without much introduction, let's jump straight into the subject. If you want some background on what we are talking about, please read my earlier article, “Crack password of documents - Word, Excel, Pdf - security concerns”.
This article is for learning purposes only; it shows the vulnerability of the legacy 40-bit RC4 encryption used in documents.
As explained in my previous hub, we will brute force the encryption key instead of the password, which is the easiest practical approach: the key space is a fixed 40 bits, however long the password is. So we need to test each possible key in the key space against the ‘verifier hash’ stored in the RC4 encryption header of the document (Word/Excel).
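To put that key space in perspective, here is a rough worst-case estimate. The rate of 10 million keys per second is a hypothetical figure for one machine; the real rate depends entirely on your implementation and hardware. On average you would find the key after searching about half the space, roughly 15 hours at the same rate.

```latex
\begin{aligned}
|K| &= 2^{40} = 1\,099\,511\,627\,776 \\
t_{\text{worst}} &= \frac{2^{40}}{10^{7}\ \text{keys/s}} \approx 1.1 \times 10^{5}\ \text{s} \approx 30.5\ \text{h}
\end{aligned}
```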
RC4 Encryption Header
Now let's look at the structure of the RC4 encryption header and see what is stored there.
EncryptionVersionInfo (4 bytes): Version information for the encryption. It has two 2-byte parts, version major and version minor, and both must be 1 (0x0001), which tells us this is the 40-bit RC4 encryption.
Salt (16 bytes): A randomly generated array of bytes, which is the salt value used when the password is hashed into the encryption key.
EncryptedVerifier (16 bytes): An additional, randomly generated 16-byte verifier, encrypted using the 40-bit RC4 cipher.
EncryptedVerifierHash (16 bytes): A 40-bit RC4 encrypted MD5 hash of the verifier used to generate the EncryptedVerifier field. It is encrypted with the same RC4 keystream, continuing right after the 16 bytes of EncryptedVerifier.
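For reference, this is how a password turns into the key, sketched in Python from the key-derivation steps for Office binary RC4 encryption in Microsoft's [MS-OFFCRYPTO] specification. The function names are illustrative, not from any library, and the sketch assumes the password is hashed as UTF-16LE without a terminating null.

```python
import hashlib
import struct


def truncated_key_from_password(password: str, salt: bytes) -> bytes:
    """Derive the 5-byte (40-bit) secret from a password and the header Salt."""
    h0 = hashlib.md5(password.encode("utf-16-le")).digest()
    # First 5 bytes of H0 followed by the 16-byte salt, repeated 16 times (336 bytes).
    intermediate = (h0[:5] + salt) * 16
    h1 = hashlib.md5(intermediate).digest()
    return h1[:5]  # only these 40 bits are secret


def rc4_key_for_block(key40: bytes, block: int) -> bytes:
    """Expand the 5-byte secret into the 16-byte RC4 key for one block number."""
    return hashlib.md5(key40 + struct.pack("<I", block)).digest()
```

This also explains why sample code can say the RC4 key “is always 128 bit”: the cipher is keyed with a full 16-byte MD5 digest, but that digest is computed from only 5 secret bytes, so there are just 2^40 distinct keys to try. It also shows that the salt only matters on the password side of the derivation.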
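Simply put, for each candidate key in the key space we decrypt EncryptedVerifier and EncryptedVerifierHash with it, compute the MD5 hash of the decrypted verifier, and compare that with the decrypted verifier hash (this is the brute forcing). If they match, that candidate is our actual key, which can be used to decrypt the document content. The Salt is only needed when you start from a password, because it is part of turning the password into the 40-bit key; when we enumerate keys directly, the check uses just the two encrypted fields.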
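A minimal Python sketch of that check, assuming the [MS-OFFCRYPTO] rule that the RC4 key for the verifier fields is MD5 of the 5-byte key followed by block number 0 as a 32-bit little-endian integer. RC4 is written out inline only to keep the example self-contained, and `rc4_xor` and `is_correct_key` are illustrative names.

```python
import hashlib
import struct


def rc4_xor(key: bytes, data: bytes) -> bytes:
    """Plain RC4: key scheduling (KSA), then keystream (PRGA) XORed with data."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def is_correct_key(key40: bytes, encrypted_verifier: bytes,
                   encrypted_verifier_hash: bytes) -> bool:
    """Test one 5-byte candidate key against the header's verifier fields."""
    rc4_key = hashlib.md5(key40 + struct.pack("<I", 0)).digest()  # block 0
    # Both fields are decrypted with ONE continuous RC4 keystream (32 bytes).
    plain = rc4_xor(rc4_key, encrypted_verifier + encrypted_verifier_hash)
    verifier, verifier_hash = plain[:16], plain[16:]
    return hashlib.md5(verifier).digest() == verifier_hash
```

Note that the RC4 state is not reset between the two fields; decrypting them separately with two fresh RC4 instances would never produce a match.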
How to read the document header?
Microsoft Word and Excel files are compound (OLE) documents: a single file holds several sections, each carrying a different type of information. So our RC4 header is stored in one section, the encrypted content in another, and so on. In OLE terms these sections are called streams, and they are organized in storages (OLE structured storage).
It is a good idea to read the file through an OLE programming interface, so that we can get at the RC4 header directly instead of searching and seeking through the file to find its position.
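Before handing a file to an OLE reader, a cheap sanity check is the 8-byte compound file signature at the very start of the file (D0 CF 11 E0 A1 B1 1A E1). A hedged sketch; passing this check is necessary but not sufficient, since other formats are also stored as compound files (an encrypted .docx, for example, uses a different encryption scheme).

```python
OLE_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def is_ole_compound_file(first_bytes: bytes) -> bool:
    """True if the data starts with the OLE compound file signature."""
    return first_bytes[:8] == OLE_SIGNATURE


def file_is_ole_compound(path: str) -> bool:
    """Read the first 8 bytes of a file and check the signature."""
    with open(path, "rb") as f:
        return is_ole_compound_file(f.read(8))
```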
Each section (stream) of the file has a unique name which can be used to access it. In a Word document the RC4 header sits at the start of the table stream, named “1Table”, so in our code we access this stream through OLE by that name (there are other streams too, such as “0Table” and “WordDocument”). Strictly speaking, Word can use either “0Table” or “1Table” as its table stream; the fWhichTblStm flag in the FIB at the start of the “WordDocument” stream tells you which one.
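Since the header can live in either table stream, here is a hedged sketch of picking the right name from the first bytes of the “WordDocument” stream. The values used (wIdent 0xA5EC, and fEncrypted = 0x0100, fWhichTblStm = 0x0200, fObfuscated = 0x8000 in the 16-bit flags word at offset 0x0A of FibBase) are my reading of Microsoft's [MS-DOC] specification, so double-check them there; the function name is hypothetical.

```python
import struct

FIB_MAGIC = 0xA5EC        # FibBase.wIdent of a Word binary document
F_ENCRYPTED = 0x0100      # flags bit: the document is encrypted
F_WHICH_TBL_STM = 0x0200  # flags bit: 1 -> "1Table", 0 -> "0Table"
F_OBFUSCATED = 0x8000     # flags bit: XOR obfuscation instead of RC4


def table_stream_name(word_document: bytes) -> str:
    """Return the name of the table stream that holds the encryption header."""
    (w_ident,) = struct.unpack_from("<H", word_document, 0)
    if w_ident != FIB_MAGIC:
        raise ValueError("not a Word binary document (bad wIdent)")
    (flags,) = struct.unpack_from("<H", word_document, 0x0A)
    if not flags & F_ENCRYPTED:
        raise ValueError("document is not encrypted")
    if flags & F_OBFUSCATED:
        raise ValueError("XOR obfuscation, not RC4")
    return "1Table" if flags & F_WHICH_TBL_STM else "0Table"
```

With the third-party olefile package, `olefile.OleFileIO(path).openstream("WordDocument").read()` gives you the bytes to pass in, and opening the returned name the same way gives you the table stream.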
For programming on the .NET framework, we can use the System.Runtime.InteropServices namespace to call the Win32 structured storage API in "ole32.dll" (for example StgOpenStorage) through P/Invoke. If you are comfortable with any other OLE implementation, that is fine; the choice is yours. And if this is not just for testing and you really want to build something robust, I suggest C or C++, maybe with VC++ .NET.
Once we have read the content (stream) of the “1Table” section, we take the first 52 bytes (4 + 16 + 16 + 16) of that stream, which hold all the details we need to brute force.
The first 4 bytes hold version major and version minor (2 bytes each). As mentioned above, both should be 1 (0x0001), which confirms that we have the right version of the encryption header.
The next 16 bytes are the Salt.
The next 16 bytes are the EncryptedVerifier.
The last 16 bytes are the EncryptedVerifierHash.
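A minimal Python sketch of splitting those 52 bytes, assuming `table_stream` holds the raw bytes of the table stream. A version check like this one is a likely source of an “Incorrect Version” error: according to [MS-OFFCRYPTO], versions 2.2, 3.2 and 4.2 indicate the newer RC4 CryptoAPI header, which has a different layout and is not covered here. `RC4Header` and `parse_rc4_header` are illustrative names.

```python
import struct
from typing import NamedTuple


class RC4Header(NamedTuple):
    salt: bytes
    encrypted_verifier: bytes
    encrypted_verifier_hash: bytes


def parse_rc4_header(table_stream: bytes) -> RC4Header:
    """Split the first 52 bytes of the table stream into the header fields."""
    if len(table_stream) < 52:
        raise ValueError("table stream too short for an RC4 header")
    major, minor = struct.unpack_from("<HH", table_stream, 0)
    if (major, minor) != (1, 1):
        # e.g. 2.2, 3.2 or 4.2 would be RC4 CryptoAPI, a different layout
        raise ValueError(f"Incorrect Version {major}.{minor}")
    return RC4Header(
        salt=table_stream[4:20],
        encrypted_verifier=table_stream[20:36],
        encrypted_verifier_hash=table_stream[36:52],
    )
```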
Now we have all the information required to brute force the key, and we use these details to test each key in the key space.
So we mainly have two things to do:
1. Write an algorithm to enumerate all keys in the key space. You may search the net for a piece of code that outputs every key of a 40-bit key space one by one, or you can write your own; it is just a loop, yes, our “for int i=0…” stuff, over 2^40 (about 1.1 trillion) values.
2. Write the code that, for a given key, creates the ‘decrypted verifier hash’ from the header details (encrypted verifier etc.) to validate that key. I have given a link below to some sample code; go through it and try it yourself. My time is limited now; when I get time I will probably write fully optimized code to test this and add a link here.
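A minimal sketch of step 1 in Python: it maps each integer in a range to a 5-byte key (little-endian here, an arbitrary but consistent choice) and stops at the first key that passes a caller-supplied check. The start/stop arguments are a hypothetical convenience for splitting the 2^40 space across several processes or machines.

```python
from typing import Callable, Iterator, Optional

KEY_SPACE = 1 << 40  # 2**40 candidate 40-bit keys


def candidate_keys(start: int = 0, stop: int = KEY_SPACE) -> Iterator[bytes]:
    """Yield each 40-bit key in [start, stop) as a 5-byte string."""
    for n in range(start, stop):
        yield n.to_bytes(5, "little")


def search(check: Callable[[bytes], bool], start: int = 0,
           stop: int = KEY_SPACE) -> Optional[bytes]:
    """Return the first key in [start, stop) for which check(key) is True."""
    for key in candidate_keys(start, stop):
        if check(key):
            return key
    return None  # key not in this range
```

The `check` argument is where the verifier test from step 2 goes. Pure Python tests far too few keys per second for the whole space, so read this as the shape of the loop; for real runs, C or C++ is the better fit.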
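Then finally, check each key: decrypt the verifier and the verifier hash with it, and if the MD5 hash of the decrypted verifier equals the decrypted verifier hash, we have found the key to decrypt the document content. Use an RC4 decryption algorithm to decrypt the content with that key (the RC4 key is re-derived for every 512-byte block from the key and the block number), and once it is decrypted, save the changes. Our document should now be unprotected; enjoy.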
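A hedged Python sketch of the content decryption, assuming the [MS-OFFCRYPTO] rule that the stream is processed in 512-byte blocks and block n uses the RC4 key MD5(5-byte key || n as a 32-bit little-endian integer). It decrypts a buffer that starts at offset 0 of a stream. It does not handle the byte ranges Word leaves unencrypted (such as the FIB at the start of “WordDocument”), which a real tool must copy back unchanged, nor the flag changes needed to save a valid unprotected file. `decrypt_stream` is an illustrative name.

```python
import hashlib
import struct

BLOCK_SIZE = 512  # RC4 is re-keyed for every 512-byte block


def rc4_xor(key: bytes, data: bytes) -> bytes:
    """Plain RC4 (KSA + PRGA); encrypting and decrypting are the same XOR."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def decrypt_stream(key40: bytes, stream: bytes) -> bytes:
    """Decrypt a stream from offset 0 with a fresh RC4 key per 512-byte block."""
    out = bytearray()
    for block, offset in enumerate(range(0, len(stream), BLOCK_SIZE)):
        rc4_key = hashlib.md5(key40 + struct.pack("<I", block)).digest()
        out += rc4_xor(rc4_key, stream[offset:offset + BLOCK_SIZE])
    return bytes(out)
```

Because RC4 is a plain XOR with the keystream, the same function re-encrypts data, which makes round-trip testing easy.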
Here is the link to sample source code. Note that in this code the Word file is read via a direct file stream operation (File.OpenRead) rather than through OLE. When I tried it, the code failed to show me the RC4 encryption header details. Then I did some searching on the net, changed the file reading to OLE to read the “1Table” stream, and it worked well. I also had to make some minor changes. So test it yourself and learn; it is interesting (to me at least ;-))
And final words: there are tools called guaword and guaexcel which do all of this. You can download demo versions of them, and the beta version is free. But no source, obviously!
Let me know if you like this hub, and leave your comments.
learner on December 25, 2013:
I like this post very much,
I checked the link you mentioned in your post, http://offcrypto.codeplex.com/releases/view/22783,
but it's not working, and you also said in your post that it doesn't work, but that when you change the file reading to OLE and read the 1table stream it works.
Can you now please provide your working sample to decrypt the Word file?
dashka on May 02, 2013:
How do I implement the OLE part of the sample source code? Please help.
dashka on April 30, 2013:
I get the error throw new Exception("Incorrect Version"); on a password-protected Word file, please help me.
varun on April 11, 2013:
The comment in the sample code says that "the key is always 128 bit", but the key should be 40 bits! What does that mean?
yd on May 27, 2012:
u rock man...
psf (author) from Canada on August 12, 2011:
If you want to do it programmatically, you need to search on the internet by using the knowledge you earned from this article (try RC4 decryption algorithm or MD5). Or if your intention is just to decrypt the document search for guaword and it should help you. Its basic version is free.
Flávio Freitas on August 11, 2011:
I have an encrypted Word 97 doc and I know the first characters of the file. How can that help to extract the rest of the text? Or, what would be even more fun, the password? If you can, please send a mail to zz4fff (AT) yahoo.com.br. Thanks!
psf (author) from Canada on June 05, 2011:
Hi chip19, thanks for your comment.
There is an extra ")" in your link, hence I have given the correct link below again:
chip19 on June 04, 2011:
Thank you very much for writing this article. I did a lot of research into cracking Word passwords, and I heard that 97-2000 used 40-bit RC4. However, it wasn't until I read your hub page that I understood exactly how it worked.
One thing I would add for people who want to implement this on their own: make sure you use the ManagedRC4 (available from the same site as the above source code, http://offcrypto.codeplex.com/releases/view/21506)... This implementation of RC4 differs from the usual implementation, and is required for the verification to actually work.