I ran into some importing old Excel files into STATA 12. Since I thought others would probably be encountering the same problem, I decided to write a blog post about it.
We’re getting ready to produce a draft release of our China Multigenerational Panel Dataset – Shuangcheng (CMGPD-SC) so that users can kick the tires and report problems before we submit a final version to ICPSR for dissemination there.
As part of the preparation, we wanted to take advantage of the new facility in Stata 12 that allows Excel files to be opened directly. Our ‘raw’ data consist of Excel spreadsheets entered by our coders, one per register. Registers are annual or triennial. For our Liaoning dataset, we have 737 registers coded. For Shuangcheng, we have 338. Previously, our procedures for automating the import of the registers in Stata were clumsy, and rarely survived upgrades to Stata or Windows. At one point we were using the odbc command to loop through and read all the registers, but that broke when we moved to computers that were running 64 bit windows. Then we wrote a macro to loop through the Excel files and write them to tab-delimited text fields, which STATA could read.
Converting our programs to use import excel was fairly straightforward. Basically it just mean substituting import excel for insheet.
When we began running the programs, however, STATA was reporting that it could not load files, and came back with an r(603). I did notice it could open all .xlsx files, but had more trouble with .xls files. I began to wonder if the problem was with older versions of Excel files. Perhaps the import capability assumed a recent version of Excel. I saved some of the files as .xlsx files and sure enough, STATA could read them.
At that point, it became necessary to convert the thousand or so files that were in older versions of Excel to .xlsx files. Opening them one by one and saving them to .xlsx would be impractical.
I poked around on the net, and found that Microsoft had an Office File Converter tool available for download. Here is an introduction and here is the download. The tool requires that the Microsoft Office Compatibility Pack be installed. By modifying the ofc.ini file, and adding the name of a folder under [FoldersToConvert] it is possible to direct OFC to attempt to convert all the old .xls files it finds in a specified folder to .xlsx.
[FoldersToConvert]
fldr=C:UsersCameronDropboxSharedSkydriveCMGPD DataLN
Here is what my [ConversionInfo] section ended up looking like:
SourcePathTemplate=********
DestinationPathTemplate=*1*2*3*4*5*6*7*8Converted
I ran ofc and sure enough, it chugged through the files and converted them and placed them in a directory under the original folder that was called Converted.
Now Stata is happily chewing through the converted files.