Binary Files

 

In a sense, all files are "binary" in that they are just a collection of bytes stored in an operating system construct called a file. However, when we talk about binary files, we are really referring to the way VB.NET opens and processes the file. In the previous topics on text files, techniques for reading and writing those files on a field-by-field or line-by-line basis were demonstrated. In the upcoming topic on random files, the record-oriented techniques for processing those types of files will be demonstrated. Binary files, on the other hand, typically do not have a simple line-based or record-based structure, but rather have complex internal structures that require special programs to process them. A typical example is an image file, which requires a program such as MS-Paint or Adobe Photoshop to do something useful with it.

 

However, any file can be processed in binary mode; the key is that you must traverse or parse through the file to get at the data that you need.

 

In this topic we will look at the techniques for processing an "unstructured" binary file (using techniques such as "FileGet" and "FilePut", which are retooled versions of "Get" and "Put" from pre-.NET versions of VB), as well as "new-in-.NET" features to process record-oriented binary files (using BinaryReader and BinaryWriter).

 

In the first set of sample programs, the following functions will be used:

 

FileOpen

Description:

Opens a file for input or output. You must open a file before any I/O operation can be performed on it. FileOpen allocates a buffer for I/O to the file and determines the mode of access to use with the buffer. If the file specified by FileName doesn't exist, it is created when a file is opened for Append, Binary, Output, or Random modes. The channel to open can be found using the FreeFile() function.

Syntax:

FileOpen(FileNumber, FileName, Mode, Access)

 

The parameters are explained as follows:

                                                             

FileNumber: Required. Any valid file number. Use the FreeFile function to obtain the next available file number.

FileName: Required. String expression that specifies a file name; it may include a directory or folder, and a drive.

Mode: Required. Enum specifying the file mode: Append, Binary, Input, Output, or Random. (In this set of sample programs, OpenMode.Binary will be used.)

Access: Optional. Enum specifying the operations permitted on the open file: Read, Write, or ReadWrite. Defaults to ReadWrite. (In this set of sample programs, OpenAccess.Read and OpenAccess.Write will be used.)

Example:

This example opens the file in Binary mode for writing operations only.

FileOpen(1, "C:\TESTFILE.TXT", OpenMode.Binary, OpenAccess.Write)
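The example above hard-codes file number 1. As a minimal sketch of the FreeFile approach mentioned in the description, the module below obtains the file number at run time and makes sure the file is closed even if the write fails. The names FreeFileDemo and WriteGreeting are hypothetical helpers invented for this illustration, and a temporary file is assumed as the target.

```vb
Imports Microsoft.VisualBasic

Module FreeFileDemo

    ' Hypothetical helper: writes a short string to strPath in Binary mode
    ' and returns the size of the file, in bytes, after it has been closed.
    Public Function WriteGreeting(ByVal strPath As String) As Long
        ' Ask VB for the next unused file number rather than assuming 1 is free.
        Dim intFileNbr As Integer = FreeFile()
        FileOpen(intFileNbr, strPath, OpenMode.Binary, OpenAccess.Write)
        Try
            FilePut(intFileNbr, "Hey now")
        Finally
            ' Always release the file number, even if the write fails.
            FileClose(intFileNbr)
        End Try
        Return New System.IO.FileInfo(strPath).Length
    End Function

    Public Sub Main()
        ' Assumption: an empty temporary file is used as the output target.
        Dim strPath As String = System.IO.Path.GetTempFileName()
        Console.WriteLine("Bytes written: {0}", WriteGreeting(strPath))
        System.IO.File.Delete(strPath)
    End Sub

End Module
```

Wrapping the I/O in Try/Finally is a design choice of this sketch, not something the sample programs in this topic do.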

 

 

LOF

Description:

Gets the size, in bytes, of a file opened using the FileOpen function. ("LOF" = "Length Of File")

Syntax:

LOF(FileNumber)

 

where FileNumber is any valid file number

Example:

Dim lngFileSize As Long

FileOpen(1, "C:\TESTFILE.TXT", OpenMode.Input) ' Open file.

lngFileSize = LOF(1)   ' Get length of file.

Console.WriteLine("File Size is {0} bytes.", lngFileSize)

FileClose(1)   ' Close file.

 

FileGet

Description:

Reads data from an open disk file into a variable. Valid only for files opened in Random or Binary mode.

Syntax:

FileGet(FileNumber, Value [, RecordNumber])

 

The parameters are explained as follows:

                                                             

FileNumber: Required. Any valid file number.

Value: Required. Valid variable name into which data is read.

RecordNumber: Optional. Record number (Random mode files) or byte number (Binary mode files) at which reading begins.

Example:

The following statements read 20 bytes from file number 1. (The number of bytes read equals the number of characters already in the string – and the current length of strSomeData is 20.)
 
Dim strSomeData As New String(" "c, 20)
FileOpen(1, "C:\TESTFILE.txt", OpenMode.Binary, OpenAccess.Read)
FileGet(1, strSomeData)
Console.WriteLine(strSomeData)
FileClose(1)
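The optional RecordNumber parameter is not used in the example above. The sketch below shows it in Binary mode, where it is a byte position and the first byte of the file is position 1. ReadSlice and the ten-letter test file are hypothetical, made up for this illustration.

```vb
Imports Microsoft.VisualBasic

Module FileGetPositionDemo

    ' Hypothetical helper: reads intCount bytes starting at the 1-based
    ' byte position lngStart and returns them as a string.
    Public Function ReadSlice(ByVal strPath As String, _
                              ByVal lngStart As Long, _
                              ByVal intCount As Integer) As String
        ' The length of the buffer tells FileGet how many bytes to read.
        Dim strBuffer As New String(" "c, intCount)
        Dim intFileNbr As Integer = FreeFile()
        FileOpen(intFileNbr, strPath, OpenMode.Binary, OpenAccess.Read)
        Try
            FileGet(intFileNbr, strBuffer, lngStart)
        Finally
            FileClose(intFileNbr)
        End Try
        Return strBuffer
    End Function

    Public Sub Main()
        ' Assumption: a small ASCII test file created just for this demo.
        Dim strPath As String = System.IO.Path.GetTempFileName()
        System.IO.File.WriteAllText(strPath, "ABCDEFGHIJ", System.Text.Encoding.ASCII)
        Console.WriteLine(ReadSlice(strPath, 4, 3))   ' reads bytes 4 through 6
        System.IO.File.Delete(strPath)
    End Sub

End Module
```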

 

FilePut

Description:

Writes data from a variable to a disk file. Valid only for files opened in Random or Binary mode.

Syntax:

FilePut(FileNumber, Value [, RecordNumber])

 

The parameters are explained as follows:

                                                             

FileNumber: Required. Any valid file number.

Value: Required. Valid variable name containing data written to disk.

RecordNumber: Optional. Record number (Random mode files) or byte number (Binary mode files) at which writing begins.

Example:

The following statements write 7 bytes to file number 1. (The number of bytes written equals the number of characters already in the string – and the current length of strSomeData is 7 because it contains the string "Hey now".)
 
Dim strSomeData As String = "Hey now"
FileOpen(1, "C:\TESTFILE.txt", OpenMode.Binary, OpenAccess.Write)
FilePut(1, strSomeData)
FileClose(1)

 

InputString

Description:

Returns a String value containing characters from a file opened in Input or Binary mode.

Syntax:

InputString(FileNumber, CharCount)

 

The parameters are explained as follows:

                                                             

FileNumber: Required. Any valid file number.

CharCount: Required. Any valid numeric expression specifying the number of characters to be read.

Example:

The following statements store the entire contents of TESTFILE.txt into the variable strSomeData. Note that the second parameter of InputString specifies "LOF(1)", meaning the number of bytes to be read from the file should be the number of bytes that make up the file size.
 
Dim strSomeData As String 
FileOpen(1, "C:\TESTFILE.txt", OpenMode.Binary, OpenAccess.Read)
strSomeData = InputString(1, LOF(1))
FileClose(1)
 

Sample Programs

 

Three sample programs will now be presented, using the functions described above. All three read in the same input file and write out the same output file; the difference is in how the input file is read. The first sample program uses the FileGet function to process the file in "chunks", the second uses the FileGet function to process the file all at once, and the third uses the InputString function to process the file all at once.

 

The job of the sample programs is to read in an HTML file, strip out all tags (i.e., everything between the "less than" and "greater than" angle brackets as well as the brackets themselves), and write out the remaining text.
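The tag-stripping rule can be tried out on an in-memory string before any file handling is involved. The following is a minimal sketch; StripTags is a hypothetical helper that does not appear in the sample programs, and it collects its output in a StringBuilder instead of writing each character to a file.

```vb
Imports System.Text

Module TagStripDemo

    ' Hypothetical helper: returns strHtml with everything between "<" and ">"
    ' (and the brackets themselves) removed.
    Public Function StripTags(ByVal strHtml As String) As String
        Dim sbOut As New StringBuilder()
        Dim blnTagPending As Boolean = False

        For Each chCurrent As Char In strHtml
            Select Case chCurrent
                Case "<"c
                    ' A tag has started; suppress output until it ends.
                    blnTagPending = True
                Case ">"c
                    ' The tag has ended; resume output with the next character.
                    blnTagPending = False
                Case Else
                    If Not blnTagPending Then
                        sbOut.Append(chCurrent)
                    End If
            End Select
        Next

        Return sbOut.ToString()
    End Function

    Public Sub Main()
        ' Prints: Working with Files
        Console.WriteLine(StripTags("<title>Working with Files</title>"))
    End Sub

End Module
```

Like the sample programs, this sketch does not treat HTML comments, scripts, or character entities specially; it only tracks angle brackets.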

 

The figure below shows excerpts of both the HTML input file and the plain text output file: first the HTML input, then the plain text output produced from it.

 

HTML Input File (excerpt)

<title>Working with Files</title>

<!--[if gte mso 9]><xml>

 <o:DocumentProperties>

  <o:Author>Harry Dodson</o:Author>

  <o:LastAuthor>UNI</o:LastAuthor>

  <o:Revision>2</o:Revision>

  <o:TotalTime>42</o:TotalTime>

  <o:Created>2005-08-14T13:20:00Z</o:Created>

  <o:LastSaved>2005-08-14T13:20:00Z</o:LastSaved>

  <o:Pages>1</o:Pages>

  <o:Words>2393</o:Words>

  <o:Characters>13644</o:Characters>

  <o:Company>Logical Decisions</o:Company>

  <o:Lines>113</o:Lines>

  <o:Paragraphs>32</o:Paragraphs>

  <o:CharactersWithSpaces>16005</o:CharactersWithSpaces>

  <o:Version>10.3311</o:Version>

 </o:DocumentProperties>

</xml><![endif]--><!--[if gte mso 9]><xml>

 <w:WordDocument>

  <w:DoNotHyphenateCaps/>

  <w:PunctuationKerning/>

  <w:Compatibility>

   <w:BreakWrappedTables/>

   <w:SnapToGridInCell/>

   <w:WrapTextWithPunct/>

   <w:UseAsianBreakRules/>

  </w:Compatibility>

  <w:BrowserLevel>MicrosoftInternetExplorer4</w:BrowserLevel>

 </w:WordDocument>

</xml>

. . .

Plain Text Output File (excerpt)

Working with Files

 

 

  Harry Dodson

  UNI

  2

  42

  2005-08-14T13:20:00Z

  2005-08-14T13:20:00Z

  1

  2393

  13644

  Logical Decisions

  113

  32

  16005

  10.3311

 

 

 

 

 

 

  

  

  

  

 

  MicrosoftInternetExplorer4

 

 

 

. . .

 

 

Sample Program 1 – Using the FileGet Function to Read a Binary File In "Chunks"

 

The first sample program uses the technique of reading and processing a binary file one "chunk" at a time (in this case 10,000 bytes at a time) using the FileGet function. Since the file size is a little over 80,000 bytes, it will take nine passes to read through the file: eight full chunks plus one partial chunk. The code listed below is heavily commented to aid in the understanding of how the program works.
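The pass count follows directly from the chunk arithmetic. The sketch below reproduces only that arithmetic, with no file I/O. ChunkSizes is a hypothetical helper, and the file size of 80,500 bytes is an assumed stand-in for "a little over 80,000 bytes", since the exact size of the input file is not stated.

```vb
Imports System.Collections.Generic

Module ChunkMathDemo

    ' Hypothetical helper: returns the size of each chunk needed to cover
    ' lngFileSize bytes when at most intMaxChunk bytes are read per pass.
    Public Function ChunkSizes(ByVal lngFileSize As Long, _
                               ByVal intMaxChunk As Integer) As List(Of Integer)
        Dim lstSizes As New List(Of Integer)
        Dim lngRemaining As Long = lngFileSize

        Do While lngRemaining > 0
            ' Take a full chunk if possible, otherwise whatever is left.
            Dim intSize As Integer = CInt(Math.Min(lngRemaining, CLng(intMaxChunk)))
            lstSizes.Add(intSize)
            lngRemaining -= intSize
        Loop

        Return lstSizes
    End Function

    Public Sub Main()
        ' Assumption: 80,500 is an illustrative file size, not the real one.
        Dim lstSizes As List(Of Integer) = ChunkSizes(80500, 10000)
        Console.WriteLine("Passes: {0}", lstSizes.Count)                    ' 9
        Console.WriteLine("Last chunk: {0}", lstSizes(lstSizes.Count - 1))  ' 500
    End Sub

End Module
```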

 

Code:

 

Module Module1

 

    Public Sub Main()

 

        Dim strHTMFileName As String

        Dim strTextFileName As String

        Dim intHTMFileNbr As Integer

        Dim intTextFileNbr As Integer

 

        Dim strBuffer As String = ""

        Dim strCurrentChar As String

        Dim blnTagPending As Boolean

 

        Dim intX As Integer

        Dim intBytesRemaining As Integer

        Dim intCurrentBufferSize As Integer

        Const intMAX_BUFFER_SIZE As Integer = 10000

 

        strHTMFileName = My.Application.Info.DirectoryPath & "\Files_Lesson1.htm"

        strTextFileName = My.Application.Info.DirectoryPath & "\TestOut.txt"

 

        Console.WriteLine("Opening files ...")

 

        'Open the input file ...

        intHTMFileNbr = FreeFile()

        FileOpen(intHTMFileNbr, strHTMFileName, OpenMode.Binary, OpenAccess.Read)

 

        ' If the file we want to open for output already exists, delete it ...

        If Dir(strTextFileName) <> "" Then

            Kill(strTextFileName)

        End If

 

        ' Open the output file ...

        intTextFileNbr = FreeFile()

        FileOpen(intTextFileNbr, strTextFileName, OpenMode.Binary, OpenAccess.Write)

 

        ' Initialize the "bytes remaining" variable to the length of the input file ...

        intBytesRemaining = CInt(LOF(intHTMFileNbr))

 

        ' Set up a loop which will process the file in "chunks" of 10,000 bytes at a time.

        ' We will keep track of how many bytes we have remaining to process, and

        ' the loop will continue as long as there are bytes remaining.

 

        Do While intBytesRemaining > 0

 

            Console.WriteLine("Processing 'chunk' ...")

 

            ' Note: The "buffer" is simply a string variable into which the "current

            ' chunk" of the file will be read.

 

            ' Set the current buffer size to the maximum size (10,000) as long as

            ' there are at least 10,000 bytes remaining. If there are fewer (as

            ' there would be the last time through the loop), set the buffer size

            ' equal to the number of bytes remaining.

 

            If intBytesRemaining >= intMAX_BUFFER_SIZE Then

                intCurrentBufferSize = intMAX_BUFFER_SIZE

            Else

                intCurrentBufferSize = intBytesRemaining

            End If

 

            ' Because the FileGet function relies on the size of the string variable (the

            ' "buffer") into which the data will be read to know how many bytes to read

            ' from the file, we fill the buffer string variable with a number of blank

            ' spaces - where the number of blank spaces was determined in the statement

            ' above.

 

            strBuffer = New String(" "c, intCurrentBufferSize)

 

            ' The FileGet function now reads the next chunk of data from the input file

            ' and stores it in the strBuffer variable.

 

            FileGet(intHTMFileNbr, strBuffer)

 

            ' The For loop below now processes the current chunk of data character by

            ' character, writing out only the characters that are NOT enclosed in the

            ' HTML tags (i.e., it is skipping every character between a pair of angle

            ' brackets "<" and ">") ...

 

            For intX = 1 To intCurrentBufferSize

                strCurrentChar = Mid(strBuffer, intX, 1)

                Select Case strCurrentChar

                    Case "<"

                        blnTagPending = True

                    Case ">"

                        blnTagPending = False

                    Case Else

                        If Not blnTagPending Then

                            ' The current character is outside of the tag brackets, so

                            ' write it out ...

                            FilePut(intTextFileNbr, strCurrentChar)

                        End If

                End Select

            Next

 

            ' Adjust the "bytes remaining" variable by subtracting the current buffer size

            ' from it ...

            intBytesRemaining = intBytesRemaining - intCurrentBufferSize

        Loop

 

        Console.WriteLine("Closing files ...")

 

        ' Close the input and output files ...

        FileClose(intHTMFileNbr)

        FileClose(intTextFileNbr)

 

        Console.WriteLine("Done.")

        Console.ReadLine()

    End Sub

 

End Module

 


 

Sample Program 2 – Using the FileGet Function to Read a Binary File All At Once

 

The second sample program uses the technique of reading and processing a binary file all at once, using the FileGet function in conjunction with the LOF function. The code listed below is heavily commented to aid in the understanding of how the program works.

 

Code:

 

Module Module1

 

    Public Sub Main()

 

        Dim strHTMFileName As String

        Dim strTextFileName As String

        Dim intHTMFileNbr As Integer

        Dim intTextFileNbr As Integer

 

        Dim strBuffer As String

        Dim strCurrentChar As String

        Dim blnTagPending As Boolean

 

        Dim intX As Integer

 

        strHTMFileName = My.Application.Info.DirectoryPath & "\Files_Lesson1.htm"

        strTextFileName = My.Application.Info.DirectoryPath & "\TestOut.txt"

 

        Console.WriteLine("Opening files ...")

 

        'Open the input file ...

        intHTMFileNbr = FreeFile()

        FileOpen(intHTMFileNbr, strHTMFileName, OpenMode.Binary, OpenAccess.Read)

 

        ' If the file we want to open for output already exists, delete it ...

        If Dir(strTextFileName) <> "" Then

            Kill(strTextFileName)

        End If

 

        ' Open the output file ...

        intTextFileNbr = FreeFile()

        FileOpen(intTextFileNbr, strTextFileName, OpenMode.Binary, OpenAccess.Write)

 

        Console.WriteLine("Reading input file ...")

 

        ' Note: The "buffer" is simply a string variable into which the entire

        ' contents of the file will be read.

 

        ' Because the FileGet function relies on the size of the string variable (the

        ' "buffer") into which the data will be read to know how many bytes to read

        ' from the file, we fill the buffer string variable with a number of blank

        ' spaces - where the number of blank spaces is equal to the size of the

        ' entire file (as determined by the LOF function) ...

 

        strBuffer = New String(" "c, CInt(LOF(intHTMFileNbr)))

 

        ' The FileGet function now reads the entire contents of the input file

        ' and stores it in the strBuffer variable.

 

        FileGet(intHTMFileNbr, strBuffer)

 

        Console.WriteLine("Generating output file ...")

 

        ' The For loop below now processes the contents of the file character by

        ' character, writing out only the characters that are NOT enclosed in the

        ' HTML tags (i.e., it is skipping every character between a pair of angle

        ' brackets "<" and ">") ...

 

        For intX = 1 To Len(strBuffer)

            strCurrentChar = Mid(strBuffer, intX, 1)

            Select Case strCurrentChar

                Case "<"

                    blnTagPending = True

                Case ">"

                    blnTagPending = False

                Case Else

                    If Not blnTagPending Then

                        ' The current character is outside of the tags, so write it out ...

                        FilePut(intTextFileNbr, strCurrentChar)

                    End If

            End Select

        Next

 

        Console.WriteLine("Closing files ...")

 

        ' Close the input and output files ...

        FileClose(intHTMFileNbr)

        FileClose(intTextFileNbr)

 

        Console.WriteLine("Done.")

        Console.ReadLine()

 

    End Sub

 

End Module

 


 

Sample Program 3 – Using the InputString Function to Read a Binary File All At Once

 

The third sample program uses the technique of reading and processing a binary file all at once, using the InputString function in conjunction with the LOF function. The code listed below is heavily commented to aid in the understanding of how the program works.

 

Code:

 

Module Module1

 

    Public Sub Main()

 

        Dim strHTMFileName As String

        Dim strTextFileName As String

        Dim intHTMFileNbr As Integer

        Dim intTextFileNbr As Integer

        Dim strBuffer As String

        Dim strCurrentChar As String

        Dim intX As Integer

        Dim blnTagPending As Boolean

 

        strHTMFileName = My.Application.Info.DirectoryPath & "\Files_Lesson1.htm"

        strTextFileName = My.Application.Info.DirectoryPath & "\TestOut.txt"

 

        Console.WriteLine("Opening files ...")

 

        'Open the input file ...

        intHTMFileNbr = FreeFile()

        FileOpen(intHTMFileNbr, strHTMFileName, OpenMode.Binary, OpenAccess.Read)

 

        ' If the file we want to open for output already exists, delete it ...

        If Dir(strTextFileName) <> "" Then

            Kill(strTextFileName)

        End If

 

        ' Open the output file ...

        intTextFileNbr = FreeFile()

        FileOpen(intTextFileNbr, strTextFileName, OpenMode.Binary, OpenAccess.Write)

 

        Console.WriteLine("Reading input file ...")

 

        ' Note: The "buffer" is simply a string variable into which the entire

        ' contents of the file will be read.

 

        ' The InputString function reads a number of bytes from a file. The first

        ' argument specifies the file number of the file from which the data is to be

        ' read. The resulting data is stored in the "strBuffer" variable. The second argument

        ' of the function specifies how many bytes to read, which in this case is

        ' the size of the entire file (as determined by the LOF function).

 

        strBuffer = InputString(intHTMFileNbr, CInt(LOF(intHTMFileNbr)))

 

        Console.WriteLine("Generating output file ...")

 

        ' The For loop below now processes the contents of the file character by

        ' character, writing out only the characters that are NOT enclosed in the

        ' HTML tags (i.e., it is skipping every character between a pair of angle

        ' brackets "<" and ">") ...

 

        For intX = 1 To Len(strBuffer)

            strCurrentChar = Mid(strBuffer, intX, 1)

            Select Case strCurrentChar

                Case "<"

                    blnTagPending = True

                Case ">"

                    blnTagPending = False

                Case Else

                    If Not blnTagPending Then

                        ' The current character is outside of the tags, so write it out ...

                        FilePut(intTextFileNbr, strCurrentChar)

                    End If

            End Select

        Next

 

        Console.WriteLine("Closing files ...")

 

        ' Close the input and output files ...

        FileClose(intHTMFileNbr)

        FileClose(intTextFileNbr)

 

        Console.WriteLine("Done.")

        Console.ReadLine()

    End Sub

 

End Module

 


 

 

Sample Program 4 – Using BinaryWriter and BinaryReader to Write and Read a Binary Data File

 

This sample program demonstrates the BinaryWriter and BinaryReader objects by first writing out a data file with fields in their native format (string, integer, date, and single). The program then uses BinaryReader to read that file back in and displays each record on the console, line by line.

 

In the first part of the program, the binary data file is populated from a text file of employee data. Each record of the employee text file contains an employee name (read into a String variable), department number (read into an Integer variable), job title (read into a String variable), hire date (read into a Date variable), and hourly rate (read into a Single variable). The Write method of the BinaryWriter object is then used to populate the binary file, field by field. One thing to note is that BinaryWriter has no Write overload for the Date type, so a Date variable must be converted to a Long Integer (Int64) with its ToBinary method before it is written.

 

In the second portion of the program, the binary data file is read back, using the ReadXXXX methods of the BinaryReader object. The "ReadXXXX" methods are a set of methods that each read one specific data type: ReadString, ReadInt32, ReadSingle, etc. The fields must be read in the same order, and with the same types, in which they were written. A Date variable requires special handling: assuming that the Date field was written to the file using the "ToBinary" method, the value must be read back as a Long Integer with the ReadInt64 method, and then the Date.FromBinary method must be used to convert it back to a Date data type for use in the program. Once one "record's worth" of data has been read in with the appropriate ReadXXXX methods, the data is formatted into a string and displayed on the console.
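The write-then-read symmetry described above can be checked in isolation with a MemoryStream, before any disk file is involved. This is a sketch only: RoundTripRecord and the sample values are hypothetical, the job title field is left out for brevity, and the invariant culture is used so the formatted result does not depend on regional settings.

```vb
Imports System.IO

Module RecordRoundTripDemo

    ' Hypothetical helper: writes one record to an in-memory stream with
    ' BinaryWriter, reads it back with BinaryReader, and returns the fields
    ' formatted as a single pipe-delimited string.
    Public Function RoundTripRecord(ByVal strName As String, _
                                    ByVal intDept As Integer, _
                                    ByVal dtmHired As Date, _
                                    ByVal sngRate As Single) As String
        Using objMS As New MemoryStream()
            Dim objBW As New BinaryWriter(objMS)
            objBW.Write(strName)               ' length-prefixed string
            objBW.Write(intDept)               ' 4 bytes
            objBW.Write(dtmHired.ToBinary())   ' Date stored as an Int64 (8 bytes)
            objBW.Write(sngRate)               ' 4 bytes
            objBW.Flush()

            ' Rewind and read the fields back in the order they were written.
            objMS.Position = 0
            Dim objBR As New BinaryReader(objMS)
            Dim strNameIn As String = objBR.ReadString()
            Dim intDeptIn As Integer = objBR.ReadInt32()
            Dim dtmHiredIn As Date = Date.FromBinary(objBR.ReadInt64())
            Dim sngRateIn As Single = objBR.ReadSingle()

            Return String.Format(System.Globalization.CultureInfo.InvariantCulture, _
                                 "{0}|{1}|{2:MM/dd/yyyy}|{3:F2}", _
                                 strNameIn, intDeptIn, dtmHiredIn, sngRateIn)
        End Using
    End Function

    Public Sub Main()
        ' Assumption: illustrative values, not taken from EMPLOYEE.txt.
        ' Prints: ANDERSON|100|08/14/2005|12.50
        Console.WriteLine(RoundTripRecord("ANDERSON", 100, #8/14/2005#, 12.5F))
    End Sub

End Module
```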

 

Code:

 

Imports System.IO

 

Module Module1

 

    Public Sub Main()

 

        Dim strSeqEmpFileName As String

        Dim strBinEmpFileName As String

        Dim intSeqEmpFileNbr As Integer

 

        Dim intRecordCount As Integer

 

        Dim strEmpName As String

        Dim intDeptNbr As Integer

        Dim strJobTitle As String

        Dim dtmHireDate As Date

        Dim sngHrlyRate As Single

 

        strSeqEmpFileName = My.Application.Info.DirectoryPath & "\EMPLOYEE.txt"

        strBinEmpFileName = My.Application.Info.DirectoryPath & "\EMPLOYEE.BIN"

 

        '-----------------------------------------------------------------------

        ' In the first part of this sample program, we will create, or load,

        ' a binary access version of the comma-delimited sequential employee

        ' file that was used in one of the sample programs for sequential access

        ' files.

        '-----------------------------------------------------------------------

 

        ' Open the sequential employee file for input ...

        intSeqEmpFileNbr = FreeFile()

        FileOpen(intSeqEmpFileNbr, strSeqEmpFileName, OpenMode.Input)

 

        ' If the binary employee file we want to write already exists,

        ' delete it ...

        If File.Exists(strBinEmpFileName) Then

            File.Delete(strBinEmpFileName)

        End If

 

        ' Open the binary employee file for writing ...

 

        Dim objFS As New FileStream(strBinEmpFileName, FileMode.Create, FileAccess.Write)

        Dim objBW As New BinaryWriter(objFS)

 

        ' Initialize record count variable to keep track of how many records will

        ' be written to the binary file ...

        intRecordCount = 0

 

        ' This loop will read a record from the comma-delimited sequential employee file

        ' and write a corresponding record to its binary access counterpart ...

        Do Until EOF(intSeqEmpFileNbr)

            ' Read a record's worth of fields from the comma-delimited employee file,

            ' storing the fields into their corresponding variables ...

            Input(intSeqEmpFileNbr, strEmpName)

            Input(intSeqEmpFileNbr, intDeptNbr)

            Input(intSeqEmpFileNbr, strJobTitle)

            Input(intSeqEmpFileNbr, dtmHireDate)

            Input(intSeqEmpFileNbr, sngHrlyRate)

            ' Write a record to the binary file (one field at a time)

            objBW.Write(strEmpName)

            objBW.Write(intDeptNbr)

            objBW.Write(strJobTitle)

            objBW.Write(dtmHireDate.ToBinary)

            objBW.Write(sngHrlyRate)

            ' Increment the record count variable ...

            intRecordCount = intRecordCount + 1

        Loop

 

        ' Close the sequential file and the binary file ...

        FileClose(intSeqEmpFileNbr)

        objBW.Close()

        objFS.Close()

 

        '-----------------------------------------------------------------------

        ' In the next part of this sample program, we will display the records

        ' written to the binary file by reading them back and outputting their

        ' contents to the console one by one.

        '-----------------------------------------------------------------------

 

        ' Print headings ...

        Console.WriteLine("{0} employee records were written to the binary file.", intRecordCount)

        Console.WriteLine()

        Console.WriteLine("Contents as follows:")

        Console.WriteLine()

 

        Console.WriteLine("EMP NAME".PadRight(20) & " " & _

                          "DEPT".PadRight(4) & " " & _

                          "JOB TITLE".PadRight(25) & " " & _

                          "HIRE DATE".PadRight(10) & " " & _

                          "HRLY RATE".PadRight(7))

 

        Console.WriteLine("--------".PadRight(20) & " " & _

                          "----".PadRight(4) & " " & _

                          "---------".PadRight(25) & " " & _

                          "---------".PadRight(10) & " " & _

                          "---------".PadRight(7))

 

        ' Open the binary file for reading ...

        objFS = New FileStream(strBinEmpFileName, FileMode.Open, FileAccess.Read)

        Dim objBR As New BinaryReader(objFS)

 

        ' Loop thru the binary file to get one "record's worth" of fields

        ' and display each record (the fields) on a console line in each pass of the loop

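        ' (The stream position is compared to the stream length here because PeekChar

        ' can throw an exception when the next bytes do not decode as a character.)

        Do While objFS.Position < objFS.Length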

            strEmpName = objBR.ReadString

            intDeptNbr = objBR.ReadInt32

            strJobTitle = objBR.ReadString

            dtmHireDate = Date.FromBinary(objBR.ReadInt64)

            sngHrlyRate = objBR.ReadSingle

            Console.WriteLine(strEmpName.PadRight(20) & " " & _

                              intDeptNbr.ToString.PadLeft(4) & " " & _

                              strJobTitle.PadRight(25) & " " & _

                              Format(dtmHireDate, "MM/dd/yyyy").PadRight(10) & " " & _

                              Format(sngHrlyRate, "Standard").PadLeft(7))

 

        Loop

 

        Console.WriteLine()

 

        ' Close the binary file ...

        objBR.Close()

        objFS.Close()

 

        Console.ReadLine()

 

    End Sub

 

End Module

 

 
