Sorting data (i.e., arranging data in a particular order, such as listing a group of names alphabetically, or listing a set of numbers from low to high) is one of the most important computing applications. Since the dawn of the computing age, researchers and computer scientists have sought to come up with algorithms that sort data ever faster and more efficiently. There are several well-known sorting algorithms, with names like the Bubble Sort, Shell Sort, Tag Sort, Heap Sort, and QuickSort (the list goes on).
As far as getting applications done in VB is concerned, you may never need to implement your own sort. If you are doing database application programming, you tell the DBMS how you want the data sorted by specifying the ORDER BY clause in your query, and let the DBMS do the work. As far as displaying information on a VB form, many of the controls that can display lists of data (such as the ListBox, ListView, TreeView, MSFlexGrid, etc.) have a "Sorted" property – so sorting can be as easy as setting the Sorted property to True (again, let VB do the work and figure out the implementation details).
Still, there may be occasions where you need to sort data "on your own" with VB code. This article demonstrates how to sort the contents of an array using the bubble sort technique. The bubble sort is the simplest to understand, and it provides a good initial exposure to sorting algorithms. However, it also the slowest and least efficient of the sorting algorithms – not a big deal if you have a relatively small amount of data to sort, but if you need to sort an array containing thousands of items, you had better research one of the faster algorithms.
The basic bubble sort algorithm calls for repeatedly looping through the array to be sorted, comparing each item to its adjacent item (i.e., first time through, compare item 1 with item 2, second time compare item 2 with item 3, and so on). When the current pair of two items is compared, if the first of the two items has a higher value than the second of the two, then the two items are exchanged, or swapped, within the array. The bubble sort is so named because the smaller values "bubble" their way up to the top of the array. (Looking at it another way, a less-commonly used term for this type of sort is the "sinking sort" because larger values "sink" their way to the bottom of the array.) Due to the way that successive adjacent comparisons are made, a large value can move down the array several positions on one pass. On the first pass, the largest value will always sink to its proper position in the last element of the array; on the second pass, the next largest value will sink to its proper position in the next-to-last element of the array, and so on. Meanwhile, on any given pass, a smaller value will move up only one position.
In a "pure" implementation of the bubble sort of "N" items, N – 1 comparisons must be made N – 1 times. For example, for an array of five items, four comparisons must be made four times, for a total of 16 comparisons; for an array of 10 items, 81 (9 X 9) comparisons must be made; for an array of 1000 items, 998001 (999 X 999) comparisons must be made. The bubble sort algorithm can be modified to possibly cut down the number of comparisons to be made through the use of a Boolean "exchange flag".
A generalized Sub routine that implements the "pure" bubble sort algorithm to sort an array of integers is shown below. Note that nested For/Next loops are used to step through the array. The starting loop index for each For loop is the lower bound of the array to be sorted (LBound(paintArray)). If the array was declared with the boundaries 1 To 5, then the lower bound would be 1. The ending loop index for each For loop is the upper bound of the array to be sorted, less 1 ((UBound (paintArray) - 1)). Again, if the array was declared with the boundaries 1 To 5, then the upper bound less 1 would be 4. Note that the body of the inner loop performs the comparison of the current element of the array with its adjacent "neighbor", and if the value of the current element is greater than the adjacent element, the elements are exchanged.
Note the coding required to implement the exchange of items: the current item of the array (paintArray(lngy)) is assigned to a temporary, or "hold" variable (intTemp). The adjacent item of the array (paintArray(lngY + 1)) is then copied to the current item (at this point, both the current and the adjacent items contain the same value). The temporary variable is then assigned to the adjacent item, completing the exchange.
The screen shot below shows a graphical representation of a test case using the algorithm presented above to sort an array of five integers. It shows what happens on each pass through the loop, highlighting the current pair of items being compared (and possibly exchanged).
With the "pure" version of the bubble sort, as discussed above, N – 1 comparisons (the inner loop) must be made N – 1 times (the outer loop). That can add up to quite a number of comparisons and can take up quite a bit of processor time. So it would make sense to reduce the number of comparisons if we possibly can. A way to do it is to use a Boolean flag in conjunction within the inner loop: the flag is set to False at the beginning of the inner loop, but set to True if an exchange is made at any time within the inner loop. If no exchange within the inner loop is made, the flag remains False, indicating that the array is now in sorted order. The outer loop continues only if the flag is True; if the flag is False, the process ends, possibly avoiding unnecessary passes through the array.
A generalized Sub routine that implements the "modified" version of the bubble sort algorithm using the "exchange made" flag is shown below:
The screen shot below shows a graphical representation of a test case using the modified bubble sort algorithm presented above to sort an array of five integers. It shows what happens on each pass through the loop, highlighting the current pair of items being compared (and possibly exchanged). Note that after the second pass through the array, the algorithm "realizes" that the sort is complete (because blnExchangeMade remained False), and so the sort process ends early. In this case, use of the blnExchangeMade flag saved two full passes through the array (eight comparisons). If the original "pure" bubble sort was used here, it would have kept going because it does not have the "intelligence" built in to "know" that an early exit was possible.
Sample Program 1
Sample Program 1 implements the modified bubble sort algorithm (The "SortIntegerArray" Sub) shown above to sort an array of five integers. When the program is run, clicking on the "Get Random Numbers" button loads an array with five random numbers between 100 and 999 and displays them on the form:
Upon clicking on the "Sort Numbers" button, the bubble sort algorithm is implemented and the sorted results are displayed on the form:
The code for Sample Program 1 is shown below:
When the "Get Random Numbers" button is clicked, the cmdGet_Click event is fired, which first clears the form with the Cls statement, then issues the Randomize statement, which should be executed at least once in any program that uses the Rnd function to generate random numbers. The following For/Next loop is then executed to load the maintRandomNumbers array with five random numbers between 100 and 999, printing each number after it is generated and assigned to the next array element:
For intX = 1 To 5
maintRandomNumbers(intX) = GetRandomNumber(100, 999)
On each pass through the above loop, the random number between 100 and 999 is obtained by calling the GetRandomNumber function, which implements the formula (introduced in an earlier article) to generate a random number between a range of numbers (where a range of numbers has a lowerbound and an upperbound – in this case, 100 is the lowerbound and 999 is the upperbound):
Int((upperbound – lowerbound + 1) * Rnd + lowerbound)
The code for the function is as follows:
Private Function GetRandomNumber(pintLowerBound As Integer, _
pintUpperBound As Integer) _
' This function will return a random integer that falls with the range
' of the two arguments passed.
GetRandomNumber = Int((pintUpperBound - pintLowerBound + 1) * Rnd + pintLowerBound)
When the "Sort Numbers" button is clicked, the cmdSort_Click event is fired, which first prints a blank line below the list of numbers printed by the cmdGet_Click event procedure, followed by a line with the text "The numbers sorted are:". Then, the statement
calls the bubble sort routine (SortIntegerArray), passing it the maintRandomNumbers array we loaded via the cmdGet_Click event procedure. After the SortIntegerArray procedure "does its thing", control returns to the cmdSort_Click event code, where we loop through the newly sorted maintRandomNumbers array, printing its sorted contents.
Full code for Sample Program 1 (in Form1):
Download the project files for Sample Program 1 here.
Sample Program 2
Sample Program 2 applies the modified bubble sort algorithm to sort an array of "records", where the "record" is simply a string containing fields from an input record of a sequential file as well as an extra area where we build a "sort key". In the bubble sort routine, it is only the sort key area of the strings that are compared.
The sample program reads in a file of unsorted customer data and can sort the records of that file in one of two ways. The first way is by customer last name alone; the second is by customer last name within state. The results of the first way (customer name alone) is shown in the screen shot below:
The results of the second sort option (customer name within state) is shown in the screen shot below. Note that here, it is the state field that is the primary sort key (note that the state field is in alphabetical order). For each set of records where the state field is the same, the customers are listed alphabetically by last name.
The code for Sample Program 2 is provided below.
The Click event of both of the "sort buttons" (cmdSortLast and cmdSortState) each call the Sub DisplayCustInfo; however the cmdSortLast_Click Sub passes "L" as an argument to DisplayCustInfo, while cmdSortState_Click passes "S" as an argument. The DisplayCustInfo Sub uses the "L" or "S" to determine which of the two options to use to sort the data.
In the main processing loop of the DisplayCustInfo sub (beginning "Do Until EOF(intCustFileNbr)"), a record from the customer file is read in using the Input # statement, which stores the fields of the current record into their respective variables (this is a comma-delimited file consisting of the following fields: Last Name, First Name, Address, City, State, and Zip). Following the Input # statement, the statement
strFixedRec = ""
clears the variable strFixedRec, which is the record string that we will build to store in the array.
If pstrSortType = "L" Then
strFixedRec = Left$(strLastName & Space$(20), 20)
strFixedRec = Left$(strState & strLastName & Space$(20), 20)
uses the "L" or "S" argument that was passed in to create a 20 character string consisting of either last name alone (if "L" was passed) or state followed by last name (if "S" was passed). This 20 character string will serve as our "sort key". The technique used to create a fixed-length, left-justified string is
Left$(stringexpression & Space$(n), n)
where n is the length of the desired string. This code says, "take the source string and concatenate a number of blank spaces to it equal to the length of the desired resulting string – then apply the Left$ function to truncate the resulting string to the desired length". For example, if stringexpression was a customer name that was 10 characters in length, concatenating 20 spaces to that would give us an "intermediate" string that is 30 characters in length. Applying the Left$ function to that intermediate result to get the leftmost 20 characters gives us a final resulting string that is 20 characters in length, consisting of the 10 character customer name plus 10 spaces.
strFixedRec = strFixedRec _
& Left$(strLastName & Space$(15), 15) _
& Left$(strFirstName & Space$(15), 15) _
& Left$(strCity & Space$(25), 25) _
appends additional fixed length strings to strFixedRec: 15 characters for the last name, 15 characters for the first name, 25 characters for the city, and the state which is always 2 characters. (The address and zip code from the record are not used.)
intArrX = intArrX + 1
ReDim Preserve mastrCustRec(1 To intArrX)
mastrCustRec(intArrX) = strFixedRec
add a new element to the dynamic mastrCustRec array and assigns the newly built strFixedRec (our "record string" with a 20-character "sort key" in front) to the newly added element.
After all the records have been loaded into the array, the input file is closed, and then the sort routine is called as follows:
SortRecordArray mastrCustRec, 1, 20
An examination of the SortRecordArray Sub header shows that it expects three parameters: the string array to be sorted, the starting position of the sort key, and the length of the sort key:
Private Sub SortRecordArray(pastrRecArray() As String, _
plngSortKeyStart As Long, _
plngSortKeyLength As Long)
Note that in the portion of the SortRecordArray routine that compares the current and adjacent elements of the array, the starting position and length parameters are used as arguments for the Mid$ functions which isolate the "sort keys" for comparison:
If Mid$(pastrRecArray(lngY), plngSortKeyStart, plngSortKeyLength) _
> Mid$(pastrRecArray(lngY + 1), plngSortKeyStart, plngSortKeyLength) _
After the SortRecordArray routine "does its thing", control is returned to the DisplayCustInfo sub, where the results are printed to the form.. The statements
For intX = 1 To UBound(mastrCustRec)
Print Mid$(mastrCustRec(intX), 21)
loop through the mastrCustRec array (now sorted) and prints each element from the 21st position on (recall that we had reserved positions 1 through 20 of each element for the "sort key").
Full code for Sample Program 2 (in Form1):
Download the project files for Sample Program 2 here.