Processing Arrays and Using Functions

by Jacqueline A. Jones, Brooklyn College, CUNY

Using an Index Register

Your text doesn’t show any examples of true use of an index register. An index register holds consecutive byte offsets from the beginning of an array. If the array consists of doublewords, the items are 4 bytes apart, and the index register holds first 0, then 4, then 8, then 0Ch (12), and so on. An index register plays the role of an array subscript in C++, except that it counts bytes rather than elements. Traditionally, the legal index registers are esi and edi, though more recent assemblers allow other registers to be used in this fashion.

Example 1: Using an index register.

Sum the values in the arr array using an index register.

arr dword 9, 8, 15, -5

n dword 4

...

mov ecx,n ; loop counter

sub esi,esi ; index reg

sub eax,eax ; sum = 0

add_t: add eax,[arr+esi]

add esi, 4

loop add_t
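
For readers who want to trace Example 1 by hand, here is a minimal sketch in Python that models the data segment as a block of bytes and the registers as ordinary variables. It is only an illustration, not part of the assembly program. The names memory and sum_with_index are hypothetical, and the sketch assumes the count is at least 1.

```python
import struct

# Hypothetical model of "arr dword 9, 8, 15, -5": four little-endian
# 32-bit signed integers stored back to back (16 bytes in all).
memory = struct.pack("<4i", 9, 8, 15, -5)


def sum_with_index(mem, count):
    """Trace Example 1; assumes count >= 1, as the assembly code does."""
    ecx = count          # mov ecx,n
    esi = 0              # sub esi,esi  (byte offset from the start of arr)
    eax = 0              # sub eax,eax  (sum = 0)
    while True:
        (value,) = struct.unpack_from("<i", mem, esi)   # read [arr+esi]
        eax += value                                    # add eax,[arr+esi]
        esi += 4                                        # add esi,4
        ecx -= 1                                        # loop: decrement ecx...
        if ecx == 0:                                    # ...and stop when it hits 0
            break
    return eax, esi


print(sum_with_index(memory, 4))   # (27, 16)
```

The second value returned is the final offset, 16, which is one doubleword past the last element.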

Using a Base Register

Using an index register to process an array is clear, because it allows you to mention the name of the array in the address reference. However, this notation cannot be used in a procedure that cannot see the names of variables declared in main. Therefore, another method is needed: using a base register. A base register initially holds the address of the array and is incremented by the size of an array element on each pass. Traditionally, the base register can be ebx or ebp, though esi and edi can be used similarly. Note that ebp points into the stack, not into the data segment, so there are extra steps needed to use it to process an array.

Example 2: Using a base register.

Sum the values in the arr array using a base register (the summation is done in main).

arr dword 9, 8, 15, -5

n dword 4

...

mov ecx,n ; loop counter

lea ebx,arr ; base reg

sub eax,eax ; sum = 0

add_t: add eax,[ebx]

add ebx, 4

loop add_t
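
The difference from Example 1 is that the register now holds an address rather than an offset. The following Python sketch models that, under the assumption that the array happens to sit at address 16 in a small block of memory. ARR_ADDRESS, memory, and sum_with_base are hypothetical names chosen for this illustration.

```python
import struct

ARR_ADDRESS = 16                 # assumed location of arr in this model
memory = bytearray(64)           # a small stand-in for the data segment
struct.pack_into("<4i", memory, ARR_ADDRESS, 9, 8, 15, -5)


def sum_with_base(mem, address, count):
    """Trace Example 2; assumes count >= 1, as the assembly code does."""
    ecx = count          # mov ecx,n
    ebx = address        # lea ebx,arr  (ebx holds an address, not an offset)
    eax = 0              # sub eax,eax
    while True:
        (value,) = struct.unpack_from("<i", mem, ebx)   # read [ebx]
        eax += value                                    # add eax,[ebx]
        ebx += 4                                        # add ebx,4
        ecx -= 1                                        # loop add_t
        if ecx == 0:
            break
    return eax, ebx


print(sum_with_base(memory, ARR_ADDRESS, 4))   # (27, 32)
```

Note that the function never mentions the array by name; it needs only the starting address, which is why this form works inside a procedure.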

Passing Parameters in Registers

When calling a procedure, you can pass parameters in registers. Obviously, this is limited by the number of available registers (eax, ebx, ecx, edx, esi, edi, ebp). For our purposes, it is simpler than passing parameters on the stack.

When possible, put the item in the register in which it will be used; this saves time in the procedure. Pass the address of an array in ebx (though using other registers is legal), and pass the number of values in an array in ecx, if you plan to use the loop instruction.

Example 3: Passing parameters in registers.

main:

arr dword 9, 8, 15, -5

n dword 4

...

mov ecx,n ; number of values in the array

lea ebx,arr ; address of the array

call myproc

In the procedure, simply use the registers as you would if you had loaded them in the procedure.

; modify array in proc, using base register notation

myproc proc near32

push eax

push ebx

push ecx

mov eax,9

top: add [ebx],eax ; add value to each array element

add ebx,4

loop top

pop ecx

pop ebx

pop eax

ret

myproc endp
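
To see what the caller observes after the call in Example 3, the procedure can be traced with a small Python model. This is a sketch under stated assumptions: the registers are kept in a dictionary, the stack is a Python list, the array sits at address 0, and the starting value 123 in eax is made up to show that it survives the call.

```python
import struct

# Hypothetical model: the array from main, placed at address 0.
memory = bytearray(struct.pack("<4i", 9, 8, 15, -5))


def myproc(regs, mem):
    """Trace Example 3: add 9 to each element, preserving eax, ebx, ecx."""
    stack = []
    stack.append(regs["eax"])    # push eax
    stack.append(regs["ebx"])    # push ebx
    stack.append(regs["ecx"])    # push ecx
    regs["eax"] = 9              # mov eax,9
    while True:
        (value,) = struct.unpack_from("<i", mem, regs["ebx"])
        struct.pack_into("<i", mem, regs["ebx"], value + regs["eax"])  # add [ebx],eax
        regs["ebx"] += 4         # add ebx,4
        regs["ecx"] -= 1         # loop top
        if regs["ecx"] == 0:
            break
    regs["ecx"] = stack.pop()    # pop ecx  (reverse order of the pushes)
    regs["ebx"] = stack.pop()    # pop ebx
    regs["eax"] = stack.pop()    # pop eax


regs = {"eax": 123, "ebx": 0, "ecx": 4}   # ebx = address of arr, ecx = n
myproc(regs, memory)
print(struct.unpack("<4i", memory))   # (18, 17, 24, 4)
print(regs)                           # {'eax': 123, 'ebx': 0, 'ecx': 4}
```

The array in memory has changed, but every register holds the value it had before the call.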

Returning to the Top of an Array in a Procedure

Sometimes, it is useful to be able to keep a pointer to the beginning of the array while still processing through the array. For example, in Example 3, what if you wanted to print the changed array after the end of the loop? The ebx register points past the last element in the array (and ecx is 0). You can’t use lea to access the address of arr, because the function may not refer to names declared in main.

You can solve the problem by popping registers from the stack, as shown in Example 4.

Example 4: Popping the stack to retrieve address and counter values

; modify array in proc, using base register notation;

; pop stack to call another proc

myproc proc near32

push eax

push ebx

push ecx

mov eax,9

top: add [ebx],eax ; add value to each array element

add ebx,4

loop top

pop ecx ; retrieve values from stack

pop ebx

push ebx ; push them back onto stack in case printarr changes them

push ecx

call printarr

pop ecx ; popping is always the last event in a proc

pop ebx

pop eax

ret

myproc endp

Note that the procedure pops the necessary registers and then pushes them back onto the stack. That is a precaution in case printarr is badly-behaved and fails to preserve register values. Each procedure should take charge of preserving the original values of the registers it changes.
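
The pop-then-push sequence can be checked with a Python list standing in for the stack. This is only a sketch; the function name and the saved values are hypothetical.

```python
def peek_saved_registers(stack):
    """Recover ebx and ecx as Example 4 does, leaving the stack as it was."""
    ecx = stack.pop()     # pop ecx   (last pushed, first popped)
    ebx = stack.pop()     # pop ebx
    stack.append(ebx)     # push ebx  (restore the saved copies...)
    stack.append(ecx)     # push ecx  (...in the original order)
    return ebx, ecx


# Assumed state after "push eax / push ebx / push ecx" at the top of myproc:
# eax was 123, ebx was the array address 0, ecx was the count 4.
stack = [123, 0, 4]
print(peek_saved_registers(stack))   # (0, 4)
print(stack)                         # [123, 0, 4]
```

Because the pushes mirror the pops, the three pops at the end of the procedure still find the values in the expected order.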

Base-Indexed Addressing

Another way to return to the top of an array is to use a different form of addressing, one which does not change the base register at all while processing the array. Base-indexed addressing uses both a base and an index register, changing only the index register. The base register continues to point to the top of the array, as shown in Example 5, while esi changes. Starting over inside the proc requires only resetting esi to 0; ebx remains unchanged.

Base-indexed notation in its most traditional form allows one base and one index register per address: ebx+esi, or ebx+edi, or ebp+esi or ebp+edi. Note that ebp points into the stack, not into the data segment, so there are extra steps needed to use it to process an array.

Example 5: Using base-indexed addressing to process an array

; modify array in proc, using base-indexed register notation

myproc proc near32

push eax

push esi ; esi changes, but ebx doesn’t

push ecx

mov eax,9

sub esi,esi ; index register

top: add [ebx+esi],eax ; add value to each array element

add esi,4 ; modify esi rather than ebx

loop top

pop ecx ; retrieve counter from stack

push ecx ; push ecx back in case printarr changes it

call printarr

pop ecx ; popping is always the last event in a proc

pop esi

pop eax

ret

myproc endp
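
The benefit of leaving ebx alone shows up when the array is traversed twice. The Python sketch below models that: one pass adds 9 to each element, then the index is reset and a second pass reads the elements back. The second pass stands in for whatever printarr does and is an assumption, since printarr is not shown. ARR_ADDRESS, memory, and add_then_collect are hypothetical names.

```python
import struct

ARR_ADDRESS = 16                 # assumed location of the array in this model
memory = bytearray(64)
struct.pack_into("<4i", memory, ARR_ADDRESS, 9, 8, 15, -5)


def add_then_collect(mem, ebx, count, delta=9):
    """Two passes over the array using [ebx+esi]; ebx is never changed."""
    esi = 0                                   # sub esi,esi
    for _ in range(count):
        (value,) = struct.unpack_from("<i", mem, ebx + esi)
        struct.pack_into("<i", mem, ebx + esi, value + delta)   # add [ebx+esi],eax
        esi += 4                              # add esi,4

    esi = 0                                   # start over: reset only the index
    seen = []
    for _ in range(count):
        (value,) = struct.unpack_from("<i", mem, ebx + esi)     # read [ebx+esi]
        seen.append(value)
        esi += 4
    return seen, ebx


result = add_then_collect(memory, ARR_ADDRESS, 4)
print(result)   # ([18, 17, 24, 4], 16)
```

The base value comes back unchanged, so no stack traffic is needed to find the top of the array again.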

Base+Displacement or Base-Indexed+Displacement Notation

There are other forms of addressing that are useful. Adding a displacement to base address notation allows processing adjacent values in an array without changing the base address. For example, if [ebx] points to an element in an array of doublewords, [ebx+4] points to the next element. The extra displacement can also be used with base-indexed addressing, as in [ebx+esi+4]. Example 6 shows using this notation to find whether any two adjacent elements in an array have the same value. If the array contains 4, 5, 5, -2, -2, 9, the message will print twice, once because of the adjacent 5s and once because of the adjacent -2s.

Example 6: Using base+displacement addressing to print a message when adjacent array elements have the same value

; using base+displacement notation to compare array elements

findequal proc near32

.data

msg byte "adjacent elements are equal",13,10,0

.code

push eax

push ebx

push ecx

dec ecx ; process only to element n - 1

top: mov eax,[ebx]

cmp eax,[ebx+4]

jne next

output msg ; special case

next: add ebx,4 ; all cases

loop top

pop ecx

pop ebx

pop eax

ret

findequal endp
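
The claim that the message prints twice for 4, 5, 5, -2, -2, 9 can be checked with a Python sketch that counts matches instead of printing. It is an illustration only; memory and count_adjacent_equal are hypothetical names, and the array is assumed to start at address 0.

```python
import struct

# Hypothetical model of the six-element array used in the text above.
memory = struct.pack("<6i", 4, 5, 5, -2, -2, 9)


def count_adjacent_equal(mem, ebx, count):
    """Count how many times [ebx] equals [ebx+4] while stepping through the array."""
    matches = 0
    ecx = count - 1                   # dec ecx: the last element has no neighbor
    # Unlike the real loop instruction, this guard also copes with a
    # one-element array, where ecx is already 0 before the first pass.
    while ecx > 0:
        (eax,) = struct.unpack_from("<i", mem, ebx)        # mov eax,[ebx]
        (nxt,) = struct.unpack_from("<i", mem, ebx + 4)    # cmp eax,[ebx+4]
        if eax == nxt:
            matches += 1              # special case: where "output msg" would run
        ebx += 4                      # all cases: advance to the next element
        ecx -= 1
    return matches


print(count_adjacent_equal(memory, 0, 6))   # 2
```

Only five comparisons are made for six elements, which is why the procedure decrements ecx before entering the loop.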

Loop Organization

Example 6 also shows how to organize a loop that contains a jump. Do not repeat code. The jump skips over the code that is not done in all cases; I’ve labelled this code "special case". It lands on the code that is done in all cases (which I’ve labelled "all cases"). The "all cases" code is the indexing and looping that must be done every time through the loop. Write it once and jump to it rather than repeating it in each branch.

Location of Procedures

Internal procedures are placed in the same file as the main program. They are placed at the end of the main program, between the statement "invoke exit_process,0" and the end statement. They must go before the directive end, because end stops assembly of the code. They must go after "invoke exit_process" so that they will not be executed when they are encountered in the code and so that you won't have to jump to that statement.

External (Separately-Assembled) Procedures

Procedures are often written to be reused by many programs. A procedure of this sort should not be placed in the same file as the main program. It is written in a separate file, assembled separately, and linked together with the main program file during the link step.

The extrn Directive

A main program that calls an external procedure must contain an extrn statement, as shown below, where "sample" is the name of the external procedure.

.code

extrn sample: near32 ; extrn says that sample is defined

; outside this file

...

call sample

Without the extrn statement, the assembler would not know what to do when it encountered the call to sample. The extrn statement is a promise that a sample function will be made available later, outside this file.

A separately assembled file must have some of the scaffolding of a main program, but not all. It does not need a stack segment, because the stack is declared in the main program. It does not specify the starting position, since the main program does that as well.

Example 7: Format of an external procedure

.386

.model flat

public sample ; make name available to linker

include io.h ; needed if proc does I/O

.data ; data goes ABOVE proc header

string1 byte 40 dup(?), 0 ; variables used must be declared in proc

.code

sample proc near32 ; proc goes inside .code segment

... ; processing goes here

ret

sample endp

end

The public Directive

The external procedure contains the line "public sample", where "sample" is the name of the procedure. The public directive makes the name "sample" available to the linker. The name can be seen outside the file; any names that have not been made public can’t be seen outside the file.

Assembling and Linking an External File

After writing and saving the external procedure, you must assemble it separately from the main program, in the same way that you assemble main. The two assembly steps produce two object files. If main is in main.asm and the external procedure is in sample.asm, you will end up with main.obj and sample.obj.

In the link step, you must add the file sample.obj to the list of files being linked together. The link step links together main.obj and other files to produce an executable file. (You’ve already done this with io.obj.) The link step "resolves external references." That is, it looks for a label that matches the one specified in the extrn statement in main. It finds it in the item specified in the public statement, and that allows it to provide the appropriate addresses for the function calls.