new public domain code available

Bulletin Board and Q&A for x86 assembly language topics

I was inspired by an email message from a person by the name of  Phil 
Taylor who had visited my web site and in re-acquainting himself with 
assembly language programming.  As an exercise, he had written a program 
to generate and print each term of the Fibonacci series from the first 
to the Nth and he emailed his code and the timings that he had measured 
for various numbers of terms.

I created a program to do the same thing, but using a much faster method 
that doesn't require any mul or div instructions and I have posted the 
source code (DOS and Linux versions) on my web site

Like most all of the code there, I have donated these programs to the 
public domain, so you're free to use them for any purpose.  I'd be 
interested in seeing if anybody can come up with faster code to do this. 
  The algorithm is decent, but I am sure that I left a bunch of 
potential optimizations lying around in there, and there may be a better 
mechanism entirely.  Have fun!

Hi Ed,

As always, thanks for the example code. (thanks to Phil
Taylor for inspiring you, too :)

Not concerned that MicroSoft will steal it, eh? :)

Okay! :)

So far, I've found (to my surprise!) that this is a little

	cmp	edi,ebp		; are we at a new significant digit?
;	loopnz	.top		; keep going until we've got them all

	jz .predone
	dec ecx
	jnz .top

	cmp	al,'0'		; is it a zero?
	jnz	.done		; yes, so keep
	inc	ebp		;

I know "loop" is supposed to be slow compared to "dec
ecx"/"jnz ...", but I thought the "loopnz" would be a win
compared to *two* conditional jumps, but it seems to run
around vs for the "loopnz". I'm working
with a K6-300 - not exactly a "typical" machine to optimize
for, these days...

I guess that doesn't do quite the same thing (doesn't seem
to matter)... okay...

;	loopnz	.top		; keep going until we've got them all

	dec ecx
	jz .predone
	cmp	edi,ebp		; are we at a new significant digit?
	jnz .top

	cmp	al,'0'		; is it a zero?
	jnz	.done		; yes, so keep
	inc	ebp		;

This seems maybe a little faster still... matter of seconds
on a nearly 5 minute run... What we need, I think, is a
"32-bit aaa", so we can add 4 digits at once... I'm not
coming up with it now... maybe another day...

I'm a little puzzled by your initialization strategy... you
initialize the counter buffer to ascii '0', but the number
buffers to 0, and then overwrite them with ascii '0' at
runtime. Eliminating this would surely be faster, but not
enough to matter... I'm even more mystified by initializing
"msg2" to "middle" at runtime. Did you have something in
mind that I'm missing, or did it just turn out that way?

Nope.  In fact, if it inspires them to write code that's a little 
faster, a little smaller, and a little less buggy, it would be well 
worth it.

Thanks for your optimization ideas!  I'll have to try them out on my 

Actually, I have something in mind that the *code* is missing.  My 
intent was to do two further optimizations.  One is to dynamically 
allocate the memory, in which case the initialization will be necessary. 
  The other is to reduce the number of system calls to two from the 
current three per line.  Since the first part of the line is of the form 
"Fibonacci(x): ", my intent is to dynamically manage the term counter 
(the "x" part) and then call the print sys_call just once to print that 
whole thing.  To do that, of course, I'd need to move back the "): " 
part to make room as x expands.  I hadn't done that yet but I figured 
I'd post the code anyway.

I've been considering porting all of my example code to Linux using NASM 
or GAS as the assembler, but I haven't yet gotten 'round to it.  I'd be 
more enthusiastic about it if NASM had both a better license and better 
support for structures, but neither will ever happen and I don't care 
enough about it to finish my own assembler at this point.

I see. That would make sense. I wonder if you could, after
completing the "number" part, stick "middle" in there, then
the counter, then copy "(iccanobiF" to it, and print the
whole thing in *one* sys_call?


Sorry to hear that (although I probably wouldn't have liked
it anyway :) I assume what you'd like for structure support
would involve "type checking"... or at least "type
remembering"? Have you looked at Fasm at all? I don't know
how their structure support is, but the license might be
more to your liking (... or not). At least, Fasm will
complain about:

myvar db 0
mov [myvar], eax

.... which I take it you'd see as a good sign(?).

Yes, it's possible, but I don't think I'd do it that way.  The way to do 
it would be to construct the first part of the next line "Fibonacci(x): 
" and append it to the end of the term so that each call would print out 
the number, the newline and then the first part of the next line.

'Taint done yet.

Why don't you think you'd have liked it?

It's a start, but FASM, NASM, and most of the other open source 
assemblers have rather rotten support for structures.  I think it's 
because the people who write assemblers generally don't have much 
experience using them.

First, a structure should have a real type associated with it, so that I 
can't just reference any old chunk of memory and refer to it as though 
structure member names were just offsets (a la NASM's lame add-on 
macro-based structure support).  Second, it should be simple to 
instatiate structures just as it's easy to instatiate simple variables. 
  Third, sub-byte members should be supported.

As an example of why this would be useful, declare and instantiate a 
copule of segment descriptors (data, code, stack, for example) using 
NASM and compare it to how you'd do it with MASM.  MASM's structure 
support is far, far superior, and I would prefer even more than it provides.