Archived Blog Posts
Tuesday 24th March 2010
This fragment (from the benchmark "asgni") runs fine:
    ;     9 j=3
        mov [#00864EEC] (j), dword 3      ;#00864B5D: 307005 EC4E8600 03000000   uv 00 00  1   5
    ;    10 end for
        mov esi,[#00864EF4] (i)           ;#00864B67: 213065 F44E8600            vu 40 00  1   5
        mov edx,40000000                  ;#00864B6D: 272 005A6202               uv 04 00  1   6
        add esi,1                         ;#00864B72: 203306 01                  vu 40 40  1   6
        cmp esi,edx                       ;#00864B75: 073362                     uv 00 44  1   7
        mov [#00864EF4] (i),esi           ;#00864B77: 211065 F44E8600            vu 00 40  1   7
        jle #00864B5D                     ;#00864B7D: 176 DE                     v  00 00  1   8
But this is 40-50 times slower (gulp):
    ;     9 j=3
        mov [#00864DEC] (j), dword 3      ;#00864A5D: 307005 EC4D8600 03000000   uv 00 00  1   5
    ;    10 end for
        mov esi,[#00864DF4] (i)           ;#00864A67: 213065 F44D8600            vu 40 00  1   5
        mov edx,40000000                  ;#00864A6D: 272 005A6202               uv 04 00  1   6
        add esi,1                         ;#00864A72: 203306 01                  vu 40 40  1   6
        cmp esi,edx                       ;#00864A75: 073362                     uv 00 44  1   7
        mov [#00864DF4] (i),esi           ;#00864A77: 211065 F44D8600            vu 00 40  1   7
        jle #00864A5D                     ;#00864A7D: 176 DE                     v  00 00  1   8
So, can you spot the difference? All the addresses differ by #100 and, in terms of solving this puzzle, that, my friend, is your final and only clue. The listings above are from "p -d! bench\benchtst" and "p -d! bt" respectively; that’s the exact same file, but (presumably, see also below) the tiny difference in the amount of memory allocated for the command_line() result and the subsequent processing of it was enough to trigger a problem.
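The #100 offset can be checked directly from the two listings. Here is a minimal sketch in Python (the language and the names `fast`, `slow` and `deltas` are chosen purely for illustration; only the addresses come from the listings above):

```python
# Addresses copied from the two listings above (shown there with a # prefix).
# "fast" is the first listing (runs fine), "slow" is the second (40-50 times slower).
fast = {"j": 0x00864EEC, "i": 0x00864EF4, "loop": 0x00864B5D, "jle": 0x00864B7D}
slow = {"j": 0x00864DEC, "i": 0x00864DF4, "loop": 0x00864A5D, "jle": 0x00864A7D}

# Subtract each address in the second listing from its counterpart in the first.
deltas = {name: fast[name] - slow[name] for name in fast}

for name, delta in deltas.items():
    print(f"{name:5s} differs by #{delta:X}")
```

Both code and data move by the same amount, so the distance between them is identical in the two runs; only their position in memory changes.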
Needless to say, I was pretty stumped. I even had things that went wrong when run from bench\bench.exw but worked fine when run from the command line, hence my imagination started running pretty wild.
Suddenly I had the brainwave: the loop writes to variables that sit right next to the machine code, so it is basically self-modifying code. I even wrote a test asm program which exhibited the same problem, and the problem went away when I added a filler/buffer of 1016 bytes or more (but not 1015 bytes or fewer) between variables i/j and the machine code. Hence I saw this PC’s 1024 (=#400) byte code cache in action, as it were.

So, in both cases, the code and the data were separated by #36F or so bytes: not quite enough. But something was still not quite right: surely #800..#BFF are on one page and #C00..#FFF are on the next, whereas in both of the above listings the code and the data appear to be on different sides of that boundary. But of course "p -d! bench\benchtst" and "p -d! bt" were how I got the above listings, whereas "p bench\benchtst" and "p bt" were how things were actually run. Would that count for enough? Well actually, it might very well, since processCommandLine() splits off a "-d!" string, and then there’s one more element on the array, and so on. To agree with the evidence, the above addresses want to drop another #200, and then it all fits.

I added an otherwise null-effect "-e!" command line option, so you can now interpret/debug at the exact same addresses as shown in the list.asm file. Of course it was trivial to modify pemit.e to round up allocations, though I went for 4096 rather than bother with querying the CPU for the actual cache line size.
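The block arithmetic can be sketched as follows. This is an illustration only: Python is used for convenience, the helper names `block_of` and `round_up` are made up here, and the #200 shift for runs without "-d!" is the estimate from the text, not a measured value. The rounding helper shows the general idea of padding an allocation to a 4096-byte multiple; it is not the actual pemit.e change.

```python
BLOCK = 0x400    # 1024 bytes, the size suggested by the filler experiment
SHIFT = 0x200    # estimated drop in addresses when "-d!" is not on the command line

def block_of(addr, size=BLOCK):
    """Return the index of the size-aligned block that contains addr."""
    return addr // size

def round_up(nbytes, multiple=4096):
    """Round an allocation size up to the next multiple (hypothetical helper)."""
    return (nbytes + multiple - 1) // multiple * multiple

# (address of the loop code, address of variable j) from the two "-d!" listings
listings = {
    "bench\\benchtst": (0x00864B5D, 0x00864EEC),
    "bt":              (0x00864A5D, 0x00864DEC),
}

same_block = {}
for name, (code, data) in listings.items():
    # Shift both addresses down to estimate where they sat in the real run.
    same_block[name] = block_of(code - SHIFT) == block_of(data - SHIFT)
    print(name, "code and data share a 1024-byte block:", same_block[name])
```

Under these assumptions only the "bt" run ends up with the loop and its variables inside the same 1024-byte block, which matches the run that was slow. Rounding allocations up to 4096 bytes, combined with page-aligned allocation, would keep code and data in separate blocks regardless of what was allocated earlier.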
They say you learn something new every day. Today felt like two months’ worth.
New Product, New Site, New Blog.
Friday 12th March 2010
Only problem left now is what to say.
After 3 solid man-years’ effort, it still shocks me how much there is to do. More tests, documentation, help files, cleaning up the source code, and that’s before I can get on to the internal/important stuff. It does make me wonder about just how insane this project is. Take the most complicated thing you can find, and make it easy. While this empowers users to fix a showstopping bug and carry on with their work, I worry about how many young naive types are going to jump on, make changes, and end up bitterly disappointed in one way or another.
I suppose what I want to say is:
- Think long and hard before starting anything (when modifying the compiler, that is).
- Do a full mockup of what the changes should look like. For instance, I’m thinking about adding a switch statement. The completely wrong way would be to add a DoSwitch() routine to pmain.e and then write some test cases. Instead I have a little text file where I put any relevant code snippets I find as I work on other things, and only when completely satisfied that it covers all the basics will I start to think about technical details.
- Lastly, be prepared to fail and be wrong. On the home page I said "ruthlessly cherry picks only the very best", and I meant it.