<HTML><HEAD>
<TITLE>DPANS94</TITLE>
<link disabled rel="stylesheet" href="mpexc6.css">
<style>@import url(mpexc6.css);</style>
</head>

<BODY>
<table width=100%>
<tr>
<td align=left>
<a href=dpansd.htm><img src=left.gif
width=26 height=26 align=ALIGN border=0></a>
<a href=dpansf.htm><img src=right.gif
width=26 height=26 align=ALIGN border=0></a>
</td>
<td align=right>
<a href=dpans.htm#toc><img src=up.gif
width=26 height=26 align=ALIGN border=0></a>
<a name=E.>Table of Contents</a>
</td>
</tr>
</table>
<p>
<hr size=4>

<H1>E. ANS Forth portability guide (informative
annex)</H1>


<hr>
<A name=E.1>
<H1>E.1 Introduction</H1>
</a>

The most popular architectures used to implement Forth have had
byte-addressed memory, 16-bit operations, and two's-complement number
representation. The Forth-83 Standard dictates that these particular
features must be present in a Forth-83 Standard system and that Forth-83
programs may exploit these features freely.

<P>

However, there are many beasts in the architectural jungle that are
bit-addressed or cell-addressed, or prefer 32-bit operations, or represent
numbers in one's complement. Since one of Forth's strengths is its
usefulness in <B>strange</B> environments on <B>unusual</B> hardware
with <B>peculiar</B> features, it is important that a Standard Forth run
on these machines too.

<P>

A primary goal of the ANS Forth Standard is to increase the range of
machines that can support a Standard Forth. This is accomplished by
allowing some key Forth terms to be implementation-defined (e.g., how
big is a cell?) and by providing Forth operators (words) that conceal
the implementation. This frees the implementor to produce the Forth
system that most effectively utilizes the native hardware.
The machine-independent operators, together with some programmer
discipline, enable a programmer to write Forth programs that work on a
wide variety of machines.

<P>

The remainder of this Annex provides guidelines for writing portable ANS
Forth programs. The first section describes ways to make a program
hardware independent. It is difficult for someone familiar with only
one machine architecture to imagine the problems caused by transporting
programs between dissimilar machines. Consequently, examples of
specific architectures with their respective problems are given. The
second section describes assumptions about Forth implementations that
many programmers make but that cannot be relied upon in a portable
program.

<P>


<hr>
<A name=E.2>
<H1>E.2 Hardware peculiarities</H1>
</a>


<hr>
<A name=E.2.1>
<H2>E.2.1 Data/memory abstraction</H2>
</a>

Data and memory are the stones and mortar of program construction.
Unfortunately, each computer treats data and memory differently. The
ANS Forth Systems Standard gives definitions of data and memory that
apply to a wide variety of computers. These definitions give us a way
to talk about the common elements of data and memory while ignoring the
details of specific hardware. Similarly, ANS Forth programs that use
data and memory in ways that conform to these definitions can also
ignore hardware details. The following sections discuss the definitions
and describe how to write programs that are independent of the
data/memory peculiarities of different computers.
<P>


<hr>
<A name=E.2.2>
<H2>E.2.2 Definitions</H2>
</a>

Three terms defined by ANS Forth are address unit, cell, and character.
The address space of an ANS Forth system is divided into an array of
address units; an address unit is the smallest collection of bits that
can be addressed.
In other words, an address unit is the number of bits
spanned by the addresses addr and addr+1. The most prevalent machines
use 8-bit address units. Such <B>byte-addressed</B> machines include
the Intel 8086 and Motorola 68000 families. However, other address-unit
sizes exist. There are machines that are bit addressed and machines
that are 4-bit nibble addressed. There are also machines with address
units larger than 8 bits. For example, several Forth-in-hardware
computers are cell addressed.

<P>

The cell is the fundamental data type of a Forth system. A cell can be
a single-cell integer or a memory address. Forth's parameter and return
stacks are stacks of cells. Forth-83 specifies that a cell is 16 bits.
In ANS Forth, the size of a cell is an implementation-defined number of
address units. Thus, an ANS Forth implemented on a 16-bit
microprocessor could use a 16-bit cell and an implementation on a 32-bit
machine could use a 32-bit cell. Likewise, 18-bit machines, 36-bit
machines, etc., could support ANS Forth systems with 18-bit or 36-bit
cells, respectively. In all of these systems,
<a href=dpans6.htm#6.1.1290>DUP</a>
does the same thing: it
duplicates the top of the data stack.
! (<a href=dpans6.htm#6.1.0010>store</a>)
behaves consistently
too: given two cells on the data stack, it stores the second cell in the
memory location designated by the top cell.

<P>

Similarly, the definition of a character has been generalized to be an
implementation-defined number of address units (but at least eight
bits). This removes the need for a Forth implementor to provide 8-bit
characters on processors where they are inappropriate. For example, on an
18-bit machine with a 9-bit address unit, a 9-bit character would be
most convenient.
Since, by definition, you can't address anything
smaller than an address unit, a character must be at least as big as an
address unit. This will result in big characters on machines with large
address units. An example is a 16-bit cell-addressed machine where a
16-bit character makes the most sense.

<P>


<hr>
<A name=E.2.3>
<H2>E.2.3 Addressing memory</H2>
</a>

ANS Forth eliminates many portability problems by using the above
definitions. One of the most common portability problems is addressing
successive cells in memory. Given the memory address of a cell, how do
you find the address of the next cell? In Forth-83 this is easy: 2 + .
This code assumes that memory is addressed in 8-bit units (bytes) and a
cell is 16 bits wide. On a byte-addressed machine with 32-bit cells, the
code to find the next cell would be 4 + . The code would be 1+ on a
cell-addressed processor and 16 + on a bit-addressed processor with
16-bit cells. ANS Forth provides a next-cell operator named
<a href=dpans6.htm#6.1.0880>CELL+</a>
that
can be used in all of these cases. Given an address, CELL+ adjusts the
address by the size of a cell (measured in address units). A related
problem is that of addressing an array of cells in an arbitrary order.
A defining word to create an array of cells using Forth-83 would be:


<PRE>
: ARRAY ( n "name" -- ) CREATE 2* ALLOT
   DOES> ( i -- addr ) SWAP 2* + ;
</PRE>

<P>

Use of
<a href=dpans6.htm#6.1.0320>2*</a>
to scale the array index assumes byte addressing and 16-bit
cells again. As in the example above, different versions of the code
would be needed for different machines. ANS Forth provides a portable
scaling operator named
<a href=dpans6.htm#6.1.0890>CELLS</a>.
Given a number n, CELLS returns the
number of address units needed to hold n cells.
A portable definition
of ARRAY is:

<PRE>
: ARRAY ( n "name" -- ) CREATE CELLS ALLOT
   DOES> ( i -- addr ) SWAP CELLS + ;
</PRE>
<P>

There are also portability problems with addressing arrays of
characters. In Forth-83 (and in the most common ANS Forth
implementations), the size of a character will equal the size of an
address unit. Consequently, addresses of successive characters in memory
can be found using
<a href=dpans6.htm#6.1.0290>1+</a>
and scaling indices into a character array is a
no-op (i.e., 1 *). However, there are cases where a character is larger
than an address unit. Examples include (1) systems with small address
units (e.g., bit- and nibble-addressed systems), and (2) systems with
large character sets (e.g., 16-bit characters on a byte-addressed
machine). The
<a href=dpans6.htm#6.1.0897>CHAR+</a>
and
<a href=dpans6.htm#6.1.0898>CHARS</a>
operators, analogous to CELL+ and CELLS, are
available to allow maximum portability.
<P>

ANS Forth generalizes the definition of some Forth words that operate on
chunks of memory to use address units. One example is
<a href=dpans6.htm#6.1.0710>ALLOT</a>. By
prefixing ALLOT with the appropriate scaling operator (CELLS, CHARS,
etc.), space for any desired data structure can be allocated (see the
definition of ARRAY above). For example:

<PRE>
CREATE ABUFFER 5 CHARS ALLOT ( allot a 5-character buffer )
</PRE>
<P>

The memory-block-move
word, MOVE, also uses address units:


<PRE>
source destination 8 CELLS MOVE ( move 8 cells )
</PRE>
<P>


<hr>
<A name=E.2.4>
<H2>E.2.4 Alignment problems</H2>
</a>

Not all addresses are created equal. Many processors have restrictions
on the addresses that can be used by memory access instructions.
This
Standard does not require an implementor of an ANS Forth to make
alignment transparent; on the contrary, it requires (in
<a href=dpans3.htm#3.3.3.1>Section 3.3.3.1</a>
Address alignment) that an ANS Forth program assume that character and
cell alignment may be required.

<P>

One of the most common problems caused by alignment restrictions arises in
creating tables containing both characters and cells. When
, (<a href=dpans6.htm#6.1.0150>comma</a>) or
<a href=dpans6.htm#6.1.0860>C,</a>
is used to initialize a table, data is stored at the data-space
pointer. Consequently, the data-space pointer must be suitably aligned
for the data being stored. For example, a
non-portable table definition would be:


<PRE>
CREATE ATABLE 1 C, X , 2 C, Y ,
</PRE>

<P>

On a machine that restricts 16-bit fetches to even addresses,
<a href=dpans6.htm#6.1.1000>CREATE</a>
would leave the data space pointer at an even address, the 1 C, would
make the data space pointer odd, and , (comma) would violate the address
restriction by storing X at an odd address. A portable way to create
the table is:


<PRE>
CREATE ATABLE 1 C, ALIGN X , 2 C, ALIGN Y ,
</PRE>

<P>

<a href=dpans6.htm#6.1.0705>ALIGN</a>
adjusts the data space pointer to the first aligned address
greater than or equal to its current address. An aligned address is
suitable for storing or fetching characters, cells, cell pairs, or
double-cell numbers.

<P>

After initializing the table, we would also like to read values from the
table. For example, assume we want to fetch the first cell, X, from the
table. ATABLE CHAR+ gives the address of the first thing after the
character. However, this may not be the address of X, since we aligned
the data-space pointer between the C, and the ,.
The portable way to
get the address of X is:


<PRE>
ATABLE CHAR+ ALIGNED
</PRE>

<P>

<a href=dpans6.htm#6.1.0706>ALIGNED</a>
adjusts the address on top of the stack to the first aligned
address greater than or equal to its current value.

<P>


<hr>
<A name=E.3>
<H1>E.3 Number representation</H1>
</a>

Different computers represent numbers in different ways. An awareness
of these differences can help a programmer avoid writing a program that
depends on a particular representation.

<P>


<hr>
<A name=E.3.1>
<H2>E.3.1 Big endian vs. little endian</H2>
</a>

The constituent parts of a number in memory are kept in different orders
on different machines. Some machines place the most-significant part of
a number at an address in memory with less-significant parts following
it at higher addresses. Other machines do the opposite: the
least-significant part is stored at the lowest address. For example,
the following code for a 16-bit 8086 <B>little endian</B> Forth would
produce the answer 34 (hex):


<PRE>
VARIABLE FOO HEX 1234 FOO ! FOO C@
</PRE>
<P>

The same code on a 16-bit 68000 <B>big endian</B> Forth would produce
the answer 12 (hex). A portable program cannot exploit the
representation of a number in memory.

<P>

A related issue is the representation of cell pairs and double-cell
numbers in memory. When a cell pair is moved from the stack to memory
with
<a href=dpans6.htm#6.1.0310>2!</a>,
the cell that was on top of the stack is placed at the lower
memory address. It is useful and reasonable to manipulate the
individual cells when they are in memory.

<P>


<hr>
<A name=E.3.2>
<H2>E.3.2 ALU organization</H2>
</a>

Different computers use different bit patterns to represent integers.
Possibilities include binary representations (two's complement, one's
complement, sign magnitude, etc.) and decimal representations (BCD,
etc.). Each of these formats creates advantages and disadvantages in
the design of a computer's arithmetic logic unit (ALU). The most
commonly used representation, two's complement, is popular because of
the simplicity of its addition and subtraction algorithms.

<P>

Programmers who have grown up on two's-complement machines tend to
become intimate with their representation of numbers and take some
properties of that representation for granted. For example, a trick to
find the remainder of a number divided by a power of two is to mask off
some bits with
<a href=dpans6.htm#6.1.0720>AND</a>.
A common application of this trick is to test a
number for oddness using 1 AND. However, this will not work on a
one's-complement machine if the number is negative (a portable technique is 2
<a href=dpans6.htm#6.1.1890>MOD</a>).

<P>

The remainder of this section is a (non-exhaustive) list of things to
watch for when portability between machines with binary representations
other than two's complement is desired.

<P>

To convert a single-cell number to a double-cell number, ANS Forth
provides the operator
<a href=dpans6.htm#6.1.2170>S>D</a>.
To convert a double-cell number to
single-cell, Forth programmers have traditionally used
<a href=dpans6.htm#6.1.1260>DROP</a>.
However,
this trick doesn't work on sign-magnitude machines. For portability, a
<a href=dpans8.htm#8.6.1.1140>D>S</a>
operator is available. Converting an unsigned single-cell number to
a double-cell number can be done portably by pushing a zero on the
stack.
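<P>

The idioms above can be collected into a few small definitions. The
following is a sketch, not part of the Standard: ODD? and U>D are
hypothetical names chosen for illustration, and the example assumes a
system that also provides D>S from the Double-Number word set. ODD?
relies on 2 MOD rather than 1 AND, D>S takes the place of DROP, and U>D
does nothing more than push a zero high cell.

<PRE>
\ Sketch only: ODD? and U>D are hypothetical names, not Standard words.
\ D>S comes from the Double-Number word set.
DECIMAL

: ODD? ( n -- flag )   \ oddness test that avoids 1 AND
   2 MOD 0= 0= ;       \ 2 MOD is 0 for even n and nonzero for odd n

: U>D ( u -- ud )      \ unsigned single-cell to double-cell
   0 ;                 \ the high cell of the result is zero

-7 ODD? .              \ prints a true flag
-7 S>D D.              \ sign-extends to a double: prints -7
-7 S>D D>S .           \ back to single with D>S, not DROP: prints -7
5 U>D D.               \ prints 5
</PRE>

<P>

Because MOD is defined arithmetically, 2 MOD yields a nonzero remainder
for every odd number whatever the binary representation, whereas 1 AND
inspects one particular bit.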
<P>


<hr>
<A name=E.4>
<H1>E.4 Forth system implementation</H1>
</a>

During Forth's history, an amazing variety of implementation techniques
have been developed. The ANS Forth Standard encourages this diversity
and consequently restricts the assumptions a user can make about the
underlying implementation of an ANS Forth system. Users of a particular
Forth implementation frequently become accustomed to aspects of the
implementation and assume they are common to all Forths. This section
points out many of these incorrect assumptions.

<P>


<hr>
<A name=E.4.1>
<H2>E.4.1 Definitions</H2>
</a>

Traditionally, Forth definitions have consisted of the name of the Forth
word, a dictionary search link, data describing how to execute the
definition, and parameters describing the definition itself. These
components are called the name, link, code, and parameter fields. No
method for accessing these fields has been found that works across all
of the Forth implementations currently in use. Therefore, ANS Forth
severely restricts how the fields may be used. Specifically, a portable
ANS Forth program may not use the name, link, or code field in any way.
Use of the parameter field (renamed to data field for clarity) is
limited to the operations described below.

<P>

Only words defined with
<a href=dpans6.htm#6.1.1000>CREATE</a>
or with other defining words that call
CREATE have data fields. The other defining words in the Standard
(<a href=dpans6.htm#6.1.2410>VARIABLE</a>,
<a href=dpans6.htm#6.1.0950>CONSTANT</a>,
<a href=dpans6.htm#6.1.0450>:</a>,
etc.) might not be implemented with CREATE.
Consequently, a Standard Program must assume that words defined by
VARIABLE, CONSTANT, : , etc., may have no data fields.
There is no way
for a Standard Program to modify the value of a constant or to change
the meaning of a colon definition. The
<a href=dpans6.htm#6.1.1250>DOES</a>>
part of a defining word
operates on a data field. Since only CREATEd words have data fields,
DOES> can only be paired with CREATE or words that call CREATE.

<P>

In ANS Forth,
<a href=dpans6.htm#6.1.1550>FIND</a>,
<a href=dpans6.htm#6.1.2510>[']</a> and
' (<a href=dpans6.htm#6.1.0070>tick</a>)
return an unspecified entity called
an <B>execution token</B>. There are only a few things that may be done
with an execution token. The token may be passed to
<a href=dpans6.htm#6.1.1370>EXECUTE</a>
to execute
the word ticked or compiled into the current definition with
<a href=dpans6.htm#6.2.0945>COMPILE,</a>.
The token can also be stored in a variable and used later. Finally, if
the word ticked was defined via CREATE,
<a href=dpans6.htm#6.1.0550>>BODY</a>
converts the execution
token into the word's data-field address.

<P>

One thing that definitely cannot be done with an execution token is to use
<a href=dpans6.htm#6.1.0010>!</a>
or
<a href=dpans6.htm#6.1.0150>,</a>
to store it into the object code of a Forth definition. This
technique is sometimes used in implementations where the object code is
a list of addresses (threaded code) and an execution token is also an
address. However, ANS Forth permits native-code implementations where
this will not work.

<P>


<hr>
<A name=E.4.2>
<H2>E.4.2 Stacks</H2>
</a>

In some Forth implementations, it is possible to find the address of a
stack in memory and manipulate the stack as an array of cells. This
technique is not portable, however.
On some systems, especially
Forth-in-hardware systems, the stacks might be in a part of memory that
can't be addressed by the program or might not be in memory at all.
Forth's parameter and return stacks must be treated as stacks.

<P>

A Standard Program may use the return stack directly only for
temporarily storing values. Every value examined on or removed from the
return stack using
<a href=dpans6.htm#6.1.2070>R@</a>,
<a href=dpans6.htm#6.1.2060>R></a>, or
<a href=dpans6.htm#6.2.0410>2R></a>
must have been put on the stack
explicitly using
<a href=dpans6.htm#6.1.0580>>R</a> or
<a href=dpans6.htm#6.2.0340>2>R</a>.
Even this must be done carefully, since the
system may use the return stack to hold return addresses and
loop-control parameters.
<a href=dpans3.htm#3.2.3.3>Section 3.2.3.3</a> Return stack of the Standard
has a list of restrictions.

<P>


<hr>
<A name=E.5>
<H1>E.5 ROMed application disciplines and conventions</H1>
</a>

When a Standard System provides a data space that is uniformly readable
and writeable, we may term this environment <B>RAM-only</B>.

<P>

Programs designed for ROMed application must divide data space into at
least two parts: a writeable and readable uninitialized part, called
<B>RAM</B>, and a read-only initialized part, called <B>ROM</B>. A
third possibility, a writeable and readable initialized part, normally
called <B>initialized RAM</B>, is not addressed by this discipline. A
Standard Program must explicitly initialize the RAM data space as
needed.

<P>

The separation of data space into RAM and ROM is meaningful only during
the generation of the ROMed program. If the ROMed program is itself a
standard development system, it has the same taxonomy as an ordinary
RAM-only system.
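<P>

Because the RAM part of data space is uninitialized, a program destined
for ROM cannot rely on values stored into RAM while it is being compiled;
it has to establish them when it starts running. A minimal sketch using
only Core words follows. The names TALLY-LIMIT, TALLY, INIT-RAM and BUMP
are hypothetical, and how INIT-RAM gets invoked at power-up is
system-specific and simply assumed here.

<PRE>
\ Sketch only: TALLY-LIMIT, TALLY, INIT-RAM and BUMP are hypothetical.
\ How INIT-RAM is run at power-up depends on the target system.
DECIMAL
100 CONSTANT TALLY-LIMIT   \ value fixed at definition time
VARIABLE TALLY             \ RAM data space: contents unknown at start

: INIT-RAM ( -- )          \ set every RAM value explicitly
   0 TALLY ! ;

: BUMP ( -- )              \ count up, wrapping to 0 at TALLY-LIMIT
   TALLY @ 1+  DUP TALLY-LIMIT = IF DROP 0 THEN  TALLY ! ;

INIT-RAM                   \ a ROMed target would run this at start-up
</PRE>

<P>

Writing VARIABLE TALLY 0 TALLY ! at compile time instead would store
into RAM data space while the ROMed program is being generated, which
this discipline treats as an ambiguous condition.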
<P>

The words affected by conversion from a RAM-only to a mixed RAM and ROM
environment are:

<p>
, (<a href=dpans6.htm#6.1.0150>comma</a>)
<a href=dpans6.htm#6.1.0705>ALIGN</a>
<a href=dpans6.htm#6.1.0706>ALIGNED</a>
<a href=dpans6.htm#6.1.0710>ALLOT</a>
<a href=dpans6.htm#6.1.0860>C,</a>
<a href=dpans6.htm#6.1.1000>CREATE</a>
<a href=dpans6.htm#6.1.1650>HERE</a>
<a href=dpans6.htm#6.2.2395>UNUSED</a>
<P>

(<a href=dpans6.htm#6.1.2410>VARIABLE</a> always
accesses the RAM data space.)

<P>

All of these words except , (comma) and C:
are meaningful in both RAM
and ROM data space.

<P>

To select the data space, these words could be preceded by the selectors
RAM and ROM. For example:


<PRE>
ROM CREATE ONES 32 CHARS ALLOT ONES 32 1 FILL RAM
</PRE>

<P>

would create a table of ones in the ROM data space. The storage of data
into RAM data space when generating a program for ROM would be an
ambiguous condition.

<P>

A straightforward implementation of these selectors would maintain
separate address counters for each space. A counter value would be
returned by HERE and altered by , (comma), C,, ALIGN, and ALLOT, with
RAM and ROM simply selecting the appropriate address counter. This
technique could be extended to additional partitions of the data space.

<P>


<hr>
<A name=E.6>
<H1>E.6 Summary</H1>
</a>

The ANS Forth Standard cannot and should not force anyone to write a
portable program. In situations where performance is paramount, the
programmer is encouraged to use every trick in the book. On the other
hand, if portability to a wide variety of systems is needed, ANS Forth
provides the tools to accomplish this. There is probably no such thing
as a completely portable program.
A programmer using this guide
should intelligently weigh the tradeoffs of providing portability to
specific machines. For example, machines that use sign-magnitude
numbers are rare and probably don't deserve much thought. However, systems
with different cell sizes will certainly be encountered and should be
provided for. In general, making a program portable clarifies both the
programmer's thinking process and the final program.

<P>


<hr>
<A href=dpans.htm#toc><IMG src="up.gif" ></A> Table of Contents
<BR>
<A href=dpansf.htm><IMG src="right.gif" ></A>
Next Section
<P>
</BODY>
</HTML>