0 Computer Arithmetic
Integers rarely add complexity to a computation; we mainly use them to index arrays. As the size of the data grows, we need larger indices to keep track of it. For example, a 32-bit integer can address $2^{32}$ bytes (4 GiB) of memory, so modern operating systems use 64-bit integers to index larger memories.
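A quick check of that arithmetic: an $n$-bit unsigned index can take $2^n$ distinct values, so, assuming byte-addressable memory, a 32-bit index reaches 4 GiB and a 64-bit index reaches 16 EiB. `addressable_bytes` is a hypothetical helper used only for illustration.

```python
def addressable_bytes(bits: int) -> int:
    """Distinct byte addresses an unsigned index of `bits` bits can name
    (assumes one address per byte)."""
    return 2 ** bits

GIB = 2 ** 30  # bytes in one gibibyte
EIB = 2 ** 60  # bytes in one exbibyte

print(addressable_bytes(32) // GIB, "GiB")  # 4 GiB
print(addressable_bytes(64) // EIB, "EiB")  # 16 EiB
```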
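For an $n$-bit unsigned integer, the indexing range is from $0$ to $2^n - 1$. To store negative numbers as well, we need some extra information. One way is to spend the first bit as a sign bit (sign-magnitude). This is easy to understand but has a few flaws: there are two zeros ($+0$ and $-0$), adding the bitstrings with an ordinary unsigned adder gives wrong results for negative operands, and comparing the bitstrings as unsigned numbers does not give the correct "greater than" ordering. Another approach is to pick a base (bias) number $B$, for example $B = 2^{n-1}$, so that a bitstring with unsigned value $b$ represents $b - B$. This gives a single representation of 0 and keeps the bitstrings in the same order as the numbers they represent. However, the integer 0 is not represented by the all-zero bitstring. We therefore use a system called 2's complement, which rotates the number line: bitstrings from $0$ to $2^{n-1} - 1$ represent themselves, and bitstrings $b$ from $2^{n-1}$ to $2^n - 1$ represent $b - 2^n$. This gives the range $-2^{n-1}$ to $2^{n-1} - 1$, with 0 stored as all zeros.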
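The three signed encodings can be compared side by side with a small decoder. This is a sketch for illustration: `sign_magnitude`, `biased`, and `twos_complement` are hypothetical helper names, and the bias $B = 2^{n-1}$ is an assumed (common) choice.

```python
def sign_magnitude(b: int, n: int) -> int:
    """Top bit is the sign, the remaining n-1 bits are the magnitude."""
    sign = b >> (n - 1)
    magnitude = b & ((1 << (n - 1)) - 1)
    return -magnitude if sign else magnitude

def biased(b: int, n: int) -> int:
    """Bitstring b represents b - B, with the assumed bias B = 2**(n-1)."""
    return b - (1 << (n - 1))

def twos_complement(b: int, n: int) -> int:
    """Bitstrings with the top bit clear represent themselves; those with it set
    are shifted down by 2**n, rotating the upper half below zero."""
    return b - (1 << n) if b >> (n - 1) else b

N = 4
print(f"{'bits':>4} {'s-m':>4} {'bias':>4} {'2sc':>4}")
for b in range(1 << N):
    print(f"{b:0{N}b} {sign_magnitude(b, N):>4} {biased(b, N):>4} {twos_complement(b, N):>4}")

# 2's complement addition is plain unsigned addition modulo 2**n:
a, c = 0b1101, 0b0011
print(twos_complement((a + c) % (1 << N), N))  # 0, since -3 + 3 = 0
# The same adder on sign-magnitude bitstrings goes wrong:
print(sign_magnitude((a + c) % (1 << N), N))   # 0, although -5 + 3 = -2
```

With $n = 4$, sign-magnitude maps both `0000` and `1000` to zero, the biased scheme maps `1000` (not `0000`) to zero, and 2's complement has a single zero at `0000` and covers $-8$ to $7$.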
Real numbers can only be represented approximately, because there is a finite number of bits. Therefore, we must truncate the number somewhere. This cutoff (rounding error) is a characteristic feature of the floating-point representation.
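A minimal sketch of that cutoff in practice, assuming CPython, whose `float` is an IEEE 754 double: `decimal.Decimal` and `fractions.Fraction` display the exact value that is actually stored, and a `struct` round trip rounds a double to single precision. `to_float32` is a hypothetical helper name.

```python
from decimal import Decimal
from fractions import Fraction
import struct

def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

x = 0.1
print(Decimal(x))              # exact value of the double nearest to 0.1
print(Decimal(to_float32(x)))  # exact value of the float32 nearest to 0.1
print(Fraction(x) == Fraction(1, 10))  # False: 1/10 has no finite binary expansion
print(0.1 + 0.2 == 0.3)        # False: each operand carries its own rounding error
```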
Here is a 32-bit single-precision example from Wikipedia (1 sign bit, 8 exponent bits, 23 fraction bits):
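The same bit fields can be pulled out of a real value. This sketch assumes CPython and uses `struct` to reinterpret the single-precision bytes as an unsigned 32-bit integer; `float32_fields` and `from_fields` are hypothetical helper names, and 0.15625 $= 1.25 \times 2^{-3}$ is an arbitrary input. Note that the exponent is stored with a bias of 127, the same base-number idea used for signed integers above.

```python
import struct

def float32_fields(x: float) -> tuple[int, int, int]:
    """Split the IEEE 754 single-precision encoding of x into
    (sign, biased exponent, fraction) as unsigned integers."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    sign = bits >> 31               # 1 bit
    exponent = (bits >> 23) & 0xFF  # 8 bits, stored with bias 127
    fraction = bits & 0x7FFFFF      # 23 bits; normal numbers have an implicit leading 1
    return sign, exponent, fraction

def from_fields(sign: int, exponent: int, fraction: int) -> float:
    """Rebuild a *normal* float32 value: (-1)^s * 1.f * 2^(e - 127)."""
    return (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (exponent - 127)

s, e, f = float32_fields(0.15625)
print(s, f"{e:08b}", f"{f:023b}")  # 0 01111100 01000000000000000000000
print(from_fields(s, e, f))        # 0.15625
```

Subnormals, infinities, and NaNs (exponent field 0 or 255) are not handled by `from_fields`.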
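Absolute Error
$|x - \tilde{x}|$, where $x$ is the exact value and $\tilde{x}$ its approximation. It needs units or context to be meaningful.
Relative Error
$\frac{|x - \tilde{x}|}{|x|}$ for $x \neq 0$. It has no units. Which of the two measures is more useful depends on the application.
Relative Rounding Error
Rounding $x$ to floating point gives $\tilde{x} = x(1 + \epsilon)$, where $\epsilon$ is the relative rounding error. With round-to-nearest, $|\epsilon| \le u$, where the unit roundoff is $u = 2^{-24}$ in single precision.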
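The three quantities can be computed exactly with rational arithmetic. In this sketch (assuming CPython), each double-precision input is treated as the exact value $x$ and its rounding to single precision as $\tilde{x}$, so $|\epsilon| \le 2^{-24}$ should always hold for values in the normal range. `to_float32` and `errors` are hypothetical helper names.

```python
from fractions import Fraction
import struct

U = Fraction(1, 2**24)  # unit roundoff of float32 with round-to-nearest

def to_float32(x: float) -> float:
    """Round a double to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def errors(x: float) -> tuple[Fraction, Fraction, Fraction]:
    """Absolute error, relative error, and signed eps with fl(x) = x(1 + eps),
    treating the double x as exact and its float32 rounding as fl(x)."""
    exact = Fraction(x)               # exact value of the double
    approx = Fraction(to_float32(x))  # exact value of the float32 approximation
    abs_err = abs(approx - exact)
    rel_err = abs_err / abs(exact)    # undefined for x == 0
    eps = (approx - exact) / exact
    return abs_err, rel_err, eps

for x in (0.1, 1 / 3, 123456.789):
    abs_err, rel_err, eps = errors(x)
    print(f"x={x:<12g} abs={float(abs_err):.2e} rel={float(rel_err):.2e} "
          f"|eps| <= u: {abs(eps) <= U}")
```

The absolute error for 123456.789 is several orders of magnitude larger than for 0.1, yet both relative errors stay below $u \approx 6 \times 10^{-8}$, which is why the relative error is the natural measure for floating-point rounding.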