Matlab Numbers
We have seen that computers can store integers easily in binary notation. Since each binary digit is simply 'on' or 'off' electronically or magnetically, the size of the integer a computer can represent is limited only by the number of binary digits the computer reserves for such numbers.
In mathematical computations, it is much more common to require rational or irrational numbers - the number of applied problems that can be solved using only integers is limited. One way to handle such numbers is through a fixed point representation. The idea is that a certain number of binary digits is reserved for the decimal part of a number, and the rest are used for the part of the number to the left of the decimal point. In a very simple example, we might consider a representation of numbers using 16 binary digits. The first eight digits represent the number to the left of a decimal point, while the remaining digits represent the number to the right. Eight binary digits can represent the integers from 0 to 255, so to the right of the decimal point we really only get two decimal digits (00 to 99), and those require only seven binary digits. Thus, we could use the first digit of the second byte to represent the sign of the number: 0 for positive, 1 for negative. In such a system we might get representations such as those below.
00000001 00000001 ⇒ 1.01
10000000 00010000 ⇒ 128.16
11111111 01100011 ⇒ 255.99
11111111 11100011 ⇒ -255.99
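The fixed-point scheme above can be sketched as a short decoder. This is a Python illustration of the hypothetical 16-bit format just described (the function name `decode_fixed` is ours, not part of any standard):

```python
# Decoder for the hypothetical 16-bit fixed-point format described above:
# byte 1 = integer part (0-255), first bit of byte 2 = sign,
# remaining 7 bits of byte 2 = a two-decimal-digit fraction (0-99).
def decode_fixed(bits):
    """bits: 16 characters of '0'/'1' (spaces between bytes allowed)."""
    bits = bits.replace(" ", "")
    integer = int(bits[:8], 2)            # first byte
    sign = -1 if bits[8] == "1" else 1    # first bit of second byte
    fraction = int(bits[9:], 2)           # last 7 bits, read as hundredths
    return sign * (integer + fraction / 100)

print(decode_fixed("00000001 00000001"))  # 1.01
print(decode_fixed("10000000 00010000"))  # 128.16
print(decode_fixed("11111111 11100011"))  # -255.99
```

Running the decoder on the bit patterns above reproduces the values shown.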
We stress that computers do not use this system - it is purely speculative. The reason they do not use it is that it can only represent numbers between roughly -256 and 256, and with only two-decimal-digit accuracy. If a computation were to require very large numbers or very small numbers, this system would fail miserably.
Instead of the fixed point system described above, computers always use a floating point system for representing non-integers. The idea is that instead of using fixed numbers of binary digits to store integer and decimal parts of the number, we will use fixed numbers of binary digits to store a coefficient and an exponent. Again using an oversimplified system for an example, suppose that we have 16 binary digits to store our number. We could reserve the first digit for the sign of the number, ten binary digits for the coefficient, the next digit for the sign of the exponent, and the last four digits for the value of the exponent. In the examples below, the first bit of each pattern is the coefficient's sign bit and the twelfth bit is the exponent's sign bit.
00000000 01000001 ⇒ 2×2^1 = 4
00000000 01010001 ⇒ 2×2^-1 = 1
00010000 01010011 ⇒ (2^7+2^1)×2^-3 = 16.25
01111111 11101111 ⇒ 1023×2^15 = 3.3521664×10^7
00000000 00111111 ⇒ 1×2^-15 ≈ 3.0518×10^-5
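As with the fixed-point scheme, this toy floating-point format can be decoded mechanically. Here is a Python sketch (again, `decode_float` is our own illustrative name) that reads the sign bit, ten coefficient bits, exponent sign bit, and four exponent bits:

```python
# Decoder for the hypothetical 16-bit floating-point format described above:
# bit 1 = coefficient sign, bits 2-11 = coefficient (0-1023),
# bit 12 = exponent sign, bits 13-16 = exponent (0-15).
def decode_float(bits):
    """bits: 16 characters of '0'/'1' (spaces between bytes allowed)."""
    bits = bits.replace(" ", "")
    csign = -1 if bits[0] == "1" else 1
    coeff = int(bits[1:11], 2)
    esign = -1 if bits[11] == "1" else 1
    exp = int(bits[12:], 2)
    return csign * coeff * 2.0 ** (esign * exp)

print(decode_float("00000000 01000001"))  # 2*2**1 = 4.0
print(decode_float("01111111 11101111"))  # largest: 1023*2**15 = 33521664.0
print(decode_float("00000000 00111111"))  # smallest positive: 2**-15
```

Note that the decoder confirms the extremes of the system: the largest value is 1023×2^15 and the smallest positive value is 2^-15.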
The last two numbers in the example represent the largest and smallest positive numbers that can be represented in this system. Indeed, there are several things worth noting about floating point number systems.
- The set of floating point numbers is finite, and in particular, there is a largest and a smallest floating point number in any such system.
- There are very many floating point numbers near zero, while they become increasingly sparse far from zero. The important thing is the number of significant digits.
- When we multiply two floating point numbers, we get a number with more nonzero digits - a number which is probably not in the set of floating point numbers. We have to 'round' it off to get a number with a floating point representation. Thus floating point arithmetic is not exact.
- The exponent could be thought of as giving the position of the decimal point for the number. Multiplying a number by two corresponds to shifting the coefficient left one digit, or to increasing the exponent by one.
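The rounding point in the list above is easy to see in practice. The sketch below uses Python, whose doubles behave the same way as Matlab's; the sum of two exactly representable numbers generally needs rounding, while scaling by a power of two only adjusts the exponent and is therefore exact:

```python
# Floating-point arithmetic is rounded, not exact: the true sum or product
# of two doubles usually needs more bits than a double can hold.
a = 0.1
b = 0.2
print(a + b)          # 0.30000000000000004, not 0.3
print(a + b == 0.3)   # False: the result was rounded

# Multiplying by two only increments the exponent, so it IS exact:
x = 0.1
print(x * 2 * 0.5 == x)  # True: scaling by powers of two loses nothing
```

This is why equality tests on computed floating point values are generally a bad idea; one compares against a small tolerance instead.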
Note that in the example above there is a bias toward larger numbers - i.e. the largest number available is over 10^7, but the smallest positive number is only about 3×10^-5, not 10^-7. The accepted standard for floating point number systems, IEEE 754, deals with this issue in reasonably clever ways. Most computing systems use the IEEE 754 standards, which use four bytes (32 bits) for single precision floating point numbers. The standard also provides for 64 bit double precision floating point numbers.
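One can inspect the IEEE 754 layout directly. The following Python sketch (using the standard `struct` module) prints the sign, exponent, and fraction fields of a 32-bit single-precision number; the helper name `float32_bits` is ours:

```python
import struct

# Show the IEEE 754 single-precision (32-bit) bit pattern of a number:
# 1 sign bit, 8 exponent bits (biased by 127), 23 fraction bits.
def float32_bits(x):
    (packed,) = struct.unpack(">I", struct.pack(">f", x))
    return format(packed, "032b")

bits = float32_bits(-6.25)
print(bits[0], bits[1:9], bits[9:])  # sign, exponent field, fraction field
# -6.25 = -1.1001b × 2^2, so the exponent field is 2 + 127 = 10000001b
```

The same idea applies to 64-bit doubles, with 11 exponent bits and 52 fraction bits.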
Matlab does all computations using floating point numbers. It uses double precision floating point numbers, meaning that it uses the 64 bit standard discussed above. In practice this means that you can assume that Matlab carries around something like 16 decimal digits of significance, the smallest positive number available is around 2.2×10^-308, and the largest number in Matlab is around 1.8×10^308.
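Because Matlab's doubles are the same IEEE 754 64-bit format that Python uses, Python's built-in float limits illustrate the numbers just quoted (in Matlab itself the corresponding quantities are `realmax`, `realmin`, and `eps`):

```python
import sys

# The limits of IEEE 754 double precision, as reported by Python.
info = sys.float_info
print(info.max)      # about 1.8e308: the largest double
print(info.min)      # about 2.2e-308: smallest positive normalized double
print(info.epsilon)  # about 2.2e-16: gap between 1 and the next double,
                     # i.e. roughly 16 significant decimal digits
print(info.dig)      # 15: decimal digits guaranteed to survive a round trip
```

These are the same limits you would see from `realmax`, `realmin`, and `eps` at the Matlab prompt.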
The fact that Matlab can form numbers as small as 10^-300 does not mean that it should. Think about it: if we are doing computations with numbers of magnitude around 100, and can carry only 16 digits of precision, then in floating point arithmetic adding a number of magnitude 10^-300 to 100 changes nothing at all - the small term lies hundreds of orders of magnitude below the sixteenth significant digit, and the sum rounds back to exactly 100.
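This absorption of tiny terms is easy to demonstrate; again the Python behavior below is identical to Matlab's, since both use the same doubles:

```python
# With about 16 significant digits, a term hundreds of orders of magnitude
# smaller than 100 is completely absorbed by rounding:
print(100.0 + 1e-300 == 100.0)  # True: the tiny term vanishes entirely
print(100.0 + 1e-15 == 100.0)   # True: even this is below the last digit
print(100.0 + 1e-13)            # slightly more than 100: barely visible
```

Near 100, consecutive doubles are about 1.4×10^-14 apart, so any added term smaller than half that spacing simply disappears.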