Performing Fixed-Point Arithmetic
Fixed-Point Arithmetic
Addition and subtraction
Whenever you add two fixed-point numbers, you may need a carry bit to correctly represent the result. For this reason, when you add two B-bit numbers with the same scaling, the result has one more bit than the operands.
a = fi(0.234375,0,4,6); c = a+a
c =
0.4688
DataTypeMode: Fixed-point: binary point scaling
Signedness: Unsigned
WordLength: 5
FractionLength: 6
a.bin
ans = 1111
c.bin
ans = 11110
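The carry bit can be seen with a minimal plain-Python model. This is a sketch, not the Fixed-Point Designer API: it stores a fixed-point value as an integer together with a word length and fraction length (real value = stored * 2^-fraction_length), and the helper add_same_scaling is hypothetical. It assumes both operands share the same signedness and scaling, as in the example above.

```python
def add_same_scaling(stored_a, stored_b, word_length, fraction_length):
    """Full-precision sum of two values with identical scaling (hypothetical helper).

    Each value is real_value = stored * 2**-fraction_length. The sum of two
    B-bit stored integers can need B + 1 bits, so the result gets one extra bit.
    """
    stored_sum = stored_a + stored_b
    return stored_sum, word_length + 1, fraction_length

# a = 0.234375 as an unsigned 4-bit value with 6 fraction bits: 0.234375 * 2**6 = 15
a_stored = round(0.234375 * 2**6)
c_stored, c_wl, c_fl = add_same_scaling(a_stored, a_stored, 4, 6)

print(format(a_stored, "04b"))          # 1111
print(format(c_stored, f"0{c_wl}b"))    # 11110
print(c_stored / 2**c_fl, c_wl, c_fl)   # 0.46875 5 6
```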
If you add or subtract two numbers with different precision, the radix points must first be aligned before the operation can be performed. As a result, the word length of the result can exceed the word lengths of the operands by more than one bit.
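Radix-point alignment can be sketched the same way in plain Python. In this model (the helpers quantize and add_aligned are hypothetical, and Nearest rounding with ties toward positive infinity is assumed when the doubles are quantized), each stored integer is shifted left until both operands have the larger fraction length, and the integer part grows by one carry bit. Both operands are assumed to have the same signedness.

```python
import math

def nearest(x):
    """Round to nearest, ties toward +inf (models the 'Nearest' method)."""
    return math.floor(x + 0.5)

def quantize(value, fraction_length):
    """Stored integer for value at the given fraction length."""
    return nearest(value * 2**fraction_length)

def add_aligned(a_stored, a_wl, a_fl, b_stored, b_wl, b_fl):
    """Full-precision sum of two same-signedness operands with different scaling."""
    fl = max(a_fl, b_fl)
    # Shift each stored integer left so both radix points line up at fl.
    a_aligned = a_stored << (fl - a_fl)
    b_aligned = b_stored << (fl - b_fl)
    # Bits left of the radix point (including any sign bit), plus one carry bit.
    int_bits = max(a_wl - a_fl, b_wl - b_fl) + 1
    return a_aligned + b_aligned, int_bits + fl, fl

a = quantize(math.pi, 13)   # like fi(pi,1,16,13)
b = quantize(0.1, 14)       # like fi(0.1,1,12,14)
c, c_wl, c_fl = add_aligned(a, 16, 13, b, 12, 14)
print(c_wl, c_fl, round(c / 2**c_fl, 4))   # 18 14 3.2416
```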
a = fi(pi,1,16,13); b = fi(0.1,1,12,14); c = a + b
c =
3.2416
DataTypeMode: Fixed-point: binary point scaling
Signedness: Signed
WordLength: 18
FractionLength: 14
Multiplication
In general, a full-precision product requires a word length equal to the sum of the word lengths of the operands. In the following fi example, note that the word length of the product c is equal to the word length of a plus the word length of b. The fraction length of c is also equal to the fraction length of a plus the fraction length of b.
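A plain-Python sketch of the same bookkeeping: the stored integers are multiplied exactly, and the word and fraction lengths add. The fraction lengths 17 and 13 are taken from the display of a and b in the fi example that follows; multiply_full_precision is a hypothetical helper, not a Fixed-Point Designer function.

```python
import math

def nearest(x):
    """Round to nearest, ties toward +inf (models the 'Nearest' method)."""
    return math.floor(x + 0.5)

def multiply_full_precision(a_stored, a_wl, a_fl, b_stored, b_wl, b_fl):
    """Exact integer product of the stored values; word and fraction lengths add."""
    return a_stored * b_stored, a_wl + b_wl, a_fl + b_fl

a = nearest(math.pi * 2**17)    # like fi(pi,1,20): fraction length 17
b = nearest(math.e * 2**13)     # like fi(exp(1),1,16): fraction length 13
c, c_wl, c_fl = multiply_full_precision(a, 20, 17, b, 16, 13)
print(c_wl, c_fl, round(c / 2**c_fl, 4))   # 36 30 8.5397
```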
a = fi(pi,1,20), b = fi(exp(1),1,16)
a =
3.1416
DataTypeMode: Fixed-point: binary point scaling
Signedness: Signed
WordLength: 20
FractionLength: 17
b =
2.7183
DataTypeMode: Fixed-point: binary point scaling
Signedness: Signed
WordLength: 16
FractionLength: 13
c = a*b
c =
8.5397
DataTypeMode: Fixed-point: binary point scaling
Signedness: Signed
WordLength: 36
FractionLength: 30
Math with other built-in data types
Note that in C, the result of an operation between an integer data type and a
double data type promotes to a double. However, in MATLAB®, the result of an operation between a built-in integer data type
and a double data type is an integer. In this respect, the fi
object behaves like the built-in integer data types in MATLAB.
When you add a fi and a double, the double is cast to a fi with the same numerictype as the fi input, and the result of the operation is a fi. When you multiply a fi and a double, the double is cast to a fi with the same word length and signedness as the fi and a best-precision fraction length. The result of the operation is a fi.
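Both casting rules can be modeled in plain Python. This sketch assumes that "best precision" means the largest fraction length whose range still holds the value after Nearest rounding, which reproduces the fraction lengths 13 and 15 seen for fi(pi) and for 0.5 in the example that follows; best_precision_fl is a hypothetical helper, and zero is not handled.

```python
import math

def nearest(x):
    """Round to nearest, ties toward +inf (models the 'Nearest' method)."""
    return math.floor(x + 0.5)

def best_precision_fl(value, wl, signed=True):
    """Largest fraction length whose range holds a nonzero value after rounding (assumed rule)."""
    hi = (1 << (wl - 1)) - 1 if signed else (1 << wl) - 1
    lo = -(1 << (wl - 1)) if signed else 0
    fl = wl - math.floor(math.log2(abs(value)))  # starting guess is always too large to fit
    while not lo <= nearest(value * 2.0**fl) <= hi:
        fl -= 1
    return fl

# a = fi(pi): signed, 16-bit word, best-precision fraction length.
a_wl = 16
a_fl = best_precision_fl(math.pi, a_wl)          # 13
a = nearest(math.pi * 2**a_fl)

# Addition: 0.5 is cast to a's numerictype (s16,13), then a full-precision sum (s17,13).
half_as_a = nearest(0.5 * 2**a_fl)
sum_stored, sum_wl, sum_fl = a + half_as_a, a_wl + 1, a_fl

# Multiplication: 0.5 keeps a's word length and signedness, with best-precision fl.
half_fl = best_precision_fl(0.5, a_wl)           # 15
half = nearest(0.5 * 2**half_fl)
prod_stored, prod_wl, prod_fl = a * half, a_wl + a_wl, a_fl + half_fl
print(prod_wl, prod_fl, round(prod_stored / 2**prod_fl, 4))   # 32 28 1.5708
```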
a = fi(pi)
a =
3.1416
DataTypeMode: Fixed-point: binary point scaling
Signedness: Signed
WordLength: 16
FractionLength: 13
b = 0.5 * a
b =
1.5708
DataTypeMode: Fixed-point: binary point scaling
Signedness: Signed
WordLength: 32
FractionLength: 28
When doing arithmetic between a fi and one of the built-in integer data types, [u]int[8, 16, 32], the word length and signedness of the integer are preserved. The result of the operation is a fi.
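A plain-Python sketch of the integer rule: each built-in integer type behaves like a fi with its own word length and signedness and fraction length 0, so a full-precision product adds the word lengths. The dictionary and helper below are hypothetical, and the signedness of the result is not modeled.

```python
# Built-in integer types as (signed, word length, fraction length).
INTEGER_TYPES = {
    "int8": (True, 8, 0), "uint8": (False, 8, 0),
    "int16": (True, 16, 0), "uint16": (False, 16, 0),
    "int32": (True, 32, 0), "uint32": (False, 32, 0),
}

def multiply_with_integer(fi_wl, fi_fl, int_type):
    """Word and fraction length of a full-precision fi * integer product."""
    _, int_wl, int_fl = INTEGER_TYPES[int_type]
    return fi_wl + int_wl, fi_fl + int_fl

print(multiply_with_integer(16, 13, "int8"))   # (24, 13), like int8(2) * fi(pi)
```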
a = fi(pi); b = int8(2) * a
b =
6.2832
DataTypeMode: Fixed-point: binary point scaling
Signedness: Signed
WordLength: 24
FractionLength: 13
When doing arithmetic between a fi and a logical data type, the logical is treated as an unsigned fi object with a value of 0 or 1 and a word length of 1. The result of the operation is a fi object.
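In the same plain-Python model, a logical contributes a 1-bit unsigned factor with fraction length 0, so a full-precision product grows by exactly one bit. The stored integer 25736 is pi quantized to the s16,13 type of fi(pi) with Nearest rounding; multiply_with_logical is a hypothetical helper.

```python
def multiply_with_logical(fi_stored, fi_wl, fi_fl, flag):
    """Full-precision product of a fi (as a stored integer) and a logical."""
    # The logical becomes an unsigned 1-bit value (0 or 1) with fraction length 0.
    logical_stored, logical_wl, logical_fl = int(bool(flag)), 1, 0
    return fi_stored * logical_stored, fi_wl + logical_wl, fi_fl + logical_fl

a = 25736  # stored integer of fi(pi) in s16,13
c, c_wl, c_fl = multiply_with_logical(a, 16, 13, True)
print(c / 2**c_fl, c_wl, c_fl)   # 3.1416015625 17 13
```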
a = fi(pi); b = logical(1); c = a*b
c =
3.1416
DataTypeMode: Fixed-point: binary point scaling
Signedness: Signed
WordLength: 17
FractionLength: 13
The fimath Object
fimath properties define the rules for performing arithmetic
operations on fi objects, including math, rounding, and overflow
properties. A fi object can have a local
fimath object, or it can use the default
fimath properties. You can attach a fimath
object to a fi object by using setfimath.
Alternatively, you can specify fimath properties in the
fi constructor at creation. When a fi
object has a local fimath, rather than using the default
properties, the display of the fi object shows the
fimath properties. In this example, a has
the ProductMode property specified in the
constructor.
a = fi(5,1,16,4,'ProductMode','KeepMSB')
a =
5
DataTypeMode: Fixed-point: binary point scaling
Signedness: Signed
WordLength: 16
FractionLength: 4
RoundingMethod: Nearest
OverflowAction: Saturate
ProductMode: KeepMSB
ProductWordLength: 32
SumMode: FullPrecision
The ProductMode property of a is set to KeepMSB, while the remaining fimath properties use their default values.
Note
For more information on the fimath object, its properties,
and their default values, see fimath Object Properties.
Bit Growth
The following table shows the bit growth of fi objects,
A and B, when their
SumMode and ProductMode properties use the
default fimath value, FullPrecision.
| | A | B | Sum = A+B | Prod = A*B |
|---|---|---|---|---|
| Format | fi(vA,s1,w1,f1) | fi(vB,s2,w2,f2) | | |
| Sign | s1 | s2 | S_sum = (s1 \|\| s2) | S_product = (s1 \|\| s2) |
| Integer bits | I1 = w1 - f1 - s1 | I2 = w2 - f2 - s2 | I_sum = max(w1 - f1, w2 - f2) + 1 - S_sum | I_product = (w1 + w2) - (f1 + f2) |
| Fraction bits | f1 | f2 | F_sum = max(f1, f2) | F_product = f1 + f2 |
| Total bits | w1 | w2 | S_sum + I_sum + F_sum | w1 + w2 |
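The formulas in the table translate directly into two small functions. This plain-Python sketch computes only the resulting types, not values; sum_type and product_type are hypothetical names, and signedness is passed as a boolean.

```python
def sum_type(s1, w1, f1, s2, w2, f2):
    """(signed, word length, fraction length) of A+B under FullPrecision SumMode."""
    s = int(s1 or s2)                       # S_sum
    f = max(f1, f2)                         # F_sum
    i = max(w1 - f1, w2 - f2) + 1 - s       # I_sum
    return bool(s), s + i + f, f

def product_type(s1, w1, f1, s2, w2, f2):
    """(signed, word length, fraction length) of A*B under FullPrecision ProductMode."""
    return bool(s1 or s2), w1 + w2, f1 + f2

print(sum_type(True, 16, 13, True, 12, 14))      # (True, 18, 14)
print(product_type(True, 20, 17, True, 16, 13))  # (True, 36, 30)
```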
This example shows how bit growth can occur in a
for-loop.
T.acc = fi([],1,32,0);
T.x = fi([],1,16,0);
x = cast(1:3,'like',T.x);
acc = zeros(1,1,'like',T.acc);
for n = 1:length(x)
    acc = acc + x(n)
end
acc =
1
s33,0
acc =
3
s34,0
acc =
6
s35,0
The word length of acc increases with each iteration of the loop. This increase causes two problems. First, code generation does not allow data types to change in a loop. Second, if the loop is long enough, you run out of memory in MATLAB. See Controlling Bit Growth for some strategies to avoid this problem.
Controlling Bit Growth
Using fimath
By specifying the fimath properties of a
fi object, you can control the bit growth as operations
are performed on the object.
F = fimath('SumMode', 'SpecifyPrecision', 'SumWordLength', 8,...
    'SumFractionLength', 0);
a = fi(8,1,8,0, F);
b = fi(3, 1, 8, 0);
c = a+b
c =
11
DataTypeMode: Fixed-point: binary point scaling
Signedness: Signed
WordLength: 8
FractionLength: 0
RoundingMethod: Nearest
OverflowAction: Saturate
ProductMode: FullPrecision
SumMode: SpecifyPrecision
SumWordLength: 8
SumFractionLength: 0
CastBeforeSum: true
The fi object a has a local fimath object F, which specifies the word length and fraction length of the sum. Under the default fimath settings, the output c would have word length 9 and fraction length 0. However, because a has a local fimath object, the resulting fi object has word length 8 and fraction length 0.
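The effect of setting SumMode to SpecifyPrecision can be sketched in plain Python: the full sum is computed and then forced into the specified word length. This sketch assumes both operands have fraction length 0 (so no rounding is needed) and the OverflowAction shown in the display above, Saturate; the helper names are hypothetical.

```python
def saturate(value, wl):
    """Clamp a stored integer to the signed wl-bit range."""
    lo, hi = -(1 << (wl - 1)), (1 << (wl - 1)) - 1
    return min(max(value, lo), hi)

def add_specify_precision(a, b, sum_wl):
    """a + b for signed integer-valued operands, stored in sum_wl bits (fraction length 0)."""
    full = a + b                  # full-precision sum, which may need one extra bit
    return saturate(full, sum_wl)

print(add_specify_precision(8, 3, 8))      # 11, kept in s8,0
print(add_specify_precision(100, 50, 8))   # 127, saturated
```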
You can also use fimath properties to control bit growth in
a for-loop.
F = fimath('SumMode', 'SpecifyPrecision','SumWordLength',32,...
    'SumFractionLength',0);
T.acc = fi([],1,32,0,F);
T.x = fi([],1,16,0);
x = cast(1:3,'like',T.x);
acc = zeros(1,1,'like',T.acc);
for n = 1:length(x)
    acc = acc + x(n)
end
acc =
1
s32,0
acc =
3
s32,0
acc =
6
s32,0
Unlike when T.acc was using the default fimath properties, the bit growth of acc is now restricted. Thus, the word length of acc stays at 32.
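The two accumulator behaviors can be compared with a plain-Python sketch that tracks only the stored value and the word length. It is a model under stated assumptions, not Fixed-Point Designer itself: FullPrecision growth of a signed integer sum is taken from the bit-growth table, and overflow handling is omitted because the values are small. The accumulate helper is hypothetical.

```python
def accumulate(values, acc_wl=32, x_wl=16, specify_precision=False):
    """Return (value, word length) of acc after each iteration of acc = acc + x(n)."""
    acc, wl, history = 0, acc_wl, []
    for v in values:
        acc += v
        if not specify_precision:
            # FullPrecision: a signed integer sum needs max(wl, x_wl) + 1 bits.
            wl = max(wl, x_wl) + 1
        # With SpecifyPrecision, the sum word length stays fixed at acc_wl.
        history.append((acc, wl))
    return history

print(accumulate([1, 2, 3]))                          # [(1, 33), (3, 34), (6, 35)]
print(accumulate([1, 2, 3], specify_precision=True))  # [(1, 32), (3, 32), (6, 32)]
```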
Subscripted Assignment
Another way to control bit growth is by using subscripted assignment.
a(I) = b assigns the values of b into
the elements of a specified by the subscript vector,
I, while retaining the numerictype of
a.
T.acc = fi([],1,32,0);
T.x = fi([],1,16,0);
x = cast(1:3,'like',T.x);
acc = zeros(1,1,'like',T.acc);
% Assign into acc without changing its type
for n = 1:length(x)
    acc(:) = acc + x(n)
end
acc(:) = acc + x(n) specifies that the values at the subscript vector (:) change. However, the numerictype of the output acc remains the same. Because acc is a scalar, you receive the same output if you use (1) as the subscript vector.
acc = zeros(1,1,'like',T.acc);
for n = 1:numel(x)
    acc(1) = acc + x(n)
end
acc =
1
s32,0
acc =
3
s32,0
acc =
6
s32,0
The numerictype of acc remains the same at each iteration of the for-loop.
Subscripted assignment can also help you control bit growth in a function. In the function cumulative_sum, the numerictype of y does not change, but the values of the elements specified by n do.
function y = cumulative_sum(x)
% CUMULATIVE_SUM Cumulative sum of elements
% of a vector.
%
% For vectors, Y = cumulative_sum(X) is a
% vector containing the cumulative sum of
% the elements of X. The type of Y is the type of X.
    y = zeros(size(x),'like',x);
    y(1) = x(1);
    for n = 2:length(x)
        y(n) = y(n-1) + x(n);
    end
end
y = cumulative_sum(fi([1:10],1,8,0))
y =
1 3 6 10 15 21 28 36 45 55
DataTypeMode: Fixed-point: binary point scaling
Signedness: Signed
WordLength: 8
FractionLength: 0
Note
For more information on subscripted assignment, see the subsasgn function.
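Subscripted assignment can be modeled in plain Python as "compute the wider sum, then cast it back into the destination's type". This sketch of cumulative_sum assumes a signed 8-bit destination with fraction length 0 and the default Saturate overflow action when casting; it mirrors the MATLAB function above but is not a drop-in replacement, and the saturate helper is hypothetical.

```python
def saturate(value, wl):
    """Clamp a stored integer to the signed wl-bit range."""
    lo, hi = -(1 << (wl - 1)), (1 << (wl - 1)) - 1
    return min(max(value, lo), hi)

def cumulative_sum(x, wl=8):
    """Cumulative sum whose elements keep x's type: signed, wl bits, fraction length 0."""
    y = [0] * len(x)              # like zeros(size(x),'like',x)
    y[0] = x[0]
    for n in range(1, len(x)):
        # Like y(n) = y(n-1) + x(n): the sum may need wl + 1 bits, but assigning it
        # into y casts it back to wl bits (saturating on overflow in this sketch).
        y[n] = saturate(y[n - 1] + x[n], wl)
    return y

print(cumulative_sum(list(range(1, 11))))   # [1, 3, 6, 10, 15, 21, 28, 36, 45, 55]
print(cumulative_sum([100, 100]))           # [100, 127]
```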
accumpos and accumneg
Another way to control bit growth is to use the accumpos and accumneg functions to perform addition and subtraction. Similar to subscripted assignment, accumpos and accumneg preserve the data type of one of their input fi objects, while letting you specify the rounding method and overflow action as input arguments.
For more information on how to implement accumpos and accumneg, see Avoid Multiword Operations in Generated Code.
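The behavior described for accumpos and accumneg can be sketched in plain Python on stored integers: b is re-expressed at a's fraction length using the chosen rounding method, the sum or difference is formed, and the chosen overflow action forces it back into a's signed word length. The names accum_pos and accum_neg and the string values for rounding and overflow are hypothetical; the accumpos and accumneg reference pages give the actual arguments and defaults.

```python
def align(b_stored, b_fl, a_fl, rounding):
    """Re-express b's stored integer at fraction length a_fl."""
    if a_fl >= b_fl:
        return b_stored << (a_fl - b_fl)                  # exact: append zero bits
    shift = b_fl - a_fl
    if rounding == "floor":
        return b_stored >> shift                          # >> floors, also for negatives
    if rounding == "nearest":
        return (b_stored + (1 << (shift - 1))) >> shift   # ties toward +inf
    raise ValueError(f"unsupported rounding: {rounding}")

def fit(total, wl, overflow):
    """Force a stored integer into a signed wl-bit word."""
    lo, hi = -(1 << (wl - 1)), (1 << (wl - 1)) - 1
    if overflow == "saturate":
        return min(max(total, lo), hi)
    if overflow == "wrap":
        return (total - lo) % (1 << wl) + lo              # two's complement wrap
    raise ValueError(f"unsupported overflow action: {overflow}")

def accum_pos(a, a_wl, a_fl, b, b_fl, rounding, overflow):
    """Hypothetical model: a + b, result kept in a's signed type (a_wl, a_fl)."""
    return fit(a + align(b, b_fl, a_fl, rounding), a_wl, overflow)

def accum_neg(a, a_wl, a_fl, b, b_fl, rounding, overflow):
    """Hypothetical model: a - b, result kept in a's signed type (a_wl, a_fl)."""
    return fit(a - align(b, b_fl, a_fl, rounding), a_wl, overflow)

print(accum_pos(120, 8, 0, 10, 0, "floor", "wrap"))      # -126
print(accum_pos(120, 8, 0, 10, 0, "floor", "saturate"))  # 127
print(accum_pos(0, 8, 0, 3, 1, "nearest", "wrap"))       # 2 (1.5 rounds up)
```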
Overflows and Rounding
When performing fixed-point arithmetic, consider the possibility and consequences
of overflow. The fimath object specifies the overflow and
rounding modes used when performing arithmetic operations.
Overflows
Overflows can occur when the result of an operation exceeds the maximum or
minimum representable value. The fimath object has an
OverflowAction property which offers two ways of dealing
with overflows: saturation and wrap. If you set
OverflowAction to saturate, overflows
are saturated to the maximum or minimum value in the range. If you set
OverflowAction to wrap, any overflows
wrap using modulo arithmetic, if unsigned, or two’s complement wrap, if
signed.
For more information on how to detect overflow see Underflow and Overflow Logging Using fipref.
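A plain-Python sketch of the two overflow actions, applied to stored integers (the fraction length only scales the value, so it does not change which stored integers overflow). The saturate and wrap helpers are hypothetical names, not Fixed-Point Designer functions.

```python
def saturate(value, wl, signed):
    """Clamp a stored integer to the representable range of a wl-bit word."""
    lo = -(1 << (wl - 1)) if signed else 0
    hi = (1 << (wl - 1)) - 1 if signed else (1 << wl) - 1
    return min(max(value, lo), hi)

def wrap(value, wl, signed):
    """Wrap a stored integer into a wl-bit word."""
    modulus = 1 << wl
    wrapped = value % modulus                 # modulo arithmetic: 0 .. 2**wl - 1
    if signed and wrapped >= modulus // 2:
        wrapped -= modulus                    # reinterpret as two's complement
    return wrapped

print(saturate(130, 8, True), wrap(130, 8, True))    # 127 -126
print(saturate(300, 8, False), wrap(300, 8, False))  # 255 44
```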
Rounding
There are several factors to consider when choosing a rounding method, including cost, bias, and whether or not there is a possibility of overflow. Fixed-Point Designer™ software offers several different rounding functions to meet the requirements of your design.
| Rounding Method | Description | Cost | Bias | Possibility of Overflow |
|---|---|---|---|---|
| ceil | Rounds to the closest representable number in the direction of positive infinity. | Low | Large positive | Yes |
| convergent | Rounds to the closest representable number. In the case of a tie, convergent rounds to the nearest even number. This approach is the least-biased rounding method provided by the toolbox. | High | Unbiased | Yes |
| floor | Rounds to the closest representable number in the direction of negative infinity, equivalent to two’s complement truncation. | Low | Large negative | No |
| nearest | Rounds to the closest representable number. In the case of a tie, nearest rounds to the closest representable number in the direction of positive infinity. This rounding method is the default for fi object creation and fi arithmetic. | Moderate | Small positive | Yes |
| round | Rounds to the closest representable number. In the case of a tie, round rounds positive numbers toward positive infinity and negative numbers toward negative infinity (away from zero). | High | Small negative for negative samples; unbiased for samples evenly distributed about zero; small positive for positive samples | Yes |
| fix | Rounds to the closest representable number in the direction of zero. | Low | Large positive for negative samples; unbiased for samples evenly distributed about zero; large negative for positive samples | No |
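The rounding methods in the table can be compared with a plain-Python sketch that quantizes values to a stored integer. The mapping to Python functions is an assumption for illustration (convergent uses Python's built-in round, which rounds ties to even), and quantize is a hypothetical helper. Ties are where the nearest-style methods differ.

```python
import math

# Each method maps a real-valued scaled input to an integer stored value.
ROUNDING = {
    "ceil": math.ceil,                                               # toward +inf
    "floor": math.floor,                                             # toward -inf
    "fix": math.trunc,                                               # toward zero
    "nearest": lambda x: math.floor(x + 0.5),                        # ties toward +inf
    "round": lambda x: int(math.copysign(math.floor(abs(x) + 0.5), x)),  # ties away from zero
    "convergent": round,                                             # ties to even
}

def quantize(value, fraction_length, method):
    """Stored integer for value at fraction_length, using the named rounding method."""
    return ROUNDING[method](value * 2**fraction_length)

ties = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
for method in ROUNDING:
    print(method, [quantize(v, 0, method) for v in ties])
```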