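I had hoped to find a function along the lines of div_fr1x16 for dividing fract16 values, but unfortunately VDSP does not provide one directly, which is rather awkward. Presumably this is because the CPU itself has no direct fract divide instruction. The VDSP documentation does, however, include an example of how to do the division: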
fract16 saturating_fract_divide(fract16 nom, fract16 denom)
{
    int partialres = (int)nom;
    int divisor = (int)denom;
    fract16 rtn;
    int i;
    int aq; /* initial value irrelevant */

    if (partialres == 0) {
        /* 0/anything gives 0 */
        rtn = 0;
    } else if (partialres >= divisor) {
        /* fract16 values have the range -1.0 <= x < +1.0, */
        /* so our result cannot be as high as 1.0. */
        /* Therefore, for x/y, if x is larger than y, */
        /* saturate the result to positive maximum. */
        rtn = 0x7fff;
    } else {
        /* nom is a 16-bit fractional value, so move */
        /* the 16 bits to the top of partialres. */
        /* (promote fract16 to fract32) */
        partialres <<= 16;
        /* initialize sign bit and AQ, via divs(). */
        partialres = divs(partialres, divisor, &aq);
        /* Update each of the value bits of the partial result */
        /* and reset AQ via divq(). */
        for (i = 0; i < 15; i++) {
            partialres = divq(partialres, divisor, &aq);
        }
        rtn = (fract16) partialres;
    }
    return rtn;
}
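To get a feel for what the divs()/divq() loop is building up, here is a small host-side sketch that produces the 15 quotient bits one at a time with plain shift-and-subtract division. It is only a conceptual model: the name frac_div_bitwise and the int16_t stand-in for fract16 are my own assumptions, it only handles non-negative operands, and I have not checked that it matches the Blackfin primitives bit for bit.

```c
#include <stdint.h>

/* Hypothetical host-side model, not part of VDSP.                  */
/* int16_t stands in for fract16 (1.15 format).                     */
/* Computes floor(num * 2^15 / den) for 0 <= num < den, one bit per */
/* loop iteration, using restoring shift-and-subtract division.     */
int16_t frac_div_bitwise(int16_t num, int16_t den)
{
    int32_t rem = num;   /* running remainder */
    int32_t quot = 0;    /* quotient bits collected so far */
    int i;

    if (num == 0) {
        return 0;        /* 0/anything gives 0 */
    }
    if (num < 0 || den < 0) {
        return 0;        /* negative operands are out of scope for this sketch */
    }
    if (num >= den) {
        return 0x7fff;   /* result would be >= 1.0, so saturate */
    }

    for (i = 0; i < 15; i++) {
        rem <<= 1;       /* bring the next bit position into play */
        quot <<= 1;
        if (rem >= den) {
            rem -= den;  /* the divisor fits: this quotient bit is 1 */
            quot |= 1;
        }
    }
    return (int16_t)quot;
}
```

For example, frac_div_bitwise(0x2000, 0x4000), that is 0.25 / 0.5, yields 0x4000, which is 0.5 in 1.15 format.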
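Without optimization this computation takes 500 cycles; with optimization turned on it takes only 42 cycles! Either way it is still much faster than converting the fract16 values to float and dividing there.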
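Clearly, though, the function in this example does not take the sign bit into account. For an expression such as 0.2 / -0.4, the numerator compares greater than the (negative) denominator, so the function goes straight to the saturating branch and returns 0x7fff, i.e. roughly 1. So we need to modify it a little.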
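The failure is easy to reproduce on a PC with the raw 1.15 values. The sketch below uses two helper names of my own (to_q15 and takes_saturating_branch, neither is a VDSP function) and simply mirrors the `partialres >= divisor` guard from the listing above.

```c
#include <stdint.h>

/* Hypothetical helper: convert a real value in [-1, 1) to its raw  */
/* 1.15 representation, rounding to nearest.                        */
int16_t to_q15(double x)
{
    double scaled = x * 32768.0;
    return (int16_t)(scaled >= 0 ? scaled + 0.5 : scaled - 0.5);
}

/* Hypothetical helper: returns 1 when the unsigned listing would   */
/* take its saturating branch for these operands, 0 otherwise.      */
int takes_saturating_branch(int16_t nom, int16_t denom)
{
    int partialres = (int)nom;
    int divisor = (int)denom;
    return partialres != 0 && partialres >= divisor;
}
```

With 0.2 stored as 6554 and -0.4 stored as -13107, the comparison 6554 >= -13107 is true, so the function saturates instead of dividing.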
fract16 saturating_fract_divide(fract16 nom, fract16 denom)
{
    int partialres;
    int divisor;
    fract16 rtn;
    int i;
    int aq; /* initial value irrelevant */
    /* non-zero when the operands have different signs */
    int sign = (nom ^ denom) & 0x8000;

    /* divide the magnitudes; the sign is restored at the end */
    partialres = abs_fr1x16(nom);
    divisor = abs_fr1x16(denom);
    if (partialres == 0) {
        /* 0/anything gives 0 */
        rtn = 0;
    } else if (partialres >= divisor) {
        /* fract16 values have the range -1.0 <= x < +1.0, */
        /* so our result cannot be as high as 1.0. */
        /* Therefore, for x/y, if |x| is not smaller than |y|, */
        /* saturate the result to positive maximum. */
        /* Note: the sign is not applied on this path. */
        rtn = 0x7fff;
    } else {
        /* nom is a 16-bit fractional value, so move */
        /* the 16 bits to the top of partialres. */
        /* (promote fract16 to fract32) */
        partialres <<= 16;
        /* initialize sign bit and AQ, via divs(). */
        partialres = divs(partialres, divisor, &aq);
        /* Update each of the value bits of the partial result */
        /* and reset AQ via divq(). */
        for (i = 0; i < 15; i++) {
            partialres = divq(partialres, divisor, &aq);
        }
        rtn = (fract16) partialres;
        /* restore the sign of the quotient */
        if (sign)
            rtn = negate_fr1x16(rtn);
    }
    return rtn;
}
Before optimization this version takes 522 cycles; after optimization it takes 50 cycles.
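When checking results from the DSP it helps to have a reference to compare against on a PC. The following is a minimal sketch of such a reference for the signed version, under some assumptions of mine: int16_t stands in for fract16, abs_q15 is a made-up stand-in for abs_fr1x16, the quotient is truncated, and ref_fract_divide is a hypothetical name. I have not verified that the hardware result agrees with it in the last bit.

```c
#include <stdint.h>

/* Hypothetical stand-in for abs_fr1x16: saturating absolute value. */
static int16_t abs_q15(int16_t x)
{
    if (x == INT16_MIN) {
        return INT16_MAX;   /* |-1.0| saturates to just below +1.0 */
    }
    return (int16_t)(x < 0 ? -x : x);
}

/* Hypothetical host-side reference for the signed listing above.   */
/* Uses an ordinary integer divide instead of divs()/divq().        */
int16_t ref_fract_divide(int16_t nom, int16_t denom)
{
    int sign = (nom ^ denom) & 0x8000;   /* operands differ in sign */
    int32_t n = abs_q15(nom);
    int32_t d = abs_q15(denom);
    int16_t rtn;

    if (n == 0) {
        rtn = 0;                          /* 0/anything gives 0 */
    } else if (n >= d) {
        rtn = 0x7fff;                     /* positive maximum, sign ignored, */
                                          /* same as the listing             */
    } else {
        /* n < d, so the truncated 1.15 quotient is below 0x8000 */
        rtn = (int16_t)((n << 15) / d);
        if (sign) {
            rtn = (int16_t)-rtn;
        }
    }
    return rtn;
}
```

For the earlier example, ref_fract_divide(6554, -13107) gives -16385, which is about -0.5. If saturating to -1.0 (0x8000) is wanted when the signs differ, the saturating branch is the only place that would need to change.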