Download text file
$if 0
After much searching and testing these HUFFMAN string (de)compression
functions are about the smallest, fastest I can come up with. I've done
away with the "hashing" functions the routines I've found used and just
created the bit masks mathmatically. They work the same and look the same
so "if it looks like a duck and quacks like a duck then....".
Anyhow, what Huffman compression does is counts all the incidences of byte
values used in a string (file), then stores the used vales in an array and
sorts that array in decending order. (highest count first) Then each
character is assigned a bit mask. The bit masks are then layed into a buffer
and viola! compressed data. Maybe! There is an outside chance that the data
won't comperss because there are too many varied and oft used odd-ball
characters but... this would be rare for text in a single language.
So lets take "PETER PIPER PICKED A PECK OF PICKLED PEPPERS." as a test
string. After the count, sort and bit map create you get a set of arrays
that look like this
"P" CharVal(01) = 80 : Count%(01) = 9 : BitCode??(1) = "00"
"E" CharVal(02) = 69 : Count%(02) = 8 : BitCode??(2) = "010"
" " CharVal(02) = 32 : Count%(02) = 7 : BitCode??(2) = "011"
"C" CharVal(02) = 67 : Count%(02) = 3 : BitCode??(2) = "1000"
"R" CharVal(02) = 82 : Count%(02) = 3 : BitCode??(2) = "1001"
"I" CharVal(02) = 73 : Count%(02) = 3 : BitCode??(2) = "1010"
"K" CharVal(02) = 75 : Count%(02) = 3 : BitCode??(2) = "1011"
"D" CharVal(02) = 68 : Count%(02) = 2 : BitCode??(2) = "11000"
"F" CharVal(02) = 70 : Count%(02) = 1 : BitCode??(2) = "11001"
"A" CharVal(02) = 65 : Count%(02) = 1 : BitCode??(2) = "11010"
"." CharVal(02) = 46 : Count%(02) = 1 : BitCode??(2) = "11011"
"O" CharVal(02) = 79 : Count%(02) = 1 : BitCode??(2) = "111000"
"L" CharVal(02) = 76 : Count%(02) = 1 : BitCode??(2) = "111001"
"S" CharVal(02) = 83 : Count%(02) = 1 : BitCode??(2) = "111010"
"T" CharVal(02) = 84 : Count%(02) = 1 : BitCode??(2) = "111011"
Packed up, "PETER" looks like this "000101110110101001" and takes up
18 bits or 2.25 bytes vs the original 5 bytes.
Unpacking requires that you keep thinking in BITS left to right (that's
backwards to the computer remember). When you read in the first bit and
it is OFF then you're dealing with either the first, second or third
character in the list ("P", "E" or " " ). If the next bit is, again
OFF then the coded letter is "P" and so on for the next two bits if
it is ON.
If that frist bit is ON then you have to keep checking until you get an
OFF bit and adding 4 for each ON bit you encounter. When you hit the OFF
bit the next two bits add to your counter to determine which character
in the list you're to use.
ÚÄÄÄÄÄÄ + 4
³ÚÄÄÄÄÄ + 4
³³ÚÄÄÄÄ + 4
³³³ÚÄÄÄÄÄÄÄÄÄ skip this bit
³³³³ÚÄÄ + 2
³³³³³ÚÄ + 1
"T" = "111011"
total = 15 so CharVal?(15) = 84
Here's the whole word "PETER" layed out with count and logic.
ÚÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ the first bit is OFF so
³ÚÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ the next bit is OFF too so ============ "P"
³³ÚÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ the first bits OFF so
³³³ÚÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ the next bit is ON so at least 2
³³³³ÚÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ the next bit is OFF so it's a 2 ======= "E"
³³³³³ÚÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ the first bit is ON so at least 4
³³³³³³ÚÄÄÄÄÄÄÄÄÄÄÄÄÄÄ the next bit is ON so +4 for 8
³³³³³³³ÚÄÄÄÄÄÄÄÄÄÄÄÄÄ the next bit is ON so +4 for 12
³³³³³³³³ÚÄÄÄÄÄÄÄÄÄÄÄÄ this bit is OFF so end of 4's
³³³³³³³³³ÚÄÄÄÄÄÄÄÄÄÄÄ this bit is ON so +2 for 14
³³³³³³³³³³ÚÄÄÄÄÄÄÄÄÄÄ this bit is ON so +1 for 15 ======= "T"
³³³³³³³³³³³ÚÄÄÄÄÄÄÄÄÄ the first bits OFF so
³³³³³³³³³³³³ÚÄÄÄÄÄÄÄÄ the next bit is ON so at least 2
³³³³³³³³³³³³³ÚÄÄÄÄÄÄÄ the next bit is OFF so it's a 2 ======= "E"
³³³³³³³³³³³³³³ÚÄÄÄÄÄÄ this first bit is ON so at least 4
³³³³³³³³³³³³³³³ÚÄÄÄÄÄ the next bit is OFF so end of 4's
³³³³³³³³³³³³³³³³ÚÄÄÄÄ this bit is OFF so add 0
³³³³³³³³³³³³³³³³³ÚÄÄÄ this bit is ON so +1 for 5 ============ "R"
000101110110101001
Credit is due to that unknown individual who provided the HUFFMAN.TIP
file that came with one of the earlier versions of PowerBASIC. From his
or her notes I was able to decifer the coding system and create these
functions. And so I now pass them onto you for your use.
Don Schullian
d83@DASoftVSS.com
$endif
DECLARE FUNCTION fHuffPack (O AS STRING) AS STRING
DECLARE FUNCTION fHuffUnPack (P AS STRING) AS STRING
FUNCTION PBmain ()
LOCAL Orig AS STRING
LOCAL Packed AS STRING
LOCAL UnPked AS STRING
CLS
Orig = "PETER PIPER PICKED A PECK OF PICKLED PEPPERS."
Packed = fHuffPack( Orig )
PRINT FORMAT$( LEN(Orig ), "Len = ## : >>" ); Orig ; "<<"
PRINT FORMAT$( LEN(Packed), "Len = ## : >>" ); Packed; "<<"
PRINT STRING$(79,45)
UnPked = fHuffUnPack( Packed )
PRINT "Orig: >>"; Orig ; "<<"
PRINT "UnPk: >>"; UnPked; "<<"
WaitKey$
END FUNCTION
'=========================================================================
'== fHuffPack( InString )
'==
'== PARAMS: InString = the data to be packed
'== RETURNS: Packed data string
'== 2 bytes = original length
'== 1 byte = number of unique characters
'== x bytes = unique characters
'== remainder = encoded data
'== NULL if packed version of InString would exceed the
'== string size limitations set with STRING
'=========================================================================
FUNCTION fHuffPack( InString AS STRING ) AS STRING
DIM Count (256) AS LONG ' count of all chars
DIM CharVal(256) AS LONG ' assigned values
DIM BitCode(256) AS LONG ' bit values
LOCAL Packed AS STRING ' working string buffer
LOCAL InLen AS LONG ' length of incoming
LOCAL CharCount AS LONG ' # of used chars
LOCAL I_ptr AS BYTE PTR ' ptr to incoming
LOCAL P_ptr AS BYTE PTR ' ptr to working buffer
LOCAL BitBuf AS LONG ' working buffer of bits
LOCAL BitCount AS LONG ' current bit position
LOCAL X AS LONG ' loop counter
LOCAL B AS LONG ' counter
LOCAL F AS LONG ' counter
LOCAL P AS LONG ' counter
LOCAL S AS LONG ' counter
'
I_ptr = STRPTR( InString ) ' set incoming pointer
InLen = LEN( InString ) ' set incoming length
'
FOR X = 1 TO InLen ' count each char in
INCR Count(@I_ptr) ' the incoming string
INCR I_ptr '
NEXT '
I_ptr = I_ptr - InLen ' reset pointer to start
'
FOR X = 1 TO 256 ' assign and count the #
IF Count(X) > 0 THEN ' of used characters
INCR CharCount '
CharVal(X) = X '
END IF '
NEXT '
' gets the char values in
ARRAY SORT Count(), TAGARRAY CharVal(), DESCEND ' highest first order
ARRAY INSERT CharVal(0) ' (1) is now highest
'
BitCode(2) = 2 ' create a set of bit
BitCode(3) = 3 ' values
F = 8 ' 1 = "00"
FOR X = 4 TO CharCount ' 2 = "010"
BitCode(X) = F + B ' 3 = "011"
IF B < 3 THEN ' 4 = "1000"
INCR B ' 5 = "1001"
ELSE ' 6 = "1010"
B = F ' 7 = "1011"
SHIFT LEFT F, 1 ' 8 = "11000"
F = ( F OR B ) ' 12 = "111000"
B = 0 '
END IF '
NEXT '
'
B = CharCount + 3 ' count number of bytes
FOR X = 1 TO InLen ' required to hold the
ARRAY SCAN CharVal(1), = @I_ptr, TO P ' 3 byte header +
INCR I_ptr ' number of characters +
SELECT CASE P ' the packed data
CASE 1 : BitCount = BitCount + 2 ' (see list above for
CASE 2, 3 : BitCount = BitCount + 3 ' bit counts)
CASE ELSE : BitCount = BitCount + (P \ 4) + 3 '
END SELECT '
IF BitCount => 32000 THEN ' lay off 4000 bytes
B = B + 4000 '
BitCount = BitCount - 32000 '
END IF '
NEXT '
IF BitCount > 0 THEN ' clean-up leftover
BitCount = BitCount + 7 ' bits + whatever is
B = B + ( BitCount \ 8 ) ' needed to get a full
END IF ' bit
I_ptr = I_ptr - InLen ' reset incoming pointer
Packed = STRING$( B, 0 ) ' create working buffer
MID$( Packed, 1, 2 ) = MKI$(InLen) ' put len of original
P_ptr = STRPTR( Packed ) + 2 ' set the working pointer
@P_ptr = CharCount ' byte 3 = # of chars
FOR X = 1 TO CharCount ' bytes 4 -> are the
INCR P_ptr ' char values that
@P_ptr = CharVal(X) ' correspond to the bit
NEXT ' codes
'
BitCount = 0 ' start processing bytes
FOR X = 1 TO InLen '
ARRAY SCAN CharVal(1), = @I_ptr, TO P ' find array pos of next
INCR I_ptr ' char / bump pointer
SELECT CASE P ' set # of bits required
CASE 1 : S = 2 ' to hold the bit-code
CASE 2, 3 : S = 3 '
CASE ELSE : S = (P \ 4) + 3 '
END SELECT '
SHIFT LEFT BitBuf, S ' shift buffer # new bits
BitBuf = ( BitBuf OR BitCode(P) ) ' OR new bits into buffer
BitCount = BitCount + S ' incr bit counter
IF BitCount => 16 THEN GOSUB fHuffPack1 ' if the buffer is full
NEXT '
IF BitCount > 0 THEN ' clean-up leftover bits
IF BitCount > 8 THEN GOSUB fHuffPack1 ' more than 1 byte
IF BitCount > 0 THEN ' still not done
SHIFT LEFT BitBuf, 8 - BitCount ' move bits to left
INCR P_ptr ' next/last pos in buf
@P_ptr = ( BitBuf AND 255 ) ' put it there, dude!
END IF '
END IF '
'
FUNCTION = Packed ' RETURN packed data
EXIT FUNCTION '
'======================================================'========================
fHuffPack1: ' remove full bytes from
DO ' the bit buffer and put
BitCount = BitCount - 8 ' them in the work buf
ROTATE RIGHT BitBuf, BitCount ' move the left most
INCR P_ptr ' bit to the right
@P_ptr = ( BitBuf AND 255 ) ' stuff it into the
SHIFT RIGHT BitBuf, 8 ' strip right most bits
ROTATE LEFT BitBuf, BitCount + 8 ' move bits back in order
LOOP UNTIL BitCount < 8 '
RETURN '
'
END FUNCTION
'=========================================================================
'== fHuffUnPack( Packed )
'==
'== PARAMS: Packed
'== SEE: fHuffPack for header details
'== RETURNS: Unpacked data
'== NOTE: There is NO error checking etc. as it is assumed that what
'== you send in Packed was created with fHuffPack
'=========================================================================
FUNCTION fHuffUnPack( Packed AS STRING ) AS STRING
DIM CharVal(256) AS LONG ' char values from
LOCAL UnPacked AS STRING ' working buffer
LOCAL InLen AS LONG ' length of incoming
LOCAL OutLen AS LONG ' final length of out
LOCAL I_ptr AS BYTE PTR ' pointer to incoming
LOCAL O_ptr AS BYTE PTR ' pointer to outgoing
LOCAL BitBuf AS LONG ' buffer to work bits
LOCAL FirstBit AS LONG ' current bit position
LOCAL S AS LONG ' counter
LOCAL P AS LONG ' counter
'
OutLen = CVI( Packed ) ' final returned len
UnPacked = STRING$( OutLen + 4,0 ) ' create working buf
O_ptr = STRPTR( UnPacked ) ' set working ptr
I_ptr = STRPTR( Packed ) + 2 ' set incoming ptr
S = @I_ptr ' # of chars used
InLen = LEN( Packed ) - ( S + 2 ) ' # of packed bytes
FirstBit = -1 ' starts down here
'
FOR P = 1 TO S ' load char values
INCR I_ptr ' into array
CharVal(P) = @I_ptr '
NEXT '
'
DO ' start processing
WHILE ( FirstBit < 15 ) AND ( InLen > 0 ) ' if bit buffer empty
DECR InLen ' knock off a byte
INCR I_ptr ' next byte in incoming
FirstBit = FirstBit + 8 ' loading 8 bits
SHIFT LEFT BitBuf, 8 ' make room in bit buf
BitBuf = ( BitBuf OR @I_ptr ) ' stuff new byte
WEND '
IF BIT( BitBuf, FirstBit ) = 0 THEN ' if the bit is 0
DECR FirstBit ' next bit position
IF BIT(BitBuf,FirstBit) = 0 THEN ' if still 0 then
P = 1 ' this is code "00"
ELSE ' the bit is ON
DECR FirstBit ' next bit position
P = 2 + BIT(BitBuf,FirstBit) ' "01" or "11"
END IF '
ELSE ' else the bit is 1
P = 0 ' clear code value
DO ' check next bits
P = P + 4 ' each ON = 4
DECR FirstBit '
LOOP UNTIL BIT(BitBuf,FirstBit) = 0 ' until we hit an OFF
FOR S = 2 TO 1 STEP -1 ' skip next bit then
DECR FirstBit '
IF BIT(BitBuf,FirstBit) THEN P = P + S ' add 2 and/or 1
NEXT '
END IF '
DECR FirstBit ' next bit position
@O_ptr = CharVal(P) ' set the decoded char
INCR O_ptr ' move working buf ptr
LOOP UNTIL ( InLen = 0 ) AND ( FirstBit < 0 ) ' got to get them all
'
FUNCTION = LEFT$( UnPacked, OutLen ) ' RETURN truncated
'
END FUNCTION