HowToFixComputers.com




Watched TopicsWatched Topics SearchSearch RegisterRegister Log in to check your private messagesLog in to check your private messages ProfileProfile Log inLog in
Skybuck's Universal Code 4 (memory efficient)

 
Post new topic   Reply to topic    Index -> ASUS
Author Message
Skybuck Flying
Guest





PostPosted: Sat May 26, 2007 12:01 pm    Post subject: Skybuck's Universal Code 4 (memory efficient) Reply with quote

Skybuck's Universal Code 4:

The first version of Skybuck's Universal Code describes the basic idea but
it's memory inefficient for large ammounts of data/information.

For each data bit a marker bit was necessary.

Also the interleaving of bits could be cumbersome for software and maybe
even hardware ;)

Thus I present to you a somewhat more efficient universal code, which is
still variable bit-based but more efficient, though I am not sure if I like
the bitness of it... maybe I'll think up a byte version for
intel system which are octal/byte based :)

For now however the basic concept for version 4 is:

Length1 + Length2 + Data

This represent 3 fields stuck together in the information stream.

The first field describes the length of the second field. The second field
describes the length of the data.

The first field is like a marker for the second field, and the second field
is length for the data in binary encoding.

So I could also be described as:

Marker + Length + Data

But this could confuse people with version 1... where the term marker ment a
full marker...

So maybe describe it better as:

LengthMarker + Length + Data

or

LengthOfLength + Length + Data.


The first field consists out of zero's with a 1 terminator to describe the
length of the second field, the second field uses binary encoding, and the
data can be anything.


An example:

Suppose the following data has to be encoded/stored:

1234567890123
Hello World !


13 characters.

Length = 13 + 1 for null terminator = 14.

Bits needed to encoded 14 in binary: 0x1 + 1x2 + 1x4 + 1x8 = 4 bits.

Thus also 4 bits needed for the marker.

The information stream as stored as follows:

length marker:
0001

length:
0111

data:
xxxxxxetc.

00010111xxxxxxetc


the length field describes the data length in bits to make it bit flexible.


So now software can read the information stream as follows:

// read length marker:

LengthMarkerCount := 0;
repeat
if ReadBit( Bit ) then
begin
LengthMarkerCount := LengthMarkerCount + 1;
end else
begin
break; // error.
end;
until Bit = 1;

// allocate length field.

SetLengthInBits( Length, LengthMarkerCount );

// read length:
LengthCount := 0;
repeat
if ReadBit( Bit ) then
begin
SetBit( Length, LengthCount, Bit );
LengthCount := LengthCount + 1;
end else
begin
break; // error, or retry, inform, etc.
end;
until LengthCount = LengthMarkerCount;

// allocate data field in bits.

SetLengthInBits( Data, Length );

// read data
DataCount := 0;
repeat
if ReadBit( Bit ) then
begin
SetBit( Data, DataCount, Bit );
DataCount := DataCount + 1;
end else
begin
break; // error or retry.
end;
until DataCount = Length;


// ofcourse the example would not be complete without a write example how to
encode information so let's do that too:

For example take the string as example.

DataCountInBits := Length( 'Hello World !'+#0 ) * 8;
LengthCountInBits := BitNeededFor( DataCountInBits ); // use log functions
to determine bits needed to encode the value in binary.
LengthMarkerCountInBits := LengthCountInBits; // same

// now first write the marker:

if LengthMarkerCountInBits = 0 then exit; // strange error which should not
happen.

LengthMarkerCount := 0;

repeat
if LengthMarkerCount = LengthMarkerCountInBits then
begin
// write terminating bit.
if WriteBit( 1 ) then
begin
LengthMarkerCount := LengthMarkerCount + 1;
end else
begin
// error, or retry, inform, etc.
end;
end else
begin
// write leading bit(s)
if WriteBit( 0 ) then
begin
LengthMarkerCount := LengthMarkerCount + 1;
end else
begin
// error, or retry, inform, etc.
end;
end;

until LengthMarkerCount = LengthMarkerCountInBits;

// now write the binary length field.

if LengthCountInBits = 0 then exit; // strange error which should not
happen.

LengthCount := 0;

repeat
if LengthCount = LengthCountInBits then
begin
if GetBit( Length, LengthCount, Bit ) then
begin
// write binary length bit.
if WriteBit( Bit ) then
begin
LengthCount := LengthCount + 1;
end else
begin
// error, or retry, inform, etc.
end;
end else
begin
break; // memory error.
end;
end;

until LengthCount = LengthCountInBits;

// now write the data field:

if DataCountInBits = 0 then exit; // strange error which should not happen.

DataCount := 0;

repeat
if DataCount = DataCountInBits then
begin
if GetBit( Data, DataCount, Bit ) then
begin
// write data bit.
if WriteBit( Bit ) then
begin
DataCount := DataCount + 1;
end else
begin
// error, or retry, inform, etc.
end;
end else
begin
break; // memory error.
end;
end;

until DataCount = DataCountInBits;


There, that should give you an idea of how to do it. Ofcourse this is only
concept code... not really tested it... but should work nicely.

Some random and some crappy thoughts about using it in software development
source codes (can be skipped for reading):

(*
I am not sure if I like working with bits like this.. so as I wrote
before... I might think up a byte version of the universal code... which
might be more easy to program and handle...

not sure yet... working with bits can be straight forward to... but then
again sometimes not... depends on the source/classes etc... don't know
really, haven't tried yet...

Extra classess, and methods definetly needed... would be handy to have some
code which can simply encoded any type of data really... like a pointer and
a size or so... probably
easy to make... but also everything has to be "serialized" and
"deserialized" which is a bit weird and such... also offsets go out the
window wink <- can almost no longer be used
since you don't know where you were or going to etc.. or maybe it's still
possible... probably not though...

First you have to serialize the previous fields before you can add the next
fields... that could be a drawback... but then again.. it doesn't really
have to be like that if you work
with seperate fields in memory... and then finally those fields can be
encoded and decoded seperately or they could be copied if they already
serialized / deserialized... so this kinda
creates a problem... since many possible design decissions possible...

Ofcourse it would be handy if the software works directly on universal
codes... this would make the software scale better automatically without
much or even any code changes necessary.

Working with encoded fields should therefore be recommanded with special
algorithm which take adventage of it, to create more flexiblity... and
finally for network and storage purposes
the fields could simply be copied... the serialize them after each other as
a sort of simple compression...

since on current hardware the fields will have some unused bits because the
memory works with bytes... etc.
*)

Though this code is complexer I like it better since it's more memory
efficient and actually also more speed efficient... less marker bits to
process.

Even for small numbers like 32 bits or 64 bits it's not that bad: for
example:

64 bits require 1x1,1x2,1x4,1x8,1x16,1x32,1x64 = 7 bits.

So that's a total of 7 markers bits, 7 length bits, 64 data bits = 78 bits.

Another extreme example, the opposite a large file:

20 Gigabyte = 20 * 8 = 160 gigabits.

8 bits for marker, 8 bits for length field, 160 gigabits = 160 gigabits plus
a little.

Compare that with 320 gigabits for version 1.

So a major efficiency improvement and still very probably just as flexible !
;)

Well a bit long and messy post, but I am lazy nowadays =D

But thinking about starting to use this stuff to make my software more
future-ready =DDD

BYYYYYYYYYEEEEEE,
BYEEEEEEEEEEEEEEEEEEEEEE,
Skybuck.

P.S.: My you enjoy universal codes, and maybe the Skybuck's Power be with
you =D
Back to top
Skybuck Flying
Guest





PostPosted: Sun May 27, 2007 7:18 am    Post subject: Re: Skybuck's Universal Code 4 (memory efficient) Reply with quote

Ok,

Now I understand what you're saying.

Marker:

1

Length

0

Data

Missing.

Encoding an empty universal code could be handy for software which uses
classess to implement universal codes and simply uses this encoding to
describe "no data".

You mention other systems, are they universal codes, meaning they describe
their own length and scalable indefinetly ?

I know static huffman does not scale, it's static afterall ;)

Nor will dynamic huffman probably. How will you encode the new character ?

Again you do not seem to crash the powerfull concept of a universal code =D
or you mistakenly think other systems are universal codes ;)

Nor will I modify the universal code just to save one bit and create a
mismatch in binary system... cause that what your suggestion would mean.

0 would no longer represent zero... but 1 <- not gonna happen wink way to
confusing and I am lazy :)

Thanks anyway, it did show you have nice ideas ;)

Bye,
Skybuck.
Back to top
Skybuck Flying
Guest





PostPosted: Sun May 27, 2007 10:08 am    Post subject: Re: Skybuck's Universal Code 4 (memory efficient) Reply with quote

"MooseFET" <kensmith@rahul.net> wrote in message
news:1180235247.698197.273810@d30g2000prg.googlegroups.com...
Quote:
On May 26, 7:18 pm, "Skybuck Flying" <s...@hotmail.com> wrote:
Ok,

Now I understand what you're saying.

Marker:

1

Length

0

Data

Missing.

You've used more than 2 bits to say "no data" and 3 bits to say 1 or
zero.

Better coding based on the idea behind Huffman :

00 - Nodata
01 - True
10 - False
11 - All other cases

Seems like you need 2 bits to describe true.

While my encoding only need 1 bit, sure there is overhead, but that overhead
is not part of the original data.

The original data can have any encoding that's the beauty of Skybuck's
universal code, existing encodings can be integrated into the universal
encoding.

No other special encodings are needed to represent data.

The data can stay in it's original form.

Only the marker and the length fields are prefixed to it.

Alternatively new encodings for the data section can be used as well for
example to represent larger numbers for arbitrary precision math.

Quote:
Now we have the three simple cases done now we can decide how
most efficient path goes from here. I'm lazy so I will only state a
possible path that stays more efficent than your suggestion:

At 2 bits, your method would take over 4 bits so we can step directly
to the 4 bit groups and always be better than your way:

1100 - 0 plus the following bit as a pair
1101 - 1 plus the following bit as a pair
1110 - Three bits follow
1111 - All other cases

This method can be extended without limit and will always be better.

The last time you posted this, someone pointed out a better pattern
than this but this one is good enough to prove the point.

I do not completely understand why you are making a big deal out of trying
to come up with a better encoding for the 1 bit of data case...

Also it seems you are trying to use a new encoding... that's not too
shabby... it can probably be integrated into the universal code data field
as well lol... but not really necessary.. that would just be double.

So for now I do not agree that "it's better" or even "more efficient" and I
definelty do not see how this would be "more flexible" or "self describing
length ?".

That's a good question: where is the length field in your scenerio ? ;)

Seems like you need the universal code after all ;)

The data of the universal code can use any other encoding... isn't that
beautifull ? =D

Also what did you describe: mangled encoding ?

Or did you describe a new encoding for one of the fields: Marker, Length,
Data ?

I like keeping those fields nicely seperated ;)

So I definetly do not want a mangled encoding. <- bleh .

Please clearify or try again and then come again =D

Bye,
Skybuck.
Back to top
Rudy Velthuis
Guest





PostPosted: Sun May 27, 2007 10:09 am    Post subject: Re: Skybuck's Universal Code 4 (memory efficient) Reply with quote

27.05.2007 07:19:03, Skybuck Flying wrote:

Quote:
Also,

Your reasoning must be inheritly flawed.

Yours is. Read up on Huffman coding and be amazed.
--
Rudy Velthuis http://rvelthuis.de

"Sometimes a scream is better than a thesis."
-- Ralph Waldo Emerson (1803-1882)
Back to top
Skybuck Flying
Guest





PostPosted: Sun May 27, 2007 10:19 am    Post subject: Re: Skybuck's Universal Code 4 (memory efficient) Reply with quote

Also,

Your reasoning must be inheritly flawed.

Huffman compression is based on the idea that some characters or data will
occur more often then others.

Binary encoding is the most efficient "general" encoding where no occurence
statistics are known before hand.

Then there is also dynamic huffman compression which learns the occurence as
it views the data... however this requires describing new characters...

And a very important question is how do you describe/encode the new
characters ?

Fixed bits for new characters/data is ofcourse flawed, because it's not
variable bit, thus not an universal code :)

Some without completely understand what your little tiny encoding thingy was
about.. it's probably based on flawed reasoning in the first place :)

:::

Bye,
Skybuck ;)

P.S.: I like my universal encoding to be efficient for every possible
case/scenerio wink =D not optimized for certain scenerio's and it definetly
should self describe the length of itself/scale/be flexible ofcourse
otherwise it wouldn't be a universal code ! DUH wink =D

"MooseFET" <kensmith@rahul.net> wrote in message
news:1180235247.698197.273810@d30g2000prg.googlegroups.com...
Quote:
On May 26, 7:18 pm, "Skybuck Flying" <s...@hotmail.com> wrote:
Ok,

Now I understand what you're saying.

Marker:

1

Length

0

Data

Missing.

You've used more than 2 bits to say "no data" and 3 bits to say 1 or
zero.

Better coding based on the idea behind Huffman :

00 - Nodata
01 - True
10 - False
11 - All other cases

Now we have the three simple cases done now we can decide how
most efficient path goes from here. I'm lazy so I will only state a
possible path that stays more efficent than your suggestion:

At 2 bits, your method would take over 4 bits so we can step directly
to the 4 bit groups and always be better than your way:

1100 - 0 plus the following bit as a pair
1101 - 1 plus the following bit as a pair
1110 - Three bits follow
1111 - All other cases

This method can be extended without limit and will always be better.

The last time you posted this, someone pointed out a better pattern
than this but this one is good enough to prove the point.
Back to top
Skybuck Flying
Guest





PostPosted: Mon May 28, 2007 9:55 am    Post subject: Re: Skybuck's Universal Code 4 (memory efficient) Reply with quote

"MooseFET" <kensmith@rahul.net> wrote in message
news:1180283866.819089.226600@x35g2000prf.googlegroups.com...
Quote:
On May 27, 8:20 am, "Skybuck Flying" <s...@hotmail.com> wrote:
How would you encode a larger example with your system, for example the
following data bits:

111010101010100110100101101110011
33 Bits by my count.


Maybe that would clearify your idea a bit :)

I stopped after 3 bits, so I'll pick up at 4 bits and again.

The first 4 bits of the prefix are already stated as being 1111 so we
only need to deal with the next 6 bits.

00XXXX - The 4 bit value XXXX
010XXX - Two more bits follow making a total of 5 bits
0110XX - Four more bits follow making 6 bits
01110X - Six bits follow making 7 bits
011110 - 8 bits follow
10NNNN - (9+NNNN) bits follow 9..24
110NNN - Two more of N follow then (25+N) bits *
1110NN - 4 more bits of N follow then (57+N) bits
11110N - 6 more bits of N follow then (121+N) bits
111111 - All other cases

* Your question is answered here.

Notice how my system is not limited and has a smaller overhead whereas
your system is limited to only handle lengths up to 32768 bits and
takes more.

Why do you think my system is limited ? It's not limited at all.

I have a question for your system:

After you have encoded, how do you decode ?

How do you detect where the prefixes start and the data begins ?

Bye,
Skybuck.
Back to top
Skybuck Flying
Guest





PostPosted: Mon May 28, 2007 9:55 am    Post subject: Re: Skybuck's Universal Code 4 (memory efficient) Reply with quote

"Skybuck Flying" <spam@hotmail.com> wrote in message
news:f3dsdk$p80$1@news4.zwoll1.ov.home.nl...
Quote:

"MooseFET" <kensmith@rahul.net> wrote in message
news:1180283866.819089.226600@x35g2000prf.googlegroups.com...
On May 27, 8:20 am, "Skybuck Flying" <s...@hotmail.com> wrote:
How would you encode a larger example with your system, for example the
following data bits:

111010101010100110100101101110011
33 Bits by my count.


Maybe that would clearify your idea a bit :)

I stopped after 3 bits, so I'll pick up at 4 bits and again.

The first 4 bits of the prefix are already stated as being 1111 so we
only need to deal with the next 6 bits.

from previous post:

Quote:
1100 - 0 plus the following bit as a pair
1101 - 1 plus the following bit as a pair
1110 - Three bits follow
1111 - All other cases

00XXXX - The 4 bit value XXXX
010XXX - Two more bits follow making a total of 5 bits
0110XX - Four more bits follow making 6 bits
01110X - Six bits follow making 7 bits
011110 - 8 bits follow
10NNNN - (9+NNNN) bits follow 9..24
110NNN - Two more of N follow then (25+N) bits *
1110NN - 4 more bits of N follow then (57+N) bits
11110N - 6 more bits of N follow then (121+N) bits
111111 - All other cases

Your encoding starts either with a zero, followed by ones, followed by a
terminating zero, followed by the data.

Or

Your encoding starts with a one, followed by a terminating zero, followed by
the data.

This way the length of the prefix can be determined.

Knowing the length of the prefix how to determine how many data bits follow
?

(Also the 33 bits I described are data only, no overhead bits included, the
overhead bits would be 12 bits, total: 45 bits).

Bye,
Skybuck.
Back to top
Display posts from previous:   
Post new topic   Reply to topic    Index -> ASUS All times are GMT
Page 1 of 1

 

 MemberlistMemberlist  UsergroupsUsergroups



Powered by p|-|pBB

Featured Sites: DIY Projects