Encoding Data

By: Jordan Mendez, Ashish Nair, Sean Pino, Joshua Reed, Dalton Vonfeldt

Technical Words

  • Encryption – The process of encoding messages or information so that only people/computers with the key (see “Key”) can read it.
    • ex. Pig Latin – “Ouyay ucksay.”
  • Decryption – The process of using the key (see “Key” again) to decode the encrypted (see “Encryption”) message.
    • ex. Pig Latin – “Ouyay ucksay = You suck.” (to see how Pig Latin works, see “Key”)
  • ASCII – American Standard Code for Information Interchange.
  • Evil – Whatever a given society deems unacceptable and/or counterproductive. (see “Tanning”) If caught doing evil, you will receive an “XF” in the course. (see “Honor Code”)
  • Programmers – We’re lazy. You’re lazy. You’re probably not even reading this. I mean, technically, you have to be reading this, but most of your classmates never will.
  • Compression Ratio – The ratio of the compressed file size to the uncompressed file size.
  • Key – A file or program that contains the necessary information for decoding a message.
    Encrypting is real real important. You’re gonna need to learn it. Especially if you’re going into cybersecurity or illegal stuff (see “Evil”). But we don’t really encourage it. The evil stuff, not the cybersecurity.

History of Encoding Data

    Okay, let's get serious. A big turning point in the history of encoding data was the Enigma machine in World War II. A lot of people would argue that the Nazi Party had little to do with encoding data; however, they were pioneers in the encryption of information. The Nazis had the Enigma machine in World War II, and this machine's algorithm was so complex that it had over 150 million million million different possible settings. The Enigma machine was eventually cracked thanks to the brilliance of several mathematicians, such as Alan Turing. Other wartime figures shaped the field too: Tommy Flowers, who built the Colossus codebreaking computer, and, very important to the field of encoding data, Claude Shannon. Claude Shannon was a mathematician, electronic engineer, and cryptographer best known for being the “Father of Information Theory”. He wrote a thesis showing that electrical applications of Boolean algebra could be used to construct any logical, numerical relationship. This is highly important and has been called the most important master’s thesis of all time.[6]


Claude Shannon

  • Born in Petoskey, Michigan, on April 30, 1916.
  • Contributed to the field of cryptanalysis for national defense during WWII, working on codebreaking and secure telecommunications.
  • Met with Alan Turing, who had similar interests. They discussed:
    • Breaking ciphers
    • Encipherment of speech
    • Turing’s universal Turing machine paper, by which Claude was very impressed
  • Proved that the cryptographic one-time pad is unbreakable.
  • Proved that any unbreakable system has the same characteristics. The key must be:
    • Truly random
    • As large as the plaintext
    • Never reused in whole or part
    • Kept secret [6]
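
    Shannon's conditions for the one-time pad are easier to see in code. What follows is only a minimal sketch in Python, not production cryptography. The names make_pad and xor_bytes are made up for this example, and secrets.token_bytes stands in for a "truly random" source, which is an assumption.

```python
import secrets

def make_pad(length):
    """Return a fresh random key exactly as long as the message."""
    return secrets.token_bytes(length)

def xor_bytes(data, pad):
    """XOR each message byte with the matching pad byte.

    The same operation both encrypts and decrypts.
    """
    if len(pad) != len(data):
        raise ValueError("the pad must be as large as the plaintext")
    return bytes(d ^ p for d, p in zip(data, pad))

message = b"ATTACK AT DAWN"
pad = make_pad(len(message))            # use once, keep secret, then destroy
ciphertext = xor_bytes(message, pad)    # encrypt
recovered = xor_bytes(ciphertext, pad)  # decrypt with the same pad
print(recovered == message)
```

    Reusing the pad for a second message, or using a pad shorter than the message, breaks the conditions listed above, which is why the sketch refuses a pad of the wrong length.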

Reasons for Encryption

    People encode their information for many reasons, not just security. File compression is one of them. Since we’re lazy (see “Programmers”), here’s what HowStuffWorks.com[2] says:

As an example, let's look at a type of information we're all familiar with: words.

In John F. Kennedy's 1961 inaugural address, he delivered this famous line:

"Ask not what your country can do for you -- ask what you can do for your country."

The quote has 17 words, made up of 61 letters, 16 spaces, one dash and one period. If each letter, space or punctuation mark takes up one unit of memory, we get a total file size of 79 units. To get the file size down, we need to look for redundancies.

Immediately, we notice that:

-"ask" appears two times

-"what" appears two times

-"your" appears two times

-"country" appears two times

-"can" appears two times

-"do" appears two times

-"for" appears two times

-"you" appears two times

Ignoring the difference between capital and lower-case letters, roughly half of the phrase is redundant. Nine words -- ask, not, what, your, country, can, do, for, you -- give us almost everything we need for the entire quote. To construct the second half of the phrase, we just point to the words in the first half and fill in the spaces and punctuation.

Most compression programs use a variation of the LZ adaptive dictionary-based algorithm to shrink files. "LZ" refers to Lempel and Ziv, the algorithm's creators, and "dictionary" refers to the method of cataloging pieces of data.

The system for arranging dictionaries varies, but it could be as simple as a numbered list. When we go through Kennedy's famous words, we pick out the words that are repeated and put them into the numbered index. Then, we simply write the number instead of writing out the whole word.

So, if this is our dictionary:

1. ask

2. what

3. your

4. country

5. can

6. do

7. for

8. you

Our sentence now reads:

"1 not 2 3 4 5 6 7 8 -- 1 2 8 5 6 7 3 4"

    Okay, so basically, you just read (or ignored) their explanation of what your computer will do when you compress the file. Instead of storing each letter, you store the dictionary plus the numbered version of the sentence, which takes up slightly less space (74 units versus the original 79). But for larger files, like this textbook, there is a lot more repetition and the compression ratio (see “Compression Ratio”) is much lower.
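
    For the non-lazy, the dictionary trick from the quote can be sketched in Python. This is a toy version, not the actual LZ algorithm: the function names are invented for this example, the quote is lower-cased with its final period left off, and the 74 and 79 unit sizes are simply the numbers from the text above.

```python
def build_dictionary(words):
    """Give each repeated word a number, in order of first appearance."""
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    dictionary = {}
    for word in words:
        if counts[word] > 1 and word not in dictionary:
            dictionary[word] = len(dictionary) + 1
    return dictionary

def encode(words, dictionary):
    """Replace every dictionary word with its number."""
    return [str(dictionary.get(word, word)) for word in words]

def decode(tokens, dictionary):
    """Swap numbers back for words (assumes the text has no digits of its own)."""
    lookup = {str(number): word for word, number in dictionary.items()}
    return [lookup.get(token, token) for token in tokens]

def compression_ratio(compressed_size, uncompressed_size):
    """Compressed size divided by uncompressed size; lower is better."""
    return compressed_size / uncompressed_size

quote = "ask not what your country can do for you -- ask what you can do for your country"
words = quote.split()
dictionary = build_dictionary(words)
encoded = encode(words, dictionary)

print(" ".join(encoded))
print(" ".join(decode(encoded, dictionary)) == quote)
print(round(compression_ratio(74, 79), 3))  # sizes taken from the text above
```

    A ratio of about 0.937 means the "compressed" quote is still roughly 94% of its original size, which is why the savings only get interesting on bigger, more repetitive files.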

    If you didn’t understand the previous paragraph, drop the class or grow a sense of humor. (see “Encryption”) That’s enough reason(s) to encrypt.

Types of Encryption

    There are two primary types of encryption: symmetric and asymmetric. With symmetric encryption, you run a file through the program and create a key (see “Key”) that scrambles the file. Then you e-mail the encrypted file to the recipient and separately transmit the decoding key (which could be a password or another data file). Running the same encryption application, the recipient uses the decoding key (see “Key”) to unscramble the message. Symmetric encryption is fast but not as safe as asymmetric encryption because someone could intercept the key (see “Key”) and decode the messages. But because of its speed, it's commonly used for e-commerce transactions. [1]
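
    Here is a toy sketch of the symmetric idea in Python: one shared key both scrambles and unscrambles the message. The keystream construction (hashing the key with a counter) and the function names are assumptions made up for illustration. Do not use this for anything you actually want kept secret.

```python
import hashlib

def keystream(key, length):
    """Stretch a shared key into `length` pseudo-random bytes (toy construction)."""
    stream = b""
    counter = 0
    while len(stream) < length:
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return stream[:length]

def symmetric_crypt(data, key):
    """Encrypt or decrypt: XOR the data with the key-derived stream."""
    return bytes(d ^ s for d, s in zip(data, keystream(key, len(data))))

shared_key = b"correct horse battery staple"  # sent to the recipient separately
ciphertext = symmetric_crypt(b"Meet me at noon.", shared_key)
plaintext = symmetric_crypt(ciphertext, shared_key)
print(plaintext)
```

    Anyone who intercepts shared_key can run the same function and read the message, which is exactly the weakness described above.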

    Asymmetric encryption is more complex--and more secure. Two related keys (see “Key”) are required: a public key (see “Key”) and a private key (see “Key”). You make your public key (see “Key”) available to anyone who might send you encrypted information. That key (see “Key”) can only encode data; it cannot decode it. Your private key (see “Key”) stays safe with you. When people wish to send you encrypted information, they encrypt it using your public key (see “Key”). When you receive the ciphertext, you decrypt it with your private key (see “Key”). Asymmetric encryption's added safety comes at a price: More computation is required, so the process takes longer. [1]
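
    The public/private split can be sketched with textbook RSA, one well-known asymmetric scheme (the text above does not name a specific one, so choosing RSA is an assumption). The primes here are tiny so the numbers stay readable. Real keys are enormously larger and real systems add padding, so treat this as an illustration only.

```python
# Toy RSA with tiny primes, for illustration only.
p, q = 61, 53
n = p * q                    # 3233, shared by both keys
phi = (p - 1) * (q - 1)      # 3120
e = 17                       # public exponent, chosen coprime with phi
d = pow(e, -1, phi)          # private exponent (needs Python 3.8+)

public_key = (e, n)          # hand this out to anyone
private_key = (d, n)         # keep this one to yourself

def encrypt(message, key):
    """Encode a number smaller than n with the public key."""
    exponent, modulus = key
    return pow(message, exponent, modulus)

def decrypt(ciphertext, key):
    """Decode the ciphertext with the private key."""
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)

message = 65
ciphertext = encrypt(message, public_key)
print(decrypt(ciphertext, private_key) == message)
```

    The modular exponentiation on large numbers is where the extra computation mentioned above comes from.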

Graphic Representation

There are two kinds of computer graphics: raster and vector. Raster images are represented as a grid of pixels. Since they are stored this way, they scale horribly and are resolution dependent; in other words, when you enlarge a raster image it becomes pixelated. Raster graphics can be saved in many formats, but the most common ones are:

      .BMP (Bitmap)

      .JPEG (Joint Photographic Experts Group)

      .GIF (Graphics Interchange Format)

      .PNG (Portable Network Graphics)

    Vector images are represented as lines, points, and polygons that are expressed mathematically. This allows vector graphics to scale really well, from business cards to giant billboards. Common formats for saving vector graphics are:

      .AI (Adobe Illustrator image)

      .CGM (Computer Graphics Metafile)

      .SVG (Scalable Vector Graphic)

    With how different these two formats are, and the fact that vector graphics scale better, you would assume that they are better, right? Wrong. Vector graphics are really good for things like a company logo that you would need to have in all different sizes (see the business cards example in the previous paragraph), but they are bad at representing a wide range of colors and the subtle shading that you would need for a photo. This is where raster graphics shine. Since they are coded pixel by pixel, each pixel can display a different color and allow for awesome gradients. So in general, if you need something that can scale, use a vector image, but if you need a wide range of colors with subtle transitions, use raster.[3][4][5]
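
    The scaling difference can be sketched in a few lines of Python. This is a simplified model, assuming a raster image is just a grid of numbers and a vector shape is just a list of points. The function names are invented for this example, and nearest-neighbour enlargement is only one of several ways to resize a raster image.

```python
def upscale_raster(pixels, factor):
    """Nearest-neighbour enlargement: every pixel becomes a factor x factor block."""
    enlarged = []
    for row in pixels:
        wide_row = [value for value in row for _ in range(factor)]
        enlarged.extend(list(wide_row) for _ in range(factor))
    return enlarged

def scale_vector(points, factor):
    """Scaling a vector shape just multiplies its coordinates."""
    return [(x * factor, y * factor) for x, y in points]

raster = [[0, 1],
          [1, 0]]
triangle = [(0, 0), (4, 0), (2, 3)]

for row in upscale_raster(raster, 2):
    print(row)                          # blocky: no new detail appears
print(scale_vector(triangle, 10))       # still three exact corner points
```

    The enlarged raster has more pixels but no more information, which is the pixelation described above. The scaled triangle is exactly as sharp as the original.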

Assembly Code

    Most people would agree that assembly code is boring as $#!% but I’ll try to make this interesting. I’m not going to define assembly code here; that’s probably defined elsewhere in the text. Look it up, I don’t care. Most of the time, people don’t talk about assembly code in relation to encoding, but in essence assembly code is encoded knowledge that both the computer and humans can understand. In fact, assembly code is one step up from binary, which is usually too low level to be read by people (well, it requires ludicrous amounts of technical knowledge to be read). Assembly code itself is quite tedious to read, but what’s interesting about assembly code is that most people don’t write programs in assembly code, although there are exceptions where assembly code makes the program run faster and more efficiently. But I’m sure the guys in Chapter Whatever can tell you more about that. Instead of writing really boring programs in really boring assembly language, most programs are written in higher-level programming languages, examples being Java, Python, C, C++, C#, G#, Fortran, Prolog, Ada, Ruby, etc, etc... But what are those languages? Where do they come from? Obviously people had to make them, but what did they make these languages out of? The answer comes back to the topic of this section (if it didn’t this would be a really crappy book): encoding. Knowledge gets encoded from assembly code into a higher-level language, which is really just a set of predefined shortcuts in code! Wowzers! You just learned something! Or not, I don’t really care. [11]
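
    To see the "predefined shortcuts" idea in action, Python can list the lower-level steps hiding behind one line of high-level code. Fair warning: what this prints is Python bytecode for the Python virtual machine, not real assembly code, so take it as an analogy. The exact instruction names also vary between Python versions.

```python
import dis

def add(a, b):
    return a + b

# Each instruction is one small step the Python virtual machine performs.
for instruction in dis.get_instructions(add):
    print(instruction.opname, instruction.argrepr)
```

    One readable line, return a + b, turns into several tiny load, add, and return steps, which is the same kind of expansion a compiler performs when it turns a high-level program into assembly.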


ASCII

    The American Standard Code for Information Interchange (ASCII) was developed by a committee of the American Standards Association. ASCII is a character-encoding scheme and an industry standard. Standard ASCII is a 7-bit code that assigns letters, numbers, and other characters to 128 slots; 8-bit extensions use all 256 slots available in a byte. ASCII codes represent text in computers, communications equipment, and other devices that use text. Most modern character-encoding schemes are based on ASCII, though they support many additional characters. The ASCII table is divided into 3 sections:

  • Non-printable system codes, between 0 and 31.
  • Lower ASCII, between 32 and 127. This table originates from the older American systems, which worked on 7-bit character tables.
  • Higher ASCII, between 128 and 255. This portion is programmable; its characters depend on the language of your operating system or the program you are using. Foreign letters are also placed in this section. [8][9][10]
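
    You can poke at the ASCII table directly from Python. In this small sketch, ord and chr are Python built-ins, while ascii_section is a helper invented for this example that follows the three-way split described above.

```python
def ascii_section(code):
    """Name the section of the table a code falls in, per the split above."""
    if not 0 <= code <= 255:
        raise ValueError("code must fit in 8 bits")
    if code <= 31:
        return "non-printable"
    if code <= 127:
        return "lower ASCII"
    return "higher ASCII"

for character in "A", "a", " ", "\n":
    code = ord(character)                # character -> number
    print(repr(character), code, format(code, "07b"), ascii_section(code))

print(chr(72) + chr(105))                # number -> character
```

    The format(code, "07b") call prints each code as 7 binary digits, which is all that lower ASCII needs.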


Unicode

    There are many different ways to encode something; one way is via Unicode. Unicode is a standard that provides a unique number for every character, no matter the language, program, or platform. Unicode is one of the largest encoding systems in the world and has been adopted by companies like Apple, IBM, and Microsoft, just to name a few. All modern browsers and a lot of the operating systems out there support Unicode. The first form of Unicode dates back to 1987. It was originally a 16-bit design created by Joe Becker from Xerox, and it assumed that only the scripts and characters in modern use would need to be encoded. In 1996 Unicode was no longer confined to 16 bits, which made room for over a million code points and made it possible to encode historic scripts like Egyptian hieroglyphs, along with thousands of other characters that were not expected to need encoding.[12][13]
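
    Python strings are Unicode, so the "unique number for every character" idea is easy to check. The three sample characters are our own picks for this sketch. The hieroglyph sits above the old 16-bit limit, so it needs 4 bytes in UTF-16 instead of 2.

```python
import sys

samples = {
    "Latin capital A": "A",
    "Euro sign": "\u20ac",
    "Egyptian hieroglyph A001": "\U00013000",
}

for name, character in samples.items():
    code_point = ord(character)                 # the character's unique number
    utf8 = character.encode("utf-8")
    utf16 = character.encode("utf-16-be")
    print(f"{name}: U+{code_point:04X}, "
          f"{len(utf8)} bytes in UTF-8, {len(utf16)} bytes in UTF-16")

print(sys.maxunicode + 1)                       # total number of code points
```

    The last line prints 1114112, the "over a million" code points mentioned above.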


Works Cited

"ASCII Table and Description." Ascii Table. N.p., 2010. Web. 12 Dec. 2012.

"ASCII." What Is (American Standard Code for Information Interexchange)? Computer Hope, n.d. Web. 12 Dec. 2012.

"ASCII." Wikipedia. Wikimedia Foundation, 12 Dec. 2012. Web. 12 Dec. 2012.

Brain, Marshall. "How Microprocessors Work." HowStuffWorks. HowStuffWorks, Inc, n.d. Web. 12 Dec. 2012.

Brandt, Andrew, and Alexandra Krasne. "How It Works: Encryption." PCWorld. IDG Consumer & SMB, 14 Feb. 2000. Web. 12 Dec. 2012.

"Claude Shannon." Wikipedia. Wikimedia Foundation, 12 June 2012. Web. 12 Dec. 2012.

Harris, Tom. "How File Compression Works." HowStuffWorks. HowStuffWorks, Inc, n.d. Web. 12 Dec. 2012.

"Image File Formats." Wikipedia. Wikimedia Foundation, 12 Sept. 2012. Web. 12 Dec. 2012.

"Raster Graphics." Wikipedia. Wikimedia Foundation, 12 Apr. 2012. Web. 12 Dec. 2012.

"Unicode." Wikipedia. Wikimedia Foundation, n.d. Web. 12 Dec. 2012.

"Vector Graphics." Wikipedia. Wikimedia Foundation, 12 Apr. 2012. Web. 12 Dec. 2012.

"What Is Unicode?" What Is Unicode? N.p., n.d. Web. 12 Dec. 2012.



All images from Wikimedia Commons.