Electronic Documents

By Alexander Burch, David Gabriel, Ryan Godfrey, Christopher Greenwood, and Jordan Wheeler



History

    This chapter covers electronic documents: how electronic text is stored internally, how documents are formatted, different types of documents, what the future holds for documents, and some other fun ways to use documentation. This section covers the history of documents, tracing how documentation has changed all the way back to the dawn of man.

    The first act of documentation is often said to have started in 1725, in the form of data storage. But if you ask yourself when documentation actually started, you may find that you already know the answer: it started with the birth of man. To see why, think of when you were born. As a baby, your brain is already storing massive amounts of data from your daily experiences, such as touching everything and putting things in your mouth. All of those experiences were a form of storing data mentally, so, in theory, the first forms of life already had the power of documentation. The next form of documentation is just as easy to overlook. Cave dwellers left us their documentation in the drawings they made on cave walls, and these forms of documentation evolved throughout the Stone Age. The next major development came in ancient Egypt, during the period known as the New Kingdom, in the Late Bronze Age.

    The Egyptians used tablets to record some of their data. The best-known examples are the Amarna Tablets, also known as the Amarna Letters: clay tablets written in cuneiform in Akkadian, the language of diplomacy at the time. Papyrus, which the Egyptians had used since at least the First Dynasty, was another major medium for documentation. Papyrus paper was made from the papyrus plant, which grew in large quantities around the Nile Delta. Centuries passed, and closer to our own era came the invention of punch cards in 1725.

    Basile Bouchon invented the first punch card in 1725. It consisted of loops of perforated paper that stored cloth patterns, had a capacity of 960 bits, and was used to store the settings of different kinds of machines. Later inventions built on the same idea. Punched tape, in several variations, was first invented in 1846 by Alexander Bain, the inventor of the fax machine, and Herman Hollerith developed his punch card, the forerunner of the IBM punch card, in 1881; it cut the time needed to process the U.S. census from 8 years to 1 year. All of these inventions shared the purpose of storing data, and in 1966 HP introduced the 2753A Tape Punch, which could punch 120 characters per second and retailed for $4,150. The next invention also came on a reel, much like punched tape: magnetic tape, invented in the 20th century, which started another big boom in technology.

    Half a reel of magnetic tape could store as much data as 10,000 punch cards. Common reels held between 2,400 and 4,800 feet of tape. Tape this long was prone to physical damage, a problem later solved by building standing drives in which the tapes were installed. A close relative of magnetic tape was the magnetic drum, created in Austria in 1932. The magnetic drum was an early form of computer memory: it stored data as electromagnetic pulses by changing the magnetic orientation of ferromagnetic particles on the drum. Moving closer to the present day, the floppy disk and the hard drive came next, leading to CDs, DVDs, Blu-ray discs, flash drives, and SSDs. The Internet is another form of documentation, and a huge one, because of how quickly it gives people access to information.

    Now that we have reached the present forms of documentation, let's have a peek at what the future might hold. One idea is holographic storage, which in theory stores data in layers of tiny holograms that could last up to 30 years. Another is quantum storage, in which data is encoded on an electron. Similarly, scientists in Hong Kong are researching how to store data in bacteria. Lastly, sand may be able to store massive amounts of data, approaching 1 million petabytes, where 1 petabyte equals 1 billion megabytes.
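
    The capacities in this history are easier to compare with a small unit converter. This sketch assumes decimal (SI) prefixes, where each step up is a factor of 1,000, which is the convention behind "1 petabyte equals 1 billion megabytes"; binary units such as mebibytes use powers of 1,024 instead. The function name `convert` is hypothetical.

```python
from fractions import Fraction

# Sizes expressed in bits, using decimal (SI) prefixes: each step up is a factor of 1,000.
BITS_PER = {
    "bit": 1,
    "byte": 8,
    "KB": 8 * 10**3,
    "MB": 8 * 10**6,
    "GB": 8 * 10**9,
    "TB": 8 * 10**12,
    "PB": 8 * 10**15,
    "EB": 8 * 10**18,
    "ZB": 8 * 10**21,
}


def convert(amount, from_unit, to_unit):
    """Convert exactly between units, returning a Fraction so nothing is rounded."""
    return Fraction(amount) * BITS_PER[from_unit] / BITS_PER[to_unit]


print(convert(960, "bit", "byte"))     # 120: a 960-bit punch-card loop holds 120 bytes
print(convert(1, "PB", "MB"))          # 1000000000: one billion megabytes
print(convert(1_000_000, "PB", "ZB"))  # 1: a million petabytes is one zettabyte
```

    Working in a single base unit (bits) means every conversion is one multiplication and one division, and `Fraction` keeps results exact even for very large or very small ratios.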


Storage

    Electronic document storage lets the user retrieve any needed file almost instantly. This is a major advantage, because in the physical world you must search for a document or keep it in a filing cabinet that takes up space. Electronic documents are kept in an easy-to-reach place, which makes retrieval fast and efficient.

    One of the steps in organizing and storing electronic documents is indexing. This step is very important, because if a company's documents are not organized, the company can lose money: being able to find a document in 4 seconds rather than 4 minutes is critical in some situations. Filing is just as important as it is in the physical world, and it is easier.
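
    A common way to index documents for fast lookup is an inverted index, which maps each word to the documents that contain it, so a search checks a few short lists instead of reading every file. The Python sketch below assumes each document is a plain string keyed by a hypothetical ID; real document-management systems also index metadata and rank their results.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample documents.
docs = {
    "invoice-001": "Invoice for office chairs, March",
    "memo-017": "Memo about the new filing policy",
    "invoice-002": "Invoice for filing cabinets, April",
}
index = build_index(docs)
print(sorted(search(index, "invoice filing")))  # ['invoice-002']
```

    The index is built once, when documents are filed, so each later search only intersects the sets for the query words.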

    The text in electronic documents is represented using character sets. A character set is a collection of letters and symbols, and a character encoding assigns each of them a specific sequence of bits in the computer's memory. Each character is represented by one byte or by a sequence of several bytes, and multi-byte sequences allow an encoding to cover a very large number of characters. Content developers and programmers must declare which encoding they are using so that the computer can present the content correctly. Two frequently used systems are ASCII and Unicode, with Unicode (commonly stored as UTF-8) now used more widely because it can handle text in many languages.
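
    The Python snippet below shows characters turning into bytes under two encodings, and what happens when the declared encoding does not match the stored bytes. It assumes Python 3, where strings are Unicode; Latin-1 is used only as an example of a wrong declaration.

```python
text = "café"

ascii_bytes = "cafe".encode("ascii")  # ASCII: exactly one byte per character
utf8_bytes = text.encode("utf-8")     # UTF-8: "é" takes two bytes

print(list(ascii_bytes))  # [99, 97, 102, 101]
print(utf8_bytes)         # b'caf\xc3\xa9'

# ASCII has no code for "é", so encoding it fails outright.
try:
    text.encode("ascii")
except UnicodeEncodeError:
    print("ASCII cannot represent 'é'")

# Decoding UTF-8 bytes as if they were Latin-1 garbles the text ("mojibake").
print(utf8_bytes.decode("latin-1"))  # cafÃ©
```

    The last line is why the encoding declaration matters: the bytes are identical, and only the declared encoding decides which characters they become.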


Formatting

    In electronic documents, markup languages describe how text is supposed to be read and displayed by a computer. They are used to display, style, and describe text so that it is more appealing and understandable to the user. There are three basic types of markup used in electronic documents: presentational, procedural, and descriptive. In most cases, you will not see the markup in the final printed or displayed product. Instead, you see the results of its instructions, so the text shows up the way you want it.

    Presentational markup is used by text editors to establish the vertical and horizontal spacing that determines where text is shown. Users do not see this markup in the document as they work with it. Even plain-text programs such as Notepad, which seem to have no styling features, rely on simple presentational markup (line breaks, blank lines, and tabs) to make documents more “presentable” to the user.

    Procedural markup describes what must be done to the text to produce the finished document. Every format change has its own explicit instruction, which the computer carries out when it reads that instruction. For example, you can state that an area of text is to be italicized or bolded, repeating the instruction for each instance where you need it. Because of this, procedural markup can be hard to transfer to other programs that use a slightly different style of marking, and the user might need to edit every instance of the markup. A real-world example of procedural markup is typesetting for printing presses: the editor decides every format to apply to certain text, and the typesetter reads the text line by line together with the markup associated with it.
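
    To make the idea concrete, here is a toy sketch in Python. The markup language is hypothetical, loosely modeled on troff-style requests: a line such as `.bold` is an explicit instruction that applies to the next line of text. It is not the syntax of any real typesetting system, and the output uses asterisks to stand in for bold and italic type.

```python
def render_procedural(source):
    """Apply explicit formatting commands (hypothetical '.bold' / '.italic' lines)."""
    out = []
    pending = None  # the command waiting to be applied to the next text line
    for line in source.splitlines():
        if line.startswith("."):
            pending = line[1:]
            continue
        if pending == "bold":
            out.append(f"**{line}**")
        elif pending == "italic":
            out.append(f"*{line}*")
        else:
            out.append(line)  # no command, or one this toy renderer does not know
        pending = None
    return "\n".join(out)


doc = ".bold\nIntroduction\nSome body text.\n.bold\nConclusion"
print(render_procedural(doc))
```

    Because each heading carries its own `.bold` command, restyling every heading as italic would mean finding and editing every one of those commands, which is exactly the portability problem of procedural markup.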

    Descriptive, or semantic, markup describes what blocks of text are. Instead of saying that a block of text must be formatted a certain way, the markup labels the block, and another program decides what to do with it. The text is broken into labeled blocks, and it is up to your browser to interpret how each block is shown on your computer screen. The most widely used example is HTML on the web. Instead of stating that the next block of text must be enlarged and bolded because it is a heading, the markup labels that block as a level-1 heading (the <h1> tag), and the browser's built-in styles automatically display anything tagged <h1> in large, bold text.
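
    The sketch below uses Python's standard `html.parser` module to show the descriptive approach: the document only labels blocks as `<h1>` or `<p>`, and a separate style table decides how each label is rendered. The `STYLES` table is a hypothetical stand-in for a browser's built-in style rules, and it renders headings in uppercase because plain text has no bold.

```python
from html.parser import HTMLParser

# Presentation lives outside the document: one entry restyles every <h1>.
STYLES = {
    "h1": lambda text: text.upper(),
    "p": lambda text: text,
}


class BlockRenderer(HTMLParser):
    """Collect the text of labeled blocks and style it by label."""

    def __init__(self, styles):
        super().__init__()
        self.styles = styles
        self.current = None  # label of the block being read, if we style it
        self.lines = []

    def handle_starttag(self, tag, attrs):
        if tag in self.styles:
            self.current = tag

    def handle_endtag(self, tag):
        if tag == self.current:
            self.current = None

    def handle_data(self, data):
        if self.current and data.strip():
            self.lines.append(self.styles[self.current](data.strip()))


def render(html, styles=STYLES):
    renderer = BlockRenderer(styles)
    renderer.feed(html)
    renderer.close()
    return renderer.lines


page = "<h1>Introduction</h1><p>Body text.</p>"
print(render(page))  # ['INTRODUCTION', 'Body text.']
```

    To change how every heading looks, you pass a different style table, for example one mapping `"h1"` to `lambda t: "# " + t`, and the document itself stays untouched.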


Extra Data: Metadata

    Electronic documents hold various kinds of data, such as the actual text of the document and the formatting of that text, both of which have already been discussed. These two kinds can be seen: the text in the words themselves, and the formatting in how those words are arranged. But electronic documents hold yet another type of data, one that is less apparent. This sort of data is called metadata.

    Metadata contains information about the document itself. Now, not every file type includes metadata, but “rich” types such as .doc(x) files or .pdf files do. This data can include things such as the name of the document, the name of the authoring computer, when the document was created, the last time it was edited, and similar information. Metadata differs between file types, as well. Emails, for instance, include some unique metadata fields. They do include information about authors and creation dates, but they also include things like what attachments were included with the email and all of the people that the email was sent to. This data is kept through email replies, so if two people carry on a conversation through email, the most recent reply contains metadata on itself and all of the previous emails as well.
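
    As an example of where such metadata lives, a .docx file is a ZIP archive, and its core properties (title, author, last editor, creation and modification dates) are stored as XML in the `docProps/core.xml` part defined by the Office Open XML standard. The Python sketch below reads those fields with the standard `zipfile` and `xml.etree.ElementTree` modules. The demo archive contains only that one part, and its title, names, and dates are made up; a real Word file contains many more parts.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the Office Open XML core-properties part.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

FIELDS = {
    "title": "dc:title",
    "author": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def read_core_properties(docx_bytes):
    """Return the core metadata fields present in a .docx file's bytes."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("docProps/core.xml"))
    props = {}
    for name, path in FIELDS.items():
        element = root.find(path, NS)
        if element is not None and element.text:
            props[name] = element.text
    return props


# Demo: a hypothetical, minimal archive holding only the metadata part.
SAMPLE_CORE = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    f'<cp:coreProperties xmlns:cp="{NS["cp"]}" xmlns:dc="{NS["dc"]}"'
    f' xmlns:dcterms="{NS["dcterms"]}">'
    "<dc:title>Quarterly Report</dc:title>"
    "<dc:creator>A. Author</dc:creator>"
    "<cp:lastModifiedBy>B. Editor</cp:lastModifiedBy>"
    "<dcterms:created>2012-12-01T09:00:00Z</dcterms:created>"
    "<dcterms:modified>2012-12-12T12:05:00Z</dcterms:modified>"
    "</cp:coreProperties>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("docProps/core.xml", SAMPLE_CORE)

props = read_core_properties(buffer.getvalue())
print(props)
```

    Nothing in the visible text of a document shows these fields, which is why they are easy to forget when a file is shared.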

    Metadata can also provide additional functionality to a document. For instance, Microsoft Word has an option to track all changes made to a document. When it is turned on, every change and the name of the person who made it are saved along with the document. This is useful for collaboration, because users can see what changes have been made and exactly who made them. It also makes it possible to undo those changes even after the file has been closed and reopened at a later time.
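
    One simple way to model change tracking is to store a snapshot of the text with each author's edit, so that any edit can be undone later. The class names below are hypothetical and the sketch illustrates the idea only; Word itself records changes differently, marking individual insertions and deletions inside the document.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    author: str
    text: str  # the full document text after this author's change


@dataclass
class TrackedDocument:
    revisions: list[Revision] = field(default_factory=list)

    def edit(self, author, new_text):
        """Record a change together with the person who made it."""
        self.revisions.append(Revision(author, new_text))

    @property
    def text(self):
        """The current text is the most recent snapshot."""
        return self.revisions[-1].text if self.revisions else ""

    def history(self):
        """Every saved version, oldest first, with its author."""
        return [(r.author, r.text) for r in self.revisions]

    def undo(self):
        """Discard the most recent change; works as long as the history is kept."""
        if self.revisions:
            self.revisions.pop()


doc = TrackedDocument()
doc.edit("Alice", "Draft contract.")
doc.edit("Bob", "Draft contract, revised.")
print(doc.history())  # [('Alice', 'Draft contract.'), ('Bob', 'Draft contract, revised.')]
doc.undo()            # reject Bob's change
print(doc.text)       # Draft contract.
```

    Because the history is saved with the document, anyone who receives the file can also read every earlier version, which is the risk discussed next.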

    Change tracking also makes it possible to see the history of a document, that is, the iterations it went through to reach its current state. This fact has been used in legal cases in which an electronic document was an important factor. In one such case, the SCO Group sued a company for violating an agreement. SCO submitted an electronic document detailing the suit, but the file still contained its change history. The changes revealed that, shortly before filing, SCO had been planning to sue a completely different company for a completely different reason: violating intellectual property rights. This suggested that SCO had simply been looking for a good target to sue and changed its mind not long before submitting the file. The incident probably happened because SCO was not aware of the extra data being saved in its document. Metadata can have adverse effects in other situations too, such as when government documents are released with earlier changes still recorded in the file. In such cases, it may be appropriate to delete the metadata before publishing the file. Some programs, including Microsoft Word, allow metadata to be deleted from a document.
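
    In the SCO case, earlier wording survived inside the file. In a .docx file, Word stores tracked changes as `w:del` (deletion) and `w:ins` (insertion) elements in the `word/document.xml` part, and deleted words remain in `w:delText` elements. The sketch below lists tracked deletions using Python's standard XML parser. The sample XML, company names, and author are hypothetical, and the function assumes the XML has already been read out of the .docx archive.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used in word/document.xml.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def tracked_deletions(document_xml):
    """List (author, deleted text) for each tracked deletion still in the file."""
    root = ET.fromstring(document_xml)
    found = []
    for deletion in root.iter(f"{{{W}}}del"):
        author = deletion.get(f"{{{W}}}author", "unknown")
        text = "".join(t.text or "" for t in deletion.iter(f"{{{W}}}delText"))
        found.append((author, text))
    return found


# Hypothetical fragment: "Company A" was deleted and replaced by "Company B".
SAMPLE = (
    f'<w:document xmlns:w="{W}"><w:body><w:p>'
    '<w:r><w:t xml:space="preserve">Complaint against </w:t></w:r>'
    '<w:del w:id="1" w:author="Drafter"><w:r><w:delText>Company A</w:delText></w:r></w:del>'
    '<w:ins w:id="2" w:author="Drafter"><w:r><w:t>Company B</w:t></w:r></w:ins>'
    "</w:p></w:body></w:document>"
)
print(tracked_deletions(SAMPLE))  # [('Drafter', 'Company A')]
```

    The displayed document reads "Complaint against Company B", yet the deleted name is still one parse away for anyone who opens the file's XML.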

    The steps below show how to delete metadata from a Word document. It is a fairly easy task:
  1. Open the Word file whose metadata you want to delete
  2. Click “File”
  3. To simply view the metadata, click on “Show All Properties”
  4. To delete the metadata, look under “Prepare for Sharing” and click on “Check for Issues”
  5. Click “Inspect Document”, make sure all boxes are checked, and click “Inspect”
  6. A window will return with different types of data found in the document. Click the appropriate “Remove All” buttons to delete those types of data from the document.
  7. Done
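
    The same kind of cleanup can also be scripted. The hypothetical helper below, `scrub_core_properties`, copies a .docx archive and blanks the author and last-editor fields in its `docProps/core.xml` part, using only Python's standard library. It is only a sketch: Word's Document Inspector also checks comments, tracked changes, hidden text, and other data that this function does not touch.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)  # keep the usual prefixes when re-serializing

PERSONAL_FIELDS = ["dc:creator", "cp:lastModifiedBy"]


def scrub_core_properties(docx_bytes):
    """Return a copy of the archive with author-identifying core properties blanked."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src, zipfile.ZipFile(out, "w") as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "docProps/core.xml":
                root = ET.fromstring(data)
                for path in PERSONAL_FIELDS:
                    for element in root.findall(path, NS):
                        element.text = ""
                data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
            dst.writestr(item, data)  # every other part is copied unchanged
    return out.getvalue()


# Demo on a hypothetical, minimal archive.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as zf:
    zf.writestr(
        "docProps/core.xml",
        f'<cp:coreProperties xmlns:cp="{NS["cp"]}" xmlns:dc="{NS["dc"]}">'
        "<dc:title>Quarterly Report</dc:title>"
        "<dc:creator>A. Author</dc:creator>"
        "</cp:coreProperties>",
    )

cleaned = scrub_core_properties(sample.getvalue())
with zipfile.ZipFile(io.BytesIO(cleaned)) as zf:
    root = ET.fromstring(zf.read("docProps/core.xml"))
cleaned_creator = root.find("dc:creator", NS).text  # None: the element is now empty
cleaned_title = root.find("dc:title", NS).text      # the title is kept
print(repr(cleaned_creator), repr(cleaned_title))
```
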
    In summary, metadata is extra data that is about the document. It can contain information about who made the document and when it was made, and also can record changes to the document. It is an important part of electronic documents, and adds additional functionality. Proper management of the metadata is an important consideration, however, especially for sensitive data found in documents such as government reports.


Security

    Document security is vital in many document management applications, and there are many different ways to prevent theft of important electronic documents. They range from placing a watermark on the document to using programs that create a secure location for exchanging files. These two methods are quite different, but they share the same goal: document protection. This goal is difficult to achieve, because soon after a new security measure is introduced, someone, somewhere, finds a hole to break through. That weakness can then be shared with others around the world, and documents can be stolen. Placing a watermark on a document can help limit some of the damage.

    There are two types of watermarking. The first type is visible and obvious. It is used mainly with still images or video, but it also has applications in document security. The images or documents can still be copied, but every copy carries the company's logo. This makes ownership hard to dispute, because the mark shows which party the material belongs to. The second type of watermarking is invisible to anyone who is not examining the underlying data: extra information is embedded within the content itself. The goal is for customers not to notice the watermark, while anyone trying to pirate the information does not know what to look for. Watermarks are useful because they identify a document as belonging to its owner, no matter where copies of it end up.
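
    One simple way to hide an invisible watermark in plain text, chosen here purely for illustration rather than taken from the sources above, is to encode an identifier as zero-width Unicode characters. The text looks unchanged on screen, but the identifier can be read back from any copy. The helper names and the use of U+200B and U+200C as the 0 and 1 bits are assumptions of this sketch; anyone who knows to look for these characters can strip them out.

```python
ZERO = "\u200b"  # zero-width space: encodes a 0 bit
ONE = "\u200c"   # zero-width non-joiner: encodes a 1 bit


def embed(text, mark):
    """Hide `mark` as invisible characters after the first word of `text`."""
    bits = "".join(f"{byte:08b}" for byte in mark.encode("utf-8"))
    hidden = "".join(ONE if bit == "1" else ZERO for bit in bits)
    first_space = text.find(" ")
    if first_space == -1:
        return text + hidden
    return text[:first_space] + hidden + text[first_space:]


def extract(text):
    """Read the hidden bits back out and decode them as UTF-8."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    usable = len(bits) - len(bits) % 8
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
    return data.decode("utf-8")


original = "Confidential quarterly figures."
marked = embed(original, "ID42")
print(marked == original)            # False: the strings differ, though they look the same
print(len(marked) - len(original))   # 32 hidden characters (4 bytes x 8 bits)
print(extract(marked))               # ID42
```

    If a leaked copy turns up, running `extract` on it reveals which recipient's copy it was, provided each recipient received a different identifier.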

    An option similar to watermarking is the digital signature, which is created outside the document itself. The process is easy to follow step by step. First, you place the document in a program that can send it, such as an e-mail client. Next, special software creates a hash, or mathematical summary, of the document. You then use your private key, whose matching public key is typically vouched for by a certificate authority, to encrypt the hash. The encrypted hash becomes your digital signature for the message. Because the hash depends on the message contents, each message gets a different signature, but the same key pair can be reused. The recipient uses your public key to decrypt the signature and compares the result with a hash computed from the document they received; if the two match, the document has not been altered since it was signed. This method is used often because of its simplicity and the security it provides.
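
    The sign-and-verify steps can be sketched with textbook RSA, using the tiny classroom key pair built from the primes 61 and 53. This is for illustration only: real signatures use keys thousands of bits long and a padding scheme such as RSA-PSS, supplied by a vetted cryptography library, and reducing the hash modulo the key size, as done here to fit the toy key, would be insecure.

```python
import hashlib

# Toy RSA key pair from two small primes (insecure; for illustration only).
P, Q = 61, 53
N = P * Q                           # public modulus: 3233
E = 17                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent: 2753 (Python 3.8+)


def digest(message: bytes) -> int:
    """Hash with SHA-256, then reduce modulo N so the toy key can handle it."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Sender: apply the private key to the hash of the message."""
    return pow(digest(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Recipient: apply the public key and compare with a freshly computed hash."""
    return pow(signature, E, N) == digest(message)


message = b"Quarterly figures attached."
signature = sign(message)
print(verify(message, signature))                        # True
print(verify(b"Quarterly figures altered.", signature))  # False unless the toy hashes collide mod N
```

    The same private key signs every message; what changes from message to message is the signature, because each message has a different hash.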

    Another option is to create a secure website that requires users to log in with a username and password. This creates a barrier between your documents and the outside world, although it is not impenetrable. Some programs provide this as well, including SharePoint 2010, created by Microsoft, and Sharefile. SharePoint 2010 provides a single, secure infrastructure for businesses to share documents and files over the Internet, allowing employees to work on projects from different locations while being protected from outside sources. Sharefile does much the same thing, supports uploads of up to 10 GB, which is quite large, and is compatible with both Windows and Mac.
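
    Behind such a login page, a server generally should not store the passwords themselves. A common approach, sketched below with Python's standard library, is to store a random salt and a slow, salted hash (PBKDF2 here) for each user, then compare hashes at login. The in-memory `users` table, the helper names, and the iteration count are assumptions for illustration; this is not how SharePoint or Sharefile are implemented.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # work factor: an assumed value, tune it for your hardware


def hash_password(password, salt=None):
    """Return (salt, key); a new random salt is generated if none is given."""
    salt = salt if salt is not None else os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def check_password(password, salt, expected_key):
    """Recompute the hash with the stored salt and compare in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, expected_key)


users = {}  # username -> (salt, key); a stand-in for a real user database


def register(username, password):
    users[username] = hash_password(password)


def login(username, password):
    record = users.get(username)
    return record is not None and check_password(password, *record)


register("alice", "correct horse battery staple")
print(login("alice", "correct horse battery staple"))  # True
print(login("alice", "guess"))                         # False
print(login("mallory", "anything"))                    # False: no such user
```

    Storing only salted hashes means that a stolen copy of the user table does not directly reveal anyone's password.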

    The U.S. government has put certain measures in place that set the bar for document protection. The HIPAA Security Rule establishes national standards to protect individuals' electronic personal health information that is created, received, used, or maintained by a covered entity. Its main purpose is to protect patients' confidential medical histories. To further protect documents, companies can apply “Information Rights Management” (IRM) policies. These policies encrypt documents and control who can open them and what those users may do with them, such as printing, copying, or forwarding, which further increases protection.

    There are many options for protecting documents from those who would use them in malicious ways. Security is a constantly evolving field, and it is one of the most important fields in computer science.


Physical Documents vs. Electronic Documents

Storage

    Physical document storage and electronic document storage are quite different, but they share some concepts. A main difference is that physical documents must have their characters printed on paper or whatever other medium they are on, whereas electronic documents are stored as binary numbers using character encoding systems. One similarity is in how quickly documents can be accessed: for either kind to be found quickly, documents must be filed and indexed in an orderly manner. This allows the person or the computer searching for a document to find it in a timely manner. Finally, the two are similar in concept in that each document takes up a certain amount of space. If you have many physical documents, you need enough storage space, such as filing cabinets, to hold all of them. For electronic documents, you need enough storage capacity, such as a bigger hard drive or another type of memory, to store a large number of documents.

Formatting

    There are a couple of formatting differences between electronic and physical documents. On physical documents, formatting is permanent: once words are printed on a piece of paper, their format cannot be changed. On electronic documents, you can change the format whenever you need to. Instead of fixing the format permanently on a single platform like paper, we can label paragraphs and let the platform decide how each paragraph should look, as web browsers do. The benefit of electronic documents is that you can adjust the format to appeal to different groups of people, unlike physical documents, whose format is chosen only by the original producer.

Metadata

    Metadata is rather unique to the electronic form of documents, and its very nature lends itself well to the digital form. With electronic documents, automatic documentation is essentially made while the file is being created. This data gets stored along with the file so that the two don’t get separated. It also allows easier revisions of documents in that it records a document’s history and can undo changes back to a desired version of the document quickly and easily. This is quite the opposite of how this sort of information would be stored for a physical document. Documentation would have to be done manually on a separate document so as not to clutter the original text. Certainly, information such as the author and date created would be fine on the original document, but the date of every revision and who did the revision would quickly clutter the paper. This would require a folder full of records on who changed what, when they did so, and what they changed. A copy of the document for each revision would be necessary to be able to go back to a desired version. All of this information would have to be handled manually so that the pieces don’t get separated or disorganized. Overall, metadata simply works better in the digital world rather than the physical.

Security

    Security is important for both physical and electronic documents. It is important to be able to keep private material, such as pictures and documents, from falling into a competitor's hands. Of the electronic methods, watermarking and digital signatures are the most similar to physical security measures. You can put a watermark behind the text of a document by decreasing its opacity. The physical counterparts of digital signatures include agreed-upon secret codes and messages encoded with machines similar to Enigma.


Works Cited

History

DataRecoveryGroup. "Data Storage History and Future." Data Recovery. DRG, 8 July 2011. Web. 12 Dec. 2012. <http://www.datarecoverygroup.com/articles/data-storage-history-and-future>.

Lilly, Paul. "Computer Data Storage Through the Ages -- From Punch Cards to Blu-Ray." Maximum PC. MAXIMUMPC, 2 Mar. 2009. Web. 12 Dec. 2012. <http://www.maximumpc.com/article/news/computer_data_storage_through_ages>.

Mozy. "The Past, Present, and Future of Data Storage." Mozy Online Backup. Decho Corp, 2011. Web. 12 Dec. 2012. <http://mozy.com/infographics/the-past-present-and-future-of-data-storage/>.

People of Wiki. "Ancient Egypt." Wikipedia. Wikimedia Foundation, 12 Dec. 2012. Web. 12 Dec. 2012. <http://en.wikipedia.org/wiki/Ancient_Egypt>.

People of Wiki. "Amarna Letters." Wikipedia. Wikimedia Foundation, 12 Dec. 2012. Web. 12 Dec. 2012. <http://en.wikipedia.org/wiki/Amarna_letters>.

People of Wiki. "Papyrus." Wikipedia. Wikimedia Foundation, 12 Dec. 2012. Web. 12 Dec. 2012. <http://en.wikipedia.org/wiki/Papyrus>.

Svardh, Mikael. "History of Data Storage." Gadgets. Fosfor Gadgets, 27 May 2006. Web. 12 Dec. 2012. <http://gadgets.fosfor.se/history-of-data-storage/>.


Storage 

Ishida, Richard. "Introducing Character Sets and Encodings." W3C Internationalization. W3C, 5 Jan. 2009. Web. 12 Dec. 2012. <http://www.w3.org/International/getting-started/characters/>.

"What is Document Storage?" Docuvantage.com. Document Advantage Corp. n.d. Web. 2012. <http://www.docuvantage.com/document-management-basic-tutorials/document-storage>.

Bradshaw, Ashley. "Electronic Document Storage." Scantronix. 2012. Web. 2012. <http://www.scantronix.net/document-scanning-blog/electronic-document-storage/>.

"Frequently Asked Questions - Basic Questions." Unicode.org. Unicode, Inc. 2012. Web. 2012. <http://www.unicode.org/faq/basic_q.html>.


Formatting

"What is A Markup Language?" Peterindia.net. n.d. Web. 2012. <http://www.peterindia.net/MarkupLanguageOverview.html>.

Coombs, James H., Steven J. DeRose, and Allen H. Renear. "Markup Systems and the Future of Scholarly Text Processing." The Cover Pages. n.d. Web. 2012. <http://xml.coverpages.org/coombs.html>.

"Markup Language." Wikipedia. Wikimedia Foundation. 2012. Web. 2012. <http://en.wikipedia.org/wiki/Markup_language>.


Metadata

Nagel, Scott. "Embedded Information in Electronic Documents: Why Metadata Matters." Law Practice Today. American Bar Association. July 2004. Web. 2012. <http://apps.americanbar.org/lpm/lpt/articles/ftr07044.html>.

Abelson, Hal, Ken Ledeen, and Harry Lewis. "Chapter 3." Blown to Bits: Your Life, Liberty, and Happiness After the Digital Explosion. Addison-Wesley, 2008. Web. 2012. <http://www.bitsbook.com/wp-content/uploads/2008/12/B2B_3.pdf>.

Hamilton, Nicole. "How to Find Metadata in Word Documents." Demand Media. Hearst Communications, Inc. n.d. Web. 2012. <http://smallbusiness.chron.com/metadata-word-documents-46186.html>.


Security

LinkedIn Corporation. 2007. Web. 2012. <http://www.linkedin.com/answers/product-management/distribution/PRM_DIS/20557-1666275>.

McCoy, Jim. "Securing Sharepoint Documents That You Take Offline." MSDN.com. 25 Aug. 2010. Web. 2012. <http://blogs.msdn.com/b/sharepoint_workspace_development_team/archive/2010/08/25/securing-sharepoint-documents-that-you-take-offline.aspx>.

References
http://sharepoint.microsoft.com/en-us/product/capabilities/Pages/default.aspx
http://www.sharefile.com/industries/business/features.aspx
http://ssae16.com/
http://www.hhs.gov/ocr/privacy/hipaa/administrative/securityrule/index.html
http://www.locklizard.com/document-watermarking.htm
Alex Burch,
Dec 12, 2012, 12:05 PM