
Metadata: Is It Really Worth It?

This article originally appeared in Law Technology Today published by the ABA on Jan 12, 2016. I co-authored this piece with Joel Henry, PhD, JD.

Many attorneys consider metadata a must-have piece of e-discovery, but whether this “data about data” provides useful information in most cases tends to be misunderstood. The already startling cost of e-discovery is keeping pace with evolving technology, which ensures that vast amounts of metadata exist. Certainly metadata can contain useful, even case-critical, information, but just because it exists doesn’t mean that you need it to support your case. In the end, you should demand or agree to produce costly metadata only when that data is truly necessary to support a successful and cost-effective argument for your case.

Metadata Simplified

Computer applications typically translate user input into the information you see on a computer screen. A Microsoft Word document is a simple example: what you see on the screen is the result of a user inputting characters into the application, usually through a keyboard or stylus. However, the application also uses background data that is not visually presented; this is metadata. Depending on the file, this metadata can contain information about who created the file, on what computer, and at what time, as well as who last updated or printed the file. In our Word example, other data includes detailed “track changes” information, comment fields, and hidden text fields. With emails, metadata can contain information about when the email was created, when it was sent, the recipients it was sent to (including all blind carbon copy recipients), and even when the email was delivered. Metadata is not a single, uniform category: the examples above fall into different types of metadata, so it is important to understand what those types are. The Southern District of New York examined metadata thoroughly in Aguilar v. Immigration & Customs Enforcement Division, looking at case law, the Federal Rules of Civil Procedure, and the Sedona Principles. From this, the Court defined three types of metadata: substantive, system, and embedded.

Substantive metadata is commonly referred to as application metadata and is “created as a function of the application software used to create the document or file.” In other words, this metadata includes features like the history of changes made to a document, paragraph spacing information, and how fonts are displayed. System metadata is the type of information most people think of when they hear the word “metadata.” This type of data includes file names, creation date, last edited date, the user who last saved the file, etc. Lastly, embedded metadata captures the hidden information within a document such as spreadsheet formulas, hyperlinks, geographical coordinates in a photo, and hidden columns.
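For readers curious what system metadata looks like in practice, here is a minimal Python sketch that reads a few of the fields a file system keeps about a file. The function name `system_metadata` and the sample file `memo.txt` are hypothetical and used only for illustration. The sketch covers file-system fields only; application-level details such as the user who last saved a document are stored inside the file itself and are not shown here.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def system_metadata(path):
    """Return a few system-metadata fields the file system keeps for a file."""
    info = Path(path).stat()
    return {
        "file_name": Path(path).name,
        "size_bytes": info.st_size,
        # Time the file's contents were last modified, as a UTC timestamp.
        # (Creation time is not reported consistently across operating
        # systems, so it is left out of this sketch.)
        "last_modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    }


if __name__ == "__main__":
    # Create a throwaway file so the example is self-contained.
    with tempfile.TemporaryDirectory() as folder:
        sample = Path(folder) / "memo.txt"
        sample.write_text("Draft memo")
        for field, value in system_metadata(sample).items():
            print(f"{field}: {value}")
```

Note that none of these values appear anywhere in the visible text of the document, which is why they are lost when a file is converted to a different format for production.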

It is important to understand these types of metadata so you know what to ask for during a meet-and-confer conference. Simply asking for “metadata” might result in any one of these three being produced. This tactic can be detrimental, considering that courts often deny subsequent requests for metadata under Fed. R. Civ. P. 34(b)(2)(E)(iii), which provides that a party need not produce the same electronically stored information in more than one form. Being specific will ensure you get the metadata you need. Likewise, if you are the producing party, having a specific type of metadata to go after will help reduce the cost and time spent finding that data. Practice tip: come to an agreement with opposing counsel during the meet-and-confer conference to avoid costly battles over whether metadata production is required or warranted.

Federal courts across the country have ruled differently on when and whether metadata production is an absolute requirement. Professor James Moore has noted that “[t]he courts appear to be moving toward a general presumption against the production of metadata.” This presumption against production reflects the fact that metadata often fails to provide truly relevant information; in other words, metadata rarely changes what is already known about the case or about a set of documents or emails. On the other hand, when parties have agreed to produce metadata, a court is likely to compel production of metadata “unless it finds such production to be overly burdensome or costly.”

Costs of Metadata

The production of metadata carries two kinds of cost: real and practical. The real costs are the actual expenses imposed on either the producing or requesting party to obtain the data. In a very simple case, where the files are standard types like spreadsheets and word processing documents, producing the native file will include the substantive, system, and embedded metadata (assuming the native files were properly preserved in the first place). If you follow the suggestion of one of our previous articles and produce in native format, you may be wondering where the additional cost comes from. There are two situations in which the cost of metadata production will always present a significant hurdle: first, when the file is in an industry-specific or proprietary format, and second, when the original file is buried in backup storage (offline disks or drives).

Files in industry-specific or proprietary formats present a number of difficulties. Companies may use systems that are either uncommon to the general public (industry-specific systems) or completely unavailable to the public (proprietary systems). Delivering the native file is then useless to the receiving party because the files cannot be opened, viewed, or printed. Imagine if someone gave you an eight-track tape today: it would be completely unusable to the average person. The Federal Rules of Civil Procedure provide an avenue to ensure that this type of electronically stored information (“ESI”) is produced “in a reasonably usable form.” Abiding by this rule, in the case of proprietary data, often results in documents being provided as PDF or TIFF files. While this ensures the receiving party will be able to read the document, it makes collecting the metadata a headache, since the original file’s metadata is not carried over into the PDF or TIFF file. Because the requestor cannot read the native file, the producing party must export the metadata from each file manually, a process that is time-intensive, costly, complex, and sometimes error-prone.

A file buried in an image backup (a bit-by-bit duplication of a hard drive or disk) can also present significant production hurdles. A file can end up existing only within an image backup after data has been purged from a local system but saved to a backup medium; this is what IT personnel might do when you request a legal preservation hold. Even more common is the situation where data is still within the company’s data retention period, but the local user updates or deletes the file; here the only copy of the file may exist on a weekly backup tape. If you know the document exists, but must resort to digging it out of a backup, there is a tremendous cost associated with locating and extracting that file from the larger backup image. This is partly a consequence of keeping backups economically feasible: to control cost, files are compressed when stored on backup media (as more space costs more money) and must be expanded upon retrieval, which makes it exceedingly difficult to sift through this data to find the desired file. These “archiving systems [are] described… as ‘business efficient, not litigation efficient,’ primarily designed for emergencies rather than searching and retrieving specific files, a process characterized as ‘neither fast nor cheap.’” A 2012 presentation by the Association for Information and Image Management estimated that while “it costs about 20 cents a day to buy one gigabyte of storage… it costs around $3,500 to review that same gigabyte of storage.” So if your file is buried within a 30-gigabyte backup tape, it could cost $105,000 (30 × $3,500) just to find what you are looking for.
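The arithmetic behind that estimate can be sketched in a few lines of Python. This is only a rough illustration: it assumes review cost scales linearly with the size of the backup and that the entire tape must be reviewed, and the names `REVIEW_COST_PER_GB` and `estimated_review_cost` are hypothetical.

```python
# Dollars per gigabyte reviewed, taken from the AIIM estimate quoted above.
REVIEW_COST_PER_GB = 3_500


def estimated_review_cost(size_gb, cost_per_gb=REVIEW_COST_PER_GB):
    """Estimate the cost of reviewing `size_gb` gigabytes of stored data."""
    return size_gb * cost_per_gb


print(f"${estimated_review_cost(30):,}")  # prints "$105,000"
```

Swapping in the size of your own backup set, or a per-gigabyte rate quoted by your vendor, gives a quick first estimate to bring to a meet-and-confer discussion.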

Practical costs are easier to understand. When producing ESI you likely have no idea how much metadata exists or, even scarier, what it contains. One example of a practical cost is the additional privilege and confidentiality review of metadata, which requires knowledgeable attorneys who understand how metadata is presented. These associated aspects of production (e.g., review before release and creation of a privilege log) increase the practical costs of metadata production.

The actual cost of metadata production varies case by case, making it nearly impossible to tell an attorney what they will save by excluding metadata from production. In one example, the defendant had already spent $3 million on e-discovery production before being asked for the metadata of the documents already produced. The additional cost to produce the metadata was calculated to be about $500,000, or one-sixth of what had already been spent on e-discovery. While this figure might have been lower if metadata had been requested at the outset, it demonstrates the significant cost of providing metadata.

Conclusion

Whether or not metadata should be included in an e-discovery request depends on the facts and circumstances of the case. As a practical matter, you should carefully consider whether any of the three types of metadata might provide information key to your case. If the authenticity of a file is unlikely to be questioned, or all of your ESI consists of Microsoft Word files created on a known set of computers, metadata may not be warranted. However, if you have a complex case involving financial records, with spreadsheets and databases, metadata will likely help you understand what those documents mean and how the information, values, and calculations were derived. Similarly, if your case depends on tracking which employee emailed certain documents to whom, when, and why, metadata may be useful in answering these key questions.

So when is it worth exploring whether metadata justifies the cost and effort? The general rule is that the more complex and interactive the computer program, the more likely metadata will actually help you understand what the program did in relation to each user and each piece of data or file. More often than not, metadata from simple applications and file types, such as word processors, presentation software, and scanned files, will yield very little value and will waste enormous amounts of time and money for both producing and receiving parties.

[1] 255 F.R.D. 350 (S.D.N.Y. 2008).
[2] Id. at 354 (citing The Sedona Principles, Second Edition: Best Practices Recommendations and Principles for Addressing Electronic Document Production).
[3] 7 J. Moore, Moore’s Federal Practice § 34.12[3][c], at 34-48 to 34-49 (3d ed. 2012).
[4] Law Technology Today, Native Format: You and Opposing Counsel’s Best Friend, lawtechnologytoday.org, http://www.lawtechnologytoday.org/2015/10/native-format (last visited Dec. 20, 2015).
[5] Fed. R. Civ. P. 34(b)(2)(E).
[6] RAND Corp., Where the Money Goes: Understanding Litigant Expenditures for Producing Electronic Discovery, rand.org, http://www.rand.org/content/dam/rand/pubs/monographs/2012/RAND_MG1208.pdf (last visited Dec. 20, 2015).
[7] Ralph Losey, Metadata and E-Discovery, floridalawfirm.com, http://floridalawfirm.com/West.Metadata.Chapter.pdf (last visited Dec. 20, 2015).
