Backfiling Challenges in Implementation of Contract Life Cycle Management Solution

One of the major work streams of a contract management repository initial implementation involves the processing of your existing contracts. These existing contracts will often be referred to as your backfile, or legacy contracts. How you decide to handle them will have a considerable bearing on both the duration and cost of your implementation.

Why is the backfile important? Many experts feel that most of the return on investment (ROI) of an Enterprise Contract Lifecycle Management (ECLM) system is derived from repository functionality:

  • Being able to find existing contracts, by searching either contract attributes or a full-text search of the contract or attachment language
  • Being able to report on your contracts at the external party, contract type, contract value, department, company, region, or enterprise level
  • Being proactively notified of the impending expiration of a contract with sufficient time to take the appropriate action, such as renewing the agreement or finding an alternative supplier
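As an illustration of the expiration notification described in the last bullet, the sketch below selects contracts whose end date falls inside a notice window. The record layout, the sample data, and the 90-day default are assumptions made for the example, not features of any particular product.

```python
from datetime import date, timedelta

def expiring_within(contracts, today, notice_days=90):
    """Return contracts whose end date falls inside the notice window.

    Contracts without an end date (for example, evergreen agreements)
    are skipped here; a real system would handle them separately.
    """
    cutoff = today + timedelta(days=notice_days)
    return [
        c for c in contracts
        if c["end_date"] is not None and today <= c["end_date"] <= cutoff
    ]

# Hypothetical sample records
contracts = [
    {"id": "C-001", "party": "Acme Supply", "end_date": date(2024, 3, 15)},
    {"id": "C-002", "party": "Globex", "end_date": date(2025, 1, 1)},
    {"id": "C-003", "party": "Initech", "end_date": None},  # evergreen
]

due = expiring_within(contracts, today=date(2024, 1, 10), notice_days=90)
print([c["id"] for c in due])  # ['C-001']
```

A notification like this is only as reliable as the end dates in the repository, which is why the accuracy of backfile metadata matters.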

Legacy contracts exist in hardcopy, softcopy, or both formats. Part of the planning phase of the project should involve determining which sources of legacy contracts will be used to load the contract repository. This decision will be one facet of the project scope.

Picture a file folder in a filing cabinet that contains all the contracts between your organization and one external party. Part of the migration process involves defining and implementing a process and work instructions to migrate those paper documents into the new electronic contract repository. 

Are your hardcopy contracts located in a central file room, or dispersed throughout the organization, possibly even in different geographic regions? If dispersed, which ones are in scope? What other documents are present in the folders containing the contracts? Should these documents be considered attachments to the contracts? Do you have a process to identify which attachments go with which contracts? What will happen to the paper contracts and attachments after they are scanned into softcopy format? Should they be returned to the file room, shipped off to long-term storage, or destroyed? Consult the legal and compliance departments before making this decision.


How will amendments be handled? There are several approaches. The appropriate approach for your organization will depend on the capabilities of the technology solution and the available project resources. One common approach is to treat the amendments as file attachments to the original contract. But how will the metadata of the base contract be affected by these amendments? Amendments often alter critical contract attributes, such as the contract expiration date. Failing to update the base metadata could result in inaccurate reporting in the new system.

A second approach would be to treat each amendment as a separate contract record.  Many contract management systems include an ability to link or relate contract records to each other.  But linking an amendment to its base contract does not automatically update the base contract metadata.  This leaves us with the same issue as the file attachment approach.

A third option is to treat the amendments as actual amendments in the system. These are special contract records that, when created, automatically alter the relevant base contract metadata. This approach usually involves loading the base contract and then loading each amendment into the system in the appropriate order. It is the most elegant approach, and the most resource-intensive to implement, but it results in the most accurate reports and notifications from the repository.
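The third approach can be sketched as follows. This is a minimal illustration, assuming each amendment carries an effective date and a set of changed fields; the class and field names are hypothetical and do not come from any specific contract management system.

```python
from dataclasses import dataclass, field, replace
from datetime import date

@dataclass
class Amendment:
    effective_date: date
    changes: dict  # metadata field name -> new value

@dataclass
class Contract:
    party: str
    end_date: date
    amount: float
    amendments: list = field(default_factory=list)

def apply_amendments(base, amendments):
    """Return a copy of the base contract with amendments applied in date order."""
    current = replace(base, amendments=[])  # copy, so the base record is untouched
    for amendment in sorted(amendments, key=lambda a: a.effective_date):
        for name, value in amendment.changes.items():
            if not hasattr(current, name):
                raise KeyError(f"Unknown metadata field: {name}")
            setattr(current, name, value)
        current.amendments.append(amendment)
    return current

base = Contract(party="Acme Supply", end_date=date(2022, 12, 31), amount=100_000.0)
loaded_out_of_order = [
    Amendment(date(2023, 3, 1), {"end_date": date(2024, 12, 31), "amount": 150_000.0}),
    Amendment(date(2022, 6, 1), {"end_date": date(2023, 12, 31)}),
]

effective = apply_amendments(base, loaded_out_of_order)
print(effective.end_date, effective.amount)  # 2024-12-31 150000.0
```

Sorting by effective date is what makes the result independent of the order in which the scanned amendments happen to be loaded. If the latest amendment were applied first, an earlier one would overwrite the correct expiration date.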


Document scanning is a multi-step process that involves separating and preparing paper documents into organized groups such as contracts, attachments, and other documents.  Staples, paper clips, and binder clips must be removed before the document can be physically scanned.  The scanner or scanning software must be set to correctly handle single or double-sided documents. Images of each document page are created by the scanner.  These images are then converted into a format best suited for an electronic repository. 

The most common file format is the Portable Document Format, or PDF file. PDF files can be image-only or text-searchable. Optical Character Recognition (OCR) software is used to make a text-searchable PDF. It does this by analyzing the pixels of an image to identify font characters. These characters are then stored as hidden text in the PDF. It’s important to note that optical character recognition is rarely 100% accurate. Faxes, photocopies, and unusual fonts can degrade OCR accuracy.

Think of how many contracts have been executed by fax. If it’s hard for you to read the image, the OCR software will be equally challenged. A document faxed twice can reduce OCR accuracy to the 40% range, and sometimes worse. Why does OCR accuracy matter? Because a full-text search of the repository may miss relevant documents if the hidden text doesn’t match the actual language.
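The effect on search can be shown with a small sketch. It compares a hand-verified transcription with hypothetical OCR output and reports the share of words that survived intact. The word-level measure and the sample strings are assumptions chosen for illustration; OCR tools typically report accuracy in their own ways.

```python
def word_accuracy(truth, ocr_text):
    """Fraction of words in the verified text that appear intact in the OCR text."""
    truth_words = truth.lower().split()
    ocr_words = set(ocr_text.lower().split())
    if not truth_words:
        return 1.0
    hits = sum(1 for word in truth_words if word in ocr_words)
    return hits / len(truth_words)

# Hypothetical example: "m" misread as "rn", a common OCR confusion
truth = "this agreement shall terminate on december 31"
ocr_text = "this agreernent shall terrninate on decernber 31"

print(round(word_accuracy(truth, ocr_text), 2))  # 0.57
print("terminate" in ocr_text)                   # False: an exact search misses it
```

Here the page image still looks correct to a reader, yet a search for "terminate" would not return the document because the hidden text says something else.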

The next step in the process involves gathering contract metadata. Metadata is data that describes other data: a concise set of values that summarizes a much larger body of content. In this case, it describes the attributes of the newly scanned contract. Typical metadata items include:

  • External party name
  • External party address
  • Contract start date
  • Contract end date
  • Contract execution date
  • Termination type: end date, auto-renew, or evergreen
  • Type of contract: CDA, MSA, SOW, License, Amendment, and so on
  • Contract amount

Some types of metadata, such as External Party Name, apply to all contracts. But not all contracts will have an associated monetary amount. There are multiple ways to extract metadata, such as indexing and abstracting. Indexing works well for highly structured documents where the data fields can be easily identified by software (think standard forms). Most contract documents, however, will require abstraction. This process involves a human reading the PDF to identify the required fields. Detailed playbooks are created to direct the personnel conducting the abstraction: for example, where to find each field for a given contract type. Playbooks must also include guidance on how to handle missing data.
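One way to picture an abstracted record, along with a playbook-style check for missing data, is sketched below. The field names mirror the list above, but which fields count as required, and the rule that an end date is needed only for the "end date" termination type, are assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class ContractMetadata:
    party_name: Optional[str] = None
    party_address: Optional[str] = None
    start_date: Optional[date] = None
    end_date: Optional[date] = None
    execution_date: Optional[date] = None
    termination_type: Optional[str] = None  # "end date", "auto-renew", or "evergreen"
    contract_type: Optional[str] = None     # e.g. "CDA", "MSA", "SOW"
    amount: Optional[float] = None          # not every contract has one

# Assumed playbook rule: these fields must always be captured
REQUIRED = ("party_name", "start_date", "termination_type", "contract_type")

def missing_fields(record):
    """List the required fields an abstractor still needs to fill in."""
    missing = [name for name in REQUIRED if getattr(record, name) in (None, "")]
    # Assumed rule: an end date is mandatory only for fixed-term contracts
    if record.termination_type == "end date" and record.end_date is None:
        missing.append("end_date")
    return missing

record = ContractMetadata(
    party_name="Acme Supply",
    start_date=date(2020, 1, 1),
    termination_type="end date",
    contract_type="MSA",
)
print(missing_fields(record))  # ['end_date']
```

Records that come back with a non-empty list can be routed to a reviewer instead of being loaded with blank values.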

In recent years, another abstraction option has appeared. Several vendors have introduced software that utilizes artificial intelligence to read electronic contract documents and capture relevant metadata.  This software also can be used to scan file shares on your network to identify contracts.  This is invaluable functionality, especially in a diverse organization that may not know where all their contracts reside.

Quality Assurance and Control

Quality Assurance (QA) can be thought of as the planned and systematic activities implemented in a system so that quality requirements will be fulfilled. Quality requirements should be defined in the project Quality or Test Plan.  A project’s QA work stream can then define processes necessary to fulfill quality requirements. Quality Control (QC) refers to the observation techniques and activities used to fulfill requirements for quality.  Thus, the QA plan will define a process to check the abstracted metadata for errors. QC is the execution of the process to check the data for validity.

QC processes can run the gamut from simple to extremely elaborate (and expensive). Will the actual text of text-searchable PDFs need to be checked by human eyes? This can be a laborious process, but it might be necessary if source documents were low quality and full-text repository searching is a business requirement of the CLM system.

QC activities are often performed by both the project team and the business. Verifying that the field values are correct should be part of the abstraction process. The business should also perform its own verification of the data before it is loaded into the repository.

How much of the data should be verified by the business?  The answer is dependent upon quality requirements as documented in the Quality/Test Plan.  Formulas exist that define recommended QC sample size given acceptable error rates for a given record count.  This decision and the associated calculations to determine sample size should be documented in the Quality/Test Plan. 
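As one example of such a formula, the sketch below uses Cochran's sample size formula with a finite population correction. The choice of this particular formula, the 95% confidence level, the 5% margin of error, and the conservative 50% expected error proportion are all assumptions for illustration; the Quality/Test Plan should state whichever method and parameters the project actually adopts.

```python
import math
from statistics import NormalDist

def qc_sample_size(record_count, margin=0.05, confidence=0.95, expected_rate=0.5):
    """Cochran's formula with finite population correction.

    record_count  : number of records in the backfile (N)
    margin        : acceptable margin of error (e)
    confidence    : confidence level used to derive the z-score
    expected_rate : anticipated error proportion (p); 0.5 is the most conservative
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = (z ** 2) * expected_rate * (1 - expected_rate) / margin ** 2
    n = n0 / (1 + (n0 - 1) / record_count)
    return math.ceil(n)

print(qc_sample_size(10_000))  # 370
print(qc_sample_size(500))     # 218
```

Note how the sample grows far more slowly than the record count: a backfile twenty times larger needs fewer than twice as many records checked under these assumptions.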

The data should then be rationalized. But why rationalize? Consider three scanned contracts whose external parties are: IBM; International Business Machines; and International Business Machines Corporation. Who is the actual external party? Similar issues arise with addresses. Many companies have implemented Master Data Management (MDM) solutions. It can be beneficial to run the metadata through the MDM system for rationalization before it is loaded into the ECLM system. The backfile work stream should elaborate on the desired data rationalization approach, and the overall project should address master data management. This includes defining how external party names, addresses, and other data will be loaded and maintained in the system.
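The external party example can be sketched in a few lines. The alias table below is a hypothetical stand-in for an MDM lookup, and the list of legal suffixes is an assumption; a real rationalization pass would draw both from the organization's master data.

```python
import re

# Assumed list of legal-form suffixes to ignore when matching names
SUFFIXES = {"corporation", "corp", "inc", "incorporated", "llc", "ltd", "co", "company"}

def normalize_key(name):
    """Lowercase, strip punctuation, and drop trailing legal suffixes."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while words and words[-1] in SUFFIXES:
        words.pop()
    return " ".join(words)

def rationalize(name, aliases):
    """Return the canonical party name, or None if the name needs human review."""
    return aliases.get(normalize_key(name))

# Hypothetical alias table standing in for an MDM system
CANONICAL = "International Business Machines Corporation"
aliases = {
    "ibm": CANONICAL,
    "international business machines": CANONICAL,
}

for raw in ["IBM", "International Business Machines",
            "International Business Machines Corporation", "Acme Supply Co."]:
    print(raw, "->", rationalize(raw, aliases))
```

All three IBM variants resolve to the same canonical name, while the unknown party returns None so that it can be reviewed before loading.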


We’ve talked about why your backfile is an important work stream of your contract management system initial implementation and covered some of the major topics to be addressed in the planning, analysis, and design phases of the project.  Consider working with an implementation partner who has experience addressing these and other factors of a successful Enterprise Contract Lifecycle Management system implementation.