Estimating Software Size

Cost estimators begin a software estimate by evaluating the sizes of the deliverables to be developed. Software programs that are complex, perform many functions, have safety-of-life requirements, and require high reliability are typically larger than simpler programs.

Estimating software size requires detailed knowledge about a program’s functions, including their scope, complexity, and interactions. Several methods exist to measure software size. These include the Common Software Measurement International Consortium (COSMIC) Functional Sizing Method, function point analysis, object point analysis, source lines of code (SLOC), and use cases, among others. Although the methods differ from one another and may apply to different types of software development, they are all tools for arriving at a numerical estimate that characterizes the size of the software code to be developed.
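
To illustrate the mechanics of one functional sizing approach, the following sketch computes unadjusted function points from counts of a program’s elementary functions. The counts are hypothetical, and the weights shown are the commonly published IFPUG average-complexity values; an actual count would classify each function by complexity and may apply a value adjustment factor.

# Hypothetical illustration of unadjusted function points (UFP).
# Weights are the commonly cited IFPUG average-complexity values;
# the counts below are notional, not from a real program.
AVERAGE_WEIGHTS = {
    "external_inputs": 4,
    "external_outputs": 5,
    "external_inquiries": 4,
    "internal_logical_files": 10,
    "external_interface_files": 7,
}

counts = {
    "external_inputs": 24,
    "external_outputs": 18,
    "external_inquiries": 10,
    "internal_logical_files": 6,
    "external_interface_files": 4,
}

ufp = sum(weight * counts[name] for name, weight in AVERAGE_WEIGHTS.items())
print(f"Unadjusted function points: {ufp}")  # 96 + 90 + 40 + 60 + 28 = 314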

Each software sizing method draws on different inputs, and some approaches may be more useful than others depending on where a program is in its development phase. Some methods are designed to estimate future work based on known requirements without regard to the software’s design, which may be useful early in development. Others may require knowledge of the software’s planned architecture to estimate work based on expected interactions among the user, hardware, and software components. Other methods consider modifications needed to standard software to meet user needs. In all cases, the applicability of a particular sizing method will depend on the type of software under development and on the availability of data on comparable efforts.

SLOC has been used widely for years as a software sizing metric, and many organizations have databases of historical SLOC counts for various types of completed programs, making it the predominant method for sizing software. If the decision is made to use historical SLOC to estimate software size, the cost estimator needs to make sure that the program being estimated is similar in size, language, and application to the historical data. In addition, the cost estimator must ensure that SLOC is reported consistently.

Consistency in software sizing is critical. Reported and estimated lines of code can vary significantly depending on the software language used and on the methodology for counting SLOC. Moreover, many sizing methods lack a standards body that controls their counting rules. In the absence of a uniform counting convention, different users may adopt one of the published definitions of the basic approach and modify the counting rules internally to suit their purposes. This can result in dissimilar counts across organizations, leading to problems with accuracy and reproducibility. The test of a reliable sizing method is that two individuals working independently can apply the same rules to an identical problem and arrive at comparable results.
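
The sensitivity of SLOC counts to the counting convention can be shown with a minimal sketch. The two rules below, physical lines versus non-blank, non-comment logical lines, are deliberate simplifications for illustration, not any organization’s official counting standard.

def physical_sloc(source: str) -> int:
    """Count every physical line, including blanks and comments."""
    return len(source.splitlines())

def logical_sloc(source: str) -> int:
    """Count non-blank lines that are not full-line comments.
    Real counting standards also handle block comments, continuation
    lines, and multiple statements per line; this is simplified."""
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(("#", "//")):
            count += 1
    return count

sample = """// header comment
int total = 0;

// accumulate
for (int i = 0; i < n; i++) {
    total += values[i];
}
"""
print(physical_sloc(sample), logical_sloc(sample))  # prints: 7 4

The same source yields a count of 7 under one rule and 4 under the other, which is why the counting convention must be documented and applied consistently.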

The following questions highlight issues to bear in mind when considering methodologies to estimate software size:

  • Are the rules for the sizing technique rigorously defined in a widely accepted format?

  • Are the rules under the control of a recognized, independent controlling body and updated from time to time?

  • Does the controlling body certify the competency (and, hence, consistency) of counters who use its rules?

  • Are statistical data available to support claims for the consistency of counting by certified counters?

  • How long have the rules been stable?

  • Are the source documents or artifacts for estimating the size metric available to the cost estimators?

After choosing a software estimation method, estimators should consider whether the software will be newly written or will be based on modifying existing code. Existing code may be reused (used verbatim with no modifications), adapted (existing code that must be redesigned, possibly converted, and possibly modified further), or auto-generated. Although modifying existing software may save time compared with writing fresh code, incorporating it into a new program usually requires more effort than anticipated. For instance, the time it takes to add reused code depends on whether significant integration, reverse engineering, and additional design, validation, and testing are required. If the effort to incorporate reused software is too large, it may be more cost effective to write new code.
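
A common way to fold new, reused, and adapted code into a single size figure is an equivalent SLOC (ESLOC) calculation. The sketch below uses a simplified COCOMO II-style adaptation adjustment factor with its published default weights (0.4 for design modified, 0.3 for code modified, 0.3 for integration required); the sizes and percentages are hypothetical, and the full model adds further terms (for example, software understanding and assessment effort) that are omitted here.

def equivalent_sloc(new_sloc, adapted_sloc, pct_design_mod, pct_code_mod, pct_integration):
    """Equivalent SLOC using a simplified COCOMO II-style adaptation
    adjustment factor: AAF = 0.4*DM + 0.3*CM + 0.3*IM (all percentages).
    Terms for software understanding, unfamiliarity, and assessment
    are omitted for clarity."""
    aaf = 0.4 * pct_design_mod + 0.3 * pct_code_mod + 0.3 * pct_integration
    return new_sloc + adapted_sloc * aaf / 100.0

# Hypothetical program: 40,000 new SLOC plus 60,000 adapted SLOC that needs
# 20% redesign, 30% recoding, and 50% re-integration and retest.
print(equivalent_sloc(40_000, 60_000, 20, 30, 50))  # 40,000 + 60,000 * 0.32 = 59,200.0

The point of the calculation is that adapted code is not free: even modest modification percentages can add tens of thousands of equivalent lines to the size estimate.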

The Office of Management and Budget (OMB) issued a memorandum[1] to ensure that new custom-developed source code is made available for reuse across the federal government. To facilitate this transition, OMB identified four supporting requirements for adopting open source software. Among them, OMB highlighted the need to secure data rights that permit government-wide reuse. In addition, it is essential to document the source code to facilitate its use and adoption.

Using historical data to test expectations about how much code will be reused is a best practice that can mitigate the potential for cost overruns. When assessing the potential for software reuse, analysts should consider the following (a simple consistency check is sketched after this list):

  • Code reused from other programs typically requires more adaptation effort than code carried over from a previous release of the same program.

  • Upgrade projects tend to achieve higher rates of successful code reuse than new projects, likely because the team is more familiar with the software structure, compilation, and capabilities designed into the existing program.

  • Software that was designed for reuse from the outset requires less effort to integrate into a new program. Code designed for reuse has modular attributes that comply with open system architecture guidelines, making integration more efficient and increasing the chance of successful reuse.

  • Even seemingly simple efforts, such as redeploying unmodified software onto a new or upgraded platform, often face complications from unforeseen differences, including how the new hardware executes the software and how the software integrates with other systems.
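
As a simple application of that best practice, the sketch below flags a planned reuse assumption that falls outside the range realized on analogous historical programs. The historical fractions and the planned value are placeholders; real values would come from the estimating organization’s own database.

# Hypothetical check of a planned reuse assumption against historical analogies.
# The fractions below are placeholders, not data from real programs.
historical_reuse_fractions = [0.35, 0.42, 0.28, 0.50, 0.38]  # realized reuse on similar efforts
planned_reuse_fraction = 0.70                                # current program's assumption

low, high = min(historical_reuse_fractions), max(historical_reuse_fractions)
if not low <= planned_reuse_fraction <= high:
    print(f"Planned reuse of {planned_reuse_fraction:.0%} is outside the historical "
          f"range of {low:.0%}-{high:.0%}; the assumption needs further justification.")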

When possible, cost estimators should cross-check the estimated software size using two different methodologies. Developing software size estimates with several approaches that are compared and converge toward a consensus is a best practice. In addition, the estimate should reflect the expected growth in software size from requirements growth or underestimation (that is, optimism). This growth adjustment should be made before performing risk and uncertainty analysis. Moreover, the size estimate should be updated as actual data become available so that growth can be monitored.
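
The cross-check and growth adjustment described above can be expressed in a few lines. The two size estimates and the growth factor below are hypothetical; in practice, the factor would be derived from the organization’s historical data on size growth at the corresponding milestone.

# Hypothetical cross-check of two sizing methods and a growth adjustment.
sloc_based_estimate = 118_000      # size from a SLOC analogy, in SLOC
function_point_estimate = 104_000  # size from function points converted to SLOC

divergence = abs(sloc_based_estimate - function_point_estimate) / max(
    sloc_based_estimate, function_point_estimate)
print(f"The two methods differ by {divergence:.0%}")  # a large gap means assumptions need reconciling

baseline_size = (sloc_based_estimate + function_point_estimate) / 2
historical_growth_factor = 1.25    # placeholder: sizes historically grew ~25% from this point
adjusted_size = baseline_size * historical_growth_factor
print(f"Growth-adjusted size: {adjusted_size:,.0f} SLOC")  # feeds risk and uncertainty analysis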


  1. Office of Management and Budget, Executive Office of the President, M-16-21, Federal Source Code Policy: Achieving Efficiency, Transparency, and Innovation through Reusable and Open Source Software (Washington, D.C.: August 2016).