This paper reviews the comparative strengths and weaknesses of LOCC and CodeCount, two tools for measuring the size of software source code. The next two sections give brief overviews of CodeCount and LOCC, and the final section presents the perceived strengths and weaknesses of the two tools. A caveat: although I have tried to be objective in this review, I have in-depth knowledge of LOCC and only very superficial knowledge of CodeCount. Comments and corrections are welcome.
Classic CodeCount is a relatively old and mature project. The first work on Classic CodeCount began in the late 1980s. CodeCount reflects its early origins in three important ways.
Here is an example of the kind of output produced by CodeCount:
Temporary Project Name
The Totals
  Total   Blank |     Comments     |  Compiler   Data   Exec. |   Number |        File  SLOC
  Lines   Lines |  Whole  Embedded |   Direct.  Decl.  Instr. | of Files |  SLOC  Type  Definition
--------------------------------------------------------------------------------------------------
    399      61 |     94         5 |        11     37     196 |        5 |   244  CODE  Physical
    399      61 |     94         5 |        11     13     135 |        5 |   159  CODE  Logical
      0       0 |      0         0 |         0      0       0 |        0 |     0  DATA  Physical
Number of files successfully accessed........................ 5 out of 5
Ratio of Physical to Logical SLOC............................ 1.53
Number of files with :
   Executable Instructions > 100 = 1
   Data Declarations > 100 = 0
   Percentage of Comments to SLOC < 60.0 % = 4     Ave. Percentage of Comments to Physical SLOC = 40.6
Total occurrences of these Java Keywords :
Compiler Directives      Data Keywords            Executable Keywords
import............. 11   abstract........... 0    goto............... 0
export............. 0    const.............. 0    if................. 4
                         boolean............ 0    else............... 1
                         int................ 6    for................ 0
                         long............... 0    do................. 0
                         byte............... 0    while.............. 1
                         short.............. 0    continue........... 0
                         char............... 0    switch............. 0
                         extends............ 2    case............... 0
                         float.............. 0    break.............. 0
                         double............. 0    default............ 0
                         implements......... 1    return............. 2
                         class.............. 2    super.............. 0
                         function........... 9    this............... 1
                         interface.......... 0    new................ 9
                         native............. 0    try................ 1
                         void............... 12   throw.............. 0
                         static............. 0    throws............. 0
                         package............ 0    catch.............. 1
                         private............ 3    with............... 0
                         public............. 17
                         protected.......... 0
                         operator........... 0
                         volatile........... 0
REVISION AG1 SOURCE PROGRAM -> JAVA_LINES This output produced on Tue Apr 27 10:21:37 1999
"CSCI" CodeCount appears to be a collection of student projects, done in the past year or so, that modify the original CodeCount tool to support "object" languages. The level of counting sophistication appears to be constrained by the design of Classic CodeCount. Essentially, the code was modified by adding counter variables that count occurrences of method/function declarations, and the output routines were enhanced to print the number of methods/functions found in each file. CSCI CodeCount provides this kind of support for languages such as HTML, XML, Excel, JavaScript, C/C++, etc.
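The "counter variable" style of modification described above can be sketched roughly as follows. This is a hypothetical Java illustration, not CodeCount's actual source: it scans a file line by line and increments a counter whenever a line matches a simple regular expression for a JavaScript `function` declaration. The class name `FunctionCounter` and the pattern itself are assumptions made for the example.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical line-oriented counter in the spirit of CSCI CodeCount (not its real code). */
public class FunctionCounter {
    // Assumed heuristic: a declaration looks like "function name(".
    private static final Pattern FUNCTION_DECL =
            Pattern.compile("\\bfunction\\s+[A-Za-z_$][\\w$]*\\s*\\(");

    /** Counts pattern matches line by line; no parsing is performed. */
    public static int countFunctions(List<String> lines) {
        int functionCount = 0; // the added "counter variable"
        for (String line : lines) {
            Matcher m = FUNCTION_DECL.matcher(line);
            while (m.find()) {
                functionCount++;
            }
        }
        return functionCount;
    }

    public static void main(String[] args) {
        List<String> source = List.of(
                "function init() {",
                "  var x = 1; // function helper() is only mentioned here",
                "}",
                "function draw(ctx) { return ctx; }");
        // The mention inside the comment is counted too, showing the limits of pattern matching.
        System.out.println("Functions found: " + countFunctions(source)); // Functions found: 3
    }
}
```

Because the counter only sees text patterns, it has no notion of which function belongs to which object, which is one way the design constrains counting sophistication.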
As a result, CSCI CodeCount output consists of the standard Classic CodeCount output, interspersed with additional output lines indicating the number of objects found. Here is an excerpt of CSCI CodeCount output illustrating the kinds of object information provided:
     79      15 |     27         1 |         0      1      36 |       37 | CODE CybraMenu.html
Number of User Defined Object of CybraMenu.html = 3
     39       7 |     16         1 |         0      0      15 |       15 | ObjectType ObjectName
Number of Methods in Object ObjectName = 2
     32       6 |     13         1 |         0      0      12 |       12 | Data Test
Number of Methods in Object Test = 2
     12       3 |      5         0 |         0      0       4 |        4 | Logic NestedObj
Number of Methods in Object NestedObj = 1
Number of User Defined Method in CybraMenu.html = 2
Calculating this "hierarchical" size data at the differing grain sizes of methods, classes, and packages (as opposed to only files) has significant advantages: you can determine things like the average number of methods per class, the distribution of method sizes, and so forth. Concrete applications of this information include flagging for inspection any method that exceeds a certain size, or characterizing the size of "small", "medium", and "large" methods for project estimation.
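As a rough illustration of these analyses, the hypothetical Java sketch below takes per-method LOC values and computes the average number of methods per class, flags methods above an inspection threshold, and buckets methods into "small", "medium", and "large". The `MethodSize` record, the threshold of 100 LOC, and the bucket boundaries are assumptions chosen for the example, not values used by LOCC or CodeCount.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class HierarchicalSizeReport {
    /** Hypothetical per-method size record: owning class, method name, and LOC. */
    record MethodSize(String className, String methodName, int loc) {}

    /** Average number of methods per distinct class. */
    static double averageMethodsPerClass(List<MethodSize> methods) {
        long classes = methods.stream().map(MethodSize::className).distinct().count();
        return classes == 0 ? 0.0 : (double) methods.size() / classes;
    }

    /** Methods whose LOC exceeds the (assumed) inspection threshold. */
    static List<String> flagForInspection(List<MethodSize> methods, int maxLoc) {
        List<String> flagged = new ArrayList<>();
        for (MethodSize m : methods) {
            if (m.loc() > maxLoc) {
                flagged.add(m.className() + "." + m.methodName());
            }
        }
        return flagged;
    }

    /** Counts methods per size bucket using illustrative boundaries. */
    static Map<String, Integer> sizeBuckets(List<MethodSize> methods, int smallMax, int mediumMax) {
        Map<String, Integer> buckets = new TreeMap<>();
        for (MethodSize m : methods) {
            String bucket = m.loc() <= smallMax ? "small" : m.loc() <= mediumMax ? "medium" : "large";
            buckets.merge(bucket, 1, Integer::sum);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<MethodSize> methods = List.of(
                new MethodSize("Parser", "parse", 120),
                new MethodSize("Parser", "reset", 4),
                new MethodSize("Printer", "print", 25));
        System.out.println(averageMethodsPerClass(methods)); // 1.5
        System.out.println(flagForInspection(methods, 100)); // [Parser.parse]
        System.out.println(sizeBuckets(methods, 10, 50));    // {large=1, medium=1, small=1}
    }
}
```

None of these computations is possible from file-level totals alone; each depends on knowing which lines belong to which method and class.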
After struggling with JavaLOC for about a year, we concluded that a non-grammar-based approach to providing such "hierarchical" size data was unmanageable: the system's algorithms were kludgey and frequently either failed or produced incorrect results.
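To illustrate the kind of fragility involved (this is a hypothetical example, not JavaLOC's actual algorithm), consider a naive estimator that finds the end of a method body by counting `{` and `}` characters on each line. A single brace inside a string literal is enough to throw it off; a grammar-based parser would tokenize the string and never see that brace.

```java
import java.util.List;

public class NaiveBraceCounter {
    /**
     * Returns the number of lines from the start until the brace depth first returns
     * to zero. Braces inside strings or comments are NOT skipped, which is exactly
     * the weakness being demonstrated.
     */
    static int naiveBlockLength(List<String> lines) {
        int depth = 0;
        boolean started = false;
        for (int i = 0; i < lines.size(); i++) {
            for (char c : lines.get(i).toCharArray()) {
                if (c == '{') {
                    depth++;
                    started = true;
                } else if (c == '}') {
                    depth--;
                }
            }
            if (started && depth <= 0) {
                return i + 1; // block appears to end on this line
            }
        }
        return lines.size();
    }

    public static void main(String[] args) {
        List<String> method = List.of(
                "void greet() {",
                "  System.out.println(\"}\");",
                "  System.out.println(\"done\");",
                "}");
        // The real method spans 4 lines, but the "}" in the string literal ends it early.
        System.out.println(naiveBlockLength(method)); // 2
    }
}
```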
Joe Dane, the original developer of LOCC, recognized that a grammar-based approach would provide hierarchical data robustly and, as a bonus, would allow the system to be designed so that new languages could be supported simply by "plugging in" a new grammar. He found a Java-based parser generator called JavaCC that provided appropriate infrastructure and grammars for many popular languages.
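The "plug in a new grammar" idea can be sketched as a small registry that maps file extensions to language-specific counters. The `LanguageCounter` interface, the `CounterRegistry` class, and the placeholder counter below are hypothetical names for illustration only; in a grammar-based tool like LOCC the language-specific part would be backed by a JavaCC-generated parser rather than the trivial non-blank-line count used here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class CounterRegistry {
    /** Hypothetical plug-in contract: one implementation per supported language. */
    interface LanguageCounter {
        String fileExtension();           // e.g. "java"
        int countLoc(List<String> lines); // a real plug-in would drive a generated parser
    }

    private final Map<String, LanguageCounter> byExtension = new HashMap<>();

    /** Supporting a new language means registering one more plug-in; the core is untouched. */
    void register(LanguageCounter counter) {
        byExtension.put(counter.fileExtension(), counter);
    }

    /** Looks up the counter for a file name by its extension, if one is registered. */
    Optional<LanguageCounter> counterFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return Optional.empty();
        }
        return Optional.ofNullable(byExtension.get(fileName.substring(dot + 1)));
    }

    /** Placeholder plug-in: counts non-blank lines instead of parsing a grammar. */
    static class PlaceholderJavaCounter implements LanguageCounter {
        public String fileExtension() { return "java"; }
        public int countLoc(List<String> lines) {
            return (int) lines.stream().filter(l -> !l.isBlank()).count();
        }
    }

    public static void main(String[] args) {
        CounterRegistry registry = new CounterRegistry();
        registry.register(new PlaceholderJavaCounter());
        List<String> src = List.of("class A {", "", "}");
        registry.counterFor("A.java")
                .ifPresent(c -> System.out.println("LOC: " + c.countLoc(src))); // LOC: 2
        System.out.println(registry.counterFor("menu.html").isPresent());      // false
    }
}
```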
In addition to designing the system for easy extension to any language with a JavaCC grammar, he designed it for easy extension to different input and output formats. Thus, LOCC supports command-line, programmatic, and GUI input, and text, Leap data file, and CSV (comma-separated value) output. Here is some example output in text format from LOCC for the canonical HelloWorld program:
Java Source: HelloWorld.java (6)
Number of Classes: 1
Number of Interfaces: 0
Number of Methods: 1
Package:
Class: HelloWorld (5)
1 Method(s):
Method: main (3)

As you can see, LOCC reports the total LOC for the entire file (6), the total LOC for the class (5), the number of classes and methods (1 each), and the LOC for each method (3 for main).
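Separating measurement from presentation, as described above, can be sketched with a small formatter interface. The `SizeRecord` and `OutputFormatter` types and both formats below are hypothetical; they do not reproduce LOCC's actual text, Leap, or CSV layouts. The sample values are the HelloWorld numbers reported above (file 6, class 5, method main 3).

```java
import java.util.List;

public class SizeFormatters {
    /** Hypothetical size record for one element of the file/class/method hierarchy. */
    record SizeRecord(String kind, String name, int loc) {}

    /** Hypothetical output plug-in: one implementation per output format. */
    interface OutputFormatter {
        String format(List<SizeRecord> records);
    }

    /** Human-readable text, one element per line. */
    static class TextFormatter implements OutputFormatter {
        public String format(List<SizeRecord> records) {
            StringBuilder sb = new StringBuilder();
            for (SizeRecord r : records) {
                sb.append(r.kind()).append(": ").append(r.name())
                  .append(" (").append(r.loc()).append(")\n");
            }
            return sb.toString();
        }
    }

    /** Comma-separated values with a header row (no quoting; names assumed comma-free). */
    static class CsvFormatter implements OutputFormatter {
        public String format(List<SizeRecord> records) {
            StringBuilder sb = new StringBuilder("kind,name,loc\n");
            for (SizeRecord r : records) {
                sb.append(r.kind()).append(',').append(r.name())
                  .append(',').append(r.loc()).append('\n');
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) {
        List<SizeRecord> helloWorld = List.of(
                new SizeRecord("File", "HelloWorld.java", 6),
                new SizeRecord("Class", "HelloWorld", 5),
                new SizeRecord("Method", "main", 3));
        // The same measurements rendered by two interchangeable formatters.
        System.out.print(new TextFormatter().format(helloWorld));
        System.out.print(new CsvFormatter().format(helloWorld));
    }
}
```

Adding another output format in this design means writing one more `OutputFormatter` implementation; the counting code does not change.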
Beyond providing a hierarchical measurement of the total lines of code in one or more files, LOCC does something else quite useful: it produces a "structural diff" by comparing two versions of a system, matching classes and methods by name, and determining the number of lines of code changed in each method, class, package, and so on. After some minor modifications to the canonical HelloWorld program, the text output for the "diff" between the two versions might look as follows:
Size Difference Info for HelloWorld.java
2 lines added in method main
0 lines added in class HelloWorld
0 lines added in package

This indicates that two lines were added (or simply modified) in the main method, that no lines were added/modified in the class HelloWorld apart from the method-level modifications already listed, and that no lines were added/modified in the package apart from those already indicated at the class level.
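A method-level structural diff of this kind can be sketched as follows. This hypothetical Java example assumes each version has already been parsed into a map from method name to that method's source lines (the parsing step is omitted). Methods are matched by name, as described above; for a matched method, lines in the new body that are not part of a longest common subsequence (LCS) with the old body are counted as added or modified, and every line of a brand-new method counts as added. The LCS comparison is an assumption for illustration; the text does not say which line-comparison algorithm LOCC uses.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StructuralDiff {
    /** Length of the longest common subsequence of two line lists (dynamic programming). */
    static int lcsLength(List<String> a, List<String> b) {
        int[][] dp = new int[a.size() + 1][b.size() + 1];
        for (int i = 1; i <= a.size(); i++) {
            for (int j = 1; j <= b.size(); j++) {
                dp[i][j] = a.get(i - 1).equals(b.get(j - 1))
                        ? dp[i - 1][j - 1] + 1
                        : Math.max(dp[i - 1][j], dp[i][j - 1]);
            }
        }
        return dp[a.size()][b.size()];
    }

    /**
     * For each method in the new version, counts lines added or modified relative to
     * the same-named method in the old version (all lines if the method is new).
     */
    static Map<String, Integer> linesAddedPerMethod(Map<String, List<String>> oldMethods,
                                                    Map<String, List<String>> newMethods) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : newMethods.entrySet()) {
            List<String> oldBody = oldMethods.getOrDefault(e.getKey(), List.of());
            List<String> newBody = e.getValue();
            result.put(e.getKey(), newBody.size() - lcsLength(oldBody, newBody));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> v1 = Map.of("main", List.of(
                "System.out.println(\"Hello World\");"));
        Map<String, List<String>> v2 = Map.of("main", List.of(
                "String greeting = \"Hello\";",
                "System.out.println(\"Hello World\");",
                "System.out.println(greeting);"));
        System.out.println(linesAddedPerMethod(v1, v2)); // {main=2}
    }
}
```

Class- and package-level figures could then be derived by summing the method-level results and adding any changed lines that fall outside methods.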
At least in our experience, this "diff" mechanism is even more useful than the "total" mechanism, because most projects do not start from scratch. Instead, in an incremental development scenario, a developer wants to estimate how many previously written methods might be "touched" and how many new methods might be written in the upcoming increment of the project. LOCC's diff mechanism measures these changes much more accurately than simply computing the total LOC before and after the increment. For example, an increment of development might not change the total size of the system significantly, yet require extensive rewriting of many hundreds of lines of code.
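The last point can be made concrete with a small calculation. The hypothetical sketch below compares the naive "total LOC before and after" measure with a count of lines in the new version that have no identical counterpart in the old version. For brevity it matches lines as a multiset, ignoring order; that is a simplification chosen for this example, not how LOCC compares code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NetVersusTouched {
    /** Naive measure: change in total line count between versions. */
    static int netChange(List<String> before, List<String> after) {
        return after.size() - before.size();
    }

    /** Lines in 'after' with no unmatched identical line in 'before' (multiset difference). */
    static int linesAddedOrModified(List<String> before, List<String> after) {
        Map<String, Integer> remaining = new HashMap<>();
        for (String line : before) {
            remaining.merge(line, 1, Integer::sum);
        }
        int changed = 0;
        for (String line : after) {
            int left = remaining.getOrDefault(line, 0);
            if (left > 0) {
                remaining.put(line, left - 1); // matched an unchanged line
            } else {
                changed++;                     // new or rewritten line
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        List<String> before = List.of("a = 1;", "b = 2;", "c = a + b;", "print(c);");
        List<String> after  = List.of("x = 1;", "y = 2;", "z = x + y;", "print(z);");
        // Every line was rewritten, yet the total size is unchanged.
        System.out.println("net change: " + netChange(before, after));                 // net change: 0
        System.out.println("added/modified: " + linesAddedOrModified(before, after));  // added/modified: 4
    }
}
```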
Although these LOCC output examples come from trivial programs, LOCC has been used to count systems of significant size. For example, it was used to count the size of a Java system of over 500,000 lines of source code.
Extending CodeCount also typically involves copying and modifying the entire source code, resulting in multiple replicated versions of the system. In contrast, LOCC provides support for all languages, input formats, and output formats within a single system. For all of these reasons, LOCC appears to be the more extensible of the two systems.