
How to count lines of PHP source code in a directory (recursively)

Saturday, July 14th, 2012

Recursively count lines of source code in PHP and other programming languages

Being able to count the number of PHP source code lines of a website is important statistical information for timely auditing of projects and for evaluating real project management costs. Counting the number of source lines programmers have written is an inevitable part of evaluating any software project.
In many small and middle-sized software and website development companies, it is the system administrator's task to provide this information, or to quickly script something that reports the exact total number of source lines for a project.

Even just out of curiosity, it is useful to know how many lines of PHP source code a WordPress or Joomla website (including its plugins) contains.
To count the PHP source code lines in a single directory level, one could do:

server:~# cd /var/www/wordpress-website
server:/var/www/wordpress-website:# wc -l *.php
17 index.php
101 wp-activate.php
1612 wp-app.php
12 wp-atom.php
19 wp-blog-header.php
105 wp-comments-post.php
12 wp-commentsrss2.php
90 wp-config-sample.php
85 wp-config.php
104 wp-cron.php
12 wp-feed.php
58 wp-links-opml.php
59 wp-load.php
694 wp-login.php
236 wp-mail.php
17 wp-pass.php
12 wp-rdf.php
15 wp-register.php
12 wp-rss.php
12 wp-rss2.php
326 wp-settings.php
451 wp-signup.php
110 wp-trackback.php
109 xmlrpc.php
4280 total

This counts and shows statistics for each and every PHP source file within wordpress-website (non-recursively). To get only the total number of PHP source code lines within the directory, one could grep for it, e.g.:

server:/var/www/wordpress-website:# wc -l *.php |grep -i '\stotal$'
4280 total

The pattern '\stotal$' has \s at the beginning and $ at the end of the total keyword so that it does not erroneously match PHP source file names that contain total, for example total.php, total_blabla.php or blabla_total_bla.php.

In grep regular expressions, \s matches a single whitespace character (a space or a tab), and $ anchors the match to the end of the line, so only lines ending in the word total are matched.
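To see why both anchors matter, here is a small sketch that feeds grep some sample wc -l output. The sample lines and the helper names (sample_wc_output, total_line) are made up for illustration. The pattern uses [[:space:]], the POSIX bracket expression that GNU grep documents \s as a synonym for, so it should also work with greps that lack \s:

```shell
#!/usr/bin/env bash

# Hypothetical sample of `wc -l *.php` output, including file names that
# contain "total" to show why the anchors are needed.
sample_wc_output() {
  printf '%s\n' \
    '   12 total.php' \
    '   40 blabla_total_bla.php' \
    '   52 total'
}

# Keep only the summary line: whitespace before "total", end of line after it.
# [[:space:]] is the portable equivalent of GNU grep's \s.
total_line() {
  grep '[[:space:]]total$'
}
```

Running sample_wc_output | total_line prints only the "   52 total" line, whereas an unanchored grep -c total counts all three lines.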

So far, so good... More commonly, though, rather than counting PHP source code lines at the first directory level only, one needs the complete number of PHP, C, Python or other source code lines recursively, i.e. for the source code of a website or project kept in multiple sub-directories. To count lines of programming code recursively for any directory, use find in conjunction with xargs:

server:/var/www/wp-website1# find . -name '*.php' | xargs wc -l
1079 ./wp-admin/includes/file.php
2105 ./wp-admin/includes/media.php
103 ./wp-admin/includes/list-table.php
1054 ./wp-admin/includes/class-wp-posts-list-table.php
105 ./wp-admin/index.php
109 ./wp-admin/network/user-new.php
100 ./wp-admin/link-manager.php
410 ./wp-admin/widgets.php
108 ./wp-content/plugins/akismet/widget.php
104 ./wp-content/plugins/google-analytics-for-wordpress/wp-gdata/wp-gdata.php
104 ./wp-content/plugins/cyr2lat-slugs/cyr2lat-slugs.php
...
652239 total

As you can see, the command counts and displays the number of source code lines in each and every file. For big directory structures the output scrolls off the screen, so piping it through less is nice, e.g.:

find . -name '*.php' | xargs wc -l | less

Displaying the lines of code for each file within the directories is unnecessary when only the total number of source code lines is required. For scripting purposes it is useful to get just the total:

server:/var/www/wp-website1# find . -name '*.php' | xargs wc -l | grep -i '\stotal$'
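One caveat with piping find into xargs: when the file list is very long, xargs can split it across several wc invocations, and each invocation prints its own total line, so the grep above may return more than one total. Plain xargs also splits file names on spaces. A minimal sketch that avoids both, assuming find and xargs support -print0 / -0 (GNU and BSD versions do); count_lines_recursive is a hypothetical helper name:

```shell
#!/usr/bin/env bash

# Print one grand total of lines for files matching a pattern under a directory.
#   $1: directory to scan (default: current directory)
#   $2: file name pattern (default: *.php)
# -print0 / -0 keep file names containing spaces intact. awk sums the
# per-file counts and skips wc's own "total" lines, so the result stays
# correct even if xargs runs wc more than once.
count_lines_recursive() {
  local dir="${1:-.}" pattern="${2:-*.php}"
  find "$dir" -type f -name "$pattern" -print0 \
    | xargs -0 wc -l \
    | awk '$NF != "total" { sum += $1 } END { print sum + 0 }'
}
```

For example, count_lines_recursive /var/www/wp-website1 prints a single number, and passing '*.py' as the second argument counts Python files instead. The awk filter relies on wc labelling its summary lines "total", as both GNU and BSD wc do.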

Another one-liner, which prints just a single number (no grep needed) by concatenating all matching files and counting their lines in one wc pass, is:

server:/var/www/wp-website1# ( find ./ -name '*.php' -print0 | xargs -0 cat ) | wc -l

Another option is a small shell script that displays every file name within a directory together with its line count.
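The original script is not reproduced here; a minimal sketch of that idea, assuming bash (for read -d '') and a find that supports -print0, could look like this. The function name list_file_line_counts is hypothetical:

```shell
#!/usr/bin/env bash

# List every matching file with its line count, largest first.
#   $1: directory to scan (default: current directory)
#   $2: file name pattern (default: *.php)
list_file_line_counts() {
  local dir="${1:-.}" pattern="${2:-*.php}" file lines
  # NUL-separated names survive spaces and other odd characters.
  find "$dir" -type f -name "$pattern" -print0 |
    while IFS= read -r -d '' file; do
      # Arithmetic expansion strips the padding some wc versions add.
      lines=$(( $(wc -l < "$file") ))
      printf '%d\t%s\n' "$lines" "$file"
    done | sort -rn
}
```

Calling list_file_line_counts /var/www/wp-website1 | head shows the ten biggest PHP files, which is often more useful than the unsorted per-file output of wc.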

For bigger, more professional projects, plain bash and command-line scripting might not be the best approach. To count very large amounts of source code and display various statistics about it, there are two dedicated tools: SLOCCount and cloc (Count Lines of Code).

Both tools are written mainly in Perl, so for IT managers concerned about how fast a project's source is processed (if very frequent source audits are necessary), these tools might be a bit sluggish. For most projects, however, they add great value. SLOCCount has already been used to calculate the development costs of GNU / Linux and other projects of high importance to the Free Software community, so it is proven to work well on ENORMOUS code bases written in many different programming languages.

The sloccount and cloc packages are available in the default Debian and Ubuntu Linux repositories, so if you're a Debian user like me, you're in luck:

server:~# apt-cache search cloc$
cloc - statistics utility to count lines of code
server:~# apt-cache search sloccount$
sloccount - programs for counting physical source lines of code (SLOC)

Well, that's all folks. Cheers and happy counting 😉

What are the real development costs of Debian GNU / Linux: how much does developing a Free Software project cost

Friday, February 17th, 2012

Free Software (FS) is free as in freedom as well as free as in price. Free and Open Source Software (FOSS) is developed by geek hobbyists who voluntarily put their time and effort into writing, testing and sharing millions of lines of programming code with anyone for free. This doesn't mean, however, that free software costs 0 (zero) to produce. Though the "end product", the Free Software itself, is FREE, its "real" development cost, as with any other product, is huge.

I've recently read on Jeb's blog an estimate of the cost of one of the major Free Software project efforts, Debian GNU / Linux.
According to James E. Bromberger, the whole Debian project is estimated at the shocking price of $19 billion, i.e. $19 000 000 000!!!

Here is how JEB got the $19 billion, in a quote taken from his blog:

"By using David A Wheeler’s sloccount tool and average wage of a developer of US$72,533 (using median estimates from Salary.com and PayScale.com for 2011) I summed the individual results to find a total of 419,776,604 source lines of code for the ‘pristine’ upstream sources, in 31 programming languages — including 429 lines of Cobol and 1933 lines of Modula3!

In my analysis the projected cost of producing Debian Wheezy in February 2012 is US$19,070,177,727 (AU$17.7B, EUR€14.4B, GBP£12.11B), making each package’s upstream source code worth an average of US$1,112,547.56 (AU$837K) to produce. Impressively, this is all free (of cost)."

James has done an incredible job with this great research and deserves applause.
However, I believe the numbers from his research are off if we are talking about the realistic cost of Debian GNU / Linux.
The real cost of working software, ready to install on a user's PC, is way higher, because Jeb's research considers only the software cost based on the line count.

Hence James's estimate covers only the programming costs and misses many, many factors that make up the final cost of software.
Some of the many, many REAL COSTS / expenses of developing a huge Free Software project like Debian GNU / Linux that should be considered are:

  • the developers' use of their own computers (hardware depreciation)
  • electricity bills of the volunteers (developers) working on the program or project
  • electricity bills for the servers where free software is stored and available for download
  • volunteer developers' IT skills and tech knowledge (KNOW HOW)
  • Internet, network and dial-up bandwidth costs:
      a) bandwidth costs for hosting free software (on the server side)
      b) bandwidth costs for developers or FS users downloading the software
  • personal time put into FS development (programming, design, creativity etc.); here the list of sub-costs is long:
      a) time spent spreading the word about the great added value of Debian and its bundled software (word-of-mouth marketing)
      b) time spent advertising Debian and its free software components on blogs and social networks (identi.ca, Facebook, Twitter) etc. (voluntary online marketing, SEO etc.)
      c) time spent generating ideas for future program versions and reporting them to the Debian FS community
      d) time spent evaluating and giving feedback on software
      e) time spent managing free software repository (download) servers voluntarily (by system administrators)
      f) time spent by users on bug tracking and bug reporting
      g) time spent on research and self-actualization by software developers
      h) time spent on software quality assurance
  • costs for project management leaders / project coordination
  • the complexity of each of the projects constituting Debian

These are most of the many factors that would normally influence the cost of any non-free (proprietary) software project, and they apply just as well to free software. With all that said, if we assume the non-programming costs are equal to the programming costs of $19 000 000 000 (suggested by Jeb), the real cost of Debian would presumably be at least $38 000 000 000. Even $19 billion for this long list of "additional" cost factors (beyond the pure source code) is probably still a big underestimate.

A very interesting figure from Jeb's research is the breakdown of source code by programming language.
James's research reveals the 4 major programming languages used in the 17,000+ software projects that are part of Debian GNU / Linux:

  • ANSI C: 168,536,758 lines (40% of all projects' source code)
  • C++: 83,187,329 lines (20% of all projects' source code)
  • Java: 34,698,990 lines (8% of all projects' source code)
  • Lisp (7% of all projects' source code)

His research also gives a general idea of how much the source code of some of the major FOSS projects costs. Here is a copy of his figures:

Individual Projects

Other highlights by project included:

Project    Version    Thousands of SLOC    Projected cost at US$72,533/developer/year
Samba      3.6.1      2,000                US$101M (AU$93M)
Apache     2.2.9        693                US$33.5M (AU$31M)
MySQL      5.5.17     1,200                US$64.2M (AU$59.7M)
Perl       5.14.2       669                US$32.3M (AU$30M)
PHP        5.3.9        693                US$33.5M (AU$31.1M)
Bind       9.7.3        319                US$14.8M (AU$13.8M)
Moodle     1.9.9        396                US$18.6M (AU$17.3M)
Dasher     4.11         109                US$4.8M (AU$4.4M)
DVSwitch   0.8.3.6        6                US$250K (AU$232K)
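SLOCCount turns line counts into money with the basic COCOMO model. Assuming its documented defaults (effort = 2.4 * KSLOC^1.05 person-months, plus an overhead factor of 2.4 on top of salaries) together with the US$72,533 yearly wage quoted above, a rough recomputation can be sketched in shell with awk. The function name sloc_cost is hypothetical, and the assumption that these defaults were the parameters actually used is an inference from the matching numbers, not something stated in the quote:

```shell
#!/usr/bin/env bash

# Rough SLOCCount-style cost estimate (basic COCOMO, organic mode).
# Assumed defaults: effort = 2.4 * KSLOC^1.05 person-months, overhead 2.4.
#   $1: thousands of source lines (KSLOC)
#   $2: yearly salary per developer in dollars
# Prints the estimated cost in dollars, rounded to a whole number.
sloc_cost() {
  awk -v ksloc="$1" -v salary="$2" 'BEGIN {
    effort_pm = 2.4 * ksloc ^ 1.05             # person-months
    effort_py = effort_pm / 12                 # person-years
    printf "%.0f\n", effort_py * salary * 2.4  # salaries times overhead
  }'
}
```

With these assumptions, sloc_cost 2000 72533 prints roughly 101.8 million and sloc_cost 693 72533 roughly 33.5 million, in line with the Samba and Apache rows of the table. Because the exponent is above 1, the estimate grows slightly faster than the line count, which is one reason summing per-package estimates gives a different total than estimating Debian as one giant project.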

As you can imagine, all these source evaluation results are highly biased and open to discussion, since evaluating free software projects is a hard, not to say impossible, task. The "open" model of development makes a project very hard to track, and it introduces too many unexpected variables for a clear calculation of costs. What is sure, however, is that in money terms it is very expensive to produce. At present the Debian Project is sponsored only through donations. Five years ago, Debian's usual yearly budget was only $80 000 a year!! You can check the Debian Project's annual reports throughout the years; for 2012 the Debian Project budget is as low as $222,677 (US dollars)! The value of the software the project delivers is enormously high compared to its low expenses!

For us free software users, price is not a concern: Debian is absolutely free, both as in freedom and as in beer 😉