- ISO Studies of underscores vs MixedCase in Ada or C++
- Posted by Andy Glew on September 26th, 2003
I am in search of any rigorous,
scientific, academic or industrial studies
comparing naming conventions in
C++ or similar languages such as
Ada.
Specifically, are names formed with
underscores more or less readable
than names formed with MixedCase
(StudlyCaps, camelCase)?
...and similarly, any measurements
of programmer productivity, bug rate,
etc., although IMHO readability matters
most.
* Religion - NOT?!
I understand that this is a religious issue
for many programmers, an issue of programming style.
I am not interested in a religious war.
I obviously have my own opinion, but I am
open to scientific evidence.
* Ada Studies?
I thought that I had seen studies like
this in some of the early design documents
for Ada, but I have not been able to find
such references on the web. Which is not
entirely surprising, since Ada was designed
prior to the web.
The Ada 83 and 95 Quality Guidelines recommend
underscores to improve readability, but provide
no source justifying this statement.
* What such studies might look like
Simple readability and recall:
- present a test subject with
a list of compound words
formed with underscores and mixed case
- remove the list, and ask the test subject
to write it down
- score on accuracy
Program debugging
- present programs that are otherwise identical,
differing only in their use of underscores/MixedCase
to test subject programmers (e.g. a CS class)
- program has a known bug
- ask test subjects to find bug
- score on accuracy locating bug
Cruel TA study:
- Two sections of a CS class
- Enforce programming standards,
underscores vs MixedCase
- Pose a programming problem
- Score according to success in
completing the assignment
Empirical:
- Given version control databases
of large programs, some written in underscore
style, others in MixedCase
- Total bug rates normalized by LOC, name count, etc.
- OR: count only bugs that can be attributed
(after inspection of checkins) to misnamed variables
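The normalization step in the empirical study is simple arithmetic. You pool the bugs and the lines of code for each style, then express the result as bugs per thousand lines (KLOC). Here is a minimal C++ sketch. It assumes each project has already been reduced to a style label, a bug count and a line count. The `ProjectStats` record and the `bugs_per_kloc` function are hypothetical names. Normalizing by name count would follow the same pattern.

```cpp
#include <string>
#include <vector>

// Hypothetical per-project summary mined from a version control database.
struct ProjectStats {
    std::string style;   // e.g. "underscore" or "MixedCase"
    long bugs;           // bug-fix check-ins attributed to the project
    long lines_of_code;  // size measure used for normalization
};

// Bugs per thousand lines of code for all projects written in `style`.
// Bugs and LOC are pooled across projects before dividing.
double bugs_per_kloc(const std::vector<ProjectStats>& projects,
                     const std::string& style) {
    long bugs = 0;
    long loc = 0;
    for (const auto& p : projects) {
        if (p.style == style) {
            bugs += p.bugs;
            loc += p.lines_of_code;
        }
    }
    if (loc == 0) return 0.0;  // no projects of this style
    return 1000.0 * static_cast<double>(bugs) / static_cast<double>(loc);
}
```

Pooling totals weights large projects more heavily than averaging per-project rates would. Which of the two is appropriate is a judgment call for whoever designs the study.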
For that matter, I would be interested in any surveys
folks may have done that count projects and their
coding standards, possibly weighted
- open source (e.g. sourceforge)
- industrial
- textbooks, weighted by sales
- websites of coding standards, weighted by Google score...
Although this is less convincing than a rigorous study.
* Explanation of Newsgroups Chosen
I hope it is obvious why I have chosen these
newsgroups to post this search to:
comp.software-eng, comp.programming,
- an issue of software engineering
comp.lang.c++,
- the language I am most interested in
comp.lang.ada
- because I vaguely recall historical work
- Posted by Attila Feher on September 26th, 2003
Andy Glew wrote:
The underscore convention also works in case-insensitive languages.
The InnerCaps convention fails to handle all-caps words (acronyms) such as
SMTPTCPIPConnection. The usual solution is to write them "wrong", as
SmtpTcpIpConnection.
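Acronyms are awkward because of how a tool, or a reader, recovers the words from an InnerCaps name. The C++ sketch below uses one common heuristic. It is assumed here, not taken from any standard. A word starts at a capital that follows a lowercase letter. It also starts at the last capital of a run when that capital is followed by a lowercase letter. `split_inner_caps` is a hypothetical helper name.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Heuristic word splitter for InnerCaps identifiers (hypothetical helper).
// A boundary is placed before an uppercase letter that follows a lowercase
// letter, or before the last capital of a run when a lowercase letter
// follows it ("XMLParser" -> "XML", "Parser").
std::vector<std::string> split_inner_caps(const std::string& id) {
    std::vector<std::string> words;
    std::string current;
    for (std::size_t i = 0; i < id.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(id[i]);
        bool boundary = false;
        if (i > 0 && std::isupper(c)) {
            unsigned char prev = static_cast<unsigned char>(id[i - 1]);
            bool next_lower = i + 1 < id.size() &&
                std::islower(static_cast<unsigned char>(id[i + 1]));
            boundary = std::islower(prev) || (std::isupper(prev) && next_lower);
        }
        if (boundary && !current.empty()) {
            words.push_back(current);
            current.clear();
        }
        current += id[i];
    }
    if (!current.empty()) words.push_back(current);
    return words;
}
```

With this heuristic, `split_inner_caps("SMTPTCPIPConnection")` yields `SMTPTCPIP` and `Connection`, so the individual acronyms are lost. `SmtpTcpIpConnection` splits into `Smtp`, `Tcp`, `Ip`, `Connection`. An underscored name such as `SMTP_TCP_IP_Connection` needs no heuristic at all.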
The underscore convention tends to make lines longer, which can have a bad
effect on readability.
IMO it is a personal preference issue, and also an issue of what fonts and
development environment are in use.
IMO, if one has to select *one* convention for a whole company using many
languages, then only the underscore one stands. With InnerCaps there is a
possibility of creating hard-to-find name collisions, especially in languages
where the type of a variable can change at runtime by a simple assignment.
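Here is one concrete form of the collision problem. Names that are distinct in a case-sensitive language such as C++ can become the same name in a case-insensitive one such as Ada. The minimal C++ sketch below flags such groups. It assumes ASCII identifiers and simple lower-casing as the folding rule. `fold_case` and `case_collisions` are hypothetical names.

```cpp
#include <cctype>
#include <map>
#include <set>
#include <string>
#include <vector>

// Lower-cases an identifier, mimicking how a case-insensitive language
// treats names (ASCII only; an assumption of this sketch).
std::string fold_case(const std::string& id) {
    std::string out;
    for (char ch : id)
        out += static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
    return out;
}

// Groups of distinct identifiers that become the same name once case is
// ignored. Each group is returned in sorted order.
std::vector<std::vector<std::string>> case_collisions(
        const std::vector<std::string>& ids) {
    std::map<std::string, std::set<std::string>> by_folded;
    for (const auto& id : ids) by_folded[fold_case(id)].insert(id);
    std::vector<std::vector<std::string>> clashes;
    for (const auto& entry : by_folded)
        if (entry.second.size() > 1)
            clashes.emplace_back(entry.second.begin(), entry.second.end());
    return clashes;
}
```

For example, given `UserName`, `Username`, `user_name` and `Count`, the first two are reported as one group. `user_name` stays separate because its underscore survives case folding.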
--
Attila aka WW
- Posted by Jakob Bieling on September 26th, 2003
"Andy Glew" <andy.glew@amd.com> wrote in message
news:2cfd1a4e.0309252032.3e3c0a1a@posting.google.com...
[snip]
Write a large text (several lines) with mixed case and the same again
with underscores. Then give it to people to read and ask them which they find
easier to read. I would not be surprised if the majority favours the text
with underscores.
[snip]
The underscore can easily be viewed as a space which separates the words,
whereas mixed case does not provide a separation like that, but rather a
'large' here-comes-a-new-word mark (i.e. the capital letter). The problem I
see with this: non-capital letters can be 'large' as well. Just have a look
at the 't', 'h' etc., which, IMO, does not make reading a mixed-case text
easier.
Personally, I prefer underscore for the reason above.
Just my .02c
--
jb
(replace y with x if you want to reply by e-mail)
- Posted by Matt Gregory on September 26th, 2003
Jakob Bieling wrote:
I think we just need a programming font that has half-sized underscores
in front of all the capital letters. That would solve all these problems.
I personally don't like typing underscores, but I agree they are more
readable. Emacs does have a view-camel-cased-identifiers-as-underscored
mode, so that's a step in the right direction.
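The display trick described here amounts to a view-only transformation. It inserts a separator before each capital that follows a lowercase letter or digit and leaves the source text alone. The C++ sketch below approximates that idea. It is not how Emacs implements it, and `show_with_separators` is a hypothetical name.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Display-only rewrite (hypothetical): insert `sep` before each uppercase
// letter that follows a lowercase letter or digit. Letter case is kept, so
// the result is only for viewing, not for writing back to the source.
std::string show_with_separators(const std::string& id, char sep = '_') {
    std::string out;
    for (std::size_t i = 0; i < id.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(id[i]);
        if (i > 0 && std::isupper(c)) {
            unsigned char prev = static_cast<unsigned char>(id[i - 1]);
            if (std::islower(prev) || std::isdigit(prev)) out += sep;
        }
        out += id[i];
    }
    return out;
}
```

`show_with_separators("viewCamelCasedIdentifiers")` gives `view_Camel_Cased_Identifiers`. Runs of capitals such as `SMTPConnection` are left untouched, which is the same acronym problem raised earlier in the thread.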
- Posted by Ludovic Brenta on September 26th, 2003
Personally I prefer underscores, too, and for that reason I really
like Emacs' glasses-mode. So, use whatever you want, *I* will always
see underscores.
--
Ludovic Brenta.
- Posted by Steve on September 26th, 2003
I think a more relevant test would be to give two versions of the same code,
one with underscores, one with mixed casing, to different groups of
programmers to analyze. Include a quiz asking questions about the code.
See which version results in more correct answers, and which version
achieves the answers more quickly.
Steve
(The Duck)
"Jakob Bieling" <netsurf@gmy.net> wrote in message
news:bl0ka8$n7h$07$1@news.t-online.com...
[snip]
- Posted by Frank J. Lhota on September 26th, 2003
Underscores are basically a way to provide spaces in an identifier. Since
identifiers are generally phrases (noun phrases for objects, verb phrases
for procedures) and phrases often consist of more than one word, I find the
use of underscores to be quite natural.
The opposing argument is that underscores are too large, and that a case
change is a more readable way to indicate how an identifier divides
into words. To me, the upper/lower case method of delineating the words in
an identifier has always looked like the transcript of a very fast talker.
Yes, you can make out the words, but just barely. Moreover, the use of
letter case to delineate words prohibits any other use of letter case. It
rules out using all caps for a certain category of identifiers, for example.
There is an easy way to test which convention is more readable. Here is one
of Shakespeare's sonnets rendered in the mixed case format:
FromFairestCreaturesWeDesireIncrease,
ThatTherebyBeautysRoseMightNeverDie,
ButAsTheRiperShouldByTimeDecease,
HisTenderHeirMightBearHisMemory:
ButThouContractedToThineOwnBrightEyes,
FeedstThyLightsFlameWithSelfSubstantialFuel,
MakingAFamineWhereAbundanceLies,
ThySelfThyFoeToThySweetSelfTooCruel:
ThouThatArtNowTheWorldsFreshOrnament,
AndOnlyHeraldToTheGaudySpring,
WithinThineOwnBudBuriestThyContent,
AndTenderChurlMakstWasteInNiggarding:
PityTheWorldOrElseThisGluttonBe,
ToEatTheWorldsDueByTheGraveAndThee
It may be a matter of taste, but I certainly found the original sonnet to be
more readable and more beautiful.
- Posted by Randy King on September 26th, 2003
<snip> op <snip>
This is a somewhat off-topic post, but the OP did ask about
readability.
Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it deosn't mttaer
inwaht orredr the ltteers in a wrod are, the olny iprmoetnt tihng is
taht the frist and lsat ltteer be at the rghit pclae. The rset can be a
total mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae
the huamn mnid deos not raed ervey lteter by istlef, butthe wrod as a
wlohe. Aolbsulty amzanig huh?
- Posted by Hyman Rosen on September 26th, 2003
Randy King wrote:
"Anidroccg to crad cniyrrag lcitsiugnis planoissefors at an uemannd,
utisreviny in Bsitirh Cibmuloa, and crartnoy to the duoibus cmials
of the ueticnd rcraeseh, a slpmie, macinahcel ioisrevnn of ianretnl
cretcarahs araepps sneiciffut to csufnoe the eadyrevy oekoolnr."
- Posted by Matt Gregory on September 26th, 2003
I wrote:
Nevermind, that was a terrible idea. It was almost good though.
- Posted by Jack Klein on September 26th, 2003
On 25 Sep 2003 21:32:40 -0700, andy.glew@amd.com (Andy Glew) wrote in
comp.lang.c++:
My team is currently working under this guideline as a compromise:
Function names must be CamelMode, but optionally underscores are
allowed, e.g. Camel_Mode.
...or should I say "compromised" guidelines?
Interestingly, I see a lot of programmers who prefer CamelMode for
function names, yet prefer under_scores in variable names. In every
single case where I have checked, the programmer has done at least
some coding for Windows and its Pascal, BASIC, etc., API. And in
every single case they claim that is not where their style came from.
Go figure.
--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++ ftp://snurse-l.org/pub/acllc-c++/faq
- Posted by Programmer Dude on September 26th, 2003
Jack Klein wrote:
I've tried just about every combination over the years. At one
point it was underscores in function names, not in data names.
OOP added enough other basic types of things it got hard to have
a style for each. Currently, I use lower_case_with_underscores
for local names and CamelCaseMode for functions/methods and
for global data.
I'm considering switching to Mixed_Case_With_Underscores for
global data. In fact, with the fairly recent addition of
several new languages to my tool kit, it's probably time to
once again re-think my whole naming convention thing.
--
|_ CJSonnack <Chris@Sonnack.com> _____________| How's my programming? |
|_ http://www.Sonnack.com/ ___________________| Call: 1-800-DEV-NULL |
|_____________________________________________|___ ____________________|
- Posted by Mike Smith on September 26th, 2003
Hyman Rosen wrote:
Yes, it's possible to take it *too* far. But I *was* able to read the
quoted text at maybe half the speed at which I could have read it if it
were spelled correctly. And the text in Randy King's post is even more
readable than that - I can read it at almost full speed.
--
Mike Smith
- Posted by tmoran@acm.org on September 26th, 2003
for More Readable Programs", (c) 1990 ACM Press, ISBN 0-201-10745-7
(It doesn't appear to address naming questions, however.)
- Posted by Michael Feathers on September 26th, 2003
"Matt Gregory" <bleah-no-more-spam@earthlink.net> wrote in message
news:Ar_cb.6981$pP6.2822@newsread2.news.atl.earthlink.net...
Let's see, what if an IDE had a toggle which converted identifier names back
and forth on demand, flagging any clashes. ;-)
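The toggle can be sketched as a pair of conversion functions plus a check. The check reports every converted name that more than one distinct original maps to. This C++ sketch assumes ASCII identifiers and a simple word-boundary rule: a capital after a lowercase letter or digit starts a new word. `to_mixed_case`, `to_underscores` and `find_clashes` are hypothetical names, not any IDE's API.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// lower_case_with_underscores -> MixedCase: drop each '_' and capitalize
// the letter that starts each word. Other letters keep their case.
std::string to_mixed_case(const std::string& id) {
    std::string out;
    bool start_of_word = true;
    for (char ch : id) {
        if (ch == '_') { start_of_word = true; continue; }
        unsigned char c = static_cast<unsigned char>(ch);
        out += static_cast<char>(start_of_word ? std::toupper(c) : c);
        start_of_word = false;
    }
    return out;
}

// MixedCase -> lower_case_with_underscores: an '_' goes before each capital
// that follows a lowercase letter or digit, and everything is lower-cased.
std::string to_underscores(const std::string& id) {
    std::string out;
    for (std::size_t i = 0; i < id.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(id[i]);
        if (i > 0 && std::isupper(c)) {
            unsigned char prev = static_cast<unsigned char>(id[i - 1]);
            if (std::islower(prev) || std::isdigit(prev)) out += '_';
        }
        out += static_cast<char>(std::tolower(c));
    }
    return out;
}

// Converts every name with `convert` and returns the converted names that
// more than one distinct original maps to: the clashes an IDE would flag.
template <typename Convert>
std::map<std::string, std::set<std::string>> find_clashes(
        const std::vector<std::string>& ids, Convert convert) {
    std::map<std::string, std::set<std::string>> by_target;
    for (const auto& id : ids) by_target[convert(id)].insert(id);
    std::map<std::string, std::set<std::string>> clashes;
    for (const auto& entry : by_target)
        if (entry.second.size() > 1) clashes.insert(entry);
    return clashes;
}
```

Suppose you convert `line_count`, `lineCount`, `LineCount` and `total` with `to_mixed_case`. The check flags one clash, because the first three names all become `LineCount`. Note that `to_underscores` lower-cases runs of capitals, so `SMTPConnection` becomes `smtpconnection`. A real tool would need the acronym handling discussed earlier in the thread.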
- Posted by Hyman Rosen on September 26th, 2003
Mike Smith wrote:
Which clearly means that the first/last letter thing isn't the
only factor in comprehension.
- Posted by Default User on September 26th, 2003
Mike Smith wrote:
That's because it's not well scrambled at all. Examine the larger words:
they almost all have large unchanged or barely changed segments. Most of
the time double letter combos are kept together, very little reversal of
segments. I think the given example (I've received it many times) does
not provide much evidence for the contention at all.
Brian Rodenborn
- Posted by Default User on September 26th, 2003
Jack Klein wrote:
We are allowed underscores when acronyms appear in the name.
InitiateFMS_Executive();
Brian Rodenborn
- Posted by Arthur J. O'Dwyer on September 26th, 2003
On Fri, 26 Sep 2003, Default User wrote:
On the other hand, the thing which turned out to be confusing me the
most in Hyman's scrambled text was the typo (the comma after "unnamed").
Once I learned to ignore that, and take the rest of the grammar with a
grain of salt (the phrase including the word "uncited" also gave me
problems), it was fairly straight sailing.
At least, it was straight sailing until about half-way through, at
which point my brain kicked in and I rezilaed waht mohted was bnieg
uesd to otacsufbe the iaudividnl wdros -- at taht pniot I jsut setratd
rnidaeg tehm bdrawkcas.
Perhaps an interesting experiment would be to compare the relative
effects of ioisrevnn, aaabehiilopttzn, roandm sirnlcmabg, and radonm
dpraigh scamrbnlig. But that's not really topical here, (wherever
"here" is).
-Arthur
- Posted by Mad Hamish on September 27th, 2003
On Fri, 26 Sep 2003 15:40:00 GMT, "Frank J. Lhota"
<NOSPAM.lhota.adarose@verizon.net> wrote:
Hence the mixed case format must be better for programming.
--
"Hope is replaced by fear and dreams by survival, most of us get by."
Stuart Adamson 1958-2001
Mad Hamish
Hamish Laws
h_laws@aardvark.net.au