Tech Support > Computers & Technology > Programming > how can you improve this function?
how can you improve this function?
Posted by philipl@vistatec.ie on November 17th, 2003


hi,

I have some code here which basically look for within a string, the
occurance of any 3 consectative characters which are the same. so AAA
bbb etc would be reported by this function. I later added some code
which is needed to detect the scenario where the same char appears 4
times for example AAAA, in this case want to skip the remaining 3
chars and check the next character in the string. This has led me to
increase my for loop counter within my for loop, I know this is
supposed to be bad practice, is there anything you guys can suggest?

Here the code anyway - thx

private bool lookfor3CharacterOccurance(string line, string path)
{
bool found = false;

for(int i=0;i<=line.Length-4;i++)
{

//not interested in these formatting characters
if(line[i] != ' ' && (line[i] != '\u000A') && (line[i] != '\u000B')
&& (line[i] != '\u000C') && (line[i] != '\u000D') && (line[i] !=
'\u2028') && (line[i] != '\u2029') && (line[i] != '\u0009'))
{
//only what reoccurances of 3 characters of the same type
if((line[i] == line[i+1] ) && (line[i] == line[i+2] ) )
{
if(!(line[i] == line[i+3]))
{
//Console.WriteLine("{0} {1} {2}",line[i],line[i+1],line[i+2]);
shhpErr.Add("File: " + path + " contains at least one occurances of
the same characters 3 times: "+line[i]+line[i+1]+line[i+2]);
found = true;
break;
}
else
{
i=i+3;
}
}
}
}

return found;
}

Posted by Nicholas Paldino [.NET/C# MVP] on November 17th, 2003


Philip,

You are not going to be able to NOT skip ahead, unless you know the
structure of the string beforehand (which I am sure you do not). For
example, when you come across the first 'A', you don't know that there are
three more, so you have to skip to the next one and compare it.

I would take a different approach to what you are doing. I would have
two dictionaries. Instead of looking forward each time, why not keep track
of the last four characters scanned?

Hope this helps.


--
- Nicholas Paldino [.NET/C# MVP]
- mvp@spam.guard.caspershouse.com

<philipl@vistatec.ie> wrote in message
news:a3f6c32f.0311170631.22718943@posting.google.c om...


Posted by Matthieu Villeneuve on November 17th, 2003


<philipl@vistatec.ie> wrote in message
news:a3f6c32f.0311170631.22718943@posting.google.c om...
This is actually bad practice, because 'for' corresponds to
an idiom in C (and C-based languages), which is a data structure
traversal (array, list, etc.): perform a given task for every
element in the data structure. You can read the following line:

for (int i = 0; i < array.length; i++) { ... }

as the C translation of "for every element in 'array' do ...".

'for' loops are sometimes wrongly used for situations that do not
correspond to that idiom. In that case, 'while' should be used
instead:

int i = 0;
while (i < array.length) {
... // a task that may vary according to the contents
// of the array
i++;
}


--
Matthieu Villeneuve


Posted by Niki Estner on November 17th, 2003


keep an integer that counts "consecutive identical characters", like:

int count = 0;
for (int i=0; i<line.length-1; i++)
{
if (line[i] == line[i+1])
count++;
else
if (count)
{
if (count > 3) ...
if (count = 3) ...
count = 0;
}
}
if (count > 3) ...
if (count = 3) ...

You might want to put the if phrases into another function.

That way you code doesn't take 7 A's for 4 A's (which are skipped) plus 3
A's (which aren't)

You could also increase the loop counter by two, checking (line[i] ==
line[i-1]) only if (line[i] == line[i+1]) hit (since you're not interested
in 2-couples), but I doubt this will improve performance, since the problem
is probably memory/cache-bound with the processor waiting for data most of
the time.

Niki

<philipl@vistatec.ie> schrieb im Newsbeitrag
news:a3f6c32f.0311170631.22718943@posting.google.c om...


Posted by Sherif ElMetainy on November 17th, 2003


Hello

I suggest using regular expressions. like this

bool lookfor3CharacterOccurance(string line)
{
return new
System.Text.RegularExpressions.Regex("(\\S)\\1{2,} ").Match(line).Success;
}

to match 4 characters change the 2 to 3, to match 5 make it 4 and so on

Best regards,

Sherif

<philipl@vistatec.ie> wrote in message
news:a3f6c32f.0311170631.22718943@posting.google.c om...


Posted by Corey Murtagh on November 17th, 2003


Matthieu Villeneuve wrote:

I'd say it's a matter of style. The 'for' construct is incredibly
versatile, since it doesn't limit you much in terms of the expressions
you put in there. Conceptually for is a sequential loop, but it can
also be used as a replacement for most other looping structures in C.

Whether you do or don't is up to you, I guess. But I've seen some
really complex, non-intuitive 'for' constructs in production code, so
bad form or not you've really got to understand it.

Not a great example. More to the point:

for (node* curr = head; curr && curr->value != parm; curr =
(curr->value < parm ? curr->right : curr->left);

Which should probably be written:

node* curr = head;
while (curr && curr->value != parm)
{
if (curr->value < parm)
curr = curr->right;
else
curr = curr->left;
}

But hey, be prepared to find either, and understand why they're the same
thing.

--
Corey Murtagh
The Electric Monk
"Quidquid latine dictum sit, altum viditur!"


Posted by Thomas Matthews on November 17th, 2003


Matthieu Villeneuve wrote:

What about when one wants to repeat some functionality a given
number of times:

for (unsigned int i = 0; i < 20; ++i)
{
printf("Line number %d\n", i);
}

In my 30 years of programming experience, the "for" loop is used
when there is a fixed number of iterations, initialization takes
place before loop. Generally, in the context of a for loop the
loop is to execute at least once (1 or more times). The "while"
loop allows for the loop to not be executed the first time
(zero or more iterations). The repeat or do-while loops are
used when the code is executed 1 or more times.

I really don't advise loop controls for various structures.
The idiom I use is whatever is readable and conveys the meaning
to the reader. Sometimes, in C, I will use a "for" loop as
a while loop since the incrementing can be specified in the
"for" statement whereas it must be specified _somewhere_ in
the while loop.


--
Thomas Matthews

C++ newsgroup welcome message:
http://www.slack.net/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq: http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.learn.c-c++ faq:
http://www.raos.demon.uk/acllc-c++/faq.html
Other sites:
http://www.josuttis.com -- C++ STL Library book


Posted by Corey Murtagh on November 17th, 2003


Thomas Matthews wrote:

Any of the other looping structures can be replaced with a suitable
'for' construct. The condition is evaluated first time, so:

int done = something;
while (!done)
{...}

is equivalent to:

for (int done = something; !done
{...}

and so on. And of course, since the condition defaults to true, you can
replace:

while (true) {}

with:

for (; {}

--
Corey Murtagh
The Electric Monk
"Quidquid latine dictum sit, altum viditur!"


Posted by Daniel Sjöblom on November 17th, 2003


philipl@vistatec.ie wrote:
Here's somethin in java, but it should be very easy to convert to C++.
It's a very simple finite state machine:

/**
* - line is the string to look in
* - target is the number of consecutive chars you are looking for
*/

private boolean hasConsecutive(String line, int target)
{
int len = line.length();
final int looking = 1, found = 2;
int mode = looking;
int count = 0;
char c, prev = 0;

for (int i=0; i < len; i++)
{
c = line.charAt(i);

switch (mode)
{
case (looking) :
{

if (c == prev)
{
count++;

if (count == target - 1)
{
mode = found;
}
}
else
{
count = 0;
prev = c;
}
break;
}
case (found) :
{
if (c != prev)
{
return true;
}
else
{
count = 0;
prev = c;
mode = looking;
}
break;
}
}
}

return count == target -1;
}
--
Daniel Sjöblom


Posted by Thad Smith on November 18th, 2003


philipl@vistatec.ie wrote:


One suggestion, in addition to the others presented, is to test for the
forbidden characters AFTER you find a triplet, rather than before. This
would be faster for most files. You might, though, still throw out
spaces first if you expect a lot of consecutive spaces in the input.

Thad


Posted by Matthieu Villeneuve on November 18th, 2003


"Thomas Matthews" <Thomas_MatthewsHatesSpam@sbcglobal.net> wrote in message
news:w_9ub.21809$ln3.18703@newssvr33.news.prodigy. com...
That's right, iteration of a task a given number of times is the
other idiom of 'for'. (Although your example is rather a traversal
of the set of integers from 0 to 19 than a simple iteration of
the task 20 times, since the task uses the value of i).

Absolutely, and I consider it wrong (or bad) to use 'for' for
anything that is not an iteration a given number of times or a
data structure traversal.

If the user naturally thinks "let's do that initialization, then
repeat that task while this test is true", then s/he should use
'while', not 'for'. If s/he thinks "let's do that task N times"
or "let's do that task for every element in this set", then
'for' is appropriate.

'for' and 'while' may have equivalent effects in programs, but
their semantics (in terms of human interpretation) are not
equivalent.

Saving a line of code or two (by having initialization and
increment in the same line as the test) seems to me to be a quite
bad reason to use 'for' instead of 'while'. The main goal is to
get the code as easy to understand as possible, even if it
"costs" a few more lines.


--
Matthieu Villeneuve


Posted by Matthieu Villeneuve on November 18th, 2003


"Corey Murtagh" <emonk@slingshot.co.nz.no.uce> wrote in message
news:1069100926.538187@radsrv1.tranzpeer.net...
That's very true, you _can_ replace both constructs, because
they happen to be equivalent in C and a few other languages.
But I consider it very bad style to do so, because saving a few
characters makes the code less readable, because not matching
the "natural" way of thinking of some loops (even if programmers
with a little experience can read both quite fast).


--
Matthieu Villeneuve


Posted by Matthieu Villeneuve on November 18th, 2003


"Corey Murtagh" <emonk@slingshot.co.nz.no.uce> wrote in message
news:1069099995.189154@radsrv1.tranzpeer.net...
That's a good point, lots of code use 'for' wrong, and it's something
one has to be able to understand quickly.

Absolutely. I would write it a little differently, though.
The ternary operator is another C idiom, for a quantity that depends
on a boolean value, so I would use it here. I would also write the
test "curr" as "curr != NULL", to make it a test on real boolean
values.

node* curr = head;
while (curr != NULL && curr->value != parm) {
current = (curr->value < parm) ? curr->right : curr->left;
}


--
Matthieu Villeneuve


Posted by Calum on November 18th, 2003


Matthieu Villeneuve wrote:
I'm going to disagree completely! I find the "for" notation more
readable. The reason is:

for(<initalization>; <condition>; <iterative step>) ...

means that you know _exactly_ what the initalization is, what the
condition is, and what the step is. If you bury the <step> and
<initialization> in other parts of the program, they are not so easily
identified.

I will quite happily write

for(Node *n=root; n!=NULL; n=n->tail)
...

"for" succinctly lumps together all of the loopy stuff, which in my
opinion is good structure. When debugging your loop, you know exactly
where to look.

If you want to write BASIC, stick to BASIC. Where did you read that the
for loop in C etc should only be used for numerics?


Posted by Corey Murtagh on November 18th, 2003


Matthieu Villeneuve wrote:

I don't think Thomas was talking about saving space, but aiding
readability. If you have it all in one place it's easier to glance at
that one place and see what it's doing. I agree with that too. Hunting
down where an increment is happening within a largeish loop can be
frustrating, especially if it's not my code.

Even worse though is the possibilty of skipping a vital increment. This
is a trivial thing to do with a 'while' construct simply by using
'continue'. In a for you have to explicitly decrement to neutralize the
end-of-loop increment, whether you're using 'continue' or not.

If the increment is necessary, it should be done in a 'for' construct
rather than a 'while', regardless of what other increments may be
happening in the loop. This is a fairly common practice for string
traversal where character pairs (or other groupings) may be consumed in
a single pass through the loop... the CRLF pair in DOS format text
files, for example.

--
Corey Murtagh
The Electric Monk
"Quidquid latine dictum sit, altum viditur!"


Posted by Matthieu Villeneuve on November 18th, 2003


"Calum" <calum.bulk@ntlworld.com> wrote in message
news:bpd17m$h3s$1@newsg3.svr.pol.co.uk...
I write that too, and I find nothing wrong about it, because it
exactly corresponds to the traversal of a data structure. And as I
said earlier, that's one of the idioms 'for' corresponds to.

Nowhere, I don't think I said that either...


--
Matthieu Villeneuve


Posted by Matthieu Villeneuve on November 18th, 2003


"Corey Murtagh" <emonk@slingshot.co.nz.no.uce> wrote in message
news:1069158360.114581@radsrv1.tranzpeer.net...
Well, stop me if I'm just rehashing my own words, but if the loop
is a plain iteration (either for a given number of times, or over
a data set), then the increment logically belongs to the 'for'
increment clause, so you won't have to hunt it down.

If an increment is not logically part of the iteration (e.g.
incrementing a value that is not the loop index, or the current
element of the set, etc.), then I consider that it belongs to the
code in the loop, not to the loop "definition".

I agree that there are situations where using 'for' makes the code
easier to write and read, but to me they are just exceptions in
which the code in the loop occasionally (but not often) has to touch
the loop index or the current element of the set being traversed.


--
Matthieu Villeneuve


Posted by Calum on November 18th, 2003


Matthieu Villeneuve wrote:
Ah, my apologies. You actually said "every element in the data
structure". So what's wrong with

for(const char *s=start; s!=end; ++s)
...

where start and end are pointers into a longer string? Why does it
matter what the "whole" data structure is?


Posted by Matthieu Villeneuve on November 18th, 2003


"Calum" <calum.bulk@ntlworld.com> wrote in message
news:bpd56a$a5v$1@news8.svr.pol.co.uk...
Nothing, 's' successively points to every element in a set
of elements, and to me that corresponds to a 'for' idiom.

It doesn't...

--
Matthieu Villeneuve


Posted by philipl@vistatec.ie on November 18th, 2003


thx guys - all this is an eye opener!


Similar Posts