- Get HTML page from Explorer in C++ --OR--- OCR code?
- Posted by Craig York on September 26th, 2003
Ok I'm fighting with a common problem - getting text from a web page -
that I have read threads on already (so I know why WM_GETTEXT doesn't
work). Is it possible to launch a web page from my application,
capture it in a buffer so I can parse it and then also have it go to
IE and display as normal? I realize up front that some text may
actually be embedded in images rather than text strings in the html.
Or - anyone have some decent OCR code I could reuse with a screen cap?
I recall doing some pattern matching in grad school but I don't want
to re-invent the wheel.
Thanks,
Craig York
cyork2@yahoo.com
- Posted by Tim Robinson on September 26th, 2003
Wouldn't it be easier to obtain the HTML code of the web page yourself and
parse that?
--
Tim Robinson (MVP, Windows SDK)
http://www.themobius.co.uk/
"Craig York" <cyork2@yahoo.com> wrote in message
news:e9b7ba53.0309261321.21f82496@posting.google.com...
- Posted by Jerry Coffin on September 27th, 2003
In article <e9b7ba53.0309261321.21f82496@posting.google.com>,
cyork2@yahoo.com says...
Downloading some HTML doesn't require using IE or anything like it. You
can _fairly_ easily open a socket to the server on port 80 and grab the
HTML directly, by sending "GET /[pagename] HTTP/1.0" followed by a blank
line (where 'pagename' is the path of the page you want, or nothing after
the slash for the default page), and reading back the result. Many
servers also expect a "Host:" header naming the site.
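A rough sketch of what goes over the wire for the raw-socket approach, in case it helps. The function names build_get_request and response_body are made up for this illustration, and the HTTP/1.0 request line plus the "Host:" and "Connection: close" headers are my assumptions about what a typical server wants, not something spelled out in the post above.

```cpp
#include <string>

// Hypothetical helper: build a minimal HTTP/1.0 GET request.
// An empty path asks for the server's default page ("/").
std::string build_get_request(const std::string& host, const std::string& path) {
    std::string target = path.empty() ? "/" : path;
    if (target[0] != '/') target.insert(0, "/");   // request targets start with '/'
    return "GET " + target + " HTTP/1.0\r\n"
           "Host: " + host + "\r\n"
           "Connection: close\r\n"
           "\r\n";                                  // blank line ends the request
}

// Hypothetical helper: return the body of a raw response, i.e. everything
// after the first blank line. Returns an empty string if the end of the
// headers has not been received yet.
std::string response_body(const std::string& raw) {
    const std::string::size_type end = raw.find("\r\n\r\n");
    return end == std::string::npos ? std::string() : raw.substr(end + 4);
}
```

You would send the request string over a TCP socket connected to port 80 of the server (Winsock connect/send), keep calling recv until the server closes the connection, and then hand the accumulated bytes to response_body to get the HTML.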
Win32 has some functions that are at least intended to make this a bit
easier -- see InternetOpen, InternetOpenUrl and InternetReadFile for one
fairly easy way to deal with things.
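For the WinINet route, a minimal sketch might look like the following. The names ChunkReader, drain and fetch_url are hypothetical, as is the "HtmlGrabber" agent string; only InternetOpen, InternetOpenUrl, InternetReadFile and InternetCloseHandle are real WinINet calls (the ANSI "A" variants are used here). The read loop is kept separate from the Win32 part so the end-of-data handling is easy to see: InternetReadFile reports success with zero bytes read once the page is finished.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Reader contract (an assumption of this sketch): fill buf with up to cap
// bytes, store the count in got, and return false on error.
// A successful call with got == 0 means end of data.
using ChunkReader = std::function<bool(char* buf, std::size_t cap, std::size_t& got)>;

// Append everything the reader produces to out; returns false on a read error.
bool drain(const ChunkReader& read, std::string& out) {
    char buf[4096];
    for (;;) {
        std::size_t got = 0;
        if (!read(buf, sizeof buf, got)) return false;
        if (got == 0) return true;      // end of data
        out.append(buf, got);
    }
}

#ifdef _WIN32
#include <windows.h>
#include <wininet.h>
#pragma comment(lib, "wininet.lib")

// Download the page at url into out using WinINet.
bool fetch_url(const char* url, std::string& out) {
    HINTERNET session = InternetOpenA("HtmlGrabber", INTERNET_OPEN_TYPE_PRECONFIG,
                                      NULL, NULL, 0);
    if (!session) return false;
    HINTERNET page = InternetOpenUrlA(session, url, NULL, 0, INTERNET_FLAG_RELOAD, 0);
    bool ok = false;
    if (page) {
        ok = drain([page](char* buf, std::size_t cap, std::size_t& got) {
            DWORD n = 0;
            if (!InternetReadFile(page, buf, static_cast<DWORD>(cap), &n)) return false;
            got = n;
            return true;
        }, out);
        InternetCloseHandle(page);
    }
    InternetCloseHandle(session);
    return ok;
}
#endif
```

Unlike the raw socket approach, this hands back just the page content with the headers already stripped, and it picks up the proxy settings IE is configured with.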
--
Later,
Jerry.
The universe is a figment of its own imagination.
- Posted by Craig York on September 27th, 2003
Thanks guys - I wanted to do this for web pages as the user works in
IE. Your suggestions work as long as I can get the URL of the web page
the user is currently on from IE. I may try that; if not, it turns
out there are some other threads on OCR that may help me as well.
Thanks,
Craig
Jerry Coffin <jcoffin@taeus.com> wrote in message news:<MPG.19dec2b8e70b5a62989b2e@news.clspco.adelphia.net>...
- Posted by Jerry Coffin on September 27th, 2003
In article <e9b7ba53.0309270746.2dc947ef@posting.google.com>,
cyork2@yahoo.com says...
The easiest way may be to embed a Web Browser control into your
application, and get the URL from it. If you insist on talking to an
existing instance of IE instead, you can do that as well, but it's
usually more difficult -- basically you have to figure out which (of
potentially many) instances of IE to work with, then retrieve an
interface to its Web Browser control, and work with that, just like you
would have by embedding the control in your app.
It's also possible to embed any of a number of other HTML/browser
controls in your application -- QHTM, for one obvious example.
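For the "talk to an existing instance of IE" route, one possible sketch is to walk the shell's window collection and ask each entry for its IWebBrowser2 interface. The function names is_ie_path and ie_urls are invented for this example. The collection also contains ordinary Explorer folder windows, so the sketch assumes that checking the host executable name for iexplore.exe is a good enough filter. It also assumes the caller has already called CoInitialize on the thread.

```cpp
#include <cstddef>
#include <cwctype>
#include <string>
#include <vector>

// Hypothetical filter: true if path names iexplore.exe (case-insensitive),
// either bare or as the last component of a full path.
bool is_ie_path(const std::wstring& path) {
    static const std::wstring exe = L"iexplore.exe";
    if (path.size() < exe.size()) return false;
    const std::size_t off = path.size() - exe.size();
    for (std::size_t i = 0; i < exe.size(); ++i)
        if (std::towlower(path[off + i]) != exe[i]) return false;
    return off == 0 || path[off - 1] == L'\\' || path[off - 1] == L'/';
}

#ifdef _WIN32
#include <windows.h>
#include <exdisp.h>   // IShellWindows, IWebBrowser2

// Collect the current URL of every running IE window.
// Assumes COM is already initialized on the calling thread.
std::vector<std::wstring> ie_urls() {
    std::vector<std::wstring> urls;
    IShellWindows* windows = nullptr;
    if (FAILED(CoCreateInstance(CLSID_ShellWindows, nullptr, CLSCTX_ALL,
                                IID_IShellWindows, (void**)&windows)))
        return urls;
    long count = 0;
    windows->get_Count(&count);
    for (long i = 0; i < count; ++i) {
        VARIANT idx;
        VariantInit(&idx);
        idx.vt = VT_I4;
        idx.lVal = i;
        IDispatch* disp = nullptr;
        if (windows->Item(idx, &disp) != S_OK || !disp) continue;
        IWebBrowser2* browser = nullptr;
        if (SUCCEEDED(disp->QueryInterface(IID_IWebBrowser2, (void**)&browser))) {
            BSTR exe = nullptr, url = nullptr;
            if (SUCCEEDED(browser->get_FullName(&exe)) && exe && is_ie_path(exe) &&
                SUCCEEDED(browser->get_LocationURL(&url)) && url)
                urls.emplace_back(url, SysStringLen(url));
            SysFreeString(exe);   // SysFreeString accepts null
            SysFreeString(url);
            browser->Release();
        }
        disp->Release();
    }
    windows->Release();
    return urls;
}
#endif
```

Once you have the URL you can fetch the HTML yourself, or go further through the same IWebBrowser2 pointer and ask for its document. Deciding which of several IE windows is the one the user is actually looking at is still up to you.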
--
Later,
Jerry.
The universe is a figment of its own imagination.