<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom"><title type="text">博客园_Safe3 Network Center</title><subtitle type="text">			</subtitle><id>http://feed.cnblogs.com/blog/u/27677/rss</id><updated>2012-04-24T03:51:18Z</updated><author><name>Safe3</name><uri>http://www.cnblogs.com/Safe3/</uri></author><generator>feed.cnblogs.com</generator><link rel="alternate" type="text/html" href="http://www.cnblogs.com/Safe3/"/><link rel="self" type="application/atom+xml" href="http://feed.cnblogs.com/blog/u/27677/rss"/><entry><id>http://www.cnblogs.com/Safe3/archive/2012/04/24/2467770.html</id><title type="text">Using Internet Explorer from .NET</title><summary type="text">5.0IntroductionEarlier in this book we have looked at how to read HTML from websites, and how to navigate through websites using GET and POST requests. These techniques certainly offer high performance, but with many websites using cryptic POST data, complex cookie data, and JavaScript rendered text</summary><published>2012-04-24T03:51:00Z</published><updated>2012-04-24T03:51:00Z</updated><author><name>Safe3</name><uri>http://www.cnblogs.com/Safe3/</uri></author><link rel="alternate" href="http://www.cnblogs.com/Safe3/archive/2012/04/24/2467770.html"/><link rel="alternate" type="text/html" href="http://www.cnblogs.com/Safe3/archive/2012/04/24/2467770.html"/><content type="html">&lt;p&gt;&lt;strong&gt;5.0&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Introduction&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;Earlier in this book we have looked at how to read HTML from websites, and how to navigate through websites using GET and POST requests. 
These techniques certainly offer high performance, but with many websites using cryptic POST data, complex cookie data, and JavaScript-rendered text, it is useful to know that you can always call on the Internet Explorer browsing engine to help you get the data you need.&lt;/p&gt;&lt;p&gt;It must be stated, though, that using Internet Explorer to data mine web pages creates a much larger memory footprint, and is not as fast as scanning using HTTP requests alone. It does, however, come into its own when a data mining process requires a degree of human interaction. A good example of this would be if you wanted to create an automated test of your website, and needed to give a non-technical user the ability to follow a sequence of steps, and select data to extract and compare, based on the familiar Internet Explorer interface.&lt;/p&gt;&lt;p&gt;This chapter is divided into two main sections. The first deals with how to use the Internet Explorer object to interact with all the various types of web page controls. The second deals with how Internet Explorer can detect and respond to a user interacting with web page elements.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;5.1&amp;nbsp;&amp;nbsp; Web page navigation&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;The procedure for including the Internet Explorer object in your application differs depending on which version of Visual Studio .NET you are using. After starting a new Windows Forms project, users of Visual Studio .NET 2002 should right-click on their toolbox and select Customize Toolbox, click COM Components, then select Microsoft Web Browser.&amp;nbsp; Users of Visual Studio .NET 2003 should right-click on their toolbox and select Add/Remove Items, and then follow the same procedure as mentioned above. 
In Visual Studio .NET 2005, you do not need to add the web browser to the toolbox; just drag the&amp;nbsp;WebBrowser&amp;nbsp;control to the form.&lt;/p&gt;&lt;p&gt;An important distinction between the Internet Explorer object used in Visual Studio .NET 2002/03 and the 2005 version is that the latter uses a native .NET class to interact with Internet Explorer, whereas the former uses a .NET wrapper around a COM (Component Object Model) object. This creates some syntactic differences between how Internet Explorer is used within .NET 2.0 and .NET 1.x. The first example in this chapter will cover both versions of .NET for completeness. Further examples will show .NET 2.0 code only, unless the equivalent .NET 1.x code would differ substantially.&lt;/p&gt;&lt;p&gt;The first thing you will need to know when using Internet Explorer is how to navigate to a web page. Since Internet Explorer works asynchronously, you will also need to know when Internet Explorer has finished loading a web page. In the following example, we will simply navigate to&amp;nbsp;&lt;a href="http://www.google.com/"&gt;www.google.com&lt;/a&gt;&amp;nbsp;and pop up a message box once the page is loaded.&lt;/p&gt;&lt;p&gt;To begin this example, drop an Internet Explorer object onto a form, as described above, and call it&amp;nbsp;WebBrowser.&amp;nbsp;Now add a button to the form and name it&amp;nbsp;btnNavigate.&amp;nbsp;Double-click the button and add the following code:&lt;/p&gt;&lt;p&gt;C#&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;private void btnNavigate_Click(object sender, System.EventArgs e)&lt;/p&gt;&lt;p class="Code"&gt;{&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; NavigateToUrlSync("http://www.google.com");&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; MessageBox.Show("page loaded");&lt;/p&gt;&lt;p class="Code"&gt;}&lt;/p&gt;&lt;/div&gt;&lt;p&gt;VB.NET&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;Private Sub btnNavigate_Click(ByVal sender As System.Object, _&lt;/p&gt;&lt;p 
class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ByVal e As System.EventArgs) Handles btnNavigate.Click&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; NavigateToUrlSync("http://www.google.com")&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; MessageBox.Show("page loaded")&lt;/p&gt;&lt;p class="Code"&gt;End Sub&lt;/p&gt;&lt;/div&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;We then create the&amp;nbsp;NavigateToUrlSync&amp;nbsp;method. Note how the C# version differs in version 1.x and 2.0. This is because the COM object is expecting four optional&amp;nbsp;ref object&amp;nbsp;parameters. These parameters can optionally define the flags, target frame name, post data and headers sent with the request. They are not used in this case, yet since C# does not support optional parameters they have to be passed in nonetheless.&lt;/p&gt;&lt;p&gt;C# 1.x&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;public void NavigateToUrlSync(string url)&lt;/p&gt;&lt;p class="Code"&gt;{&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; object oMissing = null;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; bBusy=true;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; WebBrowser.Navigate(url,ref oMissing,ref oMissing,ref oMissing,ref oMissing);&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; while(bBusy)&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Application.DoEvents();&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/p&gt;&lt;p class="Code"&gt;}&lt;/p&gt;&lt;/div&gt;&lt;p&gt;C# 2.0&lt;/p&gt;&lt;div&gt;&lt;p 
class="Code"&gt;public void NavigateToUrlSync(string url)&lt;/p&gt;&lt;p class="Code"&gt;{&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; bBusy=true;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; WebBrowser.Navigate(url);&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; while(bBusy)&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Application.DoEvents();&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/p&gt;&lt;p class="Code"&gt;}&lt;/p&gt;&lt;/div&gt;&lt;p&gt;VB.NET&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;Public Sub NavigateToUrlSync(ByVal url As String)&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp; bBusy = True&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp; WebBrowser.Navigate(url)&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp; While (bBusy)&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Application.DoEvents()&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp; End While&lt;/p&gt;&lt;p class="Code"&gt;End Sub&lt;/p&gt;&lt;/div&gt;&lt;p&gt;The while loop polls until the public&amp;nbsp;bBusy&amp;nbsp;flag is cleared. 
The&amp;nbsp;DoEvents&amp;nbsp;command ensures that the application remains responsive whilst waiting for a response from the web server.&lt;/p&gt;&lt;p&gt;To clear the&amp;nbsp;bBusy&amp;nbsp;flag, we handle either the&amp;nbsp;DocumentComplete&amp;nbsp;(.NET 1.x) or&amp;nbsp;DocumentCompleted&amp;nbsp;(.NET 2.0) thus:&lt;/p&gt;&lt;p&gt;C# 1.x&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;private void WebBrowser_DocumentComplete(object sender, AxSHDocVw.DWebBrowserEvents2_DocumentCompleteEvent e)&lt;/p&gt;&lt;p class="Code"&gt;{&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; bBusy = false;&lt;/p&gt;&lt;p class="Code"&gt;}&lt;/p&gt;&lt;/div&gt;&lt;p&gt;C# 2.0&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;private void WebBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)&lt;/p&gt;&lt;p class="Code"&gt;{&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; bBusy = false;&lt;/p&gt;&lt;p class="Code"&gt;}&lt;/p&gt;&lt;/div&gt;&lt;p&gt;VB.NET 1.x&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;Private Sub WebBrowser_DocumentComplete(ByVal sender As Object, _&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ByVal e As AxSHDocVw.DWebBrowserEvents2_DocumentCompleteEvent) _&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Handles WebBrowser.DocumentComplete&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; bBusy = False&lt;/p&gt;&lt;p class="Code"&gt;End Sub&lt;/p&gt;&lt;/div&gt;&lt;p&gt;VB.NET 2.0&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;Private Sub WebBrowser_DocumentCompleted(ByVal sender As Object, _&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ByVal e As WebBrowserDocumentCompletedEventArgs) _&lt;/p&gt;&lt;p 
class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Handles WebBrowser.DocumentCompleted&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; bBusy = False&lt;/p&gt;&lt;p class="Code"&gt;End Sub&lt;/p&gt;&lt;/div&gt;&lt;p&gt;To finish off the example, don't forget to declare the public&amp;nbsp;bBusy&amp;nbsp;flag.&lt;/p&gt;&lt;p&gt;C#&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;public bool bBusy = false;&lt;/p&gt;&lt;/div&gt;&lt;p&gt;VB.NET&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;Public bBusy As Boolean = False&lt;/p&gt;&lt;/div&gt;&lt;p&gt;To test the application, compile and run it in Visual Studio, then press the navigate button.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;5.2 &amp;nbsp; Manipulating web pages&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;An advantage of using Internet Explorer over raw HTTP requests is that you get access to the DOM (Document Object Model) of web pages, once they are loaded into Internet Explorer. For developers familiar with JavaScript, this should be an added bonus, since you will be able to control the web page in much the same way as if you were using JavaScript within an HTML page.&lt;/p&gt;&lt;p&gt;The main difference, however, between using the DOM in .NET and in JavaScript is that the .NET languages are strongly typed, and therefore you must know the type of the element you are interacting with before you can access its full potential.&lt;/p&gt;&lt;p&gt;If you are using .NET 1.x you will need to reference the HTML type library, by clicking Project &amp;gt; Add Reference. Then select&amp;nbsp;Microsoft.mshtml&amp;nbsp;from the list. 
For each of the examples in this section you must import the namespace into your code thus:&lt;/p&gt;&lt;p&gt;C#&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;using mshtml;&lt;/p&gt;&lt;/div&gt;&lt;p&gt;VB.NET&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;Imports mshtml&lt;/p&gt;&lt;/div&gt;&lt;p&gt;If you then cast the&amp;nbsp;WebBrowser.Document&amp;nbsp;object to an&amp;nbsp;HTMLDocument&amp;nbsp;class, many of the code examples shown below should work equally well in .NET 1.x and .NET 2.0.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;5.2.1&amp;nbsp; Frames&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;Frames may be going out of fashion in modern websites, but you may still need to extract data from a website that uses them, so you need to know how to handle frames within Internet Explorer. In this section, you will notice that the code differs substantially between versions 1.x and 2.0 of .NET; therefore, source code for both is included.&lt;/p&gt;&lt;p&gt;To create a simple frameset, create three files,&amp;nbsp;Frameset.html,&amp;nbsp;Left.html&amp;nbsp;and&amp;nbsp;Right.html, containing the following HTML code respectively.&lt;/p&gt;&lt;p&gt;Frameset.html&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;&amp;lt;html&amp;gt;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;lt;frameset cols="50%,50%"&amp;gt;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp; &amp;lt;frame name="LeftFrame" src="Left.html"&amp;gt;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp; &amp;lt;frame name="RightFrame" src="Right.html"&amp;gt;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;lt;/frameset&amp;gt;&lt;/p&gt;&lt;p class="Code"&gt;&amp;lt;/html&amp;gt;&lt;/p&gt;&lt;/div&gt;&lt;p&gt;Left.html&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;&amp;lt;html&amp;gt;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;This is the left frame&lt;/p&gt;&lt;p class="Code"&gt;&amp;lt;/html&amp;gt;&lt;/p&gt;&lt;/div&gt;&lt;p&gt;Right.html&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;&amp;lt;html&amp;gt;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;This 
is the right frame&lt;/p&gt;&lt;p class="Code"&gt;&amp;lt;/html&amp;gt;&lt;/p&gt;&lt;/div&gt;&lt;p&gt;In the following example, we will use Internet Explorer to read the HTML contents of the left frame. This example uses code from the program listing in section 5.1, and assumes you have saved the HTML files in&amp;nbsp;C:\&lt;/p&gt;&lt;p&gt;VB.NET 1.x&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;Private Sub btnNavigate_Click(ByVal sender As System.Object, _&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ByVal e As System.EventArgs) Handles btnNavigate.Click&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; NavigateToUrlSync("C:\frameset.html")&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Dim hDoc As HTMLDocument&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; hDoc = WebBrowser.Document&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; hDoc = CType(hDoc.frames.item(0), HTMLWindow2).document&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; MessageBox.Show(hDoc.body.innerHTML)&lt;/p&gt;&lt;p class="Code"&gt;End Sub&lt;/p&gt;&lt;/div&gt;&lt;p&gt;VB.NET 2.0&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;Private Sub btnNavigate_Click(ByVal sender As System.Object, _&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ByVal e As System.EventArgs) Handles btnNavigate.Click&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; NavigateToUrlSync("C:\frameset.html")&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Dim hDoc As HtmlDocument&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; hDoc = WebBrowser.Document.Window.Frames(0).Document&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; MessageBox.Show(hDoc.Body.InnerHtml)&lt;/p&gt;&lt;p 
class="Code"&gt;End Sub&lt;/p&gt;&lt;/div&gt;&lt;p&gt;C# 1.x&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;private void btnNavigate_Click(object sender, System.EventArgs e)&lt;/p&gt;&lt;p class="Code"&gt;{&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; NavigateToUrlSync(@"C:\frameset.html");&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; HTMLDocument hDoc;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; object oFrameIndex = 0;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; hDoc = (HTMLDocument)WebBrowser.Document;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; hDoc = (HTMLDocument)((HTMLWindow2)hDoc.frames.item(&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ref oFrameIndex)).document;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; MessageBox.Show(hDoc.body.innerHTML);&amp;nbsp;&amp;nbsp;&lt;/p&gt;&lt;p class="Code"&gt;}&lt;/p&gt;&lt;/div&gt;&lt;p&gt;C# 2.0&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;private void btnNavigate_Click(object sender, System.EventArgs e)&lt;/p&gt;&lt;p class="Code"&gt;{&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; NavigateToUrlSync(@"C:\frameset.html");&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; HtmlDocument hDoc;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; hDoc = WebBrowser.Document.Window.Frames[0].Document;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; MessageBox.Show(hDoc.Body.InnerHtml);&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/p&gt;&lt;p class="Code"&gt;}&lt;/p&gt;&lt;/div&gt;&lt;p&gt;The main difference between the .NET 2.0 and .NET 1.x versions of the above code is that the indexer on the frames collection returns an object, which must be cast to an&amp;nbsp;HTMLWindow2&amp;nbsp;under the COM wrapper in .NET 1.x. 
In .NET 2.0 the indexer performs the cast internally, and returns an&amp;nbsp;HtmlWindow&amp;nbsp;object.&lt;/p&gt;&lt;p&gt;To test the application, compile and run it from Visual Studio .NET, press the navigate button, and a message box should pop up saying "This is the left frame".&lt;/p&gt;&lt;p&gt;&lt;strong&gt;5.2.2&amp;nbsp; Input boxes&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;Input boxes are used in HTML to allow the user to enter text into a web page. Here we will automatically populate an input box with some data.&lt;/p&gt;&lt;p&gt;We start with some HTML, which we save as InputBoxes.html:&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;&amp;lt;html&amp;gt;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp; &amp;lt;form name="myForm"&amp;gt;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp; My Name is :&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp; &amp;lt;input type="text" value="" id="myName" name="myName"&amp;gt;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp; &amp;lt;/form&amp;gt;&lt;/p&gt;&lt;p class="Code"&gt;&amp;lt;/html&amp;gt;&lt;/p&gt;&lt;/div&gt;&lt;p&gt;We can get a reference to the input box on the form by calling&amp;nbsp;GetElementById&amp;nbsp;on the&amp;nbsp;HtmlDocument. 
In .NET 1.x this should then be cast to an&amp;nbsp;IHTMLInputElement.&lt;/p&gt;&lt;p&gt;C# 2.0&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;private void btnNavigate_Click(object sender, System.EventArgs e)&lt;/p&gt;&lt;p class="Code"&gt;{&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; NavigateToUrlSync(@"C:\InputBoxes.html");&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; HtmlElement hElement;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; hElement = WebBrowser.Document.GetElementById("myName");&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; hElement.SetAttribute("value", "Joe Bloggs");&lt;/p&gt;&lt;p class="Code"&gt;}&lt;/p&gt;&lt;/div&gt;&lt;p&gt;VB.NET 2.0&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;Private Sub btnNavigate_Click(ByVal sender As System.Object, _&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ByVal e As System.EventArgs) Handles btnNavigate.Click&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; NavigateToUrlSync("C:\InputBoxes.html")&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Dim hElement As HtmlElement&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; hElement = WebBrowser.Document.GetElementById("myName")&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; hElement.SetAttribute("value", "Joe Bloggs")&lt;/p&gt;&lt;p class="Code"&gt;End Sub&lt;/p&gt;&lt;/div&gt;&lt;p&gt;In order to enter the text into the input box, we call the&amp;nbsp;SetAttribute&amp;nbsp;method of the&amp;nbsp;HtmlElement, passing in the name of the attribute to change and the new text. 
In .NET 1.x we would set the&amp;nbsp;value&amp;nbsp;property of the&amp;nbsp;IHTMLInputElement&amp;nbsp;to the new text.&lt;/p&gt;&lt;p&gt;To test the application, compile and run it from Visual Studio .NET, then press the navigate button.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;5.2.3&amp;nbsp; Drop down lists&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;Drop down lists are used in web pages to let users choose from a list of predefined values. In the following example, we will demonstrate how to set the value of a drop down list, and then read it back.&lt;/p&gt;&lt;p&gt;We shall start off with an HTML file, which we save as&amp;nbsp;DropDownLists.html:&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;&amp;lt;html&amp;gt;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp; &amp;lt;form name="myForm"&amp;gt;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp; My favourite colour is:&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp; &amp;lt;select id="myColour" name="myColour"&amp;gt;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;option value="Blue"&amp;gt;Blue&amp;lt;/option&amp;gt;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;option value="Red"&amp;gt;Red&amp;lt;/option&amp;gt;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp; &amp;lt;/select&amp;gt;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp; &amp;lt;/form&amp;gt;&lt;/p&gt;&lt;p class="Code"&gt;&amp;lt;/html&amp;gt;&lt;/p&gt;&lt;/div&gt;&lt;p&gt;We can get a reference to the drop down list by calling&amp;nbsp;GetElementById&amp;nbsp;on the&amp;nbsp;HtmlDocument. 
In .NET 1.x this should then be cast to an&amp;nbsp;IHTMLSelectElement.&lt;/p&gt;&lt;p&gt;C# 2.0&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;private void btnNavigate_Click(object sender, System.EventArgs e)&lt;/p&gt;&lt;p class="Code"&gt;{&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; NavigateToUrlSync(@"C:\dropdownlists.html");&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; HtmlElement hElement;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; hElement = WebBrowser.Document.GetElementById("myColour");&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; hElement.SetAttribute("selectedIndex", "1");&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; MessageBox.Show("My favourite colour is: " + hElement.GetAttribute("value"));&lt;/p&gt;&lt;p class="Code"&gt;}&lt;/p&gt;&lt;/div&gt;&lt;p&gt;VB.NET 2.0&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;Private Sub btnNavigate_Click(ByVal sender As System.Object, _&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ByVal e As System.EventArgs) Handles btnNavigate.Click&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; NavigateToUrlSync("C:\dropdownlists.html")&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Dim hElement As HtmlElement&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; hElement = WebBrowser.Document.GetElementById("myColour")&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; hElement.SetAttribute("selectedIndex", "1")&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; MessageBox.Show("My favourite colour is: " + hElement.GetAttribute("value"))&lt;/p&gt;&lt;p class="Code"&gt;End 
Sub&lt;/p&gt;&lt;/div&gt;&lt;p&gt;Here, we can see that in order to set our selection we pass "selectedIndex" and the selection number to&amp;nbsp;SetAttribute. We then pass "value" to&amp;nbsp;GetAttribute&amp;nbsp;in order to read back the selection. In .NET 1.x, we achieve the same results by setting the&amp;nbsp;selectedIndex&amp;nbsp;property on the&amp;nbsp;IHTMLSelectElement&amp;nbsp;and reading back the selection from the&amp;nbsp;value&amp;nbsp;property.&lt;/p&gt;&lt;p&gt;To test the application, compile and run it from Visual Studio .NET, press the navigate button, and you should see a message box appear reporting that the favourite colour is Red.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;5.2.4&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Check boxes and radio buttons&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;Check boxes and radio buttons are generally used on web pages to allow the user to choose among a small number of options. In the following example, we shall demonstrate how to toggle check boxes and radio buttons.&lt;/p&gt;&lt;p&gt;We shall start off with an HTML file, which we will save as&amp;nbsp;CheckBoxes.html:&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;&amp;lt;html&amp;gt;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;lt;form name="myForm"&amp;gt;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp; &amp;lt;input type="checkbox" id="myCheckBox" name="myCheckBox"&amp;gt;Check this.&amp;lt;br&amp;gt;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp; &amp;lt;input type="radio" name="myRadio" value="Yes"&amp;gt;Yes&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp; &amp;lt;input type="radio" name="myRadio" checked="true" value="No"&amp;gt;No&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;lt;/form&amp;gt;&lt;/p&gt;&lt;p class="Code"&gt;&amp;lt;/html&amp;gt;&lt;/p&gt;&lt;/div&gt;&lt;p&gt;As before, we can get a reference to the check box by calling&amp;nbsp;GetElementById. 
However, since the two radio buttons have the same name, we need to use&amp;nbsp;Document.All.GetElementsByName&amp;nbsp;and then select the required radio button from the&amp;nbsp;HtmlElementCollection&amp;nbsp;returned.&lt;/p&gt;&lt;p&gt;In .NET 1.x, we would use a call to&amp;nbsp;getElementsByName&amp;nbsp;on the&amp;nbsp;HTMLDocument. This returns an&amp;nbsp;IHTMLElementCollection. We can then get the reference to the&amp;nbsp;IHTMLInputElement&amp;nbsp;with the method&amp;nbsp;item(null,1).&lt;/p&gt;&lt;p&gt;C# 2.0&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;private void btnNavigate_Click(object sender, System.EventArgs e)&lt;/p&gt;&lt;p class="Code"&gt;{&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; NavigateToUrlSync(@"C:\checkboxes.html");&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; HtmlElement hElement;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; HtmlElementCollection hElements;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; hElement = WebBrowser.Document.GetElementById("myCheckBox");&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; hElement.SetAttribute("checked", "true");&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; hElements = WebBrowser.Document.All.GetElementsByName("myRadio");&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; hElement = hElements[0];&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; hElement.SetAttribute("checked", "true");&lt;/p&gt;&lt;p class="Code"&gt;}&lt;/p&gt;&lt;/div&gt;&lt;p&gt;VB.NET 2.0&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;Private Sub btnNavigate_Click(ByVal sender As System.Object, _&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ByVal e As System.EventArgs) 
Handles btnNavigate.Click&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; NavigateToUrlSync("C:\checkboxes.html")&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Dim hElement As HtmlElement&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; hElement = WebBrowser.Document.GetElementById("myCheckBox")&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; hElement.SetAttribute("checked", "true")&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; hElement = WebBrowser.Document.All.GetElementsByName("myRadio").Item(0)&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; hElement.SetAttribute("checked", "true")&lt;/p&gt;&lt;p class="Code"&gt;End Sub&lt;/p&gt;&lt;/div&gt;&lt;p&gt;As before, we set the property of the&amp;nbsp;HtmlElement&amp;nbsp;using the&amp;nbsp;SetAttribute&amp;nbsp;method. In .NET 1.x, you need to set the&amp;nbsp;@checked&amp;nbsp;property on the&amp;nbsp;IHTMLInputElement.&lt;/p&gt;&lt;p&gt;To test the application, compile and run it from Visual Studio, then press the navigate button. You should see the check box and radio button toggle simultaneously.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;5.2.5&amp;nbsp; Buttons&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;Submit buttons and standard buttons are generally used to submit forms in HTML. 
They play a crucial part in navigating any website.&lt;/p&gt;&lt;p&gt;We start with a simple piece of HTML, which we save as Buttons.html:&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;&amp;lt;html&amp;gt;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;lt;form action="http://www.google.com/search" method="get" id="myForm" name="myForm"&amp;gt;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp; &amp;lt;input type="text" value=".NET" name="q"&amp;gt;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp; &amp;lt;input type="submit" id="btnSubmit" name="btnSubmit" value="Google Search"&amp;gt;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;lt;/form&amp;gt;&lt;/p&gt;&lt;p class="Code"&gt;&amp;lt;/html&amp;gt;&lt;/p&gt;&lt;/div&gt;&lt;p&gt;We can get a reference to the button on the form by calling&amp;nbsp;GetElementById&amp;nbsp;on the&amp;nbsp;HtmlDocument. In .NET 1.x this should then be cast to an&amp;nbsp;IHTMLElement.&lt;/p&gt;&lt;p&gt;C# 2.0&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;private void btnNavigate_Click(object sender, System.EventArgs e)&lt;/p&gt;&lt;p class="Code"&gt;{&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; NavigateToUrlSync(@"C:\buttons.html");&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; HtmlElement hElement;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; hElement = WebBrowser.Document.GetElementById("btnSubmit");&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; hElement.InvokeMember("click");&lt;/p&gt;&lt;p class="Code"&gt;}&lt;/p&gt;&lt;/div&gt;&lt;p&gt;VB.NET 2.0&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;Private Sub btnNavigate_Click(ByVal sender As System.Object, _&lt;/p&gt;&lt;p 
class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ByVal e As System.EventArgs) Handles btnNavigate.Click&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; NavigateToUrlSync("C:\buttons.html")&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Dim hElement As HtmlElement&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; hElement = WebBrowser.Document.GetElementById("btnSubmit")&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; hElement.InvokeMember("click")&lt;/p&gt;&lt;p class="Code"&gt;End Sub&lt;/p&gt;&lt;/div&gt;&lt;p&gt;In the above example, we can see that after we get a reference to the button, we call the click method using&amp;nbsp;InvokeMember. Similarly, if we wanted to submit the form without clicking the button, we could get a reference to&amp;nbsp;myForm&amp;nbsp;and pass "submit" to the&amp;nbsp;InvokeMember&amp;nbsp;method.&lt;/p&gt;&lt;p&gt;In .NET 1.x, there is no&amp;nbsp;InvokeMember&amp;nbsp;method of&amp;nbsp;IHTMLElement, so you must call the&amp;nbsp;click&amp;nbsp;method of the&amp;nbsp;IHTMLElement&amp;nbsp;directly. In the case of a form, you should cast the&amp;nbsp;IHTMLElement&amp;nbsp;to an&amp;nbsp;IHTMLFormElement&amp;nbsp;and call its&amp;nbsp;submit&amp;nbsp;method.&lt;/p&gt;&lt;p&gt;To test this application, compile and run it from Visual Studio .NET, and press the navigate button. The form should load and then automatically forward itself to a google.com search result page.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;5.2.6&amp;nbsp; JavaScript&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;Many web pages use JavaScript to perform complex interactions between the user and the page. It is important to know how to execute JavaScript functions from within Internet Explorer. The simplest method is to use&amp;nbsp;Navigate&amp;nbsp;with the prefix&amp;nbsp;javascript:&amp;nbsp;followed by the function name. 
However, this does not give us a return value, nor will it work correctly in all situations.&lt;/p&gt;&lt;p&gt;We shall start with an HTML page, which contains a JavaScript function to display some text. This will be saved as&amp;nbsp;JavaScript.html.&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;&amp;lt;html&amp;gt;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;lt;span id="hiddenText" style="display:none"&amp;gt;This was displayed by javascript&amp;lt;/span&amp;gt;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;lt;script language="javascript"&amp;gt;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp; function jsFunction()&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp; {&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp; window.document.all["hiddenText"].style.display="block";&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp; return "ok";&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp; }&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;lt;/script&amp;gt;&lt;/p&gt;&lt;p class="Code"&gt;&amp;lt;/html&amp;gt;&lt;/p&gt;&lt;/div&gt;&lt;p&gt;We can then use the&amp;nbsp;Document.InvokeScript&amp;nbsp;method to execute the JavaScript thus:&lt;/p&gt;&lt;p&gt;C# 2.0&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;private void btnNavigate_Click(object sender, System.EventArgs e)&lt;/p&gt;&lt;p class="Code"&gt;{&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; NavigateToUrlSync(@"C:\javascript.html");&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; string strRetVal = "";&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; strRetVal = (string)WebBrowser.Document.InvokeScript("jsFunction");&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; MessageBox.Show(strRetVal);&lt;/p&gt;&lt;p class="Code"&gt;}&lt;/p&gt;&lt;/div&gt;&lt;p&gt;VB.NET 2.0&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;Private Sub btnNavigate_Click(ByVal sender As System.Object, _&lt;/p&gt;&lt;p 
class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ByVal e As System.EventArgs) Handles btnNavigate.Click&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; NavigateToUrlSync("C:\javascript.html")&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Dim strRetVal As String&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; strRetVal = WebBrowser.Document.InvokeScript("jsFunction").ToString()&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; MessageBox.Show(strRetVal)&lt;/p&gt;&lt;p class="Code"&gt;End Sub&lt;/p&gt;&lt;/div&gt;&lt;p&gt;In .NET 1.x, we would call the&amp;nbsp;parentWindow.execScript&amp;nbsp;method on the&amp;nbsp;HTMLDocument, not forgetting to add empty parentheses after the JavaScript function name. Unfortunately,&amp;nbsp;execScript&amp;nbsp;returns&amp;nbsp;null&amp;nbsp;instead of the JavaScript return value.&lt;/p&gt;&lt;p&gt;To test the application, compile and run it from Visual Studio .NET, then press the Navigate button.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;5.3&amp;nbsp;&amp;nbsp; Extracting data from web pages&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;In order to extract HTML from a web page using Internet Explorer, you need to call&amp;nbsp;Body.Parent.OuterHtml&amp;nbsp;in .NET 2.0 or&amp;nbsp;body.parentElement.outerHTML&amp;nbsp;in .NET 1.x. You should be aware that the HTML returned by this method is different to the actual HTML content of the page.&lt;/p&gt;&lt;p&gt;Internet Explorer will correct HTML in the page by adding &amp;lt;BODY&amp;gt;, &amp;lt;TBODY&amp;gt; and &amp;lt;HEAD&amp;gt; tags where missing. 
It will also capitalize existing HTML tags, and make other formatting changes that you should be aware of.&lt;/p&gt;&lt;p&gt;Techniques for parsing this textual data are explained later in the book under the section concerning Regular Expressions.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;5.4&amp;nbsp;&amp;nbsp; Advanced user interaction&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;When designing an application which uses Internet Explorer as a tool for data mining, it is an added benefit that the user can interact with the control in a natural fashion in order to manipulate its behavior. The following sections describe ways in which a user can interact with Internet Explorer, and how these events can be handled within .NET.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;5.4.1&amp;nbsp; Design mode&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;If you want to provide the user with the ability to manipulate web pages on the fly, there is no simpler way to do it than using the built-in "design mode" in Internet Explorer. This particular feature is not supported by the managed .NET 2.0&amp;nbsp;WebBrowser&amp;nbsp;control. However, it is possible to access the unmanaged interfaces, which we were using in .NET 1.x, through the&amp;nbsp;Document.DomDocument&amp;nbsp;property. This can then be cast to the&amp;nbsp;HTMLDocument&amp;nbsp;in the&amp;nbsp;mshtml&amp;nbsp;library (not to be confused with the managed&amp;nbsp;HtmlDocument&amp;nbsp;class). Therefore, in this case, you will need to add a reference to the&amp;nbsp;mshtml&amp;nbsp;library and add a "using mshtml" statement to the top of your code.&lt;/p&gt;&lt;p&gt;In this example, we will create a simple rich text editor based on Internet Explorer design mode. Within design mode the user can perform a wide variety of tasks using intuitive actions; for example, you can insert an image by right clicking on the browser, or convert text to bold by pressing CTRL+B. 
Many of these tasks can be further automated using the&amp;nbsp;execCommand&amp;nbsp;method of the&amp;nbsp;HTMLDocument&amp;nbsp;object. In the following example, we will demonstrate how to set fonts using this method.&lt;/p&gt;&lt;p&gt;Open a new project in Visual Studio .NET, and drag a&amp;nbsp;WebBrowser&amp;nbsp;control onto the form, followed by a button named&amp;nbsp;btnFont. Also add a&amp;nbsp;FontDialog&amp;nbsp;control named&amp;nbsp;fontDialog. Click on the form and type the following code for the form load event.&lt;/p&gt;&lt;p&gt;C# 2.0&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;private void Form1_Load(object sender, EventArgs e)&lt;/p&gt;&lt;p class="Code"&gt;{&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; string url = "about:blank";&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; webBrowser.Navigate(url);&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Application.DoEvents();&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; HTMLDocument hDoc = (HTMLDocument)webBrowser.Document.DomDocument;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; hDoc.designMode = "On";&lt;/p&gt;&lt;p class="Code"&gt;}&lt;/p&gt;&lt;/div&gt;&lt;p&gt;VB.NET 2.0&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;Private Sub Form1_Load(ByVal sender As System.Object, _&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ByVal e As System.EventArgs) Handles MyBase.Load&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Dim url As String = "about:blank"&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; webBrowser.Navigate(url)&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Application.DoEvents()&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Dim hDoc As HTMLDocument = 
webBrowser.Document.DomDocument&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; hDoc.designMode = "On"&lt;/p&gt;&lt;p class="Code"&gt;End Sub&lt;/p&gt;&lt;/div&gt;&lt;p&gt;In .NET 1.x, we would cast the Document to an&amp;nbsp;HTMLDocument, rather than referencing the&amp;nbsp;DomDocument&amp;nbsp;property, and also, the&amp;nbsp;Navigate&amp;nbsp;method would be as described in section 5.1.&lt;/p&gt;&lt;p&gt;Now click on the font button and enter some code as follows&lt;/p&gt;&lt;p&gt;C# 2.0&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;private void btnFont_Click(object sender, EventArgs e)&lt;/p&gt;&lt;p class="Code"&gt;{&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; fontDialog.ShowDialog();&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; HTMLDocument hDoc = (HTMLDocument)webBrowser.Document.DomDocument;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; IHTMLTxtRange selection = (IHTMLTxtRange)hDoc.selection.createRange();&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; hDoc.execCommand("FontName", false, fontDialog.Font.FontFamily.Name);&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; hDoc.execCommand("FontSize", false, fontDialog.Font.Size);&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; selection.select();&lt;/p&gt;&lt;p class="Code"&gt;}&lt;/p&gt;&lt;/div&gt;&lt;p&gt;VB.NET 2.0&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;Private Sub btnFont_Click(ByVal sender As System.Object, _&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ByVal e As System.EventArgs) Handles btnFont.Click&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; fontDialog.ShowDialog()&lt;/p&gt;&lt;p 
class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Dim hDoc As HTMLDocument = webBrowser.Document.DomDocument&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Dim selection As IHTMLTxtRange = hDoc.selection.createRange()&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; hDoc.execCommand("FontName", False, fontDialog.Font.FontFamily.Name)&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; hDoc.execCommand("FontSize", False, fontDialog.Font.Size)&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; selection.select()&lt;/p&gt;&lt;p class="Code"&gt;End Sub&lt;/p&gt;&lt;/div&gt;&lt;p&gt;From the above code we can see that we get a reference to the text currently highlighted by the user using the&amp;nbsp;selection.createRange&amp;nbsp;method. We then execute two commands on this selection,&amp;nbsp;FontName&amp;nbsp;and&amp;nbsp;FontSize. Other commands that could be used include&amp;nbsp;ForeColor,&amp;nbsp;Italic,&amp;nbsp;Bold&amp;nbsp;and so forth.&lt;/p&gt;&lt;p&gt;To test the application, compile and run it from Visual Studio .NET, enter some text into the space provided, then highlight it. Click on the font button and choose a new font and size. The text should change to the selected font.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;5.4.2&amp;nbsp; Capturing Post data&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;When a user is navigating through web pages, it may be necessary to keep track of what URLs they are going to, and the post data sent between Internet Explorer and the web server. Although there are ways of doing this using packet sniffing or third party tools, these tend to capture too much data and record traffic from other applications. 
Due to a bug in the .NET wrapper for Internet Explorer (see Microsoft Knowledge Base 311298), the&amp;nbsp;BeforeNavigate&amp;nbsp;event will not fire as you move between pages.&lt;/p&gt;&lt;p&gt;In order to subscribe to this event, we need to know a little about how COM events work under the hood. Every COM object which generates events will implement the&amp;nbsp;IConnectionPointContainer&amp;nbsp;interface. A client wishing to subscribe to events from this COM object must call the&amp;nbsp;FindConnectionPoint&amp;nbsp;method on this interface, passing the IID (Interface ID) of the required set of events.&lt;/p&gt;&lt;p&gt;Some COM objects support multiple sets of events, or "connection points"; for example, Internet Explorer supports the&amp;nbsp;DWebBrowserEvents&amp;nbsp;connection point, and the&amp;nbsp;DWebBrowserEvents2&amp;nbsp;connection point. Herein lies the problem: the .NET wrapper will by default attach to the&amp;nbsp;DWebBrowserEvents2&amp;nbsp;connection point, which contains a version of&amp;nbsp;BeforeNavigate&amp;nbsp;which is incompatible with .NET due to unsupported variant types.&lt;/p&gt;&lt;p&gt;To inspect the event, open the ILDASM utility, click File, then Open, and select&amp;nbsp;Interop.SHDocVw.DLL.&lt;br /&gt;From the information in Figure 5.8 we can see that the Dispatch ID is set to 64 hex (100 decimal). While using ILDASM we can also find the IID of the&amp;nbsp;DWebBrowserEvents&amp;nbsp;connection point by double clicking on "class interface"; that is,&amp;nbsp;eab22ac2-30c1-11cf-a7eb-0000c05bae0b. 
At this point we have everything we need to create an interface in C# for this event.&lt;/p&gt;&lt;p&gt;C#&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;[Guid("eab22ac2-30c1-11cf-a7eb-0000c05bae0b"),&lt;/p&gt;&lt;p class="Code"&gt;InterfaceType(ComInterfaceType.InterfaceIsIDispatch)]&lt;/p&gt;&lt;p class="Code"&gt;public interface IWebBrowserEvents&lt;/p&gt;&lt;p class="Code"&gt;{&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; [DispId(100)]&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; void RaiseBeforeNavigate(String url, int flags, String targetFrameName,&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ref Object postData, String headers, ref Boolean cancel);&lt;/p&gt;&lt;p class="Code"&gt;}&lt;/p&gt;&lt;/div&gt;&lt;p&gt;To put this all together, create a new project in Visual Studio .NET, and drop in a Web Browser control (not the .NET 2.0 version, but the COM version). Add a reference to&amp;nbsp;Microsoft Internet Controls&amp;nbsp;under COM references. You will need to include both&amp;nbsp;SHDocVw&amp;nbsp;and&amp;nbsp;System.Runtime.InteropServices&amp;nbsp;in the using list at the head of your code. 
Now add the code for the interface listed above.&lt;/p&gt;&lt;p&gt;C#&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;private void Form1_Load(object sender, System.EventArgs e)&lt;/p&gt;&lt;p class="Code"&gt;{&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; UCOMIConnectionPointContainer icpc;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; UCOMIConnectionPoint icp;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; int cookie = -1;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; icpc = (UCOMIConnectionPointContainer)axWebBrowser1.GetOcx();&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Guid g = typeof(DWebBrowserEvents).GUID;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; icpc.FindConnectionPoint(ref g, out icp);&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; icp.Advise(this, out cookie);&lt;/p&gt;&lt;p class="Code"&gt;}&lt;/p&gt;&lt;/div&gt;&lt;p&gt;This code obtains a reference to Internet Explorer's underlying&amp;nbsp;IConnectionPointContainer&amp;nbsp;by calling the&amp;nbsp;GetOcx&amp;nbsp;method, which axWebBrowser1 has inherited from&amp;nbsp;AxHost.&amp;nbsp; From this, we can then obtain a reference to the required connection point by passing its GUID / IID to the&amp;nbsp;FindConnectionPoint&amp;nbsp;method. To subscribe to events, we call the&amp;nbsp;Advise&amp;nbsp;method. To unsubscribe, we should call the&amp;nbsp;Unadvise&amp;nbsp;method, if required.&lt;/p&gt;&lt;p&gt;To handle the event, we shall simply pop up a message box immediately before the page navigates. 
We shall also display any post data being sent.&lt;/p&gt;&lt;p&gt;C#&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;public void RaiseBeforeNavigate(String url, int flags, String&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; targetFrameName, ref Object postData, String headers, ref Boolean cancel)&lt;/p&gt;&lt;p class="Code"&gt;{&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; string strPostData="";&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if (postData!=null)&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; strPostData= System.Text.Encoding.UTF8.GetString((byte[])postData);&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; MessageBox.Show(strPostData);&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/p&gt;&lt;p class="Code"&gt;}&lt;/p&gt;&lt;/div&gt;&lt;p&gt;Since we have specified that our class should implement&amp;nbsp;IWebBrowserEvents, this dictates that the post data must be received as an object. 
This object should then be cast to a byte array, and then decoded to a UTF-8 string for readability.&lt;/p&gt;&lt;p&gt;To finish off the example, add a button to the form, and attach some code to it that navigates to a website with a post form on it; in this example, Amazon.com.&lt;/p&gt;&lt;p&gt;C#&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;private void button1_Click(object sender, EventArgs e)&lt;/p&gt;&lt;p class="Code"&gt;{&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; object o = null;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; this.axWebBrowser1.Navigate("http://www.amazon.com",ref o,ref o,ref o,ref o);&lt;/p&gt;&lt;p class="Code"&gt;}&lt;/p&gt;&lt;/div&gt;&lt;p&gt;To test the application, run it from Visual Studio .NET, press the navigate button, enter something in the Amazon search box, and press go. You should see a message box appearing, containing the post data which you sent to the web server.&amp;nbsp;&lt;/p&gt;&lt;p&gt;&lt;strong&gt;5.4.3&amp;nbsp; Capturing click events&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;Although you can capture events such as&amp;nbsp;DocumentComplete&amp;nbsp;to determine when a user navigates to a new page, it is a little trickier to trap events which do not involve page navigation, such as entering text into a text box.&lt;/p&gt;&lt;p&gt;The event trapping technique differs substantially between .NET 1.x and .NET 2.0. In the former version, you need to implement the default&amp;nbsp;COM interop method; this is an entry point in your application, marked as Dispatch ID 0, which COM calls back on when an event your application has subscribed to is raised. In order to use COM interoperability in .NET 1.x, you need to include a&amp;nbsp;using System.Runtime.InteropServices&amp;nbsp;statement at the top of your code.&lt;/p&gt;&lt;p&gt;In .NET 2.0, it is&amp;nbsp;a little more straightforward. 
Here, we attach an&amp;nbsp;HtmlElementEventHandler&amp;nbsp;delegate to the&amp;nbsp;Document.Click&amp;nbsp;event, and implement it in our own event handler.&lt;/p&gt;&lt;p&gt;Basing this example on the sample code in section 5.1, we shall now add some extra event handling capabilities to pop up a message box whenever the user clicks an HTML element in the web browser.&lt;/p&gt;&lt;p&gt;C# 2.0&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;private void btnNavigate_Click(object sender, System.EventArgs e)&lt;/p&gt;&lt;p class="Code"&gt;{&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; NavigateToUrlSync(@"http://www.google.com");&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; WebBrowser.Document.Click += new HtmlElementEventHandler(Document_Click);&lt;/p&gt;&lt;p class="Code"&gt;}&lt;/p&gt;&lt;/div&gt;&lt;p&gt;C# 1.x&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;private void btnNavigate_Click(object sender, System.EventArgs e)&lt;/p&gt;&lt;p class="Code"&gt;{&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; NavigateToUrlSync(@"http://www.google.com");&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; HTMLDocument hDoc = (HTMLDocument)WebBrowser.Document;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; hDoc.onclick = this;&lt;/p&gt;&lt;p class="Code"&gt;}&lt;/p&gt;&lt;/div&gt;&lt;p&gt;At this point we have now subscribed to the click event, and in the case of .NET 2.0, supplied a callback delegate named&amp;nbsp;Document_Click. 
For demonstration purposes, we shall simply display the tag name of the element clicked, and the event type (which should always be "click" in our case).&lt;/p&gt;&lt;p&gt;C# 2.0&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;public void Document_Click(object sender, HtmlElementEventArgs e)&lt;/p&gt;&lt;p class="Code"&gt;{&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; string sTag = WebBrowser.Document.GetElementFromPoint(e.MousePosition).TagName;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; MessageBox.Show("Object: " + sTag + ", type:" + e.EventType);&lt;/p&gt;&lt;p class="Code"&gt;}&lt;/p&gt;&lt;/div&gt;&lt;p&gt;C# 1.x&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;[DispId(0)]&lt;/p&gt;&lt;p class="Code"&gt;public void DefaultMethod()&lt;/p&gt;&lt;p class="Code"&gt;{&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; HTMLDocument hDoc = (HTMLDocument)WebBrowser.Document;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; HTMLWindow2 hWin = (HTMLWindow2)hDoc.parentWindow;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; MessageBox.Show("Object: " + hWin.@event.srcElement.tagName +&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ", Type: " + hWin.@event.type);&lt;/p&gt;&lt;p class="Code"&gt;}&lt;/p&gt;&lt;/div&gt;&lt;p&gt;In order to get a reference to the element clicked, we have used a different technique for each version of .NET. In .NET 2.0, the&amp;nbsp;GetElementFromPoint&amp;nbsp;method is used to determine the element from the mouse location. 
In .NET 1.x, we can get the reference to the element via the&amp;nbsp;Document.parentWindow.@event.srcElement&amp;nbsp;property.&lt;/p&gt;&lt;p&gt;To test the application, compile and run it from Visual Studio .NET, press the navigate button, then click anywhere on the screen. You should see a message box appear with the tag name of the HTML element that you clicked on.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;5.5&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Extending Internet Explorer&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;The examples so far have dealt with embedding Internet Explorer in our applications, rather than embedding our applications in Internet Explorer. This may not be ideal for all users, as we lose the familiar interface that users are accustomed to. This section deals with how to build applications around running instances of Internet Explorer.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;5.5.1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Menu extensions&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;When you right click on a web page, you can see a context menu, which you can extend with a simple registry tweak. In this example, you can add a "Send to a friend" entry to the context menu, which will link to a website that allows you to send emails. 
Firstly create the following registry key:&lt;/p&gt;&lt;p&gt;HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\MenuExt\Send to a friend&lt;/p&gt;&lt;p&gt;Then set the default value to a location on your hard drive, say&amp;nbsp;c:\SendToAFriend.html, which would contain the following HTML:&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;&amp;lt;script language="JavaScript"&amp;gt;&lt;/p&gt;&lt;p class="Code"&gt;window.open("http://www.pop3webmail.info/reply.aspx?url=" + external.menuArguments.document.URL);&lt;/p&gt;&lt;p class="Code"&gt;&amp;lt;/script&amp;gt;&lt;/p&gt;&lt;/div&gt;&lt;p&gt;After making the change to the registry, close all browser windows.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;5.5.2&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Spawning a new instance of Internet Explorer&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;A simple way of controlling instances of Internet Explorer is to create them yourself using COM. In this example I use COM late binding, which differs from the early-bound examples used earlier in this chapter, specifically in the .NET 1.x examples. Early bound COM objects are compiled into the application at design time. Late bound COM objects are loaded dynamically at run time.&lt;/p&gt;&lt;p&gt;The benefit of early bound objects is that the development environment will be aware of the object model of the component, and&amp;nbsp;IntelliSense&amp;nbsp;will help you determine which methods you can call. We do not have such a luxury with late bound objects. 
However, there is an advantage that we can bind to COM objects hosted as executables, such as in the following example.&lt;/p&gt;&lt;p&gt;To start off, create a new windows forms application in Visual Studio .NET, drop a button on the form, and attach the following code to it.&lt;/p&gt;&lt;p&gt;C#&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;public Type tIE;&lt;/p&gt;&lt;p class="Code"&gt;public object oIE;&lt;/p&gt;&lt;p class="Code"&gt;private void btnNavigate_Click(object sender, EventArgs e)&lt;/p&gt;&lt;p class="Code"&gt;{&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; object[] oParameter = new object[1];&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; tIE = Type.GetTypeFromProgID("InternetExplorer.Application");&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; oIE = Activator.CreateInstance(tIE);&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; oParameter[0] = (bool)true;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; tIE.InvokeMember("Visible", BindingFlags.SetProperty, null, oIE, oParameter);&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; oParameter[0] = (string)"http://www.google.com";&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; tIE.InvokeMember("Navigate2", BindingFlags.InvokeMethod,&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp; null, oIE, oParameter);&lt;/p&gt;&lt;p class="Code"&gt;}&lt;/p&gt;&lt;/div&gt;&lt;p&gt;VB.NET&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;Public tIE As Type&lt;/p&gt;&lt;p class="Code"&gt;Public oIE As Object&lt;/p&gt;&lt;p class="Code"&gt;Private Sub btnNavigate_Click(ByVal sender As System.Object, _&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp; 
ByVal e As System.EventArgs) Handles btnNavigate.Click&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Dim oParameter(0) As Object&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; tIE = Type.GetTypeFromProgID("InternetExplorer.Application")&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; oIE = Activator.CreateInstance(tIE)&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; oParameter(0) = CType(True, Boolean)&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; tIE.InvokeMember("Visible", BindingFlags.SetProperty, Nothing, oIE, oParameter)&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; oParameter(0) = CType("http://www.google.com", String)&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; tIE.InvokeMember("Navigate2", BindingFlags.InvokeMethod, _&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp; Nothing, oIE, oParameter)&lt;/p&gt;&lt;p class="Code"&gt;End Sub&lt;/p&gt;&lt;/div&gt;&lt;p&gt;You should also add references to&amp;nbsp;System.Threading&amp;nbsp;and&amp;nbsp;System.Reflection&amp;nbsp;at the top of your code.&lt;/p&gt;&lt;p&gt;The above code retrieves a reference to the COM object model for the Internet Explorer application by inspecting the&amp;nbsp;ProgID "InternetExplorer.Application". It then creates an instance of this COM object. 
It sets its&amp;nbsp;Visible&amp;nbsp;property to true, then calls the&amp;nbsp;Navigate2&amp;nbsp;method, passing the URL&amp;nbsp;&lt;a href="http://www.google.com/"&gt;www.google.com&lt;/a&gt;&amp;nbsp;as a parameter.&lt;/p&gt;&lt;p&gt;Unfortunately it is not trivial to subscribe to events from this late bound object, so if you need to detect navigation between pages, you may have to poll the&amp;nbsp;LocationURL&amp;nbsp;property of the browser.&lt;/p&gt;&lt;p&gt;To test this application, compile and run it from Visual Studio .NET, then press the button on the form. You should see a new browser window open.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;5.5.3&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Browser Helper Objects&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;When you need really tight integration with Internet Explorer, in cases where you want code to execute completely transparently to the user, and yet have full control of the browser's document model and be able to subscribe to events, a Browser Helper Object (BHO) is the way to go.&lt;/p&gt;&lt;p&gt;BHO technology is widely associated with&amp;nbsp;spyware&amp;nbsp;applications, which silently run in the background as a user is browsing websites. Since the BHO has access to the&amp;nbsp;HTMLDocument&amp;nbsp;object of the Internet Explorer instance hosting it, it is possible to read the text of the webpage being visited, and duly display context-sensitive advertisements.&lt;/p&gt;&lt;p&gt;Internet Explorer expects the BHO object to be COM based, not a .NET assembly. Therefore it is necessary to create a CCW (COM Callable Wrapper) for our assembly. 
This CCW has a unique Class ID, which we store in the registry at the following location:&lt;/p&gt;&lt;p&gt;HKLM\Software\Microsoft\Windows\CurrentVersion\Explorer\&lt;br /&gt;Browser Helper Objects&lt;/p&gt;&lt;p&gt;When Internet Explorer (or Windows Explorer) starts, it reads all the Class IDs listed at the registry location above, and creates instances of their respective COM objects, and in our case, the underlying .NET assembly. It then interrogates the COM object to ensure that it implements the&amp;nbsp;IObjectWithSite&amp;nbsp;interface. This interface is very strictly defined and implemented as follows:&lt;/p&gt;&lt;p&gt;C#&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;using System;&lt;/p&gt;&lt;p class="Code"&gt;using System.Runtime.InteropServices;&lt;/p&gt;&lt;p class="Code"&gt;namespace BrowserHelperObject&lt;/p&gt;&lt;p class="Code"&gt;{&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; [ComVisible(true),&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; InterfaceType(ComInterfaceType.InterfaceIsIUnknown),&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Guid("FC4801A3-2BA9-11CF-A229-00AA003D7352")]&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public interface IObjectWithSite&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; [PreserveSig]&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; int SetSite([MarshalAs(UnmanagedType.IUnknown)]object site);&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; [PreserveSig]&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; int GetSite(ref Guid guid, out IntPtr ppvSite);&lt;/p&gt;&lt;p 
class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/p&gt;&lt;p class="Code"&gt;}&lt;/p&gt;&lt;/div&gt;&lt;p&gt;Internet Explorer uses the two methods listed above to interact with the BHO. The&amp;nbsp;SetSite&amp;nbsp;method is called by Internet Explorer whenever it starts up or shuts down. This lets the BHO update any internal references it holds to the instance of Internet Explorer that is hosting it.&amp;nbsp;GetSite&amp;nbsp;may be called by Internet Explorer to query the reference a BHO holds to it. Every BHO must implement both of these methods, and handle requests to and from the hosting instance correctly.&lt;/p&gt;&lt;p&gt;To demonstrate Browser Helper Objects, we shall go through a simple example, in which we attach a BHO to Internet Explorer that appends the current date to every page visited by the user.&lt;/p&gt;&lt;p&gt;Start a new class library project in Visual Studio, add a reference to the&amp;nbsp;Microsoft.mshtml&amp;nbsp;.NET assembly, and also to the COM object named Microsoft Internet Controls. Add a new class file containing the definition of&amp;nbsp;IObjectWithSite&amp;nbsp;as listed above. 
Then you can create the skeleton of your BHO thus:&lt;/p&gt;&lt;p&gt;C#&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;using System;&lt;/p&gt;&lt;p class="Code"&gt;using System.Runtime.InteropServices;&lt;/p&gt;&lt;p class="Code"&gt;using SHDocVw;&lt;/p&gt;&lt;p class="Code"&gt;using Microsoft.Win32;&lt;/p&gt;&lt;p class="Code"&gt;using mshtml;&lt;/p&gt;&lt;p class="Code"&gt;namespace BrowserHelperObject&lt;/p&gt;&lt;p class="Code"&gt;{&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; [ComVisible(true),&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Guid("F839CC51-A6D8-4e9c-ACE5-F05071AD0C74"),&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ClassInterface(ClassInterfaceType.None)]&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public class DateStamp : IObjectWithSite&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; WebBrowser webBrowser;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/p&gt;&lt;p class="Code"&gt;}&lt;/p&gt;&lt;/div&gt;&lt;p&gt;As the code above shows, the class implements the&amp;nbsp;IObjectWithSite&amp;nbsp;interface, which is a prerequisite of any BHO. It also has a GUID (Globally Unique Identifier), which uniquely identifies the CCW and can be chosen arbitrarily, using the&amp;nbsp;GuidGen.exe&amp;nbsp;tool or similar. The&amp;nbsp;WebBrowser&amp;nbsp;class in the code does not refer to the familiar&amp;nbsp;WebBrowser&amp;nbsp;class as used in .NET 2.0, but instead is a class defined within&amp;nbsp;SHDocVw. 
It is this object that holds a reference to the hosting instance of Internet Explorer.&lt;/p&gt;&lt;p&gt;As mentioned previously, every BHO must implement both the&amp;nbsp;GetSite&amp;nbsp;and&amp;nbsp;SetSite&amp;nbsp;methods. In most cases, there is little need to perform any custom actions within&amp;nbsp;GetSite, so its implementation is the same for most types of BHO. A typical implementation would be as follows:&lt;/p&gt;&lt;p&gt;C#&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;public int GetSite(ref Guid guid, out IntPtr ppvSite)&lt;/p&gt;&lt;p class="Code"&gt;{&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; IntPtr punk = Marshal.GetIUnknownForObject(webBrowser);&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; int hr = Marshal.QueryInterface(punk, ref guid, out ppvSite);&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Marshal.Release(punk);&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; return hr;&lt;/p&gt;&lt;p class="Code"&gt;}&lt;/p&gt;&lt;/div&gt;&lt;p&gt;This code first obtains a pointer to the&amp;nbsp;IUnknown&amp;nbsp;COM interface for our reference to the hosting instance of Internet Explorer. It then queries that&amp;nbsp;IUnknown&amp;nbsp;interface for the interface identified by the GUID that Internet Explorer passes in, which yields a pointer to the requested interface. Finally, the code releases the&amp;nbsp;IUnknown&amp;nbsp;pointer and returns the HRESULT of the query, which reports any error that occurred whilst querying the interface.&lt;/p&gt;&lt;p&gt;What is of more interest is the&amp;nbsp;SetSite&amp;nbsp;method. This is where we have the opportunity to attach custom event handlers to the hosting web browser. 
In this case, we attach the&amp;nbsp;DocumentComplete&amp;nbsp;event handler.&lt;/p&gt;&lt;p&gt;C#&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;public int SetSite(object site)&lt;/p&gt;&lt;p class="Code"&gt;{&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if (site != null)&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; webBrowser = (WebBrowser)site;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; webBrowser.DocumentComplete += new&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;DWebBrowserEvents2_DocumentCompleteEventHandler(&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;this.OnDocumentComplete);&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; else&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; webBrowser.DocumentComplete -= new&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;DWebBrowserEvents2_DocumentCompleteEventHandler(&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;this.OnDocumentComplete);&lt;/p&gt;&lt;p 
class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; webBrowser = null;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; return 0;&lt;/p&gt;&lt;p class="Code"&gt;}&lt;/p&gt;&lt;/div&gt;&lt;p&gt;As mentioned earlier, Internet Explorer also calls this method as it is shutting down, in which case the passed parameter is&amp;nbsp;null. At that point we must detach our event handlers and free any associated resources.&lt;/p&gt;&lt;p&gt;At this point we are in a position to add our own custom logic, which we place within the&amp;nbsp;OnDocumentComplete&amp;nbsp;function thus:&lt;/p&gt;&lt;p&gt;C#&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;private void OnDocumentComplete(object frame, ref object urlObj)&lt;/p&gt;&lt;p class="Code"&gt;{&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; // Ignore events raised by anything other than the hosting browser itself.&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if (webBrowser != frame) return;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; IHTMLDocument2 document = (IHTMLDocument2)webBrowser.Document;&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; // Append today's date to the end of the page body.&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; document.body.innerHTML = document.body.innerHTML +&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; DateTime.Now.ToShortDateString();&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; 
return;&lt;/p&gt;&lt;p class="Code"&gt;}&lt;/p&gt;&lt;/div&gt;&lt;p&gt;In the above code, we retrieve a reference to the&amp;nbsp;HTMLDocument&amp;nbsp;contained within the hosting Internet Explorer instance, and simply add the current date to the HTML of the page.&lt;/p&gt;&lt;p&gt;Before we are ready to try out our new BHO, we should add some extra plumbing to enable the assembly to store the Class ID of its CCW in the registry with the other Browser Helper Objects installed on the system.&lt;/p&gt;&lt;p&gt;C#&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;public static string BHOKEYNAME = "Software\\Microsoft\\Windows\\CurrentVersion\\Explorer\\Browser Helper Objects";&lt;/p&gt;&lt;p class="Code"&gt;[ComRegisterFunction]&lt;/p&gt;&lt;p class="Code"&gt;public static void RegisterBHO(Type t)&lt;/p&gt;&lt;p class="Code"&gt;{&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; RegistryKey key = Registry.LocalMachine.OpenSubKey(BHOKEYNAME, true);&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if (key == null) key = Registry.LocalMachine.CreateSubKey(BHOKEYNAME);&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; string guidString = t.GUID.ToString("B");&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; RegistryKey bhoKey = key.OpenSubKey(guidString);&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if (bhoKey == null)&amp;nbsp;&amp;nbsp; bhoKey = key.CreateSubKey(guidString);&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; key.Close();&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; bhoKey.Close();&lt;/p&gt;&lt;p class="Code"&gt;}&lt;/p&gt;&lt;/div&gt;&lt;p&gt;The above code is called whenever we create a CCW from the assembly. 
It inserts a new key, named after the Class ID, at the registry location specified.&lt;/p&gt;&lt;p&gt;Similarly, when we un-register the CCW, we will want to remove that key from the registry. This would be implemented thus:&lt;/p&gt;&lt;p&gt;C#&lt;/p&gt;&lt;div&gt;&lt;p class="Code"&gt;[ComUnregisterFunction]&lt;/p&gt;&lt;p class="Code"&gt;public static void UnregisterBHO(Type t)&lt;/p&gt;&lt;p class="Code"&gt;{&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; RegistryKey key = Registry.LocalMachine.OpenSubKey(BHOKEYNAME, true);&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; string guidString = t.GUID.ToString("B");&lt;/p&gt;&lt;p class="Code"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if (key != null) key.DeleteSubKey(guidString, false);&lt;/p&gt;&lt;p class="Code"&gt;}&lt;/p&gt;&lt;/div&gt;&lt;p&gt;Whilst developing a BHO, you may need to recompile and test the code several times to perfect your application. Every time you attach a BHO to Internet Explorer, it also attaches itself to Windows Explorer, and the assembly is locked for the lifetime of these two processes. While the assembly is locked you cannot delete it, or overwrite it by building a new version of the BHO.&lt;/p&gt;&lt;p&gt;To unlock the BHO, you will need to un-register it using&amp;nbsp;regasm /unregister, then stop all&amp;nbsp;iexplore.exe&amp;nbsp;and&amp;nbsp;explorer.exe&amp;nbsp;processes, either through Task Manager or by logging off and logging back in again.&lt;/p&gt;&lt;p&gt;To test the above application, compile the above code, then open up the Visual Studio .NET command prompt and navigate to the folder that contains the output DLL. 
Then run the command:&lt;/p&gt;&lt;p&gt;Regasm /codebase browserHelperObject.dll&lt;/p&gt;&lt;p&gt;Now open up an Internet Explorer window and you should see the date written at the bottom of the page.&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;div&gt;&lt;p class="Note"&gt;If you receive the following warning, do not panic; as long as the GUID you used in your assembly is unique, it will not cause a problem:&lt;/p&gt;&lt;p class="Note"&gt;RegAsm warning: Registering an unsigned assembly with /codebase can cause your assembly to interfere with other applications that may be installed on the same computer. The /codebase switch is intended to be used only with signed assemblies. Please give your assembly a strong name and re-register it.&lt;/p&gt;&lt;/div&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;&lt;strong&gt;5.6&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Conclusion&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;This chapter has demonstrated how to control Internet Explorer from within a .NET application. 
It should pave the way for automating data mining processes using this versatile component.&lt;/p&gt;&lt;p&gt;With the added benefit of enabling a user to interact with web pages in a natural fashion, and being able to trap events from within Internet Explorer, it should be possible to implement data mining training tools and website test automation utilities with the examples shown in this chapter.&amp;nbsp;&lt;/p&gt;&lt;img src="http://www.cnblogs.com/Safe3/aggbug/2467770.html?type=1" width="1" height="1" alt=""/&gt;&lt;p&gt;&lt;a href="http://www.cnblogs.com/Safe3/archive/2012/04/24/2467770.html" target="_blank"&gt;Link to this article&lt;/a&gt;&lt;/p&gt;</content></entry><entry><id>http://www.cnblogs.com/Safe3/archive/2011/11/30/2268760.html</id><title type="text">Safe3 Web Security Gateway v3.9.1</title><summary type="text">Safe3WAF is the first free, lightweight reverse-proxy web security gateway for Linux to come out of China. It uses an nginx-like architecture with a small memory footprint and high concurrency. Deployed in front of a web server, it not only defends against a wide range of attacks but also speeds up the web server by caching requests in a fast in-memory cache, and it offers features such as load balancing for site clusters. Sites in mainland China that currently use Safe3WAF include 游久网 and 联合早报 (Lianhe Zaobao). Main features: 1. Blocks GET SQL injection 2. Blocks POST SQL injection 3. Blocks cookie SQL injection 4. Blocks XSS (cross-site scripting) attacks 5. Blocks web overflow attacks 6. Blocks website information disclosure attacks 7. Blocks attacks that use illegal HTTP request methods 8. Web load balancing 9. Blocks uploads of web backdoors 10. Blocks web server vul</summary><published>2011-11-30T02:53:00Z</published><updated>2011-11-30T02:53:00Z</updated><author><name>Safe3</name><uri>http://www.cnblogs.com/Safe3/</uri></author><link rel="alternate" href="http://www.cnblogs.com/Safe3/archive/2011/11/30/2268760.html"/><link rel="alternate" type="text/html" href="http://www.cnblogs.com/Safe3/archive/2011/11/30/2268760.html"/><content type="html">&lt;div&gt;&lt;div&gt;Safe3WAF is the first free, lightweight reverse-proxy web security gateway for Linux to come out of China. It uses an nginx-like architecture with a small memory footprint and high concurrency. Deployed in front of a web server, it not only defends against a wide range of attacks but also speeds up the web server by caching requests in a fast in-memory cache, and it offers features such as load balancing for site clusters. Sites in mainland China that currently use Safe3WAF include 游久网 and 联合早报 (Lianhe Zaobao).&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;Main features:&lt;/div&gt;&lt;div&gt;1. Blocks GET SQL injection&lt;/div&gt;&lt;div&gt;2. Blocks POST SQL injection&lt;/div&gt;&lt;div&gt;3. Blocks cookie SQL injection&lt;/div&gt;
&lt;div&gt;4. Blocks XSS (cross-site scripting) attacks&lt;/div&gt;&lt;div&gt;5. Blocks web overflow attacks&lt;/div&gt;&lt;div&gt;6. Blocks website information disclosure attacks&lt;/div&gt;&lt;div&gt;7. Blocks attacks that use illegal HTTP request methods&lt;/div&gt;&lt;div&gt;8. Web load balancing&lt;/div&gt;&lt;div&gt;9. Blocks uploads of web backdoors&lt;/div&gt;&lt;div&gt;10. Blocks web server vulnerabilities&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;Attack log: /usr/local/safe3waf/log/attack.log&lt;/div&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;div&gt;&lt;div&gt;v3.9.1 release notes&lt;/div&gt;&lt;div&gt;1. Added a customizable redirect URL for blocked requests&lt;/div&gt;&lt;div&gt;2. Added blocking of the IIS script parsing execution vulnerability&lt;/div&gt;&lt;div&gt;3. Added blocking of information disclosure via svn, htaccess, mdb, and similar files&lt;/div&gt;&lt;/div&gt;&lt;p&gt;4. Added blocking of the struts2 framework XSLTResult local file code execution vulnerability&amp;nbsp;&lt;/p&gt;&lt;div&gt;&lt;/div&gt;&lt;/div&gt;&lt;img src="http://www.cnblogs.com/Safe3/aggbug/2268760.html?type=1" width="1" height="1" alt=""/&gt;&lt;p&gt;&lt;a href="http://www.cnblogs.com/Safe3/archive/2011/11/30/2268760.html" target="_blank"&gt;Link to this article&lt;/a&gt;&lt;/p&gt;</content></entry><entry><id>http://www.cnblogs.com/Safe3/archive/2011/10/19/2217965.html</id><title type="text">Using scrapy with webkit to crawl JS-generated pages</title><summary type="text">1 schedule: scrapy serves as the crawling framework and provides the spider and pipeline infrastructure. 2 webkit: scrapy itself cannot act as a JS engine, so data on many JS-generated pages cannot be crawled. A common workaround is therefore to use webkit or xmi_runner (firefox), which makes it possible to crawl JS-generated data. The packages to install are python-webkit (resolve its dependencies yourself) and Xvfb (for environments without X Window). 3 Developing the downloader middleware: from scrapy.http import Request, FormRequest, HtmlResponse import .</summary><published>2011-10-19T10:34:00Z</published><updated>2011-10-19T10:34:00Z</updated><author><name>Safe3</name><uri>http://www.cnblogs.com/Safe3/</uri></author><link rel="alternate" href="http://www.cnblogs.com/Safe3/archive/2011/10/19/2217965.html"/><link rel="alternate" type="text/html" href="http://www.cnblogs.com/Safe3/archive/2011/10/19/2217965.html"/><content type="html">&lt;span style="widows: 2; text-transform: none; background-color: rgb(255,255,255); text-indent: 0px; font: 300 15px/24px 'Helvetica Neue', Helvetica, Arial, sans-serif; white-space: normal; orphans: 2; letter-spacing: normal; color: rgb(55,55,55); 
word-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px" class="Apple-style-span"&gt; &lt;p&gt;1 schedule&lt;/p&gt;&lt;p&gt;scrapy serves as the crawling framework and provides the spider and pipeline infrastructure.&lt;/p&gt;&lt;p&gt;2 webkit&lt;/p&gt;&lt;p&gt;scrapy itself cannot act as a JS engine, so data on many JS-generated pages cannot be crawled. A common workaround is therefore to use webkit or xmi_runner (firefox), which makes it possible to crawl JS-generated data. The packages to install are:&lt;/p&gt;&lt;p&gt;python-webkit (resolve its dependencies yourself)&lt;/p&gt;&lt;p&gt;Xvfb (for environments without X Window)&lt;/p&gt;&lt;p&gt;3 Developing the downloader middleware&lt;/p&gt;&lt;div style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; font-size: 15px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px" dir="ltr"&gt;&lt;div style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; font-size: 15px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;&lt;div style="border-bottom: silver 1px solid; border-left: silver 1px solid; padding-bottom: 0px; overflow-x: auto; overflow-y: hidden; background-color: rgb(249,249,249); margin: 0px 0px 1.5em; padding-left: 0px; outline-width: 0px; width: 584px; padding-right: 0px; font-family: inherit; color: rgb(17,0,0); font-size: 15px; vertical-align: baseline; border-top: silver 1px solid; border-right: silver 1px solid; padding-top: 0px" class="wp_syntax"&gt;&lt;div style="padding-bottom: 2px; border-right-width: 0px; margin: 0px; padding-left: 4px; outline-width: 0px; padding-right: 4px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; font-size: 15px; vertical-align: top; border-left-width: 0px; padding-top: 2px" class="code"&gt;&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 
0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: rgb(255,119,0); font-size: 12px; vertical-align: baseline; border-left-width: 0px; font-weight: bold; padding-top: 0px"&gt;from&lt;/span&gt; scrapy.&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;http&lt;/span&gt; &lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: rgb(255,119,0); font-size: 12px; vertical-align: baseline; border-left-width: 0px; font-weight: bold; padding-top: 0px"&gt;import&lt;/span&gt; Request, FormRequest, HtmlResponse&lt;br/&gt;&amp;nbsp;&lt;br/&gt;&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: rgb(255,119,0); font-size: 12px; vertical-align: baseline; border-left-width: 0px; font-weight: bold; padding-top: 0px"&gt;import&lt;/span&gt; gtk&lt;br/&gt;&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: rgb(255,119,0); font-size: 12px; vertical-align: baseline; border-left-width: 0px; font-weight: bold; padding-top: 0px"&gt;import&lt;/span&gt; webkit&lt;br/&gt;&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: rgb(255,119,0); font-size: 12px; vertical-align: baseline; 
border-left-width: 0px; font-weight: bold; padding-top: 0px"&gt;import&lt;/span&gt; jswebkit&lt;br/&gt;&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: rgb(255,119,0); font-size: 12px; vertical-align: baseline; border-left-width: 0px; font-weight: bold; padding-top: 0px"&gt;import&lt;/span&gt; settings&lt;br/&gt;&amp;nbsp;&lt;br/&gt;&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: rgb(255,119,0); font-size: 12px; vertical-align: baseline; border-left-width: 0px; font-weight: bold; padding-top: 0px"&gt;class&lt;/span&gt; WebkitDownloader&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;(&lt;/span&gt; &lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: rgb(0,128,0); font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;object&lt;/span&gt; &lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;)&lt;/span&gt;:&lt;br/&gt;    &lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; 
font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: rgb(255,119,0); font-size: 12px; vertical-align: baseline; border-left-width: 0px; font-weight: bold; padding-top: 0px"&gt;def&lt;/span&gt; process_request&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;(&lt;/span&gt; &lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: rgb(0,128,0); font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;self&lt;/span&gt;, request, spider &lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;)&lt;/span&gt;:&lt;br/&gt;        &lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: rgb(255,119,0); font-size: 12px; vertical-align: baseline; border-left-width: 0px; font-weight: bold; padding-top: 0px"&gt;if&lt;/span&gt; spider.&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;name&lt;/span&gt; &lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 
0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: rgb(255,119,0); font-size: 12px; vertical-align: baseline; border-left-width: 0px; font-weight: bold; padding-top: 0px"&gt;in&lt;/span&gt; settings.&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;WEBKIT_DOWNLOADER&lt;/span&gt;:&lt;br/&gt;            &lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: rgb(255,119,0); font-size: 12px; vertical-align: baseline; border-left-width: 0px; font-weight: bold; padding-top: 0px"&gt;if&lt;/span&gt;&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;(&lt;/span&gt; &lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: rgb(0,128,0); font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;type&lt;/span&gt;&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;(&lt;/span&gt;request&lt;span 
style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;)&lt;/span&gt; &lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: rgb(255,119,0); font-size: 12px; vertical-align: baseline; border-left-width: 0px; font-weight: bold; padding-top: 0px"&gt;is&lt;/span&gt; &lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: rgb(255,119,0); font-size: 12px; vertical-align: baseline; border-left-width: 0px; font-weight: bold; padding-top: 0px"&gt;not&lt;/span&gt; FormRequest &lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;)&lt;/span&gt;:&lt;br/&gt;                webview = webkit.&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;WebView&lt;/span&gt;&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; 
border-left-width: 0px; padding-top: 0px"&gt;(&lt;/span&gt;&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;)&lt;/span&gt;&lt;br/&gt;                webview.&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;connect&lt;/span&gt;&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;(&lt;/span&gt; &lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: rgb(72,61,139); font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;'load-finished'&lt;/span&gt;, &lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: rgb(255,119,0); font-size: 12px; vertical-align: baseline; border-left-width: 0px; font-weight: bold; padding-top: 0px"&gt;lambda&lt;/span&gt; v,f: gtk.&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; 
font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;main_quit&lt;/span&gt;&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;(&lt;/span&gt;&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;)&lt;/span&gt; &lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;)&lt;/span&gt;&lt;br/&gt;                webview.&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;load_uri&lt;/span&gt;&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;(&lt;/span&gt; request.&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 
12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;url&lt;/span&gt; &lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;)&lt;/span&gt;&lt;br/&gt;                gtk.&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;main&lt;/span&gt;&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;(&lt;/span&gt;&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;)&lt;/span&gt;&lt;br/&gt;                js = jswebkit.&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;JSContext&lt;/span&gt;&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: 
black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;(&lt;/span&gt; webview.&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;get_main_frame&lt;/span&gt;&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;(&lt;/span&gt;&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;)&lt;/span&gt;.&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;get_global_context&lt;/span&gt;&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;(&lt;/span&gt;&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; 
vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;)&lt;/span&gt; &lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;)&lt;/span&gt;&lt;br/&gt;                renderedBody = &lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: rgb(0,128,0); font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;str&lt;/span&gt;&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;(&lt;/span&gt; js.&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;EvaluateScript&lt;/span&gt;&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;(&lt;/span&gt; &lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: rgb(72,61,139); font-size: 
12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;'document.body.innerHTML'&lt;/span&gt; &lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;)&lt;/span&gt; &lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;)&lt;/span&gt;&lt;br/&gt;                &lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: rgb(255,119,0); font-size: 12px; vertical-align: baseline; border-left-width: 0px; font-weight: bold; padding-top: 0px"&gt;return&lt;/span&gt; HtmlResponse&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;(&lt;/span&gt; request.&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;url&lt;/span&gt;, body=renderedBody &lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 
0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;)&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;4 Configuration&lt;/p&gt;&lt;p&gt;Add the following to Scrapy's settings.py:&lt;/p&gt;&lt;div style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; font-size: 15px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px" dir="ltr"&gt;&lt;div style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; font-size: 15px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;&lt;div style="border-bottom: silver 1px solid; border-left: silver 1px solid; padding-bottom: 0px; overflow-x: auto; overflow-y: hidden; background-color: rgb(249,249,249); margin: 0px 0px 1.5em; padding-left: 0px; outline-width: 0px; width: 584px; padding-right: 0px; font-family: inherit; color: rgb(17,0,0); font-size: 15px; vertical-align: baseline; border-top: silver 1px solid; border-right: silver 1px solid; padding-top: 0px" class="wp_syntax"&gt;&lt;div style="padding-bottom: 2px; border-right-width: 0px; margin: 0px; padding-left: 4px; outline-width: 0px; padding-right: 4px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; font-size: 15px; vertical-align: top; border-left-width: 0px; padding-top: 2px" class="code"&gt;&lt;span style="padding-bottom: 0px; border-right-width: 0px; font-style: italic; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: rgb(128,128,128); font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;#which spider should use 
WEBKIT&lt;/span&gt;&lt;br/&gt;WEBKIT_DOWNLOADER=&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;[&lt;/span&gt;&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: rgb(72,61,139); font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;'ccb'&lt;/span&gt;&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;]&lt;/span&gt;&lt;br/&gt;&amp;nbsp;&lt;br/&gt;DOWNLOADER_MIDDLEWARES = &lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;{&lt;/span&gt;&lt;br/&gt;    &lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: rgb(72,61,139); font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;'rate_crawler.dowloader.WebkitDownloader'&lt;/span&gt;: &lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; 
color: rgb(255,69,0); font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;543&lt;/span&gt;,&lt;br/&gt;&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;}&lt;/span&gt;   &lt;br/&gt;&amp;nbsp;&lt;br/&gt;&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: rgb(255,119,0); font-size: 12px; vertical-align: baseline; border-left-width: 0px; font-weight: bold; padding-top: 0px"&gt;import&lt;/span&gt; &lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: rgb(220,20,60); font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;os&lt;/span&gt;&lt;br/&gt;&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: rgb(220,20,60); font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;os&lt;/span&gt;.&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;environ&lt;/span&gt;&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: 
inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;[&lt;/span&gt;&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: rgb(72,61,139); font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;"DISPLAY"&lt;/span&gt;&lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: black; font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;]&lt;/span&gt; = &lt;span style="padding-bottom: 0px; border-right-width: 0px; margin: 0px; padding-left: 0px; outline-width: 0px; padding-right: 0px; font-family: inherit; border-top-width: 0px; border-bottom-width: 0px; color: rgb(72,61,139); font-size: 12px; vertical-align: baseline; border-left-width: 0px; padding-top: 0px"&gt;":0"&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;5 Usage&lt;/p&gt;&lt;p&gt;Start Xvfb (assuming DISPLAY=:0).&lt;/p&gt;&lt;p&gt;The display must match the DISPLAY value in settings.py (:0 in this example).&lt;/p&gt;&lt;p&gt;scrapy crawl xxx&lt;/p&gt;&lt;/span&gt;&lt;img src="http://www.cnblogs.com/Safe3/aggbug/2217965.html?type=1" width="1" height="1" alt=""/&gt;&lt;p&gt;&lt;a href="http://www.cnblogs.com/Safe3/archive/2011/10/19/2217965.html" target="_blank"&gt;Permalink&lt;/a&gt;&lt;/p&gt;</content></entry><entry><id>http://www.cnblogs.com/Safe3/archive/2011/10/19/2217961.html</id><title type="text">Quickly Building a Real-Time Crawling Cluster</title><summary 
type="text">Definition: First, let us define targeted crawling. Targeted crawling is a specific kind of crawling requirement in which the target sites and their pages are known in advance. This article focuses on how to quickly build a real-time crawling system and does not cover general-purpose features such as link analysis or site discovery. The example system described in this article mainly uses linux+mysql+redis+django+scrapy+webkit: scrapy+webkit acts as the crawler, redis stores the link base, mysql stores the page information, and django provides the crawler management interface, so that a prototype of a distributed crawling system can be built quickly. Terminology: 1. Crawl loop: the process in which the spider fetches a URL from storage, downloads the page from the Internet, stores the page in the database,.</summary><published>2011-10-19T10:31:00Z</published><updated>2011-10-19T10:31:00Z</updated><author><name>Safe3</name><uri>http://www.cnblogs.com/Safe3/</uri></author><link rel="alternate" href="http://www.cnblogs.com/Safe3/archive/2011/10/19/2217961.html"/><link rel="alternate" type="text/html" href="http://www.cnblogs.com/Safe3/archive/2011/10/19/2217961.html"/><content type="html">&lt;span style="widows: 2; text-transform: none; background-color: rgb(255,255,255); text-indent: 0px; font: 12px/22px 微软雅黑; white-space: normal; orphans: 2; letter-spacing: normal; color: rgb(1,1,1); word-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px" class="Apple-style-span"&gt; &lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;span style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; font-family: 微软雅黑; color: rgb(1,1,1); padding-top: 0px"&gt;Definition:&lt;/span&gt;&lt;/div&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;span style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; font-family: 微软雅黑; color: rgb(1,1,1); padding-top: 0px"&gt;First, let us define targeted crawling. Targeted crawling is a specific kind of crawling requirement in which the target sites and their pages are known in advance. This article focuses on how to quickly build a real-time crawling system and does not cover general-purpose features such as link analysis or site discovery.&lt;/span&gt;&lt;/div&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;span style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; font-family: 微软雅黑; color: rgb(1,1,1); padding-top: 0px"&gt;&lt;br 
style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px" /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;span style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; font-family: 微软雅黑; color: rgb(1,1,1); padding-top: 0px"&gt;The example system described in this article mainly uses linux+mysql+redis+django+scrapy+webkit: scrapy+webkit acts as the crawler, redis stores the link base, mysql stores the page information, and django provides the crawler management interface, so that a prototype of a distributed crawling system can be built quickly.&lt;/span&gt;&lt;/div&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;span style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; font-family: 微软雅黑; color: rgb(1,1,1); padding-top: 0px"&gt;&lt;br style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px" /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;span style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; font-family: 微软雅黑; padding-top: 0px"&gt;Terminology:&lt;/span&gt;&lt;/div&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;span style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; font-family: 微软雅黑; color: rgb(1,1,1); padding-top: 0px"&gt;1. &amp;nbsp;Crawl loop: the process in which the spider fetches a URL from storage, downloads the page from the Internet, stores the page in the database, and then fetches the next URL from storage.&lt;br style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px" /&gt;2. &amp;nbsp;Linkbase: the storage module for the link base, holding general link information; it is the core of the crawling system and is stored in redis.&lt;br style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px" /&gt;3. 
&amp;nbsp;&lt;a style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; color: rgb(13,155,193); text-decoration: none; padding-top: 0px" href="http://www.w3school.com.cn/xpath/index.asp"&gt;XPATH&lt;/a&gt;: a language for finding information in an XML document. XPath can be used to traverse the elements and attributes of an XML document and is a major element of the W3C XSLT standard. XPATH and related libraries are used here for link extraction and information extraction.&lt;br style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px" /&gt;4. &amp;nbsp;XPathOnClick: a Chrome extension that lets you click a page element to obtain its XPATH path; used for editing the configuration templates.&lt;br style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px" /&gt;5. &amp;nbsp;&lt;a style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; color: rgb(13,155,193); text-decoration: none; padding-top: 0px" href="http://redis.io/"&gt;Redis&lt;/a&gt;: an open-source in-memory key-value database with rich data structures and very high access performance. Used to store the linkbase information.&lt;br style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px" /&gt;6. &amp;nbsp;&lt;a style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; color: rgb(13,155,193); text-decoration: none; padding-top: 0px" href="https://www.djangoproject.com/"&gt;Django&lt;/a&gt;: the crawler management tool, used for template configuration and for system monitoring and feedback. Here Django is mainly used to manage a database through its Admin feature.&lt;br style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px" /&gt;7. &amp;nbsp; Pagebase: the page store, which mainly holds the results of page crawling and page extraction and interacts with the dump; implemented with mysql.&lt;br style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px" /&gt;8. 
&amp;nbsp; &amp;nbsp;&lt;a style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; color: rgb(13,155,193); text-decoration: none; padding-top: 0px" href="http://scrapy.org/"&gt;Scrapy&lt;/a&gt;: an open-source, single-machine Python crawler based on the twisted framework. It actually includes most of the tooling needed for web crawling and is used here for both the download side and the extraction side.&lt;br style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px" /&gt;9.&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; List page: any page other than a product page.&lt;br style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px" /&gt;10. &amp;nbsp; &amp;nbsp;Detail page: in B2C product crawling, for example, this refers specifically to a product page, such as http://item.tmall.com/item.htm?id=10321272374&lt;br style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px" /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;span style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; font-family: 微软雅黑; color: rgb(1,1,1); padding-top: 0px"&gt;&lt;br style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px" /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;span style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; font-family: 微软雅黑; padding-top: 0px"&gt;System architecture&lt;/span&gt;&lt;/div&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;span style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; font-family: 微软雅黑; padding-top: 0px"&gt;I. Storage: redis+mysql&lt;/span&gt;&lt;/div&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;span style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; font-family: 微软雅黑; padding-top: 
0px"&gt;The link base (linkbase) is the core of the crawling system. For performance and efficiency, this article relies mainly on in-memory redis and on-disk mysql. The linkbase stores the link information required for crawling, such as url, anchor, and so on; mysql stores the crawled pages for later extraction and processing.&lt;/span&gt;&lt;/div&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;span style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; font-family: 微软雅黑; padding-top: 0px"&gt;a) &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;PageBase: uses sharded Mysql databases and tables to store pages, as shown in the figure below:&lt;/span&gt;&lt;/div&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;span style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; font-family: 微软雅黑; padding-top: 0px"&gt;&lt;img style="border-bottom: rgb(221,221,221) 1px solid; text-align: center; border-left: rgb(221,221,221) 1px solid; padding-bottom: 5px; background-color: rgb(243,243,243); margin: 10px; padding-left: 5px; padding-right: 5px; border-top: rgb(221,221,221) 1px solid; border-right: rgb(221,221,221) 1px solid; padding-top: 4px; border-top-left-radius: 3px 3px; border-top-right-radius: 3px 3px; border-bottom-right-radius: 3px 3px; border-bottom-left-radius: 3px 3px" class="alignnone size-full wp-image-1501" alt="" src="http://www.searchtb.com/wp-content/uploads/2011/07/pagebase0.png" /&gt;&lt;br style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px" /&gt;b)&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Linkbase:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;uses a Redis cluster to store the linkbase information.&lt;br style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px" /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;a style="padding-bottom: 0px; margin: 0px; padding-left: 0px; 
padding-right: 0px; color: rgb(13,155,193); text-decoration: none; padding-top: 0px" href="http://www.searchtb.com/wp-content/uploads/2011/07/linkbase1.png"&gt;&lt;img style="border-bottom: rgb(221,221,221) 1px solid; text-align: center; border-left: rgb(221,221,221) 1px solid; padding-bottom: 5px; background-color: rgb(243,243,243); margin: 10px; padding-left: 5px; padding-right: 5px; border-top: rgb(221,221,221) 1px solid; border-right: rgb(221,221,221) 1px solid; padding-top: 4px; border-top-left-radius: 3px 3px; border-top-right-radius: 3px 3px; border-bottom-right-radius: 3px 3px; border-bottom-left-radius: 3px 3px" class="alignnone size-full wp-image-1502" alt="" src="http://www.searchtb.com/wp-content/uploads/2011/07/linkbase1.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;span style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; font-family: 微软雅黑; padding-top: 0px"&gt;Several basic data structures:&lt;br style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px" /&gt;1: Crawl queue (candidate list)&lt;br style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px" /&gt;This is split into a queue of URLs waiting to be crawled and a queue of URLs to be refreshed. The queues hold urlhash values and use the redis list data structure: newly extracted URLs are pushed onto the corresponding list, and the spider crawl module pops from the list. Each site has two types of crawl queue: a list-page crawl queue and a detail-page crawl queue.&lt;br style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px" /&gt;2: Link base (linkbase)&lt;br style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px" /&gt;The link base is in effect the DB that stores link information. The key is the urlhash and the value is the linkinfo, which contains url, purl, anchor, xpath, ...; it is stored as a redis hash, directly inside redis. It is a key-value link base and does not distinguish between page types.&lt;br style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px" /&gt;3: Crawled set (crawled_set)&lt;br style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px" 
/&gt;The crawled set holds the urlhash of every page that has been downloaded so far, recording the pages already crawled. It is implemented with a redis set whose member is the urlhash and whose score is a timestamp (a score implies a redis sorted set). The crawled set mainly records which pages have been crawled and when, and is used for later scheduling of page refreshes and for crawl statistics. As with the crawl queues, each site has two types of crawled set: detail pages and list pages.&lt;br style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px" /&gt;&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;span style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; font-family: 微软雅黑; padding-top: 0px"&gt;&lt;br style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px" /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;span style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; font-family: 微软雅黑; padding-top: 0px"&gt;II. Scheduling module:&lt;/span&gt;&lt;/div&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;span style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; font-family: 微软雅黑; padding-top: 0px"&gt;The scheduling module is the key to the crawling system; the quality of the scheduler determines the efficiency of the crawl. It consists mainly of data structures built on top of the redis linkbase, chiefly the crawl queues, the crawled sets, and crawl priorities. A crawl loop goes as follows: get a URL, submit it to the crawl module's pending queue, start the crawl, extract new links once the crawl finishes, and finally put them into the queue waiting to be crawled.&lt;/span&gt;&lt;/div&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;span style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; font-family: 微软雅黑; padding-top: 0px"&gt;Basic configuration of the scheduling system:&lt;br style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px" /&gt;a)&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Frequency (interval in seconds)&lt;br style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px" /&gt;b)&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Selection ratio for each crawl list: get_detail, mod_detail, get_list, mod_list&lt;br 
style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px" /&gt;Link extraction: extract the links from a page, deduplicate them, and insert the new links into the pending crawl list.&lt;br style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px" /&gt;Content extraction: extract page information according to the XPATH configured for the module, and write it to the pagebase.&lt;br style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px" /&gt;Offline scheduling: according to the refresh ratio, periodically select URLs from crawled_set and put them into the Mod queue to be refreshed.&lt;/span&gt;&lt;/div&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;span style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; font-family: 微软雅黑; padding-top: 0px"&gt;&lt;br style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px" /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;span style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; font-family: 微软雅黑; padding-top: 0px"&gt;III. Crawl module:&lt;/span&gt;&lt;/div&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;span style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; font-family: 微软雅黑; padding-top: 0px"&gt;The crawl module is a prerequisite for crawling. What matters for this module is coping with the many kinds of problems found on the Internet and balancing requests across IPs against the target site. This is, of course, closely tied to the scheduling system. For the crawl module, this article mainly uses the download module from the scrapy toolkit.&lt;/span&gt;&lt;/div&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;span style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; font-family: 微软雅黑; padding-top: 0px"&gt;First, the crawl module gets the crawl URLs for the relevant site from the linkbase and downloads the pages. It then writes the page information back to the pipeline, completes link extraction and page extraction, and calls the scheduling module to insert the results into the linkbase and pagebase.&lt;br style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px" /&gt;Downloader design:&lt;br style="padding-bottom: 0px; margin: 0px; padding-left: 0px; 
padding-right: 0px; padding-top: 0px" /&gt;IP: each machine needs several physical public IPs configured; for each download, one IP is chosen at random.&lt;br style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px" /&gt;Crawl frequency adjustment: read the configuration file and select URLs according to the crawl frequency configured there.&lt;/span&gt;&lt;/div&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;span style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; font-family: 微软雅黑; padding-top: 0px"&gt;&lt;br style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px" /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;span style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; font-family: 微软雅黑; padding-top: 0px"&gt;IV. Configuration interface:&lt;/span&gt;&lt;/div&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;span style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; font-family: 微软雅黑; padding-top: 0px"&gt;The configuration interface is mainly for managing and configuring the crawling system, including site feeds, page module extraction, feedback from the reporting system, and so on.&lt;/span&gt;&lt;/div&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;span style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; font-family: 微软雅黑; padding-top: 0px"&gt;&lt;br style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px" /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;span style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; font-family: 微软雅黑; padding-top: 0px"&gt;Similar to a general-purpose crawling architecture, the architecture of the crawling system described in this article is shown in the figure below:&lt;/span&gt;&lt;/div&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;span 
style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; font-family: 微软雅黑; padding-top: 0px"&gt;&lt;br style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px" /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;a style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; color: rgb(13,155,193); text-decoration: none; padding-top: 0px" href="http://www.searchtb.com/wp-content/uploads/2011/07/archicture1.png"&gt;&lt;img style="border-bottom: rgb(221,221,221) 1px solid; text-align: center; border-left: rgb(221,221,221) 1px solid; padding-bottom: 5px; background-color: rgb(243,243,243); margin: 10px; padding-left: 5px; padding-right: 5px; border-top: rgb(221,221,221) 1px solid; border-right: rgb(221,221,221) 1px solid; padding-top: 4px; border-top-left-radius: 3px 3px; border-top-right-radius: 3px 3px; border-bottom-right-radius: 3px 3px; border-bottom-left-radius: 3px 3px" class="alignnone size-full wp-image-1504" alt="" src="http://www.searchtb.com/wp-content/uploads/2011/07/archicture1.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;span style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; font-family: 微软雅黑; padding-top: 0px"&gt;&lt;br style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px" /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;span style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; font-family: 微软雅黑; padding-top: 0px"&gt;一个完整的抓取数据流：&lt;/span&gt;&lt;/div&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;span style="padding-bottom: 
0px; margin: 0px; padding-left: 0px; padding-right: 0px; font-family: 微软雅黑; padding-top: 0px"&gt;1: The user supplies the seed URLs&lt;/span&gt;&lt;/div&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;span style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; font-family: 微软雅黑; padding-top: 0px"&gt;2: The seed URLs enter the new-URL queue in linkbase&lt;/span&gt;&lt;/div&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;span style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; font-family: 微软雅黑; padding-top: 0px"&gt;3: The scheduling module selects URLs and places them in the fetch module's pending queue&lt;/span&gt;&lt;/div&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;span style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; font-family: 微软雅黑; padding-top: 0px"&gt;4: The fetch module reads the site's configuration file and fetches at the configured frequency&lt;/span&gt;&lt;/div&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;span style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; font-family: 微软雅黑; padding-top: 0px"&gt;5: The fetched results are returned to the pipeline interface, where link extraction is performed&lt;/span&gt;&lt;/div&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;span style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; font-family: 微软雅黑; padding-top: 0px"&gt;6: Newly discovered links are deduplicated in linkbase and pushed into linkbase's new-URL queue&lt;/span&gt;&lt;/div&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;span style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; font-family: 微软雅黑; padding-top: 0px"&gt;7: The scheduling module selects URLs into the fetch module's pending queue; go to step 4&lt;/span&gt;&lt;/div&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 
0px"&gt;&lt;span style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; font-family: 微软雅黑; padding-top: 0px"&gt;8：end&lt;/span&gt;&lt;/div&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;span style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; font-family: 微软雅黑; padding-top: 0px"&gt;&lt;br style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px" /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;span style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; font-family: 微软雅黑; padding-top: 0px"&gt;系统扩展&lt;/span&gt;&lt;/div&gt;&lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"&gt;&lt;span style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; font-family: 微软雅黑; padding-top: 0px"&gt;本文提到的抓取系统，核心是调度和存储模块；其中，抓取，存储，调度都是通过数据进行交互的，因此，模块之间可以任意平行扩展，对于系统规模来说，只需要平行扩展mysql和redis存储服务集群以及抓取集群即可。当然，简单的扩展会带来一些问题：比如垃圾列表页的泛滥，链接库的膨胀等等问题，这些问题后续在讨论吧。&lt;/span&gt;&lt;/div&gt;&lt;/span&gt;&lt;img src="http://www.cnblogs.com/Safe3/aggbug/2217961.html?type=1" width="1" height="1" alt=""/&gt;&lt;p&gt;&lt;a href="http://www.cnblogs.com/Safe3/archive/2011/10/19/2217961.html" target="_blank"&gt;本文链接&lt;/a&gt;&lt;/p&gt;</content></entry><entry><id>http://www.cnblogs.com/Safe3/archive/2011/08/05/2128175.html</id><title type="text">Safe3 Web漏洞扫描系统 v9.6免费版</title><summary type="text">Safe3 
Web漏洞扫描系统是安全伞网络推出的网站安全检测工具，传统的方法往往依靠渗透测试（黑箱、白箱测试），往往局限于测试人员的技术水准高低。软件界面截图目前，大多是采用一系列已知攻击手段进行手工检测，且工作量巨大，由于时间关系以及各类网站系统的复杂性程度不同，通常得不到真正有效的评估，国内能从事此类工作的技术人员往往较少，用户最终得到的评估报告往往仅是找到几个系统已知漏洞、某个注入点或者跨站脚本攻击漏洞等常规漏洞。由于评估人员的知识面局限性使得整体评估不够全面，且深度不足。网站的应用逐步增多，更新较快，每隔一段时间应做一次全面检测，若采用传统渗透测试方法，花费昂贵，且往往得不到真正意义上</summary><published>2011-08-05T01:44:00Z</published><updated>2011-08-05T01:44:00Z</updated><author><name>Safe3</name><uri>http://www.cnblogs.com/Safe3/</uri></author><link rel="alternate" href="http://www.cnblogs.com/Safe3/archive/2011/08/05/2128175.html"/><link rel="alternate" type="text/html" href="http://www.cnblogs.com/Safe3/archive/2011/08/05/2128175.html"/><content type="html">&lt;div style="text-align: center" align="left"&gt;&lt;span style="font-size: 13px; line-height: 18px; font-family: Tahoma, Verdana"&gt;Safe3 Web漏洞扫描系统是安全伞网络推出的网站安全检测工具，传统的方法往往依靠渗透测试&lt;span class="Apple-style-span" style="font-size: 14px; line-height: 21px; font-family: verdana, 'courier new'"&gt;&lt;span class="Apple-style-span" style="font-size: 13px; line-height: 18px; font-family: Tahoma, Verdana"&gt;（黑箱、白箱测试），往往局限于测试人员的技术水准高低。&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: center"&gt;&lt;span style="font-size: 13px; line-height: 18px; font-family: Tahoma, Verdana"&gt;&lt;span class="Apple-style-span" style="font-size: 14px; line-height: 21px; font-family: verdana, 'courier new'"&gt;&lt;span class="Apple-style-span" style="font-size: 13px; line-height: 18px; font-family: Tahoma, Verdana"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&amp;nbsp;&lt;/div&gt;&lt;div style="text-align: center"&gt;&lt;span style="font-size: 13px; line-height: 18px; font-family: Tahoma, Verdana"&gt;&lt;span class="Apple-style-span" style="font-size: 14px; line-height: 21px; font-family: verdana, 'courier new'"&gt;&lt;span class="Apple-style-span" style="font-size: 13px; line-height: 18px; font-family: Tahoma, Verdana"&gt;&lt;/span&gt;&lt;div align="center"&gt;&lt;img height="543" alt="" 
src="http://images.cnblogs.com/cnblogs_com/safe3/1.jpg" width="721" /&gt;&lt;/div&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;p&gt;软件界面截图&amp;nbsp;&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;span class="Apple-style-span" style="font-size: 13px; line-height: 18px; font-family: Tahoma, Verdana"&gt;&amp;nbsp;目前，大多是采用一系列已知攻击手段进行手工检测，且工作量巨大，由于时间关系以及各类网站系统的复杂性程度不同，通常得不到真正有效的评估，国内能从事此类工作的技术人员往往较少，用户最终得到的评估报告往往仅是找到几个系统已知漏洞、某个注入点或者跨站脚本攻击漏洞等常规漏洞。由于评估人员的知识面局限性使得整体评估不够全面，且深度不足。网站的应用逐步增多，更新较快，每隔一段时间应做一次全面检测，若采用传统渗透测试方法，花费昂贵，且往往得不到真正意义上的风险报告。Safe3 Web Vul&amp;nbsp;Scanner使用领先的智能化爬虫技术及SQL注入状态检测技术，使得相比国内外同类产品智能化程度更高，速度更快，结果更准确。&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span class="Apple-style-span" style="font-size: 13px; line-height: 18px; font-family: Tahoma, Verdana"&gt;&lt;/span&gt;&amp;nbsp;&lt;/p&gt;&lt;div&gt;&lt;span style="font-size: 13px; line-height: 18px; font-family: Tahoma, Verdana"&gt;&lt;p&gt;系统适用领域：&lt;/p&gt;&lt;p&gt;国内金融、证券、银行、电子政务、电子商务、教育、网游、综合行业门户、IDC等网站必备检测工具。&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;技术优势：&lt;/p&gt;&lt;p&gt;SQL注入网页抓取&lt;br /&gt;网页抓取模块采用广度优先爬虫技术以及网站目录还原技术。广度优先的爬虫技术的不会产生爬虫陷入的问题，可自定义爬行深度和爬行线程，网站目录还原技术则去除了无关结果，提高抓取效率。并且去掉了参数重复的注入页面，使得效率和可观性有了很大提高。&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;SQL注入状态扫描技术&lt;br /&gt;不同于传统的针对错误反馈判断是否存在注入漏洞的方式，而采用状态检测来判断。所谓状态检测，即：针对某一链接输入不同的参数，通过对网站反馈的结果使用向量比较算法进行比对判断，从而确定该链接是否为注入点，此方法不依赖于特定的数据库类型、设置以及CGI语言的种类，对于注入点检测全面，不会产生漏报现象。并且具备绕过WAF、IPS、IDS检测功能，扫描到隐藏的注入点。&lt;/p&gt;&lt;/span&gt;&lt;/div&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;超线程和内存回收技术&lt;/p&gt;&lt;p&gt;市面上web漏洞扫描系统通常不能完成腾讯、搜狐等类似门户网站的扫描，原因是第一扫描进度非常慢，第二 随着扫描占用系统内存非常高，最终因为系统内存不够而崩溃退出，Safe3 web漏洞扫描系统在这方面表现尤为出色，软件不仅采用线程池等技术保证了很低的CPU占用，还使用自动内存回收功能回收无用的内存，另外软件内部采用独特存储算法，可以在存储千万url地址时扫描速度依然不减，所以如果你的网站非常大，那么Safe3 web漏洞扫描系统是你的首选。&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;img height="597" alt="" src="http://images.cnblogs.com/cnblogs_com/safe3/waf2.jpg" width="846" 
/&gt;&lt;/p&gt;&lt;p&gt;企业扫描报表截图&amp;nbsp;&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;div&gt;&lt;span style="line-height: 25px; font-family: Arial"&gt;&lt;p&gt;&lt;span style="word-spacing: 0px; font: medium Simsun; text-transform: none; color: #000000; text-indent: 0px; white-space: normal; letter-spacing: normal; border-collapse: separate; orphans: 2; widows: 2"&gt;&lt;span style="font-size: 10pt; color: #444444; line-height: 22px; font-family: Verdana, Helvetica, Arial, sans-serif; border-collapse: collapse"&gt;&lt;strong style="font-weight: bold; line-height: normal; font-style: normal; text-align: left; word-wrap: break-word"&gt;软件运行需要&lt;a title="safe3wvs" style="color: #333333; line-height: normal; text-decoration: none; word-wrap: break-word" href="http://www.crsky.com/soft/4818.html" target="_blank"&gt;Microsoft .NET Framework v2.0&lt;/a&gt;&lt;/strong&gt;&lt;br style="line-height: normal; word-wrap: break-word" /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;span style="font-size: 10pt"&gt;新版功能：&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-size: 10pt"&gt;v9.6&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-size: 10pt"&gt;1.新增错误模式扫描功能 (适合有报错显示的网站)&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-size: 10pt"&gt;2.增强企业报表显示&amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-size: 10pt"&gt;3.增强Form智能识别提交功能&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;font class="Apple-style-span" size="3"&gt;&lt;span class="Apple-style-span" style="font-size: 13px"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/font&gt;&lt;/div&gt;&lt;/div&gt;&lt;/span&gt;&lt;/div&gt;&lt;p&gt;&lt;span class="Apple-style-span" style="line-height: 25px; font-family: Arial"&gt;&lt;span style="font-size: 10pt"&gt;下载地址：&amp;nbsp;&lt;/span&gt;&lt;img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; line-height: normal; border-right-width: 0px; word-wrap: break-word" alt="" 
src="http://t00ls.net/images/attachicons/rar.gif" border="0" /&gt;&amp;nbsp;&lt;a style="color: #1d58d1; text-decoration: none" href="http://www.safe3.com.cn/safe3wvs.rar"&gt;&lt;span style="font-size: 10pt"&gt;http://www.safe3.com.cn/safe3wvs.rar&lt;/span&gt;&lt;/a&gt;&amp;nbsp;&lt;/span&gt;&amp;nbsp;（&lt;span style="color: red"&gt;免费版支持GET型sql注入和XSS漏洞扫描）&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;img src="http://www.cnblogs.com/Safe3/aggbug/2128175.html?type=1" width="1" height="1" alt=""/&gt;&lt;p&gt;&lt;a href="http://www.cnblogs.com/Safe3/archive/2011/08/05/2128175.html" target="_blank"&gt;本文链接&lt;/a&gt;&lt;/p&gt;</content></entry><entry><id>http://www.cnblogs.com/Safe3/archive/2011/07/22/2113567.html</id><title type="text">Varnish+Xcache构建高性能WEB构架初探</title><summary type="text">本文主要讲述web优化方案和缓存工具的调研及使用。根据目前的测试结果来看，采用varnish+xcache作为 apache和php缓存这种架构具有高并发、高稳定性，易扩展等优点，服务器的动态请求处理能力是之前的7倍之多。通过分析发现，目前对服务器的负载主要是在cpu使用方面，随着流量的增加瓶颈也将出现在cpu方面，而内存和IO方面都不是问题。针对这样的情况，我们就要研究怎么去降低cpu的负载，消除或降低系统的瓶颈。业务特点分析U服务采用的是LAMP(Linux Apache mysql php)架构，而服务本身的逻辑比较简单，就是根据不同的url返回特定的页面内容，而这些页面内容基本是不会变</summary><published>2011-07-22T02:00:00Z</published><updated>2011-07-22T02:00:00Z</updated><author><name>Safe3</name><uri>http://www.cnblogs.com/Safe3/</uri></author><link rel="alternate" href="http://www.cnblogs.com/Safe3/archive/2011/07/22/2113567.html"/><link rel="alternate" type="text/html" href="http://www.cnblogs.com/Safe3/archive/2011/07/22/2113567.html"/><content type="html">&lt;div&gt;&lt;span style="font-family: Arial; line-height: 20px; color: #333333; -webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; "&gt;&lt;p&gt;&lt;span style="line-height: normal; font-size: 14px; "&gt;本文主要讲述web优化方案和缓存工具的调研及使用。根据目前的测试结果来看，采用varnish+xcache作为 apache和&lt;/span&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;&lt;span style="line-height: normal; font-size: 14px; "&gt;php缓存这种架构具有高并发、高稳定性，易扩展等优点，服务器的动态请求处理能力是之前的7倍之多。&lt;/span&gt;&lt;span style="line-height: normal; font-size: 
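14px; "&gt;Our analysis shows that&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style="line-height: normal; font-size: 14px; "&gt;the load on the servers is currently dominated by CPU usage. As traffic grows, the CPU will also be the first bottleneck, while memory and I/O are not a concern.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style="line-height: normal; font-size: 14px; "&gt;We therefore need to find ways to reduce CPU load and remove or ease this bottleneck.&lt;br style="line-height: normal; " /&gt;&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;table cellspacing="0" cellpadding="0" style="table-layout: auto; line-height: normal; "&gt;&lt;tbody style="line-height: normal; "&gt;&lt;tr style="line-height: normal; "&gt;&lt;td valign="top" style="font-family: Arial; word-wrap: break-word; word-break: break-all; visibility: visible !important; zoom: 1 !important; filter: none; font-size: 14px; line-height: normal; "&gt;&lt;table cellspacing="0" cellpadding="0" width="698" height="2267" style="table-layout: auto; line-height: normal; width: 698px; height: 2267px; "&gt;&lt;tbody style="line-height: normal; "&gt;&lt;tr style="line-height: normal; "&gt;&lt;td valign="top" style="font-family: Arial; word-wrap: break-word; word-break: break-all; visibility: visible !important; zoom: 1 !important; filter: none; font-size: 14px; line-height: normal; "&gt;&lt;strong style="line-height: normal; "&gt;&lt;span style="line-height: normal; font-size: 14px; "&gt;Service characteristics&lt;/span&gt;&lt;br style="line-height: normal; " /&gt;&lt;/strong&gt;&lt;span style="line-height: normal; font-size: 14px; "&gt;The U service runs on a LAMP (Linux, Apache, MySQL, PHP) stack, and its logic is fairly simple: it returns specific page content for each URL, and that content rarely changes. The whole process is handled dynamically by PHP and requires no interaction with other servers. While a request is being served, almost all of the CPU time goes to PHP processing, so our goal is to minimize dynamic PHP processing.&lt;br style="line-height: normal; " /&gt;&lt;/span&gt;&lt;br style="line-height: normal; " /&gt;&lt;strong style="line-height: normal; "&gt;&lt;span style="line-height: normal; font-size: 14px; "&gt;Survey of optimization options&lt;/span&gt;&lt;br style="line-height: normal; " /&gt;&lt;/strong&gt;&lt;span style="line-height: normal; font-size: 14px; 
"&gt;The usual approach to performance optimization is caching. Given the characteristics of the U service, we optimize on two fronts, PHP caching and front-end caching, as shown in the diagram below:&amp;nbsp;&lt;/span&gt;&lt;br style="line-height: normal; " /&gt;&lt;img src="http://images.cnblogs.com/cnblogs_com/safe3/591b1ecdb75c6817f31fe7bd.jpg" width="693" height="311" alt="" /&gt;&lt;br style="line-height: normal; " /&gt;&lt;br style="line-height: normal; " /&gt;&lt;strong style="line-height: normal; "&gt;&lt;span style="line-height: normal; font-size: 14px; "&gt;PHP cache survey&lt;/span&gt;&lt;br style="line-height: normal; " /&gt;&lt;/strong&gt;&lt;span style="line-height: normal; font-size: 14px; "&gt;Every time an HTTP request hits a PHP page, the PHP code is parsed and compiled into opcodes (the primitive instructions the PHP engine executes directly, comparable to assembly language) before it runs. Under light or negligible load the server appears to complete this step instantly, but as the number of pages served grows, the repeated work becomes a heavy burden. In some cases the time spent &amp;#8220;compiling&amp;#8221; PHP code far exceeds the time needed to execute it, which puts considerable pressure on the server. A PHP cache mainly caches opcodes so that the same PHP code is not compiled again and again. The main in-memory caching options for PHP today are memcache, eAccelerator, XCache and APC. Memcache is a network-based cache: a high-performance distributed in-memory object caching system that maintains one large hash table in memory and can store data in many formats, including images, video, files and database query results. Its characteristics make it better suited to distributed databases and distributed computing; it does not cache opcodes, so for page caching and acceleration alone it falls well short of eAccelerator, XCache and APC. eAccelerator, XCache and APC are broadly similar: all are free, open-source PHP accelerators that provide optimization and dynamic content caching and can greatly improve PHP script performance. By keeping PHP scripts in their compiled state, they almost entirely eliminate the compilation overhead on the server. XCache is a relatively new accelerator and an alternative to eAccelerator that is more stable and efficient; the shifen front end already uses it.&lt;br style="line-height: normal; " /&gt;&lt;/span&gt;&lt;br style="line-height: normal; " /&gt;&lt;strong style="line-height: normal; "&gt;&lt;span style="line-height: normal; font-size: 14px; "&gt;HTTP front-end cache survey&lt;/span&gt;&lt;br style="line-height: normal; " /&gt;&lt;/strong&gt;&lt;span style="line-height: normal; font-size: 14px; "&gt;The most widely used in-memory HTTP caches are Squid and Varnish. Squid is a well-regarded proxy server that is widely deployed on Linux. For web users it also serves as a high-performance caching proxy: it keeps data in memory, caches DNS lookup results, speeds up browsing and raises the client hit rate. Squid is one of the most widely used HTTP accelerators, although it is still little used at Baidu. Varnish is another high-performance open-source HTTP accelerator. Its author, Poul-Henning Kamp, is one of the FreeBSD kernel developers. He argues that today's computers are far more complex than those of 1975, when there were only two storage media: memory and disk. A modern system has, besides main memory, L1, L2 and sometimes L3 caches inside the CPU, and disks carry their own caches as well. Squid's architecture, which manages object replacement by itself, cannot know about all of this and so cannot optimize for it, whereas the operating system can; that work should therefore be left to the operating system. That is the idea behind the Varnish 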
Cache architecture. Some Chinese portals, such as Sina and Tencent, are currently using it.&lt;br style="line-height: normal; " /&gt;&lt;/span&gt;&lt;br style="line-height: normal; " /&gt;&lt;strong style="line-height: normal; "&gt;&lt;span style="line-height: normal; font-size: 14px; "&gt;Cache tool performance comparison and selection&lt;/span&gt;&lt;br style="line-height: normal; " /&gt;&lt;/strong&gt;&lt;strong style="line-height: normal; "&gt;&lt;span style="line-height: normal; font-size: 14px; "&gt;Test platform&lt;/span&gt;&lt;/strong&gt;&lt;span style="line-height: normal; font-size: 14px; "&gt;: Host: tc-un-ct00.tc; OS: Linux 2.6.9_5-4-0-2 #1 SMP; CPU: Intel(R) Xeon(R) CPU 5150 @ 2.66GHz &amp;#215; 4; Memory: 8GB. Test data: data from jx-rcv00 for 18:00 &amp;#8211; 19:00 on 2008-10-28. Load tool: pfmtest&lt;/span&gt;&lt;br style="line-height: normal; " /&gt;&lt;strong style="line-height: normal; "&gt;&lt;span style="line-height: normal; font-size: 14px; "&gt;PHP cache tool comparison&lt;/span&gt;&lt;br style="line-height: normal; " /&gt;&lt;/strong&gt;&lt;span style="line-height: normal; font-size: 14px; "&gt;The performance of several mainstream PHP cache tools compares as follows:&amp;nbsp;&lt;/span&gt;&lt;br style="line-height: normal; " /&gt;&lt;img src="http://images.cnblogs.com/cnblogs_com/safe3/591b1ecdb7526817f31fe7bf.jpg" width="817" height="378" alt="" /&gt;&lt;br style="line-height: normal; " /&gt;&lt;br style="line-height: normal; " /&gt;&lt;strong style="line-height: normal; "&gt;&lt;span style="line-height: normal; font-size: 14px; "&gt;Front-end cache tool comparison&lt;/span&gt;&lt;br style="line-height: normal; " /&gt;&lt;/strong&gt;&lt;span style="line-height: normal; font-size: 14px; "&gt;The performance of Apache, Squid+Apache and Varnish+Apache compares as follows:&amp;nbsp;&lt;/span&gt;&lt;br style="line-height: normal; " /&gt;&lt;img src="http://images.cnblogs.com/cnblogs_com/safe3/f947e7daa1571e888c1029b8.jpg" width="818" height="581" alt="" /&gt;&lt;br style="line-height: normal; " /&gt;&lt;br style="line-height: normal; " /&gt;&lt;strong style="line-height: normal; "&gt;&lt;span style="line-height: normal; font-size: 14px; "&gt;Test conclusions&lt;/span&gt;&lt;br style="line-height: normal; " 
/&gt;&lt;/strong&gt;&lt;span style="line-height: normal; font-size: 14px; "&gt;In the PHP cache tests, XCache performed best in both response time and CPU usage. The front-end cache results show that Varnish is the better fit for the fully dynamic U service.&lt;/span&gt;&lt;br style="line-height: normal; " /&gt;&lt;strong style="line-height: normal; "&gt;&lt;span style="line-height: normal; font-size: 14px; "&gt;Using Varnish and XCache&lt;/span&gt;&lt;br style="line-height: normal; " /&gt;&lt;/strong&gt;&lt;span style="line-height: normal; font-size: 14px; "&gt;Using Varnish&lt;/span&gt;&lt;br style="line-height: normal; " /&gt;&lt;span style="line-height: normal; font-size: 14px; "&gt;For an introduction to Varnish and detailed usage instructions, see &lt;/span&gt;&lt;a href="http://varnish.projects.linpro.no/" style="line-height: normal; "&gt;&lt;span style="line-height: normal; font-size: 14px; "&gt;http://varnish.projects.linpro.no/&lt;/span&gt;&lt;/a&gt;&lt;br style="line-height: normal; " /&gt;&lt;span style="line-height: normal; font-size: 14px; "&gt;Using XCache&lt;/span&gt;&lt;br style="line-height: normal; " /&gt;&lt;span style="line-height: normal; font-size: 14px; "&gt;For an introduction to XCache and detailed usage instructions, see &lt;/span&gt;&lt;a href="http://xcache.lighttpd.net/" style="line-height: normal; "&gt;&lt;span style="line-height: normal; font-size: 14px; "&gt;http://xcache.lighttpd.net/&lt;/span&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/p&gt;&lt;/span&gt;&lt;/div&gt;&lt;img src="http://www.cnblogs.com/Safe3/aggbug/2113567.html?type=1" width="1" height="1" alt=""/&gt;&lt;p&gt;&lt;a href="http://www.cnblogs.com/Safe3/archive/2011/07/22/2113567.html" target="_blank"&gt;Link to this article&lt;/a&gt;&lt;/p&gt;</content></entry><entry><id>http://www.cnblogs.com/Safe3/archive/2011/07/17/2108558.html</id><title type="text">IIS request monitoring script</title><summary type="text">{3a2a4e84-4c21-4981-ae10-3fda0d9b0f83} 0 5 IIS: WWW Server{06b94d9a-b15e-456e-a4ef-37c984a2cb4b} 0 5 IIS: Active Server Pages (ASP){dd5ef90a-6398-47a4-ad34-4dcecdef795f} 0 5 Universal Listener 
Trace{a1c2040e-8840-4c31-ba11-9871031a19ea} 0 5 IIS: WWW ISAPI Extension{AFF081FE-0247-4275-9C4E-021F3DC1DA</summary><published>2011-07-17T00:55:00Z</published><updated>2011-07-17T00:55:00Z</updated><author><name>Safe3</name><uri>http://www.cnblogs.com/Safe3/</uri></author><link rel="alternate" href="http://www.cnblogs.com/Safe3/archive/2011/07/17/2108558.html"/><link rel="alternate" type="text/html" href="http://www.cnblogs.com/Safe3/archive/2011/07/17/2108558.html"/><content type="html">&lt;div&gt;&lt;span style="font-family: Verdana, Arial; line-height: 24px; font-size: 12px; color: #626262; "&gt;&lt;div codepanel"="" style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; padding-top: 0px; padding-right: 0px; padding-bottom: 0px; padding-left: 0px; background-image: initial; background-attachment: initial; background-origin: initial; background-clip: initial; background-color: #f5f5ef; border-top-width: 1px; border-top-style: solid; border-top-color: #e7e7d8; border-bottom-width: 1px; border-bottom-style: solid; border-bottom-color: #e7e7d8; background-position: initial initial; background-repeat: initial initial; "&gt;&lt;div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; padding-top: 6px; padding-right: 6px; padding-bottom: 6px; padding-left: 12px; font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 12px; line-height: 18px; "&gt;{3a2a4e84-4c21-4981-ae10-3fda0d9b0f83} 0 5 IIS: WWW Server&lt;br /&gt;{06b94d9a-b15e-456e-a4ef-37c984a2cb4b} 0 5 IIS: Active Server Pages (ASP)&lt;br /&gt;{dd5ef90a-6398-47a4-ad34-4dcecdef795f} 0 5 Universal Listener Trace&lt;br /&gt;{a1c2040e-8840-4c31-ba11-9871031a19ea} 0 5 IIS: WWW ISAPI Extension&lt;br /&gt;{AFF081FE-0247-4275-9C4E-021F3DC1DA35} 0 5 IIS: ASP.NET&lt;br /&gt;{d55d3bc9-cba9-44df-827e-132d3a4596c2} 0 5 IIS: Global&lt;br /&gt;{3b7b0b4b-4b01-44b4-a95e-3c755719aebf} 0 5 IIS: Request Monitor&lt;br /&gt;{DC1271C2-A0AF-400f-850C-4E42FE16BE1C} 0 5 IIS: 
IISADMIN Global&lt;/div&gt;&lt;/div&gt;&lt;br /&gt;Save the lines above as iistrace.guid&lt;br /&gt;&lt;div codepanel"="" style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; padding-top: 0px; padding-right: 0px; padding-bottom: 0px; padding-left: 0px; background-image: initial; background-attachment: initial; background-origin: initial; background-clip: initial; background-color: #f5f5ef; border-top-width: 1px; border-top-style: solid; border-top-color: #e7e7d8; border-bottom-width: 1px; border-bottom-style: solid; border-bottom-color: #e7e7d8; background-position: initial initial; background-repeat: initial initial; "&gt;&lt;div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; padding-top: 6px; padding-right: 6px; padding-bottom: 6px; padding-left: 12px; font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 12px; line-height: 18px; "&gt;&lt;br /&gt;del summary.txt&lt;br /&gt;del workload.txt&lt;br /&gt;C:\windows\system32\logman start "NT Kernel Logger" -p "Windows Kernel Trace" (process,thread,disk) -ct perf -o krnl.etl -ets&lt;br /&gt;C:\windows\system32\logman start "IIS Trace" -pf iistrace.guid -ct perf -o iis.etl -ets&lt;br /&gt;@echo Keep the sampling window under 10 minutes, then run the &amp;#8220;stop analysis and generate report&amp;#8221; command in good time...&lt;br /&gt;pause&lt;/div&gt;&lt;/div&gt;&lt;br /&gt;Save as start.bat&lt;br /&gt;&lt;div codepanel"="" style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; padding-top: 0px; padding-right: 0px; padding-bottom: 0px; padding-left: 0px; background-image: initial; background-attachment: initial; background-origin: initial; background-clip: initial; background-color: #f5f5ef; border-top-width: 1px; border-top-style: solid; border-top-color: #e7e7d8; border-bottom-width: 1px; border-bottom-style: solid; border-bottom-color: #e7e7d8; background-position: initial initial; background-repeat: initial initial; "&gt;&lt;div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; 
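padding-top: 6px; padding-right: 6px; padding-bottom: 6px; padding-left: 12px; font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 12px; line-height: 18px; "&gt;&lt;br /&gt;C:\windows\system32\logman stop "IIS Trace" -ets&lt;br /&gt;C:\windows\system32\logman stop "NT Kernel Logger" -ets&lt;br /&gt;C:\windows\system32\tracerpt iis.etl krnl.etl -o -report -summary&lt;br /&gt;del dumpfile.csv&lt;br /&gt;del iis.etl&lt;br /&gt;del krnl.etl&lt;br /&gt;notepad.exe workload.txt&lt;/div&gt;&lt;/div&gt;&lt;br /&gt;Save as stop.bat&lt;br /&gt;&lt;br /&gt;Run start.bat, then run stop.bat ten minutes later to capture every IIS request made during those 10 minutes, together with summary statistics such as response rate, CPU usage and request counts.&lt;/span&gt;&lt;/div&gt;&lt;img src="http://www.cnblogs.com/Safe3/aggbug/2108558.html?type=1" width="1" height="1" alt=""/&gt;&lt;p&gt;&lt;a href="http://www.cnblogs.com/Safe3/archive/2011/07/17/2108558.html" target="_blank"&gt;Link to this article&lt;/a&gt;&lt;/p&gt;</content></entry><entry><id>http://www.cnblogs.com/Safe3/archive/2011/07/13/2105300.html</id><title type="text">Resolving IP conflicts in hotels and guesthouses</title><summary type="text">When guests go online at the hotel where I work, a dialog box usually pops up with notices such as usage guidelines and pricing. The hotel also runs its own movie and music servers, and the DHCP server is one of these machines. The hotel uses it to assign IP addresses so that guests can get online: a guest plugs a computer into the network, the network adapter is set to obtain an IP address automatically, and the Dynamic Host Configuration Protocol (DHCP) server hands out an address. A DHCP server automatically sets the IP address, subnet mask, gateway, DNS, WINS and other network parameters for users, which simplifies their network setup and makes management more efficient. This is also where our problem arose: many guests complained that they could not get online this way, although not every guest was affected. Our investigation found that those staying in hotels,</summary><published>2011-07-13T06:29:00Z</published><updated>2011-07-13T06:29:00Z</updated><author><name>Safe3</name><uri>http://www.cnblogs.com/Safe3/</uri></author><link rel="alternate" href="http://www.cnblogs.com/Safe3/archive/2011/07/13/2105300.html"/><link rel="alternate" type="text/html" href="http://www.cnblogs.com/Safe3/archive/2011/07/13/2105300.html"/><content type="html">&lt;div&gt;&lt;span style="font-family: arial; line-height: 22px; font-size: 12px; -webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; 
"&gt;&lt;p&gt;When guests go online at the hotel where I work, a dialog box usually pops up with notices such as usage guidelines and pricing. The hotel also runs its own movie and music servers, and the DHCP server is one of these machines. The hotel uses it to assign IP addresses so that guests can get online: a guest plugs a computer into the network, the network adapter is set to obtain an IP address automatically, and the Dynamic Host Configuration Protocol (DHCP) server hands out an address. A DHCP server automatically sets the IP address, subnet mask, gateway, DNS, WINS and other network parameters for users, which simplifies their network setup and makes management more efficient.&amp;nbsp;&lt;br /&gt;This is also where our problem arose: many guests complained that they could not get online this way, although not every guest was affected. Our investigation found that most hotel guests are business travelers and engineers whose laptops often run a Windows Server edition with the DHCP Server service running. Once such a computer joins the network, machines that connect after it may treat it as the DHCP server, receive an incorrect IP address and therefore fail to get online.&lt;/p&gt;&lt;p&gt;How a DHCP server allocates addresses&lt;/p&gt;&lt;p&gt;DHCP is a protocol standard that simplifies the management of host IP configuration. With DHCP, a DHCP server can assign, configure, track and, when necessary, change all TCP/IP settings for every DHCP-enabled client on the network. DHCP also ensures that duplicate addresses are not used, reclaims addresses that are no longer in use, and automatically assigns an IP address appropriate to the subnet a host connects to. When a network has two or more DHCP servers, do not define address pools that are too large, so that one pool does not end up contained in another; likewise, manually assigned client addresses that fall inside a DHCP server's pool can cause abnormal DHCP failures.&lt;/p&gt;&lt;p&gt;To meet different needs, a DHCP server can allocate IP addresses in three ways:&lt;/p&gt;&lt;p&gt;Automatic allocation: the DHCP server assigns a fixed IP address to certain clients the first time they connect to the network, and the client keeps that address long term;&lt;/p&gt;&lt;p&gt;Dynamic allocation: the DHCP server assigns an IP address for a limited lease. When the lease expires the client must request an address again, and the client may also release the address voluntarily. The vast majority of client hosts receive dynamically allocated addresses;&lt;/p&gt;&lt;p&gt;Manual allocation: the network administrator specifies a fixed IP address for the client.&lt;/p&gt;&lt;p&gt;Of the three methods, only dynamic allocation can reuse addresses that clients no longer need.&lt;/p&gt;&lt;p&gt;Every technology has its pros and cons, and DHCP is no exception. It is simple to configure and easy to manage, but those strengths come with a weakness: DHCP normally has no authentication between server and client, so multiple DHCP servers on one network can throw it into chaos. Network disruption caused by a user accidentally running a DHCP server is very common, which shows how widespread the problem is.&lt;/p&gt;&lt;p&gt;In my years of network work I have run into many problems, and quite a few involved conflicting DHCP servers. I have summarized some of the lessons learned here in the hope that they help colleagues facing the same issue, and I welcome corrections and suggestions for improvement.&lt;/p&gt;&lt;p&gt;Resolving DHCP server conflicts&lt;/p&gt;&lt;p&gt;Using DHCP snooping&lt;/p&gt;&lt;p&gt;There are many ways to resolve this kind of DHCP server conflict. The most direct is to post a notice asking guests to turn off the Windows DHCP server service before going online; it can be stopped from the &amp;#8216;DHCP network service&amp;#8217; item under &amp;#8216;Control Panel&amp;#8217;, &amp;#8216;Administrative Tools&amp;#8217;. Note that non-server editions of Windows need no change, and that the DHCP Client service under &amp;#8216;Control Panel&amp;#8217;, &amp;#8216;Administrative Tools&amp;#8217;, &amp;#8216;Services&amp;#8217; must not be stopped, otherwise the machine cannot obtain an address.&lt;/p&gt;&lt;p&gt;That approach is passive and impractical, and it makes the network harder to manage, so the problem is better solved within the network itself.&lt;/p&gt;&lt;p&gt;Since this is a DHCP problem, we can solve it with a DHCP technique, the most representative being DHCP snooping. DHCP snooping is a DHCP security feature that filters untrusted DHCP messages, meaning DHCP messages that arrive from untrusted zones, by building and maintaining a DHCP snooping binding table. The DHCP 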
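snooping binding table records the MAC address, IP address, lease time, VLAN ID, interface and other details of users in untrusted zones.&lt;/p&gt;&lt;p&gt;First define the trusted and untrusted ports on the switch. Trusted ports connect to DHCP servers or to other switches; untrusted ports connect to users or to the rest of the network. An untrusted port drops any DHCP ACK and DHCP OFFER messages it receives, since those are DHCP server responses, while a trusted port forwards them normally, which ensures that users obtain correct IP addresses. The configuration is as follows:&lt;br /&gt;The commands shown are for Cisco devices. The overall design is the same whatever the vendor, although command syntax may differ slightly, so adapt the commands to your own equipment.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;Enable DHCP snooping in global configuration mode. It is disabled by default and not every device supports it, so check the documentation first.&lt;/p&gt;&lt;p&gt;switch(config)#ip dhcp snooping&lt;/p&gt;&lt;p&gt;If VLANs are in use, enable snooping on the specific VLAN with the following command:&lt;/p&gt;&lt;p&gt;switch(config)#ip dhcp snooping vlan vlan-id&lt;/p&gt;&lt;p&gt;Then define the trusted ports. All switch ports are untrusted by default; ports that connect to other network devices, trunk ports and ports that connect to DHCP servers are normally set as trusted.&lt;/p&gt;&lt;p&gt;switch(config)#int f0/x&lt;/p&gt;&lt;p&gt;switch(config-if)#ip dhcp snooping trust&lt;/p&gt;&lt;p&gt;Using PVLAN&lt;/p&gt;&lt;p&gt;Many layer 2 techniques can prevent DHCP server conflicts, and PVLAN is one of the more widely used.&lt;/p&gt;&lt;p&gt;In a PVLAN (private VLAN) there are three types of port: isolated, community and promiscuous. They map to different VLAN types: an isolated port belongs to an isolated PVLAN and a community port to a community PVLAN, while the primary VLAN represents the private VLAN as a whole. The first two VLAN types must be associated with the primary VLAN, which also contains the promiscuous ports. In an isolated PVLAN, an isolated port can communicate only with promiscuous ports, and isolated ports cannot reach one another. With community PVLANs, different community VLANs cannot reach each other, ports in the same community VLAN can, and every community port can communicate with the promiscuous ports. Using this, we can define the uplink port or the port facing the DHCP server as promiscuous and place all other ports in an isolated VLAN. Every port can then talk only to the uplink or the DHCP server, so even if a guest machine is acting as a DHCP server, no other machine exchanges traffic with it or mistakes it for the server.&lt;/p&gt;&lt;p&gt;PVLAN can be applied to DHCP server conflicts in many flexible ways. Here is one of the simpler and more common ones:&lt;/p&gt;&lt;p&gt;First set the switch to VTP transparent mode:&lt;/p&gt;&lt;p&gt;switch(config)#vtp mode transparent&lt;/p&gt;&lt;p&gt;Optionally, also enable port protection. Protected ports cannot communicate with each other, but a protected port can still communicate with ports that do not have protection enabled. Turn it on as your needs dictate:&lt;/p&gt;&lt;p&gt;switch(config)#int range f0/1 - 24&lt;/p&gt;&lt;p&gt;switch(config-if-range)#switchport protected&lt;/p&gt;&lt;p&gt;Create the isolated VLAN and the primary VLAN, and define the isolated VLAN as a secondary VLAN of the primary VLAN so that it can communicate with the primary:&lt;/p&gt;&lt;p&gt;switch(config)#vlan 14&lt;/p&gt;&lt;p&gt;switch(config-vlan)#private-vlan 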
isolated&lt;/p&gt;&lt;p&gt;switch(config)#vlan 44&lt;/p&gt;&lt;p&gt;switch(config-vlan)#private-vlan primary&lt;/p&gt;&lt;p&gt;switch(config-vlan)#private-vlan association 14&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;/span&gt;&lt;/div&gt;&lt;img src="http://www.cnblogs.com/Safe3/aggbug/2105300.html?type=1" width="1" height="1" alt=""/&gt;&lt;p&gt;&lt;a href="http://www.cnblogs.com/Safe3/archive/2011/07/13/2105300.html" target="_blank"&gt;本文链接&lt;/a&gt;&lt;/p&gt;</content></entry><entry><id>http://www.cnblogs.com/Safe3/archive/2011/06/23/2087936.html</id><title type="text">Safe3 Web应用防火墙14.1版评测</title><summary type="text">四年前的今天，一款名为“Safe3Web应用防火墙”的网站安全防护软件华丽面世，从这时开始国内服务器安防领域开始进入到全新时代。昨天安全伞网络科技公司正式发布了Safe3Web应用防火墙14.1企业版，版本号也由之前的13.X升级为14.1周年预览版。 一般来说如此大幅度的版本升级常常都意味着重大功能的加入，那么这次Safe3Web应用防火墙又会给我们带来哪些惊喜？下面就让我们一同来看一看。软件名称：Safe3Web应用防火墙软件版本：14.1周年版预览版软件大小：4.9M软件授权：试用版适用平台：Win2000Win2003Win2008（32、64位）下载地址：Safe3Web应用防火墙.</summary><published>2011-06-23T06:31:00Z</published><updated>2011-06-23T06:31:00Z</updated><author><name>Safe3</name><uri>http://www.cnblogs.com/Safe3/</uri></author><link rel="alternate" href="http://www.cnblogs.com/Safe3/archive/2011/06/23/2087936.html"/><link rel="alternate" type="text/html" href="http://www.cnblogs.com/Safe3/archive/2011/06/23/2087936.html"/><content type="html">&lt;div&gt;&lt;span style="font-family: 宋体; line-height: 24px; "&gt;&lt;p&gt;&lt;/p&gt;&lt;div&gt;&lt;span style="line-height: normal; font-size: small; border-collapse: collapse; "&gt;四年前的今天，一款名为&amp;#8220;Safe3&amp;nbsp;Web应用防火墙&amp;#8221;的网站安全防护软件华丽面世，从这时开始国内服务器安防领域开始进入到全新时代。昨天安全伞网络科技公司正式发布了Safe3&amp;nbsp;Web应用防火墙&amp;nbsp;14.1企业版，版本号也由之前的13.X升级为14.1&amp;nbsp;周年预览版。&lt;br style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; padding-top: 0px; padding-right: 0px; padding-bottom: 0px; padding-left: 0px; word-wrap: break-word; " 
/&gt;A version jump this large usually signals major new features, so what surprises does Safe3&amp;nbsp;Web Application Firewall bring this time? Let's take a look.&lt;/span&gt;&lt;/div&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;table border="1" cellspacing="0" bordercolor="#000000" cellpadding="5" width="434" align="center" style="text-align: left; border-collapse: collapse; font-size: 12px; "&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td bgcolor="#0476b3" width="110" align="center"&gt;&lt;strong style="color: #ffffff; "&gt;Software name:&lt;/strong&gt;&lt;/td&gt;&lt;td width="298" align="left"&gt;&lt;div&gt;&lt;span style="font-size: small; "&gt;Safe3&amp;nbsp;Web Application Firewall&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td bgcolor="#0476b3" align="center"&gt;&lt;strong style="color: #ffffff; "&gt;Version:&lt;/strong&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;div&gt;&lt;span style="font-size: small; "&gt;14.1&amp;nbsp;Anniversary Preview Edition&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td bgcolor="#0476b3" align="center"&gt;&lt;strong style="color: #ffffff; "&gt;Size:&lt;/strong&gt;&lt;/td&gt;&lt;td align="left"&gt;4.9 MB&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td bgcolor="#0476b3" align="center"&gt;&lt;strong style="color: #ffffff; "&gt;License:&lt;/strong&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;div&gt;&lt;span style="font-size: small; "&gt;Trial&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td bgcolor="#0476b3" align="center"&gt;&lt;strong style="color: #ffffff; "&gt;Platforms:&lt;/strong&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;div&gt;&lt;span style="font-size: small; "&gt;Win2000&amp;nbsp;Win2003&amp;nbsp;Win2008&amp;nbsp;(32-bit and 64-bit)&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td bgcolor="#0476b3" align="center"&gt;&lt;strong style="color: #ffffff; "&gt;Download:&lt;/strong&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;u style="color: #d50000; font-weight: bold; "&gt;&lt;div&gt;&lt;a target="_blank" href="http://dl.pconline.com.cn/html_2/1/63/id=41037&amp;amp;pn=0.html" style="color: #000000; text-decoration: none; "&gt;&lt;span style="font-weight: 
normal; font-size: small; "&gt;&lt;u style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; padding-top: 0px; padding-right: 0px; padding-bottom: 0px; padding-left: 0px; word-wrap: break-word; "&gt;&lt;/u&gt;&lt;/span&gt;&lt;/a&gt;&lt;u style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; padding-top: 0px; padding-right: 0px; padding-bottom: 0px; padding-left: 0px; word-wrap: break-word; "&gt;&lt;a href="http://www.safe3.com.cn/safe3.rar" style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; padding-top: 0px; padding-right: 0px; padding-bottom: 0px; padding-left: 0px; word-wrap: break-word; color: #336699; text-decoration: none; "&gt;Safe3&amp;nbsp;Web Application Firewall&lt;/a&gt;&lt;/u&gt;&lt;/div&gt;&lt;/u&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;strong&gt;&lt;/strong&gt;&lt;/p&gt;&lt;div&gt;&lt;span style="line-height: normal; font-size: small; border-collapse: collapse; "&gt;&lt;br style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; padding-top: 0px; padding-right: 0px; padding-bottom: 0px; padding-left: 0px; word-wrap: break-word; " /&gt;&lt;strong style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; padding-top: 0px; padding-right: 0px; padding-bottom: 0px; padding-left: 0px; word-wrap: break-word; "&gt;1. Installation and Interface&lt;/strong&gt;&lt;br style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; padding-top: 0px; padding-right: 0px; padding-bottom: 0px; padding-left: 0px; word-wrap: break-word; " /&gt;Installation is as simple as ever, although the new features have increased the package size slightly over the previous version, to 4.9 MB. The process still consists of a few key steps: extracting the archive, choosing the 32-bit or 64-bit installer, and selecting the &amp;#8220;installation path&amp;#8221;. Apart from the version number, it is essentially the same as in 13.X.&lt;br style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; padding-top: 0px; padding-right: 0px; padding-bottom: 0px; padding-left: 0px; word-wrap: break-word; " 
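/&gt;Safe3&amp;nbsp;Web Application Firewall has never bundled other software in its installer, so you can safely click &amp;#8220;Next&amp;#8221; all the way through. Even users with little computer experience need not worry.&lt;/span&gt;&lt;/div&gt;&lt;p&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;img src="http://images.cnblogs.com/cnblogs_com/safe3/install.jpg" border="0" alt="" width="502" height="386" /&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;&amp;nbsp;Figure 1: Installation screenshot&lt;/p&gt;&lt;p&gt;On first launch the software shows a feature panel. The items on the left link directly to the corresponding features, which helps users quickly learn where the new features are and what they do.&lt;/p&gt;&lt;p&gt;&lt;img src="http://images.cnblogs.com/cnblogs_com/safe3/main.jpg" border="0" alt="" width="622" height="425" /&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;Figure 2: Main window of the new version&amp;nbsp;&lt;/p&gt;&lt;div&gt;&lt;p&gt;&lt;strong&gt;2. Hands-on Feature Test&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;How to improve a security product's defenses against attackers has long been a headache for many vendors. Although web application firewalls are becoming common, most vendors are still working to improve the blocking performance of their own products. In &lt;span class="Apple-style-span" style="line-height: normal; font-size: small; border-collapse: collapse; "&gt;&lt;span style="font-size: 12px;"&gt;Safe3 Web Application Firewall&lt;/span&gt; 14.1&lt;/span&gt;, I finally saw what &amp;#8220;the more, the better&amp;#8221; really means.&lt;/p&gt;&lt;/div&gt;&lt;p&gt;Beyond traditional web security protection, Safe3 Web Application Firewall adds web antivirus, website monitoring, tamper detection, and other practical features, making it a full-featured web information security product. Of course, as the saying goes, &amp;#8220;no picture, no proof&amp;#8221;, so we decided to run a hands-on test to see whether these features deliver.&amp;nbsp;&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;img src="http://images.cnblogs.com/cnblogs_com/safe3/wafsql.jpg" border="0" alt="" width="622" height="425" /&gt;&lt;/p&gt;&lt;p&gt;Figure 3: The &amp;#8220;legendary&amp;#8221; SQL injection protection feature&amp;nbsp;&lt;/p&gt;&lt;p&gt;Like many professional web application firewalls, Safe3 Web Application Firewall protects against injection through GET parameters and cookies. In addition, it protects against SQL injection in POST data, which Microsoft's own UrlScan does not offer. The software also ships with SQL injection protection rules that 安全伞科技 has accumulated over many years, whereas attackers can easily bypass the injection protection of many similar products on the market, leaving those products ineffective. Advanced web administrators who are familiar with SQL injection and regular expressions can also define their own SQL injection protection rules.&lt;/p&gt;&lt;p&gt;&lt;img src="http://images.cnblogs.com/cnblogs_com/safe3/wafmonitor.jpg" border="0" alt="" width="622" height="425" /&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;Figure 4: 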
Powerful website monitoring&amp;nbsp;&lt;/p&gt;&lt;p&gt;It is not unusual for attackers to upload web backdoors to a site. Because web developers vary in skill, it is hard to find the attacker's backdoor and the vulnerable page, and relying on developers to search for a needle in a haystack is clearly not enough.&amp;nbsp;This is where website monitoring comes in: it monitors and logs every change to files of specified types, including modification, deletion, and renaming, so that everything an attacker does to the site's pages is under watch. You can also choose to delete newly created files of specified types, such as asp files, so that an asp backdoor uploaded by an attacker is removed automatically, preventing further damage from file upload vulnerabilities.&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;img src="http://images.cnblogs.com/cnblogs_com/safe3/2.jpg" border="0" alt="" width="634" height="525" /&gt;&lt;/p&gt;&lt;p&gt;&amp;nbsp;Figure 5: Powerful website backdoor scanning&lt;/p&gt;&lt;div&gt;&lt;p&gt;Safe3 Web Application Firewall is the first product in China to offer full-featured web antivirus that can completely scan for website backdoors. Because it uses an intelligent script-parsing scan engine, its detection rate is far ahead of similar domestic products.&lt;/p&gt;&lt;p&gt;&lt;img src="http://images.cnblogs.com/cnblogs_com/safe3/wafreport.jpg" border="0" alt="" width="818" height="399" /&gt;&lt;/p&gt;&lt;p&gt;Figure 6: User-friendly log viewer&amp;nbsp;&lt;/p&gt;&lt;p&gt;&lt;img src="http://images.cnblogs.com/cnblogs_com/safe3/waflog.jpg" border="0" alt="" width="610" height="484" /&gt;&lt;/p&gt;&lt;p&gt;Figure 7: Detailed reports&amp;nbsp;&lt;/p&gt;&lt;p&gt;&amp;nbsp;Safe3 Web Application Firewall provides enterprise-grade firewall log queries and can export detailed log reports. This makes it easier for site administrators to analyze attacks, and the generated reports are convenient for management to review.&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;&lt;span class="Apple-style-span" style="font-family: 宋体, tahoma; "&gt;&lt;strong style="padding-top: 0px; padding-right: 0px; padding-bottom: 0px; padding-left: 0px; margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; font-size: 14px; font-style: normal; color: #000000; word-wrap: break-word; word-break: break-all; "&gt;Summary&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span class="Apple-style-span" style="font-family: 宋体, tahoma; "&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;As a complete enterprise web information security product, Safe3 Web Application Firewall integrates attack defense, &lt;span class="Apple-style-span" style="font-family: 宋体; "&gt;&lt;span 
class="Apple-style-span" style="font-family: 宋体, tahoma; "&gt;backdoor detection and removal, website monitoring, tamper detection, report generation, and other security and enterprise reporting features, and has grown into a well-rounded website security product. The newly released Safe3 Web Application Firewall 14.1 &lt;span class="Apple-style-span" style="font-family: 宋体; "&gt;&lt;span class="Apple-style-span" style="font-family: 宋体, tahoma; "&gt;is a milestone in the history of 安全伞网络科技. It is fair to say that &lt;span class="Apple-style-span" style="font-family: 宋体, &amp;nbsp;tahoma; line-height: 21px; border-collapse: collapse; "&gt;Safe3 Web Application Firewall is a security product built around the idea of a healthy network&lt;/span&gt;. Its mission is not only to protect website security but also to safeguard the health of the network, and we believe Safe3 Web Application Firewall will become essential software for every webmaster!&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;/div&gt;&lt;p&gt;&lt;/p&gt;&lt;div&gt;&lt;/div&gt;&lt;/span&gt;&lt;/div&gt;&lt;img src="http://www.cnblogs.com/Safe3/aggbug/2087936.html?type=1" width="1" height="1" alt=""/&gt;&lt;p&gt;&lt;a href="http://www.cnblogs.com/Safe3/archive/2011/06/23/2087936.html" target="_blank"&gt;Link to this article&lt;/a&gt;&lt;/p&gt;</content></entry><entry><id>http://www.cnblogs.com/Safe3/archive/2011/06/03/2072213.html</id><title type="text">SQL Injection</title><summary type="text">SQL injection is an attack in which malicious code is inserted into strings that are later passed to an instance of SQL Server for parsing and execution. Any procedure that constructs SQL statements should be reviewed for injection vulnerabilities because SQL Server will execute all syntactically v.</summary><published>2011-06-03T13:12:00Z</published><updated>2011-06-03T13:12:00Z</updated><author><name>Safe3</name><uri>http://www.cnblogs.com/Safe3/</uri></author><link rel="alternate" href="http://www.cnblogs.com/Safe3/archive/2011/06/03/2072213.html"/><link rel="alternate" type="text/html" href="http://www.cnblogs.com/Safe3/archive/2011/06/03/2072213.html"/><content type="html">This article is password protected.</content></entry></feed>
