Tutorial
PM> Install-Package SimpleBrowser
The simplest use of SimpleBrowser is to download the contents of a known URL. In this case we take the English Wikipedia homepage.
var b = new Browser();
b.Navigate("http://en.wikipedia.org");
Console.WriteLine(b.Url);
// http://en.wikipedia.org/wiki/Main_Page
Console.WriteLine(b.CurrentHtml);
// <!DOCTYPE html>
// <html lang="en" dir="ltr" class="client-nojs">
// <head>
// etc..
Note that the URL is not exactly as we had requested it, because Wikipedia redirected us (status 301) to another URL. As SimpleBrowser acts as a visible (non-headless) browser would, it follows the redirect.
If we want to interact with the page, we typically want to select a specific part of it, for example by using the ID of an element. The Find() method offers a number of different ways to search for elements in the page. The homepage of Wikipedia always contains a featured article for today, so let's select that information:
var todaysFeaturedArticle = b.Find("div", FindBy.Id, "mp-tfa");
Console.WriteLine(todaysFeaturedArticle.Value);
// Full text from the element and its children. No markup.
In this case (but not always) the result represents a specific element from the page. You can get the textual value out as in the sample, but you can also interact with the element using the Click() method or the Checked property. The Value can also be set, which is especially appropriate when the result is an input box or text area. You can also access more detailed information using the XElement property. This exposes the XML structure of the element and allows you to navigate the details of the structure of the page.
When multiple elements match your specification, you can still use Find(). The return type, HtmlResult, can also serve as a collection of elements. When you use the properties and methods described above, they are applied to the first element found. But HtmlResult also exposes properties like TotalElementsFound and implements IEnumerable. Let's loop through all links in the page:
var links = b.Find("a", new object { });
foreach (var link in links)
{
Console.WriteLine("Found link with text '{0}' and title '{1}' to {2}", link.Value, link.GetAttribute("title"), link.GetAttribute("href"));
}
//Found link with text 'Sofia' and title 'Sofia' to /wiki/Sofia
//Found link with text 'Ottoman' and title 'Ottoman Empire' to /wiki/Ottoman_Empire
//Found link with text '1942' and title '1942' to /wiki/1942
//Found link with text 'World War II' and title 'World War II' to /wiki/World_War_II
//Found link with text 'Imperial Japanese Army' and title 'Imperial Japanese Army' to /wiki/Imperial_Japanese_Army
//Found link with text 'systematic extermination' and title 'Sook Ching' to /wiki/Sook_Ching
//Found link with text 'Chinese Singaporeans' and title 'Chinese Singaporean' to /wiki/Chinese_Singaporean
//...
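The href values printed above are relative to the site. If absolute URLs are needed, they can be resolved against the address of the current page. The following is a minimal sketch using only System.Uri from the .NET base class library; the page URL and href values are copied from the output above, and the class name ResolveLinks is just an illustrative choice.

```csharp
using System;

class ResolveLinks
{
    static void Main()
    {
        // The page URL after the redirect, as printed by b.Url earlier.
        var pageUrl = new Uri("http://en.wikipedia.org/wiki/Main_Page");

        // Relative href values like those in the output above.
        string[] hrefs = { "/wiki/Sofia", "/wiki/Ottoman_Empire", "/wiki/1942" };

        foreach (var href in hrefs)
        {
            // The Uri(Uri, string) constructor resolves a relative reference against a base URI.
            var absolute = new Uri(pageUrl, href);
            Console.WriteLine(absolute.AbsoluteUri);
        }
        // http://en.wikipedia.org/wiki/Sofia
        // http://en.wikipedia.org/wiki/Ottoman_Empire
        // http://en.wikipedia.org/wiki/1942
    }
}
```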
### Using Select
The Find() method offers a number of different ways to filter your elements (FindBy.Name, FindBy.Text, FindBy.PartialText, etc.). These methods were designed before jQuery made CSS selectors the de facto query language inside HTML documents. To let you use selectors in SimpleBrowser as well, the Select() method was added. It takes a string as its single argument, but you should be able to express most of the queries you'll need with that. This is how we first loop over all links in the "Today's Featured Article" block and then click on the main article's link (which on Wikipedia is the first bold link).
var b = new Browser();
b.Navigate("http://en.wikipedia.org");
var links = b.Select("#mp-tfa a[href]"); // all links with a href inside #mp-tfa
foreach (var link in links)
{
Console.WriteLine("Found link with text '{0}' and title '{1}' to {2}", link.Value, link.GetAttribute("title"), link.GetAttribute("href"));
}
var mainlink = b.Select("#mp-tfa b>a[href]"); // all <a href> links directly inside a <b> inside #mp-tfa
mainlink.Click();
Console.WriteLine("Url: {0}", b.Url);
// Found link with text 'SMS Bayern' and title 'SMS Bayern' to /wiki/SMS_Bayern
// Found link with text 'class' and title 'Ship class' to /wiki/Ship_class
// Found link with text 'battleships' and title 'Battleship' to /wiki/Battleship
// Found link with text 'German Imperial Navy' and title 'Kaiserliche Marine' to /wiki/Kaiserliche_Marine
// ...
// Url: http://en.wikipedia.org/wiki/SMS_Bayern
The Select() method can also be used in the scope of a single element. This allows you to search within a part of the page.
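As a rough sketch of what scoped selection might look like, the example below first finds the "Today's Featured Article" block and then runs a selector inside it. It assumes that the result returned by Find() exposes a Select() method with the same string argument as the one on the browser, as the paragraph above suggests; check the signature against the SimpleBrowser version you have installed. The class name ScopedSelect is illustrative only.

```csharp
using System;
using SimpleBrowser;

class ScopedSelect
{
    static void Main()
    {
        var b = new Browser();
        b.Navigate("http://en.wikipedia.org");

        // Find the container element first, as in the earlier Find() example.
        var featured = b.Find("div", FindBy.Id, "mp-tfa");

        // Assumption: Select() called on a result searches only within that element,
        // so this selector no longer needs the "#mp-tfa" prefix.
        var links = featured.Select("a[href]");
        foreach (var link in links)
        {
            Console.WriteLine("Found link '{0}' to {1}", link.Value, link.GetAttribute("href"));
        }
    }
}
```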
Now that you have learned how to Find() elements, let's look at using those elements to submit forms. The process is to first find the form element, change the form element's value, then, once all form elements in the form have values to submit, submit the form. The following example searches Wikipedia from the form on the Wikipedia home page:
var b = new Browser();
b.Navigate("http://en.wikipedia.org");
// Find the form element to change.
var searchInput = b.Find("searchInput");
// Optionally, you could do some error checking to see if you found what you were looking for
if(searchInput == null || searchInput.Exists == false)
{
throw new Exception("Element not found");
}
// Assign the value to the form element.
searchInput.Value = "Mersenne twister";
// Submit the form
searchInput.SubmitForm();
Console.WriteLine(b.CurrentHtml);
// <!DOCTYPE html>
// <html lang="en" dir="ltr" class="client-nojs">
// <head>
// <meta charset="UTF-8" />
// <title>Mersenne twister - Wikipedia, the free encyclopedia</title>
// etc..
The Wikipedia search form is a very simple and well-behaved example. There is only one form element, and it is easy to find. The above sample code is the equivalent of typing in the search box and pressing Enter to submit the form. While this is completely acceptable to Wikipedia, some websites insist that the search button be clicked. If this had been the case on Wikipedia, the code would look like this:
var b = new Browser();
b.Navigate("http://en.wikipedia.org");
// Find the form element to change.
var searchInput = b.Find("searchInput");
// Assign the value to the form element.
searchInput.Value = "Mersenne twister";
// Find the search button
var searchButton = b.Find("searchButton");
// Click the search button
searchButton.Click();
Console.WriteLine(b.CurrentHtml);
// <!DOCTYPE html>
// <html lang="en" dir="ltr" class="client-nojs">
// <head>
// <meta charset="UTF-8" />
// <title>Mersenne twister - Wikipedia, the free encyclopedia</title>
// etc..
Let's look at a more complex form, with a combination of text boxes, radio buttons, check boxes, and selects. For the purposes of this tutorial, and for working with forms in general, it helps to load the page in a browser and view its source code. This is often the best and fastest way to see what the form actually looks like and how you will need to approach interacting with it.
SimpleBrowser.Browser b = new SimpleBrowser.Browser();
b.Navigate("http://www.tizag.com/phpT/examples/formexample.php");
// Find a text input
var firstName = b.Find(ElementType.TextField, FindBy.Name, "Fname");
firstName.Value = "Michelangelina";
// Note: The HTML form had a maxlength attribute limiting the text to 12 characters. Therefore the value of the text input varies from what was assigned.
Console.WriteLine(firstName.Value);
// Michelangeli
// Find a radio button
var gender = b.Find(ElementType.RadioButton, FindBy.Value, "Female");
gender.Checked = true;
// This will also work to set the selected state of a radio button.
gender.Click();
// Find a check box
var food = b.Find("input", new { name = "food[]", value = "Pizza" });
// This will work to toggle the state of a check box ...
food.Click();
// ... but this will set it to a known value.
food.Checked = true;
// Find a textarea (note that a text input and textarea are both of type ElementType.TextField)
var quote = b.Find(ElementType.TextField, FindBy.Name, "quote");
quote.Value = "I love it when a plan comes together.";
// Find a select (drop-down box)
var education = b.Find(ElementType.SelectBox, FindBy.Name, "education");
education.Value = "College";
// Find a select (drop-down box)
var time = b.Find(ElementType.SelectBox, FindBy.Name, "TofD");
time.Value = "Day";
Every web developer codes their forms differently. When interacting with forms, you will need to familiarize yourself with the intricacies of the form and how the programmer created it. For example, many programmers use JavaScript to create drop-down lists.
For example, a year drop-down might render in a browser like this:
<select id="selectElementId">
<option value="2015">2015</option>
<option value="2016">2016</option>
<option value="2017">2017</option>
<option value="2018">2018</option>
<option value="2019">2019</option>
</select>
When you view source on the page, however, it might just look like this:
<select id="selectElementId"></select>
Somewhere a chunk of JavaScript has created the option elements in the select. Since SimpleBrowser doesn't support JavaScript, the SimpleBrowser user is required to do the work of the JavaScript manually. For example:
// Create a new option element
System.Xml.Linq.XElement newOption = new System.Xml.Linq.XElement("option");
newOption.SetAttributeCI("value", "2018");
// Find a select (drop-down box)
var year = b.Find("selectElementId");
// Add the new option to the select
year.XElement.Add(newOption);
// Select the option just added
year.Value = "2018";
Note that you don't have to add all of the elements. You only have to add the element with the value you are selecting.
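If you do want to reproduce the full list that the JavaScript would have generated, the options can be added in a loop. The sketch below uses only System.Xml.Linq from the .NET base class library and works on a stand-alone XElement parsed from the empty select shown above; in real code that element would come from the XElement property of the result. The class name BuildOptions and the year range are illustrative.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

class BuildOptions
{
    static void Main()
    {
        // Stand-in for year.XElement: the empty select as it appears in the page source.
        var select = XElement.Parse("<select id=\"selectElementId\"></select>");

        // Add one option per year from 2015 to 2019, mirroring the rendered markup above.
        foreach (var y in Enumerable.Range(2015, 5))
        {
            select.Add(new XElement("option", new XAttribute("value", y), y));
        }

        // Print the number of options and the resulting markup.
        Console.WriteLine(select.Elements("option").Count()); // 5
        Console.WriteLine(select);
    }
}
```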
You may be using SimpleBrowser to access public Internet sites. Not all sites are created equal. There are many websites that look beautiful but are rendered by the browser from completely invalid HTML. All browsers do their best to handle malformed HTML. Occasionally, there are sites that are so poorly written that they will crash the browser. SimpleBrowser is no different from Chrome or Firefox in all these ways.
In cases where malformed HTML is causing a crash in SimpleBrowser, it is often necessary to modify the source HTML before the SimpleBrowser parser processes it. The SimpleBrowser parser will not attempt to parse until the first call to Find() is made. Until that time, you have every opportunity to change the HTML to prevent a SimpleBrowser crash. For example, if there was a "badtoken" that caused the parser to crash, and all you needed to do to make it work was replace it with "goodtoken", just do that before you call Find():
var b = new Browser();
b.Navigate("http://www.example.com");
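// string.Replace returns a new string and leaves CurrentHtml untouched,
// so the corrected HTML has to be loaded back into the browser (here via SetContent).
b.SetContent(b.CurrentHtml.Replace("badtoken", "goodtoken"));
b.Find("tokenId");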
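One detail to watch when patching HTML this way: .NET strings are immutable, so Replace() returns a new string rather than changing the original. A call whose result is discarded has no effect. The following stand-alone sketch illustrates this with a plain string standing in for the page HTML; how the corrected markup is handed back to SimpleBrowser is outside this sketch, and the class name ReplaceDemo is illustrative.

```csharp
using System;

class ReplaceDemo
{
    static void Main()
    {
        // A plain string standing in for the downloaded page HTML.
        string html = "<p id=\"tokenId\">badtoken</p>";

        // The result of Replace is discarded here, so html is unchanged.
        html.Replace("badtoken", "goodtoken");
        Console.WriteLine(html.Contains("badtoken")); // True

        // Capturing the return value gives the corrected markup.
        string cleaned = html.Replace("badtoken", "goodtoken");
        Console.WriteLine(cleaned); // <p id="tokenId">goodtoken</p>
    }
}
```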
TBD