62 commits
6271b3d
add native dir
deanmao May 30, 2012
2e8e575
package it up
deanmao May 31, 2012
6bd5f88
x
deanmao Jun 25, 2012
ccce15e
get working for v0.8
deanmao Jul 4, 2012
431a3b5
package for public
deanmao Jul 4, 2012
75d52a1
ignore gyp file
deanmao Jul 4, 2012
7746775
add license
deanmao Jul 4, 2012
0001d25
add v0.6 stuff for linux
deanmao Jul 4, 2012
d28d0b2
support 0.6
deanmao Jul 4, 2012
9b46ec0
update binary
deanmao Jul 4, 2012
b1910a8
bump
deanmao Jul 4, 2012
06eb09b
remove v0.6 for macs
deanmao Jul 4, 2012
a87e921
bsd
deanmao Jan 3, 2013
00b5d58
add qt and add some basic documentation
deanmao Jan 3, 2013
d2a373c
add example in readme
deanmao Jan 3, 2013
b629960
tweaks
deanmao Jan 3, 2013
8de41e5
add bad example
deanmao Jan 3, 2013
55dac5f
remove old examples and update to use new qt
deanmao Jan 5, 2013
9983e3d
remove static qt libraries
deanmao Jan 5, 2013
f8c598c
move deprecated stuff away
deanmao Jan 5, 2013
f3a96b5
more cleanup
deanmao Jan 5, 2013
1edac38
update moc and add updated native libs
deanmao Jan 5, 2013
15956e8
fix qt
deanmao Jan 5, 2013
353c26e
add compiled dir
deanmao Jan 5, 2013
068b23d
make npm package smaller
deanmao Jan 5, 2013
de6208c
make compiled dir
deanmao Jan 5, 2013
20fd788
add openssl
deanmao Jan 6, 2013
deef3d4
specific headers
deanmao Jan 6, 2013
6c548cb
remove old ssl
deanmao Jan 6, 2013
9f6982c
custom configure for ssl
deanmao Jan 6, 2013
002f85c
add instructions
deanmao Jan 6, 2013
e489742
final step
deanmao Jan 6, 2013
63afe53
instructions
deanmao Jan 6, 2013
6cbae30
readme
deanmao Jan 6, 2013
7f619b1
remove bad qt version
deanmao Jan 6, 2013
05e77df
add good qt version
deanmao Jan 6, 2013
bdf1c3a
make a few compile scripts
deanmao Jan 6, 2013
24918f5
update instructions
deanmao Jan 6, 2013
3552d0d
fixes for macosx
deanmao Jan 6, 2013
153f976
bug
deanmao Jan 6, 2013
bdcce2b
get compilation smooth on osx
deanmao Jan 6, 2013
6106c0c
update instructions for both platforms
deanmao Jan 6, 2013
e2e65b4
update chimera on darwin, and add some examples
deanmao Jan 6, 2013
0a2d8bb
clean up scripts
deanmao Jan 6, 2013
c3909a3
remove old crap
deanmao Jan 6, 2013
7ca8690
downloader
deanmao Jan 6, 2013
0aec278
make it download gzipped binaries
deanmao Jan 6, 2013
f25b6ed
add s3 binary uploader
deanmao Jan 6, 2013
c7f0147
bump version
deanmao Jan 6, 2013
90a6cff
ignore
deanmao Jan 6, 2013
5a2e1f4
add ia32 native binary fix, fix qt compile
deanmao Jan 7, 2013
485c34a
introduce weird hack
deanmao Jan 7, 2013
cc3978a
remove hack
deanmao Jan 7, 2013
44532cd
bump
deanmao Jan 7, 2013
0689530
add configurable network proxy
deanmao Jan 7, 2013
f8c7fec
add navigation function
deanmao Jan 7, 2013
6727fec
tag binary with package version
deanmao Jan 8, 2013
99bcf55
remove bad qt
deanmao Jan 13, 2013
76a89c0
casting uv_queue_work last arg to make it compatible with node 0.10
njoubert Mar 20, 2013
9ec663b
Mentioning that node-gyp is needed to build this
njoubert Mar 20, 2013
6d7308f
Build using the Qt from 'brew install qt'
njoubert Mar 20, 2013
2bbef47
Commented the main code path
njoubert Mar 21, 2013
6 changes: 5 additions & 1 deletion .gitignore
@@ -5,12 +5,16 @@ lib-cov
*.dat
*.out
*.pid
*.gz

pids
logs
results

build
moc
.DS_Store
node_modules
npm-debug.log

qt_compiled
lib/chimera.node
8 changes: 8 additions & 0 deletions .npmignore
@@ -0,0 +1,8 @@
deps
build
binding.gyp
qt
qt_compiled
openssl
native
lib/chimera.node
18 changes: 18 additions & 0 deletions LICENSE.txt
@@ -0,0 +1,18 @@
Copyright 2010 - 2013, Dean Mao <deanmao@gmail.com>. All rights reserved.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
deal in the Software without restriction, including without limitation the
rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
sell copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
IN THE SOFTWARE.
201 changes: 200 additions & 1 deletion README.md
@@ -1,3 +1,202 @@
# Node-Chimera
# Chimera: A new kind of phantom for NodeJS

I was inspired by [PhantomJS](http://phantomjs.org) and wanted something similar that could be run inside of the nodejs
environment, without calling out to an external process. PhantomJS runs as an external process that users can drive
from any language, but one must create a fancy glue wrapper so that development isn't impaired. I created
something that does exactly what phantomjs is capable of doing, except in a full js environment, called Chimera.

## Installation

Installing is simple via npm:

    npm install chimera

It will download the native chimera binary in the postinstall script. Currently we have binaries for 64bit darwin (mac),
and 64bit linux. If you use something different, you may have to compile your own or wait for me to build one for your
platform.
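
As a rough illustration of the platform note above, the sketch below checks whether the current machine matches one of the two prebuilt targets mentioned (64bit darwin and 64bit linux). The `hasPrebuiltBinary` helper and the `PREBUILT` table are hypothetical and not part of chimera; the real postinstall script may decide differently.

```javascript
// Hypothetical helper, not part of chimera: reports whether a prebuilt
// binary is listed in the README for a platform/architecture pair.
var PREBUILT = { darwin: ['x64'], linux: ['x64'] };

function hasPrebuiltBinary(platform, arch) {
  // Only look at the table's own keys, never inherited properties.
  if (!Object.prototype.hasOwnProperty.call(PREBUILT, platform)) {
    return false;
  }
  return PREBUILT[platform].indexOf(arch) !== -1;
}

if (!hasPrebuiltBinary(process.platform, process.arch)) {
  console.log('No prebuilt chimera binary listed for ' + process.platform +
    '/' + process.arch + '; you may need to compile your own.');
}
```

Running a check like this before `npm install chimera` gives an early hint that the compile steps described later may be needed.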

## Usage

The most basic skeleton should look something like this:

    var Chimera = require('chimera').Chimera;

    var c = new Chimera();
    c.perform({
      url: "http://www.google.com",
      locals: {

      },
      run: function(callback) {
        callback(null, "success");
      },
      callback: function(err, result) {

      }
    });

When you instantiate a new chimera with `new Chimera()`, you're actually creating a new browser instance which does
not share session data with other browser sessions. It has its own in-memory cookie database and url history.

The `locals` hash should contain variables you wish to pass to the web page. These values should be types that can be
turned into json because the sandboxing environment of the browser's js engine prevents us from passing actual nodejs
variable references.
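
To make the JSON restriction on `locals` concrete, here is a minimal sketch of a check you could run before calling `perform()`. The `isJsonSafe` helper is hypothetical and not part of chimera; it only accepts plain data (strings, finite numbers, booleans, null, arrays, and plain objects).

```javascript
// Hypothetical helper, not part of chimera: returns true when a value
// is plain data that JSON can represent without losing anything.
function isPlainData(value) {
  if (value === null) return true;
  var type = typeof value;
  if (type === 'string' || type === 'boolean') return true;
  if (type === 'number') return isFinite(value);
  if (type !== 'object') return false; // functions, undefined, symbols
  if (Array.isArray(value)) {
    return value.every(function(item) { return isPlainData(item); });
  }
  // Reject class instances such as Date, Buffer or streams.
  var proto = Object.getPrototypeOf(value);
  if (proto !== Object.prototype && proto !== null) return false;
  return Object.keys(value).every(function(key) {
    return isPlainData(value[key]);
  });
}

function isJsonSafe(value) {
  try {
    JSON.stringify(value); // throws on cyclic structures
  } catch (e) {
    return false;
  }
  return isPlainData(value);
}

console.log(isJsonSafe({ username: 'my_username', retries: 3 }));
console.log(isJsonSafe({ fs: require('fs') }));
```

The first call accepts a plain object, while the second rejects a nodejs module because its properties are functions.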

The `run` function is run immediately as the page is loaded. You may wish to wait until the entire page is loaded
before you perform your logic, so you'd have to do the same stuff that you'd do in normal javascript embedded in
webpages. For example, if you were using jquery, you'd be doing the standard `$(document).ready(function(){stuff})`
type of code to wait for the page to fully load. Keep in mind that the run function is run inside the webpage
so you won't have access to any scoped variables in nodejs. The `callback` parameter should be called when you're
ready to pause the browser instance and pass control back to the nodejs world.

The `callback` function is run in the nodejs context so you'll have access to scoped variables as usual. This
function is called when you call the callback function from inside of `run()`.

## Chimera options

    var c = new Chimera({
      userAgent: 'Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.6 (KHTML, like Gecko) Chrome/20.0.1090.0 Safari/536.6',
      libraryCode: '(function() { window.my_special_variable = 1234; })()',
      cookies: '',
      disableImages: true
    });

Here are all the possible options available when creating a new browser instance:

- `userAgent`: Any string that represents a user agent. By default it uses the one shown in the example, a windows chrome browser.
- `libraryCode`: If you want to inject jquery into all your webpages, you should do something like `fs.readFileSync("jquery.js")` here.
- `cookies`: As seen in later examples, you can save the cookies from a previous browser instance and use them here.
- `disableImages`: If you don't need images in your scraper, this can drastically reduce memory and speed up webpages. However, your screenshots may look like crap.
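
The `libraryCode` option can be filled from a file, as in this minimal sketch. The `buildOptions` helper is hypothetical, and passing `'utf8'` so that `fs.readFileSync` returns a string rather than a Buffer is an assumption based on the string used in the example above.

```javascript
var fs = require('fs');

// Hypothetical helper, not part of chimera: builds an options object
// for `new Chimera(...)` with a library file injected into every page.
function buildOptions(libraryPath) {
  return {
    // 'utf8' makes readFileSync return a string instead of a Buffer.
    // ASSUMPTION: chimera expects libraryCode to be a string.
    libraryCode: fs.readFileSync(libraryPath, 'utf8'),
    disableImages: true
  };
}
```

With a local copy of jquery saved as `jquery.js`, this would be used as `new Chimera(buildOptions('jquery.js'))`.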

## A simple login example

In the example code below, we show how to login to a website using a native mouse button click on the submit button, then load a second
browser instance using the logged in cookies from the first browser instance.

    var Chimera = require('chimera').Chimera;

    var myUsername = "my_username";
    var myPassword = "my_password";

    var c = new Chimera();
    c.perform({
      url: "http://www.mywebsite.com",
      locals: {
        username: myUsername,
        password: myPassword
      },
      run: function(callback) {
        // find the form fields and press submit
        var pos = jQuery('#login-button').offset();
        window.chimera.sendEvent("click", pos.left + 10, pos.top + 10);
      },
      callback: function(err, result) {
        // capture a screen shot
        c.capture("screenshot.png");

        // save the cookies and close out the browser session
        var cookies = c.cookies();
        c.close();

        // Create a new browser session with cookies from the previous session
        var c2 = new Chimera({
          cookies: cookies
        });
        c2.perform({
          url: "http://www.mywebsite.com",
          run: function(callback) {
            // You're logged in here!
          },
          callback: function(err, result) {
            // capture a screen shot that shows we're logged in
            c2.capture("screenshot_logged_in.png");
            c2.close();
          }
        });
      }
    });

### A few notes

In the example above, you may notice `window.chimera.sendEvent()`. The `chimera` variable is a global inside webpages and
allows you to call functions that you otherwise wouldn't be able to. You can take a screenshot with `chimera.capture()` for
example.

When we are in the callback() for the first browser instance, we nab the cookies via `c.cookies()`. If you inspect the
cookies, you'll see that it's just a giant string containing the domain, keys, and values. This may contain http & https
cookies as well, which are normally not accessible via javascript from inside the webpage. You'll also probably notice
there are cookies from tracking companies like google analytics or mixpanel. The cookies string will basically contain
everything that a browser may have. If you want to remove the google analytics cookies, you'll have to parse the cookie
string and remove them manually yourself. There are many cookie parsers out there -- check out the one that is included in
the expressjs middleware if you need something quick and dirty.
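
As a rough sketch of the manual cleanup described above, the snippet below drops tracking cookies from a cookie string. It ASSUMES the string holds one cookie per line, which is not confirmed here; inspect the actual output of `c.cookies()` and adapt the splitting before relying on it. The `dropCookies` helper and the sample data are hypothetical.

```javascript
// Sketch only. ASSUMPTION: one cookie per line; the real format
// returned by c.cookies() may differ.
function dropCookies(cookieString, markers) {
  return cookieString.split('\n').filter(function(line) {
    // Keep the line only if it mentions none of the markers.
    return !markers.some(function(marker) {
      return line.indexOf(marker) !== -1;
    });
  }).join('\n');
}

// "__utm" is the prefix of the classic Google Analytics cookies
// (__utma, __utmb, __utmz, ...). The sample lines are made up.
var sample = [
  'session=abc123; domain=.mywebsite.com; path=/',
  '__utma=1.2.3.4; domain=.mywebsite.com; path=/'
].join('\n');

console.log(dropCookies(sample, ['__utm']));
```

The filtered string could then be passed as the `cookies` option of a new browser instance.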

## A bad example

Here are a few things that you should not do.

    var c = new Chimera();
    var fs = require('fs');
    c.perform({
      url: "http://www.mywebsite.com",
      locals: {
        fs: fs
      },
      run: function(callback) {
        var os = require('os');
      },
      callback: function(err, result) {

      }
    });

In the above example, we try to pass the `fs` variable as a local variable. We can't do this because `fs` cannot be
turned into a json string. Just because it looks like it might work, it won't. The sandbox in the web browser
prevents scoped variables from being available.

A second thing wrong is that the `run()` function doesn't perform the callback function with `callback()`. If you do
this, the context will never be passed back to the nodejs world so you'll be wondering why you can't scrape anything.

The third thing wrong here is that inside the `run()` function, we're trying to call `require('os')`. The require
function pulls from the nodejs scoped context which isn't available inside the webpage. You only have access to typical
variables in a webpage like `window.document` etc.
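
For contrast, here is a sketch of a `perform()` options object that avoids all three mistakes: the locals are plain data collected on the nodejs side, `run()` touches only page globals, and `run()` always calls `callback()`. The `platform` local and the use of `document.title` are illustrative choices, not something chimera requires.

```javascript
var os = require('os');

// Options for c.perform(...), built without the three mistakes above.
var options = {
  url: "http://www.mywebsite.com",
  locals: {
    // gather plain values in nodejs instead of passing a module
    platform: os.platform()
  },
  run: function(callback) {
    // runs inside the webpage: no require(), only page globals
    callback(null, document.title);
  },
  callback: function(err, result) {
    // back in the nodejs world, scoped variables are available again
    console.log(err, result);
  }
};
```

This object would be passed as `c.perform(options)` on a `Chimera` instance.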

## Compiling your own version

Since this library does use native libraries, I may not have a native version for your platform (people have been asking
me about arm-linux and sunos). Hopefully I can describe how one can compile this under your platform, and perhaps we can
move to something easier.

### Getting node-gyp

This project uses the Generate Your Projects build system, which you can install using node-gyp:

    npm install -g node-gyp


### Compiling on the mac:

Getting a binary on the mac is fairly easy, but it does take a long time to compile Qt. Unlike Linux, you don't need
the custom openssl included with chimera. Here are the basic steps to take on the mac:

    ./scripts/compile_qt.sh
    ./scripts/compile_binary.sh

The final binary should be inside of node-chimera/lib.


### Compiling on linux:

You'll need the ssl headers, freetype, and fontconfig libraries first, so you'll have to install with a command like:

    apt-get install libfreetype6-dev libfontconfig1-dev libssl-dev

Since nodejs comes with its own version of ssl, we have to make Qt also use this version of ssl or else we'll have segfaults.
Compile the included openssl first (we have some additional flags like `-fPIC` which allow the libraries to be statically included
later on). Here are all the steps required to build chimera:

    ./scripts/compile_openssl.sh
    ./scripts/compile_qt.sh
    ./scripts/compile_binary.sh

The final chimera.node binary should exist inside the node-chimera/lib directory. If you don't see it in there, something bad
probably happened along the way.
51 changes: 34 additions & 17 deletions binding.gyp
@@ -2,33 +2,50 @@
'targets': [
{
'target_name': 'chimera',
'type': '<(library)',
'sources': [
'src/top.cc',
'src/cookiejar.cc',
'src/chimera.cc',
'src/top.cc',
'src/cookiejar.cc',
'src/chimera.cc',
'src/browser.cc'
],
'conditions': [
['OS=="mac"', {
'include_dirs': [
'deps/qt-4.8.0/darwin/x64/include',
'deps/qt-4.8.0/darwin/x64/include/QtCore',
'deps/qt-4.8.0/darwin/x64/include/QtGui',
'deps/qt-4.8.0/darwin/x64/include/QtNetwork',
'deps/qt-4.8.0/darwin/x64/include/QtWebkit'
'/usr/local/include',
'/usr/local/include/QtCore',
'/usr/local/include/QtGui',
'/usr/local/include/QtNetwork',
'/usr/local/include/QtWebkit'
],
'libraries': [
'-F/usr/local/lib',
'-F//System/Library/Frameworks',
'-framework AppKit',
'../deps/qt-4.8.0/darwin/x64/lib/libQtGui.a',
'../deps/qt-4.8.0/darwin/x64/lib/libQtCore.a',
'../deps/qt-4.8.0/darwin/x64/lib/libQtNetwork.a',
'../deps/qt-4.8.0/darwin/x64/lib/libQtWebKit.a',
'../deps/qt-4.8.0/darwin/x64/lib/libjscore.a',
'../deps/qt-4.8.0/darwin/x64/lib/libwebcore.a',
'../deps/qt-4.8.0/darwin/x64/lib/libQtXml.a'
'-framework QtCore',
'-framework QtGui',
'-framework QtNetwork',
'-framework QtWebkit'

],
}]
}],
['OS=="linux"', {
'include_dirs': [
'qt_compiled/include',
'qt_compiled/include/QtCore',
'qt_compiled/include/QtGui',
'qt_compiled/include/QtNetwork',
'qt_compiled/include/QtWebKit'
],
'libraries': [
'../qt_compiled/lib/libQtCore.a',
'../qt_compiled/lib/libQtGui.a',
'../qt_compiled/lib/libQtXml.a',
'../qt_compiled/lib/libQtNetwork.a',
'../qt_compiled/lib/libQtWebKit.a',
'../qt_compiled/lib/libwebcore.a',
'../qt_compiled/lib/libjscore.a'
],
}]
]
}
]