diff --git a/.gitignore b/.gitignore index 70c22ad0..6bfbc72a 100644 --- a/.gitignore +++ b/.gitignore @@ -2,3 +2,9 @@ projectFilesBackup extension.zip +/.vs/web-scraper-chrome-extension/v15/.suo +/.vs/web-scraper-chrome-extension/v15 +/.vs/VSWorkspaceState.json +/.vs/slnx.sqlite +/.vs/ProjectSettings.json +/.vs/config/applicationhost.config diff --git a/README.md b/README.md index 5a64886a..8d85f6c5 100644 --- a/README.md +++ b/README.md @@ -5,35 +5,28 @@ should be traversed and what should be extracted. Using these sitemaps the Web Scraper will navigate the site accordingly and extract all data. Scraped data later can be exported as CSV. -Install the extension from [Chrome store] [chrome-store] - -### Features - - 1. Scrape multiple pages - 2. Sitemaps and scraped data are stored in browsers local storage or in CouchDB - 3. Multiple data selection types - 4. Extract data from dynamic pages (JavaScript+AJAX) - 5. Browse scraped data - 6. Export scraped data as CSV - 7. Import, Export sitemaps - 8. Depends only on Chrome browser - -### Help - - Documentation and tutorials are available on [webscraper.io] [webscraper.io] - - Ask for help, submit bugs, suggest features on [google groups] [google-groups] +#### Latest Version +To run the latest version you need to [download the project][latest-releases] to your system and [follow the description on Google][get-started-chrome] (select the `extension` folder). - Submit bugs and suggest features on [bug tracker] [github-issues] - -#### Bugs -When submitting a bug please attach an exported sitemap if possible. 
- -## License -LGPLv3 - ## Changelog +### v0.3 + * Enabled pasting of multiple start URLs (by [@jwillmer](https://github.com/jwillmer)) + * Added scraping of dynamic table columns (by [@jwillmer](https://github.com/jwillmer)) + * Added style extraction type (by [@jwillmer](https://github.com/jwillmer)) + * Added text manipulation (trim, replace, prefix, suffix, remove HTML) (by [@jwillmer](https://github.com/jwillmer)) + * Added image improvements to find images in div background (by [@jwillmer](https://github.com/jwillmer)) + * Added support for vertical tables (by [@jwillmer](https://github.com/jwillmer)) + * Added random delay function between requests (by [@Euphorbium](https://github.com/Euphorbium)) + * Start URL can now also be a local URL (by [@3flex](https://github.com/3flex)) + * Added CSV export options (by [@mohamnag](https://github.com/mohamnag)) + * Added Regex group for select (by [@RuneHL](https://github.com/RuneHL)) + * JSON export/import of settings (by [@haisi](https://github.com/haisi)) + * Added date and number pattern in URL (by [@codoff](https://github.com/codoff)) + * Added pagination selector limit (by [@codoff](https://github.com/codoff)) + * Improved CSV export (by [@haisi](https://github.com/haisi)) + * Added click limit option (by [@panna-ahmed](https://github.com/panna-ahmed)) + ### v0.2 * Added Element click selector * Added Element scroll down selector @@ -55,7 +48,19 @@ LGPLv3 * Added ranged start urls * Fixed bug which made selector tree not to show on some operating systems +#### Bugs +When submitting a bug please attach an exported sitemap if possible. + +#### Development +Read the [Development Instructions](/docs/Development.md) before you start. 
+ +## License +LGPLv3 + [chrome-store]: https://chrome.google.com/webstore/detail/web-scraper/jnhgnonknehpejjnehehllkliplmbmhn [webscraper.io]: http://webscraper.io/ [google-groups]: https://groups.google.com/forum/#!forum/web-scraper [github-issues]: https://github.com/martinsbalodis/web-scraper-chrome-extension/issues + [get-started-chrome]: https://developer.chrome.com/extensions/getstarted#unpacked + [issue-14]: https://github.com/jwillmer/web-scraper-chrome-extension/issues/14 + [latest-releases]: https://github.com/jwillmer/web-scraper-chrome-extension/releases diff --git a/docs/Development.md b/docs/Development.md new file mode 100644 index 00000000..7eb9914f --- /dev/null +++ b/docs/Development.md @@ -0,0 +1,56 @@ +# Development Instructions + +## Selector Development + +This section walks through the steps needed to create or extend a selector for the web scraper. The example creates a "Select All" selector. + +### Create Selector Logic +You can skip the file creation steps if you only intend to extend existing selectors with new functionality. 
+ +- Duplicate the file `SelectorElementStyle.js` in `scripts/Selector/` +- Rename the duplicated file to `SelectorAll.js` +- Modify the `getData` method to return all content +- Specify which features you would like to have enabled in the `getFeatures` function +- Implement the logic for the enabled features (the `textmanipulation` feature works out of the box) + +### Create Selector Controls + +- Add a section to the `SelectorEdit.html` file in `devtools/views/` +- Add the section class `form-group feature feature-AllSelector` +- You can use `{{#selectorName}}` and `{{/selectorName}}` to prevent content from displaying (used for checkbox controls) +- Use `{{selector.selectorAll}}` to define a variable + + +### Set references to your selector + +#### Controller + +- Open the `Controler.js` in `scripts/` +- Add a variable in the function `getCurrentlyEditedSelector` to select your HTML section value +- Add the variable to the `newSelector` object (every selector in `scripts/Selector/` that references this feature can access the value) +- Add validation rules for your variable in the function `initSelectorValidation` + + +#### File reference + +- Add a reference in `extension/manifest.json` in the sections `content_scripts` and `scripts` +- Add a reference to `extension\devtools\devtools_scraper_panel.html` +- Add a reference to `playgrounds\extension\index.html` +- Add a reference to `tests\SpecRunner.html` + + +### Testing + +For testing you need to run a web server. Personally I use [Web Server for Chrome](https://chrome.google.com/webstore/detail/web-server-for-chrome/ofhbbkphhbklhfoeikjpcbhemlocgigb) and point it at the working directory of the project. 
+ +- Duplicate a test file in `tests/Selector` and rename it +- Write tests for your selector +- Run the tests by opening `tests/SpecRunner.html` +- Try your implementation by opening `playgrounds/extension/index.html` +- Extend the playground if it does not cover your scenario + +### Documentation + +- Create a `md` file in `docs/Selectors` +- Describe the usage, options, etc. + diff --git a/docs/Selectors/Element attribute selector.md b/docs/Selectors/Element attribute selector.md index f7a04341..3eb4b060 100644 --- a/docs/Selectors/Element attribute selector.md +++ b/docs/Selectors/Element attribute selector.md @@ -8,6 +8,11 @@ this link: `link`. * multiple - multiple records are being extracted. * attribute name - the attribute that is going to be extracted. For example `title`, `data-id`. + * remove HTML + * trim text + * replace text - regular expression in the replace field possible + * text prefix/suffix + * delay - delay the extraction ## Use cases See [Text selector] [text-selector] use cases. diff --git a/docs/Selectors/Element click selector.md b/docs/Selectors/Element click selector.md index c46be9fd..04daa4e1 100644 --- a/docs/Selectors/Element click selector.md +++ b/docs/Selectors/Element click selector.md @@ -18,6 +18,7 @@ events triggered by the button. be clicked to load more elements. * click type - type of how the selector knows when there will be no new elements and clicking should stop. + * pagination limit - the number of clicks you want the selector to perform. * click element uniqueness - type of how selector knows which buttons are already clicked. 
* multiple - multiple records are being extracted (almost always should be diff --git a/docs/Selectors/Element scroll down selector.md b/docs/Selectors/Element scroll down selector.md index ef702aee..1bb62b42 100644 --- a/docs/Selectors/Element scroll down selector.md +++ b/docs/Selectors/Element scroll down selector.md @@ -16,6 +16,7 @@ infinitely then this selector will be stuck in an infinite loop. should usually be specified because the data won't be loaded immediately from the server after scrolling down. More than 2000 ms might be a good choice if you you don't want to loose data because the server didn't respond fast enough. + * pagination limit - the number of clicks you want the selector to perform. ## Use cases See [Element selector] [element-selector] use cases. diff --git a/docs/Selectors/Element selector.md b/docs/Selectors/Element selector.md index 07f74ed9..092bcfc2 100644 --- a/docs/Selectors/Element selector.md +++ b/docs/Selectors/Element selector.md @@ -17,6 +17,7 @@ on a button then you should try these selectors: be used as parent elements for child selectors. * multiple - multiple records are being extracted (almost always should be checked). Multiple option for child selectors usually should not be checked. + * delay - delay the extraction ## Use cases diff --git a/docs/Selectors/Element style selector.md b/docs/Selectors/Element style selector.md new file mode 100644 index 00000000..fc178161 --- /dev/null +++ b/docs/Selectors/Element style selector.md @@ -0,0 +1,21 @@ +# Element style selector +Element style selector can extract a style value of an HTML element. +For example you could use this selector to extract the width style from +this div: `
`. + +## Configuration options + * selector - [CSS selector] [css-selector] for the element. + * multiple - multiple records are being extracted. + * style name - the attribute that is going to be extracted. For example + `width`, `background-image`. + * remove HTML + * trim text + * replace text - regular expression in the replace field possible + * text prefix/suffix + * delay - delay the extraction + +## Use cases +See [Text selector] [text-selector] use cases. + + [text-selector]: Text%20selector.md + [css-selector]: ../CSS%20selector.md \ No newline at end of file diff --git a/docs/Selectors/Grouped selector.md b/docs/Selectors/Grouped selector.md index 9683c11f..1573b73e 100644 --- a/docs/Selectors/Grouped selector.md +++ b/docs/Selectors/Grouped selector.md @@ -9,6 +9,11 @@ The extracted data will be stored as JSON. * attribute name - optionally this selector can extract an attribute of the selected element. If specified the extractor will also add this attribute to the resulting JSON. + * remove HTML + * trim text + * replace text - regular expression in the replace field possible + * text prefix/suffix + * delay - delay the extraction ## Use cases diff --git a/docs/Selectors/HTML selector.md b/docs/Selectors/HTML selector.md index b4ac9839..749b9b0d 100644 --- a/docs/Selectors/HTML selector.md +++ b/docs/Selectors/HTML selector.md @@ -6,6 +6,11 @@ inner HTML of the element will be extracted. * selector - [CSS selector] [css-selector] for the element whose inner HTML will be extracted. * multiple - multiple records are being extracted. + * remove HTML + * trim text + * replace text - regular expression in the replace field possible + * text prefix/suffix + * delay - delay the extraction ## Use cases See [Text selector] [text-selector] use cases. 
diff --git a/docs/Selectors/Image selector.md b/docs/Selectors/Image selector.md index 953b6843..a65a076d 100644 --- a/docs/Selectors/Image selector.md +++ b/docs/Selectors/Image selector.md @@ -15,6 +15,7 @@ report it as a bug. checked for Image selector. * download image - downloads and store images on local drive. When CouchDB storage back end is used the image is also stored locally. + * delay - delay the extraction ## Use cases See [Text selector] [text-selector] use cases. diff --git a/docs/Selectors/Link selector.md b/docs/Selectors/Link selector.md index d5366c51..d1ab32c4 100644 --- a/docs/Selectors/Link selector.md +++ b/docs/Selectors/Link selector.md @@ -23,6 +23,7 @@ link selector is not working for you then you can try these workarounds: * selector - [CSS selector] [css-selector] for the link element from which the link for navigation will be extracted. * multiple - multiple records are being extracted. Usually should be checked. + * delay - delay the extraction ## Use cases diff --git a/docs/Selectors/Table selector.md b/docs/Selectors/Table selector.md index f796f6d5..bfb8eb17 100644 --- a/docs/Selectors/Table selector.md +++ b/docs/Selectors/Table selector.md @@ -17,6 +17,7 @@ shows what you should select when extracting data from a table. * data rows selector - [CSS selector] [css-selector] for table data rows. * multiple - multiple records are being extracted. Usually should be checked for Table selector because you are extracting multiple rows. + * delay - delay the extraction ## Use cases See [Text selector] [text-selector] use cases. diff --git a/docs/Selectors/Text selector.md b/docs/Selectors/Text selector.md index b3071e0c..4b62d1cc 100644 --- a/docs/Selectors/Text selector.md +++ b/docs/Selectors/Text selector.md @@ -16,6 +16,11 @@ resulting data. multiple checked then you might actually need [Element selector] [element-selector]. * regex - regular expression to extract a substring from the result. 
+ * remove HTML + * trim text + * replace text - regular expression in the replace field possible + * text prefix/suffix + * delay - delay the extraction ### Regex diff --git a/extension/assets/papaparse.min.js b/extension/assets/papaparse.min.js new file mode 100644 index 00000000..102c82a2 --- /dev/null +++ b/extension/assets/papaparse.min.js @@ -0,0 +1,6 @@ +/*! + Papa Parse + v4.1.0 + https://github.com/mholt/PapaParse +*/ +!(function(e){"use strict";function u(t,n){n=n||{};if(n.worker&&Papa.WORKERS_SUPPORTED){var r=m();r.userStep=n.step;r.userChunk=n.chunk;r.userComplete=n.complete;r.userError=n.error;n.step=x(n.step);n.chunk=x(n.chunk);n.complete=x(n.complete);n.error=x(n.error);delete n.worker;r.postMessage({input:t,config:n,workerId:r.id});return}var i=null;if(typeof t==="string"){if(n.download)i=new l(n);else i=new h(n)}else if(e.File&&t instanceof File||t instanceof Object)i=new c(n);return i.stream(t)}function a(t,n){function a(){if(typeof n!=="object")return;if(typeof n.delimiter==="string"&&n.delimiter.length==1&&e.Papa.BAD_DELIMITERS.indexOf(n.delimiter)==-1){o=n.delimiter}if(typeof n.quotes==="boolean"||n.quotes instanceof Array)s=n.quotes;if(typeof n.newline==="string")u=n.newline}function f(e){if(typeof e!=="object")return[];var t=[];for(var n in e)t.push(n);return t}function l(e,t){var n="";if(typeof e==="string")e=JSON.parse(e);if(typeof t==="string")t=JSON.parse(t);var r=e instanceof Array&&e.length>0;var i=!(t[0]instanceof Array);if(r){for(var s=0;s0)n+=o;n+=c(e[s],s)}if(t.length>0)n+=u}for(var a=0;a0)n+=o;var h=r&&i?e[l]:l;n+=c(t[a][h],l)}if(a-1||t.charAt(0)==" "||t.charAt(t.length-1)==" ";return r?'"'+t+'"':t}function h(e,t){for(var n=0;n-1)return true;return false}var r="";var i=[];var s=false;var o=",";var u="\r\n";a();if(typeof t==="string")t=JSON.parse(t);if(t instanceof Array){if(!t.length||t[0]instanceof Array)return l(null,t);else if(typeof t[0]==="object")return l(f(t[0]),t)}else if(typeof t==="object"){if(typeof 
t.data==="string")t.data=JSON.parse(t.data);if(t.data instanceof Array){if(!t.fields)t.fields=t.data[0]instanceof Array?t.fields:f(t.data[0]);if(!(t.data[0]instanceof Array)&&typeof t.data[0]!=="object")t.data=[t.data]}return l(t.fields||[],t.data||[])}throw"exception: Unable to serialize unrecognized input"}function f(n){function r(e){var t=E(e);t.chunkSize=parseInt(t.chunkSize);this._handle=new p(t);this._handle.streamer=this;this._config=t}this._handle=null;this._paused=false;this._finished=false;this._input=null;this._baseIndex=0;this._partialLine="";this._rowCount=0;this._start=0;this._nextChunk=null;r.call(this,n);this.parseChunk=function(n){var r=this._partialLine+n;this._partialLine="";var i=this._handle.parse(r,this._baseIndex,!this._finished);if(this._handle.paused())return;var s=i.meta.cursor;if(!this._finished){this._partialLine=r.substring(s-this._baseIndex);this._baseIndex=s}if(i&&i.data)this._rowCount+=i.data.length;var o=this._finished||this._config.preview&&this._rowCount>=this._config.preview;if(t){e.postMessage({results:i,workerId:Papa.WORKER_ID,finished:o})}else if(x(this._config.chunk)){this._config.chunk(i,this._handle);if(this._paused)return;i=undefined}if(o&&x(this._config.complete)&&(!i||!i.meta.aborted))this._config.complete(i);if(!o&&(!i||!i.meta.paused))this._nextChunk();return i};this._sendError=function(n){if(x(this._config.error))this._config.error(n);else if(t&&this._config.error){e.postMessage({workerId:Papa.WORKER_ID,error:n,finished:false})}}}function l(e){function r(e){var t=e.getResponseHeader("Content-Range");return parseInt(t.substr(t.lastIndexOf("/")+1))}e=e||{};if(!e.chunkSize)e.chunkSize=Papa.RemoteChunkSize;f.call(this,e);var n;if(t){this._nextChunk=function(){this._readChunk();this._chunkLoaded()}}else{this._nextChunk=function(){this._readChunk()}}this.stream=function(e){this._input=e;this._nextChunk()};this._readChunk=function(){if(this._finished){this._chunkLoaded();return}n=new 
XMLHttpRequest;if(!t){n.onload=S(this._chunkLoaded,this);n.onerror=S(this._chunkError,this)}n.open("GET",this._input,!t);if(this._config.step||this._config.chunk){var e=this._start+this._config.chunkSize-1;n.setRequestHeader("Range","bytes="+this._start+"-"+e);n.setRequestHeader("If-None-Match","webkit-no-cache")}try{n.send()}catch(r){this._chunkError(r.message)}if(t&&n.status==0)this._chunkError();else this._start+=this._config.chunkSize};this._chunkLoaded=function(){if(n.readyState!=4)return;if(n.status<200||n.status>=400){this._chunkError();return}this._finished=!this._config.step&&!this._config.chunk||this._start>r(n);this.parseChunk(n.responseText)};this._chunkError=function(e){var t=n.statusText||e;this._sendError(t)}}function c(e){e=e||{};if(!e.chunkSize)e.chunkSize=Papa.LocalChunkSize;f.call(this,e);var t,n;var r=typeof FileReader!=="undefined";this.stream=function(e){this._input=e;n=e.slice||e.webkitSlice||e.mozSlice;if(r){t=new FileReader;t.onload=S(this._chunkLoaded,this);t.onerror=S(this._chunkError,this)}else t=new FileReaderSync;this._nextChunk()};this._nextChunk=function(){if(!this._finished&&(!this._config.preview||this._rowCount=this._input.size;this.parseChunk(e.target.result)};this._chunkError=function(){this._sendError(t.error)}}function h(e){e=e||{};f.call(this,e);var t;var n;this.stream=function(e){t=e;n=e;return this._nextChunk()};this._nextChunk=function(){if(this._finished)return;var e=this._config.chunkSize;var t=e?n.substr(0,e):n;n=e?n.substr(e):"";this._finished=!n;return this.parseChunk(t)}}function p(e){function c(){if(f&&u){b("Delimiter","UndetectableDelimiter","Unable to auto-detect delimiting character; defaulted to '"+Papa.DefaultDelimiter+"'");u=false}if(e.skipEmptyLines){for(var t=0;t=a.length){if(!n["__parsed_extra"])n["__parsed_extra"]=[];n["__parsed_extra"].push(f.data[t][r])}else n[a[r]]=f.data[t][r]}}if(e.header){f.data[t]=n;if(r>a.length)b("FieldMismatch","TooManyFields","Too many fields: expected "+a.length+" fields but 
parsed "+r,t);else if(r1){a+=Math.abs(h-s);s=h}}f/=l.data.length;if((typeof i==="undefined"||a1.99){i=a;r=u}}e.delimiter=r;return{successful:!!r,bestDelimiter:r}}function g(e){e=e.substr(0,1024*1024);var t=e.split("\r");if(t.length==1)return"\n";var n=0;for(var r=0;r=t.length/2?"\r\n":"\r"}function y(e){var n=t.test(e);return n?parseFloat(e):e}function b(e,t,n,r){f.errors.push({type:e,code:t,message:n,row:r})}var t=/^\s*-?(\d*\.?\d+|\d+\.?\d*)(e[-+]?\d+)?\s*$/i;var n=this;var r=0;var i;var s;var o=false;var u;var a=[];var f={data:[],errors:[],meta:{}};if(x(e.step)){var l=e.step;e.step=function(t){f=t;if(h())c();else{c();if(f.data.length==0)return;r+=t.data.length;if(e.preview&&r>e.preview)s.abort();else l(f,n)}}}this.parse=function(t,n,r){if(!e.newline)e.newline=g(t);u=false;if(!e.delimiter){var a=m(t);if(a.successful)e.delimiter=a.bestDelimiter;else{u=true;e.delimiter=Papa.DefaultDelimiter}f.meta.delimiter=e.delimiter}var l=E(e);if(e.preview&&e.header)l.preview++;i=t;s=new d(l);f=s.parse(i,n,r);c();return o?{meta:{paused:true}}:f||{meta:{paused:false}}};this.paused=function(){return o};this.pause=function(){o=true;s.abort();i=i.substr(s.getCharIndex())};this.resume=function(){o=false;n.streamer.parseChunk(i)};this.abort=function(){s.abort();if(x(e.complete))e.complete(f);i=""}}function d(e){e=e||{};var t=e.delimiter;var n=e.newline;var r=e.comments;var i=e.step;var s=e.preview;var o=e.fastMode;if(typeof t!=="string"||t.length!=1||Papa.BAD_DELIMITERS.indexOf(t)>-1)t=",";if(r===t)throw"Comment character same as delimiter";else if(r===true)r="#";else if(typeof r!=="string"||Papa.BAD_DELIMITERS.indexOf(r)>-1)r=false;if(n!="\n"&&n!="\r"&&n!="\r\n")n="\n";var u=0;var a=false;this.parse=function(e,f,l){function C(e){m.push(e);b=u}function k(t){if(l)return A();if(!t)t=e.substr(u);y.push(t);u=c;C(y);if(v)O();return A()}function L(t){u=t;C(y);y=[];x=e.indexOf(n,u)}function 
A(e){return{data:m,errors:g,meta:{delimiter:t,linebreak:n,aborted:a,truncated:!!e,cursor:b+(f||0)}}}function O(){i(A());m=[],g=[]}if(typeof e!=="string")throw"Input must be a string";var c=e.length,h=t.length,p=n.length,d=r.length;var v=typeof i==="function";u=0;var m=[],g=[],y=[],b=0;if(!e)return A();if(o||o!==false&&e.indexOf('"')===-1){var w=e.split(n);for(var E=0;E=s){m=m.slice(0,s);return A(true)}}return A()}var S=e.indexOf(t,u);var x=e.indexOf(n,u);for(;;){if(e[u]=='"'){var T=u;u++;for(;;){var T=e.indexOf('"',T+1);if(T===-1){if(!l){g.push({type:"Quotes",code:"MissingQuotes",message:"Quoted field unterminated",row:m.length,index:u})}return k()}if(T===c-1){var N=e.substring(u,T).replace(/""/g,'"');return k(N)}if(e[T+1]=='"'){T++;continue}if(e[T+1]==t){y.push(e.substring(u,T).replace(/""/g,'"'));u=T+1+h;S=e.indexOf(t,u);x=e.indexOf(n,u);break}if(e.substr(T+1,p)===n){y.push(e.substring(u,T).replace(/""/g,'"'));L(T+1+p);S=e.indexOf(t,u);if(v){O();if(a)return A()}if(s&&m.length>=s)return A(true);break}}continue}if(r&&y.length===0&&e.substr(u,d)===r){if(x==-1)return A();u=x+p;x=e.indexOf(n,u);S=e.indexOf(t,u);continue}if(S!==-1&&(S=s)return A(true);continue}break}return k()};this.abort=function(){a=true};this.getCharIndex=function(){return u}}function v(){var e=document.getElementsByTagName("script");return e.length?e[e.length-1].src:""}function m(){if(!Papa.WORKERS_SUPPORTED)return false;if(!n&&Papa.SCRIPT_PATH===null)throw new Error("Script path cannot be determined automatically when Papa Parse is loaded asynchronously. 
"+"You need to set Papa.SCRIPT_PATH manually.");var t=new e.Worker(Papa.SCRIPT_PATH||r);t.onmessage=g;t.id=s++;i[t.id]=t;return t}function g(e){var t=e.data;var n=i[t.workerId];var r=false;if(t.error)n.userError(t.error,t.file);else if(t.results&&t.results.data){var s=function(){r=true;y(t.workerId,{data:[],errors:[],meta:{aborted:true}})};var o={abort:s,pause:b,resume:b};if(x(n.userStep)){for(var u=0;u - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + - + + + + + - \ No newline at end of file + diff --git a/extension/devtools/views/SelectorEdit.html b/extension/devtools/views/SelectorEdit.html index 04928fc9..1acfe3aa 100644 --- a/extension/devtools/views/SelectorEdit.html +++ b/extension/devtools/views/SelectorEdit.html @@ -1,210 +1,283 @@
-
- - -
- -
-
- -
- - -
- -
-
- -
- - -
-
- - - - - - -
-
-
- -
- - -
-
- - - - - -
-
-
- - -
- -
-
- - - - - -
-
-
- - -
- -
-
- - - - - -
-
-
- - -
- - -
- -
-
- - -
- -
- -
-
- -
-
-
- -
-
-
- - -
-
-
- -
-
-
- - -
-
-
- -
-
-
- -
- - -
- -
-
- -
- - -
- -
-
- -
- - -
- -
-
- -
- - -
- -
-
- -
- - -
- - - - - - - - - - {{#selector.columns}} - - - - - - {{/selector.columns}} - -
ColumnResult keyInclude into result
{{header}}
-
-
- -
-
- -
-
- -
-
- -
-
+
+ + +
+ +
+
+ +
+ + +
+ +
+
+ +
+ + +
+
+ + + + + + +
+
+
+ +
+ + +
+
+ + + + + +
+
+
+ + +
+ +
+
+ + + + + + +
+
+
+ + +
+ +
+
+ + + + + +
+
+
+ + +
+ +
+
+ +
+
+
+ + +
+ + +
+ +
+
+ + +
+ + +
+ +
+
+ + +
+ +
+ +
+
+ +
+ +
+
+ +
+
+
+ + +
+ +
+
+ +
+
+
+ + +
+ +
+
+ +
+
+
+ + +
+ + +
+ +
+
+ + +
+ + +
+ +
+
+ + +
+ + +
+ +
+
+ +
+ +
+
+ + + Trim will be applied before regex. +
+
+ +
+
+ +
+
+ + +
+
+ + +
+
+
+ +
+ + +
+ +
+
+ +
+ + +
+
+ + +
+
+
+ +
+ + +
+ +
+
+ +
+ + +
+ + + + + + + + + + {{#selector.columns}} + + + + + + {{/selector.columns}} + +
ColumnResult keyInclude into result
{{header}}
+
+
+ +
+
+ +
+
+ +
+
+ +
+
\ No newline at end of file diff --git a/extension/devtools/views/SitemapCreate.html b/extension/devtools/views/SitemapCreate.html index 5a45cbf5..4f30c7ac 100644 --- a/extension/devtools/views/SitemapCreate.html +++ b/extension/devtools/views/SitemapCreate.html @@ -7,14 +7,10 @@
- +
- - - - - +
diff --git a/extension/devtools/views/SitemapEditMetadata.html b/extension/devtools/views/SitemapEditMetadata.html index 544df6dd..71d89a14 100644 --- a/extension/devtools/views/SitemapEditMetadata.html +++ b/extension/devtools/views/SitemapEditMetadata.html @@ -7,36 +7,71 @@ - {{#startUrl.push}} - {{#startUrl}} -
- -
-
- - - - - -
-
-
- {{/startUrl}} - {{/startUrl.push}} - {{^startUrl.push}} -
- -
-
- - - - - -
-
-
- {{/startUrl.push}} +
+ +
+
+ +
+
+
+ +
+ Supported URL patterns:
+ 1. Numeric with optional step and zero padding – [START-END:STEP] – [001-010:10]
+ 2. Date interval – [date<PATTERN><START><END>] – [date<dd.MM.yyyy><01.01.2017><now>]
+ +
+
diff --git a/extension/devtools/views/SitemapExportDataCSV.html b/extension/devtools/views/SitemapExportDataCSV.html index 4cfc595a..2496c4fa 100644 --- a/extension/devtools/views/SitemapExportDataCSV.html +++ b/extension/devtools/views/SitemapExportDataCSV.html @@ -1,4 +1,24 @@ -

- Export {{_id}} data as CSV.
Waiting for the download button to appear. > -
Download now! -

\ No newline at end of file +
Console
+ +

Export {{_id}} data as CSV

+
+ + + +
+ + + +
+ + + +
+ + +
+
+ +

+ Waiting for process to finish. > Download now! +

diff --git a/extension/devtools/views/SitemapListItem.html b/extension/devtools/views/SitemapListItem.html index cdbe1a47..de564654 100644 --- a/extension/devtools/views/SitemapListItem.html +++ b/extension/devtools/views/SitemapListItem.html @@ -1,14 +1,9 @@ {{_id}} - {{#startUrl.push}} - {{#startUrl}} + {{#startUrls}} {{.}}, - {{/startUrl}} - {{/startUrl.push}} - {{^startUrl.push}} - {{startUrl}} - {{/startUrl.push}} + {{/startUrls}} diff --git a/extension/devtools/views/SitemapScrapeConfig.html b/extension/devtools/views/SitemapScrapeConfig.html index 3ea00f90..f04bd4b0 100644 --- a/extension/devtools/views/SitemapScrapeConfig.html +++ b/extension/devtools/views/SitemapScrapeConfig.html @@ -6,6 +6,12 @@
+
+ +
+ +
+
@@ -23,4 +29,4 @@
- \ No newline at end of file + diff --git a/extension/devtools/views/SitemapStartUrlField.html b/extension/devtools/views/SitemapStartUrlField.html deleted file mode 100644 index fe0146d0..00000000 --- a/extension/devtools/views/SitemapStartUrlField.html +++ /dev/null @@ -1,13 +0,0 @@ -
- - -
-
- - - - - -
-
-
\ No newline at end of file diff --git a/extension/devtools/views/Viewport.html b/extension/devtools/views/Viewport.html index 2085cb39..23fe9da4 100644 --- a/extension/devtools/views/Viewport.html +++ b/extension/devtools/views/Viewport.html @@ -1,34 +1,32 @@ -