
Commit 3589dc8

Fix experience and education bug
1 parent f3c7f9c commit 3589dc8

4 files changed: +247 -222 lines changed

README.rst

Lines changed: 92 additions & 5 deletions
@@ -7,6 +7,57 @@ Linkedin Scraper
 
 Scrapes Linkedin User Data
 
+`Linkedin Scraper <#linkedin-scraper>`_
+
+
+* `Installation <#installation>`_
+* `Setup <#setup>`_
+* `Usage <#usage>`_
+
+  * `Sample Usage <#sample-usage>`_
+  * `User Scraping <#user-scraping>`_
+  * `Company Scraping <#company-scraping>`_
+  * `Job Scraping <#job-scraping>`_
+  * `Job Search Scraping <#job-search-scraping>`_
+  * `Scraping sites where login is required first <#scraping-sites-where-login-is-required-first>`_
+  * `Scraping sites and login automatically <#scraping-sites-and-login-automatically>`_
+
+* `API <#api>`_
+
+  * `Person <#person>`_
+
+    * `\ ``linkedin_url`` <#linkedin_url>`_
+    * `\ ``name`` <#name>`_
+    * `\ ``about`` <#about>`_
+    * `\ ``experiences`` <#experiences>`_
+    * `\ ``educations`` <#educations>`_
+    * `\ ``interests`` <#interests>`_
+    * `\ ``accomplishment`` <#accomplishment>`_
+    * `\ ``company`` <#company>`_
+    * `\ ``job_title`` <#job_title>`_
+    * `\ ``driver`` <#driver>`_
+    * `\ ``scrape`` <#scrape>`_
+    * `\ ``scrape(close_on_complete=True)`` <#scrapeclose_on_completetrue>`_
+
+  * `Company <#company>`_
+
+    * `\ ``linkedin_url`` <#linkedin_url-1>`_
+    * `\ ``name`` <#name-1>`_
+    * `\ ``about_us`` <#about_us>`_
+    * `\ ``website`` <#website>`_
+    * `\ ``headquarters`` <#headquarters>`_
+    * `\ ``founded`` <#founded>`_
+    * `\ ``company_type`` <#company_type>`_
+    * `\ ``company_size`` <#company_size>`_
+    * `\ ``specialties`` <#specialties>`_
+    * `\ ``showcase_pages`` <#showcase_pages>`_
+    * `\ ``affiliated_companies`` <#affiliated_companies>`_
+    * `\ ``driver`` <#driver-1>`_
+    * `\ ``get_employees`` <#get_employees>`_
+    * `\ ``scrape(close_on_complete=True)`` <#scrapeclose_on_completetrue-1>`_
+
+* `Contribution <#contribution>`_
+
 Installation
 ------------
 

@@ -42,7 +93,7 @@ Sample Usage
    email = "some-email@email.address"
    password = "password123"
    actions.login(driver, email, password) # if email and password aren't given, it'll prompt in the terminal
-   person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5", driver=driver)
+   person = Person("https://www.linkedin.com/in/joey-sham-aa2a50122", driver=driver)
 
 **NOTE**\ : The account used to log in should have its language set to English to make sure everything works as expected.
@@ -62,6 +113,42 @@ Company Scraping
    from linkedin_scraper import Company
    company = Company("https://ca.linkedin.com/company/google")
 
+Job Scraping
+^^^^^^^^^^^^
+
+.. code-block:: python
+
+   from linkedin_scraper import Job, actions
+   from selenium import webdriver
+
+   driver = webdriver.Chrome()
+   email = "some-email@email.address"
+   password = "password123"
+   actions.login(driver, email, password) # if email and password aren't given, it'll prompt in the terminal
+   input("Press Enter")
+   job = Job("https://www.linkedin.com/jobs/collections/recommended/?currentJobId=3456898261", driver=driver, close_on_complete=False)
+
+Job Search Scraping
+^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: python
+
+   from linkedin_scraper import JobSearch, actions
+   from selenium import webdriver
+
+   driver = webdriver.Chrome()
+   email = "some-email@email.address"
+   password = "password123"
+   actions.login(driver, email, password) # if email and password aren't given, it'll prompt in the terminal
+   input("Press Enter")
+   job_search = JobSearch(driver=driver, close_on_complete=False, scrape=False)
+   # job_search contains jobs from your logged-in front page:
+   # - job_search.recommended_jobs
+   # - job_search.still_hiring
+   # - job_search.more_jobs
+
+   job_listings = job_search.search("Machine Learning Engineer") # returns the list of `Job` from the first page
+
 Scraping sites where login is required first
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -156,12 +243,12 @@ This is the interests they have. A list of ``linkedin_scraper.scraper.Interest``
 This is the accomplishments they have. A list of ``linkedin_scraper.scraper.Accomplishment``
 
 ``company``
-^^^^^^^^^^^^^^^
+~~~~~~~~~~~~~~~
 
 This is the most recent company or institution they have worked at.
 
 ``job_title``
-^^^^^^^^^^^^^^^^^
+~~~~~~~~~~~~~~~~~
 
 This is the most recent job title they have.
 
@@ -183,7 +270,7 @@ For example
 When this is **True**\ , the scraping happens automatically. To scrape afterwards, call the ``scrape()`` function on the ``Person`` object.
 
 ``scrape(close_on_complete=True)``
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 This is the meat of the code: calling this function scrapes the profile. If *close_on_complete* is True (which it is by default), the browser will close upon completion. If scraping of other profiles is desired, set it to False so you can keep using the same driver.
 
@@ -267,7 +354,7 @@ For example
    company = Company("https://ca.linkedin.com/company/google", driver=driver)
 
 ``scrape(close_on_complete=True)``
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 This is the meat of the code: calling this function scrapes the company. If *close_on_complete* is True (which it is by default), the browser will close upon completion. If scraping of other companies is desired, set it to False so you can keep using the same driver.

linkedin_scraper/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@
 from .jobs import Job
 from .job_search import JobSearch
 
-__version__ = "2.10.0"
+__version__ = "2.10.1"
 
 import glob
 modules = glob.glob(dirname(__file__)+"/*.py")

linkedin_scraper/objects.py

Lines changed: 6 additions & 0 deletions
@@ -1,4 +1,5 @@
 from dataclasses import dataclass
+from time import sleep
 
 from selenium.webdriver import Chrome
 
@@ -19,6 +20,7 @@ class Contact:
 @dataclass
 class Institution:
     institution_name: str = None
+    linkedin_url: str = None
     website: str = None
     industry: str = None
     type: str = None
@@ -62,6 +64,10 @@ class Scraper:
     WAIT_FOR_ELEMENT_TIMEOUT = 5
     TOP_CARD = "pv-top-card"
 
+    @staticmethod
+    def wait(duration):
+        sleep(int(duration))
+
     def focus(self):
         self.driver.execute_script('alert("Focus window")')
         self.driver.switch_to.alert.accept()
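The new ``wait`` helper is a thin wrapper over ``time.sleep``; one subtlety worth noting is the ``int()`` cast, which truncates fractional durations. A sketch reproducing the helper outside the class:

```python
from time import sleep

def wait(duration):
    # Mirror of the Scraper.wait helper added in this commit.
    # int() truncates, so wait(0.9) sleeps for 0 seconds, not ~1.
    sleep(int(duration))

wait(0.9)  # returns almost immediately because int(0.9) == 0
wait(1)    # sleeps for one full second
```

Callers who need sub-second pauses would have to bypass the cast (or call ``sleep`` directly).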
