Commit fdf2e74

Update README.md
1 parent d8c2e74 commit fdf2e74

1 file changed

+2
-250
lines changed

README.md

Lines changed: 2 additions & 250 deletions
@@ -39,253 +39,5 @@ Output :
3939
"comments": ""
4040
}
4141
```
42-
43-
44-
## Available properties and methods
45-
```python
46-
# You can use any of below properties and methods instead `a_tags_mp3`
47-
page.a_tags_mp3
48-
```
49-
<details>
50-
51-
<summary>Click to expand!</summary>
52-
53-
54-
#### <kbd>property</kbd> a_tag_hrefs
55-
56-
57-
58-
59-
60-
---
61-
62-
#### <kbd>property</kbd> a_tag_texts
63-
64-
65-
66-
67-
68-
---
69-
70-
#### <kbd>property</kbd> a_tags_mp3
71-
72-
73-
74-
75-
76-
---
77-
78-
#### <kbd>property</kbd> a_tags_rar
79-
80-
81-
82-
83-
84-
---
85-
86-
#### <kbd>property</kbd> a_tags_with_href
87-
88-
89-
90-
91-
92-
---
93-
94-
#### <kbd>property</kbd> article_tag
95-
96-
Returns the `article` tag that contains the most text.
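
The selection rule described here (pick the `article` element with the most text) can be sketched with the standard library alone. This is a minimal illustration and not the library's own implementation; `ArticleTextMeasurer` and `longest_article_text` are hypothetical names introduced for the example.

```python
from html.parser import HTMLParser


class ArticleTextMeasurer(HTMLParser):
    """Collects the text found inside each <article> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.open_articles = []   # indices of the <article> tags that are currently open
        self.article_texts = []   # one list of text fragments per <article> seen

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.open_articles.append(len(self.article_texts))
            self.article_texts.append([])

    def handle_endtag(self, tag):
        if tag == "article" and self.open_articles:
            self.open_articles.pop()

    def handle_data(self, data):
        # Text counts toward every open <article>, including nested ones.
        for index in self.open_articles:
            self.article_texts[index].append(data)


def longest_article_text(html):
    """Return the text of the <article> with the most text, or None if there is none."""
    parser = ArticleTextMeasurer()
    parser.feed(html)
    parser.close()
    texts = ["".join(parts).strip() for parts in parser.article_texts]
    return max(texts, key=len) if texts else None
```

Comparing by stripped text length keeps whitespace-only articles from winning over articles with real content.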
97-
98-
---
99-
100-
#### <kbd>property</kbd> children
101-
102-
Returns a list of `EzSoup` instances built from `self.important_hrefs`. Uses `ThreadPoolExecutor` to crawl the children much faster than a plain `for` loop.
103-
104-
---
105-
106-
#### <kbd>property</kbd> favicon_href
107-
108-
109-
110-
111-
112-
---
113-
114-
#### <kbd>property</kbd> important_a_tags
115-
116-
Returns `a` tags that contain a header (`h2`, `h3`), or `a` tags inside headers or elements with class `item` or `post`. I call these important because they are most likely to be crawlable web pages with real content.
117-
118-
---
119-
120-
#### <kbd>property</kbd> important_hrefs
121-
122-
123-
124-
125-
126-
---
127-
128-
#### <kbd>property</kbd> json_summary
129-
130-
131-
132-
133-
134-
---
135-
136-
#### <kbd>property</kbd> main_html
137-
138-
139-
140-
141-
142-
---
143-
144-
#### <kbd>property</kbd> main_image_src
145-
146-
147-
148-
149-
150-
---
151-
152-
#### <kbd>property</kbd> main_text
153-
154-
155-
156-
157-
158-
---
159-
160-
#### <kbd>property</kbd> meta_article_modified_time
161-
162-
163-
164-
165-
166-
---
167-
168-
#### <kbd>property</kbd> meta_article_published_time
169-
170-
171-
172-
173-
174-
---
175-
176-
#### <kbd>property</kbd> meta_description
177-
178-
179-
180-
181-
182-
---
183-
184-
#### <kbd>property</kbd> meta_image_src
185-
186-
187-
188-
189-
190-
---
191-
192-
#### <kbd>property</kbd> possible_topic_names
193-
194-
Returns possible topic/breadcrumb names of the web page. Values can be unreliable since they are not generated with NLP methods yet.
195-
196-
---
197-
198-
#### <kbd>property</kbd> summary_dict
199-
200-
201-
202-
203-
204-
---
205-
206-
#### <kbd>property</kbd> text
207-
208-
209-
210-
211-
212-
---
213-
214-
#### <kbd>property</kbd> title
215-
216-
Usually the `<h1>` tag content of a web page is cleaner than the original page `<title>` text, so if the `h1` or `h2` text is similar to the title, it is better to return it instead of the original title text.
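
One way to picture this heuristic is a similarity check between the heading texts and the `<title>` text. The sketch below uses `difflib.SequenceMatcher` as the similarity measure, which is an assumption: the source does not say how similarity is computed. The `pick_title` helper and its `threshold` value are likewise hypothetical.

```python
from difflib import SequenceMatcher


def pick_title(title_tag_text, heading_texts, threshold=0.5):
    """Prefer a heading that resembles the <title> text; otherwise keep the <title> text.

    `threshold` is a hypothetical cut-off, not a value taken from the library.
    """
    best_heading, best_score = None, 0.0
    for heading in heading_texts:
        # Case-insensitive similarity ratio in the range 0.0 to 1.0.
        score = SequenceMatcher(None, heading.lower(), title_tag_text.lower()).ratio()
        if score > best_score:
            best_heading, best_score = heading, score
    return best_heading if best_score >= threshold else title_tag_text
```

A page titled `How to Bake Bread | Example Blog` with an `h1` of `How to Bake Bread` would yield the heading, which drops the site-name suffix.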
217-
218-
---
219-
220-
#### <kbd>property</kbd> title_tag_text
221-
222-
223-
224-
225-
226-
227-
228-
---
229-
230-
### <kbd>method</kbd> `from_url`
231-
232-
```python
233-
from_url(url: str)
234-
```
235-
236-
237-
238-
239-
240-
---
241-
242-
### <kbd>method</kbd> `get_important_children_soups`
243-
244-
```python
245-
get_important_children_soups(multithread: bool = True, limit: int = None)
246-
```
247-
248-
Returns a list of `EzSoup` instances from `self.important_hrefs`. Parameters:
249-
- `multithread`: `True` by default; uses `ThreadPoolExecutor` to crawl children much faster.
250-
- `limit`: maximum number of children to crawl.
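
The effect of the two parameters can be sketched roughly as follows. This is an illustration under assumptions, not the library's code: `crawl_children`, the `fetch` callable, and `max_workers` are hypothetical, with `fetch` standing in for whatever builds an `EzSoup` from a URL.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl_children(hrefs, fetch, multithread=True, limit=None, max_workers=8):
    """Apply `fetch` to each href, optionally in parallel, keeping the input order."""
    targets = list(hrefs)[:limit]   # limit=None keeps every href
    if not multithread:
        return [fetch(href) for href in targets]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as `targets`.
        return list(pool.map(fetch, targets))
```

Threads suit this workload because fetching pages is I/O-bound, so workers spend most of their time waiting on the network rather than competing for the CPU.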
251-
252-
---
253-
254-
### <kbd>method</kbd> `save_content_summary_html`
255-
256-
```python
257-
save_content_summary_html(path: str = None)
258-
```
259-
260-
261-
262-
263-
264-
---
265-
266-
### <kbd>method</kbd> `save_content_summary_json`
267-
268-
```python
269-
save_content_summary_json(path: str = None)
270-
```
271-
272-
273-
274-
275-
276-
---
277-
278-
### <kbd>method</kbd> `save_content_summary_txt`
279-
280-
```python
281-
save_content_summary_txt(path: str = None)
282-
```
283-
284-
</details>
285-
286-
---
287-
288-
<sub>
289-
This README.md was automatically generated via https://github.com/ml-tooling/lazydocs
290-
</sub>
291-
42+
## Documentation
43+
Soon...
