@@ -4,14 +4,23 @@
import com.linkedin.urls.detection.UrlDetector;
import com.linkedin.urls.detection.UrlDetectorOptions;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

/**
* Utility class to detect links.
*/
public class LinkDetection {
private static final HttpClient HTTP_CLIENT = HttpClient.newHttpClient();

private static final Set<LinkFilter> DEFAULT_FILTERS =
Set.of(LinkFilter.SUPPRESSED, LinkFilter.NON_HTTP_SCHEME);
Comment on lines +22 to +23
Member:

Improve the name. What does this filter, what does it mean, what is it used for?


/**
* Possible ways to filter a link.
@@ -55,7 +64,117 @@ public static List<String> extractLinks(String content, Set<LinkFilter> filter)
* @return true if the content contains at least one link
*/
public static boolean containsLink(String content) {
return !(new UrlDetector(content, UrlDetectorOptions.BRACKET_MATCH).detect().isEmpty());
return !new UrlDetector(content, UrlDetectorOptions.BRACKET_MATCH).detect().isEmpty();
}

/**
* Checks whether the given URL is considered broken.
*
* <p>
* A link is considered broken if:
* <ul>
* <li>An HTTP request fails</li>
* <li>The HTTP response status code is outside the 200–399 range</li>
* </ul>
*
* <p>
* The method first performs an HTTP {@code HEAD} request and falls back to an HTTP {@code GET}
* request if the {@code HEAD} request indicates a failure.
Comment on lines +81 to +82
Member:

Implementation details like that go into a comment inside the method, not into the Javadoc 👍

* </p>
*
* <p>
* Notes:
* <ul>
* <li>Status code {@code 200} is considered valid, even if the response body is empty</li>
* <li>The response body content is not inspected</li>
* </ul>
*
* @param url the URL to check (must be a valid {@link URI})
* @return a future completing with {@code true} if the link is broken
* @throws IllegalArgumentException if the given URL is not a valid URI
*/

public static CompletableFuture<Boolean> isLinkBroken(String url) {
HttpRequest headCheckRequest = HttpRequest.newBuilder(URI.create(url))
.method("HEAD", HttpRequest.BodyPublishers.noBody())
.build();

return HTTP_CLIENT.sendAsync(headCheckRequest, HttpResponse.BodyHandlers.discarding())
.thenApply(response -> {
int status = response.statusCode();
return status < 200 || status >= 400;
})
.exceptionally(ignored -> true)
Member:

The idiomatic name for something you ignore is _

.thenCompose(result -> {
if (!result) {
return CompletableFuture.completedFuture(false);
}
HttpRequest getFallbackRequest =
HttpRequest.newBuilder(URI.create(url)).GET().build();
return HTTP_CLIENT
.sendAsync(getFallbackRequest, HttpResponse.BodyHandlers.discarding())
.thenApply(resp -> resp.statusCode() < 200 || resp.statusCode() >= 400)
.exceptionally(ignored -> true); // a failed GET also counts as broken
});
}

/**
* Replaces all broken links in the given text with the provided replacement string.
*
* <p>
* The link checks are performed asynchronously.
* </p>
*
* <p>
* Example:
*
* <pre>{@code
* replaceDeadLinks("""
* Test
* http://deadlink/1
* http://workinglink/1
* """, "broken")
* }</pre>
*
* <p>
* Results in:
*
* <pre>{@code
* Test
* broken
* http://workinglink/1
* }</pre>
*
* @param text the input text containing URLs (must not be {@code null})
* @param replacement the string to replace broken links with (must not be {@code null})
Comment on lines +148 to +149
Member:

You don't need to write that, as your params are all (implicitly) annotated with @NonNull already :)

* @return a future containing the modified text
*/
public static CompletableFuture<String> replaceDeadLinks(String text, String replacement) {
Member:

Improve naming. Avoid multiple terms for the same thing. You called it isLinkBroken and now you call it replaceDeadLinks. Either dead or broken, not both. Pick one and align the other.

List<String> links = extractLinks(text, DEFAULT_FILTERS);

if (links.isEmpty()) {
return CompletableFuture.completedFuture(text);
}

List<CompletableFuture<Optional<String>>> deadLinkFutures = links.stream()
.distinct()
.map(link -> isLinkBroken(link)
.thenApply(isBroken -> isBroken ? Optional.of(link) : Optional.<String>empty()))
Member:

You misunderstood me. It's better if you simply filter these out and skip them here instead of later.
Use .filter(...) so they are not even part of the list.
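
One possible reading of this suggestion, as a rough sketch rather than the change the PR must make: keep each link paired with its check result, then drop the working links with a single `.filter(...)` once all checks have completed, so no `Optional` wrapper is needed. The class `BrokenLinkFilterSketch`, the record `CheckedLink`, the method `findBrokenLinks`, and the stubbed `checker` are hypothetical names used only for illustration; in the PR the checker would be `isLinkBroken`.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

public class BrokenLinkFilterSketch {
    // Hypothetical helper type: a link together with the outcome of its check.
    record CheckedLink(String link, boolean broken) {}

    // Checks every distinct link and returns only the broken ones.
    static CompletableFuture<List<String>> findBrokenLinks(
            List<String> links, Function<String, CompletableFuture<Boolean>> checker) {
        List<CompletableFuture<CheckedLink>> checks = links.stream()
            .distinct()
            .map(link -> checker.apply(link)
                .thenApply(broken -> new CheckedLink(link, broken)))
            .toList();

        // Once all checks are done, filter out the working links in one step.
        return CompletableFuture.allOf(checks.toArray(CompletableFuture[]::new))
            .thenApply(unused -> checks.stream()
                .map(CompletableFuture::join)
                .filter(CheckedLink::broken)
                .map(CheckedLink::link)
                .toList());
    }

    public static void main(String[] args) {
        // Stub standing in for a real HTTP check (assumption for this sketch).
        Function<String, CompletableFuture<Boolean>> checker =
            link -> CompletableFuture.completedFuture(link.contains("dead"));

        List<String> broken = findBrokenLinks(
            List.of("http://deadlink/1", "http://workinglink/1", "http://deadlink/1"),
            checker).join();
        System.out.println(broken); // prints [http://deadlink/1]
    }
}
```

The filter has to run after the futures complete, because whether a link is broken is not known while the stream of futures is being built.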

.toList();


return CompletableFuture.allOf(deadLinkFutures.toArray(CompletableFuture[]::new))
.thenApply(ignored -> deadLinkFutures.stream()
Member:

(_ instead of ignored)

.map(CompletableFuture::join)
.flatMap(Optional::stream)
.toList())
.thenApply(deadLinks -> {
String result = text;
for (String deadLink : deadLinks) {
result = result.replace(deadLink, replacement);
Member:

Performance trap. Have a quick check whether StringBuilder provides replace. If not, you can keep it; it's not that big of a deal in this case, I guess.
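
For reference, `StringBuilder` offers `replace(int start, int end, String str)` and `indexOf(String, int)`, but no overload that replaces every occurrence of a substring. A minimal sketch of how the two could be combined is shown below; the class `StringBuilderReplaceSketch` and the method `replaceAll` are hypothetical names, not part of the PR.

```java
public class StringBuilderReplaceSketch {
    // Replaces every occurrence of target in text, editing one buffer in place
    // instead of allocating a new String per replaced link.
    static String replaceAll(String text, String target, String replacement) {
        if (target.isEmpty()) {
            return text; // nothing sensible to replace
        }
        StringBuilder builder = new StringBuilder(text);
        int index = builder.indexOf(target);
        while (index >= 0) {
            builder.replace(index, index + target.length(), replacement);
            // Continue after the inserted text so the replacement is never rescanned.
            index = builder.indexOf(target, index + replacement.length());
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        String text = "Test\nhttp://deadlink/1\nhttp://workinglink/1\n";
        System.out.println(replaceAll(text, "http://deadlink/1", "broken"));
    }
}
```

Whether this beats repeated `String.replace` depends on the text size and the number of broken links; for short messages the difference is likely negligible, as the comment above already suggests.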

}
return result;
});
}

private static Optional<String> toLink(Url url, Set<LinkFilter> filter) {
@@ -76,7 +195,6 @@ private static Optional<String> toLink(Url url, Set<LinkFilter> filter) {
// Remove trailing punctuation
link = link.substring(0, link.length() - 1);
}

return Optional.of(link);
}
