-
Notifications
You must be signed in to change notification settings - Fork 14
Feature/issue 2641 #2735
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: release/1.2.3
Are you sure you want to change the base?
Feature/issue 2641 #2735
Conversation
WalkthroughThe Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant WebParsingUtils
participant Database
participant OpenStreetMapAPI
participant FileSystem
User->>WebParsingUtils: ParseZipFileFromWebAsync()
WebParsingUtils->>FileSystem: Download and extract ZIP file
WebParsingUtils->>Database: Fetch existing toponyms
WebParsingUtils->>FileSystem: Write reconstructed CSV (data.csv)
WebParsingUtils->>FileSystem: Read and deduplicate extracted CSV (ignore postal code)
loop For each new row
WebParsingUtils->>OpenStreetMapAPI: FetchCoordsByAddressAsync(address)
alt If fetch fails
WebParsingUtils->>OpenStreetMapAPI: FetchCoordsByAddressAsync(shorter address)
end
WebParsingUtils->>FileSystem: Append row with coordinates to data.csv
end
WebParsingUtils->>FileSystem: Clean up temporary files (if flagged)
WebParsingUtils->>Database: SaveToponymsToDbAsync(data.csv)
Database-->>WebParsingUtils: Save completion status
Poem
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Nitpick comments (3)
Streetcode/Streetcode.WebApi/Utils/WebParsingUtils.cs (3)
250-275: Coordinate look-ups run sequentially – consider limited parallelism
FetchCoordsByAddressAsyncis awaited inside aforeach, so N calls are serialized.
With thousands of rows this can take hours and risks hitting Nominatim’s rate limits.A balanced approach:
- Batch with
Parallel.ForEachAsyncorTask.WhenAllbut cap concurrency to, say, 2–3 requests/second (respecting Nominatim’s usage policy).- Cache failures to avoid retrying the same address.
This change is optional but will drastically cut processing time.
377-384:GetDistinctRows– index handling is fragileThe postal-code removal relies on positional index 5 within the truncated array (
cols.Take(beforeColumn)), not the original one.
IfbeforeColumnis ever changed, the constant5may point to the wrong column.Pass the column index explicitly:
-private static List<string> GetDistinctRows(IEnumerable<string> rows, byte beforeColumn = 7) => +private static List<string> GetDistinctRows( + IEnumerable<string> rows, + byte beforeColumn = 7, + byte columnToIgnore = 5) => rows.Select(x => { var cols = x.Split(';'); - var filtered = cols.Take(beforeColumn).Where((_, idx) => idx != 5).ToList(); - filtered.Insert(5, string.Empty); + var filtered = cols.Take(beforeColumn) + .Where((_, idx) => idx != columnToIgnore) + .ToList(); + filtered.Insert(columnToIgnore, string.Empty); return string.Join(";", filtered); }) .Distinct() .ToList();
300-302: Full table truncation may cause data-loss race conditions
RemoveRange+SaveChangesAsyncdeletes all toponyms before new ones are inserted.
If the import fails halfway, the table remains empty.Wrap the whole operation in a transaction or use
SqlBulkCopyinto a temp table andALTER TABLE … SWITCH. At minimum:await using var tx = await _streetcodeContext.Database.BeginTransactionAsync(); // delete + insert await tx.CommitAsync();This guarantees atomicity.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Streetcode/Streetcode.DAL/houses.csvis excluded by!**/*.csv,!**/Streetcode/Streetcode.DAL/houses.csv
📒 Files selected for processing (1)
Streetcode/Streetcode.WebApi/Utils/WebParsingUtils.cs(1 hunks)
| var zipPath = $"houses.zip"; | ||
|
|
||
| // Change the path of extractTo to your path, smth like "/Users/admin/Documents/GitHub/StreetCode/Streetcode/Streetcode.DAL"; | ||
| var extractTo = $"/root/build/StreetCode/Streetcode/Streetcode.DAL"; | ||
|
|
||
| var cancellationToken = new CancellationTokenSource().Token; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Hard-coded extractTo path reduces portability
extractTo is set to /root/build/StreetCode/Streetcode/Streetcode.DAL.
On any non-CI machine (local dev, staging, Windows runner, etc.) this path will be invalid, causing the ZIP extraction to fail.
Make the target directory configurable (e.g. via IConfiguration, env-var, or constructor parameter) and fall back to a temp folder when no explicit path is provided:
-var extractTo = $"/root/build/StreetCode/Streetcode/Streetcode.DAL";
+var extractTo = _configuration["ToponymExtractionPath"] // appsettings.json
+ ?? Path.Combine(Path.GetTempPath(), "StreetcodeToponyms");
+Directory.CreateDirectory(extractTo);Committable suggestion skipped: line range outside the PR's diff.
| clientHandler.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate; | ||
| clientHandler.ServerCertificateCustomValidationCallback = (_, _, _, _) => true; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Disabling TLS certificate validation is a security risk
clientHandler.ServerCertificateCustomValidationCallback = (_, _, _, _) => true; accepts any certificate, exposing you to MITM attacks.
Unless you have a very good reason (e.g., an internal test CA), remove this override or restrict it to DEBUG builds only:
-clientHandler.ServerCertificateCustomValidationCallback = (_, _, _, _) => true;
+#if DEBUG
+clientHandler.ServerCertificateCustomValidationCallback = (_, _, _, _) => true;
+#endif| string excelPath = Directory.GetFiles(extractTo).First(fName => fName.EndsWith("houses.csv")); | ||
|
|
||
| var rows = new List<string>(await File.ReadAllLinesAsync(excelPath, Encoding.GetEncoding(1251))); | ||
|
|
||
| // Grouping all rows from initial csv in order to get rid of duplicated streets | ||
|
|
||
| var forParsingRows = GetDistinctRows(rows); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
First(…EndsWith("houses.csv")) can throw when the file is missing
If the ZIP’s content changes or extraction fails, First will raise InvalidOperationException.
Safer:
-var excelPath = Directory.GetFiles(extractTo)
- .First(fName => fName.EndsWith("houses.csv"));
+var excelPath = Directory.GetFiles(extractTo)
+ .FirstOrDefault(f => f.EndsWith("houses.csv"));
+if (excelPath is null)
+{
+ Console.WriteLine("houses.csv not found after extraction");
+ return;
+}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| string excelPath = Directory.GetFiles(extractTo).First(fName => fName.EndsWith("houses.csv")); | |
| var rows = new List<string>(await File.ReadAllLinesAsync(excelPath, Encoding.GetEncoding(1251))); | |
| // Grouping all rows from initial csv in order to get rid of duplicated streets | |
| var forParsingRows = GetDistinctRows(rows); | |
| var excelPath = Directory.GetFiles(extractTo) | |
| .FirstOrDefault(f => f.EndsWith("houses.csv")); | |
| if (excelPath is null) | |
| { | |
| Console.WriteLine("houses.csv not found after extraction"); | |
| return; | |
| } | |
| var rows = new List<string>(await File.ReadAllLinesAsync(excelPath, Encoding.GetEncoding(1251))); | |
| // Grouping all rows from initial csv in order to get rid of duplicated streets | |
| var forParsingRows = GetDistinctRows(rows); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (2)
Streetcode/Streetcode.WebApi/Utils/WebParsingUtils.cs (2)
81-83: Hard-coded extraction path still limits portability
The absolute path/root/build/StreetCode/Streetcode/Streetcode.DALwill break on most local or Windows environments and was flagged in a previous review. Please make the target directory configurable (e.g.,IConfiguration, env-var, or fallback toPath.GetTempPath()).
125-128: TLS validation is disabled – serious security risk
ServerCertificateCustomValidationCallback = (_, _, _, _) => true;accepts any certificate and leaves the download vulnerable to MITM attacks. Either remove this line or wrap it in#if DEBUGso it never reaches production.
🧹 Nitpick comments (2)
Streetcode/Streetcode.WebApi/Utils/WebParsingUtils.cs (2)
78-80: Remove dead code to keepParseZipFileFromWebAsynctidy
Directory.GetParent(Environment.CurrentDirectory)?.FullNameis evaluated and immediately discarded.
Unless you intend to use the parent directory later, delete the statement to avoid confusing future readers.- _ = Directory.GetParent(Environment.CurrentDirectory)?.FullName!;
295-319: Empty latitude/longitude strings will throwFormatException
decimal.Parseis executed even when the CSV cell is empty (see rows originating from legacy data), caught and ignored. This silently drops those toponyms.Either:
- Skip rows without coordinates earlier, or
- Use
decimal.TryParseand decide how to handle missing values (e.g., keepCoordinate = null).Fail-fast or log with context so data loss is visible.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
Streetcode/Streetcode.WebApi/Utils/WebParsingUtils.cs(1 hunks)
🔇 Additional comments (1)
Streetcode/Streetcode.WebApi/Utils/WebParsingUtils.cs (1)
30-58: Verify street-type mapping correctnessPrefix
"жилий масив "is mapped to street type"парк", which looks semantically incorrect. Please double-check with domain experts; an erroneous mapping will skew search queries and downstream analytics.
| foreach (var row in remainsToParse) | ||
| { | ||
| var communityCol = row[CommunityColumn]; | ||
| string cityStringSearchOptimized = communityCol.Substring(communityCol.IndexOf(" ", StringComparison.Ordinal) + 1); | ||
|
|
||
| var (streetName, streetType) = OptimizeStreetname(row[AddressColumn]); | ||
| string addressRow = $"{cityStringSearchOptimized} {streetName} {streetType}"; | ||
|
|
||
| var (latitude, longitude) = await FetchCoordsByAddressAsync(addressRow); | ||
|
|
||
| if (latitude is null || longitude is null) | ||
| { | ||
| addressRow = cityStringSearchOptimized; | ||
| (latitude, longitude) = await FetchCoordsByAddressAsync(addressRow); | ||
| } | ||
|
|
||
| var newRow = string.Empty; | ||
| for (int i = 0; i <= AddressColumn; i++) | ||
| { | ||
| newRow += $"{row[i]};"; | ||
| } | ||
|
|
||
| newRow += $"{latitude};{longitude}"; | ||
|
|
||
| await File.AppendAllTextAsync(csvPath, newRow + "\n", Encoding.GetEncoding(1251)); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Coordinate fetch loop creates a new HttpClient on every iteration
FetchCoordsByAddressAsync instantiates HttpClient each call (see L345). Thousands of requests will exhaust sockets and slow the job.
Refactor to reuse a single static HttpClient, or inject one. Example:
+private static readonly HttpClient _nominatimClient = CreateNominatimClient();
+private static HttpClient CreateNominatimClient()
+{
+ var client = new HttpClient();
+ client.DefaultRequestHeaders.Add("User-Agent", "HistoryCode");
+ return client;
+}And in FetchCoordsByAddressAsync:
- using var client = new HttpClient();
- client.DefaultRequestHeaders.Add("User-Agent", "HistoryCode");
+ var client = _nominatimClient;Committable suggestion skipped: line range outside the PR's diff.
|




dev
JIRA
Summary of issue
The task aims to refactor the logic for fetching and updating toponyms data in WebPasingUtils.cs file. Currently, the system downloads a ZIP file from a predefined URL, extracts a CSV file (houses.csv), processes it, compares the results with a local data.csv, and then saves the toponyms into an MS SQL database. The new requirement is to eliminate local CSV file storage.
Summary of change
Deleted the local files, implemented the new logic of fetching and parsing data from a db to temporary files and then deleting them.
Summary by CodeRabbit
New Features
Bug Fixes
Chores