Skip to content

Commit 7a73447

Browse files
committed
refactor: WIP. Module name to filepath optimisation
This is related to #4598. This changes the file to module associating logic done during dependency graph building. Before, each time a module `Foo.Bar` is found, HLS is testing inside all the import path for the existence of a relevant fiel.. It means that for `i` import paths and `m` modules to locate, `m * n` filesystem operations are done. Note also that this involves a lot of complex string concatenation primitive to build the `FilePath`. A module is tested for each `import` for each of the file of the project. We also test for `boot` files, doubling the number of test. In #4598 we have a project with `1100` modules, in more than 250 import paths and we count more than `17000` `import` statments, resulting on over 6 millions test for file existences. This project was blocking for more than 3 minutes during HLS startup. This commit changes the way this is computed: - At startup, a `Map ModuleName FilePath` (the real type is a bit more involved for performance, multiples unit and boot files handling) is built by scanning all the import paths for files representing the different modules. - Directory scanning is efficient and if import path only contains haskell module, this will never do more job that listing the files of the project. - The lookup is now simplify a `Map` lookup. The performance improvement is as follows: - The number of IO operation is dramatically reduced, from multiples millions to a few recursive directories listing. - A lot of the boilerplate of converting path had be removed. - TODO: add an RTS stats before / after with number of allocations - On my project, the graph building time is reduced from a few minutes to 3s. Limitations: - How to rebuild the `Map` if the content of one directory change? - If one directory is filled with millions of files which are not of interested, performance can be damaged. TODO: add a diagnostic during this phase so the user can learn about this issue. Code status: - The `lookup` is not fully restored, especially it does not include the handling of home unit as well as reexport. - The initialisation phase is cached inside a `TVar` stored as a top level identifier using `unsafePerformIO`. This is to be improved. A note about performance Most users won't see the benefits of these change, but I think they apply to everbody: - We are still doing 1 lookup per `import` per module. But the lookup result is not multiples IO, so it should be faster by a large amount. - Most project only have 1 (or a few) import paths so won't benefit as dramatically as me from this. TODO for allocations
1 parent 21e35e0 commit 7a73447

File tree

5 files changed

+97
-32
lines changed

5 files changed

+97
-32
lines changed

ghcide/ghcide.cabal

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -106,6 +106,7 @@ library
106106
, unliftio-core
107107
, unordered-containers >=0.2.10.0
108108
, vector
109+
, pretty-simple
109110

110111
if os(windows)
111112
build-depends: Win32

ghcide/src/Development/IDE/Core/RuleTypes.hs

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -412,6 +412,9 @@ type instance RuleResult GetModSummary = ModSummaryResult
412412
-- | Generate a ModSummary with the timestamps and preprocessed content elided, for more successful early cutoff
413413
type instance RuleResult GetModSummaryWithoutTimestamps = ModSummaryResult
414414

415+
type instance RuleResult GetModulesPaths = (M.Map ModuleName (UnitId, NormalizedFilePath),
416+
M.Map ModuleName (UnitId, NormalizedFilePath))
417+
415418
data GetParsedModule = GetParsedModule
416419
deriving (Eq, Show, Generic)
417420
instance Hashable GetParsedModule
@@ -524,6 +527,13 @@ data GetModSummaryWithoutTimestamps = GetModSummaryWithoutTimestamps
524527
instance Hashable GetModSummaryWithoutTimestamps
525528
instance NFData GetModSummaryWithoutTimestamps
526529

530+
-- | Scan all the import directory for existing modules and build a map from
531+
-- module name to paths
532+
data GetModulesPaths = GetModulesPaths
533+
deriving (Eq, Show, Generic)
534+
instance Hashable GetModulesPaths
535+
instance NFData GetModulesPaths
536+
527537
data GetModSummary = GetModSummary
528538
deriving (Eq, Show, Generic)
529539
instance Hashable GetModSummary

ghcide/src/Development/IDE/Core/Rules.hs

Lines changed: 58 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,7 @@
44
{-# LANGUAGE CPP #-}
55
{-# LANGUAGE DuplicateRecordFields #-}
66
{-# LANGUAGE TypeFamilies #-}
7+
{-# LANGUAGE PartialTypeSignatures #-}
78

89
-- | A Shake implementation of the compiler service, built
910
-- using the "Shaker" abstraction layer for in-memory use.
@@ -93,7 +94,7 @@ import Data.Proxy
9394
import qualified Data.Text as T
9495
import qualified Data.Text.Encoding as T
9596
import qualified Data.Text.Utf16.Rope.Mixed as Rope
96-
import Data.Time (UTCTime (..))
97+
import Data.Time (UTCTime (..), getCurrentTime, diffUTCTime)
9798
import Data.Time.Clock.POSIX (posixSecondsToUTCTime)
9899
import Data.Tuple.Extra
99100
import Data.Typeable (cast)
@@ -175,6 +176,12 @@ import System.Info.Extra (isWindows)
175176

176177
import qualified Data.IntMap as IM
177178
import GHC.Fingerprint
179+
import Text.Pretty.Simple
180+
import qualified Data.Map.Strict as Map
181+
import System.FilePath (takeExtension, takeFileName, normalise, dropTrailingPathSeparator, dropExtension, splitDirectories)
182+
import Data.Char (isUpper)
183+
import System.Directory.Extra (listFilesRecursive, listFilesInside)
184+
import System.IO.Unsafe
178185

179186
data Log
180187
= LogShake Shake.Log
@@ -319,30 +326,21 @@ getParsedModuleDefinition packageState opt file ms = do
319326
getLocatedImportsRule :: Recorder (WithPriority Log) -> Rules ()
320327
getLocatedImportsRule recorder =
321328
define (cmapWithPrio LogShake recorder) $ \GetLocatedImports file -> do
329+
322330
ModSummaryResult{msrModSummary = ms} <- use_ GetModSummaryWithoutTimestamps file
323-
(KnownTargets targets targetsMap) <- useNoFile_ GetKnownTargets
331+
-- TODO: should we reverse this concatenation, there are way less
332+
-- source import than normal import in theory, so it should be faster
324333
let imports = [(False, imp) | imp <- ms_textual_imps ms] ++ [(True, imp) | imp <- ms_srcimps ms]
325334
env_eq <- use_ GhcSession file
326335
let env = hscEnv env_eq
327336
let import_dirs = map (second homeUnitEnv_dflags) $ hugElts $ hsc_HUG env
328337
let dflags = hsc_dflags env
329338
opt <- getIdeOptions
330-
let getTargetFor modName nfp
331-
| Just (TargetFile nfp') <- HM.lookup (TargetFile nfp) targetsMap = do
332-
-- reuse the existing NormalizedFilePath in order to maximize sharing
333-
itExists <- getFileExists nfp'
334-
return $ if itExists then Just nfp' else Nothing
335-
| Just tt <- HM.lookup (TargetModule modName) targets = do
336-
-- reuse the existing NormalizedFilePath in order to maximize sharing
337-
let ttmap = HM.mapWithKey const (HashSet.toMap tt)
338-
nfp' = HM.lookupDefault nfp nfp ttmap
339-
itExists <- getFileExists nfp'
340-
return $ if itExists then Just nfp' else Nothing
341-
| otherwise = do
342-
itExists <- getFileExists nfp
343-
return $ if itExists then Just nfp else Nothing
339+
340+
moduleMaps <- use_ GetModulesPaths file
344341
(diags, imports') <- fmap unzip $ forM imports $ \(isSource, (mbPkgName, modName)) -> do
345-
diagOrImp <- locateModule (hscSetFlags dflags env) import_dirs (optExtensions opt) getTargetFor modName mbPkgName isSource
342+
343+
diagOrImp <- locateModule moduleMaps (hscSetFlags dflags env) import_dirs (optExtensions opt) modName mbPkgName isSource
346344
case diagOrImp of
347345
Left diags -> pure (diags, Just (modName, Nothing))
348346
Right (FileImport path) -> pure ([], Just (modName, Just path))
@@ -632,10 +630,51 @@ getModuleGraphRule recorder = defineEarlyCutOffNoFile (cmapWithPrio LogShake rec
632630
fs <- toKnownFiles <$> useNoFile_ GetKnownTargets
633631
dependencyInfoForFiles (HashSet.toList fs)
634632

633+
{-# NOINLINE cacheVar #-}
634+
cacheVar = unsafePerformIO (newTVarIO mempty)
635+
636+
getModulesPathsRule :: Recorder (WithPriority Log) -> Rules ()
637+
getModulesPathsRule recorder = defineEarlyCutoff (cmapWithPrio LogShake recorder) $ Rule $ \GetModulesPaths file -> do
638+
env_eq <- use_ GhcSession file
639+
640+
cache <- liftIO (readTVarIO cacheVar)
641+
case Map.lookup (envUnique env_eq) cache of
642+
Just res -> pure (mempty, ([], Just res))
643+
Nothing -> do
644+
let env = hscEnv env_eq
645+
let import_dirs = map (second homeUnitEnv_dflags) $ hugElts $ hsc_HUG env
646+
opt <- getIdeOptions
647+
let exts = (optExtensions opt)
648+
let acceptedExtensions = concatMap (\x -> ['.':x, '.':x <> "-boot"]) exts
649+
650+
(unzip -> (a, b)) <- flip mapM import_dirs $ \(u, dyn) -> do
651+
(unzip -> (a, b)) <- flip mapM (importPaths dyn) $ \dir' -> do
652+
let dir = dropTrailingPathSeparator dir'
653+
let predicate path = pure (path == dir || isUpper (head (takeFileName path)))
654+
let dir_number_directories = length (splitDirectories dir)
655+
let toModule file = mkModuleName (intercalate "." $ drop dir_number_directories (splitDirectories (dropExtension file)))
656+
657+
-- TODO: we are taking/droping extension, this could be factorized to save a few cpu cycles ;)
658+
-- TODO: do acceptedextensions needs to be a set ? or a vector?
659+
modules <- fmap (\path -> (toModule path, toNormalizedFilePath' path)) . filter (\y -> takeExtension y `elem` acceptedExtensions) <$> liftIO (listFilesInside predicate dir)
660+
let isSourceModule (_, path) = "-boot" `isSuffixOf` fromNormalizedFilePath path
661+
let (sourceModules, notSourceModules) = partition isSourceModule modules
662+
pure $ (Map.fromList notSourceModules, Map.fromList sourceModules)
663+
pure (fmap (u,) $ mconcat a, fmap (u, ) $ mconcat b)
664+
665+
let res = (mconcat a, mconcat b)
666+
liftIO $ atomically $ modifyTVar' cacheVar (Map.insert (envUnique env_eq) res)
667+
668+
pure (mempty, ([], Just $ (mconcat a, mconcat b)))
669+
635670
dependencyInfoForFiles :: [NormalizedFilePath] -> Action (BS.ByteString, DependencyInformation)
636671
dependencyInfoForFiles fs = do
672+
-- liftIO $ print ("fs length", length fs)
637673
(rawDepInfo, bm) <- rawDependencyInformation fs
674+
-- liftIO $ print ("ok with raw deps")
675+
-- liftIO $ pPrint rawDepInfo
638676
let (all_fs, _all_ids) = unzip $ HM.toList $ pathToIdMap $ rawPathIdMap rawDepInfo
677+
-- liftIO $ print ("all_fs length", length all_fs)
639678
msrs <- uses GetModSummaryWithoutTimestamps all_fs
640679
let mss = map (fmap msrModSummary) msrs
641680
let deps = map (\i -> IM.lookup (getFilePathId i) (rawImports rawDepInfo)) _all_ids
@@ -714,6 +753,7 @@ loadGhcSession recorder ghcSessionDepsConfig = do
714753
IdeGhcSession{loadSessionFun} <- useNoFile_ GhcSessionIO
715754
-- loading is always returning a absolute path now
716755
(val,deps) <- liftIO $ loadSessionFun $ fromNormalizedFilePath file
756+
-- TODO: this is responsible for a LOT of allocations
717757

718758
-- add the deps to the Shake graph
719759
let addDependency fp = do
@@ -1235,6 +1275,7 @@ mainRule recorder RulesConfig{..} = do
12351275
getModIfaceRule recorder
12361276
getModSummaryRule templateHaskellWarning recorder
12371277
getModuleGraphRule recorder
1278+
getModulesPathsRule recorder
12381279
getFileHashRule recorder
12391280
knownFilesRule recorder
12401281
getClientSettingsRule recorder

ghcide/src/Development/IDE/Import/FindImports.hs

Lines changed: 27 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,6 @@
55

66
module Development.IDE.Import.FindImports
77
( locateModule
8-
, locateModuleFile
98
, Import(..)
109
, ArtifactsLocation(..)
1110
, modSummaryToArtifactsLocation
@@ -14,9 +13,8 @@ module Development.IDE.Import.FindImports
1413
) where
1514

1615
import Control.DeepSeq
17-
import Control.Monad.Extra
1816
import Control.Monad.IO.Class
19-
import Data.List (find, isSuffixOf)
17+
import Data.List (isSuffixOf)
2018
import Data.Maybe
2119
import qualified Data.Set as S
2220
import Development.IDE.GHC.Compat as Compat
@@ -26,7 +24,8 @@ import Development.IDE.Types.Diagnostics
2624
import Development.IDE.Types.Location
2725
import GHC.Types.PkgQual
2826
import GHC.Unit.State
29-
import System.FilePath
27+
import Data.Map.Strict (Map)
28+
import qualified Data.Map.Strict as Map
3029

3130

3231
#if MIN_VERSION_ghc(9,11,0)
@@ -70,6 +69,7 @@ data LocateResult
7069
| LocateFoundReexport UnitId
7170
| LocateFoundFile UnitId NormalizedFilePath
7271

72+
{-
7373
-- | locate a module in the file system. Where we go from *daml to Haskell
7474
locateModuleFile :: MonadIO m
7575
=> [(UnitId, [FilePath], S.Set ModuleName)]
@@ -94,6 +94,7 @@ locateModuleFile import_dirss exts targetFor isSource modName = do
9494
maybeBoot ext
9595
| isSource = ext ++ "-boot"
9696
| otherwise = ext
97+
-}
9798

9899
-- | This function is used to map a package name to a set of import paths.
99100
-- It only returns Just for unit-ids which are possible to import into the
@@ -110,36 +111,45 @@ mkImportDirs _env (i, flags) = Just (i, (importPaths flags, reexportedModules fl
110111
-- Haskell
111112
locateModule
112113
:: MonadIO m
113-
=> HscEnv
114+
=> (Map ModuleName (UnitId, NormalizedFilePath),Map ModuleName (UnitId, NormalizedFilePath))
115+
-> HscEnv
114116
-> [(UnitId, DynFlags)] -- ^ Import directories
115117
-> [String] -- ^ File extensions
116-
-> (ModuleName -> NormalizedFilePath -> m (Maybe NormalizedFilePath)) -- ^ does file exist predicate
117118
-> Located ModuleName -- ^ Module name
118119
-> PkgQual -- ^ Package name
119120
-> Bool -- ^ Is boot module
120121
-> m (Either [FileDiagnostic] Import)
121-
locateModule env comp_info exts targetFor modName mbPkgName isSource = do
122+
locateModule moduleMaps@(moduleMap, moduleMapSource) env comp_info exts modName mbPkgName isSource = do
122123
case mbPkgName of
123124
-- 'ThisPkg' just means some home module, not the current unit
124125
ThisPkg uid
126+
-- TODO: there are MANY lookup on import_paths, which is a problem considering that it can be large.
125127
| Just (dirs, reexports) <- lookup uid import_paths
126-
-> lookupLocal uid dirs reexports
128+
-> lookupLocal moduleMaps uid dirs reexports
127129
| otherwise -> return $ Left $ notFoundErr env modName $ LookupNotFound []
128130
-- if a package name is given we only go look for a package
129131
OtherPkg uid
130132
| Just (dirs, reexports) <- lookup uid import_paths
131-
-> lookupLocal uid dirs reexports
133+
-> lookupLocal moduleMaps uid dirs reexports
132134
| otherwise -> lookupInPackageDB
133135
NoPkgQual -> do
134136

135137
-- Reexports for current unit have to be empty because they only apply to other units depending on the
136138
-- current unit. If we set the reexports to be the actual reexports then we risk looping forever trying
137139
-- to find the module from the perspective of the current unit.
138-
mbFile <- locateModuleFile ((homeUnitId_ dflags, importPaths dflags, S.empty) : other_imports) exts targetFor isSource $ unLoc modName
140+
---- locateModuleFile ((homeUnitId_ dflags, importPaths dflags, S.empty) : other_imports) exts targetFor isSource $ unLoc modName
141+
--
142+
-- TODO: handle the other imports, the unit id, ..., reexport.
143+
-- - TODO: should we look for file existence now? If the file was
144+
-- removed from the disk, how will it behaves? How do we invalidate
145+
-- that?
146+
let mbFile = case Map.lookup (unLoc modName) (if isSource then moduleMapSource else moduleMap) of
147+
Nothing -> LocateNotFound
148+
Just (uid, file) -> LocateFoundFile uid file
139149
case mbFile of
140150
LocateNotFound -> lookupInPackageDB
141151
-- Lookup again with the perspective of the unit reexporting the file
142-
LocateFoundReexport uid -> locateModule (hscSetActiveUnitId uid env) comp_info exts targetFor modName noPkgQual isSource
152+
LocateFoundReexport uid -> locateModule moduleMaps (hscSetActiveUnitId uid env) comp_info exts modName noPkgQual isSource
143153
LocateFoundFile uid file -> toModLocation uid file
144154
where
145155
dflags = hsc_dflags env
@@ -168,12 +178,15 @@ locateModule env comp_info exts targetFor modName mbPkgName isSource = do
168178
let genMod = mkModule (RealUnit $ Definite uid) (unLoc modName) -- TODO support backpack holes
169179
return $ Right $ FileImport $ ArtifactsLocation file (Just loc) (not isSource) (Just genMod)
170180

171-
lookupLocal uid dirs reexports = do
172-
mbFile <- locateModuleFile [(uid, dirs, reexports)] exts targetFor isSource $ unLoc modName
181+
lookupLocal moduleMaps@(moduleMapSource, moduleMap) uid dirs reexports = do
182+
-- mbFile <- locateModuleFile [(uid, dirs, reexports)] exts targetFor isSource $ unLoc modName
183+
let mbFile = case Map.lookup (unLoc modName) (if isSource then moduleMapSource else moduleMap) of
184+
Nothing -> LocateNotFound
185+
Just (uid, file) -> LocateFoundFile uid file
173186
case mbFile of
174187
LocateNotFound -> return $ Left $ notFoundErr env modName $ LookupNotFound []
175188
-- Lookup again with the perspective of the unit reexporting the file
176-
LocateFoundReexport uid' -> locateModule (hscSetActiveUnitId uid' env) comp_info exts targetFor modName noPkgQual isSource
189+
LocateFoundReexport uid' -> locateModule moduleMaps (hscSetActiveUnitId uid' env) comp_info exts modName noPkgQual isSource
177190
LocateFoundFile uid' file -> toModLocation uid' file
178191

179192
lookupInPackageDB = do

ghcide/src/Development/IDE/Types/HscEnvEq.hs

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
{-# LANGUAGE CPP #-}
22
module Development.IDE.Types.HscEnvEq
33
( HscEnvEq,
4-
hscEnv, newHscEnvEq,
4+
hscEnv, newHscEnvEq, envUnique,
55
updateHscEnvEq,
66
envPackageExports,
77
envVisibleModuleNames,

0 commit comments

Comments
 (0)