Merge ChangeSet@1.4: Documentation about flist scalability

Commit b3e6c815 authored Jan 11, 2002 by Martin Pool

Showing with 22 additions and 0 deletions

TODO +22 -0

--- a/TODO
+++ b/TODO
@@ -40,10 +40,31 @@ Performance
  start, which makes us use a lot of memory and also not pipeline
  network access as much as we could.

+  We need to be careful of duplicate names getting into the file list.
+  See clean_flist.  This could happen if multiple arguments include
+  the same file.  Bad.  
+
+  I think duplicates are only a problem if they're both flowing
+  through the pipeline at the same time.  For example we might have
+  updated the first occurrence after reading the checksums for the
+  second.  So possibly we just need to make sure that we don't have
+  both in the pipeline at the same time.  
+
+  Possibly if we did one directory at a time that would be sufficient.
+
+  Alternatively we could pre-process the arguments to make sure no
+  duplicates will ever be inserted.  
+
+  We could have a hash table.
+
 Memory accounting

  At exit, show how much memory was used for the file list, etc.

+  Also we do a weird exponential-growth allocation in flist.c.  I'm
+  not sure this makes sense with modern mallocs.  At any rate it will
+  make us allocate a huge amount of memory for large file lists.
+
 Hard-link handling

  At the moment hardlink handling is very expensive, so it's off by
@@ -238,3 +259,4 @@ rsyncsh
   current host, directory and so on.  We can probably even do
   completion of remote filenames.

+%K%
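
Note on the duplicate-name item in the first hunk: the "pre-process the arguments" and "hash table" ideas could be combined roughly as below. This is only a sketch under stated assumptions, not code from rsync: `name_set`, `hash_name`, `name_set_add` and `dedup_names` are hypothetical names, and names are compared as literal strings, so `./a` and `a` would still count as different entries unless paths are normalised first.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of rsync: a tiny open-addressing set of
 * borrowed string pointers, used to drop repeated names from a list
 * while keeping the first occurrence and the original order. */
struct name_set {
    const char **slots;   /* NULL marks an empty slot */
    size_t nslots;        /* always a power of two */
};

/* FNV-1a style string hash. */
static size_t hash_name(const char *s)
{
    size_t h = 2166136261u;

    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Returns 1 if name was newly added, 0 if it was already present. */
static int name_set_add(struct name_set *set, const char *name)
{
    size_t i = hash_name(name) & (set->nslots - 1);

    /* Linear probing; the table is at most half full, so this ends. */
    while (set->slots[i] != NULL) {
        if (strcmp(set->slots[i], name) == 0)
            return 0;
        i = (i + 1) & (set->nslots - 1);
    }
    set->slots[i] = name;
    return 1;
}

/* Compacts names[0..n) in place so each distinct string appears once.
 * Returns the new count, or (size_t)-1 if memory could not be allocated. */
size_t dedup_names(const char **names, size_t n)
{
    struct name_set set;
    size_t i, kept = 0;

    /* Size the table to at least twice the number of names. */
    set.nslots = 16;
    while (set.nslots < n * 2)
        set.nslots *= 2;
    set.slots = calloc(set.nslots, sizeof *set.slots);
    if (set.slots == NULL)
        return (size_t)-1;

    for (i = 0; i < n; i++)
        if (name_set_add(&set, names[i]))
            names[kept++] = names[i];

    free(set.slots);
    return kept;
}
```

Calling `dedup_names` on the argument list before any names enter the file list would mean no two copies of the same name can be in the pipeline at once. The table costs extra memory proportional to the number of names, which is worth weighing against the memory concerns noted elsewhere in the TODO.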