Overview
Comment: | Extended pass InitCsets and underlying code with more log output geared towards memory introspection, and added markers for special locations. Extended my notes with general observations from the first test runs over my example CVS repositories. |
---|---|
SHA1: | 27ed4f7dc3a0c032f255bc0d6a3734c0 |
User & Date: | aku 2008-02-16 06:46:41.000 |
Context
2008-02-17
02:06 | Reworked the basic structure of pass InitCSets to keep memory consumption down. Now incrementally creates, breaks, saves, and releases changesets, instead of piling them up before saving them all at the end. Memory tracking confirms that this changes the accumulating mountain into a near-constant usage, with the expected spikes from the breaking. ... (check-in: f46458d5 user: aku tags: trunk) | |
2008-02-16
06:46 | Extended pass InitCsets and underlying code with more log output geared towards memory introspection, and added markers for special locations. Extended my notes with general observations from the first test runs over my example CVS repositories. ... (check-in: 27ed4f7d user: aku tags: trunk) | |
06:45 | Integrated memory tracking into the option processor for activation and configuration, and into the log system for use. The latter means that each actual output to the log is an introspection point. ... (check-in: 7b71f647 user: aku tags: trunk) | |
Changes
Changes to cvs2fossil.txt.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | Known problems and areas to work on =================================== * Not yet able to handle the specification of multiple projects for one CVS repository. I.e. I can, for example, import all of tcllib, or a single subproject of tcllib, like tklib, but not multiple sub-projects in one go. * We have to look into the pass 'InitCsets' and hunt for the cause of the large amount of memory it is gobbling up. * Look at the dependencies on external packages and consider which of them can be moved into the importer, either as a simple utility command, or wholesale. struct::list assign, map, reverse, filter | > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 | Known problems and areas to work on =================================== * Not yet able to handle the specification of multiple projects for one CVS repository. I.e. I can, for example, import all of tcllib, or a single subproject of tcllib, like tklib, but not multiple sub-projects in one go. * We have to look into the pass 'InitCsets' and hunt for the cause of the large amount of memory it is gobbling up. Results from the first look using the new memory tracking subsystem: (1) The general architecture, workflow, is a bit wasteful. All changesets are generated and kept in memory before getting persisted. This means that allocated memory piles up over time, with later changesets pushing the boundaries. This is made worse in that some of the preliminary changesets seem to require a lot of temporary memory as part of getting broken down into the actual ones. InitializeBreakState seems to be the culprit here. Its memory usage is possibly quadratic in the number of items in the changeset. (2) A number of small inefficiencies.
Like 'state eval' always pulling the whole result into memory before processing it with 'foreach'. Here potentially large lists. (3) We maintain an in-memory map from tagged items to their changesets. While this is needed later in the sorting passes, during the creation this is wasted space. And also wasted time, to maintain it during the creation and breaking. Changes: (a) Re-architect to create, break, and persist changesets one by one, completely releasing all associated in-memory data before going to the next. Should be low-hanging fruit with high impact, as we have all the necessary operations already, just not in that order, and that alone should already keep the pile from forming, making the spikes of (2) more manageable. (b) Look into the smaller problems described in (2), and especially (3). These should still be low-hanging fruit, although of lesser effect than (a). For (3) disable the map and its maintenance during construction, and put it into a separate command, to be used when loading the created changesets at the end. (c) With larger effect, but more difficult to achieve, go into command 'InitializeBreakState' and the preceding 'internalsuccessors', and re-architect it. Definitely not a low-hanging fruit. Possibly also something we can skip if doing (a) had a large enough effect. * Look at the dependencies on external packages and consider which of them can be moved into the importer, either as a simple utility command, or wholesale. struct::list assign, map, reverse, filter |
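The difference between the current batch order and the order proposed in change (a) of the notes can be illustrated with a small, self-contained sketch. Everything in it is an assumption made for illustration: the `::sketch` namespace and the `create`, `breakup`, `persist` and `release` procedures are hypothetical stand-ins, not commands of the importer, and "memory" is modelled simply as the number of changesets alive at once.

```tcl
# Hypothetical model, not importer code: count how many changesets are
# held in memory at the same time under two different orderings.
namespace eval ::sketch {
    variable live  0   ;# changesets currently held in memory
    variable peak  0   ;# highest value 'live' has reached
    variable saved {}  ;# ids of persisted changesets, in order
}

proc ::sketch::reset {} {
    variable live 0
    variable peak 0
    variable saved {}
}

proc ::sketch::create {id} {
    variable live
    variable peak
    incr live
    if {$live > $peak} { set peak $live }
    return $id
}

# Breaking is modelled as returning the changeset as its only fragment.
proc ::sketch::breakup {cset} { return [list $cset] }

proc ::sketch::persist {cset} {
    variable saved
    lappend saved $cset
}

proc ::sketch::release {cset} {
    variable live
    incr live -1
}

# Batch order: create everything, break everything, persist everything.
proc ::sketch::batch {ids} {
    set all {}
    foreach id $ids { lappend all [create $id] }
    set fragments {}
    foreach cset $all {
        foreach f [breakup $cset] { lappend fragments $f }
    }
    foreach f $fragments { persist $f ; release $f }
}

# Incremental order: finish one changeset before creating the next.
proc ::sketch::incremental {ids} {
    foreach id $ids {
        set cset [create $id]
        foreach f [breakup $cset] { persist $f ; release $f }
    }
}

::sketch::reset ; ::sketch::batch {1 2 3 4 5}
puts "batch peak:       $::sketch::peak"   ;# prints 5
::sketch::reset ; ::sketch::incremental {1 2 3 4 5}
puts "incremental peak: $::sketch::peak"   ;# prints 1
```

Both orderings persist the same changesets in the same order; only the number held at once differs. The sketch says nothing about the temporary memory needed while breaking a single changeset, which the notes treat separately under (c).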
︙ | ︙ | |||
35 36 37 38 39 40 41 | struct::graph In toto snit In toto sqlite3 | | | 81 82 83 84 85 86 87 88 | struct::graph In toto snit In toto sqlite3 In toto |
Changes to tools/cvs2fossil/lib/c2f_pinitcsets.tcl.
︙ | ︙ | |||
17 18 19 20 21 22 23 24 25 26 27 28 29 30 | # # ## ### ##### ######## ############# ##################### ## Requirements package require Tcl 8.4 ; # Required runtime. package require snit ; # OO system. package require vc::tools::misc ; # Text formatting. package require vc::tools::log ; # User feedback. package require vc::fossil::import::cvs::repository ; # Repository management. package require vc::fossil::import::cvs::state ; # State storage. package require vc::fossil::import::cvs::integrity ; # State integrity checks. package require vc::fossil::import::cvs::project::rev ; # Project level changesets # # ## ### ##### ######## ############# ##################### ## Register the pass with the management | > | 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 | # # ## ### ##### ######## ############# ##################### ## Requirements package require Tcl 8.4 ; # Required runtime. package require snit ; # OO system. package require vc::tools::misc ; # Text formatting. package require vc::tools::log ; # User feedback. package require vc::tools::mem ; # Memory tracking. package require vc::fossil::import::cvs::repository ; # Repository management. package require vc::fossil::import::cvs::state ; # State storage. package require vc::fossil::import::cvs::integrity ; # State integrity checks. package require vc::fossil::import::cvs::project::rev ; # Project level changesets # # ## ### ##### ######## ############# ##################### ## Register the pass with the management |
︙ | ︙ | |||
177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 | # Note: We could have written this loop to create the csets # early, extending them with all their revisions. This # however would mean lots of (slow) method invocations # on the csets. Doing it like this, late creation, means # fewer such calls. None, but the creation itself. foreach {mid rid pid} [state run { SELECT M.mid, R.rid, M.pid FROM revision R, meta M -- R ==> M, using PK index of M. WHERE R.mid = M.mid ORDER BY M.mid, R.date }] { if {$lastmeta != $mid} { if {[llength $revisions]} { incr n set p [repository projectof $lastproject] project::rev %AUTO% $p rev $lastmeta $revisions set revisions {} } set lastmeta $mid set lastproject $pid } lappend revisions $rid } if {[llength $revisions]} { incr n set p [repository projectof $lastproject] project::rev %AUTO% $p rev $lastmeta $revisions } log write 4 initcsets "Created [nsp $n {revision changeset}]" return } proc CreateSymbolChangesets {} { log write 3 initcsets {Create changesets based on symbols} # Tags and branches induce changesets as well, containing the # revisions they are attached to (tags), or spawned from # (branches). set n 0 | > > > > > > > > > > > > > > > > | 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 | # Note: We could have written this loop to create the csets # early, extending them with all their revisions. This # however would mean lots of (slow) method invocations # on the csets. Doing it like this, late creation, means # fewer such calls. None, but the creation itself.
log write 14 initcsets meta_begin mem::mark foreach {mid rid pid} [state run { SELECT M.mid, R.rid, M.pid FROM revision R, meta M -- R ==> M, using PK index of M. WHERE R.mid = M.mid ORDER BY M.mid, R.date }] { log write 14 initcsets meta_next if {$lastmeta != $mid} { if {[llength $revisions]} { incr n set p [repository projectof $lastproject] log write 14 initcsets meta_cset_begin mem::mark project::rev %AUTO% $p rev $lastmeta $revisions log write 14 initcsets meta_cset_done mem::mark set revisions {} } set lastmeta $mid set lastproject $pid } lappend revisions $rid } if {[llength $revisions]} { incr n set p [repository projectof $lastproject] log write 14 initcsets meta_cset_begin mem::mark project::rev %AUTO% $p rev $lastmeta $revisions log write 14 initcsets meta_cset_done mem::mark } log write 14 initcsets meta_done mem::mark log write 4 initcsets "Created [nsp $n {revision changeset}]" return } proc CreateSymbolChangesets {} { log write 3 initcsets {Create changesets based on symbols} mem::mark # Tags and branches induce changesets as well, containing the # revisions they are attached to (tags), or spawned from # (branches). set n 0 |
︙ | ︙ | |||
277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 | if {[llength $branches]} { incr n set p [repository projectof $lastproject] project::rev %AUTO% $p sym::branch $lastsymbol $branches } log write 4 initcsets "Created [nsp $n {symbol changeset}]" return } proc BreakInternalDependencies {} { # This code operates on the revision changesets created by # 'CreateRevisionChangesets'. As such it has to follow after # it, before the symbol changesets are made. The changesets # are inspected for internal conflicts and any such are broken # by splitting the problematic changeset into multiple # fragments. The results are changesets which have no internal # dependencies, only external ones. log write 3 initcsets {Break internal dependencies} set old [llength [project::rev all]] foreach cset [project::rev all] { $cset breakinternaldependencies } set n [expr {[llength [project::rev all]] - $old}] log write 4 initcsets "Created [nsp $n {additional revision changeset}]" log write 4 initcsets Ok. return } proc PersistTheChangesets {} { log write 3 initcsets "Saving [nsp [llength [project::rev all]] {initial changeset}] to the persistent state" foreach cset [project::rev all] { | > > > | 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 | if {[llength $branches]} { incr n set p [repository projectof $lastproject] project::rev %AUTO% $p sym::branch $lastsymbol $branches } log write 4 initcsets "Created [nsp $n {symbol changeset}]" mem::mark return } proc BreakInternalDependencies {} { # This code operates on the revision changesets created by # 'CreateRevisionChangesets'. As such it has to follow after # it, before the symbol changesets are made. 
The changesets # are inspected for internal conflicts and any such are broken # by splitting the problematic changeset into multiple # fragments. The results are changesets which have no internal # dependencies, only external ones. log write 3 initcsets {Break internal dependencies} mem::mark set old [llength [project::rev all]] foreach cset [project::rev all] { $cset breakinternaldependencies } set n [expr {[llength [project::rev all]] - $old}] log write 4 initcsets "Created [nsp $n {additional revision changeset}]" log write 4 initcsets Ok. mem::mark return } proc PersistTheChangesets {} { log write 3 initcsets "Saving [nsp [llength [project::rev all]] {initial changeset}] to the persistent state" foreach cset [project::rev all] { |
︙ | ︙ | |||
332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 | namespace eval initcsets { namespace import ::vc::fossil::import::cvs::repository namespace import ::vc::fossil::import::cvs::state namespace import ::vc::fossil::import::cvs::integrity namespace eval project { namespace import ::vc::fossil::import::cvs::project::rev } namespace import ::vc::tools::misc::* namespace import ::vc::tools::log log register initcsets } } # # ## ### ##### ######## ############# ##################### ## Ready package provide vc::fossil::import::cvs::pass::initcsets 1.0 return | > > > | 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 | namespace eval initcsets { namespace import ::vc::fossil::import::cvs::repository namespace import ::vc::fossil::import::cvs::state namespace import ::vc::fossil::import::cvs::integrity namespace eval project { namespace import ::vc::fossil::import::cvs::project::rev } namespace eval mem { namespace import ::vc::tools::mem::mark } namespace import ::vc::tools::misc::* namespace import ::vc::tools::log log register initcsets } } # # ## ### ##### ######## ############# ##################### ## Ready package provide vc::fossil::import::cvs::pass::initcsets 1.0 return |
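The hunk above imports `mark` into a child namespace named `mem`, which is what lets the pass call it as `mem::mark`. A minimal sketch of that Tcl mechanism follows. The `::demo` namespaces and the counter inside `mark` are hypothetical stand-ins; the real `vc::tools::mem::mark` is not shown in this check-in and is not modelled here.

```tcl
# Hypothetical provider namespace standing in for vc::tools::mem.
namespace eval ::demo::tools::mem {
    namespace export mark
    variable marks 0

    # Stand-in behaviour: count how often a mark was requested.
    proc mark {} {
        variable marks
        return [incr marks]
    }
}

namespace eval ::demo::pass {
    # Import 'mark' into a child namespace called 'mem'. Code running in
    # ::demo::pass can then use the short, qualified name 'mem::mark'.
    namespace eval mem {
        namespace import ::demo::tools::mem::mark
    }

    proc run {} {
        mem::mark
        mem::mark
    }
}

puts [::demo::pass::run]   ;# prints 2
```

The relative name `mem::mark` is resolved against the current namespace first, so it finds `::demo::pass::mem::mark`, and the imported command still runs in the namespace that defined it. Code outside the pass, such as the `breakinternaldependencies` method in `c2f_prev.tcl`, has no such child namespace and spells out the full `vc::tools::mem::mark` instead.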
Changes to tools/cvs2fossil/lib/c2f_prev.tcl.
︙ | ︙ | |||
131 132 133 134 135 136 137 | # item -> list (item) method nextmap {} { $mytypeobj successors tmp $myitems return [array get tmp] } method breakinternaldependencies {} { | | > | 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 | # item -> list (item) method nextmap {} { $mytypeobj successors tmp $myitems return [array get tmp] } method breakinternaldependencies {} { log write 14 csets {[$self str] BID} vc::tools::mem::mark ## ## NOTE: This method, maybe in conjunction with its caller ## seems to be a memory hog, especially for large ## changesets, with 'large' meaning to have a 'long list ## of items, several thousand'. Investigate where the ## memory is spent and then look for ways of rectifying ## the problem. |
︙ | ︙ | |||
164 165 166 167 168 169 170 | # b -> a). # Array of dependencies (parent -> child). This is pulled from # the state, and limited to successors within the changeset. array set dependencies {} $mytypeobj internalsuccessors dependencies $myitems | | > > > | 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 | # b -> a). # Array of dependencies (parent -> child). This is pulled from # the state, and limited to successors within the changeset. array set dependencies {} $mytypeobj internalsuccessors dependencies $myitems if {![array size dependencies]} { return 0 } ; # Nothing to break. log write 5 csets ...[$self str]....................................................... vc::tools::mem::mark # We have internal dependencies to break. We now iterate over # all positions in the list (which is chronological, at least # as far as the timestamps are correct and unique) and # determine the best position for the break, by trying to # break as many dependencies as possible in one go. When a # break was found this is redone for the fragments coming and |
︙ | ︙ | |||
1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 | }]] } # var(dv) = dict (revision -> list (revision)) typemethod internalsuccessors {dv revisions} { upvar 1 $dv dependencies set theset ('[join $revisions {','}]') # See 'successors' below for the main explanation of # the various cases. This piece is special in that it # restricts the successors we look for to the same set of # revisions we start from. Sensible as we are looking for # changeset internal dependencies. | > > | 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 | }]] } # var(dv) = dict (revision -> list (revision)) typemethod internalsuccessors {dv revisions} { upvar 1 $dv dependencies set theset ('[join $revisions {','}]') log write 14 cset internalsuccessors # See 'successors' below for the main explanation of # the various cases. This piece is special in that it # restricts the successors we look for to the same set of # revisions we start from. Sensible as we are looking for # changeset internal dependencies. |
︙ | ︙ | |||
1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 | # will greatly reduce the risk of getting far separated # revisions of the same file into one changeset. # We allow revisions to be far apart in time in the same # changeset, but in turn need the pseudo-dependencies to # handle this. array set fids {} foreach {rid fid} [state run [subst -nocommands -nobackslashes { SELECT R.rid, R.fid FROM revision R WHERE R.rid IN $theset }]] { lappend fids($fid) $rid } foreach {fid rids} [array get fids] { if {[llength $rids] < 2} continue foreach a $rids { foreach b $rids { if {$a == $b} continue if {[info exists dep($a,$b)]} continue if {[info exists dep($b,$a)]} continue lappend dependencies($a) $b set dep($a,$b) . set dep($b,$a) . } } } return } # result = 4-list (itemtype itemid nextitemtype nextitemid ...) typemethod loops {revisions} { # Note: Tags and branches cannot cause the loop. Their id's, # being of a fundamentally different type than the revisions | > > > > | 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 | # will greatly reduce the risk of getting far separated # revisions of the same file into one changeset. # We allow revisions to be far apart in time in the same # changeset, but in turn need the pseudo-dependencies to # handle this. log write 14 cset pseudo-internalsuccessors array set fids {} foreach {rid fid} [state run [subst -nocommands -nobackslashes { SELECT R.rid, R.fid FROM revision R WHERE R.rid IN $theset }]] { lappend fids($fid) $rid } foreach {fid rids} [array get fids] { if {[llength $rids] < 2} continue foreach a $rids { foreach b $rids { if {$a == $b} continue if {[info exists dep($a,$b)]} continue if {[info exists dep($b,$a)]} continue lappend dependencies($a) $b set dep($a,$b) .
set dep($b,$a) . } } } log write 14 cset complete return } # result = 4-list (itemtype itemid nextitemtype nextitemid ...) typemethod loops {revisions} { # Note: Tags and branches cannot cause the loop. Their id's, # being of a fundamentally different type than the revisions |
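To see how quickly the pseudo-dependency loop in `internalsuccessors` grows, the pairing logic can be lifted into a standalone procedure and run on synthetic input. This is only a sketch: `pseudoDependencies` is a hypothetical helper, the revision ids are made-up integers, and it covers just the case where all given revisions belong to the same file. It does not measure where the importer actually spends its memory.

```tcl
# Hypothetical helper: apply the pairing loop from 'internalsuccessors'
# to the revision ids of a single file and count the generated pairs.
proc pseudoDependencies {rids} {
    array set dependencies {}
    array set dep {}
    foreach a $rids {
        foreach b $rids {
            if {$a == $b} continue
            if {[info exists dep($a,$b)]} continue
            if {[info exists dep($b,$a)]} continue
            lappend dependencies($a) $b
            set dep($a,$b) .
            set dep($b,$a) .
        }
    }
    # Sum the lengths of all child lists (parent -> children).
    set n 0
    foreach {parent children} [array get dependencies] {
        incr n [llength $children]
    }
    return $n
}

foreach size {2 10 100} {
    set rids {}
    for {set i 1} {$i <= $size} {incr i} { lappend rids $i }
    puts "$size revisions -> [pseudoDependencies $rids] pseudo-dependencies"
}
# prints:
#   2 revisions -> 1 pseudo-dependencies
#   10 revisions -> 45 pseudo-dependencies
#   100 revisions -> 4950 pseudo-dependencies
```

For n revisions of one file the loop records n(n-1)/2 dependencies and twice as many entries in the `dep` bookkeeping array, which is consistent with the "possibly quadratic" suspicion in the notes, at least for changesets holding many revisions of the same file.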
︙ | ︙ | |||
1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 | namespace import ::vc::tools::log log register csets # Set up the helper singletons namespace eval rev { namespace import ::vc::fossil::import::cvs::state namespace import ::vc::fossil::import::cvs::integrity } namespace eval sym::tag { namespace import ::vc::fossil::import::cvs::state namespace import ::vc::fossil::import::cvs::integrity } namespace eval sym::branch { namespace import ::vc::fossil::import::cvs::state namespace import ::vc::fossil::import::cvs::integrity } } } # # ## ### ##### ######## ############# ##################### ## Ready package provide vc::fossil::import::cvs::project::rev 1.0 return | > > > | 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 | namespace import ::vc::tools::log log register csets # Set up the helper singletons namespace eval rev { namespace import ::vc::fossil::import::cvs::state namespace import ::vc::fossil::import::cvs::integrity namespace import ::vc::tools::log } namespace eval sym::tag { namespace import ::vc::fossil::import::cvs::state namespace import ::vc::fossil::import::cvs::integrity namespace import ::vc::tools::log } namespace eval sym::branch { namespace import ::vc::fossil::import::cvs::state namespace import ::vc::fossil::import::cvs::integrity namespace import ::vc::tools::log } } } # # ## ### ##### ######## ############# ##################### ## Ready package provide vc::fossil::import::cvs::project::rev 1.0 return |