diff --git "a/database/tawos/deep/SERVER_deep-se.csv" "b/database/tawos/deep/SERVER_deep-se.csv" new file mode 100644--- /dev/null +++ "b/database/tawos/deep/SERVER_deep-se.csv" @@ -0,0 +1,520 @@ +"issuekey","created","title","description","storypoint" +"SERVER-16612","12/19/2014 20:33:42","Implicitly zeroed files in WiredTiger","There is a problem with the implicit zeroing of files by the kernel on certain platforms - see SERVER-15369 for more details. A workaround was put into place for this issue for mmapv1 files to explicitly zero .ns files. The purpose of this ticket is to determine: * in what areas will WiredTiger have similar vulnerabilities to this issue? * what will be the customer impact of this issue to WiredTiger? * beyond advising customers to avoid using WiredTiger on platforms with the issue, can we reasonably work around the problem by explicitly zeroing all files rather than relying on the kernel's implicit zeroing?",5 +"SERVER-17014","01/22/2015 23:57:46","foreground index build blocks database reads and writes","Some discussion is at https://groups.google.com/forum/#!topic/mongodb-dev/_1IrogzovEQ. When I create an index with background:false then many (all?) operations in the db are blocked even for engines like WiredTiger that don't require a per-db writer lock. The URL above shows thread stacks where background jobs (TTLMonitor, ClientCursorMonitor) get blocked on a per-db lock by a background:false index create. I assume bad things can happen when TTL enforcement doesn't run for too long. This creates other problems as ""show collections"", db.$foo.getIndexes() and queries from other collections in the same database will be blocked for the duration of the index create. While background:true is the workaround background index creation can take more time.",0 +"SERVER-18840","06/05/2015 17:04:28","resmoke should indicate status of test in abbreviated log output during run, before logging everything at the end","When resmoke is running a batch of tests and logging to buildlogger, after each test finishes it prints a line like: {noformat} [2015/05/20 13:12:36.365] [executor:js_test:job0] sync_passive.js ran in 138.85 seconds. {noformat} It would be helpful to indicate here if the test passed/failed, so if a suite is running in evergreen but hasn't finished yet, i can eyeball if the suite is going to fail by looking at the logs. ",1 +"SERVER-19895","08/12/2015 18:58:44","resmoke failures should self-document","When resmoke fails, it should print out steps to help the user debug the failure. E.g. when resmoke detects that it's run in Evergreen, it should print out the places that the user should look for symptoms. Original description: At shutdown time, a fixture (Replication or Sharding) checks the return status of all the process shutdown procedures. If any of them have returned 'false' (which means that the process returned a non-0 exit status), it fails the test suite associated with that fixture. It would be helpful if the fixture wrote a message to the log stating which process caused the suite failure. Currently, the only way to diagnose this is to scour the logs looking for the exit status of each process; since we are looking for a line that is *not* ""exited with code 0."", this is not a simple search to undertake.",2 +"SERVER-20056","08/20/2015 02:51:46","Log a startup warning if wiredTigerCacheSizeGB is > 80% of RAM","Currently you can set wiredTigerCacheSizeGB to be 100% of available memory, which will almost certainly lead to problems. 
Perhaps a startup check here to confirm that wiredTigerCacheSizeGB is < 80% of available Memory. ",3 +"SERVER-20960","10/16/2015 04:14:25","Default index build option support in config/runtime","Add support for index build preferences in the config file and/or mongod runtime parameters. Specifically to control preferences for background or foreground index build options. *EDIT*: in SERVER-24041 the following request was added: {quote} It would be great if I can block foreground index creation by configuration. {quote}",0 +"SERVER-21861","12/11/2015 15:35:39","Better Timestamp object comparison in mongo shell","It would be great if the <, >, <=, >= operators on Timestamp objects would work as expected. The bsonWoCompare can be used for the mean time even though it will end up comparing the member functions as well as long as 't' fields are compared first over the 'i' fields.",3 +"SERVER-24918","07/06/2016 07:12:26","perf.yml is not using Evergreen modules","The control file for the performance project in Evergreen 'perf.yml' is not using the Evergreen module support. We should move to use modules to make testing changes in mongo-perf and dsi easier. Module for DSI and for mongo-perf repos. ",2 +"SERVER-25548","08/10/2016 21:46:16","resmoke should not continue after a shell crash","Currently, resmoke stops the test if a mongo server has crashed in {{job.py}}. We should have it do the same for a shell crash. For suites that use the shell to spawn servers, a shell crash would cause the servers to not be terminated. Subsequent tests will then run on servers from previous tests, causing either test failures or hangs. Having these additional failures defeat the purpose of running with continueOnFailure.",2 +"SERVER-26319","09/26/2016 00:07:49","Deleting a test results in ""ValueError: Unrecognized jstest""","When a file is included or [excluded explicitly by name|https://github.com/mongodb/mongo/blob/r3.4.0-rc1/buildscripts/resmokelib/selector.py#L237] (i.e. not by glob pattern or tag), then an error is raised if the test isn't found. This is done to ensure that the blacklist for a suite is updated when a test is renamed or deleted. We should consider improving [the ""ValueError: Unrecognized jstest"" error message|https://github.com/mongodb/mongo/blob/r3.4.0-rc1/buildscripts/resmokelib/selector.py#L295] to be clearer about potentially needing to update an entry in the blacklist. h6. Original description In master, deleting the test loglong.js causes a failure: https://evergreen.mongodb.com/task/mongodb_mongo_master_enterprise_rhel_62_64_bit_sharded_collections_jscore_passthrough_c172cc49e735b6e48f58662e5588961216d3cff0_16_09_25_23_17_00 In this patch, I moved loglong.js to a new location and it causes the compile to fail: https://evergreen.mongodb.com/version/57e577943ff12239ba00ab4f",1 +"SERVER-26625","10/13/2016 22:41:59","Make collStats command consistent on background index reporting","The collStats command includes indexes that are undergoing background index build in the 'indexSizes' section. It does not include them in 'nindexes' or 'indexDetails'. We should make reporting consistent, and make clear which indexes are undergoing background build (if any).",2 +"SERVER-26867","11/01/2016 17:21:24","Timeout-related assert.soon failures should trigger the hang analyzer","When an test fails due to a timeout, it would be nice if that would trigger the hang analyzer before tearing down the processes. This would let us see what the server is doing that is preventing it from doing what we expect. 
-Timeout-related failures include:- * -assert.soon()- * -wtimeout expiring- * -$maxTimeMs expiring- Update: Per discussion below we will be limiting this to {{assert.soon}}, {{assert.soonNoExcept}}, {{assert.retry}}, and {{assert.assert.retryNoExcept}}.",2 +"SERVER-26953","11/08/2016 18:18:53","Track bytes read into cache per query","When slow queries are logged we include information about documents and index entries scanned to help diagnose performance issues. However that information doesn't tell us whether the documents or index entries scanned were in cache or not, and if they are not in cache they can have a much more significant performance impact. Reporting bytes read into cache for each query logged would help diagnose the cause of performance issues related to cache pressure. This might be accomplished by tracking bytes read into cache per cursor or per session in WT, and computing the difference between the value of this counter before and after each query. Query performance impact would need to be evaluated.",0 +"SERVER-26988","11/10/2016 21:58:01","Secondary delay causes large drop in insert rate on the primary due to cache full condition","With a heavy insert load and a secondary that is delayed the cache on the primary fills to 100% and operation rates drop. Here's a run showing behavior on the primary with the secondary delayed due to lag, but a similar effect is seen if the secondary is intentionally delayed using slaveDelay. !lagging.png|width=100%! * from D-E the cache is 95% full and insert rate drops considerably, possibly due to application threads doing evictions? * F-G and H-I seem to be seem to be related to checkpoints, possibly also in combination with the full cache? * the rate of pages walked for eviction is generally very high, about 6k times the rate of pages actually evicted, suggesting that the issue is difficulty finding pages to evict to keep the cache at target levels The high rate of pages walked for eviction suggests a connection to SERVER-22831, which also showed that symptom in connection with a full cache; however the above run was on 3.2.5-rc1 where SERVER-22831 was fixed, so it seems there is a different issue here. The above test involved * 3.2.5-rc1 * 2-node replica set * 25 GB cache * 100 GB oplog * 5 threads inserting 800 byte documents into 5 separate collections {code} for t in $(seq 5); do mongo --eval "" x = '' for (var i=0; i<800; i++) x += 'x' docs = [] for (var i=0; i<1000; i++) docs.push({x:x}) ops = [{ op: 'insert', ns: 'test.c' + '$t', doc: docs, }] res = benchRun({ ops: ops, seconds: 10000, parallel: 1 }) "" & done wait {code} ",5 +"SERVER-28940","04/24/2017 16:21:25","Make resmoke fixture setup/teardown their own testcases.","The fixture setup and teardown logs only go to logkeeper, but don't show up on the Evergreen sidebar. So the assertions can't be extracted without figuring out the logkeeper URL from the task log. Since everything else that logs to logkeeper (i.e. test and hooks) is a testcase that shows up on the sidebar, the fixture events should get their own spots as well.",5 +"SERVER-29999","07/06/2017 14:43:41","Implement FSM workload scheduler for concurrency_simultaneous task","We'll want to port [the {{scheduleWorkloads()}} function|https://github.com/mongodb/mongo/blob/r3.7.7/jstests/concurrency/fsm_libs/runner.js#L148-L206] from JavaScript to Python so that resmoke.py is in control over the groups of FSM workloads that are run together. 
The behavior around the number of subsets an individual FSM workload can be a part of should be identical to what it is for the {{concurrency_simultaneous.yml}} test suite today. Additionally, specifying {{\-\-suites=concurrency_simultaneous jstests/concurrency/fsm_workloads/workloadA.js jstests/concurrency/fsm_workloads/workloadB.js}} should run those two FSM workloads together rather than in sequence. That is to say, if a list of files is omitted then {{numSubsets}} groups of FSM workloads should be run and if a list of files is present then exactly 1 group of FSM workloads should be run. The latter aims to serve an engineer who wishes to reproduce a particular failure by running the same group of FSM workloads together.",5 +"SERVER-30204","07/18/2017 15:30:54","Create resmoke.py hook that drops all databases without restarting the cluster","-Add support for including background/perpetual workloads.- Background/perpetual workloads should be done as background threads in resmoke.py and don't need any special handling other than ensuring adequate documentation in our internal wiki. Also, add a new hook to drop all DBs and collections after every FSM test. When running in ""the same"" DB Or ""same collection"" FSM modes, pass the DB or collection that are not dropped to the new cleanup hook. CleanupOption of not dropping certain DBs will be taken into account as needed. The new hook will be used in place of CleanEveryN to avoid the overhead of spinning up a large cluster multiple times.",3 +"SERVER-31535","10/12/2017 22:01:50","Platform Support: remove Ubuntu 12.04 builds","Ubuntu 12.04 is EOL, so we should drop support for it in 3.6.",3 +"SERVER-31570","10/13/2017 21:37:13","Adjust mongobridge port allocations for easier debugging","Mongobridge assigns ports sequentially so you have to use modular arithmetic to figure out which bridges are associated with which mongods. If we assigned (bridge port) = (mongod port) + 10000, it would be very easy to map them to each other. ",3 +"SERVER-32223","12/08/2017 15:53:38","Add burn-in tests for configurations in other variants in addition to other suites","Many tests accidentally forget to mark ""requires_persistence"" or other things that are only tested on non-required builders. Similarly, mmap is now run on its own builder. Incorporating some of these ""variant"" flags into burn-in tests on the required builders could avoid easy test failures.",2 +"SERVER-33000","12/12/2017 20:40:31","Platform Support: add Ubuntu 18.04","This can go to the backlog for now, but Ubuntu 18.04 will be available soon, so we should be ready to add this platform as soon as there are images out there.",8 +"SERVER-32437","12/21/2017 17:18:27","Platform Support: add Amazon Linux 2","On December 13th, Amazon released Amazon Linux 2. The installation of our current package for AMZL fails: {noformat} [ec2-user@ip-172-31-32-108 ~]$ sudo yum install -y mongodb-enterprise Loaded plugins: langpacks, update-motd amzn2-core | 2.0 kB 00:00:00 https://repo.mongodb.com/yum/redhat/2017.12/mongodb-enterprise/3.6/x86_64/repodata/repomd.xml: [Errno 14] HTTPS Error 404 - Not Found Trying other mirror. 
Resolving Dependencies --> Running transaction check ---> Package mongodb-enterprise.x86_64 0:3.6.0-1.amzn1 will be installed --> Processing Dependency: mongodb-enterprise-tools = 3.6.0 for package: mongodb-enterprise-3.6.0-1.amzn1.x86_64 --> Processing Dependency: mongodb-enterprise-shell = 3.6.0 for package: mongodb-enterprise-3.6.0-1.amzn1.x86_64 --> Processing Dependency: mongodb-enterprise-server = 3.6.0 for package: mongodb-enterprise-3.6.0-1.amzn1.x86_64 --> Processing Dependency: mongodb-enterprise-mongos = 3.6.0 for package: mongodb-enterprise-3.6.0-1.amzn1.x86_64 --> Running transaction check ---> Package mongodb-enterprise-mongos.x86_64 0:3.6.0-1.amzn1 will be installed --> Processing Dependency: libsasl2.so.2()(64bit) for package: mongodb-enterprise-mongos-3.6.0-1.amzn1.x86_64 ---> Package mongodb-enterprise-server.x86_64 0:3.6.0-1.amzn1 will be installed --> Processing Dependency: libsasl2.so.2()(64bit) for package: mongodb-enterprise-server-3.6.0-1.amzn1.x86_64 --> Processing Dependency: libnetsnmpmibs.so.20()(64bit) for package: mongodb-enterprise-server-3.6.0-1.amzn1.x86_64 --> Processing Dependency: libnetsnmphelpers.so.20()(64bit) for package: mongodb-enterprise-server-3.6.0-1.amzn1.x86_64 --> Processing Dependency: libnetsnmpagent.so.20()(64bit) for package: mongodb-enterprise-server-3.6.0-1.amzn1.x86_64 --> Processing Dependency: libnetsnmp.so.20()(64bit) for package: mongodb-enterprise-server-3.6.0-1.amzn1.x86_64 ---> Package mongodb-enterprise-shell.x86_64 0:3.6.0-1.amzn1 will be installed --> Processing Dependency: libsasl2.so.2()(64bit) for package: mongodb-enterprise-shell-3.6.0-1.amzn1.x86_64 ---> Package mongodb-enterprise-tools.x86_64 0:3.6.0-1.amzn1 will be installed --> Processing Dependency: libsasl2.so.2()(64bit) for package: mongodb-enterprise-tools-3.6.0-1.amzn1.x86_64 --> Finished Dependency Resolution Error: Package: mongodb-enterprise-mongos-3.6.0-1.amzn1.x86_64 (mongodb-enterprise) Requires: libsasl2.so.2()(64bit) Error: Package: mongodb-enterprise-shell-3.6.0-1.amzn1.x86_64 (mongodb-enterprise) Requires: libsasl2.so.2()(64bit) Error: Package: mongodb-enterprise-server-3.6.0-1.amzn1.x86_64 (mongodb-enterprise) Requires: libnetsnmpmibs.so.20()(64bit) Error: Package: mongodb-enterprise-tools-3.6.0-1.amzn1.x86_64 (mongodb-enterprise) Requires: libsasl2.so.2()(64bit) Error: Package: mongodb-enterprise-server-3.6.0-1.amzn1.x86_64 (mongodb-enterprise) Requires: libnetsnmpagent.so.20()(64bit) Error: Package: mongodb-enterprise-server-3.6.0-1.amzn1.x86_64 (mongodb-enterprise) Requires: libnetsnmphelpers.so.20()(64bit) Error: Package: mongodb-enterprise-server-3.6.0-1.amzn1.x86_64 (mongodb-enterprise) Requires: libnetsnmp.so.20()(64bit) Error: Package: mongodb-enterprise-server-3.6.0-1.amzn1.x86_64 (mongodb-enterprise) Requires: libsasl2.so.2()(64bit) You could try using --skip-broken to work around the problem You could try running: rpm -Va --nofiles --nodigest {noformat} I guess the missing dependencies could be manually added but still, Amazon Linux 2 is not officially supported at the moment",8 +"SERVER-32443","12/21/2017 17:45:07","Create a sys-perf task for running linkbench automatically against a replica set","Once the MongoDB linkbench implementation is relatively stable and is utilizing the new transactions API, we should work on integrating an automated benchmark into sys-perf. 
It would probably be good to run this benchmark against both a 1-node and 3-node replica set, at least.",1 +"SERVER-32642","01/10/2018 20:52:53","Return raw command response in the validate JS hook","Modify {{CollectionValidator.validateCollections}} to return the raw command response to make the class more flexible.",2 +"SERVER-32825","01/22/2018 05:53:35","Add the infrastructure for upgrade/downgrade of V2Unique indexes","Addition of new index format allows use of both V2 and V2Unique format. Initially we would add a gating variable to select the unique index format. Switch to FCV as the deciding factor for which format should be used. * Setting FCV=4.2 will cause all unique indexes to be updated to V2Unique. Do not update the content of the indexes. * Having FCV=4.0 would create older V2 format indexes",3 +"SERVER-32883","01/24/2018 20:32:07","Enhanced FSM testing for reading from secondaries","1. Change the {{secondary_reads_passthrough.yml}} test suite which was added as part of SERVER-34384 to use the ""forceSyncSourceCandidate"" failpoint as a server parameter to force secondary #2 to sync from secondary #1. 2. Add a new version of the {{concurrency_replication.yml}} test suite that uses a 5-node replica set with each secondary syncing in succession of each other (i.e. a linear chain), writeConcern=\{w: 1\}, readConcern=\{level: ""local"", afterClusterTime: ...\}, and readPreference=\{mode: ""secondary""\}. We'll also likely want to make a wrapper around a {{Mongo}} connection object to the primary and to a specific secondary so that an individual worker thread talks to a particular secondary all the time rather than some secondaries potentially never being read from. {quote} I think there's some additional complexity here because we want FSM worker thread to do reads from different secondary. (We'll probably pin it to a particular secondary similar to how we ""round-robin"" when using multiple mongos processes.) It seems like we'll want to have a Mongo connection object implemented in JavaScript that for commands which are present in [this list|https://github.com/mongodb/mongo/blob/6841ce738419923002958acc760e150769b6f615/jstests/libs/override_methods/set_read_preference_secondary.js#L10-L23] are routed via a direct connection to the secondary and commands not present in that list are routed via a direct connection to the primary. I think the existing ""connection cache"" in the concurrency framework makes it relatively straightforward to have direct connections to other nodes in the cluster. {quote} In creating this wrapper around two separate {{Mongo}} connection objects, we may also want to change how SERVER-34383 was implemented to construct a wrapper around a secondary's connection from the connection cache instead of creating a replica set connection for the worker thread. h6. Original description As part of SERVER-32606 it turned out that our testing of tailing the oplog on secondaries, including the case of chained replication, is light, while the code paths for secondary reads have gotten quite different now from reads on primaries. We should have a passthrough test where we test these behaviors. This is related to SERVER-32606, but was too big a task to do as part of that ticket.",8 +"SERVER-32997","01/30/2018 06:27:53","Mobile SE: Design and implement multi-reader or single-writer concurrency","SERVER-32675 resolved Mobile SE's some of the major issues with concurrency. I still see a few tests hitting either write conflicts or DB locked. 
These tests need to be investigated and a fix made accordingly. This ticket will track that effort.",13 +"SERVER-32999","01/30/2018 14:06:01","Platform Support: remove Debian 7","[Debian 7 is going EOL soon|https://wiki.debian.org/LTS]. Opening a ticket to deprecate and ultimately remove this platform.",3 +"SERVER-33002","01/30/2018 14:21:41","Platform Support: add MacOS 10.13 (High Sierra)","1. Add image and distro to evergreen 2. Run test build 3. Communicate availability to Storage 4. Once functional, we want to put this in rotation. 5. Review performance and open subsequent investigation ticket if makespan is substantially different",8 +"SERVER-33146","02/06/2018 19:28:00","mongod.service does not source system-wide environment variables as stated in the documentation","Following the [Kerberos tutorial|https://docs.mongodb.com/manual/tutorial/control-access-to-mongodb-with-kerberos-authentication/#krb5-ktname] I am unable to configure the keytab environment variable for the rpm installed mongod. The [service file|https://github.com/mongodb/mongo/blob/master/rpm/mongod.service] does not source /etc/sysconfig/mongod As a workaround I modified the service file to include {code} Environment=""KRB5_TRACE=/path/to/krb5.log"" Environment=""KRB5_KTNAME=/path/to/keytab"" {code} ",3 +"SERVER-33149","02/06/2018 22:05:07","createIndexes fails to report an error when index is not created with the specified name","The createIndexes command fails to report an error when an index is not created with the specified name because an index already exists with the same keys but with a *different* name.",3 +"SERVER-33340","02/14/2018 21:21:18","Turn on shared cache for non-shipping (non-push) builders","We originally planned to test for two weeks - but we may be able to shorten this since we began testing with: SERVER-33278 linux-64-repeated-execution linux-64-duroff linux-64-lsm enterprise-rhel-62-64-bit-inmem linux-64-ephemeralForTest ubuntu1404-rockdb ubuntu1604-debug-asan ubuntu1604-asan enterprise-rhel-62-64-bit-coverage",8 +"SERVER-33342","02/14/2018 21:23:38","Turn on shared scons cache for shipping builders.","enterprise-linux-64-amazon-ami ubuntu1204 ubuntu1404 ubuntu1604 amazon rhel62 rhel70 enterprise-ubuntu1204-64 enterprise-ubuntu1404-64 enterprise-ubuntu1604-64 enterprise-suse12-64 suse12 enterprise-suse11-64 suse11 enterprise-debian71-64 enterprise-debian81-64 debian71 debian81",3 +"SERVER-33427","02/21/2018 20:35:36","improve detectability of test failing because ShardingTest/ReplSetTest not shut down","{quote} > Max Hirschhorn Kevin Albertson, I noticed that failing to shut down a ShardingTest/ReplSetTest doesn't cause the test to log a ""failed to load"" line or a javascript stack trace (which makes sense, since which line would you error on?). As an outcome of SERVER-25777, the mongo shell could already exit with a non-zero return code without printing a ""failed to load"" message. > The line that _is_ logged (""a call to MongoRunner.stopMongod(), ReplSetTest#stopSet(), or ShardingTest#stop() may be missing from the test"") also isn't/can't be logged at LogSeverity::Error, since it's not logged by a server process (and which makes the log line contain "" E "", which is another thing I typically look for when a test fails without ""failed to load""). > > It took some confusion and additional scrolling through the logs for me to realize why my new test was reporting failure when it seemed like the test ran to completion successfully. 
Just a thought, in case there's something that can be done to make this failure easier to detect. Esha Maharishi, I think your confusion is understandable. The goal of the message was to make it more obvious to the user what the remediation ought to be. Since that message isn't being surfaced clearly enough, we should change the logic in the mongo shell so that it is. I don't see a reason that the mongo shell must use {{cout}} for logging the ""exiting with a failure due to unterminated processes"" message, so we could replace it with a call to {{severe()}} instead (and prefix the log message with 'F'). Do you think that would be sufficient for your purposes? Would you mind filing a new SERVER ticket for this improvement request? > For example, even just moving the ""a call to MongoRunner.stopMongod(), ReplSetTest#stopSet(), or ShardingTest#stop() may be missing from the test"" just before/after the ""Summary: 1 test(s) ran in 35.86 seconds (0 succeeded, 0 were skipped, 1 failed, 0 errored)"" could help. Those messages are logged by two different processes (the mongo shell with the former and resmoke.py with the latter) so that isn't really something we'd consider. A related feature in resmoke.py would be to have special handling around certain exit codes from known processes. This case in the mongo shell would be one, but a memory leak detected by ASan/LSan would be another. {quote} See comment thread on SERVER-25640; one good idea from that thread is to make the mongo shell log an error message at a more severe log level.",2 +"SERVER-33470","02/23/2018 19:22:50","Log archival message, even if successful, in hook_test_archival.py","The archival message is only logged if there is an [error submitting files for archive|https://github.com/mongodb/mongo/blob/e3f361769cd13ba88aa24c1c0a71c76b187f64dd/buildscripts/resmokelib/testing/hook_test_archival.py#L115-L116]. We should have a logger.info message even on success, as there could be files that were skipped during the tar process.",1 +"SERVER-33641","03/02/2018 22:02:38","Call checkOplogs when checkReplicatedDataHashes fails","We should do the following to improve the relevance of diagnostics we have in the face of data inconsistency issues: # Update {{ReplSetTest#stopSet()}} to call {{ReplSetTest#checkOplogs()}} [in addition to {{ReplSetTest#checkReplicatedDataHashes()}}|https://github.com/mongodb/mongo/blob/r3.7.7/src/mongo/shell/replsettest.js#L2174-L2189]. Care should be taken to ensure that tests do not run significantly longer because they need to verify a large oplog when shutting down the replica set. # Update the {{PeriodicKillSecondaries}} hook to run the {{CheckReplOplogs}} hook [in addition to the {{CheckReplDBHash}} and {{ValidateCollections}} hooks|https://github.com/mongodb/mongo/blob/r3.7.7/buildscripts/resmokelib/testing/hooks/periodic_kill_secondaries.py#L137-L147]. h6. Original description We now save all of the data files, but it would be great if the test could check the oplogs automatically and note any differences.",3 +"SERVER-33651","03/05/2018 00:30:42","Mobile SE: Use full synchronous mode for SQLite writes","SQLite allows some startup configuration options. The defaults should work for most of our use cases, but might still need some fine tuning. This ticket is to study the available options, and come up with any non default that might better suit us. 
Also, at the conclusion of the ticket, update the design doc to specify these setting we come up with.",1 +"SERVER-33848","03/05/2018 18:40:43","Update compile flags for sys-perf and performance projects","MongoDB Community Server has had SSL support since 2.6 or 3.0, yet we weren't compiling with SSL support in sys-perf tests. Julian will fix that as part of his work. We should review our compile code to make sure it reflects what is actually shipped to users. (Of course, we may have debug symbols and other differences, if they are intentional.)",2 +"SERVER-33695","03/06/2018 16:49:02","Include the loop name in the before and after recovery files in powertest.py","Powercycle rsyncs the data files before and after a recovery runs (mongod started after power cycle event). We should name the resulting directory to also include the loop number, i.e., {{beforerecovery-1}}. We can still use rsync, and then rename the directory.",2 +"SERVER-33740","03/08/2018 04:47:50","Add Evergreen task for running powercycle against mobile storage engine","We should create a {{powercycle_mobile}} Evergreen task that performs powercycle testing while running against the mobile storage engine. It should be as straightforward as copy [the definition for the {{powercycle}} task|https://github.com/mongodb/mongo/blob/789f74a3837c0daf799be2b8296f339977c551b8/etc/evergreen.yml#L3862-L3879] and specifying {{\-\-storageEngine=mobile}} in the {{mongod_extra_options}} parameter to the ""run powercycle test"" function, although some care would need to be taken to disable the FSM clients if we do this ticket before resolving SERVER-32993. {code:yaml} - name: powercycle_mobile exec_timeout_secs: 7200 # 2 hour timeout for the task overall depends_on: - name: compile commands: - func: ""do setup"" - func: ""set up remote credentials"" vars: <<: *powercycle_remote_credentials - func: ""set up EC2 instance"" vars: <<: *powercycle_ec2_instance - command: expansions.update <<: *powercycle_expansions - func: ""run powercycle test"" vars: <<: *powercycle_test mongod_extra_options: --mongodOptions=\""--setParameter enableTestCommands=1 --storageEngine mobile\"" {code}",2 +"SERVER-33787","03/09/2018 19:23:01","Platform Support: remove Debian 7 builds","It will be EOLed in May.",2 +"SERVER-33817","03/12/2018 15:24:58","Powercycle test using kill mongod","Create a new powercycle task which has a {{crashOption}} to kill the monogd instead of crashing the remote host.",3 +"SERVER-33853","03/13/2018 17:40:24","Define a new test tag to temporarily disable a test","When engineers need to temporarily disable a JavaScript test, they have to update the YAML file for each suite the test runs under and explicitly blacklist it. We should define a new tag (e.g. ""temporarily_disabled"") that can be added to a test to quickly prevent it from running in all suites. The new tag exclusion could be specified in all the suites configuration files (and the tag would be no different than any other tag) or could be implemented in resmoke. ",1 +"SERVER-33936","03/15/2018 14:11:35","3.6 nightly builds not available for download","At https://www.mongodb.com/download-center#development there's a menu item for 3.6 nightly but the download link doesn't work. 
Also the [all binaries page|https://www.mongodb.org/dl/osx?_ga=2.203632720.1133726881.1521122429-282575863.1477067354&_gac=1.144900480.1520974309.EAIaIQobChMI_47T6Zbq2QIVRx6GCh2T0QmgEAAYASAAEgJkJPD_BwE] has links for 3.2-latest and 3.4-latest builds but no 3.6-latest.",3 +"SERVER-33926","03/16/2018 15:12:07","Unattended installation fails when deselecting Compass","h3. Summary * Unattended installation documentation does not explicitly mention that Compass will be installed when using ADDLOCAL=""all"". * Unattended installation does not appear to work unless Compass is selected. h3. Details The Install on Windows page of the mongodb manual provides a section describing [unattended installations|https://docs.mongodb.com/manual/tutorial/install-mongodb-on-windows/#unattended-installation]. When using ADDLOCAL=""all"", the installer downloads and installs Compass. I do not want Compass, so I uninstalled both Compass and mongo CE before re-installing using ADDLOCAL=""Server,Client"". This causes the installer to fail and roll back the entire installation. {noformat} MSI (s) (68:E0) [10:11:35:725]: Executing op: ActionStart(Name=InstallCompassScript,Description=Installing MongoDB Compass... (this may take a few minutes),) MSI (s) (68:E0) [10:11:35:738]: Executing op: CustomActionSchedule(Action=InstallCompassScript,ActionType=1025,Source=BinaryData,Target=WixQuietExec64,CustomActionData=""C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe"" -NoLogo -NoProfile -NonInteractive -ExecutionPolicy Bypass -Command ""& '' ; exit $($Error.Count)"") MSI (s) (68:7C) [10:11:35:759]: Invoking remote custom action. DLL: C:\windows\Installer\MSI3154.tmp, Entrypoint: WixQuietExec64 MSI (s) (68:A0) [10:11:35:760]: Generating random cookie. MSI (s) (68:A0) [10:11:35:769]: Created Custom Action Server with PID 7948 (0x1F0C). MSI (s) (68:78) [10:11:35:826]: Running as a service. MSI (s) (68:78) [10:11:35:830]: Hello, I'm your 32bit Impersonated custom action server. WixQuietExec64: The expression after '&' in a pipeline element produced an object that was not valid. It must result in a command WixQuietExec64: name, a script block, or a CommandInfo object. WixQuietExec64: At line:1 char:3 WixQuietExec64: + & '' ; exit $($Error.Count) WixQuietExec64: + ~~ WixQuietExec64: + CategoryInfo : InvalidOperation: (:String) , RuntimeException WixQuietExec64: + FullyQualifiedErrorId : BadExpression WixQuietExec64: WixQuietExec64: Error 0x80070001: Command line returned an error. WixQuietExec64: Error 0x80070001: QuietExec64 Failed WixQuietExec64: Error 0x80070001: Failed in ExecCommon method CustomAction InstallCompassScript returned actual error code 1603 (note this may not be 100% accurate if translation happened inside sandbox) MSI (s) (68:E0) [10:11:36:320]: Note: 1: 2265 2: 3: -2147287035 MSI (s) (68:E0) [10:11:36:322]: User policy value 'DisableRollback' is 0 MSI (s) (68:E0) [10:11:36:322]: Machine policy value 'DisableRollback' is 0 Action ended 10:11:36: InstallFinalize. Return value 3. {noformat}",13 +"SERVER-33978","03/19/2018 16:11:50","References to sudo in evergreen.yml should use ${set_sudo}","The {{generate compile expansions}} and {{umount shared scons directory}} functions reference sudo: {code} sudo umount /efs || umount /efs || true {code} It should use the following form: {code} ${set_sudo} $sudo umount /efs || true {code} ",1 +"SERVER-34144","03/20/2018 19:18:28","Powercycle output improvements","Two requests. 
I apologize if these should be separate tickets: # Please log exactly what database and collection is being queried against for the canary checks. Also include the exact query and the criteria being used to verify the document is in the correct state (I think canary documents are insert only so the verification is an existence check). # Please rename the powercycle replset name to `powercycle`. It is currently misspelled as `powercyle`. Bringing up a node as a replica set member that can be queried requires an exact string match on the `--replSet` name. It's easy to read the replset name as `powercycle` to only later learn it's misspelled.",1 +"SERVER-34075","03/22/2018 20:31:00","powercycle_replication* must run replication recovery to observe canary documents","SERVER-29213 will break the powercycle_replication tests ability to query for the canary document after a crash. As such, that patch is temporarily disabling them. Specifically, after SERVER-29213, bringing a node up in standalone may result in stale data relative to what the node has accepted. The node has not lost the data, but simply, replication recovery needs to be done for the data to be queryable. The powercycle tests bring a node back up to check for the canary document in standalone mode and the node is brought up on a different port than is used when running as a replica set member. We suspect SERVER-34070 will make it easier to make the required changes to re-enable the powercycle_replication* tests. What's problematic is that running replication recovery requires starting the node up with the {{\-\-replSet}} option. However, a node running with {{\-\-replSet}} on a different port than in the replset config will not come up as a PRIMARY nor SECONDARY and thus not service reads.",3 +"SERVER-34150","03/27/2018 20:18:15","Create a passthrough that does clean shutdowns","Recoverable rollback does work specifically to make fastcount correct across clean shutdown. A passthrough that does clean shutdowns on primaries and secondaries could catch some bugs here and around general data consistency.",5 +"SERVER-34155","03/27/2018 20:28:14","Add clean shutdowns to kill_secondaries and kill_primaries passthroughs","Clean shutdowns leave the server in a different state then unclean shutdowns with respect to recover to a stable timestamp and are interesting by themselves. We do not have a lot of coverage around clean shutdowns and replication.",2 +"SERVER-34198","03/29/2018 19:35:57","Update content-type for gzip files","The content-type for the gzip files we make available for download is {{application/x-gzip}}. This causes problems with some browsers that don't understand this content type. According to [RFC6648|https://tools.ietf.org/html/rfc6648] the {{x-}} types are deprecated, and [RFC6713|https://tools.ietf.org/html/rfc6713] says the right type for gzip files is {{application/gzip}}. The work in this ticket has two parts: * Update evergreen.yml to use the right content type for new files * Update the existing files to have the right content type",3 +"SERVER-34241","03/30/2018 22:56:44","Remove the skipValidationNamespaces for config.transactions when WT-3998 is fixed","Remove skipValidationNamespaces for config.transactions for the ValidateCollections hook in replica_sets_kill_primary_jscore_passthrough.yml",1 +"SERVER-34242","03/30/2018 23:07:24","Enable causal consistency in concurrency_replication suite","Either enable causal consistency in concurrency_replication, or create a separate suite with causal consistency enabled. 
Causal consistency would allow us to run multi_statement_transaction_simple.js with varying writeConcerns, since we could use causal consistency to ensure that the worker threads read the writes performed during setup.",5 +"SERVER-34258","04/02/2018 21:13:00","Error from mount_drives.sh on Windows","The {{setfacl: No such file or directory}} error is observed when running mount_drives.sh on a windows remote instance: {noformat} [2018/04/02 12:27:48.862] Return code: 0 for command ['buildscripts/remote_operations.py', '--verbose', '--userHost', 'Administrator@10.122.9.230', '--sshConnectionOptions', '-i /cygdrive/c/data/mci/d50ca8d39e058ec1000e48604476e076/powercycle.pem -o GSSAPIAuthentication=no -o CheckHostIP=no -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o ConnectTimeout=20 -o ConnectionAttempts=20', '--retries', '10', '--commands', "" bash mount_drives.sh -d 'd' -t ntfs -l 'e' -u Administrator:None; ls -ld /data/db /log; df; mount""] [2018/04/02 12:27:48.862] Warning: Permanently added '10.122.9.230' (ECDSA) to the list of known hosts. [2018/04/02 12:27:48.862] Looking for drive 'd' to mount data [2018/04/02 12:27:48.862] Looking for drive 'd' to mount data [2018/04/02 12:27:48.862] Looking for drive 'd' to mount data [2018/04/02 12:27:48.862] Looking for drive 'd' to mount data [2018/04/02 12:27:48.862] Found drive [2018/04/02 12:27:48.862] Junction created for c:\data <<===>> d:\data [2018/04/02 12:27:48.862] setfacl: No such file or directory [2018/04/02 12:27:48.862] ls: cannot access '/data/db': No such file or directory [2018/04/02 12:27:48.862] ls: cannot access '/log': No such file or directory [2018/04/02 12:27:48.862] Filesystem 1K-blocks Used Available Use% Mounted on [2018/04/02 12:27:48.862] C:/cygwin 67106812 49203748 17903064 74% / [2018/04/02 12:27:48.862] D: 104855548 95076 104760472 1% /cygdrive/d [2018/04/02 12:27:48.862] C:/cygwin/bin on /usr/bin type ntfs (binary,auto) [2018/04/02 12:27:48.862] C:/cygwin/lib on /usr/lib type ntfs (binary,auto) [2018/04/02 12:27:48.862] C:/cygwin on / type ntfs (binary,auto) [2018/04/02 12:27:48.862] C: on /cygdrive/c type ntfs (binary,posix=0,user,noumount,auto) [2018/04/02 12:27:48.862] D: on /cygdrive/d type ntfs (binary,posix=0,user,noumount,auto) {noformat} ",5 +"SERVER-34298","04/04/2018 15:31:51","PeriodicKillSecondaries will still run after_suite following an after_test failure","After the PeriodicKillSecondaries hook runs it resets its {{_start_time}} variable but when the underlying test fails, this step is bypassed causing the following {{after_suite}} that checks the variable to run.",1 +"SERVER-34306","04/04/2018 17:38:28","validate_collections.js hook should report node that failed validation","When reporting a failure, it can be hard to trace back which node the validation failed on. 
E.g: {noformat} [ValidateCollections:job14:explain5:ValidateCollections] 2018-03-29T16:03:00.107+0000 connecting to: mongodb://localhost:23500,localhost:23501,localhost:23502/?replicaSet=rs [ValidateCollections:job14:explain5:ValidateCollections] 2018-03-29T16:03:00.125+0000 Collection validation failed with response: { [ValidateCollections:job14:explain5:ValidateCollections] 2018-03-29T16:03:00.134+0000 ""ns"" : ""config.transactions"", [ValidateCollections:job14:explain5:ValidateCollections] 2018-03-29T16:03:00.134+0000 ""nInvalidDocuments"" : NumberLong(0), [ValidateCollections:job14:explain5:ValidateCollections] 2018-03-29T16:03:00.163+0000 ""nrecords"" : 8, [ValidateCollections:job14:explain5:ValidateCollections] 2018-03-29T16:03:00.164+0000 ""nIndexes"" : 1, [ValidateCollections:job14:explain5:ValidateCollections] 2018-03-29T16:03:00.187+0000 ""keysPerIndex"" : { [ValidateCollections:job14:explain5:ValidateCollections] 2018-03-29T16:03:00.189+0000 ""config.transactions.$_id_"" : 9 [ValidateCollections:job14:explain5:ValidateCollections] 2018-03-29T16:03:00.191+0000 }, [ValidateCollections:job14:explain5:ValidateCollections] 2018-03-29T16:03:00.208+0000 ""indexDetails"" : { [ValidateCollections:job14:explain5:ValidateCollections] 2018-03-29T16:03:00.210+0000 ""config.transactions.$_id_"" : { [ValidateCollections:job14:explain5:ValidateCollections] 2018-03-29T16:03:00.211+0000 ""valid"" : false [ValidateCollections:job14:explain5:ValidateCollections] 2018-03-29T16:03:00.214+0000 } [ValidateCollections:job14:explain5:ValidateCollections] 2018-03-29T16:03:00.232+0000 }, [ValidateCollections:job14:explain5:ValidateCollections] 2018-03-29T16:03:00.236+0000 ""valid"" : false, [ValidateCollections:job14:explain5:ValidateCollections] 2018-03-29T16:03:00.249+0000 ""warnings"" : [ ], [ValidateCollections:job14:explain5:ValidateCollections] 2018-03-29T16:03:00.249+0000 ""errors"" : [ [ValidateCollections:job14:explain5:ValidateCollections] 2018-03-29T16:03:00.256+0000 ""one or more indexes contain invalid index entries."" [ValidateCollections:job14:explain5:ValidateCollections] 2018-03-29T16:03:00.261+0000 ], [ValidateCollections:job14:explain5:ValidateCollections] 2018-03-29T16:03:00.300+0000 ""advice"" : ""A corrupt namespace has been detected. 
See http://dochub.mongodb.org/core/data-recovery for recovery steps."", [ValidateCollections:job14:explain5:ValidateCollections] 2018-03-29T16:03:00.315+0000 ""ok"" : 1, [ValidateCollections:job14:explain5:ValidateCollections] 2018-03-29T16:03:00.320+0000 ""operationTime"" : Timestamp(1522339379, 8), [ValidateCollections:job14:explain5:ValidateCollections] 2018-03-29T16:03:00.331+0000 ""$clusterTime"" : { [ValidateCollections:job14:explain5:ValidateCollections] 2018-03-29T16:03:00.357+0000 ""clusterTime"" : Timestamp(1522339379, 8), [ValidateCollections:job14:explain5:ValidateCollections] 2018-03-29T16:03:00.374+0000 ""signature"" : { [ValidateCollections:job14:explain5:ValidateCollections] 2018-03-29T16:03:00.379+0000 ""hash"" : BinData(0,""AAAAAAAAAAAAAAAAAAAAAAAAAAA=""), [ValidateCollections:job14:explain5:ValidateCollections] 2018-03-29T16:03:00.391+0000 ""keyId"" : NumberLong(0) [ValidateCollections:job14:explain5:ValidateCollections] 2018-03-29T16:03:00.401+0000 } [ValidateCollections:job14:explain5:ValidateCollections] 2018-03-29T16:03:00.431+0000 } [ValidateCollections:job14:explain5:ValidateCollections] 2018-03-29T16:03:00.440+0000 } {noformat}",1 +"SERVER-34371","04/06/2018 22:36:51","Stop ignoring errors when the test fixture fails to delete data files","The standalone test fixture [attempts to delete data files|https://github.com/mongodb/mongo/blob/73cf755e6e4cf5e0e3f43e0d98954c583ed00060/buildscripts/resmokelib/testing/fixtures/standalone.py#L53] before starting a node. We ignore errors when deleting data files, so we don't know if the deletion was successful. An example of when a deletion could fail is on Windows when another process is keeping the file open. When we fail to delete data files, tests can fail because they expect to start up with clean data files. We should add logging to understand when we fail to delete data files.",2 +"SERVER-34374","04/08/2018 00:51:55","resmoke.py uses bytestrings for representing pathnames, leading to silently failing to clear the dbpath on Windows","https://bugs.python.org/issue24672 describes an issue in Python where {{shutil.rmtree()}} fails to delete files with non-ASCII pathnames when a bytestring (i.e. a {{str}} instance in Python 2). [The ntpath.py module in Python preserves type of its argument|https://github.com/python/cpython/blob/6a336f6484a13c01516b6bfc3b767075cc2cb4f7/Lib/ntpath.py#L398-L401] so it sufficient to use a {{unicode}} instance instead in order to have Python call the W-suffixed Win32 APIs that return Unicode strings. I've verified on a Windows spawn host that the following patch to config.py addresses this issue. The change to parser.py is to just do the same if someone were to specify {{\-\-dbpathPrefix}} when trying to reproduce a failure outside of Evergreen. {code:diff} diff --git a/buildscripts/resmokelib/config.py b/buildscripts/resmokelib/config.py index 66753c389d..2f13c2df96 100644 --- a/buildscripts/resmokelib/config.py +++ b/buildscripts/resmokelib/config.py @@ -34,7 +34,7 @@ DEFAULT_BENCHMARK_MIN_TIME = datetime.timedelta(seconds=5) # Default root directory for where resmoke.py puts directories containing data files of mongod's it # starts, as well as those started by individual tests. -DEFAULT_DBPATH_PREFIX = os.path.normpath(""/data/db"") +DEFAULT_DBPATH_PREFIX = os.path.normpath(u""/data/db"") # Names below correspond to how they are specified via the command line or in the options YAML file. 
DEFAULTS = { diff --git a/buildscripts/resmokelib/parser.py b/buildscripts/resmokelib/parser.py index d9f40da3e9..1353f899fd 100644 --- a/buildscripts/resmokelib/parser.py +++ b/buildscripts/resmokelib/parser.py @@ -352,7 +352,7 @@ def update_config_vars(values): # pylint: disable=too-many-statements _config.ARCHIVE_LIMIT_TESTS = config.pop(""archive_limit_tests"") _config.BASE_PORT = int(config.pop(""base_port"")) _config.BUILDLOGGER_URL = config.pop(""buildlogger_url"") - _config.DBPATH_PREFIX = _expand_user(config.pop(""dbpath_prefix"")) + _config.DBPATH_PREFIX = unicode(_expand_user(config.pop(""dbpath_prefix""))) _config.DBTEST_EXECUTABLE = _expand_user(config.pop(""dbtest_executable"")) _config.DRY_RUN = config.pop(""dry_run"") _config.EXCLUDE_WITH_ANY_TAGS = _tags_from_list(config.pop(""exclude_with_any_tags"")) {code} However, I'm not sure if more special handling on Linux platforms is necessary as the changes from https://github.com/pypa/setuptools/commit/5ad13718686bee04a93b4e86929c1bb170f14a52 suggest we shouldn't use Unicode string literals if {{sys.getfilesystemencoding() == 'ascii'}}. We currently set the {{LANG=C}} environment variable on all of Ubuntu 16.04 builders (SERVER-31717, SERVER-33184) so it isn't clear why we'd even be able to create files with non-ASCII pathnames. CC [~mark.benvenuto] {noformat} $ LANG=C python -c 'import sys; print(sys.getfilesystemencoding())' ANSI_X3.4-1968 {noformat}",2 +"SERVER-34380","04/09/2018 12:48:49","system_perf.yml: Remove the compile_proxy task","In SERVER-33513 I added a ""compile_proxy"" task in system_perf.yml as a layer of indirection. This was a workaround due to not being able to use {{depends_on}} on a variant level. That has now been implemented in EVG-2923. We should therefore use that instead. Note: While the compile_proxy task is only used for master, the end result of this ticket should also be backported to stable branches, so that system_perf.yml is as consistent as possible across the branches.",2 +"SERVER-34405","04/10/2018 15:58:36","Add sys-perf move_chunk_waiting task for WT. ","Master branch only",1 +"SERVER-34420","04/11/2018 20:44:52","Set the idle event in the stepdown thread even if the thread exits in stepdown.py","In [stepdown.py|https://github.com/mongodb/mongo/blob/c6af07af7922c58293e86992ff9ef0a9ad77d398/buildscripts/resmokelib/testing/hooks/stepdown.py#L157], if the stepdown thread exits before setting the idle event, we endlessly keep waiting to pause the stepdown thread. 
{noformat} [2018/04/11 00:18:17.596] Thread 140674028517120: [2018/04/11 00:18:17.596] File ""/opt/mongodbtoolchain/v2/lib/python2.7/threading.py"", line 774, in __bootstrap [2018/04/11 00:18:17.596] self.__bootstrap_inner() [2018/04/11 00:18:17.596] File ""/opt/mongodbtoolchain/v2/lib/python2.7/threading.py"", line 801, in __bootstrap_inner [2018/04/11 00:18:17.597] self.run() [2018/04/11 00:18:17.597] File ""/opt/mongodbtoolchain/v2/lib/python2.7/threading.py"", line 754, in run [2018/04/11 00:18:17.597] self.__target(*self.__args, **self.__kwargs) [2018/04/11 00:18:17.597] File ""/data/mci/bfe6b97b5e63a6c8969ac499179caac2/src/buildscripts/resmokelib/testing/job.py"", line 45, in __call__ [2018/04/11 00:18:17.597] self._run(queue, interrupt_flag) [2018/04/11 00:18:17.597] File ""/data/mci/bfe6b97b5e63a6c8969ac499179caac2/src/buildscripts/resmokelib/testing/job.py"", line 83, in _run [2018/04/11 00:18:17.597] self._execute_test(test) [2018/04/11 00:18:17.597] File ""/data/mci/bfe6b97b5e63a6c8969ac499179caac2/src/buildscripts/resmokelib/testing/job.py"", line 115, in _execute_test [2018/04/11 00:18:17.597] self._run_hooks_after_tests(test) [2018/04/11 00:18:17.597] File ""/data/mci/bfe6b97b5e63a6c8969ac499179caac2/src/buildscripts/resmokelib/testing/job.py"", line 168, in _run_hooks_after_tests [2018/04/11 00:18:17.597] self._run_hook(hook, hook.after_test, test) [2018/04/11 00:18:17.597] File ""/data/mci/bfe6b97b5e63a6c8969ac499179caac2/src/buildscripts/resmokelib/testing/job.py"", line 121, in _run_hook [2018/04/11 00:18:17.597] hook_function(test, self.report) [2018/04/11 00:18:17.597] File ""/data/mci/bfe6b97b5e63a6c8969ac499179caac2/src/buildscripts/resmokelib/testing/hooks/stepdown.py"", line 75, in after_test [2018/04/11 00:18:17.597] self._stepdown_thread.pause() [2018/04/11 00:18:17.597] File ""/data/mci/bfe6b97b5e63a6c8969ac499179caac2/src/buildscripts/resmokelib/testing/hooks/stepdown.py"", line 157, in pause [2018/04/11 00:18:17.597] self._is_idle_evt.wait() [2018/04/11 00:18:17.597] File ""/opt/mongodbtoolchain/v2/lib/python2.7/threading.py"", line 614, in wait [2018/04/11 00:18:17.597] self.__cond.wait(timeout) [2018/04/11 00:18:17.597] File ""/opt/mongodbtoolchain/v2/lib/python2.7/threading.py"", line 340, in wait [2018/04/11 00:18:17.597] waiter.acquire() {noformat} This could be fixed in [stepdown.py|https://github.com/mongodb/mongo/blob/c6af07af7922c58293e86992ff9ef0a9ad77d398/buildscripts/resmokelib/testing/hooks/stepdown.py#L181-L185]: {noformat} def _step_down_all(self): self._is_idle_evt.clear() try: for rs_fixture in self._rs_fixtures: self._step_down(rs_fixture) finally: self._is_idle_evt.set() {noformat}",2 +"SERVER-34451","04/12/2018 23:43:11","MongoDB installation on Windows error: setup wizard ended prematurely","There are few instances where users are failing to install MongoDB on Windows (e.g. win10) with a setup error message {{setup wizard ended prematurely}} The general solution seems to just uncheck MongoDB Compass installation. This is likely to be caused by one or more of: * Firewall/Antivirus blocking access to {{https://compass.mongodb.com/api/v2/download/latest}} * Server has no access to the Internet * PowerShell execution policy [Set ExecutionPolicy|https://docs.microsoft.com/en-us/powershell/module/microsoft.powershell.security/set-executionpolicy?view=powershell-6] preventing the script to be executed. 
",5 +"SERVER-34456","04/13/2018 06:44:44","Add comprehensive testing for KeyString length decoding","Unique index key format would be changed to enable PIT reads from secondary. As a result, unique indexes could have both old and new format keys after an upgrade. Reading keys from mixed format index requires distinguishing old and new format keys. Index keys are stored as KeyString objects. To read keys correctly from mixed format indexes, a function was written to decode the KeyString and calculate the size of key. This ticket aims to add comprehensive test for this new KeyString length decoding function.",5 +"SERVER-34486","04/15/2018 03:51:29","Set transactionLifetimeLimitSeconds=1 in the fuzzer suites that run with replication enabled","{code:yaml} mongod_options: set_parameters: transactionLifetimeLimitSeconds: 1 {code} should be added to the following test suites in order to avoid having the fuzzer trigger spurious Evergreen timeouts when it goes to wait for itself to be able to take a non-intent lock after having started a transaction. * {{jstestfuzz_interrupt_replication.yml}} * {{jstestfuzz_replication.yml}} * {{jstestfuzz_replication_initsync.yml}} * {{jstestfuzz_replication_session.yml}} * {{jstestfuzz_sharded_causal_consistency.yml}} (uses replica set shards) * {{jstestfuzz_sharded_continuous_stepdown.yml}} (uses replica set shards)",1 +"SERVER-34488","04/15/2018 21:53:16","hang_analyzer.py fails because ptrace protection is not disabled","{noformat} [2018/04/11 23:57:55.068] Return code: 0 for command ['buildscripts/remote_operations.py', '--verbose', '--userHost', 'ubuntu@10.122.4.143', '--sshConnectionOptions', '-i /data/mci/70130a2c6fcea1306c5dce5ebe2ec512/powercycle.pem -o GSSAPIAuthentication=no -o CheckHostIP=no -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o ConnectTimeout=20 -o ConnectionAttempts=20', '--retries', '10', '--commands', 'PATH=""/opt/mongodbtoolchain/gdb/bin:$PATH"" /opt/mongodbtoolchain/v2/bin/python2 buildscripts/hang_analyzer.py -c -o file -o stdout -p dbtest,java,mongo,mongod,mongos,python,_test -g bsondump,mongodump,mongoexport,mongofiles,mongoimport,mongoreplay,mongorestore,mongostat,mongotop', '--commandDir', '/log/powercycle'] ... [2018/04/11 23:57:55.068] Found 1 interesting processes [(2858, 'mongod')] [2018/04/11 23:57:55.068] Debugger /opt/mongodbtoolchain/gdb/bin/gdb, analyzing mongod process with PID 2858 [2018/04/11 23:57:55.068] Dumping core to dump_mongod.2858.core ... 
[2018/04/11 23:57:55.068] ['/opt/mongodbtoolchain/gdb/bin/gdb', '--quiet', '--nx', '-ex', 'set interactive-mode off', '-ex', 'set print thread-events off', '-ex', 'file mongod', '-ex', 'attach 2858', '-ex', 'info sharedlibrary', '-ex', 'info threads', '-ex', 'set python print-stack full', '-ex', 'echo \\nWriting raw stacks to debugger_mongod_2858_raw_stacks.log.\\n', '-ex', 'set logging redirect on', '-ex', 'set logging file debugger_mongod_2858_raw_stacks.log', '-ex', 'set logging on', '-ex', 'thread apply all bt', '-ex', 'set logging off', '-ex', 'source /log/powercycle/buildscripts/gdb/mongo.py', '-ex', 'source /log/powercycle/buildscripts/gdb/mongo_printers.py', '-ex', 'source /log/powercycle/buildscripts/gdb/mongo_lock.py', '-ex', 'mongodb-uniqstack mongodb-bt-if-active', '-ex', 'set scheduler-locking on', '-ex', 'gcore dump_mongod.2858.core', '-ex', 'mongodb-dump-locks', '-ex', 'mongodb-show-locks', '-ex', 'mongodb-waitsfor-graph debugger_waitsfor_mongod_2858.gv', '-ex', 'mongodb-javascript-stack', '-ex', 'set confirm off', '-ex', 'quit'] [2018/04/11 23:57:55.068] Reading symbols from mongod...Reading symbols from /log/powercycle/mongod.debug...done. [2018/04/11 23:57:55.068] done. [2018/04/11 23:57:55.068] Attaching to program: /log/powercycle/mongod, process 2858 [2018/04/11 23:57:55.068] ptrace: Operation not permitted. {noformat}",3 +"SERVER-34497","04/16/2018 18:22:02","Remove CheckPrimary hook","It's unnecessary due to SERVER-31670.",1 +"SERVER-34539","04/18/2018 16:48:58","Re-enable sharded mapReduce concurrency testing and only use a single mongos","Concurrent sharded mapReduce testing was disabled as part of SERVER-20057. However, it appears the bug in this ticket only occurs when there are multiple mongoses. I believe this testing should be re-enabled, but only use one mongos, since there may be other concurrent sharded mapReduce related issues, such as SERVER-33538, that can be found from this test coverage.",2 +"SERVER-34548","04/18/2018 19:35:14","Make FSM workloads able to be run via burn_in_tests.py (with --repeat=2)","Individual FSM workloads are not designed to clean up after themselves - rather, they expect the runners to take care of that. This can be problematic when you create or modify a workload, as that is picked up by burn_in_tests, which runs that workload several times without evident cleanup between runs. As a result, that test can conflict with itself (e.g. trying to create a database that already exists by the second run).",3 +"SERVER-34555","04/18/2018 22:42:35","Migrate concurrency_sharded_with_stepdowns{,_and_balancer}.yml test suites to run directly via resmoke.py","The changes from SERVER-19630 make it so FSM workloads run as individual test cases in the {{concurrency_sharded_causal_consistency\{,_and_balancer\}.yml}} and {{concurrency_sharded_replication\{,_and_balancer\}.yml}} test suites. The {{concurrency_sharded_with_stepdowns\{,_and_balancer\}.yml}} test suites weren't migrated to the new-style because there are parts of setting up the environment to run the FSM workloads under that aren't prepared to have the primary of the CSRS or replica set shard stepped down. Rather than trying to get the all the retry logic correct (e.g. 
[by handling the {{ManualInterventionRequired}} when attempting to shard the collection|https://github.com/mongodb/mongo/blob/53c378f137bc4f577f6c92f71f47ede70ec93456/jstests/libs/override_methods/mongos_manual_intervention_actions.js]), we should instead delay when resmoke.py's {{StepdownThread}} actually runs after the FSM workload has started. A sketch of the interactions between [the {{_StepdownThread}} class|https://github.com/mongodb/mongo/blob/14d03a79f55d69ccdd27bb4a08906a4be5eb4a8e/buildscripts/resmokelib/testing/hooks/stepdown.py#L98] and {{resmoke_runner.js}} via the filesystem is described in the appropriate place of the {{runWorkloads()}} function below. {code:diff} diff --git a/jstests/concurrency/fsm_libs/resmoke_runner.js b/jstests/concurrency/fsm_libs/resmoke_runner.js index d94fd4e31c..af0afca2bb 100644 --- a/jstests/concurrency/fsm_libs/resmoke_runner.js +++ b/jstests/concurrency/fsm_libs/resmoke_runner.js @@ -104,6 +104,15 @@ cleanup.push(workload); }); + // After the $config.setup() function has been called, it is safe for the stepdown + // thread to start running. The main thread won't attempt to interact with the cluster + // until all of the spawned worker threads have finished. + // + // TODO: Call writeFile('./stepdown_permitted', '') function to indicate that the + // stepdown thread can run. It is unnecessary for the stepdown thread to indicate that + // it is going to start running because it will eventually after the worker threads have + // started. + // Since the worker threads may be running with causal consistency enabled, we set the // initial clusterTime and initial operationTime for the sessions they'll create so that // they are guaranteed to observe the effects of the workload's $config.setup() function @@ -128,17 +137,34 @@ } try { - // Start this set of worker threads. - threadMgr.spawnAll(cluster, executionOptions); - // Allow 20% of the threads to fail. This allows the workloads to run on - // underpowered test hosts. - threadMgr.checkFailed(0.2); + try { + // Start this set of worker threads. + threadMgr.spawnAll(cluster, executionOptions); + // Allow 20% of the threads to fail. This allows the workloads to run on + // underpowered test hosts. + threadMgr.checkFailed(0.2); + } finally { + // Threads must be joined before destruction, so do this even in the presence of + // exceptions. + errors.push(...threadMgr.joinAll().map( + e => new WorkloadFailure( + e.err, e.stack, e.tid, 'Foreground ' + e.workloads.join(' ')))); + } } finally { - // Threads must be joined before destruction, so do this even in the presence of - // exceptions. - errors.push(...threadMgr.joinAll().map( - e => new WorkloadFailure( - e.err, e.stack, e.tid, 'Foreground ' + e.workloads.join(' ')))); + // Until we are guaranteed that the stepdown thread isn't running, it isn't safe for + // the $config.teardown() function to be called. We should signal to resmoke.py that + // the stepdown thread should stop running and wait for the stepdown thread to + // signal that it has stopped. + // + // TODO: Call removeFile('./stepdown_permitted') so the next time the stepdown + // thread checks to see if it should keep running that it instead stops stepping + // down the cluster and creates a file named ""./stepdown_off"". + // + // TODO: Call the ls() function inside of an assert.soon() / assert.soonNoExcept() + // and wait for the ""./stepdown_off"" file to be created. 
assert.soonNoExcept() + // should probably be used so that an I/O-related error from attempting to list the + // contents of the directory while the file is being created doesn't lead to a + // JavaScript exception that causes the test to fail. } } finally { // Call each workload's teardown function. After all teardowns have completed check if {code}",5 +"SERVER-34567","04/19/2018 16:56:51","Remove the ""build new tools"" step from the compile benchmark task","We only specify {{\-\-use-new-tools}} to SCons in the {{$\{task_compile_flags\}}} expansions that need to, so building new tools in the compile_benchmarks task is unnecessary.",1 +"SERVER-34579","04/19/2018 21:45:51","Do not populate indexDetails for mobile storage engine","This causes apitest_dbcollection.js to fail on mongoe. Reproduce by: {noformat} python buildscripts\resmoke.py --suites=core --mongod=./mongoe jstests/core/apitest_dbcollection.js {noformat} {noformat} 2018-04-19T16:44:24.586-0400 E QUERY [js] Error: [0] != [0] are equal : a_1 exists in indexDetails but contains no information: { ""ns"" : ""test.apttest_dbcollection"", ""size"" : 33, ""count"" : 1, ""avgObjSize"" : 33, ""storageSize"" : 33, ""capped"" : false, ""nindexes"" : 2, ""indexDetails"" : { ""a_1"" : { } }, ""totalIndexSize"" : 22, ""indexSizes"" : { ""_id_"" : 16, ""a_1"" : 6 }, ""ok"" : 1 } : doassert@src/mongo/shell/assert.js:18:14 assert.neq@src/mongo/shell/assert.js:207:9 checkIndexDetails@.\jstests\core\apitest_dbcollection.js:217:1 @.\jstests\core\apitest_dbcollection.js:226:1 @.\jstests\core\apitest_dbcollection.js:151:2 {noformat} ",5 +"SERVER-34587","04/20/2018 15:26:39","Update signing key to 4.0","We're using 3.8 key right now, should switch to 4.0",1 +"SERVER-34593","04/20/2018 17:05:16","resmoke.py should be able to run multiple instances of a single test in parallel","If you try to run a single Javascript test with resmoke.py using a combination of the {{--repeat=N}} flag and the {{-j=M}} flag, it will still run the test sequentially. e.g. {noformat} python buildscripts/resmoke.py --repeat=100 -j10 sometest.js {noformat} Ideally it could parallelize repeated execution of a single test. For example, if {{--repeat=100}} and {{-j=10}}, it would run 10 instances of the test in parallel, that would each execute 10 times. This could be very helpful for quickly trying to reproduce a particular test failure locally.",3 +"SERVER-34598","04/20/2018 20:46:13","Add millisecond-granularity wallclock times for the various metrics in replSetGetStatus's optimes subdocument","The response to {{replSetGetStatus}} includes a subdocument named {{optimes}}, which contains the OpTime for various important oplog events, including {{lastCommittedOpTime}}, {{readConcernMajorityOpTime}}, {{appliedOpTime}} and {{durableOpTime}}. As of MongoDB 3.6, the actual oplog entries corresponding to these OpTimes have a wall clock time with milliseconds resolution recorded in them. We should extend {{replSetGetStatus}} to report the wall clock times corresponding to these optimes, so that we can (usually) get millisecond-granularity measurements of replication lag and back-to-back majority read-modify-write latencies. The work for this ticket is split into SERVER-40080, SERVER-40078, and SERVER-40353. 
SERVER-34598 is an umbrella ticket with no work items.",5 +"SERVER-34614","04/23/2018 15:54:31","parallelTester should use a different connection for each new test","Because each test uses the same connection, tests can share a set of authenticated users, and can interfere with the state of getLastError. Each new test should get its own connection.",2 +"SERVER-34624","04/23/2018 21:50:58","Remove C++ 14 builder from 3.4","Remove the C++14 builder, which happens to be the only DEBUG builder run on the ""test""-sized VMs, from the 3.4 branch to avoid spurious timeouts. *Original Description* Reduce number of jobs for all DEBUG builders using the strategy described in SERVER-29355",1 +"SERVER-34647","04/24/2018 19:36:00","Write test for transaction that opens multiple cursors","We want to write a test that exercises a transaction that opens multiple cursors and reads data from them. We should make sure that the results returned from multiple cursors inside a transaction all return data from the same snapshot. We should also test this for multiple cursors on the same collection and multiple cursors that span different collections. Additionally, we should verify that killing multiple cursors that open inside a transaction behaves correctly.",3 +"SERVER-34652","04/24/2018 20:17:59","Write tests for transactions that write to a collection that is concurrently dropped","We want to test the interaction between transactions and collection drops. We should test the following cases: # A transaction writes to a collection on one session. On a separate session, a drop is attempted on that collection. The collection drop should block until either the transaction commits or maxTimeMS expires. We should also test this for dropDatabase. # Create collections A and B. Start a transaction T by reading from collection A. In a separate session, drop collection B. Then try to write to collection B in transaction T. Verify that this write fails, since the collection was dropped.",3 +"SERVER-34654","04/24/2018 20:36:40","Write test for a transaction that writes to a collection that is created concurrently","We want to verify that transactions interact with collection creation operations correctly. To verify this, we should test the following cases: # A transaction writes to a collection C that doesn't exist. In a different session, collection C is then created. Verify that the transaction can then write to collection C and commit. # A transaction writes to a collection C that exists. A client in a different session that tries to create C should fail. *Note*: when writing these test cases, we should verify any behavior that differs from what is described above and make sure it matches the desired/expected behavior.",3 +"SERVER-34680","04/25/2018 20:43:01","jsCore_mobile task fails if bypass compile is triggered","The jsCore_mobile task can fail if bypass compile is triggered because the mongoe binary is not generated during compile. For an example, see: https://evergreen.mongodb.com/task/mongodb_mongo_master_enterprise_rhel_62_64_bit_required_mobile_jsCore_mobile_patch_00f32ac53c595f098ea200ab7b9d7278be4a5193_5ae0cc70c9ec44641fc7335c_18_04_25_18_44_14 ",3 +"SERVER-34703","04/26/2018 19:50:07","Write test for transactions with concurrent index drops and creates","We want to test the interaction of transaction writes and index creates and drops. 
We should test this for a transaction that writes to some documents covered by an index that is created concurrently, and similarly an index that is dropped concurrently. We should verify that the drop/createIndex blocks until the transaction commits or until maxTimeMS expires. We may consider just doing this and SERVER-34704 as part of the same test.",1 +"SERVER-34704","04/26/2018 19:53:58","Write test for transactions on collections that are renamed concurrently","We want to test the interaction of transactions and collection renames. We should test cases where a transactions writes to a collection A that is concurrently renamed to B. We should verify the renameCollection blocks until the transaction commits or until maxTimeMS expires. We should test this when A and B are in the same database and when they are in different databases.",1 +"SERVER-34706","04/26/2018 20:21:13","Write unit test to verify transactions oplog entries are created correctly","We should verify that transactions oplog entries are created in the proper format. This can likely be tested in {{op_observer_impl_test.cpp}}. We should also be able to remove any logic from JS tests that explicitly check oplog entry formats once this unit test is added.",3 +"SERVER-34711","04/27/2018 05:32:35","Enable burn_in_tests to understand Evergreen task selectors","After adopting task selector/tag approach to address SERVER-33647, the only task failed in the patch build is {{burn_in_tests}}. It looks the {{buildscripts/ciconfig/evergreen.py}} script doesn't understand task selectors. Link to the patch build: https://evergreen.mongodb.com/version/5ae1639dc9ec44641fd52543 {noformat} [2018/04/26 19:52:37.263] $python buildscripts/burn_in_tests.py --branch=master --buildVariant=$build_variant --testListOutfile=jstests/new_tests.json --noExec $burn_in_args [2018/04/26 19:52:40.420] Traceback (most recent call last): [2018/04/26 19:52:40.420] File ""buildscripts/burn_in_tests.py"", line 403, in [2018/04/26 19:52:40.420] main() [2018/04/26 19:52:40.420] File ""buildscripts/burn_in_tests.py"", line 354, in main [2018/04/26 19:52:40.420] evergreen_conf = evergreen.EvergreenProjectConfig(values.evergreen_file) [2018/04/26 19:52:40.420] File ""C:\data\mci\77bf60c033fb9b50b3d52f01979f8e40\burn_in_tests_clonedir\buildscripts\ciconfig\evergreen.py"", line 26, in __init__ [2018/04/26 19:52:40.420] for variant_dict in self._conf[""buildvariants""] [2018/04/26 19:52:40.420] File ""C:\data\mci\77bf60c033fb9b50b3d52f01979f8e40\burn_in_tests_clonedir\buildscripts\ciconfig\evergreen.py"", line 111, in __init__ [2018/04/26 19:52:40.420] for t in conf_dict[""tasks""] [2018/04/26 19:52:40.420] File ""C:\data\mci\77bf60c033fb9b50b3d52f01979f8e40\burn_in_tests_clonedir\buildscripts\ciconfig\evergreen.py"", line 188, in __init__ [2018/04/26 19:52:40.420] Task.__init__(self, task.raw) [2018/04/26 19:52:40.420] AttributeError: 'NoneType' object has no attribute 'raw' [2018/04/26 19:52:40.429] /cygdrive/c/data/mci/77bf60c033fb9b50b3d52f01979f8e40/burn_in_tests_clonedir [2018/04/26 19:52:40.429] Command failed: command [pid=1640] encountered problem: exit status 1 [2018/04/26 19:52:40.429] Task completed - FAILURE. 
{noformat} cc: [~max.hirschhorn]",3 +"SERVER-34738","04/27/2018 20:12:45","mongo_lock.py graph should display lock type for LockManager locks","When a thread is waiting on a lock in the lock graph, it would be useful to know what lock mode it is waiting on.",2 +"SERVER-34778","05/01/2018 21:55:23","Add support for specifying atClusterTime to the dbhash command","This makes it possible to detect transient data inconsistency failures (e.g. related to timestamping differences between the primary and secondary of a replica set) that have been resolved by the time we've finished waiting for all operations to have replicated. This requires changing the ""dbhash"" command to [call {{getMinimumVisibleSnapshot()}}, etc. as {{AutoGetCollectionForRead}} does currently|https://github.com/mongodb/mongo/blob/r3.7.7/src/mongo/db/db_raii.cpp#L135-L167].",3 +"SERVER-34779","05/01/2018 21:56:01","Check the dbhash periodically in a new version of the replica_sets_jscore_passthrough.yml test suite","We should create another version of {{jstests/libs/override_methods/run_check_repl_dbhash.js}} and possibly of {{ReplSetTest#checkReplicatedDataHashes()}} that *doesn't* require (1) flushing background indexes with collMod operations, (2) fsync+locking the primary, and (3) call {{ReplSetTest#awaitReplication()}}. A background thread inside of resmoke.py should then run the ""dbhash"" command periodically via the hook file and cause the test to be marked as a failure if a data inconsistency is detected.",8 +"SERVER-34788","05/02/2018 16:09:34","Improve error message when assert.commandWorked/Failed gets an unexpected type","When {{assert.commandWorked()}} or {{assert.commandFailed()}} is passed a non-object, we throw: {code:js} function _assertCommandWorked(res, msg, {ignoreWriteErrors, ignoreWriteConcernErrors}) { _validateAssertionMessage(msg); if (typeof res !== ""object"") { doassert(""unknown response given to commandWorked""); } {code} This leads to stack traces that are hard to reason about. 
{code} [ValidateCollections:job0:b23b-mdb_793e-ent_7007-qa_a6ce-1525027831965-041:ValidateCollections] 2018-04-29T18:58:39.654+0000 2018-04-29T18:58:39.653+0000 E QUERY [js] Error: unknown response given to commandWorked : [ValidateCollections:job0:b23b-mdb_793e-ent_7007-qa_a6ce-1525027831965-041:ValidateCollections] 2018-04-29T18:58:39.654+0000 doassert@src/mongo/shell/assert.js:18:14 [ValidateCollections:job0:b23b-mdb_793e-ent_7007-qa_a6ce-1525027831965-041:ValidateCollections] 2018-04-29T18:58:39.654+0000 _assertCommandWorked@src/mongo/shell/assert.js:485:13 [ValidateCollections:job0:b23b-mdb_793e-ent_7007-qa_a6ce-1525027831965-041:ValidateCollections] 2018-04-29T18:58:39.654+0000 assert.commandWorked@src/mongo/shell/assert.js:594:16 [ValidateCollections:job0:b23b-mdb_793e-ent_7007-qa_a6ce-1525027831965-041:ValidateCollections] 2018-04-29T18:58:39.654+0000 CollectionValidator/this.validateNodes/<@jstests/hooks/validate_collections.js:128:17 [ValidateCollections:job0:b23b-mdb_793e-ent_7007-qa_a6ce-1525027831965-041:ValidateCollections] 2018-04-29T18:58:39.655+0000 CollectionValidator/this.validateNodes@jstests/hooks/validate_collections.js:127:13 [ValidateCollections:job0:b23b-mdb_793e-ent_7007-qa_a6ce-1525027831965-041:ValidateCollections] 2018-04-29T18:58:39.655+0000 @jstests/hooks/run_validate_collections.js:36:5 [ValidateCollections:job0:b23b-mdb_793e-ent_7007-qa_a6ce-1525027831965-041:ValidateCollections] 2018-04-29T18:58:39.655+0000 @jstests/hooks/run_validate_collections.js:5:2 [ValidateCollections:job0:b23b-mdb_793e-ent_7007-qa_a6ce-1525027831965-041:ValidateCollections] 2018-04-29T18:58:39.655+0000 failed to load: jstests/hooks/run_validate_collections.js [ValidateCollections:job0:b23b-mdb_793e-ent_7007-qa_a6ce-1525027831965-041:ValidateCollections] 2018-04-29T18:58:39.658+0000 Full collection validation after running 'b23b-mdb_793e-ent_7007-qa_a6ce-1525027831965-041' failed {code} We could, at the very least, include what the type of {{res}} was in the assertion error message.",1 +"SERVER-34793","05/02/2018 16:59:09","Add call to BF suggestion server on failed task completion","Add a call to the BF suggestion server task registration API during the post phase for tasks that have failed tests. The API call must not alter the task execution result.",1 +"SERVER-34826","05/03/2018 20:59:15","Write targeted FSM workload for read repeatability in transactions ","We should write an FSM workload that verifies _read repeatability_ of transactions. This workload can presumably have each thread be in either a _Read_ or _Update_ state, where the _Read_ state executes multiples reads sequentially, expecting to see the same result set for each read. The _Update_ state could update some random subset of documents in a collection. This test would be good at verifying repeatability under higher concurrency and load than our targeted tests. 
Eventually we may also add a repeatability test that runs against all our existing FSM workloads, but in lieu of that, this could be a valuable targeted workload to exercise a key property of transactions under snapshot isolation.",3 +"SERVER-34865","05/07/2018 16:39:50","Test archival fails when temporary files are removed","The following error occurred when archiving a failed test: {noformat}[2018/05/07 11:27:26.351] [executor:fsm_workload_test:job0] 2018-05-07T15:27:26.349+0000 Archiving data files for test jstests/concurrency/fsm_workloads/yield_group.js from /data/db/job0/resmoke [2018/05/07 11:27:26.356] [executor:fsm_workload_test:job0] 2018-05-07T15:27:26.355+0000 Encountered an error during test execution. [2018/05/07 11:27:26.356] Traceback (most recent call last): [2018/05/07 11:27:26.356] File ""/data/mci/6612d9aa5374fb14abe6c091b3ffcf03/src/buildscripts/resmokelib/testing/job.py"", line 45, in __call__ [2018/05/07 11:27:26.356] self._run(queue, interrupt_flag) [2018/05/07 11:27:26.356] File ""/data/mci/6612d9aa5374fb14abe6c091b3ffcf03/src/buildscripts/resmokelib/testing/job.py"", line 83, in _run [2018/05/07 11:27:26.356] self._execute_test(test) [2018/05/07 11:27:26.356] File ""/data/mci/6612d9aa5374fb14abe6c091b3ffcf03/src/buildscripts/resmokelib/testing/job.py"", line 113, in _execute_test [2018/05/07 11:27:26.356] self.archival.archive(self.logger, test, success) [2018/05/07 11:27:26.356] File ""/data/mci/6612d9aa5374fb14abe6c091b3ffcf03/src/buildscripts/resmokelib/testing/hook_test_archival.py"", line 78, in archive [2018/05/07 11:27:26.356] self._archive_test(logger, test, success) [2018/05/07 11:27:26.356] File ""/data/mci/6612d9aa5374fb14abe6c091b3ffcf03/src/buildscripts/resmokelib/testing/hook_test_archival.py"", line 69, in _archive_test [2018/05/07 11:27:26.356] self._archive_hook_or_test(logger, test_name, test) [2018/05/07 11:27:26.356] File ""/data/mci/6612d9aa5374fb14abe6c091b3ffcf03/src/buildscripts/resmokelib/testing/hook_test_archival.py"", line 105, in _archive_hook_or_test [2018/05/07 11:27:26.356] s3_bucket, s3_path) [2018/05/07 11:27:26.356] File ""/data/mci/6612d9aa5374fb14abe6c091b3ffcf03/src/buildscripts/resmokelib/utils/archival.py"", line 157, in archive_files_to_s3 [2018/05/07 11:27:26.356] s3_bucket, s3_path) [2018/05/07 11:27:26.356] File ""/data/mci/6612d9aa5374fb14abe6c091b3ffcf03/src/buildscripts/resmokelib/utils/archival.py"", line 245, in _archive_files [2018/05/07 11:27:26.356] if file_list_size(input_files) > free_space(temp_file): [2018/05/07 11:27:26.356] File ""/data/mci/6612d9aa5374fb14abe6c091b3ffcf03/src/buildscripts/resmokelib/utils/archival.py"", line 39, in file_list_size [2018/05/07 11:27:26.356] file_bytes += directory_size(ifile) [2018/05/07 11:27:26.356] File ""/data/mci/6612d9aa5374fb14abe6c091b3ffcf03/src/buildscripts/resmokelib/utils/archival.py"", line 52, in directory_size [2018/05/07 11:27:26.356] dir_bytes += os.path.getsize(full_name) [2018/05/07 11:27:26.356] File ""/data/mci/6612d9aa5374fb14abe6c091b3ffcf03/venv/lib/python2.7/genericpath.py"", line 57, in getsize [2018/05/07 11:27:26.356] return os.stat(filename).st_size [2018/05/07 11:27:26.356] OSError: [Errno 2] No such file or directory: '/data/db/job0/resmoke/shard1/node0/WiredTiger.turtle.set' {noformat} The code doe not handle the case where a temporary file is in a directory list and then subsequently deleted before it is examined: {code}def directory_size(directory): """"""Return size (in bytes) of files in 'directory' tree."""""" dir_bytes = 0 for root_dir, 
_, files in os.walk(unicode(directory)): for name in files: full_name = os.path.join(root_dir, name) try: dir_bytes += os.path.getsize(full_name) except OSError: # Symlinks generate an error and are ignored. if os.path.islink(full_name): pass else: raise return dir_bytes {code} The {{OSError}} should handle this case.",2 +"SERVER-34867","05/07/2018 18:12:25","Run powercycle tests with `storage.recovery` logging set to 2","There are test failures where debugging would be aided by having recovery logging turned on. E.g: [the stable timestamp a node takes checkpoints at|https://github.com/mongodb/mongo/blob/c0d6b410b15227051ca96dc54f8d6c1df77630cf/src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp#L242-L243].",1 +"SERVER-35036","05/17/2018 04:11:04","Remove database and collection cleanup from $config.teardown functions","As mentioned in [this comment|https://jira.mongodb.org/browse/SERVER-34548?focusedCommentId=1868113&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-1868113] of SERVER-34548, having FSM workloads drop their (unique) database or collection in the {{$config.teardown()}} function undermines our data consistency checks as the contents will be deleted before the resmoke.py hooks run. We should leave it to the {{CleanupConcurrencyWorkloads}} hook from SERVER-30204 to drop all the databases and collections after the data consistency checks run.",3 +"SERVER-35042","05/17/2018 17:16:58","Mobile builders should not be enterprise","The mobile builders are currently enterprise builds for historical reasons, but they don't need to be. They also probably shouldn't be, since we plan to add some additional build variants in the near future for which we don't want to require the enterprise code to build.",2 +"SERVER-35071","05/18/2018 16:19:38","Split MMAPv1 tasks into separate variants in sys-perf","* Split mmapv1 tasks to a separate variant * Remove _WT and _MMAPv1 from task names * Use anchors to collapse task lists * Schedule mmap variants to run every 7 days * BUILD ticket to copy history correctly * backports",3 +"SERVER-35100","05/18/2018 22:48:17","Do not log a Python stack trace when a hook dynamic test fails","Currently, when a dynamic test (run in a hook) fails, the exception that was thrown gets logged. The stack trace in the logs is unrelated to the cause of the failure and only adds noise. It should be removed. The log statement is [here|https://github.com/mongodb/mongo/blob/6ab1592260c9b21d802aa65a11d268c0a97b11a7/buildscripts/resmokelib/testing/hooks/interface.py#L79].",1 +"SERVER-35154","05/22/2018 17:52:41","Exceptions that escape a ScopedThread should fail the test","If you start a ScopedThread in a test and it throws an exception, that exception is swallowed and does not error the test, even if the main test thread calls join() and returnData() on the ScopedThread. This can lead to subtle problems with tests, and issues where tests are broken and no longer testing what they are supposed to, but no one notices.",5 +"SERVER-35160","05/22/2018 18:28:19","ScopedThreads should automatically inherit TestData from their parent thread","For a test that uses ScopedThread to pass in the auth passthrough tests, the test author needs to remember to manually pass TestData into the spawned thread so it can inherit the proper auth credentials. 
startParallelShell automatically copies the TestData to the new shell, ScopedThread should behave the same.",2 +"SERVER-35165","05/22/2018 19:09:30","Disable and re-enable update_test_lifecycle Evergreen task on the 4.0 branch","{{git log \-\-since=28.days \-\-pretty=format:%H}} returns commits from prior to when we created the 4.0 branch and therefore prior to when we created the {{mongodb\-mongo\-v4.0}} Evergreen project.",1 +"SERVER-35195","05/23/2018 20:07:40","Remove Python linting rule requiring docstring for magic functions","Add an exception for rule d105 in pydocstyles and remove redundant magic method docstrings. docstrings for magic functions don't provide much value. They're mostly useful for describing parameters, which is not checked by d105. ",1 +"SERVER-35197","05/23/2018 21:31:30","Change CleanEveryN to CleanupConcurrencyWorkloads in concurrency_replication_causal_consistency suite","Given the fact that CleanUpConcurrencyWorkloads hook caused a bunch of test failures in my [patch|https://evergreen.mongodb.com/version/5b05b067e3c3314cc0af1572], I am going to switch back to CleanEveryN and put a TODO comment with this ticket number there.",1 +"SERVER-35203","05/24/2018 10:43:26","Unittests accept --logLevel","It might be nice to add an optional {{--logLevel=}} parameter to unittest_main, so that when locally debugging unittest failures it's possible to ramp up any debugging log output from the main code that's called by the tests.",1 +"SERVER-35233","05/25/2018 18:00:36","Powercycle remote collection validation does not skip views","The remote collection validation can fail if a view exists.",1 +"SERVER-35250","05/25/2018 21:39:44","save dbtest debug symbols in debug_symbols tar","I would like to be able to symbolize stack traces generated by dbtest in BF's.",3 +"SERVER-35261","05/27/2018 20:17:13","Add CheckReplDBHashInBackground hook to concurrency_replication.yml test suite","Apply the following patch and run it several times in Evergreen to see if there are any failures before turning it on. {code:diff} diff --git a/buildscripts/resmokeconfig/suites/concurrency_replication.yml b/buildscripts/resmokeconfig/suites/concurrency_replication.yml index 7ae625b..05dfa72 100644 --- a/buildscripts/resmokeconfig/suites/concurrency_replication.yml +++ b/buildscripts/resmokeconfig/suites/concurrency_replication.yml @@ -16,6 +16,7 @@ selector: executor: archive: hooks: + - CheckReplDBHashInBackground - CheckReplDBHash - ValidateCollections tests: true @@ -26,7 +27,7 @@ executor: # The CheckReplDBHash hook waits until all operations have replicated to and have been applied # on the secondaries, so we run the ValidateCollections hook after it to ensure we're # validating the entire contents of the collection. - # + - class: CheckReplDBHashInBackground # TODO SERVER-26466: Add CheckReplOplogs hook to the concurrency suite. - class: CheckReplDBHash - class: ValidateCollections {code} *Note*: Unlike SERVER-34555, there shouldn't need to be any additional synchronization as the {{CheckReplDBHashInBackground}} hook is safe to run while the {{$config.setup()}} and {{$config.teardown()}} functions are being run.",2 +"SERVER-35262","05/27/2018 20:18:46","Add concurrency_simultaneous_replication.yml test suite","This would increase the variety of concurrent operations we exercise against a replica set. The existing {{concurrency_simultaneous.yml}} test suite is limited in that we cannot run FSM workloads which use transactions. 
The {{concurrency_simultaneous_replication}} Evergreen task should be added to all build variants we currently run the {{concurrency_simultaneous}} Evergreen task against.",3 +"SERVER-35263","05/27/2018 20:51:30","Add FSM workloads for testing atomicity and isolation of updates inside a transaction across multiple collections and databases","Extend [the {{multi_statement_transaction_atomicity_isolation.js}} FSM workload|https://github.com/mongodb/mongo/blob/r4.1.0/jstests/concurrency/fsm_workloads/multi_statement_transaction_atomicity_isolation.js] from SERVER-34293 to support running the updates and consistency checks against collections or databases specified via {{$config.data}}. The {{multi_statement_transaction_atomicity_isolation.js}} FSM workload should continue to only run against the {{db[collName]}} collection provided by the concurrency framework.",2 +"SERVER-35313","05/31/2018 16:47:09","CleanupConcurrencyWorkloads resmoke hook needs to handle the balancer","If the balancer is enabled, then the CleanupConcurrencyWorkloads hook should stop it before cleaning up the DBs and collections, and then restart it when finished.",2 +"SERVER-35383","06/04/2018 19:41:17","Increase electionTimeoutMillis for the ContinuousStepdown hook used in stepdown suites","The {{electionTimeoutMillis}} parameter for the {{ContinuousStepdown}} hook, used in the concurrency stepdown suites, is set to 5000. We should increase this per the captured discussion: {quote} > > On 2018/05/30 22:09:12, maxh wrote: > > > [note] As mentioned in SERVER-34666, I don't think we should shorten the > > > election timeout as it can lead to an election happening that isn't > initiated > > by > > > the StepdownThread due to heartbeats being delayed. I'm okay with keeping it > > > as-is for now because it is consistent with the replica set configuration > the > > > JavaScript version would have used; however, I'd like for there to be a > > > follow-up SERVER ticket to change it. > > > > > > > > > https://jira.mongodb.org/browse/SERVER-34666?focusedCommentId=1873407&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-1873407 > > > > For the followup ticket, do we just want to remove this value and use the > > default, or set it to a higher timeout? > > I'm not sure - I'd like to get some input from Judah on it. I'm currently > wondering if we really need to avoid setting the election timeout to 24 hours > when all_nodes_electable=true. We're going to use the replSetStepUp command in > the Python version of the StepdownThread to cause one of the secondaries to run > for election anyway. If for some reason the replSetStepUp command fails, then > the former primary will try and step back up after 10 seconds on its own anyway. > > https://github.com/mongodb/mongo/blob/r4.1.0/buildscripts/resmokelib/testing/fixtures/replicaset.py#L149-L154 If you only want elections to come from the StepdownThread, then I'd recommend setting the election timeout to 24 hours. The replSetStepUp command should still work, and if it fails for some reason, then no other node will try to run for election. There's no real difference between the default 10 seconds and the current 5 seconds except for the amount of flakiness you'd expect (not the existence of flakiness that we're trying to remove completely). 
{quote}",2 +"SERVER-35398","06/05/2018 06:49:00","Mobile SE: Remove code for capped collection","Remove the capped collection code after SERVER-33605 gets checked in",2 +"SERVER-35473","06/07/2018 04:48:21","Mobile SE: Fix writeConflictRetry loop with map-reduce jstests","With fixes for SERVER-32997 I still see issues with map-reduce jstests in {{concurrency/fsm_workloads/}}. {{concurrency}} and {{concurrency_simultaneous}} test suites stay disabled for mobile SE waiting on fixing map-reduce (and validate). This ticket tracks the work needed to fix map-reduce concurrency issues.",5 +"SERVER-35506","06/08/2018 15:43:30","The Powercycle wait_for_mongod_shutdown function should ensure the mongod process is no longer running","The {{wait_for_mongod_shutdown}} function waits until the {{mongod.lock}} file is deleted. It should just check that mongod process is no longer running.",2 +"SERVER-36019","06/11/2018 19:11:08","Create script to collect resource utilization of Android application","The script should assume that the application is already running on the device and shouldn't concern itself with how the application was started (i.e. it'll be the responsibility of some other part of this mobile testing framework which deals with that). The script should run continuously and collect the resource utilization at some configurable frequency until the it is signaled to exit. It should define functions like {{start()}} and {{stop()}} so that the script can also be used as a library and controlled programmatically. * CPU consumption: {{python systrace.py}} ** Reference: https://developer.android.com/studio/command-line/systrace * Memory consumption: {{adb shell dumpsys meminfo}} ** Refence: https://developer.android.com/studio/command-line/dumpsys#meminfo * Battery consumption: {{adb shell dumpsys batterystats}} ** Reference: https://developer.android.com/studio/command-line/dumpsys#battery",5 +"SERVER-35537","06/11/2018 20:42:20","Create version of benchRun() which can be used with embedded","This makes it possible use our existing https://github.com/mongodb/mongo-perf tests as performance tests for embedded. We can likely get away with creating separate executables for each mongo-perf test case that uses {{ServiceEntryPointEmbedded}}, {{embedded::initialize()}}, and {{DBDirectClient}} (instead of [{{DBClientConnection}}|https://github.com/mongodb/mongo/blob/5a1bdde940b6a91e1133b64ee5365ce595b23e3a/src/mongo/shell/bench.cpp#L1363]) to perform operations. *Note*: We'll also need to remove or split out the ""check"" function as it requires linking in the JavaScript engine.",8 +"SERVER-35559","06/12/2018 18:07:58","Update transaction retry functions to not call abort after commit","After the changes in SERVER-35094 to disallow calling {{abortTransaction()}} after {{commitTransaction()}} The {{withTxnAndAutoRetry}} helper function and retry logic in the background dbhash hook will need to be changed to not call {{abortTransaction()}} if the failure error comes from {{commitTransaction()}}",2 +"SERVER-35588","06/13/2018 21:19:53","powertest.py should call replSetReconfigure command only after successful replSetGetConfig","replSetReconfig is not safe to retry when there's an AutoReconnect error. If it succeeded before the network was disconnected, it will likely fail the retry. 
I believe it is safe to retry the whole else-clause starting with replSetGetConfig https://github.com/mongodb/mongo/blob/1d89d2c88bcb39045701b87612b866ae2eb49378/pytests/powertest.py#L1440 In this case, if the previous reconfig succeeded, the ""if"" at line 1457 will prevent attempting to reconfig again. ",2 +"SERVER-35737","06/22/2018 05:45:16","install_compass fails on MacOS ","Using MacOS 10.13.3 and Python 2.7.3. When trying to run install_compass after downloading the 4.0.0-rc7 tarball: {noformat} $ ./4.0.0-rc7/bin/install_compass Downloading Compass... 729% Installing the package... Traceback (most recent call last): File ""./4.0.0-rc7/bin/install_compass"", line 173, in download_and_install_compass() File ""./4.0.0-rc7/bin/install_compass"", line 161, in download_and_install_compass install_mac(pkg) File ""./4.0.0-rc7/bin/install_compass"", line 69, in install_mac '-mountpoint', tmp, dmg], stdout=fnull, stderr=fnull) File ""/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py"", line 186, in check_call raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '['hdiutil', 'attach', '-nobrowse', '-noautoopen', '-mountpoint', '/var/folders/x4/swc0_n7n6j10frpsyvprj57m0000gp/T/tmpppWP5m', '/var/folders/x4/swc0_n7n6j10frpsyvprj57m0000gp/T/tmp9hJmu0']' returned non-zero exit status 1   {noformat}   I checked the end point it's trying to hit (defined on line 156 of install_compass): [https://compass.mongodb.com/api/v2/download/latest/compass-community/stable/osx] It's displaying this message: {noformat} TypeError: Invalid Version: untagged-c4d5921acf05ae6d4201    at new SemVer (/compass/node_modules/nuts-serve/node_modules/semver/semver.js:293:11)    at compare (/compass/node_modules/nuts-serve/node_modules/semver/semver.js:569:10)    at Function.gt (/compass/node_modules/nuts-serve/node_modules/semver/semver.js:598:10)    at compareVersions (/compass/node_modules/nuts-serve/lib/versions.js:56:16)    at Array.sort (native)    at /compass/node_modules/nuts-serve/node_modules/lodash/index.js:12092:23    at Function.tap (/compass/node_modules/nuts-serve/node_modules/lodash/index.js:5921:19)    at baseWrapperValue (/compass/node_modules/nuts-serve/node_modules/lodash/index.js:2845:30)    at LodashWrapper.wrapperValue (/compass/node_modules/nuts-serve/node_modules/lodash/index.js:6112:14)    at /compass/node_modules/nuts-serve/lib/versions.js:78:14{noformat} Can we handle end point errors more gracefully?",2 +"SERVER-35800","06/26/2018 02:05:45","resmoke.py should retry getting a build_id and test_id from logkeeper","The changes from SERVER-35472 made it so that resmoke.py would exit if it couldn't communicate with logkeeper. This has lead to setup failures in Evergreen that are caused by the logkeeper application server not responding with a build_id or test_id quickly enough. It might be that retrying would succeed that we should make 10 attempts and fail if we still don't get a build_id or test_id. *Note*: Retrying the request to get a build_id or new test_id is safe as [it simply inserts a new document|https://github.com/evergreen-ci/logkeeper/blob/e83432bd04ba111c72907af7f3fa50a52ea531b6/views.go#L219]. 
The only quirk is that the ""Job logs"" tab may show extra entries in the case that resmoke.py never received a response from the logkeeper application server but the database still eventually did the work.",2 +"SERVER-35852","06/27/2018 22:12:16","Convert backup_restore.js blacklist to use a YAML based list for transaction tests","Having the test understand existing blacklists will reduce the likelihood of a test not being added to the blacklist for {{backup_restore.js}}.",2 +"SERVER-36010","07/08/2018 01:31:13","Change log messages for Windows stacktraces to use error() or severe() rather than log()","This would make it more obvious to our users and potentially easier for automated tools to detect that these messages are process-fatal. * https://github.com/mongodb/mongo/blob/026f69dbf4f98e91b499bde5cb4ce73c332e9549/src/mongo/util/exception_filter_win32.cpp#L137-L138 * https://github.com/mongodb/mongo/blob/026f69dbf4f98e91b499bde5cb4ce73c332e9549/src/mongo/util/exception_filter_win32.cpp#L160 * https://github.com/mongodb/mongo/blob/026f69dbf4f98e91b499bde5cb4ce73c332e9549/src/mongo/util/exception_filter_win32.cpp#L163 * https://github.com/mongodb/mongo/blob/026f69dbf4f98e91b499bde5cb4ce73c332e9549/src/mongo/util/exception_filter_win32.cpp#L174",2 +"SERVER-36043","07/10/2018 10:45:05","systemd unit for mongod starts before multi.user target","The {{mongod}} service unit is configured to be part of the multi-user.target, but the unit configuration also has the following parameter: {{After=network.target}} Basically, this makes {{mongod}} start just after the network is up and not during multi-user (one of the latest targets to trigger). This causes problems on those servers where there are network storage services or authentication services (like AD) still pending to start. {{mongod}} would not be able to start if it depends on them. Removing the {{After}} parameter would make systemd to start {{mongod}} as part of the correct target. If an {{After}} is needed for some reason, {{After=multi-user.target}} would avoid most of the problems with dependencies between services. Affects all native packages (RPM and DEB) for systemd based Linux distros.",1 +"SERVER-36067","07/11/2018 15:40:38","Upload artifacts from running install-mobile-test target in Evergreen to S3","We currently upload the artifacts from running the {{install\-mobile\-dev}} target in Evergreen to S3 as {noformat} ${project}/embedded-sdk/${build_variant}/${revision}/${version}.tgz ${project}/embedded-sdk/${build_variant}-latest.tgz {noformat} We should do something similar for the {{install\-mobile\-test}} target so that the {{mongoebench}} binary from SERVER-35537 can be run on an Android device. The following S3 paths were proposed by [~acm]: {noformat} ${project}/embedded-sdk-test/${build_variant}/${revision}/${version}.tgz ${project}/embedded-sdk-test/${build_variant}-latest.tgz {noformat}",2 +"SERVER-36069","07/11/2018 15:51:15","Vendor mongoebench-compatible JSON config files from mongodb/mongo-perf into src/third_party","The JSON config files should live in a directory called {{src/third_party/mongo-perf/mongoebench/}}. It should be possible to rerun the vendoring script and automatically update them to pick up on new and modified mongo-perf test cases. 
We'll likely want to filter out test cases which rely on capped collections or server-side JavaScript.",3 +"SERVER-36073","07/11/2018 17:30:00","Save stats from BenchRunner::finish() to a JSON file in mongoebench","We can add a new command line {{\-\-output}} as a path for where to save the benchRun stats.",1 +"SERVER-36076","07/11/2018 19:34:15","Create new resmoke.py test suite for running mongoebench on a desktop","It should run {{mongoebench}} with the various JSON config files that live in the {{src/third_party/mongo-perf/mongoebench/}} directory that have been vendored into the source tree as part of the changes from SERVER-36069. This involves creating a new {{buildscripts/resmokelib/testing/testcases/mongoebench_test.py}} test case that executes {{mongoebench}} with the appropriate arguments. For example, the value for the {{\-\-benchmarkMinTimeSecs}} command line option should be forwarded as the {{\-\-time}} command line option to {{mongoebench}}. This also involves creating a new hook similar to [the {{CombineBenchmarkResults}} hook|https://github.com/mongodb/mongo/blob/r4.0.0/buildscripts/resmokelib/testing/hooks/combine_benchmark_results.py] that parses the JSON stats file specified as the {{\-\-output}} command line option (from SERVER-36073) to {{mongoebench}}. The new hook should accumulate benchmark results of all the test cases we run as part of the test suite and serialize them as a JSON file (taking its name from the {{\\--perfReportFile}} command line option) that can be used for the {{json.send}} Evergreen command to display the performance results. The test case should also handle the {{\-\-benchmarkRepetitions}} command line option (in Python, as there is no equivalent option to forward to {{mongoebench}}) and accumulate the benchmark results of multiple executions. We may find it beneficial to define separate test suites that each run a subset of the test cases [similar to what is done in the performance Evergreen project when these test cases are run with benchrun.py|https://evergreen.mongodb.com/build/performance_linux_wt_standalone_80c7c825a44cf99b17e81f4233445c7ab1927706_18_07_11_01_45_09] to avoid having an Evergreen task run for a long time.",5 +"SERVER-36077","07/11/2018 19:41:04","Create new resmoke.py test suite for running mongoebench on an Android device","This should build on top of the work from SERVER-36076 to run {{mongoebench}} as a statically-linked binary on an Android device. The {{mongoebench}} and JSON config file can likely be copied over the to device and then run using some combination of the {{adb push}} and {{adb shell}} commands. *Note*: Integrating the {{buildscripts/mobile/adb_monitor.py}} utility into this test suite should happen as part of SERVER-36078.",3 +"SERVER-36078","07/11/2018 19:41:34","Integrate adb resource monitor into mongoebench test suite for Android","This involves creating a new hook that calls the \{{start()}} and \{{stop()}} methods before and after each test, respectively. 
The generated files from the resource monitor should be packaged into subdirectories based on the (test case, execution) pair which ran.",2 +"SERVER-36090","07/12/2018 15:41:33","install_compass fails on MacOS due to SSL version","{noformat} monkey101:Downloads$ ./mongodb-osx-x86_64-4.0.0/bin/install_compass Traceback (most recent call last): File ""./mongodb-osx-x86_64-4.0.0/bin/install_compass"", line 173, in download_and_install_compass() File ""./mongodb-osx-x86_64-4.0.0/bin/install_compass"", line 157, in download_and_install_compass pkg = download_pkg(link, pkg_format=pkg_format) File ""./mongodb-osx-x86_64-4.0.0/bin/install_compass"", line 58, in download_pkg res = urllib.urlretrieve(link, filename=tmpf[1], reporthook=download_progress) File ""/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py"", line 98, in urlretrieve return opener.retrieve(url, filename, reporthook, data) File ""/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py"", line 245, in retrieve fp = self.open(url, data) File ""/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py"", line 213, in open return getattr(self, name)(url) File ""/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py"", line 443, in open_https h.endheaders(data) File ""/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py"", line 1049, in endheaders self._send_output(message_body) File ""/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py"", line 893, in _send_output self.send(msg) File ""/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py"", line 855, in send self.connect() File ""/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py"", line 1274, in connect server_hostname=server_hostname) File ""/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ssl.py"", line 352, in wrap_socket _context=self) File ""/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ssl.py"", line 579, in __init__ self.do_handshake() File ""/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ssl.py"", line 808, in do_handshake self._sslobj.do_handshake() IOError: [Errno socket error] [SSL: UNSUPPORTED_PROTOCOL] unsupported protocol (_ssl.c:590) {noformat} The link for the compass download is behind the load balancer, which has disabled older TLS versions. There are a couple of possible solutions we should consider.",8 +"SERVER-36129","07/13/2018 21:23:35","Concurrency stepdown suites should wait for replication of workload setups before starting stepdown thread","The concurrency stepdown suites [wait until after setup has been called for each workload before starting the stepdown thread|https://github.com/mongodb/mongo/blob/a291ec89affd9e849ac62ad55a736bfb940a0bb6/jstests/concurrency/fsm_libs/resmoke_runner.js#L101-L111] because the setup methods don't run with overriden majority read/write concern. The effects of each setup are not guaranteed to be majority committed at this point though, so an immediate stepdown can still roll back some of the setup, like the creation of the TTL index in [indexed_insert_ttl.js|https://github.com/mongodb/mongo/blob/a291ec89affd9e849ac62ad55a736bfb940a0bb6/jstests/concurrency/fsm_workloads/indexed_insert_ttl.js#L30-L31]. 
A fix for this would be waiting for replication on all shards and the config server before starting the stepdown thread.",3 +"SERVER-36162","07/17/2018 16:38:31","Powercycle - ensure internal crash command has been executed on the remote host","It's possible that due to an ssh connection error, the remote command to internally crash a server will never run. The {{powertest.py}} script expects that the crash command will fail, as the ssh connection will be terminated. However, it should examine the output of the crash command to determine it it was actually run on the remote host. Here's a case where the remote command failed to execute: {noformat} [2018/07/15 16:11:38.976] 2018-07-15 20:10:47,078 INFO Crashing server in 46 seconds [2018/07/15 16:11:38.976] 2018-07-15 20:11:37,188 INFO Inserting canary document {'x': 1531685447.025} to DB power Collection cycle [2018/07/15 16:11:38.976] ssh -o ServerAliveCountMax=10 -o ServerAliveInterval=6 -o StrictHostKeyChecking=no -o ConnectTimeout=10 -o ConnectionAttempts=20 -i /cygdrive/c/data/mci/3ab7f95ff9a32d5ea1ad8ffe3e1a09fd/powercycle.pem -o GSSAPIAuthentication=no -o CheckHostIP=no -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o ConnectTimeout=10 -o ConnectionAttempts=20 10.122.5.210 /bin/bash -c ""$'source venv_powercycle/Scripts/activate; python -u powertest.py --remoteOperation --sshUserHost 10.122.5.210 --sshConnection \'-i /cygdrive/c/data/mci/3ab7f95ff9a32d5ea1ad8ffe3e1a09fd/powercycle.pem -o GSSAPIAuthentication=no -o CheckHostIP=no -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o ConnectTimeout=10 -o ConnectionAttempts=20\' --rsync --rsyncExcludeFiles diagnostic.data/metrics.interim* --backupPathBefore /log/powercycle/beforerecovery --backupPathAfter /log/powercycle/afterrecovery --validate local --canary local --docForCanary None --seedDocNum 10000 --crashOption \'notmyfault/notmyfaultc64.exe -accepteula crash 1\' --instanceId i-093c2bc45b5317756 --crashWaitTime 45 --jitterForCrashWaitTime 5 --numCrudClients 10 --numFsmClients 10 --rootDir /log/powercycle-mongodb_mongo_v3.6_windows_64_2k8_ssl_powercycle_syncdelay_WT_f1bcba35cefd0c5c0402e32575327a77507ac03e_18_07_14_22_41_33 --mongodbBinDir /log/powercycle --dbPath /data/db --logPath /log/powercycle/mongod.log --mongodUsablePorts 20000 20001 --mongodOptions \'--setParameter enableTestCommands=1 --syncdelay 10 --storageEngine wiredTiger\' --remotePython \'source venv_powercycle/Scripts/activate; python -u\' crash_server'"" [2018/07/15 16:12:29.518] 2018-07-15 20:12:16,477 INFO Connection timed out during banner exchange {noformat} ",5 +"SERVER-36169","07/17/2018 21:55:57","Resmoke: bare raise outside except in the stepdown hook","The stepdown hook code contains three misplaced bare raise statement, outside an except block: [here|https://github.com/mongodb/mongo/blob/99d3436094d31de348edfac9fe0e40e60b28391e/buildscripts/resmokelib/testing/hooks/stepdown.py#L409], [here|https://github.com/mongodb/mongo/blob/99d3436094d31de348edfac9fe0e40e60b28391e/buildscripts/resmokelib/testing/hooks/stepdown.py#L423] and [here|https://github.com/mongodb/mongo/blob/99d3436094d31de348edfac9fe0e40e60b28391e/buildscripts/resmokelib/testing/hooks/stepdown.py#L442].",1 +"SERVER-36230","07/20/2018 21:45:24","Waits-for graph no longer being generated by hang_analyzer.py script","In BF-9986, we saw many threads waiting on DB locks, but the hang analyzer cannot generate the graph for us. 
The lock manager [dumps the information|https://logkeeper.mongodb.org/lobster/build/6321d9a6e900f89055816cb177b89f73/test/5b4f50b3f84ae847db0364c1#bookmarks=0%2C178236%2C180602&f=10ReplicaSetFixture%3Ajob0%3Aprimary] though. {noformat} [2018/07/18 12:40:44.821] warning: target file /proc/17238/cmdline contained unexpected null characters [2018/07/18 12:40:44.821] Saved corefile dump_mongod.17238.core [2018/07/18 12:41:07.364] Running Hang Analyzer Supplement - MongoDBDumpLocks [2018/07/18 12:41:07.364] Not generating the digraph, since the lock graph is empty [2018/07/18 12:41:07.364] Running Print JavaScript Stack Supplement [2018/07/18 12:41:07.364] Detaching from program: /data/mci/d8572849d2ad8ce1953117503927f065/src/mongod, process 17238 [2018/07/18 12:41:07.491] Done analyzing mongod process with PID 17238 [2018/07/18 12:41:07.491] Debugger /opt/mongodbtoolchain/gdb/bin/gdb, analyzing mongod process with PID 17241 {noformat} ",2 +"SERVER-36233","07/20/2018 22:21:02","Prohibit running the ""profile"" command from secondary read override test suites.","Running the {{profile}} command can cause system.profile to be created. Since system.profile is an unreplicated collection, there is no point in testing it in secondary read override test suites. It can also cause other tests to pick up the unreplicated collection and fail. We should prevent the {{profile}} command from being run in the {{set_read_preference_secondary.js}} override file.",2 +"SERVER-36409","08/02/2018 01:59:51","Install v3.2 from yum repository errors out on RHEL 7Server","The yum repositories for version 3.2 has an issue that prevents install on RHEL 7Server: {code} ================================================================================ Package Arch Version Repository Size ================================================================================ Installing: mongodb-org x86_64 3.2.20-1.el7 mongodb-org-3.2 5.8 k Installing for dependencies: make x86_64 1:3.82-23.el7 base 420 k mongodb-org-mongos x86_64 3.2.20-1.el7 mongodb-org-3.2 5.7 M mongodb-org-server x86_64 3.2.20-1.el7 mongodb-org-3.2 13 M mongodb-org-shell x86_64 3.2.20-1.el7 mongodb-org-3.2 6.8 M mongodb-org-tools x86_64 3.2.20-1.el7 mongodb-org-3.2 4.1 M openssl x86_64 1:1.0.2k-12.el7 base 492 k Transaction Summary ================================================================================ Install 1 Package (+6 Dependent packages) Total download size: 30 M Installed size: 204 M Downloading packages: warning: /var/cache/yum/x86_64/7/base/packages/make-3.82-23.el7.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID f4a80eb5: NOKEY Public key for make-3.82-23.el7.x86_64.rpm is not installed warning: /var/cache/yum/x86_64/7/mongodb-org-3.2/packages/mongodb-org-3.2.20-1.el7.x86_64.rpm: Header V3 RSA/SHA1 Signature, key ID ea312927: NOKEY Public key for mongodb-org-3.2.20-1.el7.x86_64.rpm is not installed https://repo.mongodb.org/yum/redhat/7Server/mongodb-org/3.2/x86_64/RPMS/mongodb-org-tools-3.2.20-1.el7.x86_64.rpm: [Errno 14] curl#63 - ""Callback aborted"" Trying other mirror. Error downloading packages: mongodb-org-tools-3.2.20-1.el7.x86_64: [Errno 256] No more mirrors to try. 
{code} {code} [mongodb-org-3.2] name=MongoDB Repository baseurl=https://repo.mongodb.org/yum/redhat/7Server/mongodb-org/3.2/x86_64/ gpgcheck=1 enabled=1 gpgkey=https://www.mongodb.org/static/pgp/server-3.2.asc {code}",1 +"SERVER-36431","08/03/2018 15:50:08","Powercycle should check for the existence of a process before accessing its attributes","It's possible that when iterating over a list of processes, a process could finish before it's accessed: {code} for proc in psutil.process_iter(): if proc.name() == self.name: self.pids.append(proc.pid) {code} An additional check should be added: {code} if psutil.pid_exists(proc.pid) and proc.name() == self.name {code}",1 +"SERVER-36451","08/03/2018 22:34:50","ContinuousStepdown with killing nodes can hang due to not being able to start the primary","The replica_sets_kill_primary_jscore_passthrough tests occasionally time out due to waiting for a primary to be selected. The tests increase the election timeout to 24 hours to have control over which node is the leader. However, this can lead to a situation where the leader has been killed and both secondaries were unable to take over due to having stale oplogs. When the server is brought back up and attempts to step up, there is a chance it has not yet heard back heartbeats from the other nodes in the cluster and assumes they are down. This means the step-up fails and another election is not attempted, causing the test to eventually time out. A possible solution, in the event of a failure, would be to retry the step-up after some delay. This would allow the secondaries more time to respond to the heartbeat request.",2 +"SERVER-36507","08/07/2018 20:56:20","Downgrading WT from FCV 4.2 -> 4.0 requires an ""acquiesced"" system","The 4.0 -> 3.6 FCV downgrade path in storage would acquiesce the system by [closing/re-opening the WT connection|https://github.com/mongodb/mongo/blob/aa0062e8aaa5a8273bb33a2afaf8c9cdf5fbede7/src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp#L665-L670]. That was originally done to facilitate changing table logging settings. 4.2 development [optimistically removed the close/open along with the table logging changes|https://github.com/mongodb/mongo/commit/013b82bf5f58bd7de8ae2f4d28d24f82afa22e64#diff-8fd4ad8935bb2bf3f91bb01f4785c544L659] under the assumption that changing the file compatibility on reconfigure (which remains in the upgrade/downgrade path) did not need to be acquiesced to the same degree.",1 +"SERVER-36530","08/08/2018 17:35:37","Run the agg expression fuzzer in Evergreen","Add Evergreen tasks for the new agg expression fuzzer.",1 +"SERVER-36615","08/13/2018 20:50:07","Add linux repo package testing steps to the server projects","* Add chef recipes to configure repos and install mongod * Add Inspec test and kitchen configuration * Add logic to push task to test and rebuild repos ",8 +"SERVER-36622","08/13/2018 22:06:32","Package tests fail for newer Ubuntu","Our package tests expect a failure for Ubuntu 18.04 for the install_compass script since it was previously hard coded to only work on 14.04 and 16.04. 
It has been changed to work on >= 14.04, so our package tests should expect a success on newer Ubuntu releases like 18.04",3 +"SERVER-36698","08/16/2018 17:41:56","Add suite for agg expr fuzzer optimized vs unoptimized","Acceptance Criteria: * Optimized agg expression fuzzer is running and green in Evergreen",3 +"SERVER-36751","08/17/2018 22:26:43","Prevent concurrent dropDatabase commands in the concurrency_simultaneous_replication suite","*Problem* In the {{concurrency_simultaneous_replication}} test suite, we run 10 operations in parallel on the same database, and there's a small chance (e.g. 5% for some workloads) that an operation could be a {{dropDatabase}}. For slower build variants, a single {{dropDatabase}} command can take multiple minutes to finish if there is heavy activity from other workloads that are happening in parallel. Our tests will [retry an operation for up to 10 minutes|https://github.com/mongodb/mongo/blob/17686781044525b9c3fbdf06ca326c8f4fb383ba/jstests/libs/override_methods/implicitly_retry_on_database_drop_pending.js#L148] if {{DatabaseDropPending}} errors are encountered. After seeing the error, [a {{getLastError}} command is used|https://github.com/mongodb/mongo/blob/17686781044525b9c3fbdf06ca326c8f4fb383ba/jstests/libs/override_methods/implicitly_retry_on_database_drop_pending.js#L25] to wait for the {{dropDatabase}} command to be committed. There is variability in the order that {{getLastError}} returns from different workload clients, which may cause certain workload clients to always be stuck behind other clients that are doing more {{dropDatabase}} commands. When this happens, the client will receive another {{DatabaseDropPending}} error. But the client is [unable to distinguish|https://github.com/mongodb/mongo/blob/17686781044525b9c3fbdf06ca326c8f4fb383ba/jstests/libs/override_methods/implicitly_retry_on_database_drop_pending.js#L141] whether the error is caused by the same dropDatabase command or a new one, causing the new wait to continue to eat into the 10 minute timeout. There is a small probability that this cycle will happen a handful of times in a row, which, when combined with slow multi-minute {{dropDatabase}} commands, will exceed the 10 minute timeout. *Solution* The solution is to avoid retrying {{dropDatabase}} commands when they return a {{DatabaseDropPending}} error. This will cause the workload to transition to a new state and continue to do so until the new state is no longer a {{dropDatabase}} call. Then it will wait on the ongoing {{dropDatabase}} call. When the database is finally dropped, it's guaranteed that none of the clients waiting on it would be running another dropDatabase, so they should all be able to proceed. There might be edge cases where one client is able to execute multiple commands and one of those commands is another {{dropDatabase}}, but the likelihood of this happening 5 times in a row should be much smaller if not negligible. From a correctness perspective, this change will implicitly turn some {{dropDatabase}} commands into no-ops, which should not cause loss of test coverage, as databases can't be dropped in parallel in the first place. The tests that run parallel dropDatabases are also all randomized tests and don't expect these operations to all succeed when there are parallel clients operating on the same database. 
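For illustration only, here is a rough sketch of the non-retrying behavior in the override (the helper names below are hypothetical and not the actual functions in implicitly_retry_on_database_drop_pending.js):
{code:javascript}
// Hedged sketch: when dropDatabase itself hits DatabaseDropPending, treat it as a
// no-op success instead of waiting and retrying, so the workload moves to its next state.
function runCommandHandlingDropPending(cmdObj, runOriginalCommand) {
    const res = runOriginalCommand();
    const isDropDatabase = cmdObj.hasOwnProperty('dropDatabase');
    if (res.code === ErrorCodes.DatabaseDropPending && isDropDatabase) {
        // Another client is already dropping this database; don't wait for it to finish.
        return {ok: 1, note: 'dropDatabase skipped because a drop is already in progress'};
    }
    return res;  // All other commands keep the existing retry-and-wait behavior.
}
{code}
Only the {{dropDatabase}} case changes; other commands that see {{DatabaseDropPending}} would still go through the existing wait-and-retry loop.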
-We should also write a dedicated regression test that does a high number of collection DDL operations while dropping and creating databases to simulate the timeout failures we've seen, the changes from this ticket should prevent the test from failing.- The -new test and the- changes to not retry {{dropDatabase}} should be limited to affect only the {{concurrency_simultaneous_replication}} suite, as we have not seen this failure elsewhere so far.",1 +"SERVER-36756","08/18/2018 15:01:13","Log the githash of the 10gen/jstestfuzz repository when the fuzzer's self-tests fail","(This came up when looking at a BF ticket with [~sviatlana.zuiko].) There is currently no information about the version of the 10gen/jstestfuzz repository we are running [when it fails|https://github.com/mongodb/mongo/blob/d0b0d782a14e9c0ac5724e35fb0bc2e20abcca67/etc/evergreen.yml#L1699-L1712] and the Build Baron ends up going off of the timestamp of when the Evergreen task ran. The following is the information that [we log about the 10gen/jepsen repository|https://github.com/mongodb/mongo/blob/d0b0d782a14e9c0ac5724e35fb0bc2e20abcca67/etc/evergreen.yml#L1575-L1577]. {noformat} branch=$(git symbolic-ref --short HEAD) commit=$(git show -s --pretty=format:""%h - %an, %ar: %s"") echo ""Git branch: $branch, commit: $commit"" {noformat}",1 +"SERVER-36757","08/18/2018 15:02:28","Generate and extract mongoebench-compatible JSON config files to consistent locations","The script from SERVER-36069 writes the mongoebench-compatible JSON config files to a {{src/third_party/mongo-perf/mongoebench/}} directory. The ""fetch benchmark embedded files"" added to Evergreen as part of SERVER-36076 extracts the mongoebench-compatible JSON config files to a top-level {{benchrun_embedded/}} directory. The {{benchrun_embedded*.yml}} test suites similiarly run the mongoebench-compatible JSON config files from a top-level {{benchrun_embedded/}} directory. We should have these directories be consistent with each other so that resmoke.py can be used to run the tests, regardless of whether the mongoebench-compatible JSON config files were generated locally or downloaded from S3. We should also add an entry for the directory to a {{.gitignore}} file because we made a decision to not include the mongoebench-compatible JSON config files in the source tree.",2 +"SERVER-36780","08/21/2018 12:55:17","Debian packages for MongoDB 4.0.1 missing","MongoDB 4.0.1 was released [6 August 2018|https://docs.mongodb.com/manual/release-notes/4.0/#aug-6-2018] and the installation documentation refers to version 4.0.1 under [Install a specific release of MongoDB|https://docs.mongodb.com/manual/tutorial/install-mongodb-on-debian/#install-a-specific-release-of-mongodb], but this version does not appear to exist in the APT repository: {{$ apt-cache policy mongodb-org}} {{mongodb-org:}} {{  Installed: (none)}} {{  Candidate: 4.0.0}} {{  Version table:}} {{     4.0.0 500}} {{        500 [http://repo.mongodb.org/apt/debian] stretch/mongodb-org/4.0/main amd64 Packages}} This appears to apply to both Debian Stretch and Debian Jessie. Looking at the APT release file for [Stretch|http://repo.mongodb.org/apt/debian/dists/stretch/mongodb-org/4.0/Release] and [Jessie|http://repo.mongodb.org/apt/debian/dists/jessie/mongodb-org/4.0/Release], they do not appear to have been updated since the end of June.  
",1 +"SERVER-36812","08/22/2018 23:07:14","Log obvious details when resmoke observes killed processes","In BF-10349, the shell crashed due to segfault, but the shell didn't print out stack trace on exit. Resmoke logged the test exited with -11. However there are 10 mongo shells, it's not clear which one crashed. It's also not clear that's the shell who crashed. We have core dumps in this case, which have sufficient stack trace for debugging. It will be great if the error message can indicate that core dump is available and which process the developer should look into. {noformat} [2018/08/19 15:49:33.497] [executor:js_test:job0] 2018-08-19T19:49:33.495+0000 Received a StopExecution exception: JSTest jstestfuzz/out/jstestfuzz-6828-ent_fe14-qa_a6ce-1534707044622-33.js failed. [2018/08/19 15:49:33.684] [executor] 2018-08-19T19:49:33.684+0000 Summary: 67 test(s) ran in 1040.76 seconds (66 succeeded, 41 were skipped, 1 failed, 0 errored) [2018/08/19 15:49:33.684] The following tests failed (with exit code): [2018/08/19 15:49:33.684] jstestfuzz/out/jstestfuzz-6828-ent_fe14-qa_a6ce-1534707044622-33.js (-11) {noformat} Resmoke may also start mongods, I'm not sure if their exit error messages are clear. It would be great it's obvious who observed the crash and the error message from resmoke is consistent with that from the shell (e.g. {{ReplSetTest}}). ",2 +"SERVER-36816","08/23/2018 04:40:05","Avoid reloading the view catalog on the primary and secondaries during the dbhash check","The changes from SERVER-25640 made it so the {{listCollections}} command is run with a {{$in}} query containing the names of the collections returned by the {{dbHash}} command. * [against the primary|https://github.com/mongodb/mongo/blob/2bed54b084995f2c2dd048b6a70b6fd678e1ac30/src/mongo/shell/replsettest.js#L1855-L1858] * [against the secondary|https://github.com/mongodb/mongo/blob/2bed54b084995f2c2dd048b6a70b6fd678e1ac30/src/mongo/shell/replsettest.js#L1907-L1910] The query leads to the view catalog being reloaded because [a very special filter must be used|https://github.com/mongodb/mongo/blob/2bed54b084995f2c2dd048b6a70b6fd678e1ac30/src/mongo/db/commands/list_collections_filter.cpp#L37-L42] to prevent that behavior. There is logic in the {{checkDBHashesForReplSet()}} function that's only enabled for the fuzzer test suites to skip [checking the dbhash when reloading the view catalog fails due to an invalid view definition|https://github.com/mongodb/mongo/blob/2bed54b084995f2c2dd048b6a70b6fd678e1ac30/src/mongo/shell/replsettest.js#L1860-L1865]; however, it seems more worthwhile to avoid reloading the view catalog as we've found that an {{InvalidNamespace}} error response may be returned for certain patterns involving null bytes. We should instead use the very special filter to prevent the view catalog from being reloaded on the server during the {{listCollections}} command and do the actual filtering on the client-side. {code:javascript} // Don't run validate on view namespaces. let filter = {type: 'collection'}; if (jsTest.options().skipValidationOnInvalidViewDefinitions) { // If skipValidationOnInvalidViewDefinitions=true, then we avoid resolving the view // catalog on the admin database. // // TODO SERVER-25493: Remove the $exists clause once performing an initial sync from // versions of MongoDB <= 3.2 is no longer supported. 
filter = {$or: [filter, {type: {$exists: false}}]}; } {code}",2 +"SERVER-36817","08/23/2018 05:03:10","replSetFreeze command run by stepdown thread may fail when server is already primary","As part of the changes to address SERVER-35383 and based on [this comment|https://jira.mongodb.org/browse/SERVER-35124?focusedCommentId=1916761&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-1916761] from SERVER-35124, the stepdown thread in resmoke.py runs the {{\{replSetFreeze: 0\}}} command to make the former primary electable in the next round of stepdowns. Since the primary is only stepped down [for 10 seconds (by default)|https://github.com/mongodb/mongo/blob/2bed54b084995f2c2dd048b6a70b6fd678e1ac30/buildscripts/resmokelib/testing/hooks/stepdown.py#L28], it is possible for enough time to have passed for the primary to have tried to step back up on its own before the {{\{replSetFreeze: 0\}}} command is run. We either need to handle the {{OperationFailure: cannot freeze node when primary or running for election. state: Primary}} exception or prevent it from occurring.",1 +"SERVER-36819","08/23/2018 06:06:01","Enterprise RHEL 7.1 PPC64LE builder attempts to run concurrency_simultaneous_replication on rhel72-zseries-build distro","This leads to system failures in Evergreen because we're attempting to run binaries compiled for PowerPC on a zSeries platform. This came up not too long ago in SERVER-35416; it might be worth auditing the other {{concurrency*}} tasks which were added recently. {noformat} [2018/08/11 13:41:06.435] sh: line 34: mongodb-linux-ppc64le-enterprise-rhel71-4.1.1-306-gd6b5625/bin/mongo: cannot execute binary file {noformat} {noformat} - name: enterprise-rhel-71-ppc64le display_name: Enterprise RHEL 7.1 PPC64LE ... tasks: ... - name: concurrency_simultaneous_replication distros: - rhel72-zseries-build {noformat} https://github.com/mongodb/mongo/blob/2bed54b084995f2c2dd048b6a70b6fd678e1ac30/etc/evergreen.yml#L11568-L11570",1 +"SERVER-36897","08/27/2018 22:25:59","OplogReader.hasNext can return false -> true, confusing `checkOplogs`","{{`ReplSetTest.checkOplog`}} will establish a reverse table scanning cursor on the oplog from each node, then ""BFS"" to compare the oplog for consistency. It will first get the latest entry in each oplog and perform a comparison, then advance all of the cursors and compare again. It allows the oplogs to have a different number of entries, so long as they match on the latest entries. To do so, {{checkOplog}} relies on [{{OplogReader.hasNext}}|https://github.com/mongodb/mongo/blob/2145028db135b539c51713acad6952ef36e646cf/src/mongo/shell/replsettest.js#L2105] to always return false after it has done so for the first time. 
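Roughly speaking, the comparison loop assumes something like the following (a simplified, hypothetical sketch; {{readers}} and {{compareLatestEntries}} are illustrative stand-ins, not the real helpers):
{code:javascript}
// Once a reader reports no more entries, it is treated as permanently exhausted and
// only the entries it already produced are expected to match.
while (readers.some((reader) => reader.hasNext())) {
    const entries = readers.filter((reader) => reader.hasNext()).map((reader) => reader.next());
    compareLatestEntries(entries);  // stand-in for the actual BFS comparison
}
{code}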
However, if the call that executes the query receives a {{CappedPositionLost}} (thus [not instantiating the shell's internal {{_cursor}}|https://github.com/mongodb/mongo/blob/2145028db135b539c51713acad6952ef36e646cf/src/mongo/shell/query.js#L112], a follow-up {{OplogReader.hasNext}} can return true, having re-issued the find and receiving a batch from the oplog starting at the latest entry.",2 +"SERVER-36960","08/31/2018 17:39:46","Stepdown thread should handle AutoReconnect exceptions when executing replSetStepUp","The stepdown thread can terminate when issuing a {{replSetStepUp}} command, as {{stepdown.py}} does not handle {{AutoReconnect}} [exceptions|https://github.com/mongodb/mongo/blob/2704d7a89e64167fcff7356ada111b313146474e/buildscripts/resmokelib/testing/hooks/stepdown.py#L319-L330].",2 +"SERVER-36976","09/04/2018 18:32:32","Run new agg-fuzzer in evergreen","The aggregation fuzzer evergreen task should be replaced with the new agg-fuzzer. ",2 +"SERVER-36980","09/04/2018 21:25:35","Remove old aggregation fuzzer from evergreen","The aggregation fuzzer is being revamped. We should remove the previous aggregation fuzzer from evergreen.",1 +"SERVER-37074","09/11/2018 16:49:33","Validation hook should continue downgrading if a downgrade was interrupted","SERVER-36718 made the changes so that we can use {{forceValidationWithFeatureCompatibilityVersion}} to upgrade the servers before validating the collections. But BF-10462 is a case when a downgrade in the test was interrupted and there is no way for the validation hook to start the upgrade in the middle of a downgrade. Therefore, the validation hook should first downgrade and then upgrade in this case.",2 +"SERVER-37101","09/12/2018 19:25:45","Add optimization mode aggregation (pipeline) fuzzer to evergreen","Add run the aggregation (pipeline) fuzzer in optimization mode in evergreen. This was previously attempted and the change can be seen here: https://evergreen.mongodb.com/version/5b9923622fbabe77fd9660b1",3 +"SERVER-37120","09/13/2018 15:23:42","Turn off linux-replSet-initialsync-logkeeper Build Variant for 3.4 and 3.6 branches","These tests no longer run as the mongod versions are no longer compatible with FCV version in the snapshot.",1 +"SERVER-37143","09/14/2018 19:21:19","Retry on Interrupted errors in the background DB hash hook","Some tests kill all active sessions, which can include the session used by the background dbhash hook. This can cause the dbhash command to return an {{Interrupted}} error. The dbhash hook should handle it in a similar way to transient transaction errors.",2 +"SERVER-37228","09/20/2018 18:39:26","Escape double quotes in hang analyzer's waitsfor graph","Double quotes are not always escaped right now. See [this failure|https://evergreen.mongodb.com/task/mongodb_mongo_master_enterprise_rhel_62_64_bit_replica_sets_jscore_passthrough_patch_6818230171cb12727892802c608ba9247815ef06_5ba32958e3c331286f2645a2_18_09_20_05_00_33##%257B%2522compare%2522%253A%255B%257B%2522hash%2522%253A%25226818230171cb12727892802c608ba9247815ef06%2522%257D%252C%257B%2522hash%2522%253A%25226818230171cb12727892802c608ba9247815ef06%2522%257D%255D%257D] for an example. 
{noformat}# Legend: # Thread 1 -> Lock 1 indicates Thread 1 is waiting on Lock 1 # Lock 2 -> Thread 2 indicates Lock 2 is held by Thread 2 # No cycle detected in the graph digraph ""mongod+lock-status"" { """"conn41"" (Thread 0x7f63bf32b700 (LWP 71045))"" -> ""Lock 0x7f63df74db00 (mongo::MODE_X)""; ""Lock 0x7f63df74db00 (mongo::MODE_X)"" -> """"rsSync-0"" (Thread 0x7f63bc818700 (LWP 69184))""; """"repl writer worker 10"" (Thread 0x7f63b460b700 (LWP 69197))"" -> ""Lock 0x7f63df74db00 (mongo::MODE_X)""; """"clientcursormon"" (Thread 0x7f63c2f31700 (LWP 68966))"" -> ""Lock 0x7f63df74db00 (mongo::MODE_X)""; """"conn41"" (Thread 0x7f63bf32b700 (LWP 71045))"" [label=""\""conn41\"" (Thread 0x7f63bf32b700 (LWP 71045))"" ] ""Lock 0x7f63df74db00 (mongo::MODE_X)"" [label=""Lock 0x7f63df74db00 (mongo::MODE_X)"" ] """"rsSync-0"" (Thread 0x7f63bc818700 (LWP 69184))"" [label=""\""rsSync-0\"" (Thread 0x7f63bc818700 (LWP 69184))"" ] """"repl writer worker 10"" (Thread 0x7f63b460b700 (LWP 69197))"" [label=""\""repl writer worker 10\"" (Thread 0x7f63b460b700 (LWP 69197))"" ] """"clientcursormon"" (Thread 0x7f63c2f31700 (LWP 68966))"" [label=""\""clientcursormon\"" (Thread 0x7f63c2f31700 (LWP 68966))"" ] } {noformat}",1 +"SERVER-37270","09/21/2018 20:53:06","Remove foreground index build functionality","This work will cause the background:true index option to be ignored, thus making all index builds run through the new hybrid index build path.",20 +"SERVER-37272","09/21/2018 20:55:56","Disable hybrid index builds on FCV 4.0","Because of the interaction hybrid index builds have with prepared transactions (see SERVER-38588), hybrid index builds are dependent on the two-phase behavior of simultaneous index builds, which will only be enabled in FCV 4.2. For this reason, hybrid index builds should be disabled unless a node is in FCV 4.2.",5 +"SERVER-37289","09/24/2018 16:12:33","Use authenticated client to run the refreshLogicalSessionCacheNow command in resmoke sharded cluster fixture","To be able to use PyMongo 3.6+ with resmoke, we need to use an authenticated client to run the {{refreshLogicalSessionCacheNow}}. Since PyMongo 3.6+ will use implicit sessions, if the command is run unauthenticated then we hit SERVER-34820 and fail with a ""there are no users authenticated"" message.",1 +"SERVER-37301","09/24/2018 20:05:02","Add Ubuntu 18.04 zSeries build variant","Add community and enterprise builds for Ubuntu 18.04 zSeries",3 +"SERVER-37359","09/27/2018 20:36:07","Update the test lifecycle script to use the new Evergreen test stats endpoint","The [lifecycle_test_failures.py|https://github.com/mongodb/mongo/blob/82b62cf1e513657a0c35d757cf37eab0231ebc9b/buildscripts/lifecycle_test_failures.py] script should be updated to use the new Evergreen API endpoint for test execution statistics when it becomes available.",3 +"SERVER-37373","09/28/2018 16:11:22","Fully qualified files in suite YML do not run in Burn_in tests on Windows","The burn_in task does not run a fully specified file in the suite YML on Windows",2 +"SERVER-37387","09/28/2018 20:19:28","mongo_lock.py graph displays lock request type for LockManager locks","Changes for SERVER-34738 now display the lock requester's mode. We need to display the lock holder's mode as well. 
We should change the following {code} print(""MongoDB Lock at {} ({}) held by {} waited on by {}"".format( lock_head, lock_request[""mode""], lock_holder, lock_waiter)) {code} to {code} # Code to set the lock_mode needs to be added print(""MongoDB Lock at {} ({}) held by {} waited on by {} ({})"".format( lock_head, lock_mode, lock_holder, lock_waiter, lock_request[""mode""])) {code} Other references to {{lock_request[""mode""]}} should be modified as well, such that the mode is properly associated to holder.",1 +"SERVER-37428","10/02/2018 20:33:36","Sys-perf: linux builds using enterprise bits","Note: this only affects internal testing systems.  The system_perf.yml file support for enterprise modules is doing the wrong thing for non-enterprise builds. There are three compile tasks in one variant, but they all get the enterprise module. The fix seems to be making the enterprise build more like the wtdevelop build, by checking out the enterprise module to a directory outside the mongo source directory, and actively copying it in for the enterprise build as is done in ""use wiredtiger develop"" function. ",2 +"SERVER-37467","10/04/2018 02:25:10","Have collect_resource_info.py recover from transient errors.","Its output is useful! It's a missed opportunity when the output file doesn't contain all the data it could. {noformat} [2018/09/18 17:31:29.082] Traceback (most recent call last): [2018/09/18 17:31:29.082] File ""buildscripts/collect_resource_info.py"", line 90, in [2018/09/18 17:31:29.082] main() [2018/09/18 17:31:29.082] File ""buildscripts/collect_resource_info.py"", line 40, in main [2018/09/18 17:31:29.082] response = requests.get(""http://localhost:2285/status"") [2018/09/18 17:31:29.082] File ""/opt/mongodbtoolchain/v2/lib/python2.7/site-packages/requests/api.py"", line 72, in get [2018/09/18 17:31:29.082] return request('get', url, params=params, **kwargs) [2018/09/18 17:31:29.082] File ""/opt/mongodbtoolchain/v2/lib/python2.7/site-packages/requests/api.py"", line 58, in request [2018/09/18 17:31:29.083] return session.request(method=method, url=url, **kwargs) [2018/09/18 17:31:29.083] File ""/opt/mongodbtoolchain/v2/lib/python2.7/site-packages/requests/sessions.py"", line 508, in request [2018/09/18 17:31:29.083] resp = self.send(prep, **send_kwargs) [2018/09/18 17:31:29.083] File ""/opt/mongodbtoolchain/v2/lib/python2.7/site-packages/requests/sessions.py"", line 618, in send [2018/09/18 17:31:29.083] r = adapter.send(request, **kwargs) [2018/09/18 17:31:29.083] File ""/opt/mongodbtoolchain/v2/lib/python2.7/site-packages/requests/adapters.py"", line 490, in send [2018/09/18 17:31:29.083] raise ConnectionError(err, request=request) [2018/09/18 17:31:29.083] requests.exceptions.ConnectionError: ('Connection aborted.', BadStatusLine(""''"",)) {noformat}",1 +"SERVER-37478","10/04/2018 21:29:07","Run agg fuzzer more on Linux 64","Run the fuzzer more on Linux 64 and use it for any fuzzer patch builds. This should allow us to catch more bugs found by new fuzzer improvements before they slip through into mainline evergreen.",1 +"SERVER-37490","10/05/2018 16:31:21","Increase the ConnectTimeout for powercycle","We should increase the {{ConnectTimeout}} used in powercycle tests from 10 to 30 seconds.",1 +"SERVER-37555","10/10/2018 21:51:52","An abort of collMod need not refreshEntry for an index on rollback","When a collMod operation aborts its WriteUnitOfWork, it need not call refreshEntry on rollback. An index catalog change via collMod already invoke refreshEntry. 
The refreshEntry itself should restore its state on rollback as it registers an IndexRemoveChange to do that.",3 +"SERVER-37599","10/12/2018 18:01:48","Log exit code of shell-spawned processes","We should print {{res.exitCode}} [here|https://github.com/mongodb/mongo/blob/c6bceb292246721c5a0950e84d6b71ee1bc04bdf/src/mongo/shell/servers.js#L1254] when the shell fails to spawn a process to assist with debugging.",1 +"SERVER-37639","10/15/2018 21:26:24","Add checkIfCommitQuorumIsSatisfied() to the replication interface to check whether a given commit quorum is satisfied by a given set of commit ready members.","A simultaneous index build will have a commitQuorum setting, set by the user via createIndexes, which will dictate how many members of the replica set must be ready for commit before the primary will commit the index. Each index build will track which members are ready and must check whether the commitQuorum is satisfied. commitQuorum is the same type and takes the same settings as writeConcern.w: an integer number reflecting a number of replica set members; majority; or a replica tag set. The function should take a list of host:port pairs, which we are using to uniquely identify replica set members, along with the commitQuorum. It should return whether or not quorum is satisfied, leveraging the writeConcern checking machinery if we can, and/or the topology coordinator's member config; and probably error if the quorum can never be satisfied.",13 +"SERVER-37643","10/15/2018 21:46:37","add createIndexes command logic to the index build interface","The index builder interface established is established in SERVER-37636. This ticket will add a Threadpool and move all the instances of MultiIndexBlock (index builder class) that are all over the place behind the interface and running on the Threadpool. We should be able to register index builds via the interface and then wait upon a condition variable to hear back on the Status result. Keep in mind SERVER-37644, which is to make index builds joinable via the createIndexes command. The condition variable setup must be such that we can have multiple waiters who can all hear back about the same result. Maybe an interface internal helper function to get something to wait upon for a Status result.",20 +"SERVER-37644","10/15/2018 21:55:32","Make the createIndexes command join already in-progress index builds","Depends on SERVER-37643 to move all index builds behind the index build interface established in SERVER-37636. The createIndexes command should check whether the index(es) is already being built and wait upon it if so. A new waiting function must be added to the index build interface. An appropriate error message should be returned if: commitQuorum does not match that of the in-progress index build; the indexes and specs do not match identically those in a single index builder. Note that there can be multiple indexes with the same [key pattern but different collations|https://github.com/mongodb/mongo/blob/9f363b489585124afa1e26412e19f6728763e1ad/src/mongo/db/catalog/index_catalog_impl.cpp#L749-L768] (SERVER-24239)",5 +"SERVER-37645","10/15/2018 22:11:08","Add parsing for new index build fields in index catalog entries ","Probably should upgrade the parsing to an IDL while we're at it. *_Update: or not, because that actually sounds like a potential black hole given how much we pass around BSON elsewhere in the index building layer._* // Defaults to false in-memory if absent runTwoPhaseIndexBuild: , // Defaults to ""scanning"" if absent. 
Can only be set to ""scanning"", ""verifying"" or ""committing"" buildPhase: , // Defaults to 1 if absent. versionOfBuild: , // No default if absent. Should have a bool function to say whether it is present. buildConstraintViolationsFile: , newWritesInterceptorTable: (pick a field name that seems suitable for this) Check in with the design before finalizing.",5 +"SERVER-37663","10/19/2018 14:42:19","Add support for running genny via resmoke.py locally","To aid the local development experience of writing new performance workloads, we should add a new ""test kind"" for running genny tests through resmoke.py. The latency and other metrics collected won't be particularly interesting because all of the processes will be running on the same machine, but we'll be able to ensure the mechanics of the new performance workloads are sound before submitting them to Evergreen and running on a distributed cluster. We should have resmoke.py YAML suite file configurations for * stand-alone mongod {{MongoDFixture}} * 1-node replica set ({{ReplicaSetFixture}} with {{num_nodes=1}}) * 3-node replica set ({{ReplicaSetFixture}} with {{num_nodes=3}} and {{all_nodes_electable=true}}) * sharded cluster (({{ShardedClusterFixture}} with {{configsvr_options.num_nodes=3}}, {{num_mongos=3}}, {{num_shards=3}}, and {{num_rs_nodes_per_shard=3}}) in order to match the configurations the {{genny_workloads}} Evergreen task runs in as part of the dsi Evergreen project. *Note*: There isn't a need to wire up genny's output format and resmoke.py's {{\-\-perfReportFile}} because this mode is only intended for local development and not for running in Evergreen.",3 +"SERVER-37664","10/19/2018 14:42:47","Add support for doing resmoke.py process management through jasper","https://github.com/mongodb/jasper is a library for doing process management through commands over a socket. Having process management available as a service means (1) we can consolidate the various implementations we have through the {{subprocess}} / {{subprocess32}} Python packages and the mongo shell's {{shell_utils_launcher.cpp}} C++ code, and (2) we can allow tests to interact with the cluster in a potentially destructive way. #2 enables tools such as genny to be able to run performance workloads that measure the latency of operations after restarting a mongod or mongos process. 
MAKE-497 exposed jasper through the {{curator}} binary.",5 +"SERVER-37668","10/19/2018 18:01:02","Disable the aggregation fuzzer on Windows 2008R2 DEBUG","It is failing due to issues such as SERVER-37429 and others that still need to be investigated.",1 +"SERVER-37678","10/19/2018 21:35:34","Update linter to enforce SSPL in header files","SERVER-37651 changed the license from AGPL to SSPL; would be nice if the linter enforced the new license in new files automatically.",2 +"SERVER-37694","10/22/2018 16:43:12","Coverity analysis defect 105088: Redundant test","Test always evaluates the same Defect 105088 (STATIC_C) Checker DEADCODE (subcategory redundant_test) File: {{/src/mongo/db/storage/biggie/store.h}} Function {{mongo::biggie::RadixStore, std::allocator>, std::__cxx11::basic_string, std::allocator>>::_merge3Helper(mongo::biggie::RadixStore, std::allocator>, std::__cxx11::basic_string, std::allocator>>::Node *, const mongo::biggie::RadixStore, std::allocator>, std::__cxx11::basic_string, std::allocator>>::Node *, const mongo::biggie::RadixStore, std::allocator>, std::__cxx11::basic_string, std::allocator>>::Node *, std::vector, std::allocator>, std::__cxx11::basic_string, std::allocator>>::Node *, std::allocator, std::allocator>, std::__cxx11::basic_string, std::allocator>>::Node *>> &, std::vector>&)}} /src/mongo/db/storage/biggie/store.h, line: 1337 {color:red}At condition ""baseNode"", the value of ""baseNode"" cannot be ""NULL"".{color} {code:first-line=1337} } else if (baseNode && (!otherNode || (otherNode && baseNode != otherNode))) { {code} /src/mongo/db/storage/biggie/store.h, line: 1312 {color:red}Condition ""baseNode"", taking true branch. Now the value of ""baseNode"" is not ""NULL"".{color} {code:first-line=1312} if (!node && !baseNode && !otherNode) {code} /src/mongo/db/storage/biggie/store.h, line: 1337 {color:red}The condition ""baseNode"" must be true.{color} {code:first-line=1337} } else if (baseNode && (!otherNode || (otherNode && baseNode != otherNode))) { {code} ",1 +"SERVER-37767","10/26/2018 17:36:14","Platform Support: Remove Debian 8 x64","Platform Support: Remove Debian 8 x64 - Only in 4.2, not in previous releases: we’ll apply the latest/latest-1 policy for new releases - For existing MongoDB releases we’ll follow the vendor ",2 +"SERVER-37768","10/26/2018 17:36:32","Platform Support: Add Community & Enterprise Debian 10 x64","Platform Support: Add Community & Enterprise Debian 10 x64 ",3 +"SERVER-37769","10/26/2018 17:36:39","Platform Support: Add Community & Enterprise SLES 15 x64","Platform Support: Add Community & Enterprise SLES 15 x64",3 +"SERVER-37770","10/26/2018 17:36:46","Platform Support: Add Community and Enterprise Ubuntu 18.04 ARM64","Platform Support: Add Community Ubuntu 18.04 ARM64",5 +"SERVER-37771","10/26/2018 17:36:56","Platform Support: Add Enterprise Ubuntu 18.04 PPCLE","Platform Support: Add Enterprise Ubuntu 18.04 PPCLE",3 +"SERVER-37772","10/26/2018 17:37:05","Platform Support: Add Community & Enterprise RHEL 8 x64","Platform Support: Add Community & Enterprise RHEL 8 x64 ",3 +"SERVER-37778","10/26/2018 17:38:22","Platform Support: Add Community & Enterprise Ubuntu 18.04 (zSeries)","Platform Support: Add Community & Enterprise Ubuntu 18.04 (zSeries)",3 +"SERVER-37789","10/26/2018 21:22:09","Add --genny flag to resmoke.py","I'd like to be able to specify a genny executable without modifying a committed file. 
People shouldn't need to use a locally compiled version of genny and they shouldn't need to stash between git branch operations.",1 +"SERVER-37926","11/05/2018 19:18:53","Excess allocated memory associated with cursors due to WT ""modify"" operations","In SERVER-37795 we observed excess allocated memory associated with cursors due to WT ""modify"" operations in 3.6.8. This should not occur in 4.0 because improved cursor caching should limit the number of cursors, but we still see excess allocated memory: !comparison.png|width=100%! The excess memory in 4.0.3 is more than double that of 3.6.8. Note that the rate of updates is somewhat higher in 4.0.3 than 3.6.8 (probably SERVER-36221), but that doesn't appear to be enough larger to account for the >2x larger excess memory allocated in 4.0.3. Possibly related: ""cached cursor count"" seems to grow indefinitely, although we only have that information for the oplog collection and not globally. Notes: * Verified that the memory is associated with cursors by disabling cursor caching in 3.6.8 and the excess memory disappeared. * Uesed the heap profiler to confirmed that the excess allocated memory is coming from the same allocation sites in 4.0.3 as 3.6.8. Repro: {code} load(""/home/bdlucas/mongodb/git/mongo/jstests/libs/parallelTester.js"") size = 1000000 bunch = 100 // start a bunch of threads inserting to a bunch of collections // in 3.6.8 this creates a big bunch of cursors (bunch^2, maybe) // but not in 4.0.3 threads = [] for (var t=0; t npm_test-${task_id}.log 2>&1 {noformat} https://evergreen.mongodb.com/task_log_raw/mongodb_mongo_master_enterprise_rhel_62_64_bit_majority_read_concern_off_jstestfuzz_interrupt_replication_0_enterprise_rhel_62_64_bit_majority_read_concern_off_patch_41c44d02cf39ef581888bed68c547e4ed9b5a323_5bff8490e3c33123cb9e7dfa_18_11_29_06_19_05/0?type=T {noformat} [2018/11/29 06:56:17.137] npm test > npm_test-mongodb_mongo_master_enterprise_rhel_62_64_bit_majority_read_concern_off_jstestfuzz_interrupt_replication_0_enterprise_rhel_62_64_bit_majority_read_concern_off_patch_41c44d02cf39ef581888bed68c547e4ed9b5a323_5bff8490e3c33123cb9e7dfa_18_11_29_06_19_05.log 2>&1 [2018/11/29 06:56:17.137] sh: line 16: npm_test-mongodb_mongo_master_enterprise_rhel_62_64_bit_majority_read_concern_off_jstestfuzz_interrupt_replication_0_enterprise_rhel_62_64_bit_majority_read_concern_off_patch_41c44d02cf39ef581888bed68c547e4ed9b5a323_5bff8490e3c33123cb9e7dfa_18_11_29_06_19_05.log: File name too long {noformat}",1 +"SERVER-38323","11/29/2018 21:15:48","Create an index builds interface for the embedded version of the server","The IndexBuildsCoordinator is for managing index builds across a replica set. It links in networking and has a ThreadPool on which to run index builds. The embedded server cannot link to networking code, nor can it use ThreadPools. Therefore, we must make a simple class that builds indexes via the IndexBuildsManager. There is no need for asynchronous threads on an embedded server, which is effectively a standalone without networking or asynchronously running threads. I haven't looked into whether it is possible to just make a separate class to link into embedded, or inheritance is necessary, necessitating splitting the existing IndexBuildsCoordinator into and interface and implementation. I'd guess inheritance is necessary, since commands in the standalone library are probably all included in the greater repl inclusive libraries? 
Shims are also a potential tool, I haven't explored that idea, either.",5 +"SERVER-38336","11/30/2018 18:59:03","Coverity analysis defect 105145: Copy without assign","Class has user-written copy constructor but no user-written assignment operator Defect 105145 (STATIC_C) Checker COPY_WITHOUT_ASSIGN (subcategory none) File: {{/src/mongo/db/storage/kv/temporary_kv_record_store.h}} Parse Warning (no function name available) /src/mongo/db/storage/kv/temporary_kv_record_store.h, line: 46 {color:red}Class ""mongo::TemporaryKVRecordStore"" has a user-written copy constructor ""mongo::TemporaryKVRecordStore::TemporaryKVRecordStore(mongo::TemporaryKVRecordStore &&)"" but no corresponding user-written assignment operator.{color} {code:first-line=46} class TemporaryKVRecordStore : public TemporaryRecordStore { {code} /src/mongo/db/storage/kv/temporary_kv_record_store.h, line: 54 {color:red}User-written copy constructor.{color} {code:first-line=54} TemporaryKVRecordStore(TemporaryKVRecordStore&& other) noexcept {code} ",1 +"SERVER-38395","12/04/2018 18:21:07","Python global logger is polluted when importing certain resmokelib modules","It was discovered that the global {{logging}} in {{buildscripts/update_test_lifecycle.py}} gets overriden by {{buildscripts/mobile/adb_monitor.py}} because of the resmokelib imports. The file {{buildscripts/mobile/benchrun_embedded_setup_android.py}} also has similar issue. The {{logging}} setting should not be done in the global scope but in the {{main}} function: {code}logging.basicConfig(format=""%(asctime)s %(levelname)s %(message)s"", level=logging.INFO) {code}",1 +"SERVER-38396","12/04/2018 19:15:29","Improve the IndexBuildsCoordinator unit testing after it was made into an interface with two implementations","SERVER-38323 added an embedded implementation, and the original unit test only tests the original implementation. This task is to cover both implementations with unit testing.",8 +"SERVER-38477","12/07/2018 22:39:36","Index build lock acquisitions should be interruptible","Index build avoids interruptions in several places, especially for background index build. They will conflict with prepared transactions on stepdown and shutdown. We can either make index build interruptible, or use IX or IS locks instead of X or S locks. Here's a list of all occurrences of UninterruptibleLockGuard for index build. [src/mongo/db/catalog/multi_index_block_impl.cpp:156|https://github.com/mongodb/mongo/blob/86fab3ee0e1570c6743b314fddc0af418bba9015/src/mongo/db/catalog/multi_index_block_impl.cpp#L156] [src/mongo/db/commands/create_indexes.cpp:325|https://github.com/mongodb/mongo/blob/86fab3ee0e1570c6743b314fddc0af418bba9015/src/mongo/db/commands/create_indexes.cpp#L325] [src/mongo/db/index_builder.cpp:195|https://github.com/mongodb/mongo/blob/86fab3ee0e1570c6743b314fddc0af418bba9015/src/mongo/db/index_builder.cpp#L195] [src/mongo/db/index_builder.cpp:299|https://github.com/mongodb/mongo/blob/86fab3ee0e1570c6743b314fddc0af418bba9015/src/mongo/db/index_builder.cpp#L299] ",13 +"SERVER-38478","12/07/2018 22:47:20","Remove UninterruptibleLockGuard in query yield","restoreLockState() is used by [query yielding|https://github.com/mongodb/mongo/blob/a66a5578d5b006cef85b16eac05c96b58c877ebe/src/mongo/db/query/query_yield.cpp#L92] and [transaction reaper|https://github.com/mongodb/mongo/blob/a66a5578d5b006cef85b16eac05c96b58c877ebe/src/mongo/db/transaction_reaper.cpp#L165]. 
To make sure they don't conflict with prepared transactions on stepdown and shutdown, we need to guarantee they only restore IS or IX locks or they restore locks that won't conflict with transactions.",8 +"SERVER-38509","12/10/2018 21:05:45","Handle degraded mode for test history in generate_resmoke_suites","Evergreen is going to implement a ""degraded"" mode if it cannot respond to test history queries due to load. In that mode, queries to the test history will return HTTP 503 (See https://jira.mongodb.org/browse/EVG-5633). We should detect this condition when attempting to split up test suites and divide up the suites randomly (we may want to define an expansion on the project of how much to divide up in this situation, that would allow us to change the value without needing to commit new code).",2 +"SERVER-38531","12/11/2018 16:41:37","Increase parallelism in test lifecycle update script and perform more rollups server-side","The {{update_test_lifecycle.py}} script only parallelizes the requests across tasks for a given batch of tests. We should be able to parallelize all requests to improve the overall run time. Additionally we can take advantage of the {{group_num_days}} parameters of the Evergreen API to fetch stats results already aggregated for the reliable and unreliable periods.",3 +"SERVER-38532","12/11/2018 18:01:44","Add index ns and name to ""build index done"" log line","A lack of final log details make it hard to reconstruct index build timelines from {{mongod}} logs using {{grep}} etc. When a build starts we get a log like: {code} migtest34-shard-00-02-sokhy.mongodb.net/27017/mongodb/mongodb.log.2018-12-11T08-01-04:2018-12-10T23:13:22.544+0000 I INDEX [repl index builder 13] build index on: trev.historicTrackingEvent properties: { v: 2, key: { createdDateTime: 1 }, name: ""createdDateTime"", expireAfterSeconds: 10368000, ns: ""trev.historicTrackingEvent"", background: true } {code} but when it ends, only: {code} migtest34-shard-00-02-sokhy.mongodb.net/27017/mongodb/mongodb.log.2018-12-11T08-01-04:2018-12-10T23:49:52.020+0000 I INDEX [repl index builder 13] build index done. scanned 50567515 total records. 2189 secs {code} Can we add at least the {{ns}} and index name to the ""build index done"" so that we can use grep for analysis? ",1 +"SERVER-38562","12/12/2018 14:57:45","Implement IndexBuildsCoordinator::voteCommitIndexBuilds","Consider moving the [Client::setLastOpToSystemLastOpTime|https://github.com/mongodb/mongo/blob/597b4748fc36210c61cf4d6c086d364013df740a/src/mongo/db/commands/vote_commit_index_builds_command.cpp#L77-L80] logic into the function logic. A flag must be set in-memory on the index build whether or not to proceed without voting or end the thread after voting successfully or finding the flag set after the fact. commitIndexBuild, if the build has already reached the 'committing' phase, will set the flag and start a new asynchronous thread for the commit; else, the in-memory flag will be set such that the index build discovers it later and bypasses voting, proceeding straight to commit. This is necessary because stalling commit, and thereby stalling replication on the secondary, cannot be permitted to take as long as a network call can potentially take -- a matter of seconds, presumably. 
The alternative would be to make the voteCommitIndexBuild command sent by the secondary have a short enough timeout that we don't mind stalling replication for that amount of time; but this is risky given that determining a reasonable time for all network and replication latencies might be impossible. The index build thread would exit after finishing voting, as opposed to waiting on a condition variable for the commitIndexBuild signal, as ReplIndexBuildState is currently expecting with its condition variable already set up and waiting to be used -- need to change that. The flag must also be initialized correctly on index build recovery, depending on the persisted state.",5 +"SERVER-38589","12/13/2018 00:18:59","service mongod stop may produce No /usr/bin/mongod found running; none killed.","A fix should be applied to https://github.com/mongodb/mongo/blob/6c8dc5e004bf2c91df10975adef861bcf00af6cd/debian/init.d to prevent this error when stopping mongod.",2 +"SERVER-38615","12/13/2018 19:11:16","The psutil module should be installed on all platforms","We are now using {{psutil}} in {{buildscripts/tests/test_evergreen_resmoke_job_count.py}} and this runs on every platform (see SERVER-38115).",1 +"SERVER-38667","12/17/2018 14:42:10","Notify IndexBuildsCoordinator of replica set member stepup and stepdown","Add step-up and step-down hooks for the IndexBuildsCoordinator. There's a field on the mongod implementation that indicates primary/secondary state https://github.com/mongodb/mongo/blob/5eca4a77da863bd4e68bf4eb7c2d0c920982f8b9/src/mongo/db/index_builds_coordinator_mongod.h#L137 And there are interface functions for setting primary and secondary, which look like they've been implemented in the mongod and embedded already. https://github.com/mongodb/mongo/blob/5eca4a77da863bd4e68bf4eb7c2d0c920982f8b9/src/mongo/db/index_builds_coordinator.h#L197-L198 We'll need to call the state change functions in here https://github.com/mongodb/mongo/blob/4d09b2e0a605aefd7adefda28e01e309bbf30883/src/mongo/db/repl/replication_coordinator_external_state_impl.cpp#L483 and it looks like stepdown hooks go in here maybe https://github.com/mongodb/mongo/blob/4d09b2e0a605aefd7adefda28e01e309bbf30883/src/mongo/db/repl/replication_coordinator_impl.cpp#L2800 -- check in with someone from repl to find out their preferences for new hook additions.",5 +"SERVER-38710","12/19/2018 20:27:53","Support dependencies when generating evergreen tasks","Support depends_on and requires when generating tasks",2 +"SERVER-38749","12/21/2018 20:28:24","Concurrent stepdown suites on 3.6 branch still use 5-second election timeout","The changes from [3aa3155|https://github.com/mongodb/mongo/commit/3aa315557bef775c5291068e365a59a3a810fc41] as part of SERVER-30642 were ineffective at increasing the election timeout for the {{concurrency_sharded_with_stepdowns*.yml}} test suites because the JavaScript version of the stepdown thread reconfigures the replica set and sets a 5-second election timeout by default. We should additionally set {{electionTimeoutMS=1 day}} as part of the stepdown options specified to [the {{ContinuousStepdown.configure()}} function|https://github.com/mongodb/mongo/blob/r3.6.9/jstests/concurrency/fsm_libs/cluster.js]. 
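Something along these lines should work (a hedged sketch; the exact option shape accepted by {{ContinuousStepdown.configure()}} on the 3.6 branch may differ):
{code:javascript}
// Pass a 1-day election timeout through the stepdown options so the reconfigure done
// by the JavaScript stepdown thread does not reinstate the 5-second default.
ContinuousStepdown.configure({
    electionTimeoutMS: 24 * 60 * 60 * 1000,  // 1 day
});
{code}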
*Note*: This is no longer an issue for the 4.0 or master branches because they've switched to using the Python version of the stepdown thread, which doesn't reconfigure the replica set.",1 +"SERVER-38779","12/27/2018 03:06:14","Build a mechanism to periodically cleanup old WT sessions from session cache","The way session cache is maintained, idle sessions keep accumulating in the session cache. If the workload doesn't use all the idle sessions, the oldest sessions stay open forever. In some cases these sessions might hold some resources inside WiredTiger, which can cause problems. eg: dhandles that never close in WiredTiger. This ticket is to build a mechanism around the session cache, to cleanup old sessions that have been idle for too long. More details in the linked tickets.",8 +"SERVER-38816","01/03/2019 14:25:27","Use generate.tasks for required tasks over target runtime","Apply generate.tasks to resmoke tasks on required builders that have an average runtime greater than the target runtimes. The target runtimes are: * RHEL 6.2: 10 minutes * Enterprise Windows 2008R2: 20 mins * Linux DEBUG: 15 mins",2 +"SERVER-38817","01/03/2019 14:26:42","Use generate.tasks on all resmoke tasks","Migrate any resmoke tasks that are not using generate.tasks to use generate.tasks.",3 +"SERVER-38818","01/03/2019 14:29:08","Better handle dependencies between generated tasks","With generated tasks, dependencies between tasks gets complicated. The dependencies should exist on the generator tasks and when generating a task should query the evergreen api to determine which dependent tasks were generated and add all of them as dependencies on the tasks being created.",3 +"SERVER-38822","01/03/2019 16:09:46","Linux Repeated Execution variant does not repeat the tests","The change for SERVER-36613 introduced a second definition of the {{test_flags}} expansions for {{linux-64-repeated-execution}} buildvariant: {code} expansions: compile_flags: -j$(grep -c ^processor /proc/cpuinfo) --variables-files=etc/scons/mongodbtoolchain_gcc.vars --enable-free-mon=off --enable-http-client=off test_flags: --repeatSuites=10 --shuffle scons_cache_scope: shared test_flags: --excludeWithAnyTags=requires_http_client tooltags: """" build_mongoreplay: true {code} The second definition overrides the first.",1 +"SERVER-38886","01/08/2019 16:27:46","refactor RecordStore::validate implementations","All storage engines currently have one that's almost identical. The storage-engine independent iteration should be factored out, and only the storage-engine specific part should be left. There also is an implementation in {{RecordStoreValidateAdaptor::traverseRecordStore}}.",1 +"SERVER-38927","01/10/2019 15:53:05","Cache collection 'temp' status on Collection object","We cache things like validators and 'capped' status on the Collection object so we do not need to go all the way to storage and open up a storage transaction every time we want to check the value. As far as I know, 'temp' cannot change with 'collMod', but it can with 'renameCollection', and on rollback. 
This is needed for SERVER-38139 to ban temporary collections in transactions without having to consult the storage engine an extra time for every transaction statement.",3 +"SERVER-38931","01/10/2019 18:13:13","Apply relevant changes to snapshot_read_kill_operations.js to 4.0 branch","SERVER-37009 fixed an error in the [{{snapshot_read_kill_operations.js}}|https://github.com/mongodb/mongo/blob/0ffb6bc78dc1219692b294215c97d48a7e9f1fdd/jstests/concurrency/fsm_workloads/snapshot_read_kill_operations.js] test that caused the {{killSessions}} portion of the test to silently fail to kill the session if the corresponding session document did not exist. This fix uncovered other issues with the test that have been fixed in master but not backported to 4.0. All relevant fixes to this test, including the fix in SERVER-37009, should be applied to the 4.0 version of this test. One possible way to do this would be to replace the 4.0 version of the test with the corresponding version in master, being mindful of any changes made to the test that are specific to master.",3 +"SERVER-39004","01/15/2019 08:03:33","Introduce a quota mechanism for the overflow file","We don't currently have a quota mechanism to prevent {{WiredTigerLAS.wt}} from growing and eventually running out of disk space. It would help to have such a configuration in place so that once a file reaches a configured size, we reboot mongod process, which will effectively clean up the {{WiredTigerLAS.wt}} file",5 +"SERVER-39007","01/15/2019 16:48:28","Switch to use rhel62-large distro for concurrency* tasks on Enterprise RHEL 6.2 (InMemory) builder","We're seeing OOM failures with the InMemory storage engine [occur consistently|https://evergreen.mongodb.com/task/mongodb_mongo_master_enterprise_rhel_62_64_bit_inmem_concurrency_replication_2f67f3c66271e724b48afa2db88e8b6c3317f6ab_19_01_11_18_02_54] after the changes from [2f67f3c|https://github.com/mongodb/mongo/commit/2f67f3c66271e724b48afa2db88e8b6c3317f6ab] as part of SERVER-33161. Changing to the {{rhel62\-large}} distro for the {{concurrency*}} tasks on the Enterprise RHEL 6.2 (InMemory) builder is a stopgap for getting the build back to being green until whether the increased memory consumption can be declared as ""expected"".",1 +"SERVER-39064","01/17/2019 13:54:24","Storage interface changes for specifying durable_timestamp","The storage interface must allow specifying a {{durable_timestamp}} when committing a prepared transaction.",3 +"SERVER-39068","01/17/2019 17:51:40","Replication of simultaneous index builds startIndexBuild and commitIndexBuild oplog entries","A temporary command, twoPhaseCreateIndexes, already exists. SERVER-39066 sets up the OpObserver and oplog.cpp. Wait for SERVER-37643 to set up builders in the IndexBuildsCoordinator/Manager. Then, set up a code path into the Coordinator/Manager that will do a two phase index build, and have the {{twoPhaseCreateIndexes}} command call it. The [{{twoPhaseIndexBuild}}|https://github.com/mongodb/mongo/blob/e990d25622d96897d78e72b362db61f2a4f9d99c/src/mongo/db/repl_index_build_state.h#L88] flag in the {{ReplIndexBuildState}} object should be set. 
A startIndexBuild oplog entry should optionally (based on the Coordinator's twoPhaseIndexBuild setting) be written in the same WUOW as the index catalog entry initialization write: this should parallel the oplog write on commit seen [here|https://github.com/mongodb/mongo/blob/e990d25622d96897d78e72b362db61f2a4f9d99c/src/mongo/db/catalog/multi_index_block.cpp#L664-L666] and [here|https://github.com/mongodb/mongo/blob/e990d25622d96897d78e72b362db61f2a4f9d99c/src/mongo/db/commands/create_indexes.cpp#L402-L406]. The {{startIndexBuild}} oplog entry should start an index build, which I think is [already hooked up|https://github.com/mongodb/mongo/blob/e990d25622d96897d78e72b362db61f2a4f9d99c/src/mongo/db/repl/oplog.cpp#L280-L292], just inactive and not tested. The commitIndexBuild oplog entry should optionally be swapped out with the createIndexes oplog entry currently written on index commit, based on the {{twoPhaseCreateIndexes}} setting. Secondaries don't do anything on receipt of commitIndexBuild, and we will leave that to implement in a separate patch.",8 +"SERVER-39085","01/18/2019 17:54:24","move secondary oplog application logic for index creation into IndexBuildsCoordinator","During secondary oplog application, we should delegate index creation to the IndexBuildsCoordinator.",8 +"SERVER-39086","01/18/2019 17:55:51","Move startup recovery index creation logic into IndexBuildsCoordinator","Index creation during startup recovery should be delegated to the IndexBuildsCoordinator",8 +"SERVER-39087","01/18/2019 17:57:00","move initial sync index creation logic into IndexBuildsCoordinator","Index creation during initial sync should be delegated to the IndexBuildsCoordinator",13 +"SERVER-39094","01/18/2019 19:45:59","Update jasper_process.py in resmoke to reflect Jasper RPC changes","The current implementation for [{{jasper_process.py}}|https://github.com/mongodb/mongo/blob/master/buildscripts/resmokelib/core/jasper_process.py] has [some lines of code|https://github.com/mongodb/mongo/blob/master/buildscripts/resmokelib/core/jasper_process.py#L63-L67] for handling RPC errors from Jasper when it does things like signal a process that has already terminated somehow. This is not reflective of how Jasper actually handles such errors after the completion of MAKE-525, and therefore should be changed to check for these conditions in the [{{val}} case|https://github.com/mongodb/mongo/blob/master/buildscripts/resmokelib/core/jasper_process.py#L59-L62]. Not doing this causes a test failure in {{replica_sets_kill_secondaries_jscore_passthrough}}, in which resmoke exhibits inconsistent behavior where it sends {{SIGKILL}}/{{SIGTERM}} to processes, and then expects the same processes to be alive at a later time. The full description of this problem can be found at [this comment|https://jira.mongodb.org/browse/MAKE-523?focusedCommentId=2117281&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-2117281].",1 +"SERVER-39106","01/18/2019 23:08:01","GlobalLock acquisition should throw when ticket acquisition times out if there is a max lock timeout and no deadline","If global lock acquisition times out acquiring a ticket, then the constructor will not throw, but the resource will be unlocked. 
This leads to invariant failure when we attempt to acquire the global lock [here|https://github.com/mongodb/mongo/blob/89c3502129303b41b8d35bf5d64eb0a242f061da/src/mongo/db/transaction_participant.cpp#L788] then call {{canAcceptWritesForDatabase()}} [here|https://github.com/mongodb/mongo/blob/89c3502129303b41b8d35bf5d64eb0a242f061da/src/mongo/db/transaction_participant.cpp#L800], which invariants that the lock is held [here|https://github.com/mongodb/mongo/blob/89c3502129303b41b8d35bf5d64eb0a242f061da/src/mongo/db/repl/replication_coordinator_impl.cpp#L1947]. If the caller did not provide a deadline, then they are not checking for lock acquisition failure, so the lock acquisition should throw. Consider applying the following patch: {noformat}diff --git a/src/mongo/db/concurrency/lock_state.cpp b/src/mongo/db/concurrency/lock_state.cpp index 11a4028..e7040b8 100644 --- a/src/mongo/db/concurrency/lock_state.cpp +++ b/src/mongo/db/concurrency/lock_state.cpp @@ -353,6 +353,12 @@ LockResult LockerImpl::_lockGlobalBegin(OperationContext* opCtx, LockMode mode, dassert(isLocked() == (_modeForTicket != MODE_NONE)); if (_modeForTicket == MODE_NONE) { auto acquireTicketResult = _acquireTicket(opCtx, mode, deadline); + uassert(ErrorCodes::LockTimeout, + str::stream() << ""Unable to acquire ticket with mode '"" << mode + << ""' within a max lock request timeout of '"" + << _maxLockTimeout.get() + << ""' milliseconds."", + acquireTicketResult == LOCK_OK || !_maxLockTimeout); if (acquireTicketResult != LOCK_OK) { return acquireTicketResult; } {noformat}",3 +"SERVER-39127","01/22/2019 18:41:40","Use generate.tasks for already converted tasks on all variants","Several tasks already use generate.tasks on required builders. These should be switched to use generate.tasks on all builders and the non-generate.tasks tasks removed.",2 +"SERVER-39224","01/28/2019 16:22:08","Explore why queryoptimizer3.js fails using the IndexBuildsCoordinator, then fix it","""So that test has one thread repeatedly recreate a collection with indexes, bulk insert a bunch of docs and do a single index table scan. And a second thread drop the collection repeatedly, to check that index table scan cursors get aborted on dropCollection.""",5 +"SERVER-39225","01/28/2019 16:26:17","Update kill_rooted_or.js / IndexBuildsCoordinator to gracefully handle index build already in-progress errors","""The Coordinator registers each index spec by name that comes in, before the indexes are set up in the index catalog as in-progress. Then subsequent requests with the same index names hit an error in the Coordinator. Without the Coordinator, spec requests are normally automatically filtered out if they're already found to be built or building via the index catalog. The concurrency test has 10 threads, potentially all running the same createIndexes requests at the same time, so ""There's already an index with name..."" errors make sense -- previously they'd get filtered out and the redundant commands return OK. Perhaps a new error code IndexBuildAlreadyInProgressForName to fix tests, and we can consider the final behavior changes later, and easily identify the changes by error code if we want to undo them. 
This is related to our problem regarding spec checking being scattered all over the place at different levels of the code.""",3 +"SERVER-39239","01/28/2019 22:25:50","Two-phase index builds on secondaries will wait for the commitIndexBuild oplog entry before committing.","The secondary's index build thread should spin in a loop before finishing up and committing the index. The secondary will do nothing while spinning in a loop. It should drop all locks. SERVER-39458 will add functionality to the loop, of periodically reacquiring the lock and running side table draining. On receipt of the commitIndexBuild oplog entry, the secondary should signal the index build to commit with a timestamp passed in the oplog entry. The oplog applier thread will need to drop locks to allow the index build to take the X lock for index commit -- I believe this should be safe, since 'c' oplog entries are applied serially, and so dropping the lock is OK. IndexBuildsCoordinator::commitIndexBuild will need to fetch a Future from the index build to return to the oplog applier thread to wait upon for commit completion. SERVER-39533 will be done after this ticket and will hook up the abortIndexBuild oplog entry so that alternatively the index build on the secondary can be aborted while it's spinning in that loop waiting for commit. In case that's a design consideration. Do not try to act on abort/commit signals earlier in the index build on secondaries than that spinning loop that waits for commit/abort. This is the simple first iteration implementation, we'll make it fancier in a subsequent ticket.",5 +"SERVER-39279","01/29/2019 23:14:55","Race between ServiceContext::ClientDeleter and waitForClientsToFinish()","{{ServiceContext::ClientDeleter::operator()(Client* client)}} accesses the serviceContext outside the serviceContext's mutex to call the registered observers on {{onDestroy()}}, while the serviceContext may have already been destroyed by [service_context_test_fixture|https://github.com/mongodb/mongo/blob/e12dcc7fdbdb44fb7806dfb42a49bd740f361d82/src/mongo/db/service_context_test_fixture.cpp#L52], since the client vector is empty, checked by [waitForClientsToFinish|https://github.com/mongodb/mongo/blob/cdf319123d8e5d3cd169e2a11aec6aea0b951bf1/src/mongo/db/service_context.cpp#L344]. {code:c++} void ServiceContext::ClientDeleter::operator()(Client* client) const { ServiceContext* const service = client->getServiceContext(); { stdx::lock_guard lk(service->_mutex); invariant(service->_clients.erase(client)); if (service->_clients.empty()) { service->_clientsEmptyCondVar.notify_all(); } } // The serviceContext may have already been destroyed. onDestroy(client, service->_clientObservers); delete client; } {code}",2 +"SERVER-39304","01/31/2019 15:08:42","Add new required variant linux-64-required-duroff to evergreen.yml","In order to ensure that new tests are correctly tagged, there are currently 3 build variants which execute burn_in_tests in a distinct configuration, using the expansion macro ${burn_in_tests_build_variant}: * enterprise-rhel-62-64-bit-required-majority-read-concern-off * enterprise-rhel-62-64-bit-required-inmem * rhel-62-64-bit-required-mobile To catch tests requiring the requires_journaling tag, we can add another build variant, {{linux-64-required-duroff}}, in evergreen.yml, similar to enterprise-rhel-62-64-bit-required-majority-read-concern-off. 
",1 +"SERVER-39305","01/31/2019 15:14:03","Update resmoke to support new repeatTests options","Add the following new options to control test repetition: * --repeatTestsTimeSecs * --repeatTestsMax These options help repeat a test until the {{\-\-repeatTestsTimeSecs}} is reached. The default value, None, indicates no time limit specified and resmoke uses the {{\-\-repeatTests}} value. An additional parameter, {{\-\-repeatTestsMax}}, which would be used in conjunction with {{\-\-repeatTests}} (minimum number of repetitions) to bound the {{\-\-repeatTestsTimeSecs}} between these values. The test runs as follows: * At least {{\-\-repeatTests}} times * Stops repeating when either the {{\-\-repeatTestsTimeSecs}} or {{\-\-repeatTestsMax}} is reached See SERVER-38911 for a proof of concept of this option.",2 +"SERVER-39307","01/31/2019 15:21:59","Update burn_in_tests.py to support new resmoke repeatTests* options","Modify burn_in_tests.py to support the following new options * --repeatTestMin (default 2): The minimum number of times a test will run * --repeatTestMax (default 1000): The maximum number of times a test will run * --repeatTestTimeSecs (default 600): The time used to compute the number of repetitions a test will have, when there is a test or task history available Note - The last 2 repeat option values will be passed to resmoke in a different ticket (SERVER-39311), which will enable burn_in testing to increase the repetition count.",2 +"SERVER-39308","01/31/2019 15:24:45","Update burn_in_tests.py to generate sub-tasks JSON file","Modify burn_in_tests.py to support the following new option * --generateTasksFile (default False): Generate the JSON tasks file which is used in generate.tasks ",3 +"SERVER-39311","01/31/2019 15:57:00","Enable burn_in repetition count","Update evergreen.yml to use burn_in_tests_gen task and update burn_in_test.py to pass the new repeatTests options to resmoke",2 +"SERVER-39313","01/31/2019 16:01:02","Create burn_in_tests metric tracking script","A script to track the effectiveness of burn_in_tests will be manually run. This script should be called buildscripts/metrics/burn_in_tests.py. The script will invoke the task history of patch and mainline builds over a user specific period (default 4 weeks). It will provide the following, for builds which ran burn_in_tests: * Number of patch builds * Number of failing tasks * Number of failing burn_in_tests tasks * Number of patch builds where only burn_in_tests failed * Number of tasks generated * Number of tests executed * Number of times task exceeded the expected run time +AWS costs+ Computing the AWS costs for associated to each burn_in task (and the sub-tasks it spawned) will be useful in understanding what additional cost there is to run burn_in more than twice. These costs can be computed using task history time for all generated tasks and the main burn_in task weighted with the type of distro the tasks ran on. Contrasting this result with prior burn_in tasks runs will provide some metric for the increased cost. The aws costs for burn_in can be computed with the following splunk query: {{index=evergreen stat = task-end-stats task = burn_in_tests* project = mongodb-mongo-master | timechart span=1h avg(cost)}}",3 +"SERVER-39355","02/01/2019 22:00:58","Collection drops can block the server for long periods","Hi, sorry but we've just had another occurrence today (still running 3.4.13) so there's still an issue here. 
We've modified our collection-drop code to sleep 10 seconds between each deletion (to give mongo some time to recover after the ""short"" global lock and not kill the platform) but unfortunately this wasn't enough and it killed overall performance: !https://jira.mongodb.org/secure/attachment/203795/screenshot-4.png! After investigation I found that this was caused by some collection deletions. I tried to upload the diagnostic.data but the portal specified earlier doesn't accept files any more. I can upload it if you give me another portal. Here is the log from the drop queries: [^mongo_drop_log.txt], we can see here that they are spaced by 10sec (+drop duration) and that the drops take A LOT of time (all these collections were empty or had 5 records at most). They had some indexes though, which are not shown here but probably had to be destroyed at the same time. I don't know if it's a checkpoint global lock issue again but it's definitely still not possible to drop collections in a big 3.4.13 mongo without killing it. For the record, we have ~40k namespaces; this has not changed much since the db.stats I reported above. And before you say this is probably fixed in a more recent version, we'll need better proof than last time considering the high risk of upgrading...",1 +"SERVER-39377","02/05/2019 16:14:28","Make efficient hot backup work with enableMajorityReadConcern=false","Currently, hot backup does not work on a server with enableMajorityReadConcern=false. This ticket is to modify the mechanism and enable testing for those variants.",8 +"SERVER-39413","02/07/2019 16:45:17","Write script to analyze evergreen task tag usage","A script to analyze task tag usage in evergreen.yml would be useful in watching for misuse of the tags. The script should provide the following information: * A list of tags being used by tasks. * Given a tag, provide a list of tasks marked with that tag.",2 +"SERVER-39414","02/07/2019 16:46:15","Use task tags in evergreen.yml","Switch evergreen.yml to use task tags for specifying which tasks to run on which variant. This will involve some investigation to determine a good way of selecting which tags to use and then switching etc/evergreen.yml to use those tags. ---- As a MongoDB engineer, I want to be able to specify the build variants a task runs on via tags, so that I don't have to manually add a task to all build variants it should run on. ---- AC: The tasks run on each build variant should not change.",5 +"SERVER-39419","02/07/2019 18:40:13","Stepdown interrupting a running dropDatabase command could leave unclean state","As the comment in the linked BF describes, a stepdown could interrupt a running dropDatabase and leave dropPending set to true on that database.",3 +"SERVER-39428","02/07/2019 22:27:53","Record all indexing errors during simultaneous index builds for later constraint checking","Hybrid index builds only record duplicate key conflicts in a side table for later resolution. With simultaneous index builds, both primary and secondary need to record conflicts in case the secondary becomes primary, so it becomes responsible for constraint checking. Today, secondaries also ignore other types of indexing errors to maintain idempotency, and they can guarantee errors will be resolved because the primary cannot send the ""createIndexes"" oplog entry unless they are. 
With simultaneous indexes, secondaries *cannot* ignore indexing errors and must also record conflicts in a side table because if a secondary becomes primary, it needs to guarantee that all indexing errors are resolved.",8 +"SERVER-39451","02/08/2019 14:20:32","Add recover to a stable timestamp logic for startIndexBuild, abortIndexBuild, commitIndexBuild","Before entering rollback, [we abort all active index builds|https://github.com/mongodb/mongo/blob/6abbac58cc5b5f4b66b50ada20e70fdf96301571/src/mongo/db/repl/bgsync.cpp#L635]. After rolling back, we should add logic to [reconcileCatalogAndIdents()|https://github.com/mongodb/mongo/blob/c046a5896652acea84c9db1d9346a43b2745a37b/src/mongo/db/storage/storage_engine_impl.cpp#L324] to restart all unfinished two-phase builds.",8 +"SERVER-39452","02/08/2019 14:20:57","Add rollback via refetch logic for startIndexBuild, abortIndexBuild, commitIndexBuild","Read the relevant design document sections for details and edge-case handling.",5 +"SERVER-39455","02/08/2019 14:55:54","lint the evergreen.yml file","It can be really easy to introduce subtle bugs in the evergreen.yml file (see SERVER-38822 for an example). Using a YAML linter could catch some of these issues. We should create an Evergreen task to run one as part of a required builder. This one https://github.com/adrienverge/yamllint seems to work and can be installed as a pip module. We would want to run it in relaxed mode. {code} yamllint -d relaxed etc/evergreen.yml {code}",2 +"SERVER-39458","02/08/2019 16:09:36","Add continuous draining on secondary's index build thread while it awaits a commitIndexBuild oplog entry","During secondary oplog application, the IndexBuildsCoordinator should periodically drain the side tables while awaiting the commitIndexBuild or abortIndexBuild oplog entries from the primary.",8 +"SERVER-39476","02/08/2019 17:33:48","Increase macOS Min Target to 10.12 For MongoDB 4.2","Our current macOS policy is that we will support \{$latest} - \{$latest-2} with each new MongoDB release. So with the upcoming MongoDB 4.2 release, macOS 10.14/Mojave is the latest and 10.12/Sierra is the minimum we will support.",1 +"SERVER-39504","02/11/2019 20:38:15","Have Database use the UUIDCatalog for name lookup","This requires that the UUIDCatalog maintains an extra map.",5 +"SERVER-39505","02/11/2019 20:39:29","Make ViewCatalog a decoration","The ViewCatalog should also own its own DurableViewCatalogImpl.",3 +"SERVER-39507","02/11/2019 20:41:27","Remove Database::CollectionMap and use UUIDCatalog instead","Collection objects must now be owned by the UUIDCatalog.",5 +"SERVER-39509","02/11/2019 20:47:09","UUIDCatalog should maintain an ordered map of dbname/UUID pairs","This replaces the current {{_orderedCollections}} map that scales poorly. It should also improve iterating over a database in UUID order.",3 +"SERVER-39512","02/11/2019 20:49:54","Make the Database class thread-safe","After the earlier tickets in the epic, it should now be possible to make the Database class mostly immutable and make all methods thread-safe. 
This is required to allow for adding, removing and renaming collections without exclusive database locks.",3 +"SERVER-39514","02/11/2019 20:50:36","Remove the KVDatabaseCatalogEntryBase::CollectionMap class","Make the UUIDCatalog own the CollectionCatalogEntry objects instead.",8 +"SERVER-39515","02/11/2019 20:52:04","Remove the KVDatabaseCatalogEntry, KVDatabaseCatalogEntryBase and DatabaseCatalogEntry classes","Change the KVStoreEngine::DBMap to be just a set of database names. Move the code from the removed classes into KVCatalog.",5 +"SERVER-39516","02/11/2019 20:54:51","Use database MODE_IX lock for creating collections","Remove the database {{MODE_X}} lock for collection creation. Add tests that verify that collection creation no longer blocks on open transactions accessing different collections in the same database.",8 +"SERVER-39517","02/11/2019 20:55:49","Only use Collection MODE_X for index creation and drop","Add tests that creating and deleting indexes on one collection doesn't block on open transactions on a different collection in the same database.",5 +"SERVER-39518","02/11/2019 20:58:06","Only use collection MODE_X locks for collection rename","Acquire the collection locks in canonical ResourceId order to avoid deadlock. Check that renaming a collection will not block on open transactions on different collections in the same database.",8 +"SERVER-39519","02/11/2019 20:59:21","Only use Collection MODE_X locks for view creation/drop","Test that creating/dropping views does not block on open transactions on collections in the same database.",8 +"SERVER-39520","02/11/2019 21:00:40","Only use collection MODE_X locks for collection drops","Ensure that no places rely on database intent locks to ensure a Collection pointer remains valid. Check that collection drop doesn't block on open transactions involving other collections in the same database.",13 +"SERVER-39565","02/13/2019 05:20:30","Add 'requires_document_locking' tag to read_at_cluster_time_outside_transactions.js test","The {{jstests/replsets/read_at_cluster_time_outside_transactions.js}} test currently fails against the ephemeralForTest storage engine but isn't meant to work against it anyway.",0 +"SERVER-39578","02/14/2019 04:03:37","""check binary version"" function in etc/evergreen.yml depends on PyYAML but doesn't install it","[The {{""check binary version""}} function depends on PyYAML being available|https://github.com/mongodb/mongo/blob/7951290075a7f8ecadebf789503ec05a7b10da3c/etc/evergreen.yml#L450-L470]. The Evergreen command is run [as part of the ""do setup"" function|https://github.com/mongodb/mongo/blob/7951290075a7f8ecadebf789503ec05a7b10da3c/etc/evergreen.yml#L1099] prior to [the ""run tests"" function running the command for the {{""install pip requirements""}} function|https://github.com/mongodb/mongo/blob/7951290075a7f8ecadebf789503ec05a7b10da3c/etc/evergreen.yml#L1392]. This leads to an {{ImportError}}. 
{noformat} [2019/02/14 01:22:06.552] Running command 'shell.exec' in ""do setup"" (step 1.5 of 2) [2019/02/14 01:22:06.572] Traceback (most recent call last): [2019/02/14 01:22:06.572] File """", line 1, in [2019/02/14 01:22:06.572] ImportError: No module named yaml [2019/02/14 01:22:06.762] python set to /data/mci/933429efe4de9f56d05d29dfd4c70afa/venv/bin/python [2019/02/14 01:22:06.762] The mongo version is 4.1.8-28-g7951290075, expected version is [2019/02/14 01:22:06.762] Command failed: command [pid=1971] encountered problem: exit status 1 [2019/02/14 01:22:06.763] Task completed - FAILURE. {noformat} https://evergreen.mongodb.com/task/mongodb_mongo_master_enterprise_rhel_62_64_bit_concurrency_replication_causal_consistency_7951290075a7f8ecadebf789503ec05a7b10da3c_19_02_14_00_07_46",1 +"SERVER-39579","02/14/2019 06:07:30","""compile mongodb"" function in sys-perf doesn't install Python dependencies","{noformat} [2019/02/14 05:05:01.706] # This script converts the generated version string into a sanitized version string for [2019/02/14 05:05:01.706] # use by scons and uploading artifacts as well as information about for the scons cache. [2019/02/14 05:05:01.706] MONGO_VERSION=$MONGO_VERSION USE_SCONS_CACHE=true /opt/mongodbtoolchain/v3/bin/python2 buildscripts/generate_compile_expansions.py --out compile_expansions.yml [2019/02/14 05:05:06.295] Traceback (most recent call last): [2019/02/14 05:05:06.295] File ""buildscripts/generate_compile_expansions.py"", line 16, in [2019/02/14 05:05:06.298] import yaml ImportError: No module named yaml [2019/02/14 05:05:06.298] Command failed: command [pid=2443] encountered problem: exit status 1 {noformat} https://evergreen.mongodb.com/task/sys_perf_compile_linux_64_amzn_compile_83336cb56b269195110253918d226cbba4377a03_19_02_14_04_10_32",1 +"SERVER-39584","02/14/2019 16:36:38","compile task in performance Evergreen project doesn't install Python dependencies","{noformat} [2019/02/14 05:49:10.131] Running command 'shell.exec' (step 3 of 7) [2019/02/14 05:49:10.137] # We get the raw version string (r1.2.3-45-gabcdef) from git [2019/02/14 05:49:10.137] MONGO_VERSION=$(git describe) [2019/02/14 05:49:10.137] git describe) [2019/02/14 05:49:10.149] git describe [2019/02/14 05:49:10.149] # If this is a patch build, we add the patch version id to the version string so we know [2019/02/14 05:49:10.149] # this build was a patch, and which evergreen task it came from [2019/02/14 05:49:10.149] if [ """" = ""true"" ]; then [2019/02/14 05:49:10.149] MONGO_VERSION=""$MONGO_VERSION-patch-performance_6089c4c1d8f166b6b61cec980672779b7cedc303"" [2019/02/14 05:49:10.149] fi [2019/02/14 05:49:10.149] # This script converts the generated version string into a sanitized version string for [2019/02/14 05:49:10.149] # use by scons and uploading artifacts as well as information about for the scons cache. [2019/02/14 05:49:10.149] MONGO_VERSION=$MONGO_VERSION USE_SCONS_CACHE=true /opt/mongodbtoolchain/v3/bin/python2 buildscripts/generate_compile_expansions.py --out compile_expansions.yml [2019/02/14 05:49:10.171] Traceback (most recent call last): [2019/02/14 05:49:10.171] File ""buildscripts/generate_compile_expansions.py"", line 16, in [2019/02/14 05:49:10.173] import yaml ImportError: No module named yaml [2019/02/14 05:49:10.173] Command failed: command [pid=3003] encountered problem: exit status 1 [2019/02/14 05:49:10.175] Task completed - FAILURE. 
{noformat} https://evergreen.mongodb.com/task/performance_linux_wt_standalone_compile_6089c4c1d8f166b6b61cec980672779b7cedc303_19_02_14_04_49_50",1 +"SERVER-39654","02/19/2019 04:31:27","Storage statistics not logged for a slow transaction","Slow operations extract the operation statistics from the storage engine and report them as part of logging and profiling. For a slow transaction, these storage statistics are meant to be cumulative over the operations performed in that transaction. At the moment, we do not get any storage information for the slow transaction. It looks like we are not collecting the storage stats at the correct place, we might be collecting them too late, past the point where other metrics for a transaction get accumulated. ",8 +"SERVER-39655","02/19/2019 05:06:59","Statistics retrieval from WiredTiger uses wrong type","The {{WiredTigerOperationStats::fetchStats}} method uses a {{uint32_t}} when retrieving statistics from WiredTiger, but the actual statistics are signed values, we should switch to an {{int32_t}}. This is follow on from SERVER-39026, and the [code change|https://github.com/mongodb/mongo/commit/6a9a5855048df1f4796a4032276d01318c398691] should be similar.",1 +"SERVER-39705","02/21/2019 02:44:36","IndexBuildInterceptor does not faithfully preserve multikey when a document generates no keys","IndexBuildInterceptor makes an incorrect assumption that a document [must generate keys|https://github.com/mongodb/mongo/blob/04882fa7f5210cfb14918ecddbbc5acbd88e86b6/src/mongo/db/index/index_build_interceptor.cpp#L384-L386] to be [considered multikey|https://github.com/mongodb/mongo/blob/04882fa7f5210cfb14918ecddbbc5acbd88e86b6/src/mongo/db/index/index_build_interceptor.cpp#L389-L397]. In particular, [sparse compound indexes|https://docs.mongodb.com/manual/core/index-sparse/#sparse-compound-indexes] may not generate keys, but will consider a document to be multikey\[1\]. MongoDB's validation code is strict and will compare an [index's multikey to the multikey output of every document|https://github.com/mongodb/mongo/blob/04882fa7f5210cfb14918ecddbbc5acbd88e86b6/src/mongo/db/catalog/private/record_store_validate_adaptor.cpp#L110-L117]. \[1\] Consider the index {{\{a: 1, b: ""2dsphere""\}}} ({{2dsphere}} makes an index ""auto-sparse""). Consider the document {{\{_id: 1, a: [1,2]\}}}. Because {{b}} is omitted, the sparse-ness will result in no index keys being generated. However, because {{a}} is an array, that field of the compound index will be considered to be multikey.",8 +"SERVER-39723","02/21/2019 17:50:26","Change listIndexes command behavior to show in-progress index builds","The change in SERVER-25175 intended to fix a problem with initial sync as it interacts with its sync source to procure a list of indexes for each collection to build. If a background index build is in progress on the sync source and the sync source is a primary node, the background index build may or may not complete successfully. (Some examples of how it could fail are: 1. the node steps down 2. the background index build is interrupted by a killOp command 3. the background index build discovers some collection data that violates a constraint on the index being built, if the index spec has constraints.) The initial syncing node will proceed to attempt to build this same index, which might complete successfully even if the index build on the primary does not. This could leave the initial syncing node with an index that no other node has. 
The initial sync process uses the listIndexes command on the sync source to obtain lists of indexes per collection. The attempt to fix this problem changed the listIndexes command to not report in-progress index builds (this will only affect background index builds, as foreground index builds lock the collection for their entirety and thus can never be observed in-progress by the listIndexes command). The fix had the intended effect for initial sync sync sources that are in primary state, but it also unintentionally changed the behavior of initial sync sync sources that are in secondary state. For such sources, background index builds are destined to complete successfully (since they already have done so on a primary node), and therefore cannot fail. Such index builds also write their oplog entries prior to completion, and therefore the only indication that an initial syncing node has to build the index is the listIndexes command response. Hiding in-progress index builds from initial sync for sync sources that are in secondary state could result in missing indexes for the initial sync. This code change restores the old behavior where listIndexes shows in-progress index builds.",1 +"SERVER-39883","02/28/2019 07:05:04","Powercycle doesn't actually wait for the mongod process to exit during shutdown_mongod","The ""shutdown_mongod"" action runs the {{\{shutdown: 1, force: true\}}} command and then (on Linux) waits for {{psutil}} to say no processes with the name ""mongod"" exist. [The {{wait_for_mongod_shutdown()}} function then sleeps an arbitrary extra 5 seconds|https://github.com/mongodb/mongo/blob/8b4f0a7893a329b0c0370385180d6a13077a8f22/pytests/powertest.py#L1481-L1483] in order to wait for any pending I/O to finish. It possible for 5 seconds to not be long enough where a file will disappear when running {{rsync}} or the mongod process will fail to start.",3 +"SERVER-39899","03/01/2019 06:21:52","Enable the initial sync fuzzer in Evergreen","Create a new {{initial_sync_fuzzer.yml}} resmoke.py YAML suite file that causes mongod processes to use an effectively infinite number of initial sync attempts. {code:yaml|title=buildscripts/resmokeconfig/suites/initial_sync_fuzzer.yml} test_kind: js_test selector: roots: - jstestfuzz/out/*.js executor: archive: tests: true config: shell_options: nodb: '' readMode: commands global_vars: TestData: # TODO: logComponentVerbosity? setParameters: numInitialSyncAttempts: 10000000 {code} Define a new {{initial_sync_fuzzer_gen}} Evergreen task based on [the existing {{rollback_fuzzer_gen}} Evergreen task|https://github.com/mongodb/mongo/blob/e33301994172d80c0f4e62bd3b01fa41f35561ec/etc/evergreen.yml#L4922-L4933]. {code:yaml} ## initial sync fuzzer ## - <<: *jstestfuzz_template name: initial_sync_fuzzer_gen commands: - func: ""generate fuzzer tasks"" vars: <<: *jstestfuzz_config_vars # TODO: The number of files should be based on how the tests themselves take to run. We # should target a time for each generated task of ~10 minutes. num_files: ?? num_tasks: 5 npm_command: initsync-fuzzer resmoke_args: --suites=initial_sync_fuzzer name: initial_sync_fuzzer {code} Configure the new {{initial_sync_fuzzer_gen}} Evergreen task to run on all of the build variants the existing {{rollback_fuzzer_gen}} Evergreen task runs on with the exception of the ""Enterprise RHEL 6.2 (inMemory)"" and ""Linux (ephemeralForTest)"" build variants. 
Since the initial version of the initial sync fuzzer is meant to only be targeting the interaction between initial sync and prepared transactions, we can only run it against the WiredTiger storage engine. - Enterprise RHEL 6.2 - Enterprise RHEL 6.2 (majority read concern off) - Windows 2008R2 DEBUG - macOS - Enterprise RHEL 6.2 DEBUG Code Coverage - ASAN Enterprise SSL Ubuntu 16.04 DEBUG - UBSAN Enterprise Ubuntu 16.04 DEBUG",3 +"SERVER-39929","03/01/2019 21:08:02","Drivers-nightly latest fails to compile on Windows","{code} [2019/02/28 07:44:21.460] python ./buildscripts/scons.py --ssl MONGO_DISTMOD=windows-64 --release CPPPATH=""c:/openssl/include c:/sasl/include c:/snmp/include c:/curl/include"" LIBPATH=""c:/openssl/lib c:/sasl/lib c:/snmp/lib c:/curl/lib"" -j$(( $(grep -c ^processor /proc/cpuinfo) / 4 )) --dynamic-windows --win-version-min=ws08r2 VARIANT_DIR=win32 --cache=nolinked --cache-dir='z:\data\scons-cache\36b32a55-bc89-55f9-a1e2-24b4148e7f52' core unittests MONGO_VERSION=4.1.8-227-g6403ca518b [2019/02/28 07:44:23.740] scons: Reading SConscript files ... [2019/02/28 07:44:23.740] Mkdir(""build\scons"") [2019/02/28 07:44:23.740] scons version: 3.0.4 [2019/02/28 07:44:29.800] python version: 2 7 15 'final' 0 [2019/02/28 07:44:29.800] Checking whether the C compiler works... yes [2019/02/28 07:44:31.591] Checking whether the C++ compiler works... yes [2019/02/28 07:44:44.505] Checking that the C++ compiler can link a C++ program... yes [2019/02/28 07:44:44.566] Checking if C++ compiler ""$CC"" is MSVC... yes [2019/02/28 07:44:44.657] Checking if C compiler ""cl"" is MSVC... yes [2019/02/28 07:44:44.657] Detected a x86_64 processor [2019/02/28 07:44:44.735] Checking if target OS windows is supported by the toolchain... yes [2019/02/28 07:44:44.738] adding module: enterprise [2019/02/28 07:44:44.834] Checking if C compiler is Microsoft Visual Studio 2017 15.9 or newer...no [2019/02/28 07:44:45.109] Checking if C++ compiler is Microsoft Visual Studio 2017 15.9 or newer...no [2019/02/28 07:44:45.109] ERROR: Refusing to build with compiler that does not meet requirements [2019/02/28 07:44:45.109] See C:\data\mci\dbe494bf13f1f7c042e6932354168dde\src\build\scons\config.log for details {code} See https://evergreen.mongodb.com/version/drivers_nightly_6403ca518b832a49d66352620a23606348595fac. ",3 +"SERVER-39934","03/01/2019 22:08:08","CurOp::completeAndLogOperation should not hang waiting for global lock","When logging a command (either slow or when forced logging is enabled) CurOp::completeAndLogOperation attempts to take a global lock to obtain storage statistics. If something else has a Global X lock (or an enqueued Global X lock), this lock acquisition will stall behind that operation. This introduces an undesirable dependency on the global lock for otherwise lock-free operations such as $currentOp. 
We should give this acquisition a very short deadline and elide the storage stats when it is not available.",8 +"SERVER-39948","03/04/2019 20:56:18","Remove some simultaneous index build related fields from createIndexes cmd logging","{code} [ReplicaSetFixture:job0:primary] 2019-03-04T15:36:46.969+0000 I COMMAND [conn1796] command test8_fsmdb7.fsmcoll7 appName: ""tid:48"" command: createIndexes { buildUUID: UUID(""e3ece2b5-e401-4530-87f0-abb8448c0dee""), buildingPhaseComplete: true, runTwoPhaseIndexBuild: false, commitReadyMembers: [ ""localhost:20000"", ""localhost:20001"", ""localhost:20002"" ], createIndexes: ""fsmcoll7"", indexes: [ { key: { b: 1.0 }, name: ""b_1"" } ], lsid: { id: UUID(""3743b6a0-b87d-44f6-8580-14c705d209a0"") }, $clusterTime: { clusterTime: Timestamp(1551713805, 57), signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId: 0 } }, $db: ""test8_fsmdb7"" } numYields:3 reslen:239 locks:{ Global: { acquireCount: { r: 7, w: 11 } }, Database: { acquireCount: { r: 1, w: 6, W: 2 }, acquireWaitCount: { w: 2, W: 2 }, timeAcquiringMicros: { w: 46424, W: 60687 } }, Collection: { acquireCount: { r: 1, w: 5, R: 1 } }, Mutex: { acquireCount: { r: 4 } } } storage:{} protocol:op_msg 444ms {code} I think [this|https://github.com/mongodb/mongo/blob/56efcffbcba956aa24518c71d100ecffee965058/src/mongo/db/catalog/multi_index_block.cpp#L804-L824] is where the logging is coming from, but should double check.",2 +"SERVER-39952","03/04/2019 22:03:33","Switch master to use newer amazon linux 1 distro","After newer Amazon Linux 1 distro has been added to Evergreen, we need to switch master branch to use that instead of the old distro.",2 +"SERVER-39957","03/05/2019 00:41:47","Two phase drop by rename should delay the second phase until drop optime is both checkpointed and majority committed","Currently, the old two-phase drop (by rename) executes the second phase (drop of the WT table file) when majority commit point moves past drop optime. If majority commit point is ahead of checkpoint and a crash happens after the second phase drop, on restart, the server will find the metadata of the collection still in the catalog (because it loads last checkpoint) but the actual WT file gets dropped. On restart, the server can detect that this is from an unclean shutdown by examining the *mongod.lock* file. Then it can safely remove the metadata of those collections which do not have WT table files. However, instead of crashing after the second phase drop, opening up backup cursor would cause similar issue which is harder to solve: there is also an inconsistency between WT table files and the catalog. But since we don't copy *mongod.lock* during backup, then the server does not trigger the code which reconciles the catalog. Then it tries to open a WT file which does not exist and hit [this fassert|https://github.com/mongodb/mongo/blob/6f3c3df4fc0abda76fd97e970ced4a01f0c48007/src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp#L666]. 
To fix this problem, we should delay the second phase until drop optime is checkpointed.",3 +"SERVER-39982","03/06/2019 14:32:59","modify StorageTimestampTests::InitialSyncSetIndexMultikeyOnInsert to build indexes in background","In recent weeks, the test StorageTimestampTests::InitialSyncSetIndexMultikeyOnInsert has been failing intermittently with the following invariant message: {noformat} [repl-index-builder-1] Invariant failure !mySnapshot || *mySnapshot <= commitTimestamp commit timestamp Timestamp(1, 87) cannot be older than current read timestamp Timestamp(1, 92) src/mongo/db/catalog/index_timestamp_helper.cpp 88 {noformat} A workaround is proposed in SERVER-39981. However, we should investigate why this test case does not pass consistently with background/hybrid index builds and determine if there is an underlying issue in the index build machinery.",5 +"SERVER-39988","03/06/2019 19:12:41","Remove integration_tests from the compile phase and move execution to a new on-box ! phase","We currently build the integration tests as part of the {{compile}}, then pull them into the {{artifacts.tgz}} and then run them on remote machines. Instead, we should treat them like {{dbtest}} and the unit tests and compile and run them in their own on-box {{integration_tests!}} phase which has {{compile_integration_tests}} and {{run_integration_tests}} sub-phases. ",2 +"SERVER-40034","03/07/2019 23:49:44","Set setup_group_can_fail_task to true for compile-related task groups","The Evergreen team introduced the {{setup_group_can_fail_task}} option in EVG-5759 to make it possible for commands which fail in the {{setup_group}} list to cause the task to fail. This is desirable because it is otherwise possible for a transient network error to occur when cloning the enterprise module and for the {{compile}} task to successfully build the server without the enterprise module. {noformat} [2019/01/14 00:08:14.587] + cd src [2019/01/14 00:08:14.587] + git reset --hard 58d80a26224da882cbe30d301ed295c302515c9b [2019/01/14 00:08:14.587] + set -o errexit [2019/01/14 00:08:14.587] + git clone git@github.com:10gen/mongo-enterprise-modules.git src/mongo/db/modules/enterprise [2019/01/14 00:08:44.713] Cloning into 'src/mongo/db/modules/enterprise'... [2019/01/14 00:08:44.713] ssh: Could not resolve hostname github.com: nodename nor servname provided, or not known [2019/01/14 00:08:44.714] fatal: Could not read from remote repository. [2019/01/14 00:08:44.714] Please make sure you have the correct access rights [2019/01/14 00:08:44.719] and the repository exists. [2019/01/14 00:08:44.719] HEAD is now at 58d80a2 SERVER-37775 Add Community RHEL7 s390x [2019/01/14 00:08:44.719] Command failed: problem with git command: exit status 128 [2019/01/14 00:08:44.719] Running command 'shell.exec' in ""get modified patch files"" (step 8 of 14) ... 
{noformat}",1 +"SERVER-40063","03/11/2019 04:43:23","jstestfuzz_sharded_continuous_stepdown.yml is running with a 1-node CSRS on the 3.6 branch","The changes from [2394d07|https://github.com/mongodb/mongo/commit/2394d07abe45037f44e0cdff7a56abb92e86f0a6] as part of backporting SERVER-30979 to the 3.6 branch didn't include the additional change made in [0aeb5ce|https://github.com/mongodb/mongo/commit/0aeb5ce7e8d4a190dac43fd110533eef149f7880#diff-15a75fc99f070098f4435d551de52f44] as part of SERVER-32468 to set {{num_nodes=3}} for the CSRS.",1 +"SERVER-40180","03/17/2019 00:24:59","resmoke.py should escape null bytes in the output of subprocesses","mongod is very happy to write null bytes to its stdout when they come from user input. Take setting {{logComponentVerbosity=\{write: 1\}}} and including a null byte in an update command as one example. mongod writing a null byte to its stdout causes resmoke.py to write a null bytes to its stdout. ([resmoke.py currently only attempts to deal with how the server doesn't necessarily write valid UTF-8 to its logs|https://github.com/mongodb/mongo/blob/r4.1.9/buildscripts/resmokelib/core/pipe.py#L55-L59], see SERVER-7506). This has an unfortunate consequence with tools like {{grep}} which treat output containing a null byte as binary rather than text. In order to make it so {{grep \-\-text}} doesn't need to be specified when engineers are filtering out log messages from the server that pass through resmoke.py, we should have resmoke.py escape {{b""\0""}} as {noformat}b""\\0""{noformat}.",1 +"SERVER-40241","03/20/2019 17:44:38","Have resmoke.py log an invocation for local usage","The changes from SERVER-28785 made it so resmoke.py writes its own command line arguments to stdout. This enables Server engineers to avoid doing mental bash evaluation to determine what command line arguments the ""run tests"" will synthesize and pass to resmoke.py. The command line arguments for [this aggregation task|https://evergreen.mongodb.com/task/mongodb_mongo_master_enterprise_rhel_62_64_bit_aggregation_6f083bd87264e9d9c3d637fae62103c36a65316a_19_03_11_19_56_34] include a large number of details and metadata that are specific to how we run tests in Evergreen. 
{noformat} [2019/03/11 20:41:06.323] [resmoke] 2019-03-11T20:41:06.322+0000 resmoke.py invocation: buildscripts/evergreen_run_tests.py --suites=aggregation --storageEngine=wiredTiger --jobs=4 --shuffle --continueOnFailure --storageEngineCacheSizeGB=1 --tagFile=etc/test_retrial.yml --log=buildlogger --staggerJobs=on --buildId=mongodb_mongo_master_enterprise_rhel_62_64_bit_6f083bd87264e9d9c3d637fae62103c36a65316a_19_03_11_19_56_34 --distroId=rhel62-small --executionNumber=0 --projectName=mongodb-mongo-master --gitRevision=6f083bd87264e9d9c3d637fae62103c36a65316a --revisionOrderId=24937 --taskId=mongodb_mongo_master_enterprise_rhel_62_64_bit_aggregation_6f083bd87264e9d9c3d637fae62103c36a65316a_19_03_11_19_56_34 --taskName=aggregation --variantName=enterprise-rhel-62-64-bit --versionId=mongodb_mongo_master_6f083bd87264e9d9c3d637fae62103c36a65316a --archiveFile=archive.json --reportFile=report.json --perfReportFile=perf.json {noformat} A more compact form for an engineer to run would look like: {noformat} buildscripts/resmoke.py --suites=aggregation --storageEngine=wiredTiger --jobs=4 --shuffle --continueOnFailure --storageEngineCacheSizeGB=1 {noformat} The changes from this ticket should add a new log message after [this line|https://github.com/mongodb/mongo/blob/r4.1.9/buildscripts/resmoke.py#L147] containing the simplified resmoke.py invocation. {code:python} self._resmoke_logger.info(""verbatim resmoke.py invocation: %s"", "" "".join(sys.argv)) if config.EVERGREEN_TASK_ID: args = ... self._resmoke_logger.info(""resmoke.py invocation for local usage: %s"", "" "".join(args)) {code} It must therefore do the following: * Always log the program name as \{{buildscripts/resmoke.py}} even though in Evergreen we run the wrapper script \{{buildscripts/evergreen_run_tests.py}}. * Always log the non-generated version of the test suite name. The \{{buildscripts/evergreen_generate_resmoke_tasks.py}} script generates new resmoke.py YAML suite files in order to be able to dynamically split the test suite into multiple Evergreen task which may run concurrently. Handling this behavior can be achieved by propagating \{{self.config_options.suite}} as a new \{{\-\-originSuite}} command line option to resmoke.py through [the \{{_generate_resmoke_args()}} function|https://github.com/mongodb/mongo/blob/r4.1.9/buildscripts/evergreen_generate_resmoke_tasks.py#L289-L294]. ** The sub-suite definitions are uploaded to Evergreen in the \{{*_gen}} task as ""Generated Task Config"" but we don't really want engineers to have to worry about using them. * Always remove the command line options not seen in the compact form above. The implementation should explicitly list the command line options to remove so that new ones still appear by default. It might be possible to be a little clever about trying to remove all the options from [the \{{evergreen_options}} group|https://github.com/mongodb/mongo/blob/r4.1.9/buildscripts/resmokelib/parser.py#L255] so that new ones added to that section never appear.",3 +"SERVER-40245","03/20/2019 19:37:34","Use attach.artifacts to link to GitHub wiki from every task page","[The {{attach.artifacts}} command|https://github.com/evergreen-ci/evergreen/wiki/Project-Commands#attach-artifacts] takes a JSON file containing an array of links to include in the ""Files"" section of the task page. A link to https://github.com/mongodb/mongo/wiki/Running-Tests-from-Evergreen-Tasks-Locally should be added for any Evergreen task that calls the ""run tests"" function. 
{noformat} [ { ""name"": ""Running Tests from Evergreen Tasks Locally"", ""link"": ""https://github.com/mongodb/mongo/wiki/Running-Tests-from-Evergreen-Tasks-Locally"", ""visibility"": ""public"" } ] {noformat}",1 +"SERVER-40339","03/26/2019 16:44:58","Resmoke doesn't always show output from failing python unittests","See: https://evergreen.mongodb.com/task_log_raw/mongodb_mongo_master_enterprise_rhel_62_64_bit_buildscripts_test_patch_e35e8076dbddc863205cf24517e1b16dc9104d07_5c9a4bcad1fe075bd51ca0ad_19_03_26_15_57_00/0?type=T The test_burn_in_tests.py failed with -2 but no output is shown.",3 +"SERVER-40415","04/01/2019 17:01:21","Tempfile cleanup from test_adb_monitor.py","Tempfiles are not cleaned up in test_adb_monitor.py The code which creates a named temp file, should also specify a temp_dir: {code} arg_test_file = tempfile.NamedTemporaryFile(delete=False).name {code} Should be: {code} arg_test_file = tempfile.NamedTemporaryFile(dir=self.temp_dir, delete=False).name {code}",1 +"SERVER-40418","04/01/2019 17:53:29","Refactor test_adb_monitor to not use files when testing","We should mock out all the file creation, and subprocess invocations, when testing adb_monitor.",3 +"SERVER-40421","04/01/2019 18:50:15","Add failpoint to skip doing retries on WiredTiger prepare conflicts","An operation within WiredTiger that attempts to get or set a value which has been prepared by another transaction may have a {{WT_PREPARE_CONFLICT}} error returned. (Note that until SERVER-40176 is addressed, this also applies to operations which may scan over such data.) The MongoDB layer then enqueues these operations to be retried after a prepared transaction has committed or aborted. In order to allow the rollback fuzzer to generate randomized insert, update, and delete operations that may prepare conflcits without hanging, it would be useful to add a failpoint to [the {{wiredTigerPrepareConflictRetry()}} function|https://github.com/mongodb/mongo/blob/a3c7bdb31e949cfd11c2c9e24f9a04dfd6c22ba1/src/mongo/db/storage/wiredtiger/wiredtiger_prepare_conflict.h#L56] where it doesn't do any retry logic and instead has the command fail with a {{WriteConflict}} error response.",1 +"SERVER-40468","04/04/2019 05:28:15","Allow RollbackTest fixture to skip collection validation when restarting node","The validate command requires a collection X lock, which cannot be acquired if there are outstanding transactions running on the server. The rollback fuzzer may attempt to call {{RollbackTest#restartNode()}} after having started transactions on the server. [The {{RollbackTest#restartNode()}} method|https://github.com/mongodb/mongo/blob/4459b439700f096a7b6287fdddde592db8934fe2/jstests/replsets/libs/rollback_test.js#L391] should therefore take an optional {{skipValidation}} boolean and pass it through to [the {{rst.stop()}} call|https://github.com/mongodb/mongo/blob/4459b439700f096a7b6287fdddde592db8934fe2/jstests/replsets/libs/rollback_test.js#L422]. The {{RollbackTest#transitionToSteadyStateOperations()}} method should also be updated to take an optional {{skipDataConsistencyChecks}} boolean (wrapped in an object so it's effectively a named parameter) that acts as a syntactic alternative to the usage of {{expectPreparedTxnsDuringRollback}} as a way to avoid calling {{checkDataConsistency()}}.",1 +"SERVER-40469","04/04/2019 05:29:24","Remove the expectPreparedTxnsDuringRollback parameter to the RollbackTest constructor","This is follow-up work around SERVER-40468 after the corresponding changes are made to the rollback fuzzer. 
The {{expectPreparedTxnsDuringRollback}} parameter also isn't aptly named because any outstanding transactions would prevent the ability to run the data consistency checks, not just prepared transactions.",1 +"SERVER-40470","04/04/2019 07:49:02","Use roundup_timestamp API instead of round_to_oldest","WT-4640 is deprecating the {{round_to_oldest}} API in favour of {{roundup_timestamp=(read=true)}}. Server needs to use the new API before {{round_to_oldest}} is removed from WiredTiger. This ticket is to address this change.",3 +"SERVER-40486","04/04/2019 21:58:33","Remove Test Lifecycle code","We are no longer using the Test Lifecycle code. We should go through and remove the code executing it. This would include the references in etc/evergreen.yml, buildscripts/fetch_test_lifecycle.py, buildscripts/update_test_lifecycle.py, and any other related files. ---- As a server engineer, I want the test lifecycle code removed. so that I don't have to spend time maintaining it. ---- AC * etc/evergreen should not run any test_lifecycle related tasks. * test lifecycle scripts (and tests for them) should be removed from mongo repository.",2 +"SERVER-40518","04/07/2019 23:31:25","backup_restore*.js tests send SIGTERM to resmoke.py and may leak mongo shell running FSM workloads","Sending a SIGTERM to resmoke.py causes the Python process to immediately exit (it doesn't register a handler for that signal like the server does) without waiting for any of the mongo shell processes it spawned to also exit (though they also receive the SIGTERM signal). This means if resmoke.py was in the midst of spawning a mongo shell process to run an FSM workload, then it could continue to run (i.e. be leaked) even after the parent resmoke.py process exits.",1 +"SERVER-40532","04/08/2019 19:02:21","""check binary version"" doesn't pip install evgtest-requirements.txt when bypass_compile is true","We're seeing patch builds failing in Evergreen due to missing python modules when bypass_compile is not set to false. [Check binary version|https://github.com/mongodb/mongo/blob/73e719a1ee1c174a3131e19b537b3ae8aa958dad/etc/evergreen.yml#L498-L526] is what should be pip installing the requirements but it exits early on patch builds.",1 +"SERVER-40550","04/09/2019 19:20:12","Refactor job.py to support mock of time.time","The tests for {{buildscripts/resmokelib/testing/job.py}} mock out {{time.time}}. However this is faulty, as {{time.time}} can be called from other logging calls, while the test is active. 
A cleaner solution would be to move {{time.time}} into a helper function which is then mocked: {code} @staticmethod def _get_time(): return time.time() {code} Test code in {{buildscripts/tests/resmokelib/testing/test_job.py}} {code} mock_time = MockTime(increment) job_object = UnitJob(suite_options) self.queue_tests(self.TESTS, queue, queue_element.QueueElemRepeatTime, suite_options) job_object._get_time = mock_time.time job_object._run(queue, self.mock_interrupt_flag()) {code}",2 +"SERVER-40589","04/11/2019 19:33:27","find command should validate $_internalReadAtClusterTime is not null","The 'find' command will hit an invariant failure [here|https://github.com/mongodb/mongo/blob/a13c018b51465b04027adee28fd79fd82ed4575b/src/mongo/db/commands/find_cmd.cpp#L330] when a command sends a null timestamp, Timestamp(0,0).",1 +"SERVER-40592","04/11/2019 19:57:03","Uncaught exception in resmoke.py job thread due to logkeeper unavailability when tearing down fixture","[The call to {{teardown_fixture()}}|https://github.com/mongodb/mongo/blob/a13c018b51465b04027adee28fd79fd82ed4575b/buildscripts/resmokelib/testing/job.py#L102] should happen in a try/except block. * If there's a {{buildscripts.resmokelib.errors.LoggerRuntimeConfigError}} exception, then we should record the message with {{self.logger.error(""%s"", err)}} to the task logs. * If there's some other kind of exception, then we should record the message and Python stacktrace with {{self.logger.exception(""Encountered an error when tearing down %s: %s"", self.fixture, err)}} to the task logs. We should additionally set the {{teardown_flag}}. {noformat} [2019/04/08 21:07:45.443] Exception in thread Thread-1: [2019/04/08 21:07:45.443] Traceback (most recent call last): [2019/04/08 21:07:45.443] File ""/opt/mongodbtoolchain/revisions/94dac13bc8c0b50beff286acac77adeb2e81761e/stow/python3-v3.lJe/lib/python3.7/threading.py"", line 917, in _bootstrap_inner [2019/04/08 21:07:45.443] self.run() [2019/04/08 21:07:45.443] File ""/opt/mongodbtoolchain/revisions/94dac13bc8c0b50beff286acac77adeb2e81761e/stow/python3-v3.lJe/lib/python3.7/threading.py"", line 865, in run [2019/04/08 21:07:45.443] self._target(*self._args, **self._kwargs) [2019/04/08 21:07:45.443] File ""/data/mci/6e26eb78c635416bfda724f51d2fa812/src/buildscripts/resmokelib/testing/job.py"", line 102, in __call__ [2019/04/08 21:07:45.443] if teardown_flag is not None and not self.teardown_fixture(): [2019/04/08 21:07:45.443] File ""/data/mci/6e26eb78c635416bfda724f51d2fa812/src/buildscripts/resmokelib/testing/job.py"", line 64, in teardown_fixture [2019/04/08 21:07:45.443] test_case(self.report) [2019/04/08 21:07:45.443] File ""/opt/mongodbtoolchain/revisions/94dac13bc8c0b50beff286acac77adeb2e81761e/stow/python3-v3.lJe/lib/python3.7/unittest/case.py"", line 663, in __call__ [2019/04/08 21:07:45.443] return self.run(*args, **kwds) [2019/04/08 21:07:45.443] File ""/opt/mongodbtoolchain/revisions/94dac13bc8c0b50beff286acac77adeb2e81761e/stow/python3-v3.lJe/lib/python3.7/unittest/case.py"", line 588, in run [2019/04/08 21:07:45.443] result.startTest(self) [2019/04/08 21:07:45.443] File ""/data/mci/6e26eb78c635416bfda724f51d2fa812/src/buildscripts/resmokelib/testing/report.py"", line 112, in startTest [2019/04/08 21:07:45.443] test.logger) [2019/04/08 21:07:45.443] File ""/data/mci/6e26eb78c635416bfda724f51d2fa812/src/buildscripts/resmokelib/logging/loggers.py"", line 205, in new_test_logger [2019/04/08 21:07:45.443] "" test_id"".format(test_basename)) [2019/04/08 21:07:45.443] 
buildscripts.resmokelib.errors.LoggerRuntimeConfigError: Encountered an error configuring buildlogger for test job0_fixture_teardown: Failed to get a new test_id {noformat}",2 +"SERVER-40663","04/16/2019 14:57:39","Reduce Frequency of Sys Perf WT_Develop variants until adoption can occur","In a recent performance meeting we discussed that there was limited value for the storage engines team to have the wt_develop variants running. We believe there may be value in: - A new adoption push in which we train the team on Perf Discovery - Value for the perf build baron Until we flesh that out, we want to reduce the spend by setting the frequency of these tests to run on a weekly (instead of daily) basis. Acceptance Criteria: - Update Sys Perf Master Evergreen Project to change all wt_develop variants to run weekly - Send an email to perf interest and downstream changes (if applicable)",1 +"SERVER-40690","04/17/2019 16:02:28","Create more lightweight Enterprise Windows build variant that's required for patch builds","The Enterprise Windows 2008R2 build variant runs a number of Evergreen tasks that appear (at least empirically) to be redundant with the list of tasks that run on some of the required Linux build variants. We should create a new variant with a task list similar to the following {code:yaml} - name: enterprise-windows-64-2k8-required display_name: ""! Enterprise Windows 2008R2"" modules: - enterprise run_on: - windows-64-vs2017-test expansions: ... # Use an anchor/alias to avoid duplicating these display_tasks: - *dbtest tasks: - name: compile_TG requires: - name: burn_in_tests_gen - name: verify_pip distros: - windows-64-vs2017-compile - name: burn_in_tests_gen - name: verify_pip - name: buildscripts_test - name: dbtest_TG distros: - windows-64-vs2017-compile - name: noPassthrough_gen {code} and update the {{display_name}} of the {{enterprise\-windows\-64\-2k8}} build variant to be ""* Enterprise Windows 2008R2"" in order to reflect it has a {{batchtime}} of 1 hour but isn't a required build variant. The definition of the {{required}} alias in the Evergreen database must also be updated to include {{enterprise\-windows\-64\-2k8\-required}} rather than {{enterprise\-windows\-64-2k8}}. ",2 +"SERVER-40702","04/17/2019 21:58:52","resmoke.py should wait for subprocesses it spawned to exit on KeyboardInterrupt","[resmoke.py doesn't wait for the job threads running tests to exit when they are interrupted by the user|https://github.com/mongodb/mongo/blob/b6d336bee9c7adb334333bcb22c432d376458af3/buildscripts/resmokelib/testing/executor.py#L145-L185]. It instead relies on the SIGINT being received by all the processes in the process group to exit on their own quickly. While this may reduce the likelihood a user would interrupt resmoke.py multiple times due to it taking longer to exit, it also means that processes spawned by resmoke.py may outlive the resmoke.py Python process. This behavior has caused failures in the {{backup_restore*.js}} tests which spawns its own resmoke.py subprocess in order to run FSM workloads against a {{ReplSetTest}} instance. We should call {{thr.join()}} even after a {{KeyboardInterrupt}} exception occurs. However, it would be convenient for users if we also logged a message (say after 2 seconds of waiting for the thread) that they can use ctrl-\ to send a SIGQUIT to all of the processes to get them to exit on Linux or ctrl-c again to get them to exit on Windows as the Job object has {{JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE}} set. 
Sending a SIGQUIT is an easy way to ensure resmoke.py exits even if the mongod process is hung. {code:python} def _run_tests(self, test_queue, setup_flag, teardown_flag): """"""Start a thread for each Job instance and block until all of the tests are run. Returns a (combined report, user interrupted) pair, where the report contains the status and timing information of tests run by all of the threads. """""" threads = [] interrupt_flag = threading.Event() user_interrupted = False try: # Run each Job instance in its own thread. for job in self._jobs: thr = threading.Thread( target=job, args=(test_queue, interrupt_flag), kwargs=dict( setup_flag=setup_flag, teardown_flag=teardown_flag)) # Do not wait for tests to finish executing if interrupted by the user. thr.daemon = True thr.start() threads.append(thr) # SERVER-24729 Need to stagger when jobs start to reduce I/O load if there # are many of them. Both the 5 and the 10 are arbitrary. # Currently only enabled on Evergreen. if _config.STAGGER_JOBS and len(threads) >= 5: time.sleep(10) joined = False while not joined: # Need to pass a timeout to join() so that KeyboardInterrupt exceptions # are propagated. joined = test_queue.join(TestSuiteExecutor._TIMEOUT) except (KeyboardInterrupt, SystemExit): interrupt_flag.set() user_interrupted = True else: # Only wait for all the Job instances if not interrupted by the user. self.logger.debug(""Waiting for threads to complete"") for thr in threads: thr.join() self.logger.debug(""Threads are completed!"") {code}",1 +"SERVER-40749","04/19/2019 21:43:21","Include execution in generated task configuration file name","The execution number should be included in the generated.task configuration file name. Otherwise, later execution can overwrite the file, but since evergreen will no-op on reruns of generate.task, we don't actually use the new configuration. By including the execution number, we will still have the original configuration for subsequent executions of the sub-tasks.",1 +"SERVER-40758","04/22/2019 16:02:20","Increase the amount of memory available for logical_session_cache_replication* tasks","There have been several instances where the OOM killer has killed a mongod process when running one of {{logical_session_cache_replication*}} tasks on Enterprise RHEL 6.2. With the {{num_jobs_available}} expansion equal to the number of CPUs, we end up running 4 tests concurrently (each using a 3-node replica set) on the {{rhel62\-small}} distro which is a c4.xlarge (4 CPU, 7.5GiB memory). We should change to use the {{rhel62\-large}} distro and limit the maximum number of resmoke.py jobs to 12. The {{rhel62\-large}} distro is a c4.4xlarge (16 CPU, 30GiB memory), so we'll end up running 12 tests concurrently, but have a larger ratio of available memory to number of concurrent tests.",1 +"SERVER-40801","04/24/2019 16:50:05","resmoke.py logs invalid --excludeWithAnyTags command line argument for local execution","[The {{to_local_args()}} function serializes {{option_value}} as a string|https://github.com/mongodb/mongo/blob/18181d9825ddc62351a6ba94325a38353086248c/buildscripts/resmokelib/parser.py#L450] even when it may actually be a list. The {{\-\-excludeWithAnyTags}} and {{\-\-includeWithAnyTags}} command line options are currently the only two which use {{action=""append""}}. 
{noformat} [2019/04/24 15:31:06.412] [resmoke] 2019-04-24T15:31:06.412+0000 verbatim resmoke.py invocation: buildscripts/evergreen_run_tests.py --suites=core --shellReadMode=legacy --shellWriteMode=compatibility --storageEngine=wiredTiger --excludeWithAnyTags=requires_find_command --jobs=4 --shuffle --continueOnFailure --storageEngineCacheSizeGB=1 --tagFile=etc/test_retrial.yml --log=buildlogger --staggerJobs=on --buildId=mongodb_mongo_master_enterprise_rhel_62_64_bit_f202c4c1ba24b9f561e8b11dac5b04fa0eeb4919_19_04_24_14_47_53 --distroId=rhel62-small --executionNumber=0 --projectName=mongodb-mongo-master --gitRevision=f202c4c1ba24b9f561e8b11dac5b04fa0eeb4919 --revisionOrderId=25585 --taskId=mongodb_mongo_master_enterprise_rhel_62_64_bit_jsCore_compatibility_f202c4c1ba24b9f561e8b11dac5b04fa0eeb4919_19_04_24_14_47_53 --taskName=jsCore_compatibility --variantName=enterprise-rhel-62-64-bit --versionId=mongodb_mongo_master_f202c4c1ba24b9f561e8b11dac5b04fa0eeb4919 --archiveFile=archive.json --reportFile=report.json --perfReportFile=perf.json [2019/04/24 15:31:06.414] [resmoke] 2019-04-24T15:31:06.413+0000 resmoke.py invocation for local usage: buildscripts/resmoke.py --suites=core --storageEngine=wiredTiger --continueOnFailure --excludeWithAnyTags=['requires_find_command'] --jobs=4 --shellReadMode=legacy --shellWriteMode=compatibility --shuffleMode=on --storageEngineCacheSizeGB=1 {noformat} https://evergreen.mongodb.com/task/mongodb_mongo_master_enterprise_rhel_62_64_bit_jsCore_compatibility_f202c4c1ba24b9f561e8b11dac5b04fa0eeb4919_19_04_24_14_47_53",1 +"SERVER-41170","04/25/2019 21:46:17","Run Genny on Microbenchmarks (CBI; etc/perf.yml)","Compile genny and run and run any workload and send to evergreen the ""legacy"" perf.json file produced by the genny metrics post-processing (not expanded metrics yet).",2 +"SERVER-40862","04/26/2019 21:24:53","Log collection options for createCollection commands","This involves changing this line https://github.com/mongodb/mongo/blob/8cbbba49935f632e876037f9f2d9eecc779eb96a/src/mongo/db/catalog/database_impl.cpp#L687. This can be useful to know things like whether the collection was created as a capped collection or not. ",1 +"SERVER-40868","04/26/2019 22:33:03","Log when copying source coll to temporary coll during a renameCollection across dbs","Currently we don't log that the contents of the source coll have been copied to the temporary coll. Logging that this happened would make it easier to see that the temporary coll corresponds to the source coll.",1 +"SERVER-40895","04/29/2019 20:58:55","Dynamically generate burn_in_tests for tag validation","There are certain build variants that are only around to ensure newly added or modified tests are tagged correctly. These variants copy the flags used from other variants and just run 'compile' and 'burn_in_tests'. We could use generate.tasks to dynamically build these variants, which would make them easier to manage and remove some configuration from evergreen. ---- As a mongo engineer, I want a script to generate burn_in_tests for testing tagged variants So there doesn't have to be an explicit evergreen configuration for each one. ---- AC * The following build variants are built dynamically and removed from etc/evergreen.yml. ** ! Enterprise RHEL 6.2 (majority read concern off) ** ! 
Linux (No Journal)",3 +"SERVER-40922","04/30/2019 23:19:18","Add npm install command to ""run jstestfuzz"" Evergreen function","The ""run jstestfuzz"" function lives [here|https://github.com/mongodb/mongo/blob/e3796fef68ec17ef475e669cd04193aac506bf58/etc/evergreen.yml#L1803] in the {{etc/evergreen.yml}} project configuration file. We'd like to remove the vendored copy of the fuzzer's dependencies from the 10gen/jstestfuzz repository. Note that removing the {{node_modules/}} directory is a ""break the world"" kind of change where older mongodb/mongo commits will fail because the dependencies are no longer present. Until we remove the {{node_modules/}} directory, running {{npm install}} should have no effect because the versions of all the fuzzer's transitive dependencies are pinned by the {{package\-lock.json}} file. We can therefore add the {{npm install}} command now in preparation for removing the {{node_modules/}} directory and break fewer older commits. This should happen in a {{type=system}} task so that if Artifactory is down, then the Evergreen task turns {color:purple}*purple*{color}. Note that this means it should happen before the {{shell.exec}} command to call {{npm test}} and {{npm run}}.",1 +"SERVER-40923","04/30/2019 23:20:27","Remove npm test command from ""run jstestfuzz"" Evergreen function","The ""run jstestfuzz"" function lives [here|https://github.com/mongodb/mongo/blob/e3796fef68ec17ef475e669cd04193aac506bf58/etc/evergreen.yml#L1803] in the {{etc/evergreen.yml}} project configuration file. With the introduction of the {{jstestfuzz\-self\-tests}} Evergreen project, we shouldn't need to run {{npm test}} before every execution of the fuzzer.",1 +"SERVER-40924","04/30/2019 23:21:36","Add Evergreen task to sanity check fuzzer can parse JavaScript tests","The mutational (jstestfuzz) fuzzer uses [acorn|https://github.com/acornjs/acorn] to parse the JavaScript tests into an abstract syntax tree. We've had cases where a Server engineer attempts to use newer JavaScript features supported by the version of SpiderMonkey integrated into the mongo shell than the ECMAScript version we've configured acorn to parse the JavaScript as. This is because special handling for these features (e.g. {{class}}) may need to be done to rewrite the generated JavaScript to avoid uncatchable {{SyntaxErrors}} or strict-mode violations. We should add a {{lint_fuzzer_sanity_patch}} Evergreen task to the Enterprise RHEL 6.2 required builder which takes the contents of [the {{patch_files.txt}} file generated by the ""get modified patch files"" function|https://github.com/mongodb/mongo/blob/f7a4c4a9632f75996ed607ffc77e2a3cab15ea88/etc/evergreen.yml#L1038-L1057] and calls {{npm run parse\-jstest}} on them. The {{lint_fuzzer_sanity_patch}} task should be declared in the {{requires}} section for the {{compile}} task such that scheduling the {{compile}} task (either implicitly or explicitly) implicitly schedules the {{lint_fuzzer_sanity_patch}} task. We should also add {{lint_fuzzer_sanity_all}} Evergreen task to the (since removed) TIG Daily Cron build variant which calls {{npm run parse\-jstest}} on the contents of the {{jstests/}} and {{src/mongo/db/modules/enterprise/jstests}} directories. This is to handle how we cannot guarantee all commits to mongodb/mongo repository have a corresponding patch build, nor can we guarantee Evergreen schedules every commit to the mongodb/mongo repository. 
Having a periodic task (once a day) which checks all JavaScript tests means we don't need complicated logic like the {{burn_in_tests}} task to diff against the files changed since the commit the task last ran against in the mainline.",3 +"SERVER-41003","05/03/2019 20:34:58","When generating suites, don't set repeat-suites if repeat is in options","In generate_resmoke_suites, there is an option to set repeat_suites to send to resmoke. By default, this is set to 1. If, however, a user tries to add repeat in the resmoke_args, the default will overwrite it. This will not give the user what they want. We could just not set the repeat_suite option if repeat is set [see here|https://github.com/mongodb/mongo/blob/b1a9c9adea89b475fb05660e2a1cad00971e6899/buildscripts/evergreen_generate_resmoke_tasks.py#L294-L295] ---- As a server engineer, when I set repeat in the resmoke_args of a task, I do not want it to be overwritten by the default 1. ---- AC * A _gen task in a patch build can have the repeat_suite value set in the resmoke_args and resmoke will use that value.",2 +"SERVER-41096","05/10/2019 20:44:22","ContinuousStepdown thread and resmoke runner do not synchronize properly on the ""stepdown permitted file"" and ""stepping down file""","Before running workload teardowns, the fsm runner's main thread * [removes the ""stepdown permitted file""|https://github.com/mongodb/mongo/blob/54cc4d76250719b247080c1195d4b672322d989e/jstests/concurrency/fsm_libs/resmoke_runner.js#L166] * [waits for the ""stepping down file"" to not be present.|https://github.com/mongodb/mongo/blob/54cc4d76250719b247080c1195d4b672322d989e/jstests/concurrency/fsm_libs/resmoke_runner.js#L167-L172] But the continuous stepdown thread does the following: * [checks for the ""stepdown permitted file""|https://github.com/mongodb/mongo/blob/54cc4d76250719b247080c1195d4b672322d989e/buildscripts/resmokelib/testing/hooks/stepdown.py#L183] * on [starting a stepdown round|https://github.com/mongodb/mongo/blob/54cc4d76250719b247080c1195d4b672322d989e/buildscripts/resmokelib/testing/hooks/stepdown.py#L187], [writes the ""stepping down file""|https://github.com/mongodb/mongo/blob/54cc4d76250719b247080c1195d4b672322d989e/buildscripts/resmokelib/testing/hooks/stepdown.py#L440] * on completing the stepdown round, [removes the ""stepping down file.""|https://github.com/mongodb/mongo/blob/54cc4d76250719b247080c1195d4b672322d989e/buildscripts/resmokelib/testing/hooks/stepdown.py#L444] This allows the following interleaving: * continuous stepdown thread checks for ""stepdown permitted file"" and sees it * fsm runner thread removes ""stepdown permitted file"" * fsm runner thread checks for ""stepping down file"" and doesn't see it * fsm runner thread starts executing a workload's teardown * continuous stepdown thread starts a stepdown round, which can cause the workload's teardown thread to get a network error|",3 +"SERVER-41169","05/16/2019 00:07:41","Most powercycle testing for Linux was removed from Evergreen","The {{powercycle\*}} Evergreen tasks were mostly being run on the SSL Ubuntu 14.04 build variant which was removed as part of SERVER-37765.",1 +"SERVER-41227","05/18/2019 05:56:50","Update multiversion tests following 4.2 branch","See SERVER-35152 for work done following 4.0 branch.",3 +"SERVER-41231","05/18/2019 06:26:57","Fix verify_versions_test.js for 4.2","Change from 4.1 to 4.2 after branching. 
See SERVER-35198 as an example.",1 +"SERVER-41295","05/23/2019 18:54:54","Add timeouts to burn_in generated tasks","Since we do not explicitly set a timeout for burn_in generated tasks, a hung test would not fail until the default timeout is hit. This has led to some timeouts causing large log files to be written, which makes it difficult to access the log files. If a given test has test history, we should be able to get an approximation of how long the test should run for and set timeouts dynamically. The ""generate_resmoke_tasks"" already does this. ---- As a server engineer, I want burn_in generated tasks to time out if they run for too long, so that the log file for the test stays a manageable size. ---- AC: * Every task generated by burn_in that has a test history sets a timeout based on that test history.",2 +"SERVER-41304","05/24/2019 05:04:10","Update EXIT_CODE_MAP in resmoke.py for Python 3 changes on Windows","The changes from [python/cpython@f2244ea|https://github.com/python/cpython/commit/f2244eaf9e3148d6270839e9aa7c2ad9752c17ed] as part of https://bugs.python.org/issue20172 changed the {{DWORD}} return value from [the {{GetExitCodeProcess()}} Win32 API|https://docs.microsoft.com/en-us/windows/desktop/api/processthreadsapi/nf-processthreadsapi-getexitcodeprocess] to be correctly interpreted as an {{unsigned long}}. That is to say, the code in Python 2.7 previously did {code:c} PyInt_FromLong(exit_code) {code} and was replaced with {code:c} return_value = Py_BuildValue(""k"", _return_value) {code} ([where ""k"" means {{unsigned long}}|https://docs.python.org/3/c-api/arg.html#c.Py_BuildValue]).",1 +"SERVER-41309","05/24/2019 18:50:41","Create a commit_queue task in evergreen.yml","Create a task in evergreen.yml that will run the commit queue. This should use generate.tasks to generate the tasks the commit queue should run. ---- As a server engineer I want a commit queue task that runs as part of the commit queue So that changes I want to merge are validated. As a DAG engineer I want the commit queue to be a generating task So that I can update what tasks are included in the commit queue without disruption. ---- AC: * A new task, ""commit_queue"", is added to evergreen which will generate other tasks that the commit queue should require. * The commit_queue task should generate a ""lint"" task on rhel-62 and a ""compile"" task on all required (!) builders. 
* The new tasks should no-op on mainline builds.",3 +"SERVER-41321","05/25/2019 18:20:23","Stopping 'mongod-powertest' service returns an error on Windows","{noformat} [2019/05/17 14:58:26.546] 2019-05-17 14:58:25,787 INFO System was last booted 2019-05-17 14:46:34.000000, up 711 seconds [2019/05/17 14:58:26.546] 2019-05-17 14:58:25,787 INFO Operations to perform ['kill_mongod'] [2019/05/17 14:58:26.546] Traceback (most recent call last): [2019/05/17 14:58:26.546] File ""powertest.py"", line 890, in stop [2019/05/17 14:58:26.546] win32serviceutil.StopService(serviceName=self.name) [2019/05/17 14:58:26.546] File ""C:\cygwin\home\Administrator\venv_powercycle\lib\site-packages\win32\lib\win32serviceutil.py"", line 409, in StopService [2019/05/17 14:58:26.546] return ControlService(serviceName, win32service.SERVICE_CONTROL_STOP, machine) [2019/05/17 14:58:26.546] File ""C:\cygwin\home\Administrator\venv_powercycle\lib\site-packages\win32\lib\win32serviceutil.py"", line 320, in ControlService [2019/05/17 14:58:26.546] status = win32service.ControlService(hs, code) [2019/05/17 14:58:26.546] pywintypes.error: (109, 'ControlService', 'The pipe has been ended.') [2019/05/17 14:58:26.546] During handling of the above exception, another exception occurred: [2019/05/17 14:58:26.546] Traceback (most recent call last): [2019/05/17 14:58:26.546] File ""powertest.py"", line 2548, in [2019/05/17 14:58:26.546] main() [2019/05/17 14:58:26.546] File ""powertest.py"", line 2128, in main [2019/05/17 14:58:26.546] ret = remote_handler(options, args) [2019/05/17 14:58:26.546] File ""powertest.py"", line 1202, in remote_handler [2019/05/17 14:58:26.546] mongod.stop(timeout=30) [2019/05/17 14:58:26.546] File ""powertest.py"", line 1080, in stop [2019/05/17 14:58:26.546] return self.service.stop(timeout) [2019/05/17 14:58:26.546] File ""powertest.py"", line 904, in stop [2019/05/17 14:58:26.546] output = ""{}: {}"".format(err[1], err[2]) [2019/05/17 14:58:26.546] TypeError: 'error' object does not support indexing {noformat} https://evergreen.mongodb.com/task/mongodb_mongo_master_windows_64_2k8_ssl_powercycle_kill_mongod_39413ef58dd1f667728b67c86e1bf09146952242_19_05_16_20_32_34/0",1 +"SERVER-41322","05/25/2019 20:19:09","Cygwin rsync errors with ""No medium found"" during powercycle testing","{noformat} [2019/05/21 21:09:45.503] 2019-05-21 21:09:23,820 INFO System was last booted 2019-05-21 21:08:34.000000, up 49 seconds [2019/05/21 21:09:45.503] 2019-05-21 21:09:23,820 INFO Operations to perform ['rsync_data', 'start_mongod'] [2019/05/21 21:09:45.503] 2019-05-21 21:09:23,952 INFO Rsync'ing /data/db to /log/powercycle/beforerecovery-1 (excluding ['diagnostic.data/metrics.interim*']) [2019/05/21 21:09:45.514] 2019-05-21 21:09:44,567 INFO Error executing cmd ['rsync', '-va', '--delete', '--quiet', '--exclude', 'diagnostic.data/metrics.interim*', '/data/db', '/log/powercycle/beforerecovery-1']: rsync: write failed on ""/log/powercycle/beforerecovery-1/db/collection-52-5788850313987091066.wt"": No medium found (135) [2019/05/21 21:09:45.514] rsync error: error in file IO (code 11) at receiver.c(393) [receiver=3.1.2] [2019/05/21 21:09:45.523] **** {noformat} https://evergreen.mongodb.com/task/mongodb_mongo_master_windows_64_2k8_ssl_powercycle_1397d1398b3b9b1723cd9b93de6b345f940a17e8_19_05_21_15_20_22/0",1 +"SERVER-41351","05/29/2019 04:29:38","Improve error message from failure to obtain lock for storage stats collection","SERVER-41327 reported a query failure with {{CursorKilled}} error along with the following error 
in acquiring the lock for storage stats collection: ""Timed out obtaining lock while trying to gather storage statistics for a slow operation."" Though the above message is just a warning, it sounds like this is the cause of the query being killed. Improve the error to remove this confusion and to notify that at worst this would result in the absence of storage statistics from slow operation logs.",1 +"SERVER-41390","05/30/2019 14:34:04","validate_mongocryptd should not check variants without push","The validate_mongocryptd script checks that buildvariants that push the mongocryptd binary also add the variant to a list. However, the script currently does this for all buildvariants, including ones that are only for testing and do not contain a push task. This is causing problems with some dynamic variants we are trying to create. The script could also check for a push task. ---- As a Server Engineer, I want validate_mongocryptd to check for a push task, So that variants that do not push mongocryptd do not need to be included in the variant list. ---- AC: * Variants the do not contain a push task will not fail to validate_mongocryptd if they are not included in the list of variants.",1 +"SERVER-41393","05/30/2019 15:16:32","Drives don't come back up on Enterprise Amazon Linux during powercycle testing","Through experimentation with [~brian.mccarthy], we've found that EBS volumes only come back up with the {{amazon1\-2018\-test}} Evergreen distro if the entry in {{/etc/fstab}} uses the UUID of the drive. The {{ubuntu1604\-powercycle}} Evergreen distro doesn't appear to be impacted. Note that this is also the recommendation in Amazon's documentation as well: {quote} To mount an attached EBS volume on every system reboot, add an entry for the device to the /etc/fstab file. You can use the device name, such as /dev/xvdf, in /etc/fstab, but we recommend using the device's 128-bit universally unique identifier (UUID) instead. Device names can change, but the UUID persists throughout the life of the partition. By using the UUID, you reduce the chances that the system becomes unbootable after a hardware reconfiguration. For more information, see Identifying the EBS Device. https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-using-volumes.html#ebs-mount-after-reboot {quote} Based on https://unix.stackexchange.com/a/270216, we can use either the {{blkid -o value -s UUID /dev/xvd...}} or {{lsblk -no UUID /dev/xvd...}} command to get the UUID and write that [in place of {{$device_name}} in the {{mount_drives.sh}} script|https://github.com/mongodb/mongo/blob/da754e6c0490a3ccacd04339f34fafbd878331b4/buildscripts/mount_drives.sh#L155].",2 +"SERVER-41401","05/30/2019 19:42:52","patch_files.txt doesn't distinguish between enterprise and community files","patch_files.txt does not distinguish between enterprise and community files, so if it has a line that looks like: {{src/module/my_feature.cpp}}, it's impossible to tell if it is a file in the enterprise or community repo.",1 +"SERVER-41423","05/31/2019 15:55:30","""shell: bash"" being set incorrectly for shell.exec commands in etc/evergreen.yml","The {{shell}} option should be nested under the {{params}} option for the command definition. [The ""setup jstestfuzz"" function has it as a top-level key for the command definition|https://github.com/mongodb/mongo/blob/182f15f37344118b33419c05820c2753d06191ed/etc/evergreen.yml#L1790-L1795]. 
We should audit for other command definitions which have {{shell: bash}} in the incorrect location.",1 +"SERVER-41439","05/31/2019 21:19:43","Dynamically choose distro for burn_in_tags.py","We're currently hardcoding the distro for compile (""rhe162-large"") but we need this to be dynamic: [https://github.com/mongodb/mongo/commit/ff945d4698dfcc61236537d7a5912ddd1abd9695#diff-2b1323d0ae241e7dacc3d4c913d481d8R99]   As a server engineer,  I want burn_in_tags compile distro to be dynamically selected based on which base variants are provided to the script So that adding a new variant would pick up the right distro   AC: * burn_in_tags picks distro based on buildvariant (i.e. a Windows buildvariant will not run compile on rhe162-large)",2 +"SERVER-41488","06/04/2019 15:13:36","Lint drivers_nightly.yml","YAML linting was added to 4.2, which failed on the new {{drivers_nightly.yml}} file. ",1 +"SERVER-41562","06/06/2019 14:51:30","Add new Evergreen task for the query fuzzer","The following YAML blob is modeled off of what we do for [the {{aggregation_wildcard_fuzzer_gen}} task|https://github.com/mongodb/mongo/blob/f3d9452220039ba74c68fe58b382a237d4e07ad1/etc/evergreen.yml#L5097-L5110]. The ""generate fuzzer tasks"" function uses [the {{generate.tasks}} command|https://github.com/evergreen-ci/evergreen/wiki/Project-Commands#generate-tasks] to create separate Evergreen tasks for actually running the fuzzer so that we can take advantage of using multiple Evergreen hosts. The main difference compared to the aggregation and rollback fuzzers is that we'll use {{npm run query\-fuzzer}} as the entry point. {code:yaml} ## Standalone fuzzer for checking find and aggregate equivalence ## - <<: *jstestfuzz_template name: query_fuzzer_gen tags: [""query_fuzzer""] commands: - func: ""generate fuzzer tasks"" vars: <<: *jstestfuzz_config_vars num_files: 5 num_tasks: 10 npm_command: query-fuzzer resmoke_args: --suites=generational_fuzzer name: query_fuzzer {code} The "".query_fuzzer"" task selector (aka tag) should be added to the following build variants: * enterprise-rhel-62-64-bit * enterprise-rhel-62-64-bit-coverage * macos * ubuntu1804-debug-asan * ubuntu1804-debug-ubsan * windows-64-2k8-ssl",1 +"SERVER-41677","06/12/2019 20:50:08","perf.json should not call 'json.get_history' directly","The 'json.get_history' in evergreen will fail if a new task is being added since that task has no history. The ""etc/system_perf.yml"" file added a workaround for this in SERVER-35207, but ""etc/perf.yml"" was not updated at that time. We should also do this in ""etc/perf.yml"" so that newly added tasks do not fail as well. ---- As a Server Engineer, I want etc/perf.yml to not use 'json.get_history' So that I can add new tasks and have them succeed. 
---- AC: * New tasks added to etc/perf.yml should be able to succeed despite not having any history.",2 +"SERVER-41680","06/12/2019 21:42:07","Propagate ${branch_name} Evergreen expansion to fuzzer invocation","[The {{npm run}} command for invoking the fuzzer|https://github.com/mongodb/mongo/blob/ce740566543792bfa4402d278a23e5cb4b1a80fe/etc/evergreen.yml#L1913] should include {noformat} --branch ${branch_name} {noformat} This way the fuzzer can be aware of how it should generate the files based on the version of the server it is running against.",1 +"SERVER-41708","06/13/2019 19:27:59","""set up virtualenv"" failure should be a system failure.","Currently it is marked as a test failure.",1 +"SERVER-41762","06/14/2019 19:56:13","burn_in_tags should not need to generate a compile task","The burn_in_tags scripts generates build_variants to variants that would not normally run, but we only want to run burn_in_tests on. It currently also generates a compile task for each build_variant it creates. It should not need to do this. It should be able to depend and use the artifacts from an existing build_variant. ---- As a server engineer, I want burn_in_tags generated tasks to depend on existing compiles, So that the compile work is not duplicated. ---- AC: * No build_variants generated by burn_in_tags have their own compile task.",2 +"SERVER-41802","06/17/2019 21:55:24","generate_resmoke_tasks doesn't apply max_sub_suites option","When max_sub_suites is set for a given suite, generate_resmoke_tasks does not actual apply it to split up the suites. See [here|https://evergreen.mongodb.com/task/mongodb_mongo_master_enterprise_rhel_62_64_bit_display_multi_shard_local_read_write_multi_stmt_txn_jscore_passthrough_patch_259bd089d0265ac510acbe4512eb706cd553562b_5d0173e39ccd4e3b3d8a4ef8_19_06_12_21_52_04##%257B%2522compare%2522%253A%255B%257B%2522hash%2522%253A%2522259bd089d0265ac510acbe4512eb706cd553562b%2522%257D%255D%257D] ---- As a Server engineer, I want max_sub_suites to control how much resmoke suites are split up so that I have a way to prevent suites being split up too much. ---- AC: * A generated suite with max_sub_suites set should only be split into the specified number of sub-suites.",2 +"SERVER-41836","06/20/2019 17:02:39","Log thread Id as part of error in FSM tests","It looks like we don't log the thread id when there is a single unique [stacktrace|https://github.com/mongodb/mongo/blob/2b34b45c83f03354cc88c295cf24aca7fb9418cc/jstests/concurrency/fsm_libs/runner.js#L337-L341]. ",1 +"SERVER-41926","06/26/2019 12:36:36","Enumerate and remove Storage-related FeatureCompatibilityVersion 4.0-dependent code and tests","The following tasks need to be completed: 1. Create a list of tickets with code and tests to remove, add them to the 4.4 Upgrade/Downgrade Epic, and mark them as ""is depended on by"" this ticket. This will assist the Upgrade/Downgrade team in tracking progress. If there is an insufficient amount of work to warrant multiple tickets, then the work can be done under this ticket directly. 2. Complete all necessary tickets promptly. 3. Create a ticket identifying Storage-related generic upgrade/downgrade references that the Upgrade/Downgrade team should update now that the 4.0-dependent references have been removed.",5 +"SERVER-41940","06/26/2019 21:23:49","Remove use of evergreen_client library in favor of evergreen.py in burn_in_tests","The [burn_in_tests script calls out to evergreen|#L32] to get test history and other information about the task running. 
We recently build a python evergreen [client|https://github.com/evergreen-ci/evergreen.py] and [added it to the burn_in_tests buildscript|https://mongodbcr.appspot.com/461660007]. We should be consistent and use evergreen.py to access the evergreen api everywhere in the burn_in_tests script. ---- As a mongo engineer I want burn_in_tests to use a common evergreen client library so that I don't have to maintain code to connect to evergreen. ---- AC * The burn_in_tests script does not use the evergreen client in buildscripts.    Related ticket: https://jira.mongodb.org/browse/SERVER-40893",2 +"SERVER-42032","07/01/2019 21:42:27","mongodb-javascript-stack always fails when running in hang_analyzer.py","{noformat} [2019/06/05 03:28:31.295] Running Print JavaScript Stack Supplement [2019/06/05 03:28:31.295] Ignoring GDB error 'No type ""mozjs"" within class or namespace ""mongo"".' in javascript_stack [2019/06/05 03:28:31.295] Ignoring GDB error 'No type ""mozjs"" within class or namespace ""mongo"".' in javascript_stack [2019/06/05 03:28:31.295] Ignoring GDB error 'No type ""mozjs"" within class or namespace ""mongo"".' in javascript_stack [2019/06/05 03:28:31.295] Ignoring GDB error 'No type ""mozjs"" within class or namespace ""mongo"".' in javascript_stack [2019/06/05 03:28:31.295] Ignoring GDB error 'No type ""mozjs"" within class or namespace ""mongo"".' in javascript_stack [2019/06/05 03:28:31.295] Ignoring GDB error 'No type ""mozjs"" within class or namespace ""mongo"".' in javascript_stack {noformat} https://evergreen.mongodb.com/task_log_raw/mongodb_mongo_master_ubuntu1604_replica_sets_auth_1_ubuntu1604_c939010fe98ba0a8affe7d0d30d4e8d57e68242b_19_06_05_00_44_58/0?type=T#L2539 ---- h6. Original description In gdb, if we're in a frame that does not know about mongo::mozjs:kCurrentScope, then we will not print a javascript stack trace. This can be especially useful when debugging our integration tests and gdb optimizes variables out of the core dump. [~max.hirschhorn] figured out that switching the frame in gdb (frame 1) and then running mongodb-javascript-stack will work fine. Perhaps we can arbitrarily switch frames [here|https://github.com/mongodb/mongo/blob/a351f48ad122ca59ed45e5df877ef398c099c938/buildscripts/gdb/mongo.py#L530-L532] before trying to print the stack trace. ",1 +"SERVER-50085","07/02/2019 16:57:54","Make it easier to correlate mongo process names, ports, PIDs in logs of fixtures started by resmoke","For test suites whose underlying mongod/s processes are managed by resmoke.py, it can be hard to figure out the mapping between mongo process ports, replica states, and PIDs in the log messages. For example, in the {{replica_sets_jscore_passthrough}} suite, a log line only shows the state of that replica set node: {noformat} [ReplicaSetFixture:job9:primary] 2019-07-02T02:09:03.027+0000 I INDEX [conn98] validated collection config.transactions (UUID: f6c6dde1-fc12-4608-b439-c09508cfee9e) {noformat} Ideally, we would have an easy way to know, when looking at any log line, the replica set state (primary, secondary, etc.), the PID, and the port that the mongo process started up on. In sharding suites we also want to know whether the node is a config server, mongoS, etc. Currently, figuring out this information requires one to trace back to the beginning of the logs (which may be in an entirely separate file if the fixture was not restarted recently) and look for startup messages with this information. 
One thought would be to print out a complete mapping of ports, PIDs, and current replica set states at the beginning of every new test execution. We could also include this info directly in the log message prefix. ",2 +"SERVER-42071","07/03/2019 18:25:59","notary client errors should not be system-failures","in cases like this: https://evergreen.mongodb.com/task/mongodb_mongo_v4.0_enterprise_rhel_70_64_bit_push_5f93fc9db3a3475dd2c7543b9f1e1179e6f9065f_19_06_14_13_51_46 notary client errors obscured a different issue in evergreen. this is a one line change (adding {{type: ""test""}} on line 2437, the hard part is figuring what kind of error (test=red or setup=lavender) we want this to be. I think it shouldn't be a system failure, as this will probably make it harder to diagnose other issues. ",2 +"SERVER-42075","07/03/2019 19:56:41","Add DSI module to perf.yml","We explicitly git clone mongo-perf and DSI in perf.yml. We should clean it up to use modules for all those things, and review all the module calls.",2 +"SERVER-42094","07/05/2019 17:46:17","perf.yml should check out the enterprise module revision from the manifest, not master","This is a problem if you have changes to performance test with an old merge base. When running a patch build against the {{performance}} project, the system will apply the changes under test against the correct version of the mongodb/mongo repo, but will attempt to compile them against HEAD of the master branch of enterprise modules. This can cause a spurious compile failure if in the interim changes have been merged to enterprise modules which required paired changes in mongodb/mongo.",1 +"SERVER-42136","07/10/2019 15:15:44","Add new Evergreen task for sharded cluster version of the query fuzzer","* [The existing {{query_fuzzer_gen}} task|https://github.com/mongodb/mongo/blob/1433d75e416e1078bb490ecda04c9e12b1a0ab3d/etc/evergreen.yml#L5203-L5215] should be updated to have {{jstestfuzz_vars: \-\-diffTestingMode standalone}} specified. * A new {{query_fuzzer_sharded_gen}} task should be added that runs {{npm run query\-fuzzer \-\- \-\-diffTestingMode sharded}}. Note that because we'll tag it with ""query_fuzzer"" there shouldn't be a need to update the task lists for any build variants explicitly. {code:yaml} ## jstestfuzz sharded cluster fuzzer for checking find and aggregate equivalence ## - <<: *jstestfuzz_template name: query_fuzzer_sharded_gen tags: [""query_fuzzer""] commands: - func: ""generate fuzzer tasks"" vars: <<: *jstestfuzz_config_vars num_files: 5 num_tasks: 10 jstestfuzz_vars: --diffTestingMode sharded npm_command: query-fuzzer resmoke_args: --suites=generational_fuzzer name: query_fuzzer_sharded {code}",1 +"SERVER-42144","07/10/2019 18:05:40","Remove use of evergreen /rest/v1 API in favor of evergreen.py in bypass_compile_and_fetch_binaries.py","The bypass_compile_and_fetch_binaries and burn_in_tags_bypass_compile_and_fetch_binaries scripts call out to the evergreen API to get build ids for a given revision. We recently built a python evergreen client and added it to the burn_in_tests buildscript. We should be consistent and use evergreen.py to access the evergreen api everywhere. 
Currently, the scripts call this endpoint directly: https://evergreen.mongodb.com/rest/v1/projects//revisions/ Instead, they can use evergreen.py to call this v2 endpoint: https://evergreen.mongodb.com/rest/v2/versions/ ------------------------------------------------------ As a mongo engineer I want bypass_compile_and_fetch_binaries and burn_in_tags_bypass_compile_and_fetch_binaries to use a common evergreen client library so that I don't have to maintain code to connect to evergreen. AC * The bypass_compile_and_fetch_binaries and burn_in_tags_bypass_compile_and_fetch_binaries scripts do not directly call the Evergreen API ------------------------------------------------------ Related tickets: * https://jira.mongodb.org/browse/SERVER-40893 * https://jira.mongodb.org/browse/SERVER-41940",3 +"SERVER-42156","07/11/2019 09:46:50","Install of mongodb-org-tools 3.2.22 not possible on RHEL 7","It's not possible to install mongodb-org-tools.x86_64 0:3.2.22-1.el7 on RHEL.   {code:java}============================================================================================== Package Arch Version Repository Size==============================================================================================Updating: mongodb-org x86_64 3.2.22-1.el7 mongodb-org-3.2 5.8 k mongodb-org-mongos x86_64 3.2.22-1.el7 mongodb-org-3.2 5.7 M mongodb-org-server x86_64 3.2.22-1.el7 mongodb-org-3.2 13 M mongodb-org-shell x86_64 3.2.22-1.el7 mongodb-org-3.2 6.8 M mongodb-org-tools x86_64 3.2.22-1.el7 mongodb-org-3.2 35 MTransaction Summary==============================================================================================Upgrade 5 PackagesTotal size: 60 MTotal download size: 35 MIs this ok [y/d/N]: yDownloading packages:No Presto metadata available for mongodb-org-3.2mongodb-org-tools-3.2.22-1.el7 FAILED --:--:-- ETA https://repo.mongodb.org/yum/redhat/7Server/mongodb-org/3.2/x86_64/RPMS/mongodb-org-tools-3.2.22-1.el7.x86_64.rpm: [Errno 14] curl#63 - ""Callback aborted""Trying other mirror.Error downloading packages: mongodb-org-tools-3.2.22-1.el7.x86_64: [Errno 256] No more mirrors to try. {code} {code:java}[mongodb-org-3.2] name=MongoDB Repository baseurl=https://repo.mongodb.org/yum/redhat/$releasever/mongodb-org/3.2/x86_64/ gpgcheck=1 enabled=1 gpgkey=https://www.mongodb.org/static/pgp/server-3.2.asc {code} A yum clean all, rm -rf /var/cache/yum/* does not fix the issue. The exact same problem at SERVER-39005 and SERVER-26564.   It does not matter if I use the 7server or 7 repository.",3 +"SERVER-42195","07/12/2019 05:16:45","Stepdown suites fail with Python exception when run with --repeat >1","[We're attempting to use the same {{FlagBasedStepdownLifecycle}} instance across executions of the test suite|https://github.com/mongodb/mongo/blob/9ae337bd27f7a513df548256400596a6eba4d7a3/buildscripts/resmokelib/testing/hooks/stepdown.py#L70-L84]. This would mean {{FlagBasedStepdownLifecycle.__should_stop == True}} the moment the second execution of the test suite begins. We should instead construct a new {{FlagBasedStepdownLifecycle}} instance when constructing a new {{_StepdownThread}} instance. {noformat} [executor] 2019-07-12T00:12:46.427-0400 Summary: All 5 test(s) passed in 10.96 seconds. [ContinuousStepdown:job0] Starting the stepdown thread. [ContinuousStepdown:job0] The stepdown thread is not running. [executor:js_test:job0] 2019-07-12T00:12:46.429-0400 JSTest jstests/core/indexc.js marked as a failure by a hook's before_test. 
Traceback (most recent call last): File ""/Users/maxh/debugging/mongo/buildscripts/resmokelib/testing/job.py"", line 242, in _run_hooks_before_tests self._run_hook(hook, hook.before_test, test) File ""/Users/maxh/debugging/mongo/buildscripts/resmokelib/testing/job.py"", line 228, in _run_hook hook_function(test, self.report) File ""/Users/maxh/debugging/mongo/buildscripts/resmokelib/testing/hooks/stepdown.py"", line 97, in before_test self._check_thread() File ""/Users/maxh/debugging/mongo/buildscripts/resmokelib/testing/hooks/stepdown.py"", line 113, in _check_thread raise errors.ServerFailure(msg) buildscripts.resmokelib.errors.ServerFailure: The stepdown thread is not running. {noformat}",1 +"SERVER-42377","07/12/2019 18:51:24","burn_in_tests looks at incorrect commit to compare against","I pushed a change that I expected to have a one-off {{burn_in_tests}} failure, but it failed in a few commits (see BF-13954) because the same tests were still being run. Here is the [task|https://evergreen.mongodb.com/task/mongodb_mongo_v4.2_enterprise_rhel_62_64_bit_required_inmem_display_burn_in_tests_9723ffc820396ca6ccf542cd5d1c3518b5d2db12_19_07_11_20_36_15] for the commit. It looks like {{burn_in_tests}} is looking at the wrong commit to compare against to find changed tests.",2 +"SERVER-42227","07/13/2019 14:04:32","Cap how many tasks burn_in_tests will generate","If burn_in_tests generates too many tests, it can push evergreen to the limit and cause bad slowdowns. We should put a cap on how many tasks burn_in_tests will generate and fail it we want to generate more tasks than that. ---- As a server engineer, I want burn_in_test to limit how many tasks are generated So that it doesn't cause evergreen to slow down. ---- AC: * burn_in_tests should not generate more than 1000 tasks.",1 +"SERVER-42228","07/13/2019 22:15:28","LoggerRuntimeConfigError exceptions can lead to background dbhash thread running until Evergreen task times out","[If {{TestReport.startTest()}} raises an exception, then {{Job._run_hooks_after_tests()}} won't be called|https://github.com/mongodb/mongo/blob/e6644474d876eb99579101e81d38c363feef07cd/buildscripts/resmokelib/testing/job.py#L198-L222]. For test suites which use the {{CheckReplDBHashInBackground}} hook, this leads to the background thread continuing to spawn mongo shell processes and running the {{run_check_repl_dbhash_background.js}} hook. If logkeeper is overwhelmed, then an {{errors.LoggerRuntimeConfigError}} exception can also occur when attempting to tear the fixture down. This leads the Evergreen task to time out instead of failing with code 75 because the background dbhash check will continue to run while the fixture is still running and resmoke.py's flush thread will therefore never exit. We don't want to always run the {{after_test()}} method for a hook though. For example, if running a test crashes the server, then we shouldn't attempt to run any data consistency checks because they'll just fail to connect to the downed server.",2 +"SERVER-42240","07/15/2019 20:21:56","burn_in_tags_gen tasks should use the binaries from the patch build","The tasks created from burn_in_tags gen are using binaries from the base commit, not the patch commit. See comments for more detail.",1 +"SERVER-42309","07/20/2019 15:57:48","test_generator should clean up files it creates","The test_generator tests create several files as they run, but do not clean them up. These files cause problems when running lint locally and should just be cleaned up after the test is run. 
---- As a server engineer, I want test_generator to clean up the files it creates, So they don't cause problems when I'm trying to do other things. ---- AC: * After running `buildscripts_test`, no extra files are left around.",1 +"SERVER-42356","07/23/2019 19:31:55","teardown(finished=True) isn't ever called for the NoOpFixture","The [flush thread will block forever for the next event |https://github.com/mongodb/mongo/blob/de38a35403c64e2dfe7e9ffc38fb95f9674773b3/buildscripts/resmokelib/logging/flush.py#L106] if there isn't one lined up. We should make it not wait. One way could be to use the non-blocking version of [scheduler.run()|https://docs.python.org/3/library/sched.html#sched.scheduler.run]",1 +"SERVER-42440","07/25/2019 22:42:38","burn_in_test should run tasks on the distro they are normally run","When burn_in_tests runs the tests it discovered, it should run those tests on the distro they are normally run on. Otherwise, tests could fail due to resource constraints that are not normally there. ---- As a Server Engineer, I want burn_in_tests to run on their normal distro, so that I don't spend time investigating failure due to resource constraints. ---- AC: * Tasks that normally run on non-default distros run on the same distros during burn_in_tests.",2 +"SERVER-42452","07/26/2019 17:30:50","failNonIntentLocksIfWaitNeeded failpoint interrupts lock requests in UninterruptibleLockGuard","[Interrupting the lock request leads to a {{LockTimeout}} exception|https://github.com/mongodb/mongo/blob/25d5f6a0b01f261e633587013e4ab8116ea2930a/src/mongo/db/concurrency/lock_state.cpp#L905-L912] which is known not to be handled by the C++ code due to the presence of the {{UninterruptibleLockGuard}} and causes the server to abort. This issue was found during the rollback fuzzer where we suspect background thread (e.g. the TTL monitor) was holding an intent lock on the collection and prevented [the collection lock acquisition in {{MultiIndexBlock::cleanUpAfterBuild()}} for a background index build from being acquired immediately|https://github.com/mongodb/mongo/blob/25d5f6a0b01f261e633587013e4ab8116ea2930a/src/mongo/db/catalog/multi_index_block.cpp#L98].",1 +"SERVER-42482","07/29/2019 21:48:51","burn_in_tests needs to take minimum test runs into account for timeouts","burn_in_tests failed due to timeouts in this [patch|https://evergreen.mongodb.com/task/mongodb_mongo_master_enterprise_rhel_62_64_bit_display_burn_in_tests_patch_dae371c478e1a828ac911096d85f94be8e936ef9_5d3f0da056234359d94af31c_19_07_29_15_15_53/0#/%23%257B%2522compare%2522%253A%255B%257B%2522hash%2522%253A%2522dae371c478e1a828ac911096d85f94be8e936ef9%2522%257D%255D%257D#%257B%2522compare%2522%253A%255B%257B%2522hash%2522%253A%2522dae371c478e1a828ac911096d85f94be8e936ef9%2522%257D%255D%257D]. This is because when we calculated the timeout value to use, we did not take into account that a minimum number of executions could be specified (which is 2 by default). The timeout is set based on how much over the 10 minute repeat time we expect the test to be, but in this case, a whole other execution of the test will occur cause it to hit the timeout. ---- As a server engineer, I want burn_in_tests not to timeout on tests that have a runtime greater than 10 minutes so that burn_in_tests can properly validate those tests. 
---- AC: * burn_in_tests is able to run successfully on tests with runtimes > 10 minutes.",1 +"SERVER-42571","08/01/2019 04:57:23","Collect Windows event logs on remote machine during powercycle","We've had many failures since upgrading to Windows Server 2016 where the mongod service fails to start or the process abruptly terminates after having started. The Windows event logs revealed that after {{notmyfault.exe}} is used to crash the virtual machine, the log and data files, or in some cases the mongod.exe executable itself, cannot be opened. We should collect the recent messages from the Application, Security, and System event logs on Windows using [the {{wevtutil}} utility|https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/wevtutil] in order to have more diagnostics around this issue and for other mysterious ones that will surely come up in the future.",2 +"SERVER-42575","08/01/2019 15:24:47","compiling and running unittests should be a single task","As part of SERVER-33963, the unittest tests were split up into 2 tasks, one to compile the unittests and one to run the unittests. They were also put in a task group with max hosts of 1 since in order to run the unittests, you need the artifacts generated by compiling them. However, task groups do not have a hard guarantee that later tasks will run on the same host as earlier task. For tasks that can share a setup, task groups work well for saving some time by sharing setup execution, but they provide inconsistent results when sharing artifacts between tasks. We should switch the compile and run tasks back to be a single task, so that the unittest task can be more reliable. ---- As a Server Engineer, I want compile unittest and run unittest to be in a single task, So that it will not fail due to the tasks being run on different hosts. ---- AC: * compile unittests and run unittests run as a single task. ",2 +"SERVER-42607","08/02/2019 13:51:54","add quoting to resmoke's invocation for local usage line","Currently, there are situations where resmoke's local invocation line cannot be used verbatim; one such example is: [https://evergreen.mongodb.com/task_log_raw/mongodb_mongo_master_enterprise_rhel_62_64_bit_alt_create_indexes_replica_sets_multi_stmt_txn_stepdown_jscore_passthrough_0_enterprise_rhel_62_64_bit_alt_create_indexes_cc1a75e4a6d8de8478e7253da7bd6376052d57a6_19_07_15_15_35_13/0?type=T#L392]   Could resmoke quote the line so that it would work with standard Bash?  I'm not sure how hard it would be to figure out where quotes would need to go.  (One simple way to do this would be to simply add double quotes around every parameter.)",2 +"SERVER-42615","08/02/2019 17:06:33","Run chkdsk command on Windows after each powercycle loop","We've seen a variety of errors during powercycle testing on Windows after upgrading to Windows Server 2016, none of which are indicative of a MongoDB issue: * StartService fails with ""The service did not respond to the start or control request in a timely fashion"" * StartService fails with ""The device is not ready"" * StartService fails with ""Access is denied"" * StartService fails with ""Error performing inpage operation"" * The mongod-powertest service terminates unexpectedly due to not being able to access some file (unnamed by the Application event logs) We should run [the {{chkdsk}} command|https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/chkdsk] in read-only mode (i.e. 
without any extra parameters) to see if we can collect diagnostics indicating the NTFS volume is corrupt after using {{notmyfault.exe}} to crash the machine.",2 +"SERVER-42622","08/04/2019 06:47:55","resmoke.py doesn't attempt to tear the fixture down if setting it up raises an exception","Discovered this issue while investigating SERVER-42356. It is yet another way for {{close()}} to never be called on the {{FixtureLogger}}. [{{Job.teardown_fixture()}} won't be called if {{Job.setup_fixture()}} raises an exception|https://github.com/mongodb/mongo/blob/ba434d76511a28336d23c0bb2985f5cf8164670a/buildscripts/resmokelib/testing/job.py#L103-L105].",1 +"SERVER-42623","08/04/2019 07:08:08","sched module in Python 3 causes close() event to mistakenly be canceled, leading to resmoke.py hang","Discovered this issue while investigating SERVER-42356. It is yet another way for {{close()}} to never be called on a {{FixtureLogger}} or {{TestLogger}}. The changes from https://hg.python.org/cpython/rev/d8802b055474 made it so [{{sched.Event}} instances returned by {{sched.scheduler.enter()}} and {{sched.scheduler.enterabs()}} are treated as equal if they have the same (time, priority)|https://github.com/python/cpython/blob/v3.7.0/Lib/sched.py#L36]. [It is therefore possible to remove the wrong event from the list when {{sched.scheduler.cancel()}} is called|https://github.com/python/cpython/blob/v3.7.0/Lib/sched.py#L96].",2 +"SERVER-42664","08/07/2019 15:00:09","Add function to mongo shell for converting BSONObj to Array","The purpose of this function is to make it possible to convert the object {code:javascript} {"""": 1, """": 2} {code} for sort keys returned by compound sort specifications into the array {code:javascript} [1, 2] {code} which can be meaningfully interacted with via JavaScript. The duplicate empty string field names in the object form are otherwise hidden by the first one. It should be possible to use this function on Objects which have non-duplicate and non-empty field names as well. In those cases, it can be thought of as a function similar to [{{Object.values()}}|https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_objects/Object/values] but one that actually works for {{BSONInfo}} instances. We can model this function off [the {{bsonWoCompare()}} and {{bsonBinaryEqual()}} functions|https://github.com/mongodb/mongo/blob/b8602c086ff469967bedc82b14d63d4a236d092c/src/mongo/scripting/mozjs/bson.cpp#L271-L299] where the mongo shell will define a {{bsonObjToArray()}} global function that uses [{{ObjectWrapper::toBSON()}}|https://github.com/mongodb/mongo/blob/b8602c086ff469967bedc82b14d63d4a236d092c/src/mongo/scripting/mozjs/objectwrapper.cpp#L529] to convert/extract the argument as a {{BSONObj}}. It should then use [{{ValueReader::fromBSONArray()}}|https://github.com/mongodb/mongo/blob/b8602c086ff469967bedc82b14d63d4a236d092c/src/mongo/scripting/mozjs/valuereader.cpp#L230-L246] to convert the {{BSONObj}} into a JavaScript Array.",2 +"SERVER-42671","08/07/2019 21:37:57","_gen task failure due to missing tests should be marked test failures","When generating tasks for sub-suite execution, a common problem is that a test that is referenced in a resmoke configuration has been moved or deleted. This will cause the generate script to fail, since the referenced file cannot be found. However, the _gen task fails as a system failure in evergreen and those errors often get ignored. 
We should switch the failure to show up as a test failure to let the developer know that action is required on their part to fix the issue. ---- As a Server engineer, I want _gen task failures to show up as test failures So that I know to investigate them ---- AC: * _gen task failures caused by a missing test being referenced show up as test failures in evergreen.",1 +"SERVER-42704","08/08/2019 16:00:53","Add placeholder task for evergreen commit queue","Add a placeholder task that currently no-ops for the commit queue. This task should not be included in one of the required build variants. ---- As a Server engineer I want to be able to run the commit queue without any real tasks So that I can use the commit queue while minimizing the chance of colliding with other merges. ---- AC: * A single no-op task is available to be part of the commit queue. * The task is not part of the required builders.",1 +"SERVER-42913","08/20/2019 15:04:30","Use pre_error_fails_task in etc/evergreen.yml","In Evergreen, 'pre' tasks are tasks that run at the start of all task execution. However, failures in these tasks are silently ignored by default. This can lead to tasks being in different states when they run if any of the 'pre' tasks fail. Evergreen has an option, [pre_error_fails_task|https://github.com/evergreen-ci/evergreen/wiki/Project-Files#pre-post-and-timeout], that will cause failures in the 'pre' tasks to fail the task execution. Enabling this will allow us to avoid running tasks in a different state than they normally would be. ---- As a server engineer, I want 'pre_error_fails_task' to be enabled in the evergreen configuration, So that I can know my tasks are running in a consistent state. ---- AC: * pre_error_fails_task is enabled in etc/evergreen.yml.",2 +"SERVER-43022","08/23/2019 21:00:18","Allow compile to be run independently on rhel 62","For the commit queue, we want to be able to run just ""compile"" with no other tasks. However, right now, several other tasks are pulled in with compile, including burn_in_tests. We should remove this link in order to ensure the commit queue is stable.",1 +"SERVER-43055","08/27/2019 18:26:53","Prevent an exception from being thrown when gdb prints a BSONObj with datetimes beyond datetime.MAXYEAR","The BSON Python package can throw exceptions; the GDB pretty printer allows these exceptions to escape up into gdb/lldb, which can cause them to crash (I'm not sure why). Here is an example where I managed to get gdb to print a python stack trace and not crash (this is difficult to achieve): {noformat} (gdb) p oplogBSON $13 = owned BSONObj 340 bytes @ 0x555556a614a8Traceback (most recent call last): File ""buildscripts/gdb/mongo_printers.py"", line 130, in children bsondoc = buf.decode(codec_options=options) File ""/opt/mongodbtoolchain/revisions/e84eb3fd219668197589e62dba14b9914712642d/stow/python3-v3.FFC/lib/python3.7/site-packages/bson/__init__.py"", line 1164, in decode return _bson_to_dict(self, codec_options) bson.errors.InvalidBSON: year 292278994 is out of range {noformat} ",2 +"SERVER-43067","08/28/2019 19:44:20","Add end to end tests for generating sub-tasks","Add end to end tests for buildscripts/evergreen_generate_resmoke_suites.py. ---- As a server engineer, I want end to end tests for evergreen_generate_resmoke_suites so that I can make changes to the scripts without worrying about breaking things. 
---- AC * At least 1 test executes the main body of evergreen_genreate_resmoke_suites.",2 +"SERVER-43143","09/03/2019 18:24:21","Add timeouts to evergreen lint tasks.","The lint task in evergreen normally takes around 15 - 20 minutes to complete. Over the last 6 months, the highest runtimes we have seen have been around 35 minutes. About a week ago, however, we saw the lint task get hung and didn't exit until the task timed out. The lint task just uses the default timeouts, so it took over 3 hours before the task actually ended. Since the lint task is included as part of the commit queue, hangs like this are problematic. They would block the entire queue for a number of hours. To avoid this issue, we should add a more aggressive timeout to the lint task. Something around 40 minutes should be acceptable. ---- As a server engineer, I want to lint task to timeout if it runs for too long, So that I am not waiting on a hung task. ---- AC * Lint tasks in evergreen time out if running for more than 40 minutes.",1 +"SERVER-43150","09/04/2019 16:09:23","Reduce duration of jstestfuzz_interrupt_replication_flow_control_gen and jstestfuzz_replication_continuous_stepdown_flow_control_gen","This ticket is about the jstestfuzz_interrupt_replication_flow_control_gen and jstestfuzz_replication_continuous_stepdown_flow_control_gen tasks. I recently ran a patch (https://evergreen.mongodb.com/version/5d6456d461837d02851d7ac8) and noticed that these two tasks took ~1hr (since they take 30-40 mins each and compile takes ~20 mins). The Targeted Test Selection project hopes to bring down patch build times, and it will run all fuzzer tests as part of it, so it would be great if these tasks could each run in < 20 mins (not including compile time). [~robert.guo] recommends we reduce the number of generated files for these tasks like we do for some existing fuzzer tasks (https://github.com/mongodb/mongo/blob/ff685d2d6e370594261eccbef8e60b2f7cc61e28/etc/evergreen.yml#L5508-L5520). ---- As a server engineer, I should be able to run both jstestfuzz_interrupt_replication_flow_control_gen and jstestfuzz_replication_continuous_stepdown_flow_control_gen tasks in under 20 mins, so that the patch build time of all tasks run as part of Targeted Test Selection is less than an hour. ---- AC * Running jstestfuzz_interrupt_replication_flow_control_gen and jstestfuzz_replication_continuous_stepdown_flow_control_gen in a patch build takes under 20 mins (not including compile time).",1 +"SERVER-43153","09/04/2019 19:08:30","Expose pids of spawned processes in the shell","Expose the existing {{getRunningMongoChildProcessIds}} function as a shell-native so we can run the hang-analyzer on all sub-processes in error scenarios.",3 +"SERVER-43186","09/05/2019 22:00:10","Limit the number of tests added to a generated suite","The ""CleanEveryN"" test hook gets run every ""N"" tests. Due to the way tests are run, this could be run against a different test every execution. This means that when we use test runtime to calculate timeouts, we might not properly account for the ""CleanEveryN"" runtime and set a timeout too short. This is most problematic on suites made up of lots of short running tests. If we had a maximum number of tests per suite we used when dividing the tests up, this would no longer be a problem. ---- As a server engineer, I want there to be a maximum number of tests per generated sub-suite, So that the ""CleanEveryN"" hook does not cause timeouts. 
---- AC: * All suites that run the ""CleanEveryN"" hook set a maximum number of tests per suite.",2 +"SERVER-43253","09/10/2019 20:49:06","Resmoke passes pids of peer mongo* process in TestData","A ""peer"" {{mongo*}} process is one started by resmoke rather than by the shell itself (via {{_startMongoProgram}}) for a particular test. Modify resmoke to pass peer {{mongo*}} process PIDs into spawned {{mongo}} (shell) processes via the existing {{TestData}} mechanism.",3 +"SERVER-43254","09/10/2019 20:50:30","Hang Analyzer shell integration uses child and peer mongo processes","This can be done in parallel with any other PM-1546 work. Create a new {{whatthefailure.js}} javascript file (class) in the shell. For now this will just have a single static method called {{WTF.areOtherMongosDoing(opts)}} (name subject to change). This will shell out to the existing {{hang_analyzer.py}} script via the {{runProgram}} shell built-in. The {{opts}} parameter has the following fields: # {{pids}} optional array of pids to pass into hang-analyzer. If not specified will use the pids of child and/or peer mongo processes obtained via TestData or shell built-ins (both added in other tickets) # {{args}} optional array of strings - additional set of args to pass to hang-analyzer. If not specified will use reasonable defaults (probably just empty). This is a ""private"" parameter (not documented or required to be backward-compatible) because most users should never need to use/see/set it.",3 +"SERVER-43255","09/10/2019 20:52:23","Automatically call whatthefailure from assert.soon and friends","Add logic to {{assert.soon}} to automatically call hang-analysis prior to throwing. Add an additional optional parameter to {{assert.soon}} which is additional params to pass to hang-analysis js function. For now do this *in addition to* throwing such that hang-analysis is just a fancy new bonus-feature. Need to do this while some users of assert.soon are using assert.soon as a retry mechanism. A separate ticket will fix all callers of assert.soon",2 +"SERVER-43256","09/10/2019 20:53:27","Fix incorrect uses of assert.soon and make hang-analysis call exit","A number of places use assert.soon as a retry mechanism (e.g. ssl_test.js), and at least one test (index_delete.js) runs assert.soon in a try/finally block to capture better error messages. ↑ looks like this: {code} try { assert.soon(function() { return checkProgram(serverPID).alive && (0 === _runMongoProgram.apply(null, clientArgv)); }, ""connect failed"", connectTimeoutMillis); } catch (ex) { return false; } finally { _stopMongoProgram(this.port); } {code} In the cases where assert.soon fails due to timeout, we want to instead run hang-analysis and exit rather than returning control to the caller. (Comment from Robert: I think assert.soon almost always fails due to a timeout. What's the reason for exiting in this case?) Uses of assert.soon are pervasive in {{jstests}} so do a best-effort fix here. Could modify assert.soon in a patch-build to throw immediately and any test that *doesn't* fail is likely using it incorrectly. If there are more than (say) 3-4 cases of using assert.soon as a retry mechanism, create a helper {{assert.retry}} method (name/location tbd) that has a similar signature to assert.soon but doesn't call hang-analyzer or barf if the callback is never truthy and instead returns a {{[success, error]}} array where {{success}} is the last result of the callback and error is any errors thrown by the callback. 
(Exact signature tbd depending on how it's used by callers of extant assert.soon.) Finally, once this is done, modify existing callers of the hang-analyzer (probably just assert.soon) to call exit after running hang-analysis.",3 +"SERVER-43288","09/11/2019 20:52:25","Update fallback values for generated tasks","The cached historic test results have been turned off in evergreen for the past few weeks. This has led to generated tasks not being able to use runtime to split up the tasks. All tasks have a fallback value to use to split tasks if there isn't historic data, but some of those values are set to 1 and are causing timeouts. For [example|https://evergreen.mongodb.com/task/mongodb_mongo_master_ubuntu1804_debug_asan_display_logical_session_cache_sharding_100ms_refresh_jscore_passthrough_0031fa41177db46789e411895a5bcd33b2847ed5_19_09_04_12_24_50], we should do a pass through all the suites and make sure the fallback values are appropriate. ---- As a server engineer, I want the generated task fallback value to be set appropriately so that tasks are still split even if we can't get historic test results. ---- AC: * All ""generate resmoke tasks"" in etc/evergreen.yml have a fallback_num_sub_suites value set that is not 1.",2 +"SERVER-43406","09/20/2019 22:02:04","Reduce pip logging in tasks","In most evergreen tasks, we set up a python virtualenv for python scripts to run in. As part of that, we do a `pip install` for the requirements. This writes a lot of information to the logs that is rarely needed. We could pipe this output to a file and upload it; that would clean up the logs while still providing traceability if the installed python packages ever need to be investigated. ---- As a Server engineer, I want pip install of requirements not to write to the evergreen logs, So that it is easier to find what I'm looking for in the logs. ---- AC: * pip install does not write all the installed packages to the log. * The packages and versions installed by pip are still available if needed.",2 +"SERVER-43608","09/24/2019 19:28:34","End to end tests for burn_in_tests","There was a bug introduced to resmoke that caused burn_in_tests to start failing. We should add some end-to-end tests for burn_in_tests so that we can catch these types of errors in buildscripts_test. ---- As a server engineer, I want to catch errors in burn_in_tests before I commit, So that I can trust burn_in_tests is running correctly. ---- AC: * At least 1 end-to-end test for burn_in_tests exists and is run as part of ""buildscripts_test"".",3 +"SERVER-43732","09/30/2019 20:56:00","burn_in_tests did not detect changes in core","I recently caused a failure due to a new test in the core suite not working in the sharded_collections_jscore_passthrough suite, even though I ran burn_in_tests in a patch build. I felt like burn_in_tests should have caught this.",2 +"SERVER-43866","10/07/2019 16:13:55","Remove parallel insert task from M60 like sys-perf variant","This should be a one-line removal of the above task.",0 +"SERVER-43900","10/08/2019 22:09:01","Set max_hosts to 1 for stitch_support_lib_build_and_test and embedded_sdk_build_and_test task groups","In BF-11716, the second task in the stitch_support_lib_build_and_test task group (stitch_support_run_tests) is failing because it gets run on a different build variant (and at the same time as, rather than after) its dependency task (stitch_support_install_tests).
We should set max_hosts to 1 on the stitch_support_lib_build_and_test task group so these tasks get run in consecutive order on the same host. Similarly, in BF-14342, embedded_sdk_install_dev is getting run before its dependency task (embedded_sdk_build_cdriver). These tasks are run as part of the embedded_sdk_build_and_test task group. User story: As a server engineer, When I run the stitch_support_lib_build_and_test and embedded_sdk_build_and_test task groups, I should know that each task group's tasks will get run in consecutive order on the same host, so that I do not have failures in my build. AC: * Tasks within the stitch_support_lib_build_and_test task and the embedded_sdk_build_and_test task groups should run in consecutive order. BF: https://jira.mongodb.org/browse/BF-11716 ",2 +"SERVER-43956","10/11/2019 14:11:09","Fix burn_in_tests file path on Windows","The file paths of generated tasks were recently changed to always use the ""/"" separator, as generate.task calls always run in bash, even on Windows. This was achieved by auditing and replacing usages of os.path.join. burn_in_tests.py also uses {{os.path.normpath}}, which needs to be supplemented to output the unix path as well.",1 +"SERVER-44009","10/15/2019 10:08:55","Upload pip freeze output for sys-perf and microbenchmarks","In the DSI-supported tasks, _run-dsi_ sets up a python virtualenv for python scripts to run in. As part of that, we do a `pip install` for the requirements. We should pipe this output to a file and upload it, to provide traceability about which python packages have been installed if anything needs to be investigated. ---- As a Server engineer, I want pip to list / persist the requirements, So that it is easier to find what I'm looking for in the logs. ---- AC: * The packages and versions installed by pip are still available if needed.",1 +"SERVER-44017","10/15/2019 15:41:08","Hang Analyzer Unzips Debug Symbols","Modify hang_analyzer.py to automatically unzip debug symbols if necessary and if not already unzipped into cwd. Take the logic from [here|https://github.com/mongodb/mongo/blob/master/etc/evergreen.yml#L818-L840] and port it to python. Once this is ported to python, consider calling the python code instead of the shell on these lines. This is necessary in cases where hang_analyzer.py is called from the shell as a part of failures from assert.soon and friends in SERVER-43254 etc. ",2 +"SERVER-44070","10/17/2019 18:55:42","Platform Support: Add Community & Enterprise Ubuntu 20.04 x64","Platform Support: Add Community & Enterprise Ubuntu 20.04 x64",5 +"SERVER-44072","10/17/2019 18:56:53","Platform Support: Add Enterprise RHEL 8 PPC","Platform Support: Add Enterprise RHEL 8 PPC",5 +"SERVER-44140","10/21/2019 20:48:29","Use signal processing without DSI","As a DAG engineer, I would like signal processing to be run outside of DSI. AC: * performance and sys-perf projects in mongo repo use signal processing directly for detect-changes and detect-outliers",2 +"SERVER-44144","10/22/2019 14:52:17","Allow commit queue patches to publish to scons cache","The shared scons cache is only written to on non-patch builds. Since commit queue builds have a high likelihood of being merged into master, it would be valuable to have them write to the cache as well. In particular, the next item in the commit queue could reuse a lot of the artifacts. ---- As a server engineer, I want commit queue builds to write to the shared scons cache, So that future commit queue builds can reuse the artifacts.
---- AC: * Builds done as part of the commit queue are able to write to the shared scons cache. ---- The logic for whether the commit queue is read-only or read/write can be found [here|https://github.com/mongodb/mongo/blob/563dc7451690efa475db5feda913098e777471da/buildscripts/generate_compile_expansions_shared_cache.py#L102-L109]. Additionally, an expansion was added to tell if a given build is a commit queue build [here|https://jira.mongodb.org/browse/EVG-5877].",2 +"SERVER-44254","10/25/2019 21:04:08","Don't run package tests on 'Enterprise RHEL 7.0 (libunwind)' variant","We don't create packages for this build variant and therefore don't need to run package tests.",2 +"SERVER-44294","10/29/2019 14:08:17","Cap runtime of generated tasks","When an engineer tries to repro a test failure, they sometimes add a large {{resmoke_repeat_suites}} number to evergreen.yml. This causes generated tasks to compute a large Evergreen timeout, potentially leaving a host running for a long time. We should cap the runtime of generated tasks and either error out and inform the user of the max repeat number they can use, or internally reduce the repeat count to a smaller number. Almost always, if an issue fails to repro after 48 hours, it's unlikely for the repro to happen at all. This can indicate a bug with the way the repro is set up, or something wrong with the machine the original failure occurred on. AC: * Fail tasks that we expect to run over the specified time limit. * Provide a message to the user explaining why that task was failed and what they can do if they want to work around it.",2 +"SERVER-44312","10/30/2019 14:35:05","Specify evergreen auth in performance tests for signal processing","As a performance engineer, I want signal processing commands to have proper evergreen auth, so that they can access data from the evergreen api. ---- AC: * detect_outliers can access evergreen API data. * detect_changes can access evergreen API data.",1 +"SERVER-44338","10/31/2019 17:32:19","Validate commit message as part of commit queue process","As part of the migration to commit queue, pre-commit git hooks are no longer run. One of the hooks that was run validated that the commit message conformed to certain rules. With EVG-6445, we should be able to create a task that runs as part of the commit queue to validate the commit message. ---- As a server engineer, I want a commit queue check to validate the commit message, So that I don't accidentally commit with a bad message. ---- AC: * A commit queue task is run that fails if the commit has an invalid message.",2 +"SERVER-44400","11/04/2019 20:14:28","evergreen_task_tags uses the wrong option for tasks","It uses tasks_for_tag_filter, but should use remove_tasks_for_tag_filter.",1 +"SERVER-44421","11/05/2019 14:53:35","Populate config values in burn_in_multiversion_gen","Before generating burn in multiversion tasks, we assert that the number of tests defined in {{etc/evergreen.yml}} that have the [MULTIVERSION_TAG|https://github.com/mongodb/mongo/blob/1bbcedbd0c744c6ad880cbde2f46eb711c5acf20/buildscripts/burn_in_tests.py#L76] equals the number of yaml suite files (living in {{buildscripts/resmokeconfig/suites}}) with the [BURN_IN_CONFIG_KEY|https://github.com/mongodb/mongo/blob/1bbcedbd0c744c6ad880cbde2f46eb711c5acf20/buildscripts/evergreen_gen_multiversion_tests.py#L43].
[get_named_suites_with_root_level_and_key|https://github.com/mongodb/mongo/blob/1bbcedbd0c744c6ad880cbde2f46eb711c5acf20/buildscripts/resmokelib/suitesconfig.py#L22] is a helper function that requires the config values to be populated before being called. We should make sure we have called {{buildscripts.resmokelib.parser.set_options()}} before we ever make any calls to this helper.",1 +"SERVER-44537","11/11/2019 02:53:28","Update multiversion platform for windows in 4.4","In SERVER-33049, we've renamed the windows platform. After cutting MongoDB 4.4, we should update the multiversion platform so the tests can still find it.",1 +"SERVER-44604","11/13/2019 16:18:26","Move benchmarks off of Enterprise RHEL 6.2","The benchmarks are taking up most of the available CBI machines and they don't provide much value in patch builds, we should move them to a different build variant.",1 +"SERVER-44632","11/14/2019 16:43:12","Platform Support: Remove Community zSeries from 4.2","Following MongoDB 4.4 GA, we will be removing community zseries support from MongoDB 4.2. * SSL RHEL 6.7 s390x * SSL RHEL 7.2 s390x * SSL SLES 12 s390x * SSL SLES 15 s390x * SSL Ubuntu 18.04 s390x",2 +"SERVER-44641","11/14/2019 20:55:44","Platform Support: Remove Enterprise RHEL 7 zSeries and SLES 12 zSeries from 3.6","Remove the following build variants from MongoDB 3.6: - Enterprise RHEL 7.2 s390x - Enterprise SLES 12 s390x  ",2 +"SERVER-44651","11/15/2019 14:36:25","Update signal processing version","Update performance and sys-perf to use latest version of signal processing 1.0.14",1 +"SERVER-44727","11/19/2019 15:16:45","detect-changes should not be called via run-dsi","The detect-changes script should not be called via run-dsi. It should be called via its own setup script.",1 +"SERVER-44790","11/22/2019 17:50:45","Should not run hang analyzer on shouldFail test in mixed_mode_repl_nossl","mixed_mode_repl_nossl expects the replsettest to fail. It does, due to an assert.soon, which runs the hang analyzer. It seems impractical to plumb the ""shouldRunHangAnalyzer"" all the way down to the particular assert.soon which fails. It might make more sense to put the default in TestData. ",0 +"SERVER-44831","11/25/2019 21:29:27","Create a fixture sigkill test case","This test case is intended to be executed before archiving begins in resmoke. It should be a subclass of FixtureTestCase and should send a SIGKILL terminating the fixture's processes.",3 +"SERVER-44832","11/25/2019 21:35:51","Modify HookTestArchival to reset fixtures","When archiving files, HookTestArchival should call a sig kill test case to terminate the fixture before archiving. After archiving, it should execute the FixtureSetupTestCase to restart the fixture. Failure on either of these steps should raise a StopExecution exception.",2 +"SERVER-44874","11/27/2019 18:59:20","Windows Mongo shell not being included by hang analyzer","For a particular BF I'm investigating, which has failed on all branches 3.6, 4.0, 4.2, and master, the hang analyzer finds interesting processes for python.exe, mongobridge.exe, and mongod.exe, but not mongo.exe. For this particular BF, we believe that the hanging process *is* actually the shell, so for this case it would have been very helpful to have the shell process stack traces. See linked BF for examples. 
Note that the first BF does actually have a mongo.exe in the Hang Analysis output, but we believe that it's incorrectly linked as a dup; the other BFG's are the ones to look at here.",0 +"SERVER-44991","12/06/2019 17:01:59","Performance regression in indexes with keys with common prefixes","Test creates 5 collections with 10 indexes each. Indexes are designed to have keys with substantial common prefixes; this seemed to be important to generate the issue. Collections are populated, then are sparsely updated in parallel, aiming for roughly 1 update per page, in order to generate dirty pages at a high rate with little application work. !compare.png|width=100%! Left is 4.0.13, right is 4.2.1. A-B and C-D are the update portions of the test. The performance regression is seen in the timespan for the update portion, and in the average latency. The rate of pages written to disk and pages evicted is substantially lower in 4.2.1. In this test dirty fill ratio is pegged at 20%, so the rate at which pages can be written to disk is the bottleneck. The test is run on a 24-CPU machine, so the CPU utilization during both tests is roughly what would be expected with ~5 constantly active application threads, plus 3-4 constantly active eviction threads. But in spite of the same CPU activity for eviction, we are evicting pages at a much lower rate, so we must be using more CPU per page evicted in 4.2.1. Perf shows that this additional CPU activity is accounted for by __wt_row_leaf_key_work.",1 +"SERVER-45074","12/11/2019 22:14:18","Commit queue commit message validation should double check the ticket ID (ticket key)","It is a real pain if you make a commit that has the wrong ticket number prefix - you end up needing to do a bunch of manual work to reconcile the various places things got logged incorrectly to paper over the mistake. Since we are adding validation for commit messages in other ways (formatting, etc.) we should see if it would be possible to also validate against JIRA that the ticket you are nominally committing to makes sense: is open, is assigned to you, etc. ",2 +"SERVER-45113","12/12/2019 21:16:34","Dump core on test failure","When a test fails, resmoke should call hang_analyzer.py to create core dumps for currently-running mongod processes. Modify hang_analyzer.py to have an option to only dump, without the rest of the analysis.",2 +"SERVER-45128","12/13/2019 16:56:06","Reset batchtime for ""~ Linux DEBUG WiredTiger develop"" build variant to default","The batchtime was increased to 7 days under SERVER-45127 to temporarily disable the build variant until the storage engines team has time to investigate the recent redness.",1 +"SERVER-45544","12/13/2019 22:49:32","burn_in_tests for certain tests can time out regardless of what changed","I recently worked on a ticket that modified the create_index_background_unique_collmod.js workload. I noticed that, even doing a patch build with a variable name change on the test would cause burn_in_tests to time out, possibly due to this test not interfacing well with the way burn_in_tests is run. ",3 +"SERVER-45313","12/27/2019 18:22:56","validate commit message doesn't escape commit messages properly","There was an issue with validate_commit when a commit message had both double quotes ("") and parens in it. 
It caused a bash error when we tried to process [it|https://evergreen.mongodb.com/task/mongodb_mongo_master_commit_queue_validate_commit_message_patch_185facf0acf9c22e09893051a28040e8ee39292b_5e0641fbe3c33123080b2a3c_19_12_27_17_41_04##%257B%2522compare%2522%253A%255B%257B%2522hash%2522%253A%2522185facf0acf9c22e09893051a28040e8ee39292b%2522%257D%255D%257D]. We should ensure that commit messages are properly escaped when passed to the script. ---- As a Server engineer, I want validate_commit to properly handle characters that need to be escaped, so that my commit message can properly be validated. ---- AC: * validate commit is able to properly handle commit message with characters like: "", (, ), ', etc.",1 +"SERVER-45320","12/30/2019 22:38:04","Remove evergreen client from buildscripts","All uses of the [local evergreen client|https://github.com/mongodb/mongo/blob/master/buildscripts/client/evergreen.py] in buildscripts have been removed from the mongo repository (except a metrics script that is not currently being used) in favor of [evergreen.py|https://github.com/evergreen-ci/evergreen.py]. We should remove the metrics script and client so that we no longer have to maintain them and no one accidentally starts using them. ---- As a server engineer, I want the local evergreen client code removed, so that I no longer have to maintain it. ---- AC: * `buildscripts/client/evergreen.py` is removed. * all tests and dependencies on `buildscripts/client/evergreen.py` are removed.",1 +"SERVER-45355","01/03/2020 18:55:36","Send SIGABRT for failed tests that use Jasper","SERVER-45342 modifies process.py in resmoke to allow SIGABRT to be sent to fixtures when a test fails. We also need to allow jasper processes to do the same thing so core dumps are generated The main change is to allow stop() in jasper_process.py to stop the process with sigabrt, similar to how process.py does it [here|https://github.com/mongodb/mongo/blob/5a14578a131325525fc92cbb1ee315ebb35add8d/buildscripts/resmokelib/core/process.py#L196]",2 +"SERVER-45377","01/06/2020 20:16:07","Add methods to globally disable and re-enable hang analyzer in tests","While the passing the '{{runHangAnalyzer=false}}' argument to {{assert.soon}} works well in mosts tests, there are a few replica set tests that expect throws of {{assert.soon}} calls located several functions deep into the ReplSetTest fixture. In such cases, it would be cleaner if we had global enable/disable methods for the hang analyzer and let tests wrap an enable/disable pair around the places where they expect {{assert.soon}} to fail, to be used as such: {noformat}... hangAnalyzer.disable(); // This ultimately triggers an assert.soon failure. timeoutExpectedByTest(); hangAnalyzer.enable(); ... {noformat}",1 +"SERVER-45766","01/09/2020 17:09:32","Remove ""requires"" from the server yaml","As a EVG engineer, I want the server evergreen.yml to remove uses of 'requires' So that I no longer have to support 'requires' functionality. ---- AC: * All uses of requires in the server evergreen.yml have been removed. ---- While discussing a request around creating dependencies dynamically, it became apparent that requires is poorly understood and rarely used. From grepping all static configs, it looks like only the server uses it. From discussing this with [~david.bradford], it sounds like as a result of moving towards more task generation and uses, the current uses of requires are no longer important. 
The original implementation was motivated (EVG-720) by a cleanup requirement that no longer exists and is better solved in other ways in modern Evergreen. We should therefore remove it from the server config so that Evergreen can remove it from its code base.",1 +"SERVER-45491","01/10/2020 18:32:20","Add resmoke.py option to save mongod.log and mongos.log files"," [^SERVER-45491.patch] Copied from TIG-2235 to be a SERVER ticket. When resmoke.py runs replica set tests, the mongod servers log to stdout by default. This can be overridden only by modifying the Javascript test code to pass useLogFiles to ReplSetTest. It would be useful to have a way, without modifying the Javascript test code, to make the servers log to files for post-test analysis. I propose a resmoke.py command-line option that enables logging to disk and prevents post-test cleanup of the logfiles.",2 +"SERVER-45644","01/17/2020 20:02:22","Reevaluate timeouts used by burn_in_test","In burn_in_tests, we will [dynamically set timeouts|https://github.com/mongodb/mongo/blob/b758eb90dd982460af62fbb61737f935dae9b828/buildscripts/burn_in_tests.py#L462-L519] based on the expected runtime of the test being run. There have, however, been [some issues|https://evergreen.mongodb.com/task/mongodb_mongo_master_enterprise_rhel_62_64_bit_display_burn_in_tests_patch_e3dd9e80e38f3528bc50c3e1115c46a0687885fa_5e1602857742ae2ce7683e71_20_01_08_16_26_12##%257B%2522compare%2522%253A%255B%257B%2522hash%2522%253A%2522e3dd9e80e38f3528bc50c3e1115c46a0687885fa%2522%257D%255D%257D] with the timeouts being used. We should investigate if we can improve the timeout calculations. ---- As a Server engineer, I don't want my burn_in_tests runs to time out if there are no issues, So that I do not have to spend time investigating non-issues. ---- AC: * burn_in_timeouts are adjusted to generate fewer false timeouts.",3 +"SERVER-45680","01/21/2020 21:35:04","Burn_in_tests should pick up changed files in mongo-enterprise-modules files","Currently, burn_in_tests picks up changed files in the mongodb/mongo repo. It should do the same for the 10gen/mongo-enterprise-modules repo. User story: As a MongoDB engineer, I should be able to run burn_in_tests patch builds that run the jstests I've changed in the 10gen/mongo-enterprise-modules repo, ensuring that my jstests changes do not cause them to fail. AC: * Burn_in_tests picks up changes to jstests in 10gen/mongo-enterprise-modules repo",2 +"SERVER-45713","01/22/2020 22:07:36","Run rhel7 push and publish tasks on large rhel70 distro","RHEL7 repos are too big for the rhel70-small distro. Push and publish_repo tasks need more than 80G of free space now.",2 +"SERVER-45715","01/22/2020 23:52:45","Fix spelling mistake in warning around failure to get storage stats","https://github.com/mongodb/mongo/commit/3f469e451c7ff6a46d908197da87d018dce27bdf introduced a warning message with a spelling mistake: ""aquire"" -> ""acquire"". This ticket should fix that. ",1 +"SERVER-45730","01/23/2020 19:35:27","update commit queue message validation","We should change the commit message validation to: * raise an error when a ticket reference is omitted. Currently this is a warning. * handle revert messages where the author can be different from the Jira assignee.
Currently this only generates a warning, but it should be changed to be handled similarly to a backport.",2 +"SERVER-45748","01/24/2020 14:20:16","burn_in_tags_bypass_compile is not looking at the correct task","The burn_in_tags_bypass_compile script is not looking at the correct task to download the compiled artifacts from. As a result, all compile tasks generated from burn_in_tags are recompiling the server instead of just downloading already compiled files. It looks like this has been happening since August.",1 +"SERVER-45764","01/24/2020 18:48:38","Generate resmoke tasks need to take setup time into account when setting timeouts","When we generate timeouts in generate_resmoke_suites, we calculate the timeout to use based on the historic runtime of the tests being run. However, we fail to include any of the time spent running setup, including the time required to download the artifacts. We do include a buffer in the calculated timeouts, so we don't hit this too frequently. But that buffer is a lot smaller than we thought, and any extra time spent setting up could cause the tests to hit timeouts even if the tests themselves run for their normal amount of time.",1 +"SERVER-45832","01/28/2020 22:20:01","Generate selected tasks in a patch using task mappings","As a MongoDB engineer, I should be able to run the selected_tests_gen task, and know that it will run all tasks for a given set of related task mappings returned by the selected-tests service, So that I know all tasks that are affected by my code changes will be run ------------------------------ AC: * When I run the selected_tests_gen task in my patch build, it runs all tasks related to my file changes (using the task mappings model) * When I run buildscripts/selected_tests.py locally, it logs which tasks and steps are executed so that other engineers can debug any issues encountered. * When an execution task(s) and its parent display task are returned by the task mappings endpoint for a given source file, the selected_tests_gen task should know to run only the _gen task associated with that parent display task, not the execution task(s)",2 +"SERVER-45949","02/04/2020 17:30:11","Update validate commit message client to work with new patch description standard","The current patch description format is the commit message. The new patch description will be the following format: Commit Queue Merge: '' into '/:' Multiple modules are supported as follows: Commit Queue Merge: ' <- ' into 'owner/main_repo:master' This description comes from the evergreen client executable and since we can't force a client upgrade, the validate client must support both formats.  ",2 +"SERVER-45958","02/04/2020 20:27:59","End to end tests for selected_tests","User story: As a server engineer, I want to know that any changes made to code related to buildscripts/selected_tests.py will be tested via an end-to-end test of buildscripts/selected_tests.py, so that I can be sure that the script and its dependencies are functioning correctly. AC: * At least 1 end-to-end test for buildscripts/selected_tests.py exists and is run as part of ""buildscripts_test"".",2 +"SERVER-46029","02/07/2020 14:28:57","do not write core files in the hang analyzer when running locally (sans Evergreen)","Currently, the hang analyzer can run for local testing if an assert.soon times out. This can write large core files into the current directory, silently, which can consume a lot of disk space.
I think we should disable the writing of core files unless running under Evergreen.",0 +"SERVER-46125","02/13/2020 13:55:17","system_perf.yml and perf.yml cleanups","Following from SERVER-46082 etc/perf.yml and etc/system_perf.yml can now be cleaned up a bit. * source dsienv.sh can be removed * setup-dsi-env.sh can be removed * some of the shell.exec blocks can be merged, at least for perf.yml * perf.yml has diverged between master and 4.2, for example mongod.log ends up in different directory. This should be reconciled to stable branches. * In Microbenchmarks there are two mongod.log checks that are always green because they check files from dsi unittest-files. Pending future work on dsi libanalysis code, suggested solution is to just `rm -r dsi/bin/tests` * Ryan: Prefer that also perf.yml uses run-dsi. * I added a couple `set -o verbose` when troubleshooting bin/analysis.py. This could be removed I think. * I think it's possibly related to perf.yml that analysis.py / mongod.log check isn't picking up the test start and end times from perf.json. As a workaround, I disabled the election related checks in analysis.common.yml. * I bet boostrap.production isn't actually needed in perf.yml? * Addition from team discussion: ** Consolidate setupcluster into one DSI command  ** Read from expansions.yml instead of what sysperf currently does to write runtimesecret.yml, etc. For system_perf.yml, consider also renaming the yaml file expansions to match what is used in dsi: * cluster -> infrastructure_provisioning * setup -> mongodb_setup * test -> test_control",5 +"SERVER-46141","02/13/2020 21:02:55","Testing burn_in_tests requires multiversion installation","When I run the tests for burn_in_tests (`python -m unittest buildscripts\/tests\/test_burn_in_tests.py`), I get the following error that mongo-4.2 is not installed: {code:java} ====================================================================== ERROR: test_one_task_one_test (buildscripts.tests.test_burn_in_tests.TestCreateMultiversionGenerateTasksConfig) ---------------------------------------------------------------------- Traceback (most recent call last): File ""/Users/lydia.stepanek/src/mongo/buildscripts/tests/test_burn_in_tests.py"", line 634, in test_one_task_one_test evg_config, tests_by_task, evg_api, gen_config) File ""/Users/lydia.stepanek/src/mongo/buildscripts/burn_in_tests.py"", line 665, in create_multiversion_generate_tasks_config TASK_PATH_SUFFIX) File ""/Users/lydia.stepanek/src/mongo/buildscripts/evergreen_gen_multiversion_tests.py"", line 142, in get_exclude_files last_stable_commit_hash = get_backports_required_last_stable_hash(task_path_suffix) File ""/Users/lydia.stepanek/src/mongo/buildscripts/evergreen_gen_multiversion_tests.py"", line 96, in get_backports_required_last_stable_hash shell_version = check_output([last_stable_shell_exec, ""--version""]).decode('utf-8') File ""/Users/lydia.stepanek/.pyenv/versions/3.7.0/lib/python3.7/subprocess.py"", line 376, in check_output **kwargs).stdout File ""/Users/lydia.stepanek/.pyenv/versions/3.7.0/lib/python3.7/subprocess.py"", line 453, in run with Popen(*popenargs, **kwargs) as process: File ""/Users/lydia.stepanek/.pyenv/versions/3.7.0/lib/python3.7/subprocess.py"", line 756, in __init__ restore_signals, start_new_session) File ""/Users/lydia.stepanek/.pyenv/versions/3.7.0/lib/python3.7/subprocess.py"", line 1499, in _execute_child raise child_exception_type(errno_num, err_msg, err_filename) FileNotFoundError: [Errno 2] No such file or directory: 
'/data/multiversion/mongo-4.2': '/data/multiversion/mongo-4.2' ---------------------------------------------------------------------- Ran 77 tests in 52.378s FAILED (errors=4) {code} My understanding is that this test requires a developer to run `buildscripts\/evergreen_gen_multiversion_tests.py` locally in order to create the necessary multiversion mocks that the test needs to run. We should separate these tests into a separate file so that testing burn_in_tests does not also test multiversion setup, since they are different things. As a MongoDB engineer, I should be able to run burn_in_tests tests without needing any multiversion-related dependencies to be installed. AC: * The above error does not occur when running test_burn_in_tests.py ",3 +"SERVER-46146","02/13/2020 22:29:01","Reduce the number of BVs running the hang analyzer unittests","There's no need to run the shell hang analyzer unittest on more than a handful of build variants as we don't expect the behavior of Python to differ.",1 +"SERVER-46167","02/14/2020 15:58:56","Enumerate and remove Storage-related FCV 4.2-dependent code and tests","The following tasks need to be completed once we branch for 4.6: 1. Create a list of tickets with code and tests to remove, add them to the 4.6 Upgrade/Downgrade Epic, and mark them as ""is depended on by"" this ticket. This will assist the Upgrade/Downgrade team in tracking progress. If there is an insufficient amount of work to warrant multiple tickets, then the work can be done under this ticket directly. 2. Complete all necessary tickets promptly. 3. Create a ticket identifying Storage-related generic upgrade/downgrade references that the Upgrade/Downgrade team should update now that the 4.2-dependent references have been removed.",5 +"SERVER-46236","02/18/2020 20:42:43","Selected_tests_gen task should run tasks on all required variants, not just enterprise-rhel-62-64-bit ","Currently, the selected_tests_gen task [only runs tasks on the enterprise-rhel-62-64-bit variant|https://github.com/mongodb/mongo/blob/74306a6fd07a7194567f77c930e0dc4e18098df3/etc/evergreen.yml#L1552]. As specified in the [Design doc|https://docs.google.com/document/d/1azKrJr3babowhr6M8vs9cCTQOmLdUeyqmyVKp_ENSOw/edit#], the selected_tests_gen task should run tasks from all required builders. As a MongoDB engineer, I should be confident that the selected_tests_gen task is running any affected tasks on all required builders, so that no required tasks are missed.   AC: * selected_tests_gen task can result in tasks being run on enterprise-rhel-62-64-bit, enterprise-windows-required, linux-64-debug, enterprise-ubuntu-dynamic-1604-clang, and ubuntu1804-debug-aubsan-lite.",2 +"SERVER-46267","02/19/2020 22:24:12","bypass compile on burn_in_tags is broken","The bypass compile for burn_in_tags is not working. You can see it in [this patch build|https://evergreen.mongodb.com/version/5e3d7a85c9ec4401bd159b1b]. There are a few things wrong. * The rhel-62 compile was also bypassed, so the burn_in_tags needs to use the mainline compile artifacts, but it is trying to use the rhel-62 artifacts, which do not exist. * The compile task no longer generates a ""shell"" file and bypass compile is attempting to copy that file.
* The compile_TG now includes the package task by default, which adds a 25-minute task to the burn_in build variants. We should fix all of these things. ---- As a Server engineer, I want bypass compile to work with burn_in_tags, So that I don't have to wait to recompile artifacts that have already been compiled. ---- AC: * burn_in_tags reuses compiled artifacts even if the rhel 62 compile used bypass compile. * burn_in_tags does not run package in the generated build variants.",3 +"SERVER-46374","02/24/2020 20:12:04","Move noPassthrough test to run on large distro on rhel6.2 build variants","We have been seeing several cases where the noPassthrough suites are running out of memory on rhel 6.2. We can move them to large distros to avoid this.",1 +"SERVER-46437","02/26/2020 21:31:22","Create a baseline build variant to understand task splitting overhead","It would be nice to have a way to understand the overhead associated with splitting up tasks into subtasks. This would allow us to watch any runtime issues that might be hidden by splitting up a task (e.g. large test runtime increases being masked by more aggressive splits of the tests). It would also help us understand if and where there are opportunities for improvements in task splitting. One way to accomplish this would be to have a build variant that mimics a standard build variant, but without splitting the tasks. We wouldn't need to run the task frequently, once a week would likely be enough. ---- As a Dev Prod engineer, I want a build variant without task split to run, So that I can measure the overhead task splitting causes. ---- AC: * A way of measuring the overhead of task splitting exists.",2 +"SERVER-46439","02/26/2020 22:12:46","Add acceptance tests for burn_in_tags","Add end-to-end tests for buildscripts/burn_in_tags.py. As a server engineer, I want end-to-end tests for burn_in_tags so that I can make changes to the scripts without worrying about breaking things. AC: At least 1 test executes the main body of burn_in_tags.",2 +"SERVER-46643","03/05/2020 17:24:09","eslint running for 30+ minutes in Commit Queue for enterprise only changes","We are frequently seeing eslint take over 30 minutes in the commit queue. Looking at recent occurrences of this, it appears to happen for enterprise-only changes. My guess is that since there are no changes to the mongo repository, it is linting the entire repository. ---- As a server engineer, I want enterprise-only changes to the commit queue to not take over 30 minutes to lint, So that I don't have a long-running task blocking the commit queue. ---- AC: * Enterprise-only commit queue entries do not take over 30 minutes to process.",2 +"SERVER-46682","03/06/2020 19:24:02","Reuse debugger process for processes of same type in hang_analyzer.py","Reloading the symbols for every process is another bottleneck. To alleviate this, hang_analyzer.py will be modified to reuse the same debugger process and analyze all processes of the same type (ex. All mongod processes will be analyzed in the same debugger process). - Processes will be grouped by process type (Ex. all mongod processes) - A single process will be created that will: {code:java} run debugger load symbols for process in processes: attach process dump info {code} The debugger scripts are all hardcoded strings; the [script for GDB|https://github.com/mongodb/mongo/blob/c553f6acd0ce7768d25a2dcdfa9358aa22b5ee55/buildscripts/hang_analyzer.py#L363-L385] is especially ugly.
GDB has an [API for python|https://sourceware.org/gdb/onlinedocs/gdb/Python-API.html], so if this change turns out to be non-trivial to hardcode as plaintext, we can consider rewriting it to use the python API. As part of this ticket, ensure the performance improves. ",3 +"SERVER-46684","03/06/2020 19:48:17","Repackage the hang-analyzer as a resmoke subcommand","- Move hang_analyzer.py to the resmoke directory - Rewire all usages of the hang-analyzer to be run through new command syntax (assert.soon, evergreen.yml 'run hang analyzer' - Ensure running the hang-analyzer locally works as it does now - Update documentation and do engineer outreach to ensure users are aware of the change - buildscripts/hang_analyzer.py will print the new command syntax - backport to 4.2 The exact syntax will need to be fleshed out, but the new command should at least be able to accept {{pids}} and {{process_types}} for backwards compatibility.",2 +"SERVER-46688","03/06/2020 20:06:37","Use TestData.inEvergreen to determine if data files should be archived","The [--archiveFile|https://github.com/mongodb/mongo/blob/c553f6acd0ce7768d25a2dcdfa9358aa22b5ee55/buildscripts/resmokelib/parser.py#L55-L60] flag needs to be specified for archival to be done. The intention was for local invocations of resmoke to not archive data files in s3. The same functionality can be achieved by checking {{TestData.inEvergreen}} without needing to set a command line argument.",1 +"SERVER-46691","03/06/2020 20:39:47","Rework the timeout task in evergreen.yml and ensure analysis & archival works","Once SERVER-46687 is completed, the timeout task in evergreen.yml needs to be modified to: - If running powercycle or jepsen, run the hang-analyzer as before (call {{resmoke hang-analyzer args directly}} - this just means keep [this section|https://github.com/mongodb/mongo/blob/c553f6acd0ce7768d25a2dcdfa9358aa22b5ee55/etc/evergreen.yml#L3526-L3577]. (Resmoke does not execute powercycle, so the mechanism outlined in this project will not work for it.) otherwise: call new script that will: - Send sigusr1/windows event to resmoke processes explictly - the code for this exists in hang_analyzer.py, it needs to be moved into it's own file that still lives in mongo/buildscripts. - Wait for the resmoke processes to exit. Since we already know the pids this should be easy.",3 +"SERVER-46732","03/09/2020 19:52:01","Cap number of tasks generated on non-required build variants","We recently hit an issue where the amount of evergreen project config generated by generate.tasks exceeded the maximum document size and started causing issues. It looks like this was caused by the number of tasks being generated as each task adds to the project config. It also looks like this would only happen if all the build variants in the version were run. We could reduce the chance of this happening by capping the number of sub-tasks we will generate for a task in non-required build variants. ---- As a server engineer, I want to limit the number of tasks dynamically generated in non-required builders, So that versions do not hit the maximum document size. ---- AC: * non-required builders set a cap on number of tasks to generate.",1 +"SERVER-46769","03/10/2020 18:42:39","Migrate from optparse to argparse","Python's optparse does not support subcommands, but argparse/click does. To enable doing that, we need to migrate. * Look into click and whether that would be a better option. * Define new resmoke syntax with argparse. 
There are differences between how optparse/argparse work, so the syntax will have to change. * Add infrastructure for resmoke to run subcommands, with emphasis on extensibility for the future * Attempt to keep old resmoke syntax (and eventually deprecate). This might not be possible. If not, look into having a legacy flag, or at least print nice error messages to make it easy to figure out the new syntax. * Update usages of resmoke in the system. * Update documentation and do engineer outreach to ensure users are aware of the change ",3 +"SERVER-46813","03/11/2020 23:13:53","Revert ""Temporarily reduce frequency of randomized testing""","We implemented a change to reduce frequency of randomized testing, to reduce the number of BFGs being created, so this is to revert that change.",1 +"SERVER-46820","03/12/2020 14:53:33","Kill hung processes as the last step in resmoke's signal handler","After the signal handler has finished running the hang-analyzer, it will kill the hung processes, similarly to what archival does now. This is necessary to ensure resmoke can make forward progress and shut itself down so that the evergreen agent does not timeout while waiting for resmoke processes to exit. Note that archival will still need to be able to shut down hung processes in the case that the test fails normally without timing out.",2 +"SERVER-46827","03/12/2020 16:31:26","E2E tests","Before working on actual features, add the E2E tests outlined in the test plan and ensure they fail: Create resmoke unit tests that will test these scenarios: - Run a single test using a resmoke fixture (simulating a test timeout) - Run multiple tests using a resmoke fixture (simulating a task timeout) - Run a test using mongorunner to spin up mongods (simulating a non-fixture test) For each test, the script that sends a signal to resmoke will be called. The same script waits for those processes to have exited. Once they have, we will inspect that the analysis and archival has been done for all cases above. This should test everything except for evergreen calling it’s timeout task. When beginning to add project features, ensure they pass these tests.",3 +"SERVER-46842","03/13/2020 00:07:39","resmoke.py shouldn't run data consistency checks in stepdown suites if a process has crashed","resmoke.py ordinarily checks that a test didn't cause the server to crash [by calling {{self.fixture.is\_running()}}|https://github.com/mongodb/mongo/blob/d09c84a0856060c38e58d971599966af8719a454/buildscripts/resmokelib/testing/job.py#L180] after the test finishes. However, due to the stepdown thread and the job thread only being synchronized by calling {{ContinuousStepdown.after_test()}}, [it isn't safe to check whether the fixture is still running|https://github.com/mongodb/mongo/blob/d09c84a0856060c38e58d971599966af8719a454/buildscripts/resmokelib/testing/job.py#L32-L37] immediately after the test finishes. {code:python} # Don't check fixture.is_running() when using the ContinuousStepdown hook, which kills # and restarts the primary. Even if the fixture is still running as expected, there is a # race where fixture.is_running() could fail if called after the primary was killed but # before it was restarted. self._check_if_fixture_running = not any( isinstance(hook, stepdown.ContinuousStepdown) for hook in self.hooks) {code} Skipping this check causes resmoke.py to continue to run the other data consistency checks, even when a process in the MongoDB cluster has crashed. 
While misleading for Server engineers in terms of causing them to click on the ""wrong"" link in Evergreen for the task failure, it also has a severe negative impact on our automated log extraction tool by preventing it from finding relevant information. We should ensure process crashes in test suites using the {{ContinuousStepdown}} hook prevent other tests and hooks from running. I suspect having [{{_StepdownThread.pause()}}|https://github.com/mongodb/mongo/blob/d09c84a0856060c38e58d971599966af8719a454/buildscripts/resmokelib/testing/hooks/stepdown.py#L427-L436] check that the fixture is still running as the last thing it does would accomplish this.",1 +"SERVER-46851","03/13/2020 16:53:42","Decrease the number of jobs in logical session cache tests","The logical session cache tests are frequently running out of memory due to the size of the hosts they are running on. Reducing the number of jobs should help them use less memory.",1 +"SERVER-46867","03/13/2020 21:10:23","Ensure that a db directory is created even when alwaysUseLogFiles is enabled","When alwaysUseLogFiles is enabled, the noCleanData option is set to make sure that previously-logged data is not deleted. This will prevent resetDbpath() from being called. However, resetDbpath() is what is used to create the db path in the first place. We need to create the directory if it doesn't exist, while allowing existing paths to not be cleaned.",2 +"SERVER-46887","03/16/2020 14:18:59","Use threshold of 0 in selected_tests_gen","As part of https://jira.mongodb.org/browse/TIG-2412, we determined that the best threshold to use is 0. As a server engineer, I should know that the threshold used to determine which tasks to run for selected-tests is 0, since that is the threshold that most accurately captures which tasks should be run based on my code changes. AC: * Threshold is set to 0 in selected_tests_gen here: [https://github.com/mongodb/mongo/blob/36ae8a4824a88bd49ab5fa62740419c10c6bc39d/buildscripts/selected_tests.py#L56].  ",0 +"SERVER-46891","03/16/2020 15:22:29","Selected_tests_gen is creating tasks that should be excluded","Selected_tests_gen should not generate non-jstest tasks. Currently, the logic it uses to exclude non-jstest tasks is only run on tasks generated by task_mappings (see [here|https://github.com/mongodb/mongo/blob/36ae8a4824a88bd49ab5fa62740419c10c6bc39d/buildscripts/selected_tests.py#L192-L196]). We should also run this logic on tasks generated by test_mappings. Currently, some tasks generated by test mappings are resulting in non-jstest tasks being generated (see example [here|https://evergreen.mongodb.com/version/5e6d148f2fbabe4bcd3215b5], which shows _concurrency* tasks being generated). As a server engineer, I should know that selected_tests_gen only generates jstest tasks, to be in line with the scope of the Selected Tests project.   AC: * Changing the files that are changed in the version above should not result in concurrency_* tasks being generated",2 +"SERVER-46914","03/17/2020 14:50:13","burn_in_tests is looking at multiversion in non-multiversion case","The burn_in_test script is failing to generate tasks in the normal case (non-multiversion) because it is attempting to look up multiversion information even though multiversion has not been set up.
See [here|https://evergreen.mongodb.com/task/mongodb_mongo_master_enterprise_rhel_62_64_bit_burn_in_tests_gen_patch_8c1515929f34d41dbefbb9476e1dd893d523ad01_5e70dbc89ccd4e532fc74873_20_03_17_14_16_56##%257B%2522compare%2522%253A%255B%257B%2522hash%2522%253A%25228c1515929f34d41dbefbb9476e1dd893d523ad01%2522%257D%255D%257D] for an example. If burn_in_tests is not running in multiversion mode, it should not look at multiversion configuration.",1 +"SERVER-46983","03/19/2020 13:29:27","Upload repobuilding packages to correct URL","Barque service needs the packages uploaded to S3 so it can run the repobuilding job.",0 +"SERVER-46996","03/19/2020 18:08:12","all push/publish_packages tasks should run on small hosts","With the migration to the new architecture for publishing linux packages, it's safe to move all package publication tasks (push/etc.) to using the smallest possible evergreen hosts (i.e. -smalls.) Additionally if any of these tasks aren't running on linux x86_64, they probably ought to be. ",1 +"SERVER-47004","03/19/2020 21:07:01","eslint is not properly linting enterprise modules","eslint is not be run properly in the commit-queue and allowing lint failures to be introduced. See [here|https://evergreen.mongodb.com/task/mongodb_mongo_master_enterprise_rhel_62_64_bit_lint_eslint_fadc3d1cd88084567e24559f75b216158186bde8_20_03_17_17_17_42] for an example.",2 +"SERVER-47054","03/23/2020 17:25:36","Don't fail due to long timeouts on non-patch builds","When generating sub-tasks, if we appear to be setting timeouts greater than a certain threshold, we will fail the task generation. This is to inform the patch builder that they are trying to run a patch build that will take a really long time. We require them to manually disable the check if that is something they are sure they want to do. On mainline builds, however, we should skip this check since there is no one to go manually override it and some of the repeated-execution tests are bumping up against it.",1 +"SERVER-47165","03/27/2020 21:01:58","Missing the mongohouse binary for a server patch with no code changes","{color:#1d1c1d}Missing the mongohouse binary for a server patch with no code changes{color}   {color:#1d1c1d}https://evergreen.mongodb.com/task/mongodb_mongo_master_enterprise_rhel_62_64_bit_mqlrun_patch_c5eea7753b2fe3082d853ff9400117c85ac42dab_5e7e553f7742ae355b925437_20_03_27_19_34_40{color}",2 +"SERVER-47312","04/02/2020 23:31:29","Run hang_analyzer.py via assert.soon() without calling gcore on ASan builders","The changes from [e89c041|https://github.com/mongodb/mongo/commit/e89c041616cbaea0648bb60ce32ddab1f33d3e97] as part of SERVER-45884 disabled running the hang analyzer on ASan build variant (i.e. builders using {{--sanitize=address}}) via {{assert.soon()}} due to {{gcore}} not respecting the {{madvise()}} settings on the 20TB of shadow memory. This had come up previously in SERVER-29886 for the ""timeout"" phase in the {{etc/evergreen.yml}} project configuration and was resolved by running hang_analyzer.py without the {{-c}} option in order to avoid producing core dumps. We could similarly have the mongo shell omit the {{-c}} option when running hang_analyzer.py. On a related note - the mongo shell offers an {{_isAddressSanitizerActive()}} function which returns true if it was compiled with {{--sanitize=address}} (we generally assume the server binaries have the same build flags), so we should consider removing {{TestData.isAsanBuild}} to avoid there being two ways of expressing the same thing. 
Add a test to check that the hang_analyzer still runs on ASan without dumping core.",3 +"SERVER-47409","04/08/2020 17:05:12","writeconcern>1 gives no error on standalone server","When I run ""db.collection.insertOne(\{name:""xyz""},\{w:2})"" it gives an error on a standalone server, which is correct, but ""db.collection.insert(\{name:""xyz""},\{w:2})"" does not. I observed this on MongoDB Enterprise 4.2.",3 +"SERVER-47592","04/13/2020 15:22:39","Tasks missing buildvariants in version","This screenshot is from the ""patches"" page. I should only see results for five variants, but it looks like there are two additional variants shown with no titles. ",1 +"SERVER-47537","04/14/2020 17:06:13","Adjust frequency of less common build variants on 4.4","The following build variants are run on most commits on the 4.4 branch, but don't really need to be run that frequently. For comparison, they are being run once a day or less on master. * ~ Enterprise RHEL 7.0 (no-libunwind) * ~ Enterprise RHEL 7.0 (Dagger) * hot_backups RHEL 7.0",1 +"SERVER-47547","04/15/2020 02:04:52","benchmarks*.yml test suites need to be updated after switch to hygienic","The changes from [a83ee33|https://github.com/mongodb/mongo/commit/a83ee33c56dcfc8cdcfa0dd0c458f6fef89a3113] to exclude the {{hash_table_bm}} microbenchmark from Evergreen have effectively been undone by the file now being run as {{build/install/bin/hash_table_bm}}. This has been causing [the {{benchmarks_orphaned}} task to time out in Evergreen since mid-February|https://evergreen.mongodb.com/task/mongodb_mongo_master_enterprise_rhel_62_benchmarks_benchmarks_orphaned_876b3af1091b299884869c34a41f7f37d4dcc0bb_20_02_14_12_24_50/0]. This has also caused the {{chunk_manager_refresh_bm}} microbenchmark to be run as part of the {{benchmarks_orphaned}} task rather than the {{benchmarks_sharding}} task.",3 +"SERVER-47589","04/16/2020 14:29:25","Integrate live-record with resmoke.py","Add an option to resmoke.py to run with live-record.",2 +"SERVER-47590","04/16/2020 14:31:41","Download and install udb ","Allow udb to be downloaded and installed before live-record runs. Need to figure out a way to keep the download URL private.",3 +"SERVER-47591","04/16/2020 14:32:53","Add build variant that runs some tests with live-record","Run a smattering of light-weight tests with {{live-record}} in a new BV. Tests that start up more than 5 processes will be excluded.",3 +"SERVER-47611","04/16/2020 23:38:11","Re-work to_local_args function using argparse","The current optparse implementation uses a private method {{_get_all_options()}} that has no argparse equivalent. ",2 +"SERVER-47796","04/27/2020 16:04:38","commit-queue lint-clang-format not run on enterprise changes","I have the same issue as SERVER-45586 I think. https://jira.mongodb.org/browse/EVG-7894 It looks to me like maybe the commit-queue CI might not include Enterprise module changes in the lint-clang-format task, the way that the master waterfall CI does. I submitted a change that got through with format errors in enterprise code, only to break the master waterfall and generate a BFG. https://jira.mongodb.org/browse/BFG-599619 My commit-queue CI run's lint-clang-format task doesn't look like it made a patch from the enterprise module change.
https://evergreen.mongodb.com/task_log_raw/mongodb_mongo_master_enterprise_rhel_62_64_bit_lint_clang_format_patch_085ffeb310e8fed49739cf8443fcb13ea795d867_5ea60ada306615619458e984_20_04_26_22_29_05/0?type=T#L1035 ",2 +"SERVER-47880","05/01/2020 15:36:43","Send SIGSTOP to all processes before attaching to any","We can prevent processes from getting unstuck when the hang analyzer attaches to them by sending SIGSTOP to all of them first. Commands that run in process threads should still work if we use these commands: {noformat}(gdb) handle SIGSTOP ignore (gdb) handle SIGSTOP noprint {noformat}",2 +"SERVER-47965","05/05/2020 18:52:47","Remove multiversion blacklisting from burn_in_tests.py","See SERVER-47136 for more context. As part of adding multiversion testing for repl and sharding, there was some blacklisting code that was added to burn_in_tests that was specific for these suites in order to blacklist tests that behaved differently between Mongo versions (for example, while a bug fix is waiting for a backport). However, burn_in_tests is probably not the correct place for this blacklisting logic to live. By putting the logic in burn_in_tests, we are effectively saying that the tests are unstable, not that the tests have inconsistencies between versions. It might make more sense to put the logic in resmoke, which burn_in_tests is already using to get the list of tests to run and is already doing some blacklisting.",3 +"SERVER-47989","05/06/2020 19:18:42","Ensure lint dependencies are specified","As part of SERVER-47796, we added some extra python dependencies to linting. We should make sure those are included in the lint requirements.",1 +"SERVER-47995","05/06/2020 21:41:44","Extraneous ""Unit Tests"" Link on Failed Tasks","Failed Evergreen non-unittest tasks have the ""Unit Tests"" link on them, some of which are tarballs that are in the GBs. E.g. [this one|https://evergreen.mongodb.com/task/mongodb_mongo_master_enterprise_rhel_62_64_bit_replica_sets_multiversion_patch_da4a8e0e85627d6febd8bce3fd87f221e0ff97c6_5eb2be6be3c3310f36bc5cf2_20_05_06_13_42_24##%257B%2522compare%2522%253A%255B%257B%2522hash%2522%253A%2522da4a8e0e85627d6febd8bce3fd87f221e0ff97c6%2522%257D%255D%257D] is 1.8GB. The unzipped file showed binaries and debug symbols for mongo/d/s that ""gather failed unittests"" is picking up. We should have the function not run for non-unittest tasks. This requires a change in evergreen.yaml.  ",2 +"SERVER-48017","05/07/2020 18:47:30","Don't pass deleted files to lint","clang_format is hanging in patch builds on rhel 6.2. It appears to be because it is trying to run against a file that has been removed.",1 +"SERVER-48090","05/11/2020 17:43:01","Support python 3.6 for evergreen.py and shrub.py","There was a request via MAKE-1317 to support python 3.6 with our test generation tools. This ticket is to make the updates to the server codebase for that support.",1 +"SERVER-48105","05/11/2020 21:12:43","Selected tests is trying to access data from a NoneType","I saw this in a patch build using selected tests. It looks like we are trying to use some unavailable data.
{noformat} [2020/05/06 21:13:03.679] File ""buildscripts/selected_tests.py"", line 498, in [2020/05/06 21:13:03.679] main() # pylint: disable=no-value-for-parameter [2020/05/06 21:13:03.679] File ""/data/mci/fbc5e0e7b726bb8f530a138052af17b6/venv/lib/python3.7/site-packages/click/core.py"", line 829, in __call__ [2020/05/06 21:13:03.679] return self.main(*args, **kwargs) [2020/05/06 21:13:03.679] File ""/data/mci/fbc5e0e7b726bb8f530a138052af17b6/venv/lib/python3.7/site-packages/click/core.py"", line 782, in main [2020/05/06 21:13:03.679] rv = self.invoke(ctx) [2020/05/06 21:13:03.679] File ""/data/mci/fbc5e0e7b726bb8f530a138052af17b6/venv/lib/python3.7/site-packages/click/core.py"", line 1066, in invoke [2020/05/06 21:13:03.679] return ctx.invoke(self.callback, **ctx.params) [2020/05/06 21:13:03.679] File ""/data/mci/fbc5e0e7b726bb8f530a138052af17b6/venv/lib/python3.7/site-packages/click/core.py"", line 610, in invoke [2020/05/06 21:13:03.679] return callback(*args, **kwargs) [2020/05/06 21:13:03.679] File ""buildscripts/selected_tests.py"", line 493, in main [2020/05/06 21:13:03.679] task_expansions, repos, origin_build_variants) [2020/05/06 21:13:03.679] File ""buildscripts/selected_tests.py"", line 423, in run [2020/05/06 21:13:03.679] changed_files) [2020/05/06 21:13:03.679] File ""buildscripts/selected_tests.py"", line 383, in _get_task_configs [2020/05/06 21:13:03.679] selected_tests_variant_expansions, related_tasks, build_variant_config) [2020/05/06 21:13:03.679] File ""buildscripts/selected_tests.py"", line 327, in _get_task_configs_for_task_mappings [2020/05/06 21:13:03.679] build_variant_config) [2020/05/06 21:13:03.679] File ""buildscripts/selected_tests.py"", line 246, in _get_evg_task_config [2020/05/06 21:13:03.679] task_vars = task.run_tests_command[""vars""] [2020/05/06 21:13:03.679] TypeError: 'NoneType' object is not subscriptable [2020/05/06 21:13:03.747] Command failed: command encountered problem: error waiting on process 'd205ecea-4c74-4768-990d-07df3c320cc4': exit status 1 [2020/05/06 21:13:03.747] Task completed - FAILURE. [2020/05/06 21:13:03.750] Running post-task commands. {noformat}",1 +"SERVER-48109","05/11/2020 22:01:58","Skip Known-Broken Python Resmoke Tests","Mark known-broken tests as ignored (in the unittest annotations) rather than via resmoke suite configs. Confirm with original author(s) of the exclude lines.",1 +"SERVER-48112","05/11/2020 22:04:01","Use Absolute Imports in Resmoke","Use absolute imports. E.g., \{{from buildscripts.resmokelib}} instead of \{{from .}}.",2 +"SERVER-48132","05/12/2020 14:24:24","Selected tests is missing the majority of fuzzer tasks","The Selected Tests alias is using the regex "".\*\_fuzz.\*"" to detect fuzzer tasks. However, most of the fuzzer tasks are named something like ""jstestfuzz_*"", so none of those are being picked up by the fuzzer. We need to update that regex or come up with a better way of selected tasks. ---- As a server engineer, I want selected tests to pull in all fuzzer tasks, So that I don't have to run them manually. ---- AC: * All js fuzzer tasks are run in a selected tests patch build.",2 +"SERVER-48145","05/12/2020 18:26:28","Extract resmoke logging configurations","ExecutorRootLogger, FixtureRootLogger, and TestsRootLogger mostly exist for configuration purposes, but they create a confusing secondary logger hierarchy. They should be removed in favor of passing configuration information itself to loggers. 
This configuration may need to be partially global with its own inheritance relationships to avoid duplicating logic, as long as it's cleanly separated from the loggers themselves.",3 +"SERVER-48150","05/12/2020 18:47:19","Streamline resmoke loggers","We currently [construct|https://github.com/mongodb/mongo/blob/e2602ad053b2120982fbcac8e33e1ad64e6ec30a/buildscripts/resmokelib/logging/loggers.py#L149] JobLoggers from ExecutorRootLogger instances, creating a weird situation where they use some information from ExecutorRootLogger and some information from FixtureRootLogger. After changing how logging configuration is managed, we should be able to construct JobLoggers directly in Job objects.",2 +"SERVER-48155","05/12/2020 19:44:35","Remove TestQueueLogger","The [TestQueueLogger|https://github.com/mongodb/mongo/blob/e2602ad053b2120982fbcac8e33e1ad64e6ec30a/buildscripts/resmokelib/logging/loggers.py#L321] is never logged to directly; it's only used for providing test type information and endpoints to TestLoggers. Now that we're storing logging configuration information globally, we can have each logger determine its own endpoint. We should remove TestQueueLogger completely and allow TestLogger and HookLogger to be constructed directly and log to the configured endpoints.",2 +"SERVER-48158","05/12/2020 19:58:09","Add Resmoke testing for Jasper's logging endpoint","As part of a test-driven-development approach we're taking for the Jasper resmoke integration project, we will first add unittests in resmoke for the following scenarios: 1. resmoke logging to a ""parent"" endpoint directly, with one resmoke and one jasper process logging to a child endpoint 2. Same as above but with two parent endpoints and one child logging endpoint for each parent. We'd like to assert that the output log exhibits properties of the logging hierarchy. The log itself can go to a file or an in-memory buffer.",2 +"SERVER-48163","05/12/2020 21:35:23","Fix jstestfuzz_*_multiversion generation","A recent refactor to how tasks are generated caused an issue with generating jstestfuzz_*_multiversion tests. It looks like we are trying to generate extra tasks without the needed configuration, which causes the generate task to fail.",1 +"SERVER-48287","05/19/2020 03:56:15","Don't run FuzzerRestoreClusterSettings on suites with FCV 4.2","Same as SERVER-47716, but for jstestfuzz_sharded_multiversion",1 +"SERVER-50282","05/19/2020 20:29:00","Provide a debugging setup script for spawnhosts that load artifacts with coredumps","Overlong filenames truncating important properties is actually a bug. This ticket has been repurposed to provide a script that unpackages files necessary for inspecting a coredump on a spawnhost. It assumes the bug will eventually be fixed (and the bug only impacts a subset of cases). *Original Description* The only time I spawn a host with data files from a test failure is when there's an available core dump that I want to load in GDB. I have a script that programmatically unpackages everything into the appropriate directory. Whether or not server engineers use a script to set up their gdb usage, I believe spawning a host to investigate a core dump is a common use-case. Unfortunately when filenames are long, important properties [can be trimmed|https://github.com/evergreen-ci/evergreen/blob/86ebeb15ddc211f1390c5cc56af54a3712728e62/operations/fetch.go#L476] such as the keyword {{coredump}}\[1\]. 
What makes this difficult is that it not only breaks my script (acceptable, this sort of scripting isn't supported or built on some established agreement), but it also breaks my ability to do the corollary work by hand. Doing a {{tar \-tf }} AFAIK is a complete filescan. At that point it's faster to just download the coredumps by hand. This arguably defeats the purpose of spawning a host with artifacts loaded. I don't know what a feasible solution here is. There's probably a reason why filenames are long (for uniqueness? though IMO, unreadable). Some ideas: * Use shorter strings for {{evergreen fetch}} to generate, which preserve the contents of the archive (at the expense of labeling the variant/task id which AFAIK only becomes a problem if a user fetches artifacts for multiple tasks in the same directory). ** If this is backwords breaking for established use-cases, consider adding a flag to {{fetch}}, e.g: {{evergreen fetch -t --artifacts --shortnames}}. Let users spawning a host and loading data to opt-in to short filenames * Add environment variables containing absolute paths to interesting artifacts for users sshing into the instance. Scripts can hook into these without needing to rely on filename patterns. E.g: ** BIN_ARCHIVE for the archive containing mongod ** DBG_ARCHIVE for the archive containing mongod.debug ** COREDUMP_ARCHIVE for the archive containing all coredumps ** SRC_DIR for the mongodb repository path {{fetch --sources}}) \[1\] {noformat} [root@ip-10-122-8-102 me]# ll /data/mci/artifacts-patch-1419_linux-64-debug_* /data/mci/artifacts-patch-1419_linux-64-debug_compile: total 2503040 -rw-r--r-- 1 root root 136935 May 19 01:45 config-mongodb_mongo_v4.4_linux_64_debug_patch_1d5d11155689d29bb7de42ccb5a5f4b3c7247469_5ebf0cd932f4170aad0ca35f_20_05_15_21_43_14.log -rw-r--r-- 1 root root 2473743732 May 19 01:46 debugsymbols-mongodb_mongo_v4.4_linux_64_debug_patch_1d5d11155689d29bb7de42ccb5a5f4b3c7247469_5ebf0cd932f4170aad0ca35f_20_05_15_21_43_14.tgz -rw-r--r-- 1 root root 84980170 May 19 01:45 mongo-mongodb_mongo_v4.4_linux_64_debug_patch_1d5d11155689d29bb7de42ccb5a5f4b3c7247469_5ebf0cd932f4170aad0ca35f_20_05_15_21_43_14.tgz -rw-r--r-- 1 root root 3536789 May 19 01:45 mongodb_mongo_v4.4_linux_64_debug_patch_1d5d11155689d29bb7de42ccb5a5f4b3c7247469_5ebf0cd932f4170aad0ca35f_20_05_15_21_43_14.tgz -rw-r--r-- 1 root root 1097 May 19 01:45 pip-requirements-mongodb_mongo_v4.4_linux_64_debug_compile_patch_1d5d11155689d29bb7de42ccb5a5f4b3c7247469_5ebf0cd932f4170aad0ca35f_20_05_15_21_43_14-0.txt -rw-r--r-- 1 root root 699562 May 19 01:45 scons-cache-mongodb_mongo_v4.4_linux_64_debug_patch_1d5d11155689d29bb7de42ccb5a5f4b3c7247469_5ebf0cd932f4170aad0ca35f_20_05_15_21_43_14-0.log /data/mci/artifacts-patch-1419_linux-64-debug_jsCore: total 120 -rw-r--r-- 1 root root 80088 May 19 01:45 Running-Tests-from-Evergreen-Tasks-Locally -rw-r--r-- 1 root root 1446 May 19 01:45 mongo-diskstats-mongodb_mongo_v4.4_linux_64_debug_jsCore_patch_1d5d11155689d29bb7de42ccb5a5f4b3c7247469_5ebf0cd932f4170aad0ca35f_20_05_15_21_43_14-0.tgz -rw-r--r-- 1 root root 29980 May 19 01:45 mongo-system-resource-info-mongodb_mongo_v4.4_linux_64_debug_jsCore_patch_1d5d11155689d29bb7de42ccb5a5f4b3c7247469_5ebf0cd932f4170aad0ca35f_20_05_15_21_43_14-0.tgz -rw-r--r-- 1 root root 1097 May 19 01:45 pip-requirements-mongodb_mongo_v4.4_linux_64_debug_jsCore_patch_1d5d11155689d29bb7de42ccb5a5f4b3c7247469_5ebf0cd932f4170aad0ca35f_20_05_15_21_43_14-0.txt 
/data/mci/artifacts-patch-1419_linux-64-debug_retryable_writes_jscore_stepdown_passthrough: total 1204712 -rw-r--r-- 1 root root 80088 May 19 01:45 Running-Tests-from-Evergreen-Tasks-Locally -rw-r--r-- 1 root root 1212594087 May 19 01:46 m.4_linux_64_debug_patch_1d5d11155689d29bb7de42ccb5a5f4b3c7247469_5ebf0cd932f4170aad0ca35f_20_05_15_21_43_14-retryable_writes_jscore_stepdown_passthrough-0.tgz -rw-r--r-- 1 root root 10900 May 19 01:45 m.4_linux_64_debug_retryable_writes_jscore_stepdown_passthrough_patch_1d5d11155689d29bb7de42ccb5a5f4b3c7247469_5ebf0cd932f4170aad0ca35f_20_05_15_21_43_14-0.tgz -rw-r--r-- 1 root root 20560324 May 19 01:45 m_(1).4_linux_64_debug_patch_1d5d11155689d29bb7de42ccb5a5f4b3c7247469_5ebf0cd932f4170aad0ca35f_20_05_15_21_43_14-retryable_writes_jscore_stepdown_passthrough-0.tgz -rw-r--r-- 1 root root 262274 May 19 01:45 m_(1).4_linux_64_debug_retryable_writes_jscore_stepdown_passthrough_patch_1d5d11155689d29bb7de42ccb5a5f4b3c7247469_5ebf0cd932f4170aad0ca35f_20_05_15_21_43_14-0.tgz -rw-r--r-- 1 root root 101019 May 19 01:45 m_(2).4_linux_64_debug_patch_1d5d11155689d29bb7de42ccb5a5f4b3c7247469_5ebf0cd932f4170aad0ca35f_20_05_15_21_43_14-retryable_writes_jscore_stepdown_passthrough-0.tgz -rw-r--r-- 1 root root 1097 May 19 01:45 p.4_linux_64_debug_retryable_writes_jscore_stepdown_passthrough_patch_1d5d11155689d29bb7de42ccb5a5f4b3c7247469_5ebf0cd932f4170aad0ca35f_20_05_15_21_43_14-0.txt {noformat}",0 +"SERVER-48375","05/21/2020 20:18:20","Create jepsen ""smoke-test""","Add it as a bang (required) builder so we don't accidentally break jepsen in the future.",1 +"SERVER-48395","05/25/2020 15:48:19","Extended stalls during heavy insert workload","While working with the repro for WT-6175 I noticed that there were extended stalls during the insert phase. !stalls.png|width=100%! * The stalls seem to end with the start of the next checkpoint * With checkpoints disabled the stalls lasted as long as 10 minutes * During the stalls the log reports operations that took the entire duration of the stall to complete * They appear to have something to do with page splits. FTDC, logs, and repro script attached. The repro creates two collections of 5 GB each with a 5 GB cache, using 50 client threads on a machine with 24 cpus. ",5 +"SERVER-48590","06/04/2020 16:21:29","QOL improvements for hang-analyzer","1) Log a raw_stacks file for each process instead of one file for all processes (this was the behavior prior to SERVER-46682). The 'debugger_mongod.XXXXX.log' files that we used to get are now all in the 'debugger_mongod.log' file, but we can't split those out into per process logs because we only call gdb once. 2) Pass '-o=file' into the hang-analyzer usage in assert.soon to get the hang-analyzer output as an artifact in the task. ",1 +"SERVER-48703","06/10/2020 21:44:11","Dynamically split causally_consistent_hedged_reads_jscore_passthrough","It looks like the causally_consistent_hedged_reads_jscore_passthrough suites was recently added. It looks like this suite has a runtime of over 1.5 hours. We should convert this suite to a dynamically generated suite so that we can split it into sub-suites that can run in parallel. ---- As a Server engineer, I want the causally_consistent_hedged_reads_jscore_passthrough to be split into subsuites, So that it can be run in parallel and I can have lower makespans in patch builds. 
---- AC: * causally_consistent_hedged_reads_jscore_passthrough is dynamically split into sub-suites.",1 +"SERVER-48705","06/10/2020 22:48:34","resmoke.py sending SIGABRT to take core dumps on fixture teardown may overwrite core files from hang analyzer","When archival is enabled for a test or test suite, resmoke.py sends a SIGABRT signal to its fixture processes to take a core dump of them (in addition to collecting the mongod data files). If a JavaScript test has already invoked the hang analyzer via an assert.soon(), then the core file generated from the hang analyzer will be overwritten. {noformat} [fsm_workload_test:agg_merge_when_matched_replace_with_new] 2020-05-26T08:34:20.234+0000 sh118695| Saved corefile dump_mongod.4235.core ... [ShardedClusterFixture:job0:shard0:secondary0] Attempting to send SIGABRT from resmoke to mongod on port 20002 with pid 4235... {noformat} Note that the core dump taken by resmoke.py sending a SIGABRT signal is unlikely to match the thread stacks in the hang analyzer output because running the hang analyzer is expected to perturb the state of the MongoDB cluster.",2 +"SERVER-48951","06/18/2020 15:34:44","Create a resmoke option that better manages output for reproducing runs locally","Resmoke will dump all output into either stdout or a log file. When people are running things locally, they often want: * To save all the detailed output into a log file for perusal * A way to know that the run may have hit an error condition * A way to know that the set of tests is making progress Many people use a series of mrlog, greps, tee and output redirection to achieve this, e.g: {noformat} resmoke run --suite sample_suite | mrlog | tee output_file.log | egrep ""invariant|fassert|BACKTRACE|failed to load|..."" {noformat} Additionally, many people that craft those for individual reproduction attempts, haven't yet realized that saving it as a shell alias/function/script would be useful in the future. The goal of this ticket is to provide a useful starting point for accomplishing the bullet points using the current keywords that MongoDB tests generate.",3 +"SERVER-48953","06/18/2020 15:38:41","Add an option for resmoke to accept a replay file that lists tests.","The patch now allows the following command line invocations: {noformat} resmoke run --replayFile foo.txt {noformat} and {noformat} resmoke run @foo.txt {noformat} If the contents of {{foo.txt}} are: {noformat} jstests/concurrency/fsm_workloads/indexed_insert_multikey.js jstests/concurrency/fsm_workloads/indexed_insert_2dsphere.js {noformat} the invocation is analogous to: {noformat} resmoke run jstests/concurrency/fsm_workloads/indexed_insert_multikey.js jstests/concurrency/fsm_workloads/indexed_insert_2dsphere.js {noformat} Thus other command line flags such as {{--suite}} still take effect. Also noteworthy, resmoke will run the contents of the replay file in the order they are listed. Repeated test files will be run once per repetition; prior behavior was to dedup test files. Original ticket: When running a set of tests with resmoke, one has the option of running them in alphabetical order or totally random. It would be nice if the random runs could at least use a deterministic order when provided a seed. The goal here is to also be compatible with using {{\-\-jobs=N}}. The jobs flag will partition the test suite into different runners. 
When providing a seed, each runner should deterministically run the same tests in the same order.",2 +"SERVER-48960","06/18/2020 16:51:25","Drive powercycle setup commands with expansions.yml","Create a python wrapper around {{remote_operations.py}} that takes an Evergreen expansions.yml as input and calls {{remote_operations.py}} as we do now throughout evergreen.yml. Since the wrapper script is in Python, we can call [{{RemoteOperations}}|https://github.com/mongodb/mongo/blob/31a64f0cc546f325e1773091562f15264049c2d1/buildscripts/remote_operations.py#L344] directly instead of through a subprocess. Because {{remote_operations.py}} is invoked at different points in evergreen.yml, we can't combine all 20 invocations into a single call to the wrapper; instead we want to retain the existing logic for now and create a subcommand to the wrapper for each contiguous group of calls. The subcommand names can mirror the function names in evergreen.yml: - copy_ec2_monitor_files - setup_ec2_intance - tar_ec2_artifacts - copy_ec2_artifacts - gather_remote_event_logs - gather_remote_mongo_coredumps - copy_remote_mongo_coredumps It's likely some of the calls can be further grouped together based on [their usage in evergreen.yml|https://github.com/mongodb/mongo/blob/31a64f0cc546f325e1773091562f15264049c2d1/etc/evergreen.yml#L3184-L3196] but we can overlook that as part of this ticket to ensure we don't accidentally change the behavior of powercycle. The goal of this ticket is to set {{$ssh_connection_options}}, {{$private_ip_address}} and all the other command line options for every invocation only once in the Python wrapper.",5 +"SERVER-49096","06/25/2020 15:28:33","Have replica set tests log a pid/port topology","The goal is to make it more convenient to find these mappings without having to look for the invocation of each process. This should also include ports used by {{mongobridge}}. Investigate whether this log can be output after resmoke rotates logs after each test.",5 +"SERVER-49097","06/25/2020 15:30:26","sys-perf builds differ from release builds","sys-perf artifacts are multiple gigs whereas regular waterfall builds are dozens of megs. A cursory glance shows we may be doing a debug build on sys-perf.",0.5 +"SERVER-49164","06/29/2020 16:05:45","Sweep for and fix missing dependencies in evergreen.yml","We just noticed that some tasks (like ""jscore txns large txns format"") are dependent on compile when it probably makes more sense to have them depend on jscore? https://evergreen.mongodb.com/task/mongodb_mongo_master_enterprise_rhel_62_64_bit_dynamic_required_jsCore_txns_large_txns_format_3ea60282c8841c67d4e2f3a365b7f1640c84198c_20_06_29_14_07_33 While at it, it seems like someone should just do a one-time sweep and see if there are any other dependencies it would make sense to add. I know this doesn't fit perfectly with the STM charter, but TBH it probably doesn't fit well with *anyone's* charter, and this seems like the cleanest fit? CC [~pasette]",0 +"SERVER-49203","06/30/2020 21:03:44","Jepsen-Smoke Has a 15% System-Failure Rate","The failures manifest themselves like {noformat} [2020/06/03 20:24:22.459] ERROR [2020-06-03 20:24:22,458] main - jepsen.cli Oh jeez, I'm sorry, Jepsen broke. Here's why: [2020/06/03 20:24:22.459] java.util.concurrent.ExecutionException: java.lang.RuntimeException: Mongo setup on ip-10-122-57-2:20000 timed out! 
[2020/06/03 20:24:22.459] at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[na:1.8.0_252] [2020/06/03 20:24:22.459] at java.util.concurrent.FutureTask.get(FutureTask.java:192) ~[na:1.8.0_252] [2020/06/03 20:24:22.459] at clojure.core$deref_future.invokeStatic(core.clj:2208) ~[clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at clojure.core$future_call$reify__6962.deref(core.clj:6688) ~[clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at clojure.core$deref.invokeStatic(core.clj:2228) ~[clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at clojure.core$deref.invoke(core.clj:2214) ~[clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at clojure.core$map$fn__4785.invoke(core.clj:2644) ~[clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at clojure.lang.LazySeq.sval(LazySeq.java:40) ~[clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at clojure.lang.LazySeq.seq(LazySeq.java:49) ~[clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at clojure.lang.RT.seq(RT.java:521) ~[clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at clojure.core$seq__4357.invokeStatic(core.clj:137) ~[clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at clojure.core.protocols$seq_reduce.invokeStatic(protocols.clj:24) ~[clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at clojure.core.protocols$fn__6738.invokeStatic(protocols.clj:75) ~[clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at clojure.core.protocols$fn__6738.invoke(protocols.clj:75) ~[clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at clojure.core.protocols$fn__6684$G__6679__6697.invoke(protocols.clj:13) ~[clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at clojure.core$reduce.invokeStatic(core.clj:6545) ~[clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at clojure.core$into.invokeStatic(core.clj:6610) ~[clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at clojure.core$into.invoke(core.clj:6604) ~[clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at jepsen.control$on_nodes.invokeStatic(control.clj:373) ~[jepsen-0.1.8.jar:na] [2020/06/03 20:24:22.459] at jepsen.control$on_nodes.invoke(control.clj:357) ~[jepsen-0.1.8.jar:na] [2020/06/03 20:24:22.459] at jepsen.control$on_nodes.invokeStatic(control.clj:362) ~[jepsen-0.1.8.jar:na] [2020/06/03 20:24:22.459] at jepsen.control$on_nodes.invoke(control.clj:357) ~[jepsen-0.1.8.jar:na] [2020/06/03 20:24:22.459] at jepsen.core$run_BANG_$fn__4284$fn__4287.invoke(core.clj:584) ~[jepsen-0.1.8.jar:na] [2020/06/03 20:24:22.459] at jepsen.core$run_BANG_$fn__4284.invoke(core.clj:572) ~[jepsen-0.1.8.jar:na] [2020/06/03 20:24:22.459] at jepsen.core$run_BANG_.invokeStatic(core.clj:553) ~[jepsen-0.1.8.jar:na] [2020/06/03 20:24:22.459] at jepsen.core$run_BANG_.invoke(core.clj:500) ~[jepsen-0.1.8.jar:na] [2020/06/03 20:24:22.459] at jepsen.cli$single_test_cmd$fn__4984.invoke(cli.clj:329) ~[jepsen-0.1.8.jar:na] [2020/06/03 20:24:22.459] at jepsen.cli$run_BANG_.invokeStatic(cli.clj:273) [jepsen-0.1.8.jar:na] [2020/06/03 20:24:22.459] at jepsen.cli$run_BANG_.invoke(cli.clj:203) [jepsen-0.1.8.jar:na] [2020/06/03 20:24:22.459] at jepsen.mongodb.runner$_main.invokeStatic(runner.clj:164) [classes/:na] [2020/06/03 20:24:22.459] at jepsen.mongodb.runner$_main.doInvoke(runner.clj:162) [classes/:na] [2020/06/03 20:24:22.459] at clojure.lang.RestFn.invoke(RestFn.java:3894) [clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at clojure.lang.Var.invoke(Var.java:676) [clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at user$eval5.invokeStatic(form-init3654414589850886228.clj:1) [na:na] [2020/06/03 20:24:22.459] at user$eval5.invoke(form-init3654414589850886228.clj:1) [na:na] 
[2020/06/03 20:24:22.459] at clojure.lang.Compiler.eval(Compiler.java:6927) [clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at clojure.lang.Compiler.eval(Compiler.java:6917) [clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at clojure.lang.Compiler.load(Compiler.java:7379) [clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at clojure.lang.Compiler.loadFile(Compiler.java:7317) [clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at clojure.main$load_script.invokeStatic(main.clj:275) [clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at clojure.main$init_opt.invokeStatic(main.clj:277) [clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at clojure.main$init_opt.invoke(main.clj:277) [clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at clojure.main$initialize.invokeStatic(main.clj:308) [clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at clojure.main$null_opt.invokeStatic(main.clj:342) [clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at clojure.main$null_opt.invoke(main.clj:339) [clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at clojure.main$main.invokeStatic(main.clj:421) [clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at clojure.main$main.doInvoke(main.clj:384) [clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at clojure.lang.RestFn.invoke(RestFn.java:421) [clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at clojure.lang.Var.invoke(Var.java:383) [clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at clojure.lang.AFn.applyToHelper(AFn.java:156) [clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at clojure.lang.Var.applyTo(Var.java:700) [clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at clojure.main.main(main.java:37) [clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] Caused by: java.lang.RuntimeException: Mongo setup on ip-10-122-57-2:20000 timed out! [2020/06/03 20:24:22.459] at jepsen.mongodb.core$db$reify__2125.setup_BANG_(core.clj:330) ~[classes/:na] [2020/06/03 20:24:22.459] at jepsen.db$cycle_BANG_.invokeStatic(db.clj:25) ~[jepsen-0.1.8.jar:na] [2020/06/03 20:24:22.459] at jepsen.db$cycle_BANG_.invoke(db.clj:20) ~[jepsen-0.1.8.jar:na] [2020/06/03 20:24:22.459] at clojure.core$partial$fn__4759.invoke(core.clj:2516) ~[clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at jepsen.control$on_nodes$fn__2069.invoke(control.clj:372) ~[jepsen-0.1.8.jar:na] [2020/06/03 20:24:22.459] at clojure.lang.AFn.applyToHelper(AFn.java:154) [clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at clojure.lang.AFn.applyTo(AFn.java:144) [clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at clojure.core$apply.invokeStatic(core.clj:646) ~[clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1881) ~[clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1881) ~[clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at clojure.lang.RestFn.applyTo(RestFn.java:142) [clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at clojure.core$apply.invokeStatic(core.clj:650) ~[clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at clojure.core$bound_fn_STAR_$fn__4671.doInvoke(core.clj:1911) ~[clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at clojure.lang.RestFn.invoke(RestFn.java:408) [clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at jepsen.util$real_pmap$launcher__1160$fn__1161.invoke(util.clj:49) ~[jepsen-0.1.8.jar:na] [2020/06/03 20:24:22.459] at clojure.core$binding_conveyor_fn$fn__4676.invoke(core.clj:1938) ~[clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at clojure.lang.AFn.call(AFn.java:18) [clojure-1.8.0.jar:na] [2020/06/03 20:24:22.459] at 
java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_252] [2020/06/03 20:24:22.459] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_252] [2020/06/03 20:24:22.459] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_252] [2020/06/03 20:24:22.459] at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_252] {noformat} A repeated execution of the [latest failure|https://evergreen.mongodb.com/task/mongodb_mongo_master_ubuntu1804_debug_aubsan_lite_jepsen_smoke_e53293b8749c692ae2abe50ff02f4aee6fea8b84_20_06_30_10_26_41] is green indicating that this is a transient error rather than something linked to the server. This represents a bug in the test or infrastructure rather than in the server. The fixes I see are: # Disable this task # Add retry logic to this task # Dig deeper into the jepsen test itself to figure out why this happens roughly 15% of the time. ",0 +"SERVER-49402","07/09/2020 16:01:18","Misleading error message when connecting to Data Lake","When users connect to Atlas Data Lake (ADL) with the mongo shell, they may sometimes encounter the error: {noformat} *** It looks like this is a MongoDB Atlas cluster. Please ensure that your IP whitelist allows connections from your network. {noformat} This is because ADL URIs have a {{.query.mongodb.net}} suffix - and not a {{mongodb.net}} suffix. We should update https://github.com/mongodb/mongo/blob/43e2423bae07e13cf624b9d5fb74e62bd1959b19/src/mongo/shell/mongo.js#L360-L370 to provide a correct error message for ADL users.",1 +"SERVER-49488","07/13/2020 23:49:32","Mongo shell is conflating authentication & network errors","The mongo shell attempts to authenticate right after it connects to a server and returns [an exception|https://github.com/mongodb/mongo/blob/43e2423bae07e13cf624b9d5fb74e62bd1959b19/src/mongo/shell/mongo.js#L360-L370] if it's unable to. This means that if the client credentials are invalid, it will interpret that as a _connection_ failure and raise an exception. It is unexpected that the shell would conflate an authentication problem with the general class of network connection failures. Here's what clients see when all that's wrong is invalid credentials: {noformat}*** It looks like this is a MongoDB Atlas cluster. Please ensure that your IP whitelist allows connections from your network. {noformat} It's unclear to me if this is expected behavior (it's confusing at best). If so, using a more generic error message instead of specifically offering that clients check their IP allowlist would be less confusing to users. If not, we should fix it.",1 +"SERVER-49716","07/17/2020 22:32:14","""gather_failed_unittests"" does not work on ubuntu1804-build","Observed in https://evergreen.mongodb.com/task/mongodb_mongo_master_enterprise_ubuntu_no_latch_1804_64_bit_unittests_f1c2d6c29d960506c770958ed39ebe0677a3fdda_20_07_15_23_08_41/0.",0 +"SERVER-49764","07/21/2020 16:41:09","Update instructions for running Genny sys-perf patch builds","The instructions in system_perf.yml for genny patch tasks mentions {{--force-workloads}}, which is no longer an option in Genny, we should update the instructions there to the new approach. We may also consider updating genny to reject unknown arguments",1 +"SERVER-49786","07/21/2020 22:17:29","Freeze DSI and Genny for non-master perf projects","PM-1822 will change the interface between DSI and sys-perf evergreen yamls. 
To minimize the risk of breaking non-master branches and the overhead of multiple backports during PM-1822, we wish to ""freeze"" the version of DSI in use on the following projects: # [sys-perf v4.4|https://github.com/mongodb/mongo/blob/v4.4/etc/system_perf.yml] # [perf v4.4|https://github.com/mongodb/mongo/blob/v4.4/etc/perf.yml] # [sys-perf v4.2|https://github.com/mongodb/mongo/blob/v4.2/etc/system_perf.yml] # [perf v4.2|https://github.com/mongodb/mongo/blob/v4.2/etc/perf.yml] # [sys-perf v4.0|https://github.com/mongodb/mongo/blob/v4.0/etc/system_perf.yml] # [perf v4.0|https://github.com/mongodb/mongo/blob/v4.0/etc/perf.yml] # [sys-perf v3.6|https://github.com/mongodb/mongo/blob/v3.6/etc/system_perf.yml] # [perf v3.6|https://github.com/mongodb/mongo/blob/v3.6/etc/perf.yml] To do this we will create a ""legacy"" branch of DSI and modify non-master evergreen yamls to use it. This means that any changes to DSI will not be usable by old perf projects unless those changes are backported. At the end of PM-1822 we could selectively bring the required projects back up to DSI master or leave these projects ""forever frozen"".",1 +"SERVER-49818","07/23/2020 03:09:36","Enterprise Windows required builder no longer runs burn_in_tests as non-required builder","The changes from [9a421e1|https://github.com/mongodb/mongo/commit/9a421e19cef1caa2627d4776db700ae5c8751932] as part of SERVER-46450 reduced the set of Evergreen tasks which run on the ""! Enterprise Windows"" build variant, but did not restore [the {{burn_in_tests_build_variant: enterprise-windows}} setting|https://github.com/mongodb/mongo/commit/deca8251f356292eb1c813b65c4f6ebd458a1094#diff-71ccc9b828b2d68dab47c8be07ab6f96L9137] for it. This greatly limits the ability of patch builds to detect whether a new or modified test is going to fail post-commit on the ""* Enterprise Windows"" build variant.",1 +"SERVER-49945","07/28/2020 13:57:45","Mark mypy.ini file as hidden","It is a small thing, but all the other linter configuration files are hidden files starting with a {{.}}, with the exception of {{mypy.ini}}. A brief glance at the docs suggests that it too could be hidden. Doing so would slightly reduce the clutter at the top of the tree.",1 +"SERVER-50078","08/03/2020 15:55:16","Compile bypass applied when it should not have","See dev-only comment with patch build links, but as far as I understand it, changes to {{SConstruct}} should disable compile bypass. See also EVG-12714.",2 +"SERVER-50133","08/05/2020 22:31:59","Perf YAML Cleanups","Depends on SERVER-49786 From sys-perf yaml: # Remove mark_idle invocations and # Switch run-dsi invocations to use subprocess.exec and no extensions ## kill the {{set}} lines; no dsienv.sh or setup-dsi-env.sh--kill in DSI (kill signal_processing_setup.sh from DSI while you're there) ## never any absolute paths or .py suffixes; always just {{run-dsi command}}; kill bin/analysis.py # Kill ""write yml config"" in favor of expansions.yml # ""deploy cluster"" calls run-dsi deploy-cluster # Do json.send as a post task # Kill useless/constant {{project_dir}}, {{platform}}, {{script_flags}} vars For perf.yml: # call analysis through run-dsi # the killall_mci expansion doesn't exist so kill that from the pre/post steps; make the {{pkill}} scripts not ugly af",2 +"SERVER-50277","08/12/2020 19:32:21","Performance Yaml Cleanups pt 1","A handful of things that will make iteration a bit easier. This is all really hard to do in separate tickets or in a staged way. 
Smaller PRs first wherever possible, but minimize the number of times we backport. Best to just rip off the bandaid. Changes to sys-perf yamls (master and 4.4): # Single f_run_dsi_workload Evergreen function ## Mostly the same logic that's currently in the handful of existing functions, but in a single function. ## Update the param names to match the files e.g. {{cluster}} to {{infrastructure_provisioning}} ## Use conventional module locations where possible ## Use conventional report output locations where possible ## Remove cruft like dsienv.sh; run-dsi invocations are single-line scripts # Change the order of tasks/functions to keep compile and dsi stuff more separated. # Add a genny task that is *not* a generated task. Changes to microbenchmarks yamls (master and 4.4): # Single f_run_microbenchmarks_workload evergreen function for non-genny workloads # Single f_run_genny_workload evergreen function for genny workloads # Both functions to use conventional module locations where possible # Remove extraneous genny invocation--I think we just need the call to {{lamp}} without venv nonsense # Tidy the weird {{pkill}} logic # Conventional locations for genny, DSI, and signal-processing modules # Change the order of tasks/functions to keep compile, dsi, and non-dsi-based-workloads more separated. Changes to DSI: # Update for conventional paths above # Make evergreen-dsitest.yml a representative snapshot of what's in system_perf.yml and, if possible, something similar for perf.yml # Change documentation for how to patch-build without compile Changes to Genny: # Kill ""legacy"" task-gen logic",5 +"SERVER-50313","08/14/2020 13:50:51","Add standalone tasks to live-record buildvariant","all test suites that run on a standalone, except for: * Unittests * non-mongod/s C++ test suites like libfuzzer, snmp * all test suites that run on a replica set, except ones explicitly listed below * all test suites that run on a sharded cluster, except ones explicitly listed below * has requires_fast_memory tag (--excludeWithAnyTags=requires_fast_memory) We will make the following effort to run the above tests with undo. If the following approaches combined don't ensure the suite can run undo, we will modify the scope to exclude the failing test suites: * blacklist problematic tests, up to one time per suite (i.e. all failing tests in one patch build) * Audit tags of failing tests or tags for performance requirements and exclude those tests * increase the election timeout * increase the test and task timeout * reduce the WT cache size * reduce the size of the cluster * turn off continueOnFailure * reduce the number of clients * adjust the data size * run on larger instances, including adding a new EC2 8x instance",2 +"SERVER-50352","08/18/2020 14:21:40","Add understanding of previous syntax for multiversion exclusions","We'd expected that backporting SERVER-48048 immediately would obviate the need to understand the previous yml syntax, but that doesn't seem to be the case; it looks like we use previous release versions in multiversion tests rather than the tips of other branches. We should add back in the logic from earlier CRs of SERVER-48048 to handle this. 
Right now everything in {{etc/backports_required_for_multiversion_tests.yml}} is being unconditionally excluded.",1 +"SERVER-50362","08/18/2020 18:44:38","Add resilience to repeat execution for multiversion tag generation","If a task like {{sharding_multiversion_gen}} is run more than once, the {{generated_resmoke_config}} directory isn't created, so placing a tag file in it fails. It should be relatively safe to no-op if the path we're placing the tag file in doesn't exist; if this happens when it shouldn't, then in the worst case a test will run which should've been excluded and fail, which would let us know something was wrong.",1 +"SERVER-50379","08/19/2020 13:57:28","Reduce frequency of ! and * builders on 4.4","Now that 4.4 has been released, we'd like to adjust the frequency of the hourly builders to run with a higher interval.",1 +"SERVER-50641","08/31/2020 14:17:58","Add more aggressive timeouts to commit queue tasks","There are occasionally issues where the tasks in the commit queue appear to hang. Since the default timeouts are around 2 hours, this can cause large backups in the queue. We could add more aggressive timeouts to the tasks in the queue so that they time out much earlier. The lint tasks already do this and time out around 40 minutes. We should look at the historic runtimes of the tasks to pick appropriate timeouts. ---- As a server engineer, I want hung commit queue tasks to time out earlier, So that the commit queue does not get blocked for a long period of time waiting for a task that is just going to time out anyway. ---- AC: * All tasks in the commit-queue have a timeout of under 1 hour.",1