Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Won't Fix or Usage Issue
    • Affects Version/s: 2.0.0-M1
    • Fix Version/s: None
    • Component/s: Core: Configuration
    • Labels:
      None
    • User impact:
      Low
    • Similar Issues:
      MULE-7001 Write initial draft of shared ports spec
      MULE-6306 Return generated keys when inserting records in databases with AutoIncrement Ids
      MULE-7122 Fix flaky test LoanBrokerSyncTestCase
      MULE-4587 Default port for FTP
      MULE-1915 Magic numbers in TransportFactory
      MULE-1263 XFire binding to port
      MULE-3517 Port examples
      MULE-1320 Retain version numbers of downloaded jars
      MULE-2827 Host & port set as individual parameters are ignored by RmiRegistryAgent
      MULE-7570 Build Number is not displayed at startup and is not present in MANIFEST files

      Description

      Sometimes tests seem to fail due to port conflicts. We could avoid this by making port numbers auto-increment. One way to implement this would be to use a variable like "${MULE_SEQUENTIAL_PORT}" in the config and then implement a custom property placeholder handler.

      If this were used to set global endpoints in the config these could then be accessed from the Java test code (the test would send to the global name and have no need to know the port number).

      We could extend the test framework to re-run tests a second time on error. If the port number were taken from a global (thread-safe) counter, we would get new ports for each test.
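
      The counter idea above could be sketched as follows. This is a hypothetical illustration, not an actual Mule class: a custom property placeholder handler would resolve "${MULE_SEQUENTIAL_PORT}" by calling something like it.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a global, thread-safe port counter. A custom
// property placeholder handler could call next() whenever it sees
// "${MULE_SEQUENTIAL_PORT}" in a config file, so a re-run test never
// reuses the port that just failed. Class and method names are
// illustrative, not real Mule APIs.
public class SequentialPortFactory {
    // Start in a high range to stay clear of well-known ports.
    private static final AtomicInteger nextPort = new AtomicInteger(50000);

    public static int next() {
        return nextPort.getAndIncrement();
    }

    public static void main(String[] args) {
        System.out.println(next()); // first call prints 50000
        System.out.println(next()); // second call prints 50001
    }
}
```

      Because getAndIncrement is atomic, concurrent tests each see a distinct port without any extra locking.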

        Activity

        andrew cooke added a comment -

        PS

        • I should have given credit to Daniel for suggesting the property placeholder approach.
        • Antoine, I hadn't thought of that, but it would mean a common Java interface across all connectors that use IP sockets (which might be a good idea anyway).
        andrew cooke added a comment -

        I wrote a test (ReuseExperimentMule2067TestCase in tcp tests in trunk) that tries to repeatedly open/close sockets, with various pauses, with and without SO_REUSEADDR. In the run shown below it tries to open/close sockets up to 100 times (stopping earlier if an "address already in use" error occurs). It repeats that 10 times to work out an average "run length" (which will be 100 in the best case and 0 if open/close fails immediately).

        The output is:

        {console}
        [07-24 19:11:04] INFO ReuseExperimentMule2067TestCase [main]: Measuring average run length for 100 repeats without reuse and a pause of 100 ms
        [07-24 19:11:05] INFO ReuseExperimentMule2067TestCase [main]: Average run length: 0.0 +/- 0.0
        [07-24 19:11:07] INFO ReuseExperimentMule2067TestCase [main]: Measuring average run length for 100 repeats with reuse and a pause of 100 ms
        [07-24 19:12:51] INFO ReuseExperimentMule2067TestCase [main]: Average run length: 100.0 +/- 0.0
        [07-24 19:12:53] INFO ReuseExperimentMule2067TestCase [main]: Measuring average run length for 100 repeats without reuse and a pause of 10 ms
        [07-24 19:12:54] INFO ReuseExperimentMule2067TestCase [main]: Average run length: 0.0 +/- 0.0
        [07-24 19:12:56] INFO ReuseExperimentMule2067TestCase [main]: Measuring average run length for 100 repeats with reuse and a pause of 10 ms
        [07-24 19:13:13] INFO ReuseExperimentMule2067TestCase [main]: Average run length: 100.0 +/- 0.0
        [07-24 19:13:15] INFO ReuseExperimentMule2067TestCase [main]: Measuring average run length for 100 repeats without reuse and a pause of 1 ms
        [07-24 19:13:16] INFO ReuseExperimentMule2067TestCase [main]: Average run length: 0.0 +/- 0.0
        [07-24 19:13:18] INFO ReuseExperimentMule2067TestCase [main]: Measuring average run length for 100 repeats with reuse and a pause of 1 ms
        [07-24 19:13:27] INFO ReuseExperimentMule2067TestCase [main]: Average run length: 100.0 +/- 0.0
        {console}

        which is exactly what you would expect, and suggests we might try setting SO_REUSEADDR in our code.

        HOWEVER the output above is extremely rare. Typically I get zeroes everywhere. I have no idea what is happening...
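
        For reference, the experiment can be sketched roughly as follows. This is an assumed reconstruction, not the actual ReuseExperimentMule2067TestCase, and the port number is arbitrary.

```java
import java.net.BindException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Assumed reconstruction of the experiment described above: bind and
// close one port repeatedly and count how many iterations succeed
// before an "address already in use" error.
public class ReuseSketch {

    static int runLength(int port, boolean reuse, long pauseMs, int maxRepeats)
            throws Exception {
        for (int i = 0; i < maxRepeats; i++) {
            try (ServerSocket socket = new ServerSocket()) { // create unbound
                socket.setReuseAddress(reuse); // must be set before bind()
                socket.bind(new InetSocketAddress(port));
            } catch (BindException e) {
                return i; // stop at the first conflict
            }
            Thread.sleep(pauseMs);
        }
        return maxRepeats;
    }

    public static void main(String[] args) throws Exception {
        // With reuse enabled, the full run length is typically reached.
        System.out.println(runLength(57657, true, 1, 20));
    }
}
```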

        andrew cooke added a comment -

        It seems that there's some kind of "longer term fail state" - once a port is associated with an "already in use exception" it seems to be unusable for much longer than a typical closed socket. The results above used code with a single socket, hence the tendency to have zeroes everywhere.

        If we use a different port number for each sample we get much more reasonable results:
        Measuring average run length for 100 repeats without reuse and a pause of 100 ms - Average run length: 57.3 +/- 33.15131973240282
        Measuring average run length for 100 repeats with reuse and a pause of 100 ms - Average run length: 100.0 +/- 0.0
        Measuring average run length for 100 repeats without reuse and a pause of 10 ms - Average run length: 96.8 +/- 7.332121111929359
        Measuring average run length for 100 repeats with reuse and a pause of 10 ms - Average run length: 100.0 +/- 0.0
        Measuring average run length for 100 repeats without reuse and a pause of 1 ms - Average run length: 75.8 +/- 37.690317058894586
        Measuring average run length for 100 repeats with reuse and a pause of 1 ms - Average run length: 100.0 +/- 0.0

        While this looks like a complex issue (it relies heavily on C-level libraries, which will vary between OSes), there doesn't seem to be much downside to trying ServerSocket.setReuseAddress(true) on our server sockets.
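
        As a minimal illustration of that change (standard java.net API; note that the flag only has an effect if set before binding):

```java
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Enable SO_REUSEADDR on a server socket. The flag must be set on an
// unbound socket, before bind(); setting it afterwards has no effect
// on the already-bound socket.
public class ReuseAddressExample {
    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket();     // create unbound
        server.setReuseAddress(true);                 // before bind()
        server.bind(new InetSocketAddress(0));        // port 0 = any free port
        System.out.println(server.getReuseAddress()); // prints "true"
        server.close();
    }
}
```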

        andrew cooke added a comment -

        Leaving this for now. Here's a summary of my thoughts on the subject ("address already in use"). The causes I've considered include:

        1. The socket is indeed open elsewhere
          1. an unrelated process on the test machine (might be fixed by auto-increment endpoints)
          2. another test (implies tests are running in parallel, which should not be the case)
          3. the same test, with poorly constrained parallel processes (need to look at case by case)
          4. the same test, with poorly implemented parallel processes (ie a bug in Mule rather than the config/test).
        2. The socket is in some intermediate state
          1. successive closing and opening can cause problems (see test code; this should be addressed by the SO_REUSEADDR fix)
            1. that fix doesn't seem to work completely - see MULE-2069
            2. there may be issues with the underlying (OS?) libraries (eg the default state for SO_REUSEADDR is undefined and probably depends on lib implementation).
          2. there may be other related states, as well as TIME_WAIT (eg seems to be some kind of longer lived "error" state)

        One snippet of "evidence" that may suggest the "2..." branch (or an extreme 1.4) is that the failure in MULE-2069 is with the Mule Admin agent, which is (I assume) a singleton.

        Andrew Perepelytsya added a comment -

        Weighing the pros and cons, I don't think there are enough benefits to justify the implementation effort.


          People

          • Assignee:
            Andrew Perepelytsya
            Reporter:
            andrew cooke
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development