  • Mule
  • MULE-2067

Auto-increment port numbers

Details

  • Type: Improvement
  • Status: Closed
  • Priority: Minor
  • Resolution: Won't Fix or Usage Issue
  • Affects Version/s: 2.0.0-M1
  • Fix Version/s: None
  • Component/s: Core: Configuration
  • Labels:
    None
  • User impact:
    Low
  • Similar Issues:
    None

Description

Sometimes tests seem to fail due to port conflicts. We could avoid this by making port numbers auto-increment. One way to implement this would be to use a variable like "${MULE_SEQUENTIAL_PORT}" in the config and then implement a custom property placeholder handler.

If this were used to set global endpoints in the config these could then be accessed from the Java test code (the test would send to the global name and have no need to know the port number).

We could extend the test framework to restart tests a second time on error. If the port number were taken from a global (thread safe) counter we would get new ports for each test.
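The counter idea above can be sketched in plain Java. This is only an illustration under stated assumptions: the class name SequentialPorts, the start value 61000 and the simple string substitution are all hypothetical, and a real version would hook into whatever property placeholder mechanism the config loader uses.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: hands out a fresh port number for every placeholder occurrence. */
final class SequentialPorts {
    /** Placeholder name taken from the issue description. */
    static final String PLACEHOLDER = "${MULE_SEQUENTIAL_PORT}";

    // Global, thread-safe counter; the start value 61000 is an arbitrary assumption.
    private static final AtomicInteger NEXT = new AtomicInteger(61000);

    private SequentialPorts() {}

    /** Returns the next port number; safe to call from several threads. */
    static int nextPort() {
        return NEXT.getAndIncrement();
    }

    /** Replaces each occurrence of the placeholder with a new port number. */
    static String resolve(String config) {
        StringBuilder out = new StringBuilder();
        int from = 0;
        int at;
        while ((at = config.indexOf(PLACEHOLDER, from)) >= 0) {
            out.append(config, from, at).append(nextPort());
            from = at + PLACEHOLDER.length();
        }
        return out.append(config.substring(from)).toString();
    }
}
```

Resolving the same config text twice yields different ports, which is what a restarted test would rely on. The sketch does not check that the counter stays below 65536 or that the port is actually free.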

Activity

Andrew Perepelytsya added a comment - 23/Jul/07 04:42 PM

Extend it further. Have it check extra MULE_SEQUENTIAL_PORT_START and MULE_SEQUENTIAL_PORT_END variables, and throw a normal error if it still can't bind. Then you can have different groups for port ranges, and it knows when to stop.
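A rough sketch of the bounded range, assuming the start and end values have already been read from wherever MULE_SEQUENTIAL_PORT_START and MULE_SEQUENTIAL_PORT_END would live. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.ServerSocket;

/** Hypothetical helper: binds to the first free port in a fixed range. */
final class PortRangeBinder {
    private PortRangeBinder() {}

    /**
     * Tries each port in [start, end] and returns the first server socket that binds.
     * Throws an ordinary IOException once the range is exhausted.
     */
    static ServerSocket bindFirstFree(int start, int end) throws IOException {
        IOException last = null;
        for (int port = start; port <= end; port++) {
            try {
                return new ServerSocket(port);
            } catch (IOException e) {
                last = e; // port busy (or otherwise unusable); try the next one
            }
        }
        throw new IOException("No free port in range " + start + "-" + end, last);
    }
}
```

Separate test groups could then be given separate ranges, and a range that is fully occupied fails with a clear message instead of looping forever.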

Antoine Borg added a comment - 24/Jul/07 01:17 AM

Rather than restart the tests upon error, why not implement a new ConnectionStrategy that changes the port upon error?

Holger Hoffstaette added a comment - 24/Jul/07 04:39 AM

Here's another radical suggestion: maybe we can find out why the ports "sometimes" clash in the first place? just sayin'..my money is on Occam's Razor which says it's the Mule.

andrew cooke added a comment - 24/Jul/07 09:03 AM

You inspired (provoked?) me to take a closer look at this, Holger. I've added a test to the trunk that shows the address already in use error without using any Mule code. I need to look at something else, but will get back to it later and see if SO_LINGER can make a difference. I also have a longer reply about priorities, pragmatism and hair shirts (you know - the usual), but that can wait too...

Holger Hoffstaette added a comment - 24/Jul/07 01:06 PM

Andrew, google for "Java TIME_WAIT" and/or "java bind address already in use". It is a good mess of a) the OS used b) OS tuning settings (Windows Firewall) c) JVM version d) JVM bugs. Apparently some of the socket option methods do not work, or only sometimes, or only on Windows..or maybe not.
I'll see if I catch anything obvious.

andrew cooke added a comment - 24/Jul/07 01:22 PM

i just googled around and found that SO_REUSEADDR is exposed. that might be sufficient...

andrew cooke added a comment - 24/Jul/07 01:27 PM

PS

  • I should have given credit to Daniel for suggesting the property placeholder approach.
  • Antoine, I hadn't thought of that, but it would mean a common Java interface across all connectors that use IP sockets (which might be a good idea anyway).
andrew cooke added a comment - 24/Jul/07 02:20 PM

I wrote a test (ReuseExperimentMule2067TestCase in tcp tests in trunk) that tries to repeatedly open/close sockets, with various pauses, with and without SO_REUSEADDR. In the run shown below it tries to open/close sockets up to 100 times (stopping earlier if an "address already in use" error occurs). It repeats that 10 times to work out an average "run length" (which will be 100 in the best case and 0 if open/close fails immediately).

The output is:

{console}
[07-24 19:11:04] INFO ReuseExperimentMule2067TestCase [main]: Measuring average run length for 100 repeats without reuse and a pause of 100 ms
[07-24 19:11:05] INFO ReuseExperimentMule2067TestCase [main]: Average run length: 0.0 +/- 0.0
[07-24 19:11:07] INFO ReuseExperimentMule2067TestCase [main]: Measuring average run length for 100 repeats with reuse and a pause of 100 ms
[07-24 19:12:51] INFO ReuseExperimentMule2067TestCase [main]: Average run length: 100.0 +/- 0.0
[07-24 19:12:53] INFO ReuseExperimentMule2067TestCase [main]: Measuring average run length for 100 repeats without reuse and a pause of 10 ms
[07-24 19:12:54] INFO ReuseExperimentMule2067TestCase [main]: Average run length: 0.0 +/- 0.0
[07-24 19:12:56] INFO ReuseExperimentMule2067TestCase [main]: Measuring average run length for 100 repeats with reuse and a pause of 10 ms
[07-24 19:13:13] INFO ReuseExperimentMule2067TestCase [main]: Average run length: 100.0 +/- 0.0
[07-24 19:13:15] INFO ReuseExperimentMule2067TestCase [main]: Measuring average run length for 100 repeats without reuse and a pause of 1 ms
[07-24 19:13:16] INFO ReuseExperimentMule2067TestCase [main]: Average run length: 0.0 +/- 0.0
[07-24 19:13:18] INFO ReuseExperimentMule2067TestCase [main]: Measuring average run length for 100 repeats with reuse and a pause of 1 ms
[07-24 19:13:27] INFO ReuseExperimentMule2067TestCase [main]: Average run length: 100.0 +/- 0.0
{console}

which is exactly what you would expect and suggests we might try setting SO_REUSEADDR in our code.

HOWEVER the output above is extremely rare. Typically I get zeroes everywhere. I have no idea what is happening...
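For readers without access to the trunk, the measurement described above can be approximated as follows. This is a sketch, not the actual ReuseExperimentMule2067TestCase: the class and method names are hypothetical, the real test may also connect clients or differ in other ways, and the "+/-" figure is assumed here to be a population standard deviation.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

/** Hypothetical approximation of the open/close experiment. */
final class ReuseExperimentSketch {
    private ReuseExperimentSketch() {}

    /**
     * Opens and closes a server socket on the port up to 'repeats' times.
     * Returns how many iterations succeeded before the first failure.
     */
    static int runLength(int port, int repeats, boolean reuse, long pauseMs)
            throws InterruptedException {
        for (int i = 0; i < repeats; i++) {
            try (ServerSocket server = new ServerSocket()) {
                server.setReuseAddress(reuse); // has to happen before bind
                server.bind(new InetSocketAddress(port));
            } catch (IOException e) {
                return i; // e.g. "address already in use"
            }
            Thread.sleep(pauseMs);
        }
        return repeats;
    }

    /** Returns {mean, population standard deviation} of the run lengths. */
    static double[] meanAndDeviation(int[] runs) {
        double sum = 0;
        for (int r : runs) sum += r;
        double mean = sum / runs.length;
        double squares = 0;
        for (int r : runs) squares += (r - mean) * (r - mean);
        return new double[] { mean, Math.sqrt(squares / runs.length) };
    }
}
```

Calling runLength ten times per setting and passing the results to meanAndDeviation gives figures in the same "average +/- spread" form as the log output.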

andrew cooke added a comment - 24/Jul/07 02:42 PM

It seems that there's some kind of "longer term fail state" - once a port is associated with an "address already in use" exception it seems to be unusable for much longer than a typical closed socket. The results above used code with a single port for every sample, hence the tendency to have zeroes everywhere.

If we use a different port number for each sample we get much more reasonable results:
Measuring average run length for 100 repeats without reuse and a pause of 100 ms - Average run length: 57.3 +/- 33.15131973240282
Measuring average run length for 100 repeats with reuse and a pause of 100 ms - Average run length: 100.0 +/- 0.0
Measuring average run length for 100 repeats without reuse and a pause of 10 ms - Average run length: 96.8 +/- 7.332121111929359
Measuring average run length for 100 repeats with reuse and a pause of 10 ms - Average run length: 100.0 +/- 0.0
Measuring average run length for 100 repeats without reuse and a pause of 1 ms - Average run length: 75.8 +/- 37.690317058894586
Measuring average run length for 100 repeats with reuse and a pause of 1 ms - Average run length: 100.0 +/- 0.0

While this looks like a complex issue (it relies heavily on C level libraries which are going to vary between OSes), there doesn't seem to be much downside to trying ServerSocket.setReuseAddress(true) on our server sockets.
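One detail worth noting for the setReuseAddress(true) suggestion: the option only has a defined effect if it is set before the socket is bound, so the socket has to be created unbound first. A minimal sketch, with a hypothetical class name:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

/** Hypothetical factory for server sockets with SO_REUSEADDR enabled. */
final class ReusableServerSockets {
    private ReusableServerSockets() {}

    /** Creates a server socket with SO_REUSEADDR set before it is bound. */
    static ServerSocket open(int port) throws IOException {
        ServerSocket server = new ServerSocket(); // created unbound
        try {
            server.setReuseAddress(true);
            server.bind(new InetSocketAddress(port));
            return server;
        } catch (IOException e) {
            server.close(); // do not leak the socket if bind fails
            throw e;
        }
    }
}
```

Using new ServerSocket(port) and calling setReuseAddress afterwards would compile, but by then the bind has already happened.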

andrew cooke added a comment - 25/Jul/07 11:37 AM

Leaving this for now. Here's a summary of my thoughts on the subject ("address already in use"). The causes I've considered include:

  1. The socket is indeed open elsewhere
    1. an unrelated process on the test machine (might be fixed by auto-increment endpoints)
    2. another test (implies tests are running in parallel, which should not be the case)
    3. the same test, with poorly constrained parallel processes (need to look at case by case)
    4. the same test, with poorly implemented parallel processes (ie a bug in Mule rather than the config/test).
  2. The socket is in some intermediate state
    1. successive closing and opening can cause problems (see test code; this should be addressed by the SO_REUSEADDR fix)
      1. that fix doesn't seem to work completely - see MULE-2069
      2. there may be issues with the underlying (OS?) libraries (eg the default state for SO_REUSEADDR is undefined and probably depends on lib implementation).
    2. there may be other related states, as well as TIME_WAIT (eg there seems to be some kind of longer lived "error" state)

One snippet of "evidence" that may suggest the "2..." branch (or an extreme 1.4) is that the failure in MULE-2069 is with the Mule Admin agent, which is (I assume) a singleton.

Andrew Perepelytsya added a comment - 23/Feb/09 11:57 AM

Weighing the pros and cons, I don't think there are enough benefits to justify the implementation effort.

People

  • Assignee:
    Andrew Perepelytsya
    Reporter:
    andrew cooke

Dates

  • Created:
    23/Jul/07 04:33 PM
    Updated:
    23/Feb/09 11:57 AM
    Resolved:
    23/Feb/09 11:57 AM
