Discussion:
FRS Only replicates on inbound connection, no changes go out.
Mike Drechsler - SPAM PROTECTED EMAIL
2005-08-21 14:23:23 UTC
I have a pair of servers in a small network that will only do file
replication in one direction.

Any changes made on the main server are replicating over to the offsite
server but changes made on the offsite server do not make it back to the
main server.

The strange thing is that if you investigate with Sonar, there are no
files backlogged on either server. I installed Ultrasound and the only
things it shows are a few sharing violations on one of the DFS replicas;
all other items appear to be OK. I cleaned up some of the sharing
violations but it did not help. There are 2 other DFS links and the
sysvol that show no errors, but none will send changes back to the main
server. I tried doing a D2 restore on the sysvol on the offsite server
but this did not help. I have restarted both servers many times. Both
are running Windows 2000 Server Std. All patches from Windows Update
are installed and I even tried the hotfix in KB article 815473, but that
didn't fix the replication problem (though I do get the event log
entries for those sharing violations now).

I get an event 13508 for the DFS link that has the sharing violations
but not for sysvol or the other 2 DFS links. Files from the main server
still replicate to the DFS link getting the 13508 event so it doesn't
seem to amount to much. The main server shows the same 13508 for the
same DFS link but nothing else.

Replication between the two servers has worked fine in the past. It
just started doing this around July 27. I recall only two events from
that time. One of the servers ran out of hard drive space, but that has
been fixed and there is plenty of free space on both now. The other was
a power outage, but the UPS has shutdown software running so the
servers would have shut themselves down.
--
WARNING! Email address has been altered for spam resistance.
Please remove the -deletethispart-. section before replying directly.
Mike Drechsler (mike-newsgroup@-deletethispart-.upcraft.com)
garry
2005-08-21 17:52:01 UTC
Hi, run frsdiag and look at the constat txt file to see when they last
joined. A quick win is to delete the connection objects under Sites and
Services on both sides, then run repadmin /kcc on both servers and see if
they have joined correctly. You may have to stop and start ntfrs as well.
Mike Drechsler - SPAM PROTECTED EMAIL
2005-08-22 03:23:09 UTC
That is a good idea, something I had not tried before. I did as you
suggested and waited for the autogenerated connection objects to show up
on the other server. Then I tried creating files in the sysvol folder
of both servers and waiting to see if it would replicate.
Unfortunately, it's still only going in one direction.

Now the interesting bit that doing all this has revealed to me is in the
constat file:
Replica: DOMAIN SYSTEM VOLUME (SYSVOL SHARE) (057026ff-1bcb-4c05-ab8c86d344b069be)
Member: REMOTESERVER  ServiceState: 3 (ACTIVE)  OutLogSeqNum: 182  OutlogCleanup: 182  Delta: 0

Config Flags: Multimaster Online
Root Path   : c:\winnt\sysvol\domain
Staging Path: c:\winnt\sysvol\staging\domain
File Filter : *.tmp, *.bak, ~*
Dir Filter  :

                                                                                  Send           Cleanup     Cos
Partner       I/O  State     Rev  LastJoinTime               OLog State    Leadx  Delta  Trailx  Delta  LMT  Out  Last VVJoin

DOM\MAINSRV$  In   Joined    7    Sun Aug 21, 2005 21:00:27
DOM\MAINSRV$  Out  Unjoined  7    Sun Aug 21, 2005 21:00:27  OLP_INACTIVE  182    0      182     0      0    0    Sun Aug 21, 2005 18:15:01

The OLP_INACTIVE status seems to indicate that the main server is
somehow refusing to accept changes, though I was not able to find any
hints on how to solve this. Perhaps a non-authoritative (D2) restore on
this machine? I had been assuming that this computer was fine since it
was sending out changes properly. Perhaps my logic was reversed and it's
more important to think about which server is not taking the updates.
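
For anyone who wants to check the join states in a report like the one above without reading it by eye, here is a minimal sketch. It assumes each connection row has the shape shown above (partner, direction, state, revision, timestamp). The function name `unjoined_connections` and the regular expression are illustrative only and are not part of FRSDiag.

```python
import re

# Assumed row shape, based on the report quoted above:
#   <partner> <In|Out> <state> <rev> <timestamp and anything after it>
ROW = re.compile(
    r"^(?P<partner>\S+)\s+(?P<direction>In|Out)\s+(?P<state>\w+)\s+(?P<rev>\d+)\s+(?P<rest>.+)$"
)

def unjoined_connections(report: str):
    """Return (partner, direction) for every connection row whose state is not Joined."""
    problems = []
    for line in report.splitlines():
        match = ROW.match(line.strip())
        if match and match["state"] != "Joined":
            problems.append((match["partner"], match["direction"]))
    return problems

# Sample rows copied from the report above; other lines are ignored.
sample = "\n".join([
    r"DOM\MAINSRV$ In Joined 7 Sun Aug 21, 2005 21:00:27",
    r"DOM\MAINSRV$ Out Unjoined 7 Sun Aug 21, 2005 21:00:27",
    "OLP_INACTIVE 182 0 182 0 0 0 Sun Aug 21, 2005",
])
print(unjoined_connections(sample))
```

On the sample rows this flags only the outbound connection to MAINSRV, which matches what the report shows for sysvol.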

Any hints from anyone before I do something crazy?
Mike Drechsler - SPAM PROTECTED EMAIL
2005-08-22 03:26:06 UTC
Oh, one other thing. None of the DFS shares show this unjoined
status; only the sysvol does. All DFS entries show that
both inbound and outbound replication are joined and the status shows
OLP_ELIGIBLE, but all 3 DFS replicas also only replicate changes into the
remote server and not out from it, just like sysvol.
Ace Fekay [MVP]
2005-08-22 04:20:07 UTC
Besides being a possible DNS config issue, this more and more looks like a
possible firewall or VPN issue. As long as the SRV records are available for
each DC, and each DC can resolve the queries for the service locations, then
we can rule DNS out. You can try pointing all the DCs to one DNS server, the
one it's working from, and test that. If it still doesn't work, then we can
rule out DNS issues, unless the routers cannot handle EDNS0, which allows UDP
response traffic greater than 512 bytes. You will have to check the router
vendor's docs on that.
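
As a rough aid for the SRV check described above, the sketch below builds a few of the standard locator record names a domain controller registers and prints an nslookup command for each. The helper name `dc_srv_names` and the domain `domain.local` are placeholders for illustration, and the list is not the complete set of records a DC registers.

```python
def dc_srv_names(domain: str):
    """Build some of the SRV owner names used to locate domain controllers."""
    return [
        f"_ldap._tcp.{domain}",
        f"_ldap._tcp.dc._msdcs.{domain}",
        f"_kerberos._tcp.{domain}",
        f"_kerberos._tcp.dc._msdcs.{domain}",
    ]

# "domain.local" is a placeholder; substitute the real AD DNS domain name.
for name in dc_srv_names("domain.local"):
    print(f"nslookup -type=SRV {name}")
```

Running the printed commands on each DC, against each DNS server in turn, should show whether both servers appear in the answers.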

However, to continue, there was a previous poster I was helping out months
ago with a similar situation. He allowed me to remote into it to work on it,
and we wound up working on it together (over the phone too) for a couple of
days. It racked my brain. Finally I asked about the VPNs and what he may
have changed, if anything. I asked to see the VPN connection properties on
his routers, specifically the MTU settings. It turned out he had recently
upgraded the firmware on one of his VPN routers (I forget the brand name).
The new firmware, for some reason, forced the MTU to 1492.
It would not allow me to change it to 1500, which is the default and where it
should be. The other end was at the default 1500. Usually ADSL will cause a
lower MTU since it uses 8 bytes for header info, but he had T1 lines. It
didn't make sense. Luckily he had saved the previous config file. Once he
restored the old version, everything started working again, replication
errors disappeared and we were all happy! Apparently something was up with
the newer firmware. He wound up replacing the VPN routers with NetScreens,
which are very effective units.
--
Regards,
Ace

Please direct all replies ONLY to the Microsoft public newsgroups
so all can benefit.

This posting is provided "AS-IS" with no warranties or guarantees
and confers no rights.

Ace Fekay, MCSE 2003 & 2000, MCSA 2003 & 2000, MCSE+I, MCT, MVP
Microsoft Windows MVP - Windows Server - Directory Services
Infinite Diversities in Infinite Combinations.
=================================
Mike Drechsler - SPAM PROTECTED EMAIL
2005-08-22 07:44:50 UTC
Another good idea.

I did some MTU tests, messed with the MTU sizes on the routers on either
end, and I'm 99.99% sure that there are no MTU or WAN issues blocking
replication. I changed the MTU values of the tunnels to be manually set
to make sure. I can do RPC communication (view event logs remotely)
without problems on either machine. I can transfer large or small
files without problems. I can do ping tests with the -f (do not
fragment) switch and it correctly reports that the packet requires
fragmenting when it reaches a certain size, with no "gap" where it simply
goes into a request timed out mode (i.e. packet size 1416 works, size
1417 gives "packet requires fragmenting but DF bit set" as it should).
The routers on both ends have no packet filters installed between the
sites; it's wide open between the two for traffic on any port, any
protocol, and any address. Packet loss as measured by ping tests with
1416-byte data sizes shows 0 lost packets and over 1000 received while
transferring an 80GB file from the remote server to the main server.
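
The ping payload sizes above can be turned into a path MTU with simple arithmetic. This is a small sketch, assuming plain IPv4 with no IP options (20-byte header) and the standard 8-byte ICMP echo header. The function name is made up for illustration.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo request header

def path_mtu_from_ping(largest_payload: int) -> int:
    """Convert the largest ping payload that passes with DF set into a path MTU."""
    return largest_payload + IPV4_HEADER + ICMP_HEADER

print(path_mtu_from_ping(1416))  # 1444
```

So a largest working payload of 1416 bytes corresponds to a path MTU of 1444 bytes.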

I let it run for about 90 minutes and did a restart of both servers just
after changing the MTU values on the routers. It is still only
replicating in a single direction. As a further test I created a new
DFS link with some test folders. I threw a few text files onto the
remote server and set it as the master when enabling replication. After
everything settled down the files appeared on the main server as you
would expect, but after this, new files added on the remote server or
changes to existing files are not being replicated to the main server.
Changes on the main server are replicating to the offsite server just
the same as with all the other DFS and sysvol folders. So even a brand new
folder setup exhibits the problem, which means a D4 or D2 restore is not
likely to help me either.
Ace Fekay [MVP]
2005-08-22 10:47:30 UTC
What did you change the MTU to? Are you saying the MTU is set to 1500 on
both sides now? They should be left alone at 1500. If not, LDAP loses its
ability to communicate, even though RPC will work fine.

What sort of line do you have, T1, ADSL or cable?

Can we see an edited ipconfig /all from both DCs please?

Ace
Mike Drechsler - SPAM PROTECTED EMAIL
2005-08-22 15:50:12 UTC
MTU of the ethernet interfaces on the routers is 1500
MTU of the IPSEC tunnels is 1444
It is an ADSL connection but does not use PPPoE.
The best way to test MTU to my knowledge is using ping with the do not
fragment flag set (-f on command line). It should report success for
packet sizes smaller than the MTU (minus size of packet headers) until
you hit the MTU where it should start to warn you that it could not send
the packet because the DF bit was set. I get this behaviour from both
sides of the link. Before changing the MTU setting of the tunnel like
you suggested, there was a point where I was getting "request timed out"
for packet sizes above 1444 when the DF bit was set on the ping packet.
The tunnel MTU was previously set to 1723 before I changed it.
Windows automatic path MTU detection may have been working, because
pings without the DF flag would work at the larger packet sizes before I
made that change. Replication behaviour did not change as a result of
fixing the MTU setting for the tunnel.
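
The ping sweep described above can be automated as a binary search. The sketch below is only an outline: `probe` is a hypothetical callable that would wrap something like `ping -f -l <size> <host>` and return True when a reply comes back, and it is assumed to be monotonic (every size up to some limit works, everything above it fails). A fake probe stands in for the real link here.

```python
from typing import Callable

def largest_df_payload(probe: Callable[[int], bool], low: int = 0, high: int = 1472) -> int:
    """Binary-search the largest payload size for which probe(size) succeeds.

    Assumes probe(low) succeeds and that success is monotonic in size.
    """
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low

def fake_probe(size: int) -> bool:
    """Stand-in for a real DF ping across a tunnel with a 1444-byte MTU."""
    return size + 28 <= 1444  # 20-byte IPv4 header + 8-byte ICMP header

print(largest_df_payload(fake_probe))  # 1416
```

A real probe should also tell "packet needs to be fragmented" apart from "request timed out", since the timeout case is the one that points to a misconfigured MTU somewhere on the path.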


ipconfig /all for main server:

Windows 2000 IP Configuration

Host Name . . . . . . . . . . . . : mainsrv
Primary DNS Suffix . . . . . . . : domain.local
Node Type . . . . . . . . . . . . : Hybrid
IP Routing Enabled. . . . . . . . : No
WINS Proxy Enabled. . . . . . . . : No
DNS Suffix Search List. . . . . . : domain.local

Ethernet adapter Local Area Connection:

Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Compaq NC3163 Fast Ethernet NIC
Physical Address. . . . . . . . . : 00-50-8B-CB-5F-11
DHCP Enabled. . . . . . . . . . . : No
IP Address. . . . . . . . . . . . : 192.168.0.88
Subnet Mask . . . . . . . . . . . : 255.255.255.0
Default Gateway . . . . . . . . . : 192.168.0.2
DNS Servers . . . . . . . . . . . : 127.0.0.1
192.168.42.155
Primary WINS Server . . . . . . . : 192.168.0.88

ipconfig /all for remote server:

Windows 2000 IP Configuration

Host Name . . . . . . . . . . . . : remotesrv
Primary DNS Suffix . . . . . . . : domain.local
Node Type . . . . . . . . . . . . : Hybrid
IP Routing Enabled. . . . . . . . : No
WINS Proxy Enabled. . . . . . . . : No
DNS Suffix Search List. . . . . . : domain.local

Ethernet adapter Local Area Connection:

Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : 3Com EtherLink XL 10/100 PCI TX NIC (3C905B-TX)
Physical Address. . . . . . . . . : 00-50-04-F4-13-BB
DHCP Enabled. . . . . . . . . . . : No
IP Address. . . . . . . . . . . . : 192.168.42.155
Subnet Mask . . . . . . . . . . . : 255.255.255.0
Default Gateway . . . . . . . . . : 192.168.42.1
DNS Servers . . . . . . . . . . . : 127.0.0.1
192.168.0.88
Primary WINS Server . . . . . . . : 192.168.42.155
Secondary WINS Server . . . . . . : 192.168.0.88
Ace Fekay [MVP]
2005-08-23 02:48:32 UTC
It's recommended to change the local loopback address in the DNS server
list to the actual IP address of the server.

As for the MTU, you are correct on how to test it. I'm somewhat surprised
the VPN is set to an MTU as low as 1444. You said the VPN tunnel was
set to 1723? That sounds like a port number rather than an MTU. MTU means
Maximum Transmission Unit, or packet size, and the max is 1500 for TCP/IP.
So I am a little confused on the 1723 part. All in
all, if the MTU is lower than 1500, LDAP communication fails.

Anyway, back to the ADSL connection. If it is not PPPoE, is it a routed
connection, such as what SDSL or T1 uses? What ISP is it? I've seen
replication issues with any sort of ADSL. ADSL requires an 8-byte overhead
for data transmission. By default, the router you are using will drop the MTU
to 1492 for ADSL to work.
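
The MTU figures in this exchange are related by simple subtraction, sketched below. One assumption to flag: the 8-byte figure is taken here to be PPPoE encapsulation (a 6-byte PPPoE header plus a 2-byte PPP protocol field), which is the usual source of a 1492 MTU on DSL links. The function name is made up for illustration.

```python
ETHERNET_MTU = 1500  # standard Ethernet payload size in bytes

def mtu_after_overhead(base_mtu: int, overhead: int) -> int:
    """MTU left for IP packets once encapsulation overhead is subtracted."""
    return base_mtu - overhead

# 8 bytes of overhead (assumed here to be PPPoE) gives the 1492 figure.
print(mtu_after_overhead(ETHERNET_MTU, 8))  # 1492

# The 1444-byte tunnel MTU reported earlier implies 56 bytes of overhead.
print(ETHERNET_MTU - 1444)  # 56
```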

Ace
Mike Drechsler - SPAM PROTECTED EMAIL
2005-08-23 06:04:37 UTC
1500 is not the MTU of TCP/IP. It is the MTU of Ethernet. The IP
protocol can go higher than this on a physical interface that supports
larger packet sizes such as gigabit ethernet. It can also go lower on
interfaces like ATM that have a smaller frame size. Ethernet has just
become so common that you might assume that 1500 is somehow tied to
basic TCP/IP specifications. LDAP will work just fine with MTU values
lower than 1500, unless you are thinking of the problem seen with
Windows 2003 SP1 or the MS05-019 patch, which was resolved by the hotfix
referred to in KB article 898060. That only applies to the 2003
environment under the specific conditions mentioned in the KB article.
We are using Windows 2000 here, so we are not subject to that
particular bug. As long as the MTU is larger than the minimum segment
size and equal to or smaller than the actual MTU of the network path,
things will work.
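
To illustrate the "equal to or smaller than the actual MTU of the network path" condition, here is a minimal sketch of how the largest working packet size could be found by binary search. It does not send any real traffic: `probe` is a hypothetical callback standing in for a DF-flagged ping, and the simulated path MTU of 1444 is just the tunnel value from this thread. The lower bound of 68 is the minimum packet size every IPv4 hop must forward.

```python
from typing import Callable


def find_path_mtu(probe: Callable[[int], bool],
                  low: int = 68, high: int = 1500) -> int:
    """Return the largest size in [low, high] for which probe(size) succeeds.

    Assumes probe is monotonic (every size up to the path MTU passes,
    every larger size fails) and that probe(low) succeeds.
    """
    while low < high:
        mid = (low + high + 1) // 2   # round up so the loop always shrinks
        if probe(mid):
            low = mid                 # mid fits, look for something larger
        else:
            high = mid - 1            # mid is too big
    return low


def make_simulated_probe(path_mtu: int) -> Callable[[int], bool]:
    """Hypothetical stand-in for a DF-flagged ping across the link."""
    return lambda size: size <= path_mtu


if __name__ == "__main__":
    print(find_path_mtu(make_simulated_probe(1444)))
```

In practice the probe would wrap a real ping with the DF bit set, and the sizes would need the header adjustment described earlier in the thread.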

The ADSL connection is routed. The ISP is Telus, specifically it is
their advanced communications division (TAC) or whatever they now call
it. The product is called managed business internet. It is separate
from their consumer service which uses DHCP for address assignment. In
either case of managed or consumer service the MTU available to the
customer is 1500 on the ethernet network interface on the modems. The
internal data transmission structure at Telus uses ATM to encapsulate
the modem traffic to be delivered to an aggregation point where it
enters their Internet backbone. The ATM encapsulation is transparent to
the ethernet link and it does not affect the MTU.

Now, while MTU may not be a problem here, you may have gotten me looking
towards something that may be a problem. While trying to verify that I
could do LDAP queries to the remote server from both sides of the link, I
had trouble using AD Users and Computers from the remote server when I
tried to connect to the main server's domain controller by right-clicking
on the domain name and choosing "Connect to domain controller...". The
error message that came up was something about RPC. I tried it a second
time and this time it worked. It has worked every time since the
initial RPC error. I started digging around and found, when I ran
Windows Update and showed the hidden updates, that there is an update
available for the network card drivers on the main machine. Normally I
would think it's a shot in the dark, but I installed the newest version
and that's where I am now. I can't run any more tests on it right now
but I will follow up a bit later on my current status.
--
WARNING! Email address has been altered for spam resistance.
Please remove the -deletethispart-. section before replying directly.
Mike Drechsler (mike-newsgroup@-deletethispart-.upcraft.com)
Ace Fekay [MVP]
2005-08-23 10:56:55 UTC
Permalink
In news:V%yOe.70894$***@fe11.news.easynews.com,
Mike Drechsler - SPAM PROTECTED EMAIL
Post by Mike Drechsler - SPAM PROTECTED EMAIL
1500 is not the MTU of TCP/IP. It is the MTU of Ethernet. The IP
protocol can go higher than this on a physical interface that supports
larger packet sizes such as gigabit ethernet. It can also go lower on
interfaces like ATM that have a smaller frame size. Ethernet has just
become so common that you might assume that 1500 is somehow tied to
basic TCP/IP specifications. LDAP will work just fine with MTU
values lower than 1500 unless you are thinking about the problem
experienced with Windows 2003 sp1 or the MS05-019 patch resolved in a
hotfix referred to in KB article 898060? This only applies to the
2003 environment under those specific conditions mentioned in the KB
article. We are using Windows 2000 here so will not be subject to
that particular bug. As long as the MTU is larger than the minimum
segment size but equal or smaller than the actual MTU of the network
path, things will work.
I was just trying to point out issues with MTUs and domain communication. If
this is not the cause, we need to look elsewhere.
Post by Mike Drechsler - SPAM PROTECTED EMAIL
The ADSL connection is routed. The ISP is Telus, specifically it is
their advanced communications division (TAC) or whatever they now call
it. The product is called managed business internet. It is separate
from their consumer service which uses DHCP for address assignment. In
either case of managed or consumer service the MTU available to the
customer is 1500 on the ethernet network interface on the modems. The
internal data transmission structure at Telus uses ATM to encapsulate
the modem traffic to be delivered to an aggregation point where it
enters their Internet backbone. The ATM encapsulation is transparent
to the ethernet link and it does not affect the MTU.
That's interesting. Until now I hadn't heard of a routed ADSL connection
that doesn't use PPPoE.
Post by Mike Drechsler - SPAM PROTECTED EMAIL
Now, while MTU may not be a problem here, you may have gotten me
looking towards something that may be a problem. While trying to
verify that I could do LDAP queries to the remote server from both
sides of the link I had troubles using AD Users and Computers from
the remote server when I tried to connect to the mainserver domain
controller by right clicking on the domain name and choosing "Connect
to domain controller...". The error message that came up is
something about RPC. I tried it a second time and this time it
worked. It has worked every time since the initial RPC error. I
started digging around and found when I ran windows update and showed
the hidden updates, that there is an update available for the network
card drivers on the main machine. Normally I would think it's a shot
in the dark but I installed the newest version and that's where I am
now. I can't run anymore tests on it right now but I will follow up
a bit later on my current status.
Keep us informed. Maybe you can try some packet sniffing to see exactly
what's going on.

Ace
Mike Drechsler - SPAM PROTECTED EMAIL
2005-08-23 22:38:11 UTC
Permalink
Still not working. Darn.

Perhaps I will go grab the other server and bring it on site.
Ace Fekay [MVP]
2005-08-24 00:15:57 UTC
Permalink
In news:nzNOe.90680$***@fe11.news.easynews.com,
Mike Drechsler - SPAM PROTECTED EMAIL
Post by Mike Drechsler - SPAM PROTECTED EMAIL
Still not working. Darn.
Perhaps I will go grab the other server and bring it on site.
Sorry to hear that. If you can bring it to the same site, and it does work,
then we'll know where the problem lies. Don't forget to update Sites and
Services (if using Sites) to reflect the new IP of the DC when you bring it
online locally.

Ace
Mike Drechsler - SPAM PROTECTED EMAIL
2005-09-13 08:22:09 UTC
Permalink
Just wanted to follow up on this thread with the eventual "resolution".

I never determined an exact cause for FRS failing other than to blame it
on some bad hardware. At some point the RAID controller went sour and
started intermittently corrupting data read from or written to the
hard drive. All indications were that things were operating normally
other than the FRS replication problems. Then, when creating a zip
backup of a large number of files to move to another machine, we found
the zip file was damaged. Doing it again resulted in another corrupted
zip file, but this time a different file in the archive was corrupted.
On the third attempt the zip file was fine.
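
For anyone wanting to check archives for this kind of damage, here is a minimal sketch using Python's standard `zipfile` module, whose `testzip()` method re-reads every member and checks its CRC. It is not what was used in this thread; the helper names and the in-memory demo archive are invented for illustration, and the "damage" is simulated by flipping one byte.

```python
import io
import zipfile


def first_corrupt_member(zip_source):
    """Return the name of the first member failing its CRC check, or None."""
    with zipfile.ZipFile(zip_source) as archive:
        return archive.testzip()


def build_archive(files):
    """Build an uncompressed zip in memory from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return buf.getvalue()


if __name__ == "__main__":
    good = build_archive({"a.txt": b"hello world",
                          "b.txt": b"replicated data"})

    # Simulate a controller flipping a byte inside the second file's data.
    damaged = bytearray(good)
    damaged[good.find(b"replicated data")] ^= 0xFF

    print("intact archive :", first_corrupt_member(io.BytesIO(good)))
    print("damaged archive:", first_corrupt_member(io.BytesIO(bytes(damaged))))
```

Because the corruption described here was intermittent, a single clean result would not prove much; creating and testing the archive several times, as was done by hand above, is what exposes the pattern.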

I have had to remove the RAID controller and rebuild the software on the
server to get it working. I also found that it had corrupted a table in an
SQL database, which required some manual rebuilding to fix the bad records.

Very ugly.
Ace Fekay [MVP]
2005-09-13 10:24:54 UTC
Permalink