This FAQ consolidates helpful troubleshooting steps and answers to some common questions that Loggregator has received, so that operators can quickly diagnose Loggregator-related issues.
- TODO: How do I enable syslog forwarding for a job?
- TODO: How can I debug my Loggregator components?
- How do I get etcd data when it is in TLS mode?
- How do I disable UAA for Traffic Controller?
- What do the Doppler properties mean?
- What do the Metron properties mean?
- What do the Traffic Controller properties mean?
- Why is the DEA Logging Agent run as root?
- Why do I get this `can't forward message: loggregator client pool is empty` error?
Loggregator is a complex subcomponent of Cloud Foundry with many components of its own. The sections below describe how to troubleshoot Loggregator in case you are having problems seeing your logs.
Rough thoughts/ideas for further expansion. Topics to expand:
- Datadog
  - visualize metrics
  - Datadog Firehose Nozzle
  - Datadog Config OSS
- Number of connections opened by component

  ```
  lsof -c doppler-
  lsof -c trafficco...
  ```

- Pprof
  - Add SHA or release version from when this feature will be provided

  ```
  curl http://<IP>:{6060|6061}/debug/pprof/
  go tool pprof http://<IP>:{6060|6061}/debug/pprof/heap
  ```

  - Memory dump, goroutine dump, CPU profile
- Goroutine dump (see the sketch after this list)
  - SIGUSR1 signal to process
  - `--debug` flag to the process - not efficient because it requires a process restart
- Calls to CC and UAA are timing out (see the sketch after this list)
  - Check the access log in GoRouter to see if the requests to CC and UAA are making it through. If you don't see them, it could be an IaaS issue. (TODO: provide an AWS example.) Solution: switch from a NAT gateway to a NAT instance in AWS.
- etcd (see the health-check sketch after this list)
  - Check if Dopplers are advertising and Metrons are listening
  - Check the health of the etcd cluster
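
For the goroutine dump note above, a minimal sketch of triggering one with SIGUSR1 from a Doppler VM; the process name matched by `pgrep` and the assumption that the dump is written to the component's log under `/var/vcap/sys/log` are ours, not stated on this page:

```
# ask the running doppler process for a goroutine dump via SIGUSR1
kill -USR1 "$(pgrep doppler)"

# the dump should land in the component's log (path is an assumption)
tail -n 100 /var/vcap/sys/log/doppler/doppler.stdout.log
```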
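
For the CC/UAA timeout note above, a hedged sketch of checking the GoRouter access log; the router job name and log path are assumptions about a typical cf-release deployment:

```
# on a router VM, look for requests routed to CC (api.*) and UAA (uaa.*)
bosh ssh router_z1/0
grep -E 'api\.|uaa\.' /var/vcap/sys/log/gorouter/access.log | tail -n 20
```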
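
For the etcd checks above, a sketch of verifying cluster health with the standard etcd v2 tooling, assuming a non-TLS etcd (see the TLS section below for the certificate flags):

```
# from an etcd VM: check member health and leader statistics
/var/vcap/packages/etcd/etcdctl -C http://<your_etcd_ip>:4001 cluster-health
curl http://<your_etcd_ip>:4001/v2/stats/leader
```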
## How do I get etcd data when it is in TLS mode?

If your CF environment has etcd deployed in TLS mode, you can no longer simply curl the data out. Here are a few steps to get the data out for troubleshooting.
- SSH onto an etcd VM and change into the etcd package directory:

  ```
  bosh ssh etcd_z1/0
  cd /var/vcap/packages/etcd/
  ```

- To get the list of available keys:

  ```
  ./etcdctl \
  --cert-file /var/vcap/jobs/etcd/config/certs/client.crt \
  --key-file /var/vcap/jobs/etcd/config/certs/client.key \
  --ca-file /var/vcap/jobs/etcd/config/certs/server-ca.crt \
  -C https://etcd-z1-0.cf-etcd.service.cf.internal:4001 \
  ls doppler/meta --recursive
  ```

  You should see output similar to the output below:

  ```
  /doppler/meta/z1
  /doppler/meta/z1/doppler_z1
  /doppler/meta/z1/doppler_z1/e27e8ab6-e29c-446d-a0dd-c692c7d16dd1
  /doppler/meta/z1/doppler_z1/63af35d8-d233-422f-a389-e893f4d5b7ee
  /doppler/meta/z1/doppler_z1/3a45b944-24dc-4563-bbae-fc53d5bacc43
  /doppler/meta/z1/doppler_z1/51737ccd-5e14-4439-8dd1-c0e3ce2aca56
  ```
- To get the value of a key:

  ```
  ./etcdctl \
  --cert-file /var/vcap/jobs/etcd/config/certs/client.crt \
  --key-file /var/vcap/jobs/etcd/config/certs/client.key \
  --ca-file /var/vcap/jobs/etcd/config/certs/server-ca.crt \
  -C https://etcd-z1-0.cf-etcd.service.cf.internal:4001 \
  get /doppler/meta/z1/doppler_z1/e27e8ab6-e29c-446d-a0dd-c692c7d16dd1
  ```
Note: the value `https://etcd-z1-0.cf-etcd.service.cf.internal:4001` can be found in the `EtcdUrls` property in the config files, for example `/var/vcap/jobs/doppler/config/doppler.json`.
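
To double-check which etcd URLs a component was rendered with, you can grep its config; the exact JSON layout around `EtcdUrls` is an assumption here:

```
# print the EtcdUrls entry (plus a few surrounding lines) from Doppler's rendered config
grep -A 3 '"EtcdUrls"' /var/vcap/jobs/doppler/config/doppler.json
```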
## How do I disable UAA for Traffic Controller?

Traffic Controller has a property in its spec called `traffic_controller.disable_access_control`. By default this is false. It is not read from a config file but rather passed in as a flag to the Traffic Controller process. See here.

Setting this property to true makes the logAccessAuthorizer and the adminAuthorizer always allow access to the app logs and the firehose. This feature was originally created so that Loggregator could be used in Lattice.
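
As an illustration only, a hypothetical deployment manifest excerpt that sets this property; the surrounding manifest structure is assumed, not taken from this page:

```
# hypothetical BOSH manifest excerpt (YAML)
properties:
  traffic_controller:
    disable_access_control: true
```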
## Why is the DEA Logging Agent run as root?

The DEA Logging Agent runs as root because it needs to read the stdout and stderr unix sockets that Warden creates for the jailed container application.
## Why do I get this `can't forward message: loggregator client pool is empty` error?

This error message shows up in the Metron logs when Metron has no registered Dopplers in its client pool. It could be that Metron or Doppler cannot communicate with their key-value store, etcd.
- Look for the error message `Failed to connect to etcd` in the logs.
- Verify you can access etcd.
- Verify the etcd URLs in the Metron config, `/var/vcap/jobs/metron_agent/config/metron_agent.json`.
- Curl etcd to see if Doppler has advertised itself correctly:
  ```
  # Old Doppler endpoint
  curl http://<your_etcd_ip>:<port/4001>/v2/keys/healthstatus/doppler?recursive=true

  # New Doppler endpoint
  curl http://<your_etcd_ip>:<port/4001>/v2/keys/doppler/meta?recursive=true
  ```
The older endpoint will contain just the Doppler IP. The newer endpoint will contain JSON that may look like this:

```
{ "version": 1, "endpoints": ["udp://<doppler_ip>:<port>", "tls://<doppler_ip>:<port>"] }
```

If you see values being populated in either of the endpoints, then your Doppler and Metron can both see etcd and read/write to it.
- Look at the etcd key that Doppler is advertising. It should have the following structure:

  ```
  # Old
  /healthstatus/doppler/<zone>/<job_name>/<index>
  # New
  /doppler/meta/<zone>/<job_name>/<index>
  ```

  Compare each of these properties to the config within Metron - they should match.
We have come across scenarios where Doppler was in a different zone and was advertising `zone1`, whereas Metron was configured with the property `"Zone": "zone2"`. This makes Metron look for a different key, so it is unable to find the Doppler IP and protocol. A quick comparison is sketched below.
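
A minimal sketch of that comparison, assuming a non-TLS etcd endpoint (use the certificate flags from the TLS section above otherwise):

```
# zone Metron is configured with
grep '"Zone"' /var/vcap/jobs/metron_agent/config/metron_agent.json

# zone in the key Doppler is actually advertising
curl http://<your_etcd_ip>:<port/4001>/v2/keys/doppler/meta?recursive=true
```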
We also came across a situation where etcd got into a weird state and its process needed to be restarted. The tracker story is here and should be resolved. Basically: `killall etcd`.
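
After the kill, a hedged way to confirm the process came back (assuming etcd is monit-managed on the VM, which is standard for BOSH jobs, and a non-TLS endpoint):

```
# monit should restart etcd automatically; confirm it is running and healthy again
sudo /var/vcap/bosh/bin/monit summary
/var/vcap/packages/etcd/etcdctl -C http://<your_etcd_ip>:4001 cluster-health
```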