Experiences with Tracing Causality in Networked Services
Experiences with Tracing Causality in Networked Services Rodrigo Fonseca, Brown Michael Freedman, Princeton George Porter, UCSD April 2010 INM/WREN San Jose, CA Which way to Bangalore? Troubleshooting Networked Systems Hard to develop, debug, deploy, troubleshoot No standard way to integrate debugging, monitoring, diagnostics Status quo: device centric
statement: select oid... Status quo: device centric Determining paths: Join logs on time and ad-hoc identifiers Relies on well synchronized clocks extensive application knowledge Requires all operations logged to guarantee complete paths This talk Causality Tracking: an alternative Many previous frameworks: X-Trace, PIP, Whodunit, Magpie, Googles Dapper
Experiences integrating and using X-Trace Outline Tracing causality with X-Trace Case studies 802.1X Authentication Service CoralCDN and OASIS anycast service Challenges Conclusion Outline Tracing causality with X-Trace Case studies 802.1X Authentication Service CoralCDN and OASIS anycast service
Challenges Conclusion X-Trace X-Trace records events in a distributed execution and their causal relationship Events are grouped into tasks Well defined starting event and all that is causally related Each event generates a report, binding it to one or more preceding events Captures full happens-before relation X-Trace Output HTTP Client
HTTP Proxy TCP 1 Start IP TCP 1 End IP Router IP HTTP Server
TCP 2 Start IP TCP 2 End IP Router Task graph capturing task execution Nodes: events across layers, devices Edges: causal relations between events IP Router
IP Basic Mechanism a g [T, a] HTTP Client HTTP Proxy [T, a]
b TCP 1 End d IP TCP 2 Start i e IP Router HTTP
Server h f TCP 1 Start c n [T, g] IP IP
X-Trace Report TaskID: T EventID: g j kf Edge: from a, IP Router IP Router m TCP 2 End
Outline Tracing causality with X-Trace Case studies 802.1X Authentication Service CoralCDN and OASIS anycast service Challenges Conclusion 802.1X Authentication Service Authenticator e.g. Acc. Point Client EAP L2
RADIUS Over UDP Auth Server RADIUS Identity Store e.g. LDAP LDAP Identified 5 common authentication issues from vendor logs Added a few X-Trace instrumentation points sufficient to differentiate these faults Introduced faults in a test environment
802.1X Authentication Service Instrumentation was easy: Nested invocations No in-task concurrency Extensible protocols (RADIUS, LDAP) Modular, request-oriented server software 802.1X Example Faults Misconfigured Firewall: no LDAP Miscalibrated Timeout Value Key: multiple correlated vantage points Can help tune timeout values CoralCDN and OASIS Instrumented production deployment Heavy use of sampling: 0.1% of requests to CoralCDN traced
Leveraged libasync, libarpc X-Trace instrumentation Much more complex program flow E.g. windowed parallel RPC calls, variable timeouts Found bugs, performance problems, clock skews CoralCDN CoralCDN Distributed HTTP Cache CoralCDN Response Times 189s: Linux TCP Timeout connecting
to origin Slow connection Proxy -> Client Slow connection Origin -> Proxy Timeout in RPC, due to slow Planetlab node!
189 seconds Same symptoms, very different causes B/ s K 1 /s B K 10 /s KB 0 10 /s B 1M
Outline Brief X-Trace Intro Case studies 802.1X Authentication Service CoralCDN OASIS Anycast Service Challenges Conclusion Hidden Channels Example: CoralCDN DNS Calls DNS Resolver Tasks A
resolve(foo,*) B Send C foo DNS resolve * Receive In general: deferral structures E.g., queues, thread pools, continuations
Store metadata with the structure Often encapsulated in libraries, high leverage Incidental vs. Semantic Concurrency Forks and joins tricky for nave instrumentation Non-intuitive fork Incorrect join Incidental vs. Semantic Concurrency Extra code annotation fixes the problem Manually change parent of do() events Manually add edges from done() to end Dealing with Black Boxes client
proxy server Ideal scenario: all components instrumented with X-Trace Log all events Dealing with Black Boxes client proxy server Gray-box proxy: passes X-Trace metadata on Log events on the client and server Layering does this automatically
Dealing with Black Boxes client proxy server Black box proxy: drops X-Trace metadata No X-Trace events on proxy or server Can always trace around black box, in client Outline Brief X-Trace Intro Case studies 802.1X Authentication Service CoralCDN OASIS Anycast Service
Challenges Conclusion Revisiting Troubleshooting Device-centric Logs Depends on well syncd clocks Joins on ad-hoc identifiers Needs all ops logged for complete traces No modifications to existing code Task-centric traces Does not depend on clocks (can actually fix
them) Deterministic joins on standardized ids Sample-based tracing possible Requires instrumentation X-Trace Instrumentation Instrumenting is easy in most cases A few key libraries go a long way Can be done iteratively Refining expectations (a la Pip) Partial annotation still useful Independent instrumentation feasible Huge benefits Conclusions
Simple, uniform task graphs useful in debugging, troubleshooting, diagnostics Instrumentation is feasible Causal tracing should be a first-class concept in networked systems Thank you More details on paper For more info: www.x-trace.net www.coralcdn.org
New Jersey. Massachusetts. Connecticut. How did you do and Section 2! Well, how did you do at identifying the states?? Are you ready for the next part…Capitals and Abbreviations!! The following slides will have a state name on them.
Variable Frequency AC Source (VFACS) Capable of delivering 208 Vrms and 5 A (35.5V rms for initial testing) ... Use IGBT pairs and DC rails to amplify PWM signal. ... IRF520 MOSFET* FMG2G75US60 IGBT Pair with anti-parallel diodes.
Se organizan las "Brigadas mixtas". Milicias Brigadas Internacionales Partidos y fuerzas políticas Zona rebelde. Zona republicana. Zona rebelde Falange Española y de las JONS: surge de la fusión en 1934 de la Falange Española fundada por José Antonio Primo de...
Katalyzované reakce Kinetika katalyzovaných reakcí, vliv vnitřní difuze na selektivitu Heterogenní katalýza Chemisorpce Reakce látky s aktivním místem na povrchu katalyzátoru Adsorpční izoterma Závislost rovnovážného množství adsorbované látky na složení směsi Langmuirova izoterma adsorpce v jediné vrstvě stupeň pokrytí povrchu...
CORPORATISM (mixed autonomy): fewer groups compete, usually one for each interest sector (ex: labor, agriculture, management) Group monopoly over its sector is officially approved by state State Corporatism. Interest groups take the lead and dominate state Societal Corporatism or Neo-Corporatism
Criterion-Related Validity: Predictive Validity. Test scores are related to a criterion. Criterion is behavior that test scores are used to predict. Correlation coefficient X is test score and Y is criterion score. predictive or concurrent validity estimate is established by...
to explore ways and means for greater collaboration and coordination between ITU and relevant organizations2 involved in the development of IP-based networks and the future Internet, through cooperation agreements, as appropriate, in order to increase the role of ITU in...
Ready to download the document? Go ahead and hit continue!