Parsing for XML Developers - xFront


Parsing for XML Developers
Roger L. Costello
28 September 2014

Flat XML Document

You might receive an XML document that has no structure. For example, this XML document contains a flat (linear) list of Book data:

    <Books>
        <Title>Parsing Techniques</Title>
        <Author>Dick Grune</Author>
        <Author>Ceriel J.H. Jacobs</Author>
        <Date>2007</Date>
        <ISBN>978-0-387-20248-8</ISBN>
        <Publisher>Springer</Publisher>
        <Title>Introduction to Graph Theory</Title>
        <Author>Richard J. Trudeau</Author>
        <Date>1993</Date>
        <ISBN>0-486-67870-9</ISBN>
        <Publisher>Dover Publications</Publisher>
        <Title>Introduction to Formal Languages</Title>
        <Author>Gyorgy E. Revesz</Author>
        <Date>2012</Date>
        <ISBN>0-486-66697-2</ISBN>
        <Publisher>Dover Publications</Publisher>
    </Books>

Give it structure to facilitate processing

The same data, with structure added:

    <Books>
        <Book>
            <Title>Parsing Techniques</Title>
            <Authors>
                <Author>Dick Grune</Author>
                <Author>Ceriel J.H. Jacobs</Author>
            </Authors>
            <Date>2007</Date>
            <ISBN>978-0-387-20248-8</ISBN>
            <Publisher>Springer</Publisher>
        </Book>
        <Book>
            <Title>Introduction to Graph Theory</Title>
            <Authors>
                <Author>Richard J. Trudeau</Author>
            </Authors>
            <Date>1993</Date>
            <ISBN>0-486-67870-9</ISBN>
            <Publisher>Dover Publications</Publisher>
        </Book>
        <Book>
            <Title>Introduction to Formal Languages</Title>
            <Authors>
                <Author>Gyorgy E. Revesz</Author>
            </Authors>
            <Date>2012</Date>
            <ISBN>0-486-66697-2</ISBN>
            <Publisher>Dover Publications</Publisher>
        </Book>
    </Books>

That's parsing!

Parsing is taking a flat (linear) sequence of items and adding structure so that the result conforms to a grammar.

Parsing

The flat list of Book data goes in; the parse step produces the structured Books document shown above.
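To make "a flat (linear) sequence of items" concrete, here is a minimal Python sketch that reads a flat document into a list of (element name, text) pairs. The element names and the Books root are an assumed reconstruction of the flat document, only the first Book is included to keep it short, and the function name linear_items is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of the flat document (first Book only).
FLAT_XML = """
<Books>
    <Title>Parsing Techniques</Title>
    <Author>Dick Grune</Author>
    <Author>Ceriel J.H. Jacobs</Author>
    <Date>2007</Date>
    <ISBN>978-0-387-20248-8</ISBN>
    <Publisher>Springer</Publisher>
</Books>
"""


def linear_items(xml_text):
    """Return the children of the root as a flat list of (name, text) pairs."""
    root = ET.fromstring(xml_text.strip())
    return [(child.tag, child.text) for child in root]


items = linear_items(FLAT_XML)
for name, text in items:
    print(name, text)
```

Nothing in this list says where one Book ends and the next begins, or that the two Author items belong together. That missing grouping is what parsing adds.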

From the book: Parsing Techniques

Parsing is the process of structuring a linear representation in accordance with a given grammar. The linear representation may be:

  • a flat sequence of XML elements
  • a sentence
  • a computer program
  • a knitting pattern
  • a sequence of geological strata
  • a piece of music
  • actions of ritual behavior

Grammar

A grammar is a succinct description of the structure. Here is a grammar for Books:

    Books → Book+
    Book → Title Authors Date ISBN Publisher
    Authors → Author+
    Title → text
    Author → text
    Date → text
    ISBN → text
    Publisher → text

Parsing

The parser takes the linear representation (the flat list of Book data) and the grammar (the Books grammar above), and produces the structured representation (the Books document with nested Book and Authors elements).

Parsing Techniques

Over the last 50 years many parsing techniques have been created. Some parsing techniques work from the starting grammar rule down to the bottom grammar rules. These are called top-down parsing techniques. Other parsing techniques work from the bottom grammar rules up to the starting grammar rule. These are called bottom-up parsing techniques. The following slides show how to apply a powerful bottom-up parsing technique to the Books example.

What does "powerful" mean?

The previous slide said, "following slides show how to apply a powerful bottom-up parsing technique". "Powerful" means the technique can be used with lots of grammars, i.e., it can be used to generate lots of different structures.

Suppose we were to structure the XML from scratch. We might follow these steps:

Start with Books and a Book, then fill in the first Book one piece at a time: the Title (Parsing Techniques), Authors, the first Author (Dick Grune), the second Author (Ceriel J.H. Jacobs), the Date (2007), the ISBN (978-0-387-20248-8), and the Publisher (Springer). And so forth, filling in the second Book, then the third Book.

Last step: add the last Book's Publisher (Dover Publications).

Alternate view of the steps (a tree view)

The same steps can be drawn as a tree that grows downward from the Books root: a Book node, then its Title, Authors (with one Author node per author), Date, ISBN, and Publisher nodes, and so forth for the second and third Book. The last step adds the last Book's Publisher node.

Terminology: Production Step

Each step is called a production step.
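The production steps can be sketched in Python as follows. This is only an illustration: the dictionary layout of GRAMMAR, the function name produce, the shape of the input record, and the exact order in which the steps are logged are assumptions, not something the slides specify.

```python
import xml.etree.ElementTree as ET

# The Books grammar; "+" marks one-or-more, "text" marks a leaf.
GRAMMAR = {
    "Books": ["Book+"],
    "Book": ["Title", "Authors", "Date", "ISBN", "Publisher"],
    "Authors": ["Author+"],
    "Title": ["text"],
    "Author": ["text"],
    "Date": ["text"],
    "ISBN": ["text"],
    "Publisher": ["text"],
}


def produce(books):
    """Build the Books tree top-down, logging one entry per production step."""
    steps = ["Books"]
    root = ET.Element("Books")
    for book in books:
        book_el = ET.SubElement(root, "Book")
        steps.append("Book")
        for symbol in GRAMMAR["Book"]:
            el = ET.SubElement(book_el, symbol)
            if GRAMMAR[symbol] == ["text"]:
                # Leaf rule such as Title -> text
                el.text = book[symbol]
                steps.append(f"{symbol} = {book[symbol]}")
            else:
                # Authors -> Author+
                steps.append(symbol)
                for name in book["Author"]:
                    ET.SubElement(el, "Author").text = name
                    steps.append(f"Author = {name}")
    return root, steps


# Hypothetical input record for the first Book.
book = {
    "Title": "Parsing Techniques",
    "Author": ["Dick Grune", "Ceriel J.H. Jacobs"],
    "Date": "2007",
    "ISBN": "978-0-387-20248-8",
    "Publisher": "Springer",
}
root, steps = produce([book])
for number, step in enumerate(steps, start=1):
    print(number, step)
```

The last entry in steps is the Publisher of the last Book, which is the step a bottom-up parser has to recognize first.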

Top down

The previous slides showed the generation of the structured XML by starting from the top (root element) and working down to the bottom (leaf nodes).

Bottom-up parsing

In bottom-up parsing we work backward: from the last step to the first step.

Let's begin

One production step must have been the last, and its result must be visible in the linear representation. We recognize the rule Publisher → text in the last Book's Publisher (Dover Publications). This gives us the final step in the production process (and the first step in bottom-up parsing).

Next

We recognize the rule ISBN → text in the linear representation. This gives us the next-to-last step in the production process (and the second step in bottom-up parsing).

Next

We recognize the rule Date → text in the linear representation. This gives us the third step in bottom-up parsing.

Next

We recognize the rule Author → text in the linear representation. This gives us the fourth step in bottom-up parsing.

Next

We recognize the rule Authors → Author+ in the linear representation. This gives us the fifth step in bottom-up parsing.

Next

We recognize the rule Title → text in the linear representation. This gives us the sixth step in bottom-up parsing.

Next

We recognize the rule Book → Title Authors Date ISBN Publisher in the linear representation. This gives us the seventh step in bottom-up parsing.

See the algorithm?

See how we are working backward, from the bottom grammar rules up to the starting grammar rule? In the process we are adding structure to the flat (linear) XML. Neat!

Terminology: Reduction

In bottom-up parsing, a collection of symbols is recognized as derived from a symbol. For example, Title, Authors, Date, ISBN, Publisher is derived from Book:

    Book → Title Authors Date ISBN Publisher

Title, Authors, Date, ISBN, Publisher is reduced to Book. So the bottom-up parsing process is a reduction process.

Build your own bottom-up parser!

You now have enough knowledge to go off and build your own bottom-up parser.

I implemented a bottom-up parser

I used XSLT to implement a bottom-up parser. If you would like to give my implementation a go, here is the XSLT program and a sample flat (linear) input XML document:

http://www.xfront.com/parsing-techniques/bottom-up-parser/bottom-up-parser-for-Books.xsl
http://www.xfront.com/parsing-techniques/bottom-up-parser/Books.xml
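For readers who would rather experiment in Python, here is a minimal sketch of the same idea. It is not the XSLT implementation linked above: the function names, the sample FLAT_XML string (an assumed reconstruction with two Books), and the "->" notation in the logged steps are all hypothetical. It walks the flat elements backward from the last one and reduces whenever a grammar rule is recognized.

```python
import xml.etree.ElementTree as ET

BOOK_RULE = ["Title", "Authors", "Date", "ISBN", "Publisher"]

# Assumed reconstruction of a flat input document (two Books).
FLAT_XML = """
<Books>
    <Title>Parsing Techniques</Title>
    <Author>Dick Grune</Author>
    <Author>Ceriel J.H. Jacobs</Author>
    <Date>2007</Date>
    <ISBN>978-0-387-20248-8</ISBN>
    <Publisher>Springer</Publisher>
    <Title>Introduction to Formal Languages</Title>
    <Author>Gyorgy E. Revesz</Author>
    <Date>2012</Date>
    <ISBN>0-486-66697-2</ISBN>
    <Publisher>Dover Publications</Publisher>
</Books>
"""


def reduce_front(stack, count, name):
    """Replace the first `count` nodes of the stack with one parent node."""
    parent = ET.Element(name)
    parent.extend(stack[:count])
    del stack[:count]
    stack.insert(0, parent)


def parse_bottom_up(flat_root):
    """Reduce the flat children of flat_root to a Books tree, last item first."""
    items = list(flat_root)
    stack = []  # nodes recognized so far, leftmost first
    steps = []  # rules recognized, in the order they were applied
    for i in range(len(items) - 1, -1, -1):
        item = items[i]
        leaf = ET.Element(item.tag)  # recognize a leaf rule such as Publisher -> text
        leaf.text = item.text
        stack.insert(0, leaf)
        steps.append(f"{item.tag} -> text")
        prev_tag = items[i - 1].tag if i > 0 else None
        # Reduce Authors -> Author+ once the whole run of Author items is seen.
        if item.tag == "Author" and prev_tag != "Author":
            run = 0
            while run < len(stack) and stack[run].tag == "Author":
                run += 1
            reduce_front(stack, run, "Authors")
            steps.append("Authors -> Author+")
        # Reduce Book -> Title Authors Date ISBN Publisher.
        if [node.tag for node in stack[:5]] == BOOK_RULE:
            reduce_front(stack, 5, "Book")
            steps.append("Book -> Title Authors Date ISBN Publisher")
    if not stack or any(node.tag != "Book" for node in stack):
        raise ValueError("input does not conform to the Books grammar")
    reduce_front(stack, len(stack), "Books")  # Books -> Book+
    steps.append("Books -> Book+")
    return stack[0], steps


books, steps = parse_bottom_up(ET.fromstring(FLAT_XML.strip()))
print("\n".join(steps))
print(ET.tostring(books, encoding="unicode"))
```

The sketch hard-codes the Books grammar, so it is far less general than a technique that works with lots of grammars; it is meant only to show reductions happening in the backward order described in the slides.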
