Beautiful Soup: extracting an XML value with a certain attribute


I am using this script to scrape boardgamegeek.

This works fine and gets the info out of the XML data.

I want to extract one more element to the output csv. This one:

<items>
 <item>
   <link type="boardgamepublisher" id="1001" value="(Web published)"/>
   <link type="boardgamepublisher" id="1341" value="something else"/>
 </item>  
</items>

The ids are random. What I want is the value of every link tag whose type attribute is "boardgamepublisher", added to a CSV field (preferably all publishers in one cell). Sometimes there is one boardgamepublisher, sometimes more. There are a lot of link elements, so I need to filter them by type.
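A minimal sketch of the filtering described above, written with the standard library's xml.etree.ElementTree instead of Beautiful Soup so it runs without extra packages. The XML is the sample from the question plus one made-up non-publisher link to show that the filter skips it; publishers_for_item is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The question's sample, plus one invented non-publisher <link> to show the filter.
SAMPLE = """<items>
 <item>
   <link type="boardgamepublisher" id="1001" value="(Web published)"/>
   <link type="boardgamepublisher" id="1341" value="something else"/>
   <link type="boardgamecategory" id="7" value="not a publisher"/>
 </item>
</items>"""


def publishers_for_item(item):
    """Return the value attribute of every <link type="boardgamepublisher"> child."""
    # ElementTree's limited XPath supports [@attr='value'] predicates.
    return [link.get("value")
            for link in item.findall("link[@type='boardgamepublisher']")]


root = ET.fromstring(SAMPLE)
for item in root.iter("item"):
    # Join all publishers of one item so they can go into a single CSV field.
    print(",".join(publishers_for_item(item)))
```

The ids never appear in the filter, so it does not matter that they are random; only the type attribute is matched.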


Answers

answered 4 mon ago Nivatius #1

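    from bs4 import BeautifulSoup

    # req is the HTTP response fetched by the existing script
    soup = BeautifulSoup(req.content, 'xml')  # the 'xml' parser needs lxml installed
    items = soup.find_all('item')
    for item in items: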

Add this code inside the loop to collect the publisher(s):

        publishers = item.find_all('link', type="boardgamepublisher")  # only matching <link> tags
        gpublishers = ",".join(publisher["value"] for publisher in publishers)  # no trailing comma
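The question asks for all publishers in one CSV cell. A minimal sketch of that step using only the standard library csv module; publishers_cell is a hypothetical helper name, and plain dicts stand in for the Beautiful Soup Tag objects (both support publisher["value"]) so the snippet runs without fetching anything.

```python
import csv
import io


def publishers_cell(publishers, sep=","):
    """Join the value attribute of each publisher tag into one string."""
    # str.join puts sep only between values, so there is no trailing separator.
    return sep.join(publisher["value"] for publisher in publishers)


# Plain dicts stand in for the Tag objects returned by item.find_all(...).
publishers = [{"value": "(Web published)"}, {"value": "something else"}]
cell = publishers_cell(publishers)

# csv.writer quotes any field that contains the delimiter, so the joined
# publishers stay in a single cell. "Example Game" is a hypothetical name.
buf = io.StringIO()
csv.writer(buf).writerow(["Example Game", cell])
print(buf.getvalue(), end="")

# Reading the row back gives two fields: the name and all publishers together.
row = next(csv.reader(io.StringIO(buf.getvalue())))
print(row)
```

When writing to a real file instead of a StringIO, the csv module documentation recommends opening it with newline="".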

The first line returns a list containing

   <link type="boardgamepublisher" id="1001" value="(Web published)"/>

and

   <link type="boardgamepublisher" id="1341" value="something else"/>

publisher["value"] then extracts the content of each tag's value attribute.

I am still looking for better suggestions, because I am afraid this solution will be really slow.
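One possible alternative, if parsing speed or memory does turn out to be a problem: the standard library's xml.etree.ElementTree.iterparse reads the document element by element, so each finished item can be processed and then cleared. This swaps in a different parser rather than Beautiful Soup, and whether it is actually faster for BoardGameGeek responses is untested here. The two-item XML is hypothetical and publisher_cells is a made-up helper name.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical two-item response; the second item and the category link are invented.
XML_BYTES = b"""<items>
  <item id="1">
    <link type="boardgamepublisher" id="1001" value="(Web published)"/>
    <link type="boardgamepublisher" id="1341" value="something else"/>
    <link type="boardgamecategory" id="7" value="not a publisher"/>
  </item>
  <item id="2">
    <link type="boardgamepublisher" id="1001" value="(Web published)"/>
  </item>
</items>"""


def publisher_cells(source):
    """Stream <item> elements and return one comma-joined publisher string per item."""
    cells = []
    # iterparse yields each element once its closing tag has been read,
    # so an <item>'s <link> children are complete at that point.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "item":
            values = [link.get("value")
                      for link in elem.iterfind("link[@type='boardgamepublisher']")]
            cells.append(",".join(values))
            elem.clear()  # drop the finished item's children to limit memory use
    return cells


print(publisher_cells(io.BytesIO(XML_BYTES)))
```

Timing both versions on a real response would show whether the switch is worth it; for a handful of link elements per item, the filtering itself is unlikely to be the bottleneck.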
