analitics

Pages

Showing posts with label xml. Show all posts
Showing posts with label xml. Show all posts

Wednesday, April 5, 2017

How to parse the OPML file.

For example, the Feedly (stylized as Feedly) is a news aggregator application for various web browsers and mobile devices can let you export and import the OPML file.

What is XML?
The Extensible Markup Language (XML) is a markup language much like HTML or SGML. This is recommended by the World Wide Web Consortium and available as an open standard.

Today I will show you how to parse the OPML file type with python 2.7 version and XML python module.
This is the source script:
from xml.etree import ElementTree
import sys

file_opml = sys.argv[1]
def extract_rss_urls_from_opml(filename):
    urls = []
    with open(filename, 'rt') as f:
        tree = ElementTree.parse(f)
    for node in tree.findall('.//outline'):
        url = node.attrib.get('xmlUrl')
        if url:
            urls.append(url)
    return urls
urls = extract_rss_urls_from_opml(file_opml)
print urls
The result is a list with all your RSS links.

Friday, February 14, 2014

Parsing feeds - get by attribute and value - part 2

Most developers use REST services or other data feeds that move data using XML.
This is a simple script to read online the xml file.
I used minidom but you can also use etree with ElementTree or cElementTree from etree.
I don't know if the ElementTree or cElementTree are more faster like minidom.
The script use urllib2 to open the file.
The file will show us the currency from each country.
The main goal of this script is : how to deal with attribute and value from xml files.
You can also see first part of this issue.
The structure of the xml file has also some attributes - currency.
Basicaly is something like this :
<!--xml version="1.0" encoding="UTF-8"?-->
-<dataset xsi:schemaLocation="http://www.bnr.ro/xsd nbrfxrates.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.bnr.ro/xsd">
-<header>
<publisher>National Bank of Romania</Publisher>
<publishingdate>2014-02-14</PublishingDate>
<messagetype>DR</MessageType>
</Header>
-<body>
<subject>Reference rates</Subject>
<origcurrency>RON</OrigCurrency>
-<cube date="2014-02-14">
<rate currency="AED">0.8909</Rate>
<rate currency="AUD">2.9529</Rate>
<rate currency="BGN">2.2913</Rate>
...
Now let see the script :
from xml.dom import minidom as dom
import urllib2

def fetchPage(url):
    a = urllib2.urlopen(url)
    return ''.join(a.readlines())

def extract(page):
    a = dom.parseString(page)

    item = a.getElementsByTagName('Rate')

    for i in item:
        if i.hasChildNodes() == True:
                print i.getAttribute('currency')+"-"+ i.firstChild.nodeValue

if __name__=='__main__':
    page = fetchPage("http://www.bnro.ro/nbrfxrates.xml")
    extract(page)
and the output is this :
AED-0.8909
AUD-2.9529
BGN-2.2913
BRL-1.3665
CAD-2.9879
CHF-3.6655
CNY-0.5394
CZK-0.1636
DKK-0.6005
EGP-0.4701
EUR-4.4813
GBP-5.4630
HUF-1.4517
INR-0.0527
JPY-3.2148
KRW-0.3078
MDL-0.2434
MXN-0.2467
NOK-0.5365
NZD-2.7388
PLN-1.0786
RSD-0.0387
RUB-0.0932
SEK-0.5074
TRY-1.4950
UAH-0.3865
USD-3.2721
XAU-137.6798
XDR-5.0505
ZAR-0.2981

Thursday, February 4, 2010

Parsing feeds - part 1

From time to time I used conky. Is good for me, because i have all i need on my desktop.
How helped me python in this case?
For example i use one script to parse a feed from this url:
"http://www.bnro.ro/nbrfxrates.xml"
The example is simple to understand :
from xml.dom import minidom as dom
import urllib
def fetchPage(url):
a = urllib.urlopen(url)
return ''.join(a.readlines())

def extract(webpage):
a = dom.parseString(webpage)
item2 = a.getElementsByTagName('SendingDate')[0].firstChild.wholeText
print "DATA ",item2
item = a.getElementsByTagName('Cube')
for i in item:
if i.hasChildNodes() == True:
eur = i.getElementsByTagName('Rate')[10].firstChild.wholeText
dol = i.getElementsByTagName('Rate')[26].firstChild.wholeText
print "EURO  ",eur
print "DOLAR ",dol

if __name__=='__main__':
webpage = fetchPage("http://www.bnro.ro/nbrfxrates.xml")
extract(webpage)
The result is:
$python xmlparse.py
DATA  2010-02-04
EURO   4.1214
DOLAR  2.9749
With "urllib" package I read the url.
The result is parsing with functions from "dom" package.
I used this functions "parseString" and "getElementsByTagName".
More about this functions you will see on:
http://docs.python.org/library/xml.dom.minidom.html
This is all.

Friday, February 27, 2009

Python and xml

Python is an excellent choice for writing programs that process XML data.
XML's is a simplified subset of the Standard Generalized Markup Language (SGML).
He set of tools helps developers in creating web pages.
This is one simple example which I use with "conky":

from xml.dom import minidom as dom
import urllib2
def fetchPage(url):
a = urllib2.urlopen(url)
return ''.join(a.readlines())
def extract(page):
a = dom.parseString(page)
item2 = a.getElementsByTagName('SendingDate')[0].firstChild.wholeText
print "DATA ",item2
item = a.getElementsByTagName('Cube')
for i in item:
if i.hasChildNodes() == True:
e = i.getElementsByTagName('Rate')[8].firstChild.wholeText
d = i.getElementsByTagName('Rate')[18].firstChild.wholeText
print "EURO ",e
print "DOLAR ",d

if __name__=='__main__':
page = fetchPage("http://www.bnro.ro/nbrfxrates.xml")
extract(page)