python-catalin: 03/06/17

Monday, March 6, 2017

The pattern python module - part 001.

This is a very short presentation of the pattern Python module.
The module is packed with options and features.
I will try to show the parts that are useful for most Python users.
About the pattern Python module:
Pattern is a web mining module for the Python programming language.
It has tools for data mining (Google, Twitter and Wikipedia API, a web crawler, a HTML DOM parser), natural language processing (part-of-speech taggers, n-gram search, sentiment analysis, WordNet), machine learning (vector space model, clustering, SVM), network analysis and visualization.
Pattern developer documentation

Module	Functionality
pattern.web	Asynchronous requests, web services, web crawler, HTML DOM parser.
pattern.db	Wrappers for databases (MySQL, SQLite) and CSV-files.
pattern.text	Base classes for parsers, parse trees and sentiment analysis.
pattern.search	Pattern matching algorithm for parsed text (syntax & semantics).
pattern.vector	Vector space model, clustering, classification.
pattern.graph	Graph analysis & visualization.

I used it on Fedora Linux; here is the installation of this Python module:

[root@localhost ~]# pip install pattern
Collecting pattern
  Downloading pattern-2.6.zip (24.6MB)
    100% |████████████████████████████████| 24.6MB 61kB/s 
Installing collected packages: pattern
  Running setup.py install for pattern ... done
Successfully installed pattern-2.6

Frequently used single character variable names:

Variable	Meaning	Example
a	array, all	a = [normalize(w) for w in words]
b	boolean	while b is False:
d	distance, document	d = distance(v1, v2)
e	element	e = html.find('#nav')
f	file, filter, function	f = open('data.csv', 'r')
i	index	for i in range(len(matrix)):
j	index	for j in range(len(matrix[i])):
k	key	for k in vector.keys():
n	list length	n = len(a)
p	parser, pattern	p = pattern.search.compile('NN')
q	query	for r in twitter.search(q):
r	result, row	for r in csv('data.csv'):
s	string	s = s.decode('utf-8').strip()
t	time	t = time.time() - t0
v	value, vector	for k, v in vector.items():
w	word	for i, w in enumerate(sentence.words):
x	horizontal position	node.x = 0
y	vertical position	node.y = 0

Pattern contains part-of-speech taggers for a number of languages (including English, Spanish, German, French and Dutch). Part-of-speech tagging is useful in many data mining tasks. A part-of-speech tagger takes a string of text and identifies the sentences and the words in the text along with their word type.

Language	Code	Speakers	Example countries
Spanish	es	350M	Argentina (40), Colombia (40), Mexico (100), Spain (45)
English	en	340M	Canada (30), United Kingdom (60), United States (300)
German	de	100M	Austria (10), Germany (80), Switzerland (7)
French	fr	70M	France (65), Côte d'Ivoire (20)
Italian	it	60M	Italy (60)
Dutch	nl	27M	The Netherlands (25), Belgium (6), Suriname (1)

import pattern.en
import pattern.es
import pattern.nl
import pattern.de
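To make the idea of part-of-speech tagging concrete, here is a minimal sketch of a toy dictionary-lookup tagger written with the standard library only. It is not how Pattern works internally and it does not call Pattern at all; the `LEXICON` table and the function names `split_sentences`, `tag_sentence` and `tag_text` are hypothetical names made up for this illustration. It only shows the shape of the result: sentences, split into words, each paired with a word type (Penn Treebank style tags such as NN for noun).

```python
import re

# Hypothetical mini-lexicon: lowercase word -> Penn Treebank style tag.
# A real tagger uses a large lexicon plus contextual rules.
LEXICON = {
    "the": "DT", "a": "DT",
    "cat": "NN", "mat": "NN", "dog": "NN",
    "black": "JJ",
    "sat": "VBD", "barked": "VBD",
    "on": "IN",
}

def split_sentences(text):
    """Split text after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tag_sentence(sentence):
    """Return (word, tag) pairs; unknown words default to NN (noun)."""
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    tagged = []
    for w in tokens:
        if w[0].isalnum():
            tagged.append((w, LEXICON.get(w.lower(), "NN")))
        else:
            # Simplification: punctuation is tagged as itself.
            tagged.append((w, w))
    return tagged

def tag_text(text):
    """Tag every sentence in the text."""
    return [tag_sentence(s) for s in split_sentences(text)]

result = tag_text("The black cat sat on the mat. A dog barked!")
for sentence in result:
    print(sentence)
```

A real tagger resolves ambiguous words from their context, which a plain lookup like this cannot do.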

The pattern.web module can talk to many websites and web services, for example:

from pattern.web import Wikipedia
from pattern.web import Yahoo
from pattern.web import Twitter
from pattern.web import Facebook
from pattern.web import Flickr
from pattern.web import GMAIL
from pattern.web import GOOGLE

Now, about pattern.db.
The pattern.db module contains wrappers for databases (SQLite, MySQL), Unicode CSV files and Python's datetime. It offers a convenient way to work with tabular data, for example retrieved with the pattern.web module.

from pattern.db import Database, field, pk, STRING, DATE, NOW

# Create (or open) an SQLite database named 'people'.
db = Database('people')
# Define a table with a primary key and four fields.
db.create('area_people', fields=(
    pk(),
    field('name', STRING(80), index=True),
    field('type', STRING(20)),
    field('date_birth', DATE, default=None),
    field('date_created', DATE, default=NOW)
))
# append() returns the id of the new row (1 here).
db.area_people.append(name=u'George', type='male')
print(db.area_people.rows()[0])
# (1, u'George', u'male', None, Date('2017-03-06 22:38:13'))
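
For comparison, here is a rough sketch of the same table and insert written by hand with the standard-library sqlite3 module. It is only meant to show the kind of SQL a wrapper like pattern.db spares you from writing; the SQL column types, the index name and the in-memory database are my own assumptions, not what Pattern generates.

```python
import sqlite3

# In-memory database, so the sketch leaves no file behind (assumption:
# the example above writes to a database file instead).
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE area_people (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name VARCHAR(80),
        type VARCHAR(20),
        date_birth TIMESTAMP DEFAULT NULL,
        date_created TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    )
""")
# Counterpart of index=True on the 'name' field.
con.execute("CREATE INDEX area_people_name ON area_people (name)")

# Insert one row; the unspecified columns fall back to their defaults.
cur = con.execute(
    "INSERT INTO area_people (name, type) VALUES (?, ?)",
    ("George", "male"),
)
con.commit()
new_id = cur.lastrowid

row = con.execute("SELECT * FROM area_people").fetchone()
print(new_id)
print(row)
```

The wrapper version is shorter mainly because the table schema is declared with Python objects and rows come back without hand-written SELECT statements.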
