Nout van Deijck
|
150fc5bea7
|
added comments
|
2014-04-23 16:17:23 +02:00 |
|
Nout van Deijck
|
9cefd336e0
|
Cleaning up code and added log messages
|
2014-04-23 16:02:37 +02:00 |
|
Nout van Deijck
|
507006889b
|
Fixed problem with strange URLs; now adds all external identifiers as requests
|
2014-04-23 15:49:23 +02:00 |
|
Bas Vb
|
62475d965d
|
Cleaning up code
|
2014-04-23 15:24:57 +02:00 |
|
Nout van Deijck
|
3e1b33164e
|
Added some comments and tried a different for loop for adding requests
|
2014-04-23 13:48:44 +02:00 |
|
Nout van Deijck
|
1ced65e2b6
|
Parser now adds extra requests for every identifier to an external source that is in the Wikipedia chembox
|
2014-04-23 13:18:50 +02:00 |
|
Nout van Deijck
|
b5c83125f7
|
Added extra request for ChemSpider link retrieved from Wikipedia
|
2014-04-23 12:27:53 +02:00 |
|
Bas Vb
|
f926f86d7d
|
Small fix because the cleaned-up items were not sent back
|
2014-04-23 12:14:20 +02:00 |
|
Nout van Deijck
|
6dd03c293a
|
Added check for already visited redirects of compounds
|
2014-04-23 12:08:33 +02:00 |
|
Bas Vb
|
cb299df96f
|
Added log statements
|
2014-04-23 11:46:43 +02:00 |
|
Bas Vb
|
fd5faf22e4
|
Added empty reliability and condition to prevent errors for now
|
2014-04-23 11:12:58 +02:00 |
|
Bas Vb
|
1c518af5a6
|
Removed per-attribute get functions
|
2014-04-23 11:06:59 +02:00 |
|
Bas Vb
|
b0146cdce8
|
Added regular expressions to clean up temperature data
|
2014-04-22 09:46:19 +02:00 |
|
Bas Vb
|
be63315ca2
|
regex
|
2014-04-16 17:01:35 +02:00 |
|
Jip J. Dekker
|
873231439c
|
Merge branch 'develop' into feature/Wikipedia
|
2014-04-16 16:59:25 +02:00 |
|
Jip J. Dekker
|
d603e388e6
|
Merge branch 'hotfix/1-searchable' into develop
|
2014-04-16 16:58:53 +02:00 |
|
Jip J. Dekker
|
ab2a3fdc08
|
typo!
|
2014-04-16 16:57:27 +02:00 |
|
Jip J. Dekker
|
f0d10902b5
|
Searchable can't be a list!
|
2014-04-16 16:57:08 +02:00 |
|
Jip J. Dekker
|
efacc08a3d
|
Merge branch 'develop' into feature/Wikipedia
Conflicts:
Fourmi.py
|
2014-04-16 16:49:03 +02:00 |
|
Bas Vb
|
6f82b117c9
|
new function to clean up the datapoints
|
2014-04-16 16:23:33 +02:00 |
|
Bas Vb
|
74aa446f40
|
minor edits (comments etc.)
|
2014-04-16 15:27:36 +02:00 |
|
Bas Vb
|
34c3a8b4d6
|
remove empty data points
|
2014-04-16 15:22:47 +02:00 |
|
Bas Vb
|
ce3105f3c1
|
Switched to a general loop over all values, which retrieves every element from the Wikipedia infobox (except those with a colspan, because these break the parsing)
|
2014-04-16 14:56:32 +02:00 |
|
Bas Vb
|
f1280dd66d
|
Get a value, not a list, from XPath
|
2014-04-16 13:23:50 +02:00 |
|
Bas Vb
|
d99548e3b6
|
Added density, molar entropy and heat capacity
|
2014-04-16 11:14:02 +02:00 |
|
Bas Vb
|
d778050f36
|
Able to parse the web links to other databases; one example done
|
2014-04-16 10:37:57 +02:00 |
|
Bas Vb
|
cd1637b0fe
|
Boiling point and melting point are now both parsed from chemical Wikipedia pages. One error remains about differing attribute types in the Result-items; this needs to be fixed by cleaning up the retrieved data.
|
2014-04-16 00:50:50 +02:00 |
|
Bas Vb
|
1ca3593ae1
|
Parse is runnable now.
|
2014-04-16 00:35:19 +02:00 |
|
Jip J. Dekker
|
6799a1a956
|
Merge branch 'release/v0.1.0' into develop
1-searchable
|
2014-04-15 19:49:07 +02:00 |
|
Jip J. Dekker
|
2d5e39de81
|
Merge branch 'release/v0.1.0'
v0.1.0
|
2014-04-15 19:48:55 +02:00 |
|
Jip J. Dekker
|
972e5da0d2
|
Removed debug code and typos.
|
2014-04-15 19:48:27 +02:00 |
|
Jip J. Dekker
|
d770f79a7a
|
Bumped version number
|
2014-04-15 19:46:10 +02:00 |
|
Jip J. Dekker
|
878d8e5efb
|
Merge branch 'feature/CLI' into develop
|
2014-04-15 19:44:41 +02:00 |
|
Jip J. Dekker
|
61ca2520e3
|
Added feed export functionality
|
2014-04-15 19:40:54 +02:00 |
|
Jip J. Dekker
|
e65d3a6898
|
Added the options for the Feed exports
|
2014-04-15 18:57:51 +02:00 |
|
Jip J. Dekker
|
ffb3861034
|
Search for single compound, filename should be lowercase
|
2014-04-15 18:49:30 +02:00 |
|
Jip J. Dekker
|
91ed053ac5
|
Stopped log from interfering with STDOUT
|
2014-04-15 18:17:35 +02:00 |
|
Jip J. Dekker
|
a4dd6e1835
|
Made logging work
|
2014-04-14 21:31:20 +02:00 |
|
Jip J. Dekker
|
2ad33080c6
|
First setup of the CLI, decided on a structure
|
2014-04-14 20:45:07 +02:00 |
|
Jip J. Dekker
|
ee01e697d3
|
Added Docopt as a CLI framework
|
2014-04-14 20:21:41 +02:00 |
|
Bas Vb
|
f9799c30d8
|
Parse is runnable now.
|
2014-04-08 14:59:09 +02:00 |
|
Jip J. Dekker
|
e10ac12d04
|
Merge branch 'develop' into feature/Wikipedia
|
2014-04-08 11:45:23 +02:00 |
|
Jip J. Dekker
|
debbc5e62a
|
Merge branch 'hotfix/none-requests' into develop
|
2014-04-08 11:44:42 +02:00 |
|
Jip J. Dekker
|
199fa5419e
|
Merge branch 'hotfix/none-requests'
|
2014-04-08 11:44:26 +02:00 |
|
Jip J. Dekker
|
622dd4ad00
|
Small fix to ensure unique classes and load all parsers
|
2014-04-08 11:43:32 +02:00 |
|
Jip J. Dekker
|
da17a149c0
|
Spider is now able to handle None requests from parsers while handling new compounds
|
2014-04-08 11:42:43 +02:00 |
|
Jip J. Dekker
|
4b0c4acf96
|
Updated the Wikipedia parser as a proper subclass of Parser
|
2014-04-08 11:40:30 +02:00 |
|
Bas Vb
|
f3807c3018
|
Fixed the errors, but still not able to run/test the parse() function
|
2014-04-06 20:28:03 +02:00 |
|
Bas Vb
|
add4a13a4d
|
Trying to make a start with the WikipediaParser, but I can't work out from the Scrapy website (or elsewhere) what the structure of the file should be, or how to test/run the crawl on a page.
|
2014-04-06 18:02:09 +02:00 |
|
Nout van Deijck
|
81a93c44bb
|
added author
|
2014-04-03 12:19:17 +02:00 |
|