Jip J. Dekker
|
e0e64bd65a
|
Implemented source exclusion
|
2014-04-16 11:03:59 +02:00 |
|
Jip J. Dekker
|
d823c105e6
|
Implemented source inclusion
|
2014-04-16 10:48:29 +02:00 |
|
Bas Vb
|
d778050f36
|
Able to parse the web links to other databases; one example done
|
2014-04-16 10:37:57 +02:00 |
|
Jip J. Dekker
|
7b57d86178
|
Removed redundant source loader
|
2014-04-16 10:36:46 +02:00 |
|
Jip J. Dekker
|
9dcb150356
|
Merge branch 'develop' into feature/chemspider-parser
|
2014-04-16 10:24:52 +02:00 |
|
Jip J. Dekker
|
a06bf643f1
|
Made sourceloader a class and implemented the listing of all sources
|
2014-04-16 10:14:29 +02:00 |
|
Jip J. Dekker
|
8b7cfac2de
|
Added a new command to the CLI; implementation will follow.
|
2014-04-16 09:33:07 +02:00 |
|
Bas Vb
|
cd1637b0fe
|
Both boiling point and melting point are now parsed from chemical Wikipedia pages. There is one error about different types of attributes in the Result items; this needs to be fixed by cleaning up the retrieved data.
|
2014-04-16 00:50:50 +02:00 |
|
Bas Vb
|
1ca3593ae1
|
Parse is runnable now.
|
2014-04-16 00:35:19 +02:00 |
|
Jip J. Dekker
|
6799a1a956
|
Merge branch 'release/v0.1.0' into develop
1-searchable
|
2014-04-15 19:49:07 +02:00 |
|
Jip J. Dekker
|
2d5e39de81
|
Merge branch 'release/v0.1.0'
v0.1.0
|
2014-04-15 19:48:55 +02:00 |
|
Jip J. Dekker
|
972e5da0d2
|
Removed debug code and typos.
|
2014-04-15 19:48:27 +02:00 |
|
Jip J. Dekker
|
d770f79a7a
|
Bumped version number
|
2014-04-15 19:46:10 +02:00 |
|
Jip J. Dekker
|
878d8e5efb
|
Merge branch 'feature/CLI' into develop
|
2014-04-15 19:44:41 +02:00 |
|
Jip J. Dekker
|
61ca2520e3
|
Added feed export functionality
|
2014-04-15 19:40:54 +02:00 |
|
Jip J. Dekker
|
e65d3a6898
|
Added the options for the Feed exports
|
2014-04-15 18:57:51 +02:00 |
|
RTB
|
8e46762a9e
|
fix: if no experimental data, return predicted acd/labs data instead of None
|
2014-04-15 18:56:38 +02:00 |
|
Jip J. Dekker
|
ffb3861034
|
Search for single compound, filename should be lowercase
|
2014-04-15 18:49:30 +02:00 |
|
Jip J. Dekker
|
91ed053ac5
|
Stopped log from interfering with STDOUT
|
2014-04-15 18:17:35 +02:00 |
|
Jip J. Dekker
|
a4dd6e1835
|
Made logging work
|
2014-04-14 21:31:20 +02:00 |
|
Jip J. Dekker
|
2ad33080c6
|
First setup of the CLI, decided on a structure
|
2014-04-14 20:45:07 +02:00 |
|
Jip J. Dekker
|
ee01e697d3
|
Added Docopt as a CLI framework
|
2014-04-14 20:21:41 +02:00 |
|
RTB
|
ff0eb309da
|
ChemSpider parser now handles the Predicted - ACD/Labs tab for scraping properties
|
2014-04-14 17:27:02 +02:00 |
|
RTB
|
2ae3ac9c51
|
added parse_properties to scrape the Experimental Physico-chemical Properties table if it exists
|
2014-04-14 13:09:14 +02:00 |
|
RTB
|
31a63829f8
|
chemspider parser now makes new synonym requests with the scraped synonyms
|
2014-04-14 01:23:15 +02:00 |
|
RTB
|
e95df8eaa3
|
ignore_list now contains the intended names instead of Result objects
|
2014-04-14 01:20:24 +02:00 |
|
RTB
|
564dbc3292
|
added ignore list to new_compound_request for synonyms found by chemspider parser
|
2014-04-14 00:33:25 +02:00 |
|
RTB
|
b1b969a16c
|
corrected usage of __spider variable
|
2014-04-14 00:28:47 +02:00 |
|
RTB
|
0ad98905e3
|
added scraping for wikipedia links in synonym tab
|
2014-04-13 23:35:25 +02:00 |
|
RTB
|
5565c28a1e
|
moved parsing of synonyms to 'parse_synonyms' function
|
2014-04-13 23:14:23 +02:00 |
|
RTB
|
859a18c61a
|
added parsing of synonyms
|
2014-04-12 22:27:28 +02:00 |
|
RTB
|
22fa67735d
|
added parse_searchrequest function
|
2014-04-12 19:41:36 +02:00 |
|
RTB
|
246463b450
|
simplified debug output, WARNING label should be temporary
|
2014-04-12 19:19:56 +02:00 |
|
RTB
|
423cb90a6a
|
Merge branch 'develop' into feature/chemspider-parser
|
2014-04-12 19:13:02 +02:00 |
|
RTB
|
0e3ef9a792
|
hardcoded ChemSpider API token into ChemSpider.py
|
2014-04-08 16:14:47 +02:00 |
|
Bas Vb
|
f9799c30d8
|
Parse is runnable now.
|
2014-04-08 14:59:09 +02:00 |
|
RTB
|
a4dc8c8711
|
corrected ChemSpider parser to be a subclass of Parser
|
2014-04-08 13:10:02 +02:00 |
|
RTB
|
0da286c907
|
created basic structure of ChemSpider search parser
|
2014-04-08 12:08:45 +02:00 |
|
Jip J. Dekker
|
e10ac12d04
|
Merge branch 'develop' into feature/Wikipedia
|
2014-04-08 11:45:23 +02:00 |
|
Jip J. Dekker
|
debbc5e62a
|
Merge branch 'hotfix/none-requests' into develop
|
2014-04-08 11:44:42 +02:00 |
|
Jip J. Dekker
|
199fa5419e
|
Merge branch 'hotfix/none-requests'
|
2014-04-08 11:44:26 +02:00 |
|
Jip J. Dekker
|
622dd4ad00
|
Small fix to ensure unique classes and load all parsers
|
2014-04-08 11:43:32 +02:00 |
|
Jip J. Dekker
|
da17a149c0
|
Spider is now able to handle none-requests from parsers while handling new compounds
|
2014-04-08 11:42:43 +02:00 |
|
Jip J. Dekker
|
4b0c4acf96
|
Updated the Wikipedia parser to be a rightful subclass of Parser
|
2014-04-08 11:40:30 +02:00 |
|
Bas Vb
|
f3807c3018
|
Fixed the errors, but still not able to run/test the parse() function
|
2014-04-06 20:28:03 +02:00 |
|
Bas Vb
|
add4a13a4d
|
Trying to make a start with the WikipediaParser, but I can't find out from the Scrapy website (or any other way) what the structure of the file should be, or how I can test/run the crawling on a page.
|
2014-04-06 18:02:09 +02:00 |
|
Nout van Deijck
|
81a93c44bb
|
added author
|
2014-04-03 12:19:17 +02:00 |
|
Bas Vb
|
60c409da3d
|
New file and branch for the Wikipedia parser
|
2014-04-03 12:05:06 +02:00 |
|
Bas Vb
|
b4ff4a3c3b
|
New file and branch for the Wikipedia parser
|
2014-04-03 12:00:27 +02:00 |
|
Jip J. Dekker
|
3a074467e6
|
Merge branch 'hotfix/No_TABs' into develop
|
2014-04-02 14:22:13 +02:00 |
|