bd7fb38497Merge branch 'feature/chemspider-parser' of code.giphouse.nl:giphouse/descartes-2 into feature/chemspider-parser
Jip J. Dekker
2014-04-16 17:01:09 +02:00
3aaed985feMerge branch 'develop' into feature/chemspider-parser
Jip J. Dekker
2014-04-16 17:00:59 +02:00
873231439cMerge branch 'develop' into feature/Wikipedia
Jip J. Dekker
2014-04-16 16:59:25 +02:00
d603e388e6Merge branch 'hotfix/1-searchable' into develop
Jip J. Dekker
2014-04-16 16:58:53 +02:00
8cf307127aMerge branch 'hotfix/1-searchable'
Jip J. Dekker
2014-04-16 16:57:43 +02:00
ab2a3fdc08typo!
Jip J. Dekker
2014-04-16 16:57:27 +02:00
f0d10902b5Searchable can't be a list!
Jip J. Dekker
2014-04-16 16:57:08 +02:00
efacc08a3dMerge branch 'develop' into feature/Wikipedia
Jip J. Dekker
2014-04-16 16:49:03 +02:00
6f82b117c9new function to clean up the datapoints
Bas Vb
2014-04-16 16:23:33 +02:00
9a78e186bcchemspider parser now grabs data from ExtendedCompoundInfo() of chemspider API (no units)
Rob tB
2014-04-16 16:22:47 +02:00
74aa446f40minor edits (comments etc.)
Bas Vb
2014-04-16 15:27:36 +02:00
caf7d3df4efixed ExtendedCompoundInfo url to have csid parameter instead of query
Rob tB
2014-04-16 15:27:10 +02:00
34c3a8b4d6remove empty data points
Bas Vb
2014-04-16 15:22:47 +02:00
2d314aee6acreated stub to parse ExtendedCompoundInfo from ChemSpider MassSpec API
Rob tB
2014-04-16 15:21:33 +02:00
7fc980befechemspider should now only generate new Requests for wikipedia links from 'expert confirmed' synonyms
Rob tB
2014-04-16 15:02:37 +02:00
ce3105f3c1went to a general loop over all values, this way getting all elements from the Wikipedia infobox (except for those with a colspan, because these mess up)
Bas Vb
2014-04-16 14:56:32 +02:00
87282fc572new properties in parse_properties now use dictionary syntax
Rob tB
2014-04-16 14:26:27 +02:00
93a6f098a9log messages are now DEBUG instead of WARNING
Rob tB
2014-04-16 13:28:59 +02:00
f1280dd66dget value not list from xpath
Bas Vb
2014-04-16 13:23:50 +02:00
c1b5f810cbunused Result properties are now empty string instead of None
Rob tB
2014-04-16 11:53:59 +02:00
92a74de9e0Added the include and exclude options.
Jip J. Dekker
2014-04-16 11:17:48 +02:00
d99548e3b6Added density, molar entropy and heat capacity
Bas Vb
2014-04-16 11:14:02 +02:00
e0e64bd65aImplemented source exclusion
Jip J. Dekker
2014-04-16 11:03:59 +02:00
d823c105e6Implemented source inclusion
Jip J. Dekker
2014-04-16 10:48:29 +02:00
d778050f36Able to parse the weblinks to other databases, one example done
Bas Vb
2014-04-16 10:37:57 +02:00
7b57d86178Removed redundant source loader
Jip J. Dekker
2014-04-16 10:36:46 +02:00
9dcb150356Merge branch 'develop' into feature/chemspider-parser
Jip J. Dekker
2014-04-16 10:24:52 +02:00
a06bf643f1Made sourceloader a class and implemented the listing of all sources
Jip J. Dekker
2014-04-16 10:14:29 +02:00
8b7cfac2deAdded a new command to the CLI; implementation will follow.
Jip J. Dekker
2014-04-16 09:33:07 +02:00
cd1637b0feBoth boiling point and melting point are now parsed from chemical Wikipedia pages. There's one error about different types of attributes in the Result items; this needs to be fixed by cleaning up the retrieved data.
Bas Vb
2014-04-16 00:50:50 +02:00
1ca3593ae1Parse is runnable now.
Bas Vb
2014-04-16 00:35:19 +02:00
6799a1a956Merge branch 'release/v0.1.0' into develop
1-searchable
Jip J. Dekker
2014-04-15 19:49:07 +02:00
2d5e39de81Merge branch 'release/v0.1.0'
v0.1.0
Jip J. Dekker
2014-04-15 19:48:55 +02:00
972e5da0d2Removed debug code and typos.
Jip J. Dekker
2014-04-15 19:48:27 +02:00
d770f79a7aBumped version number
Jip J. Dekker
2014-04-15 19:46:10 +02:00
878d8e5efbMerge branch 'feature/CLI' into develop
Jip J. Dekker
2014-04-15 19:44:41 +02:00
61ca2520e3Added feed export functionality
Jip J. Dekker
2014-04-15 19:40:54 +02:00
e65d3a6898Added the options for the Feed exports
Jip J. Dekker
2014-04-15 18:57:51 +02:00
8e46762a9efix: if no experimental data, return predicted acd/labs data instead of None
RTB
2014-04-15 18:56:38 +02:00
ffb3861034Search for single compound, filename should be lowercase
Jip J. Dekker
2014-04-15 18:49:30 +02:00
91ed053ac5Stopped log from interfering with STDOUT
Jip J. Dekker
2014-04-15 18:17:35 +02:00
a4dd6e1835Made logging work
Jip J. Dekker
2014-04-14 21:31:20 +02:00
2ad33080c6First setup of the CLI, decided on a structure
Jip J. Dekker
2014-04-14 20:45:07 +02:00
ee01e697d3Added Docopt as a CLI framework
Jip J. Dekker
2014-04-14 20:21:41 +02:00
ff0eb309daChemSpider parser now handles the Predicted - ACD/Labs tab for scraping properties
RTB
2014-04-14 17:27:02 +02:00
2ae3ac9c51added parse_properties to scrape the Experimental Physico-chemical Properties table if it exists
RTB
2014-04-14 13:09:14 +02:00
31a63829f8chemspider parser now makes new synonym requests with the scraped synonyms
RTB
2014-04-14 01:23:15 +02:00
e95df8eaa3ignore_list now contains the intended names instead of Result objects
RTB
2014-04-14 01:20:24 +02:00
564dbc3292added ignore list to new_compound_request for synonyms found by chemspider parser
RTB
2014-04-14 00:33:25 +02:00
b1b969a16ccorrected usage of __spider variable
RTB
2014-04-14 00:28:47 +02:00
0ad98905e3added scraping for wikipedia links in synonym tab
RTB
2014-04-13 23:35:25 +02:00
5565c28a1emoved parsing of synonyms to 'parse_synonyms' function
RTB
2014-04-13 23:14:23 +02:00
859a18c61aadded parsing of synonyms
RTB
2014-04-12 22:27:28 +02:00
22fa67735dadded parse_searchrequest function
RTB
2014-04-12 19:41:36 +02:00
246463b450simplified debug output, WARNING label should be temporary
RTB
2014-04-12 19:19:56 +02:00
423cb90a6aMerge branch 'develop' into feature/chemspider-parser
RTB
2014-04-12 19:13:02 +02:00
0e3ef9a792hardcoded ChemSpider API token into ChemSpider.py
RTB
2014-04-08 16:14:47 +02:00
f9799c30d8Parse is runnable now.
Bas Vb
2014-04-08 14:59:09 +02:00
a4dc8c8711corrected Chemspider parser to be a subclass of Parser
RTB
2014-04-08 13:10:02 +02:00
e10ac12d04Merge branch 'develop' into feature/Wikipedia
Jip J. Dekker
2014-04-08 11:45:23 +02:00
debbc5e62aMerge branch 'hotfix/none-requests' into develop
Jip J. Dekker
2014-04-08 11:44:42 +02:00
199fa5419eMerge branch 'hotfix/none-requests'
Jip J. Dekker
2014-04-08 11:44:26 +02:00
622dd4ad00Small fix to ensure unique classes and load all parsers
Jip J. Dekker
2014-04-08 11:43:32 +02:00
da17a149c0Spider is now able to handle none-request from parsers while handling new compounds
Jip J. Dekker
2014-04-08 11:42:43 +02:00
4b0c4acf96Updated the Wikipedia parser to be a rightful subclass of Parser
Jip J. Dekker
2014-04-08 11:40:30 +02:00
f3807c3018Fixed the errors, but still not able to run/test the parse() function
Bas Vb
2014-04-06 20:28:03 +02:00
add4a13a4dTrying to make a start with the WikipediaParser, but I can't find out from the Scrapy website (or any other way) what the structure of the file should be, or how I can test/run the crawling on a page.
Bas Vb
2014-04-06 18:02:09 +02:00
81a93c44bbadded author
Nout van Deijck
2014-04-03 12:19:17 +02:00
60c409da3dNew file and branch for the Wikipedia parser
Bas Vb
2014-04-03 12:05:06 +02:00
b4ff4a3c3bNew file and branch for the Wikipedia parser
Bas Vb
2014-04-03 12:00:27 +02:00
3a074467e6Merge branch 'hotfix/No_TABs' into develop
Jip J. Dekker
2014-04-02 14:22:13 +02:00
9805bb5adbMerge branch 'hotfix/No_TABs'
Jip J. Dekker
2014-04-02 14:21:34 +02:00
f6981057dfChanged everything to spaces
Jip J. Dekker
2014-04-02 14:20:05 +02:00
595f0253e2Merge branch 'release/v0.0.1' into develop
Jip J. Dekker
2014-04-01 21:44:31 +02:00
254e8db3aaMerge branch 'release/v0.0.1'
v0.0.1
Jip J. Dekker
2014-04-01 21:44:08 +02:00
c9e09f8ab9Added a version message
Jip J. Dekker
2014-04-01 21:42:54 +02:00
2e8017c590Merge branch 'feature/parsing-scheme' into develop
Jip J. Dekker
2014-04-01 21:40:26 +02:00
7bc160f676The spider is now able to start using the synonym generator
Jip J. Dekker
2014-04-01 21:38:11 +02:00
cd421cc2fbReplaced literal for testing with a variable fix.
Jip J. Dekker
2014-04-01 21:24:04 +02:00
0bf2d102c6Fixed parser importation, so it doesn't import imported classes.
Jip J. Dekker
2014-04-01 21:21:30 +02:00
683f8c09d4Quick fix, python errors
Jip J. Dekker
2014-04-01 21:12:54 +02:00
f93dc2d160Added a structure to get requests for all websites for a new synonym
Jip J. Dekker
2014-04-01 21:07:36 +02:00
e39ed3b681Added a way for parsers to access the spider.
Jip J. Dekker
2014-04-01 20:56:32 +02:00
4d9e5307bfWrote a loader for all parsers in the parser directory.
Jip J. Dekker
2014-03-31 00:48:45 +02:00
0cc1b23353Added the functionality to add parsers and automatically use them.
Jip J. Dekker
2014-03-30 23:37:42 +02:00
6e2df64fe4Merge branch 'hotfix/spider-import-error' into develop
Jip J. Dekker
2014-03-30 23:08:14 +02:00
e210ce8558Merge branch 'develop', remote-tracking branch 'origin/develop' into develop
Jip J. Dekker
2014-03-30 22:08:21 +02:00
6bbee865c4Merge branch 'feature/basic-structure' into develop
Jip J. Dekker
2014-03-28 14:46:43 +01:00
1e730e77ceMerge branch 'feature/basic-structure' of code.giphouse.nl:giphouse/descartes-2 into feature/basic-structure
Jip J. Dekker
2014-03-28 14:44:29 +01:00
32cedecf2eAdded a basic parser class to extend; next step is implementing the global function
Jip J. Dekker
2014-03-28 14:43:22 +01:00
325febe834Added a basic parser class to extend; next step is implementing the global function
Jip J. Dekker
2014-03-28 14:43:22 +01:00
d91706d6e5The script should stop sometime, added a stopping signal
Jip J. Dekker
2014-03-28 14:14:39 +01:00
87d1041517Made all Python files PEP-8 Compatible
Jip J. Dekker
2014-03-28 14:11:36 +01:00