RTB
|
217fb3e9cd
|
ChemSpider now uses the token from sources.cfg with checks
|
2014-06-06 16:17:46 +02:00 |
|
RTB
|
df4ba2f784
|
changed __init__ of all sources to have an empty dictionary as default config value
|
2014-06-06 12:48:30 +02:00 |
|
RTB
|
ff3b81b813
|
each source now receives a configuration dictionary
|
2014-06-05 16:30:48 +02:00 |
|
Jip J. Dekker
|
bf1822059f
|
Merge branch 'develop' into feature/PubChem
|
2014-06-04 19:54:56 +02:00 |
|
Jip J. Dekker
|
242e0bf628
|
Code inspection
|
2014-06-04 19:43:33 +02:00 |
|
Jip J. Dekker
|
046fbed3cd
|
Code reformat
|
2014-06-04 19:34:23 +02:00 |
|
Jip J. Dekker
|
eb727bd6c4
|
No two requests shall be the same!
|
2014-06-04 19:12:08 +02:00 |
|
Jip J. Dekker
|
0c9862d836
|
Damn you semicolon!
|
2014-06-04 18:54:29 +02:00 |
|
Jip J. Dekker
|
f128c54312
|
Sources don't need to be mangled
|
2014-06-04 18:34:31 +02:00 |
|
Jip J. Dekker
|
86a00b1572
|
Merge branch 'develop' into feature/PubChem
|
2014-06-04 18:00:27 +02:00 |
|
Jip J. Dekker
|
75c0be1fea
|
Added tests for the pipeline
|
2014-06-04 16:50:14 +02:00 |
|
Jip J. Dekker
|
c48c4ec697
|
None pipeline doesn't need a set
|
2014-06-04 16:09:55 +02:00 |
|
Nout van Deijck
|
291547a5ad
|
now returns good results, with property values and corresponding sources
|
2014-06-04 15:44:53 +02:00 |
|
Jip J. Dekker
|
d4a0ffdff3
|
Optimized imports
|
2014-06-04 12:01:05 +02:00 |
|
Nout van Deijck
|
ba8f845178
|
now also (finally) scrapes property values and names, but not yet coupled together and not yet returned.
|
2014-06-02 09:26:36 +02:00 |
|
Jip J. Dekker
|
ecee4a5f45
|
Merge branch 'develop' of github.com:Recondor/Fourmi into develop
|
2014-06-01 20:30:50 +02:00 |
|
Jip J. Dekker
|
aac0a7c79c
|
References to the main Scrapy documentation
|
2014-06-01 20:29:51 +02:00 |
|
Jip J. Dekker
|
f81b1c9500
|
Fixed a typo
|
2014-06-01 20:25:46 +02:00 |
|
Jip J. Dekker
|
f7d0fb4a45
|
Added documentation to the basic Source
|
2014-06-01 20:24:54 +02:00 |
|
Jip J. Dekker
|
c27a875d68
|
Parser/Source consistency
|
2014-06-01 20:18:03 +02:00 |
|
Jip J. Dekker
|
3499946e97
|
Fixed a typo
|
2014-06-01 20:15:15 +02:00 |
|
Jip J. Dekker
|
c4876f029b
|
Added documentation to the FourmiSpider
|
2014-06-01 20:14:47 +02:00 |
|
Jip J. Dekker
|
ace4393a8f
|
Merge branch 'feature/NIST-source' into develop
|
2014-05-23 13:01:06 +02:00 |
|
Jip J. Dekker
|
0e7e4cbe61
|
Merge branch 'develop' of github.com:Recondor/Fourmi into develop
|
2014-05-22 12:17:56 +02:00 |
|
Jip J. Dekker
|
98f91a1aa9
|
Added a pipeline to replace None values with empty strings
|
2014-05-22 12:15:43 +02:00 |
|
Nout van Deijck
|
8083d0c7bc
|
PubChem scrapes synonyms, gets custom url to get data on properties from
|
2014-05-21 16:11:48 +02:00 |
|
Nout van Deijck
|
fb41d772f2
|
Added custom user-agent because the site would otherwise block the scraper
|
2014-05-21 16:11:02 +02:00 |
|
Nout van Deijck
|
4b377bb9a9
|
PubChem now scrapes its synonyms
|
2014-05-21 15:25:55 +02:00 |
|
Nout van Deijck
|
84f2e3dbea
|
Testing search function PubChem
|
2014-05-21 14:53:51 +02:00 |
|
Rob tB
|
6ce5ff2335
|
replaced name variable with summary variable
|
2014-05-21 10:40:44 +02:00 |
|
Rob tB
|
6cd8edaf22
|
included summary variable in calls to transition_table, Antoine table and generic table
|
2014-05-21 10:36:42 +02:00 |
|
Rob tB
|
c0af24644b
|
added summary variable in parse()
|
2014-05-21 10:31:19 +02:00 |
|
Rob tB
|
429ffd7422
|
renamed tables to table in parse()
|
2014-05-21 10:28:54 +02:00 |
|
Rob tB
|
95565042ca
|
removed unused variable symbol_table from parse_transition_table
|
2014-05-21 10:22:03 +02:00 |
|
RTB
|
81719a38fb
|
Added comments for the class and functions
|
2014-05-20 19:32:06 +02:00 |
|
RTB
|
472aae86be
|
synonyms are now scraped
|
2014-05-17 19:32:20 +02:00 |
|
RTB
|
b46c7a309d
|
if synonym name matched in search instead of primary name, emit primary name as synonym
|
2014-05-17 14:21:11 +02:00 |
|
RTB
|
afc1106838
|
NIST now logs an error if chemical name is not found
|
2014-05-17 14:11:03 +02:00 |
|
RTB
|
56ee6b1ad3
|
added ignore list
|
2014-05-17 14:09:10 +02:00 |
|
Rob tB
|
98f58ea4e2
|
added scraping for generic info except for synonyms
|
2014-05-15 14:29:28 +02:00 |
|
Rob tB
|
50c79e3b1f
|
conditions in name (split by ' at ') are now moved to condition field for individual value page and aggregate data table
|
2014-05-14 13:44:43 +02:00 |
|
Nout van Deijck
|
f728dff6b0
|
Developing PubChem parser, first draft, not tested nor finished completely
|
2014-05-14 12:01:05 +02:00 |
|
Jip J. Dekker
|
c7ad35239e
|
Merge pull request #4 from Recondor/feature/wikipediafixes
Feature/wikipediafixes
|
2014-05-13 21:37:53 +02:00 |
|
Bas Vb
|
b54568bab0
|
Small fixes
|
2014-05-13 16:18:32 +02:00 |
|
Jip J. Dekker
|
afaa0d903f
|
Merge pull request #3 from Recondor/feature/chemspider-parser-fixes
Feature/chemspider-parser-fixes
|
2014-05-09 23:06:43 +02:00 |
|
RTB
|
7e984b60d8
|
added uncertainty to results from scraping individual data points urls
|
2014-05-09 14:24:08 +02:00 |
|
RTB
|
775a920b9b
|
NIST scraper now handles urls with individual data points
|
2014-05-09 13:00:22 +02:00 |
|
RTB
|
5e067fd572
|
altered scraping of aggregate data to test for and request url to individual data points
|
2014-05-09 12:36:54 +02:00 |
|
Jip J. Dekker
|
f193aac24a
|
Fixed Duplicate Pipeline + rename
|
2014-05-08 15:45:42 +02:00 |
|
Rob tB
|
74dddace88
|
removed logging of Result objects in debug messages because pointless
|
2014-05-08 15:42:53 +02:00 |
|