
Commit 8fe314c

lboeman authored and wholmgren committed
Add parsing for non 1-minute data to UO SRML parser (#711)
* parse 5-minute, 15-minute and hourly SRML data
* fix awkward comment
* update note about index shift, update whatsnew
* add network decorator to new tests
* comment tweak
* fix tests for py27-min
* more readable interval_length expression
* comment date parsing logic
* more clarification of shifting logic
1 parent e5bcf68 commit 8fe314c

3 files changed: +71 additions, -11 deletions

docs/sphinx/source/whatsnew/v0.6.2.rst

Lines changed: 3 additions & 0 deletions
@@ -53,6 +53,8 @@ Bug fixes
   :py:func:`~pvlib.irradiance.klucher` and
   :py:func:`~pvlib.pvsystem.calcparams_desoto`. (:issue:`698`)
 * Fix :py:class:`~pvlib.forecast.NDFD` model by updating variables.
+* Fix :py:func:`~pvlib.iotools.srml.format_index` to parse non
+  one-minute data correctly. (:issue:`709`)


 Testing
@@ -69,3 +71,4 @@ Contributors
 * Kevin Anderson (:ghuser:`kevinsa5`)
 * :ghuser:`bentomlinson`
 * Jonathan Gaffiot (:ghuser:`jgaffiot`)
+* Leland Boeman (:ghuser:`lboeman`)

pvlib/iotools/srml.py

Lines changed: 44 additions & 11 deletions
@@ -42,10 +42,11 @@ def read_srml(filename):

     Notes
     -----
-    The time index is shifted back one minute to account for 2400 hours,
-    and to avoid time parsing errors on leap years. The returned data
-    values should be understood to occur during the interval from the
-    time of the row until the time of the next row. This is consistent
+    The time index is shifted back by one interval to account for the
+    daily end time of 2400, and to avoid time parsing errors on leap
+    years. The returned data values are labeled by the left endpoint of the
+    interval, and should be understood to occur during the interval from
+    the time of the row until the time of the next row. This is consistent
     with pandas' default labeling behavior.

     See SRML's `Archival Files`_ page for more information.
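The note says each value is labeled by the left endpoint of its interval, matching pandas' default. As a rough illustration on synthetic data (not SRML output), resampling a 1-minute series into 15-minute bins shows pandas stamping each bin with its start time:

```python
import pandas as pd

# Synthetic 1-minute series for 00:00-00:29; each value is labeled by the
# start of its minute, i.e. the left-endpoint convention described above.
idx = pd.date_range('2019-04-01 00:00', periods=30, freq='1min')
minute_data = pd.Series(range(30), index=idx, dtype=float)

# With default settings pandas labels each 15-minute bin by its left edge:
# the bin covering [00:00, 00:15) is stamped 00:00, not 00:15.
fifteen = minute_data.resample('15min').mean()
print(fifteen)
```

Assuming SRML stamps each row at the end of its interval (which is what the one-interval shift implies), a raw row stamped 0015 ends up labeled 00:00 after the shift, the same label pandas gives the first bin here.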
@@ -134,11 +135,27 @@ def format_index(df):
     year = int(df.columns[1])
     df_doy = df[df.columns[0]]
     # Times are expressed as integers from 1-2400, we convert to 0-2359 by
-    # subtracting one and then correcting the minutes at each former hour.
-    df_time = df[df.columns[1]] - 1
-    fifty_nines = df_time % 100 == 99
-    times = df_time.where(~fifty_nines, df_time - 40)
-
+    # subtracting the length of one interval and then correcting the times
+    # at each former hour. interval_length is determined by taking the
+    # difference of the first two rows of the time column.
+    # e.g. The first two rows of hourly data are 100 and 200
+    # so interval_length is 100.
+    interval_length = df[df.columns[1]][1] - df[df.columns[1]][0]
+    df_time = df[df.columns[1]] - interval_length
+    if interval_length == 100:
+        # Hourly files do not require fixing the former hour timestamps.
+        times = df_time
+    else:
+        # Because hours are represented by some multiple of 100, shifting
+        # results in invalid values.
+        #
+        # e.g. 200 (for 02:00) shifted by 15 minutes becomes 185, the
+        # desired result is 145 (for 01:45)
+        #
+        # So we find all times with minutes greater than 60 and remove 40
+        # to correct to valid times.
+        old_hours = df_time % 100 > 60
+        times = df_time.where(~old_hours, df_time - 40)
     times = times.apply(lambda x: '{:04.0f}'.format(x))
     doy = df_doy.apply(lambda x: '{:03.0f}'.format(x))
     dts = pd.to_datetime(str(year) + '-' + doy + '-' + times,
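The shift-and-correct arithmetic can be exercised without downloading an SRML file. The sketch below re-expresses the same steps on plain pandas Series of integer HHMM values. `shift_srml_times` and `to_datetimes` are hypothetical names for illustration, not pvlib functions, and `.iloc` is used for positional access (the diff indexes by label, which coincides with position on a default integer index).

```python
import pandas as pd


def shift_srml_times(hhmm):
    """Shift SRML HHMM integers (e.g. 15, 30, ..., 2400) back one interval."""
    # Interval length from the first two rows, e.g. 15 for 15-minute data.
    interval_length = hhmm.iloc[1] - hhmm.iloc[0]
    shifted = hhmm - interval_length
    if interval_length == 100:
        # Hourly data: 100 -> 0, 2400 -> 2300, already valid HHMM values.
        return shifted
    # Former hour stamps become invalid values such as 185 (200 - 15);
    # subtracting 40 more maps them into the previous hour, e.g. 145.
    old_hours = shifted % 100 > 60
    return shifted.where(~old_hours, shifted - 40)


def to_datetimes(year, doy, hhmm):
    """Combine a year, day-of-year values and raw HHMM values into times."""
    times = shift_srml_times(hhmm).apply(lambda x: '{:04.0f}'.format(x))
    days = doy.apply(lambda x: '{:03.0f}'.format(x))
    return pd.to_datetime(str(year) + '-' + days + '-' + times,
                          format='%Y-%j-%H%M')


# 15-minute example for day 91 of 2019 (April 1): the final 2400 stamp
# becomes 23:45 on the same day instead of rolling over to April 2.
hhmm = pd.Series([15, 30, 45, 100, 200, 2400])
doy = pd.Series([91] * 6)
print(to_datetimes(2019, doy, hhmm))
```

Shifting before parsing also means no row ever carries the value 2400, which `%H%M` could not parse.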
@@ -161,14 +178,30 @@ def read_srml_month_from_solardat(station, year, month, filetype='PO'):
     month: int
         Month to request data for.
     filetype: string
-        SRML file type to gather. 'RO' and 'PO' are the
-        only minute resolution files.
+        SRML file type to gather. See notes for explanation.

     Returns
     -------
     data: pd.DataFrame
         One month of data from SRML.

+    Notes
+    -----
+    File types designate the time interval of a file and whether it
+    contains raw or processed data. For instance, `RO` designates raw, one
+    minute data and `PO` designates processed one minute data. The
+    availability of file types varies between sites. Below is a table of
+    file types and their time intervals. See [1] for site information.
+
+    ============= ============ ==================
+    time interval raw filetype processed filetype
+    ============= ============ ==================
+    1 minute      RO           PO
+    5 minute      RF           PF
+    15 minute     RQ           PQ
+    hourly        RH           PH
+    ============= ============ ==================
+
     References
     ----------
     [1] University of Oregon Solar Radiation Measurement Laboratory
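The file type table can be read as a two-letter code: the first letter selects raw (`R`) or processed (`P`) data, the second the interval. The lookup below is a hypothetical helper derived only from that table (it is not part of pvlib). It also reports the HHMM step that `format_index` would infer from the first two rows of such a file.

```python
# Hypothetical lookup derived from the file type table; not a pvlib API.
FILETYPE_MINUTES = {'O': 1, 'F': 5, 'Q': 15, 'H': 60}


def describe_filetype(filetype):
    """Return (processed, minutes, hhmm_step) for an SRML file type code."""
    if (len(filetype) != 2 or filetype[0] not in ('R', 'P')
            or filetype[1] not in FILETYPE_MINUTES):
        raise ValueError('unknown SRML file type: {!r}'.format(filetype))
    processed = filetype[0] == 'P'
    minutes = FILETYPE_MINUTES[filetype[1]]
    # Step between consecutive HHMM values in the time column, i.e. the
    # interval_length format_index infers: hourly files go 100, 200, ...
    # while sub-hourly files step by their minute count (15, 30, 45, ...).
    hhmm_step = 100 if minutes == 60 else minutes
    return processed, minutes, hhmm_step


for code in ('RO', 'PF', 'RQ', 'PH'):
    print(code, describe_filetype(code))
```

A code validated this way would then be passed as the `filetype` argument, e.g. `read_srml_month_from_solardat('TW', 2019, 4, 'RQ')` as in the new 15-minute test.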

pvlib/test/test_srml.py

Lines changed: 24 additions & 0 deletions
@@ -73,3 +73,27 @@ def test_read_srml_month_from_solardat():
     file_data = srml.read_srml(url)
     requested = srml.read_srml_month_from_solardat('EU', 2018, 1)
     assert file_data.equals(requested)
+
+
+@network
+def test_15_minute_dt_index():
+    data = srml.read_srml_month_from_solardat('TW', 2019, 4, 'RQ')
+    start = pd.Timestamp('20190401 00:00')
+    start = start.tz_localize('Etc/GMT+8')
+    end = pd.Timestamp('20190430 23:45')
+    end = end.tz_localize('Etc/GMT+8')
+    assert data.index[0] == start
+    assert data.index[-1] == end
+    assert (data.index[3::4].minute == 45).all()
+
+
+@network
+def test_hourly_dt_index():
+    data = srml.read_srml_month_from_solardat('CD', 1986, 4, 'PH')
+    start = pd.Timestamp('19860401 00:00')
+    start = start.tz_localize('Etc/GMT+8')
+    end = pd.Timestamp('19860430 23:00')
+    end = end.tz_localize('Etc/GMT+8')
+    assert data.index[0] == start
+    assert data.index[-1] == end
+    assert (data.index.minute == 0).all()
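Both new tests need network access. The properties they assert can still be illustrated offline with a locally built index. The sketch below constructs the index the 15-minute test expects for April 2019 (it does not read SRML data) and contrasts it with an index labeled by interval end, which is what an unshifted parse would produce.

```python
import pandas as pd

# Expected 15-minute index for April 2019. 'Etc/GMT+8' is a fixed UTC-8
# offset (Etc/ zone names invert the sign), so there are no DST jumps.
expected = pd.date_range('20190401 00:00', '20190430 23:45', freq='15min',
                         tz='Etc/GMT+8')

# 30 days x 96 quarter-hours per day.
print(len(expected))

# Starting from position 3, every fourth label falls on :45 ...
print((expected[3::4].minute == 45).all())

# ... whereas labels at interval ends (00:15, 00:30, 00:45, 01:00, ...)
# would put every fourth label on the hour instead.
unshifted = expected + pd.Timedelta('15min')
print((unshifted[3::4].minute == 0).all())
```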
